P287 Use of machine learning to determine stroke severity of patients diagnosed with stroke in claims data

2018 ◽  
Vol 39 (suppl_1) ◽  
Author(s):  
E Kogan ◽  
K Twyman ◽  
J Heap ◽  
D Milentijevic ◽  
J H Lin ◽  
...  
Stroke ◽  
2021 ◽  
Vol 52 (Suppl_1) ◽  
Author(s):  
Shima Shahjouei ◽  
Georgios K Tsivgoulis ◽  
Ghasem Farahmand ◽  
Eric Koza ◽  
Ashkan Mowla ◽  
...  

Objective and Design: We conducted a multinational observational study on features of consecutive acute ischemic stroke (AIS), intracranial hemorrhage (ICH), and cerebral venous or sinus thrombosis (CVST) among SARS-CoV-2 infected patients. Main Outcome Measures: We investigated the association of demographics, clinical data, geographical regions, and countries’ health expenditure among AIS patients with the risk of large vessel occlusion (LVO), stroke severity as measured by the National Institutes of Health Stroke Scale (NIHSS), and stroke subtype as measured by the TOAST criteria. Additionally, we applied unsupervised machine learning algorithms to uncover possible similarities among stroke patients. Results: Among the 136 tertiary centers in 32 countries that participated in this study, 71 centers from 17 countries had at least one eligible stroke patient. Out of 432 patients included, 323 (74.8%) had AIS, 91 (21.1%) ICH, and 18 (4.2%) CVST. Among 23 patients with subarachnoid hemorrhage, 16 (69.5%) had no evidence of aneurysm. A total of 183 (42.4%) patients were women, 104 (24.1%) patients were younger than 55 years, and 105 (24.4%) patients had no identifiable vascular risk factors. Among 380 patients with a known interval between SARS-CoV-2 onset and stroke, 144 (37.8%) presented to the hospital with chief complaints of stroke-related symptoms, with asymptomatic or undiagnosed SARS-CoV-2 infection. Among AIS patients, 44.5% had LVO; 10% had small artery occlusion according to the TOAST criteria. We observed a lower median NIHSS (8 [3-17] versus 11 [5-17]; p=0.02) and a higher rate of mechanical thrombectomy (12.4% versus 2%; p<0.001) in countries with middle-to-high health expenditure when compared to countries with lower health expenditure. The unsupervised machine learning identified 4 subgroups, with a relatively large group with no or limited comorbidities.
Conclusions and Relevance: We observed a relatively high number of young patients and asymptomatic SARS-CoV-2 infections among stroke patients. Traditional vascular risk factors were absent among a relatively large cohort of patients. The stroke severity was lower and the rate of mechanical thrombectomy was higher among countries with middle-to-high health expenditure.


Author(s):  
Emily Kogan ◽  
Kathryn Twyman ◽  
Jesse Heap ◽  
Dejan Milentijevic ◽  
Jennifer H. Lin ◽  
...  

Background: Stroke severity is an important predictor of patient outcomes and is commonly measured with the National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include the severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data. Methods: NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of the 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) were held out for independent validation of model performance and the remaining patients (n = 6116, 86%) were used for training the model. Several machine learning models were evaluated, and parameters optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set. Results: Leveraging machine learning we identified the main factors in electronic health record data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R² (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
Conclusions: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.
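The three agreement measures reported above (coefficient of determination, Pearson correlation, and root-mean-squared error) can be computed from paired scores as in the following minimal sketch. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study; the abstract does not say which software the authors used.

```python
import math


def regression_metrics(observed, imputed):
    """Return (r_squared, pearson_r, rmse) for two paired lists of scores."""
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_imp = sum(imputed) / n
    # Residual and total sums of squares relative to the observed scores.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, imputed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Pieces needed for the Pearson correlation coefficient.
    ss_imp = sum((p - mean_imp) ** 2 for p in imputed)
    cov = sum((o - mean_obs) * (p - mean_imp) for o, p in zip(observed, imputed))
    r_squared = 1.0 - ss_res / ss_tot
    pearson_r = cov / math.sqrt(ss_tot * ss_imp)
    rmse = math.sqrt(ss_res / n)
    return r_squared, pearson_r, rmse


# Made-up NIHSS-like values, for illustration only (not study data).
observed = [2, 5, 9, 14, 21]
imputed = [3, 4, 10, 12, 18]
r2, r, rmse = regression_metrics(observed, imputed)
print(f"R^2={r2:.2f}  R={r:.2f}  RMSE={rmse:.2f}")
```

Note that R² computed this way is not simply the square of R unless the predictions come from a least-squares fit on the same data, which is why abstracts of this kind often report both.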


2021 ◽  
Author(s):  
Kenneth John Locey ◽  
Thomas A Webb ◽  
Sana Farooqui ◽  
Bala Hota

Background: US hospital safety is routinely measured via patient safety indicators (PSIs). Receiving a score for most PSIs requires a minimum number of qualifying cases, which are partly determined by whether the associated diagnosis-related group (DRG) was surgical and whether the surgery was elective. While these criteria can exempt hospitals from PSIs, it remains to be seen whether exemption is driven by low volume, small numbers of DRGs, or perhaps, policies that determine how procedures are classified as elective. Methods: Using Medicare inpatient claims data from 4,069 hospitals between 2015 and 2017, we examined how percentages of elective procedures relate to numbers of surgical claims and surgical DRGs. We used a combination of quantile regression and machine learning based anomaly detection to characterize these relationships and identify outliers. We then used a set of machine learning algorithms to test whether outliers were explained by the DRGs they reported. Results: Average percentages of elective procedures generally decreased from 100% to 60% in relation to the number of surgical claims and the number of DRGs among them. Some providers with high volumes of claims had anomalously low percentages of elective procedures (5% to 40%). These low elective outliers were not explained by the particular surgical DRGs among their claims. However, among hospitals exempted from PSIs, those with the greatest volume of claims were always low elective outliers. Conclusion: Some hospitals with relatively high numbers of surgical claims may have classified procedures as non-elective in a way that ultimately exempted them from certain PSIs.


2020 ◽  
Author(s):  
Shima Shahjouei ◽  
Georgios Tsivgoulis ◽  
Ghasem Farahmand ◽  
Eric Koza ◽  
Ashkan Mowla ◽  
...  

Background: Stroke is reported as a consequence of SARS-CoV-2 infection. However, there is a lack of comprehensive information regarding stroke phenotype and characteristics. Methods: We conducted a multinational observational study on features of consecutive acute ischemic stroke (AIS), intracranial hemorrhage (ICH), and cerebral venous or sinus thrombosis (CVST) among SARS-CoV-2 infected patients. We further investigated the association of demographics, clinical data, geographical regions, and countries' health expenditure among AIS patients with the risk of large vessel occlusion (LVO), stroke severity as measured by the National Institutes of Health Stroke Scale (NIHSS), and stroke subtype as measured by the TOAST criteria. Additionally, we applied unsupervised machine learning algorithms to uncover possible similarities among stroke patients. Results: Among the 136 tertiary centers in 32 countries that participated in this study, 71 centers from 17 countries had at least one eligible stroke patient. Out of 432 patients included, 323 (74.8%) had AIS, 91 (21.1%) ICH, and 18 (4.2%) CVST. Among 23 patients with subarachnoid hemorrhage, 16 (69.5%) had no evidence of aneurysm. A total of 183 (42.4%) patients were women, 104 (24.1%) patients were younger than 55 years, and 105 (24.4%) patients had no identifiable vascular risk factors. Among 380 patients with a known interval between SARS-CoV-2 onset and stroke, 144 (37.8%) presented to the hospital with chief complaints of stroke-related symptoms, with asymptomatic or undiagnosed SARS-CoV-2 infection. Among AIS patients, 44.5% had LVO; 10% had small artery occlusion according to the TOAST criteria. We observed a lower median NIHSS (8 [3-17] versus 11 [5-17]; p=0.02) and a higher rate of mechanical thrombectomy (12.4% versus 2%; p<0.001) in countries with middle-to-high health expenditure when compared to countries with lower health expenditure.
The unsupervised machine learning identified 4 subgroups, with a relatively large group with no or limited comorbidities. Conclusions: We observed a relatively high number of young patients and asymptomatic SARS-CoV-2 infections among stroke patients. Traditional vascular risk factors were absent among a relatively large cohort of patients. Among hospitalized patients, the stroke severity was lower and the rate of mechanical thrombectomy was higher among countries with middle-to-high health expenditure.


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Satoru Kamitani ◽  
Kunihiro Nishimura ◽  
Akiko Kada ◽  
Tetsurou Sayama ◽  
Kouichi Arimura ◽  
...  

Introduction: Reports on hospital-specific, risk-standardized outcomes using claims data on acute ischemic stroke are increasing. However, these reports sometimes fail to account for stroke severity. Hypothesis: Hospital-specific, risk-adjusted mortality ratings that do not account for stroke severity are altered after initial severity is included for ischemic strokes. Methods: The health insurance claims data known as the Japanese Diagnosis Procedure Combination/Per Diem Payment Systems between April 1, 2013 and May 31, 2014 was obtained from 332 certified training institutions in Japan. The hospital-specific, risk-adjusted 30-day mortality rate was calculated using a hierarchical logistic regression model. We developed two models, with and without initial levels of consciousness (LOC), and compared them to assess the impact of stroke severities on hospital-specific mortalities. The hospital-specific mortalities with and without LOC were ranked and grouped into 3 categories (top 20%, middle 60%, and bottom 20%), and then compared across the two models. We used an integrated discrimination improvement (IDI) index to measure how the model with LOC reclassified patients compared with the model without LOC. Patients with deep comas were excluded from the analyses. Results: We analyzed 64,569 acute ischemic stroke patients. Crude 30-day mortality was 3.9%, the mean age was 74.1±1.3 years, 41.2% were women, 70.8% had hypertension, 29.2% had diabetes mellitus, 79.9% had a Charlson comorbidity index score greater than 5, 3.7% had severe LOC (coma/semi-coma) and 8.1% had modestly impaired LOC. Among hospitals ranked in the top 20% of performers without LOC, 26.9% were ranked in the middle 60% when LOC was adjusted. Among the bottom 20% of performers without LOC, 21.2% were ranked in the middle 60% when LOC was adjusted.
The hospital-specific, risk-adjusted 30-day mortality model with LOC had a significantly better IDI index score than the model without LOC (IDI, 0.09; P<0.001). Conclusions: Adding the metric of stroke severity to a hospital-specific, risk-adjusted 30-day mortality model based on claims data was associated with model improvement and changes of mortality-based performance rankings.
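The tiering comparison described in the Methods (rank hospitals, split them into top 20%, middle 60%, and bottom 20%, then see who changes tier between two models) can be sketched as follows. This is an illustration only: the hospital identifiers and rates are made up, the helper names `assign_tiers` and `share_moved` are hypothetical, and treating the lowest mortality rates as the "top" performers is an assumption the abstract does not spell out.

```python
def assign_tiers(rates):
    """Map hospital id -> 'top', 'middle' or 'bottom' from risk-adjusted rates.

    Assumption: lower mortality means better performance, so the lowest 20%
    of rates form the 'top' tier and the highest 20% the 'bottom' tier.
    """
    ranked = sorted(rates, key=rates.get)  # ascending mortality
    n = len(ranked)
    cut = round(0.2 * n)
    tiers = {}
    for position, hospital in enumerate(ranked):
        if position < cut:
            tiers[hospital] = "top"
        elif position >= n - cut:
            tiers[hospital] = "bottom"
        else:
            tiers[hospital] = "middle"
    return tiers


def share_moved(tiers_before, tiers_after, from_tier, to_tier):
    """Fraction of hospitals in from_tier (before) that land in to_tier (after)."""
    start = [h for h, tier in tiers_before.items() if tier == from_tier]
    if not start:
        return 0.0
    moved = sum(1 for h in start if tiers_after[h] == to_tier)
    return moved / len(start)


# Hypothetical risk-adjusted 30-day mortality rates for ten hospitals.
without_loc = {"H1": 0.021, "H2": 0.025, "H3": 0.030, "H4": 0.033, "H5": 0.036,
               "H6": 0.039, "H7": 0.042, "H8": 0.046, "H9": 0.051, "H10": 0.060}
with_loc = {"H1": 0.022, "H2": 0.034, "H3": 0.024, "H4": 0.033, "H5": 0.037,
            "H6": 0.038, "H7": 0.043, "H8": 0.052, "H9": 0.045, "H10": 0.061}

before = assign_tiers(without_loc)
after = assign_tiers(with_loc)
top_to_middle = share_moved(before, after, "top", "middle")
bottom_to_middle = share_moved(before, after, "bottom", "middle")
print(top_to_middle, bottom_to_middle)
```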


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Charles Esenwa ◽  
Jorge Luna ◽  
Benjamin Kummer ◽  
Hojjat Salmasian ◽  
Hooman Kamel ◽  
...  

Introduction: Stroke research using widely available institutional, state-wide and national retrospective data is dependent on accurate identification of stroke subtypes using claims data. Despite the abundance of such data and the advances in clinical informatics, there is limited published data on the application of machine learning models to improve previously reported administrative stroke identification algorithms. Hypothesis: We hypothesized that machine learning models can be applied to claims data coded using the International Classification of Diseases, version 9 (ICD-9), to accurately identify patients with ischemic stroke (IS), intracerebral hemorrhage (ICH), and subarachnoid hemorrhage (SAH), and these models would outperform previously published algorithms in our patient cohort. Methods: We developed a gold standard list of 427 stroke patients continuously admitted to our institution from 1/1/2015 to 9/30/2015 using an internal stroke database and applied 75% of it to train and 25% to test two machine learning models: one using classification and regression tree (CART) and another using regularized logistic regression. There were 2,241 negative controls. We further applied a previously reported stroke detection algorithm, by Tirschwell and Longstreth, to our cohort for comparison. Results: The CART model had a κ of 0.72, 0.82, 0.59; sensitivity of 95%, 99%, 99%; and a specificity of 88%, 78%, 75%; for IS, ICH and SAH respectively. The regularized logistic regression model had a κ of 0.73, 0.80, 0.59; sensitivity of 95%, 99%, 99%; and a specificity of 89%, 78%, 75%; for IS, ICH and SAH respectively. The previously reported algorithm by Tirschwell et al. had a κ of 0.71, 0.56, 0.64; sensitivity of 98%, 99%, 99%; and a specificity of 64%, 52%, 50%; for IS, ICH and SAH.
Conclusion: Compared with the previously reported ICD-9-based detection algorithm, the machine learning models had a higher κ for diagnosis of IS and ICH, similar sensitivity for all subtypes, and higher specificity for all stroke subtypes in our cohort. Applying machine learning models to identify stroke subtypes from administrative data sets can lead to highly accurate models of stroke subtype identification for health services researchers.
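For readers unfamiliar with the statistics quoted above, the following minimal sketch shows how Cohen's κ, sensitivity, and specificity are derived from a 2×2 table of algorithm output against a gold standard. The function name `binary_agreement` and the counts are hypothetical; the abstract does not report the underlying confusion matrices.

```python
def binary_agreement(tp, fp, fn, tn):
    """Return sensitivity, specificity and Cohen's kappa for a 2x2 table.

    tp/fn: gold-standard positives flagged / missed by the algorithm.
    fp/tn: gold-standard negatives flagged / correctly left unflagged.
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Made-up counts for illustration only (not from the study).
result = binary_agreement(tp=40, fp=10, fn=5, tn=45)
print(result)
```

Because κ discounts chance agreement, an algorithm can show very high sensitivity and still have a modest κ when its specificity is low, which is the pattern reported for the comparison algorithm.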


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Charles Esenwa ◽  
Jorge Luna ◽  
Benjamin Kummer ◽  
Hojjat Salmasian ◽  
David Vawdrey ◽  
...  

Introduction: Retrospective identification of patients hospitalized with a new diagnosis of acute ischemic stroke is important for administrative quality assurance, post-discharge clinical management, and stroke research. The benefit of using administrative claims data is its widespread availability, but the disadvantage is the inability to accurately and consistently identify the clinical diagnosis of interest. Hypothesis: We hypothesized that decision tree and logistic regression models could be applied to administrative claims data coded using International Classification of Diseases, version 10 (ICD-10) to create algorithms that could accurately identify patients with acute ischemic stroke. Methods: We used hospital records from our institution to develop a gold standard list of 243 patients, continuously hospitalized with a new diagnosis of stroke from 10/1/2015 to 3/31/2016. We used 1,393 neurological patients without a diagnosis of stroke as negative controls. This list was used to train and test two machine learning methods of diagnosis and procedure code analysis, for the purpose of ischemic stroke identification: one using classification and regression tree (CART) and another using regularized logistic regression. We trained the models using 75% of the data and performed the evaluation using the remaining 25%. Results: The CART model had a κ=0.78, sensitivity of 96%, specificity of 90%, and a positive predictive value of 99%. The regularized logistic regression model had a κ=0.73, sensitivity of 97%, specificity of 81%, and a positive predictive value of 98%. Conclusion: Both the decision tree and logistic regression machine learning models showed very high accuracy in identifying patients with a new diagnosis of ischemic stroke, using ICD-10 code claims data, when compared to our gold standard.
Applying these machine learning models to identify patients with ischemic stroke has widespread applications, especially in this period where national billing data has transitioned from ICD-9 to ICD-10 codes.
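The 75%/25% train/test division mentioned in the Methods can be sketched as below, using the case and control counts given in the abstract (243 and 1,393). Whether the authors stratified the split by case status, and what random seed or tooling they used, is not stated; stratification, the seed, and the helper name `stratified_split` are assumptions made for this illustration.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=0):
    """Return (train_indices, test_indices), keeping the class ratio in both parts.

    Assumption: the split is stratified by label; the abstract only says that
    75% of the data was used for training and 25% for evaluation.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# 1 = gold-standard stroke patient, 0 = negative control (counts from the abstract).
labels = [1] * 243 + [0] * 1393
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```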


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Elisabetta Patorno ◽  
Sebastian Schneeweiss ◽  
Ajinkya Pawar ◽  
Helen Mogun ◽  
Lee Schwamm

Background: Non-interventional large-scale research on patients with stroke requires the use of data sources ensuring access to large populations with clinically detailed and longitudinally available real-world healthcare information. We linked the Paul Coverdell National Acute Stroke Program registry (PCNASP) to commercial longitudinal claims data to assess long-term medication adherence post discharge. Methods: All ischemic stroke (IS) admissions in PCNASP between 2008-2015 were considered for linkage to longitudinal patient claims records from a commercial health insurer using a probabilistic algorithm. We assessed the linkage quality via the percentage of unique records among the linked subset, evaluated the representativeness of the linked population via standardized differences (SD), and described medical history, stroke severity and disability, and patterns of medication use before and after the stroke hospitalization among linked patients. Results: The linkage produced uniqueness equal to 99.1%. Overall, we linked 5,644 out of 104,540 patients with an IS hospitalization in claims data. Linked patients were similar to unlinked except for mean age (69.7 vs 72.5 yr, SD 0.23) and % home discharge (59.8 vs. 52.2, SD 0.14) with mild strokes (median NIHSS 3). Medication information from the PCNASP registry often differed from claims-based out-of-hospital drug utilization patterns, particularly after discharge, with prescriptions at discharge largely overestimating the real-world use of medications as measured by filled prescriptions. (Table) Conclusions: In a large cohort of hospitalized IS patients, high-quality probabilistic linkage between the PCNASP stroke registry and commercial claims data is feasible. Differences between predicted and actual post discharge medication utilization highlight the challenges of assuming long-term medication adherence based on discharge prescriptions. Further research is warranted.
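In this abstract "SD" denotes a standardized difference, not a standard deviation. A commonly used definition is sketched below for both a proportion and a mean. The abstract does not state which formula the authors applied, so the functions here (`std_diff_proportions`, `std_diff_means`, both hypothetical names) are an assumption and their output need not match the reported values exactly.

```python
import math


def std_diff_proportions(p1, p2):
    """Standardized difference between two proportions (given as 0..1)."""
    pooled_var = (p1 * (1 - p1) + p2 * (1 - p2)) / 2
    return (p1 - p2) / math.sqrt(pooled_var)


def std_diff_means(mean1, sd1, mean2, sd2):
    """Standardized difference between two means with their standard deviations."""
    return (mean1 - mean2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Home-discharge percentages quoted in the abstract, expressed as proportions.
d_home = std_diff_proportions(0.598, 0.522)
print(round(d_home, 3))
```

A standardized difference below roughly 0.1 is conventionally read as negligible imbalance, which is why only the characteristics exceeding that level are called out in the Results.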

