Abstract 15999: Machine Learning to Identify Patients at High Risk for Peripheral Arterial Disease From Electronic Health Record Data

Circulation
2020
Vol 142 (Suppl_3)
Author(s):  
Mark Sonderman ◽  
Eric Farber-Eger ◽  
Aaron W Aday ◽  
Matthew S Freiberg ◽  
Joshua A Beckman ◽  
...  

Introduction: Peripheral arterial disease (PAD) is a common and underdiagnosed disease associated with significant morbidity and an increased risk of major adverse cardiovascular events. Targeted screening of individuals at high risk for PAD could facilitate early diagnosis and allow for prompt initiation of interventions aimed at reducing cardiovascular and limb events. However, no widely accepted PAD risk stratification tools exist.
Hypothesis: We hypothesized that machine learning algorithms can identify patients at high risk for PAD, defined by an ankle-brachial index (ABI) <0.9, from electronic health record (EHR) data.
Methods: Using data from the Vanderbilt University Medical Center EHR, ABIs were extracted for 8,093 patients not previously diagnosed with PAD at the time of initial testing. A total of 76 patient characteristics, including demographics, vital signs, lab values, diagnoses, and medications, were analyzed using both a random forest and least absolute shrinkage and selection operator (LASSO) regression to identify the features most predictive of ABI <0.9. The most significant features were used to build a logistic regression-based predictor that was validated in a separate group of individuals with ABI data.
Results: The machine learning models identified several features independently correlated with PAD (age, BMI, SBP, DBP, pulse pressure, anti-hypertensive medication, diabetes medication, smoking, and statin use). The test statistic produced by the logistic regression model was correlated with PAD status in our validation set. At a chosen threshold, the specificity was 0.92 and the positive predictive value was 0.73 in this high-risk population.
Conclusions: Machine learning can be applied to build unbiased models that identify individuals at risk for PAD using easily accessible information from the EHR. This model could be implemented either as a high-risk flag within the medical record or as an online calculator available to clinicians.
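The two validation metrics reported above, specificity and positive predictive value, are computed from the confusion matrix of predictions at the chosen score threshold. A minimal sketch of that calculation (the function name and toy labels are illustrative, not study data):

```python
# Illustrative sketch: specificity and PPV at a fixed decision threshold.
# Labels are invented for demonstration; the abstract's reported values
# (specificity 0.92, PPV 0.73) came from the study's own validation set.

def specificity_and_ppv(y_true, y_pred):
    """Binary labels: 1 = PAD (ABI < 0.9), 0 = no PAD."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    specificity = tn / (tn + fp)  # true negatives among all non-PAD patients flagged or not
    ppv = tp / (tp + fp)          # true positives among all patients flagged as PAD
    return specificity, ppv

# Toy example: 4 patients, 2 true PAD cases, one miss and one false alarm.
spec, ppv = specificity_and_ppv([1, 1, 0, 0], [1, 0, 0, 1])
```

Raising the threshold trades sensitivity for specificity and PPV, which is why a screening tool aimed at a high-risk population reports both at a single chosen cutoff.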

Author(s):  
Jeffrey G Klann ◽  
Griffin M Weber ◽  
Hossein Estiri ◽  
Bertrand Moal ◽  
Paul Avillach ◽  
...  

Abstract
Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data.
Objective: We sought to develop and validate a computable phenotype for COVID-19 severity.
Methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity to the 4CE phenotype at one site.
Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely, by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared to an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies for ICU admission, with precision and recall as low as 49% compared to chart review.
Discussion: We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions.
Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
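One simple way to report pooled sensitivity and specificity across sites is to sum the per-site confusion-matrix counts before dividing. This is only an illustrative sketch under that assumption; the consortium's actual pooling method may differ (e.g., meta-analytic weighting), and the site counts below are invented:

```python
# Hypothetical sketch: pooling phenotype performance across federated sites
# by summing raw confusion counts, then computing the two rates on the totals.

def pooled_sensitivity_specificity(site_counts):
    """site_counts: iterable of (tp, fp, fn, tn) confusion counts per site.

    Returns (sensitivity, specificity) on the pooled totals:
    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = sum(c[0] for c in site_counts)
    fp = sum(c[1] for c in site_counts)
    fn = sum(c[2] for c in site_counts)
    tn = sum(c[3] for c in site_counts)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example with two sites (counts are made up for illustration).
sens, spec = pooled_sensitivity_specificity([(7, 1, 3, 9), (8, 3, 2, 7)])
```

Count-level pooling weights larger sites more heavily; a meta-analytic pooling would instead combine per-site estimates with between-site variance, which matters when per-site sensitivity varies as widely as reported here.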


Author(s):  
Emily Kogan ◽  
Kathryn Twyman ◽  
Jesse Heap ◽  
Dejan Milentijevic ◽  
Jennifer H. Lin ◽  
...  

Abstract
Background: Stroke severity is an important predictor of patient outcomes and is commonly measured with National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data.
Methods: NIHSS scores available in the Optum® de-identified Integrated Claims-Clinical dataset were extracted from physician notes by applying natural language processing (NLP) methods. The cohort analyzed in the study consists of the 7,149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1,033; 14%) was held out for independent validation of model performance, and the remaining patients (n = 6,116; 86%) were used for training. Several machine learning models were evaluated, and their parameters were optimized using cross-validation on the training set. The model with optimal performance, a random forest model, was ultimately evaluated on the holdout set.
Results: Leveraging machine learning, we identified the main factors in EHR data for assessing stroke severity, including death within the same month as stroke occurrence, length of hospital stay following stroke occurrence, aphasia/dysphagia diagnosis, hemiplegia diagnosis, and whether a patient was discharged to home or self-care. Comparing the imputed NIHSS scores to the NLP-extracted NIHSS scores on the holdout data set yielded an R2 (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
Conclusions: Machine learning models built on EHR data can be used to determine proxies for stroke severity. This enables severity to be incorporated in studies of stroke patient outcomes using administrative and EHR databases.


2019
Vol 6 (10)
pp. e688-e695
Author(s):  
Julia L Marcus ◽  
Leo B Hurley ◽  
Douglas S Krakower ◽  
Stacey Alexeeff ◽  
Michael J Silverberg ◽  
...  
