Abstract 17106: Prediction of Recurrent Atherosclerotic Cardiovascular Disease Risk Using Machine Learning and Electronic Health Record Data

Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Ashish Sarraju ◽  
Andrew Ward ◽  
Sukyung Chung ◽  
Jiang Li ◽  
David Scheinker ◽  
...  

Introduction: Patients with atherosclerotic cardiovascular disease (ASCVD) have a high risk of recurrent ASCVD events despite statin use. Pooled cohort equations (PCE) are used for ASCVD risk prediction in primary prevention, but there are no validated models for recurrent risk prediction in secondary prevention. Machine learning (ML) demonstrates promise in developing novel risk prediction models using electronic health record (EHR) data. Methods: We included adults with prior ASCVD from EHR data from an outpatient Northern California system between January 1, 2009 and December 31, 2018 with at least 2 visits at least 1 year apart and 5 years of follow-up. The outcome was a recurrent ASCVD event, defined as the first myocardial infarction, stroke, or fatal coronary artery disease in the 5-year follow-up period. We trained ML models to predict recurrent ASCVD risk: random forests (RF), gradient boosted machines (GBM), extreme gradient boosted models (XGBoost), and logistic regression with a standard L2 penalty (LR) and an L1 penalty (Lasso). We evaluated the performance of the ML models and the PCE on a 20% held-out test cohort using the areas under the receiver operating characteristic curves (AUCs). Results: Our cohort consisted of 32,192 patients with ASCVD (mean age 70 years, 46% women, 12% Asian, and 6% Hispanic). Fewer than half (49%) were on guideline-directed statins. XGBoost and GBM were the best performing models for recurrent ASCVD risk prediction, while the PCE performed poorly (Figure). The top 20 predictive variables for recurrent ASCVD risk included prior events (ischemic stroke, myocardial infarction), traditional risk factors (age, blood pressure, lipid levels), and socioeconomic factors (income, education). Conclusions: EHR-trained machine learning models facilitated recurrent ASCVD risk prediction in real-world secondary prevention patients. Machine learning models developed from large datasets may help bridge contemporary gaps in ASCVD risk prediction.
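
As an illustration of the evaluation step described above, the following minimal Python sketch shows one way to draw a 20% held-out split and compute an AUC. It is not the authors' code: the function names `holdout_split` and `auc` are hypothetical, and a simple random shuffle is assumed because the abstract does not say how the test cohort was drawn.

```python
import random


def holdout_split(n, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for n records using a random shuffle.

    Assumption: a plain random split; the abstract does not specify the method.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In practice a model would be fitted on the records in `train_idx` only, and `auc` would be called on its predicted risks for the records in `test_idx`. The pairwise loop is quadratic in cohort size, so a rank-based implementation would be preferable for cohorts as large as the one reported here.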

Circulation ◽  
2020 ◽  
Vol 141 (Suppl_1) ◽  
Author(s):  
WENJUN FAN

Background: The AHA/ACC published a pooled cohort 10-year atherosclerotic cardiovascular disease (ASCVD) risk calculator to estimate the probability of initial ASCVD events based on statistical modeling. Our current study aimed to develop machine-learning models with the identical predictors among T2DM patients from the Action to Control Cardiovascular Risk in Diabetes (ACCORD) trial and to compare their performance in predicting the composite outcome of myocardial infarction, non-fatal stroke, and cardiovascular death. Methods: The guideline risk calculator uses 9 predictors: baseline age, gender, race, SBP, antihypertensive medication use, total cholesterol, high-density lipoprotein cholesterol, current smoking status, and diabetes mellitus status. We developed three ML ASCVD Risk Calculators based on Linear Model (LM), Support Vector Machine (SVM), and Random Forest (RF) algorithms using 10-year follow-up data from ACCORD with the same 9 predictors. T2DM patients with prior ASCVD or with invalid follow-up time were excluded from our analysis. Those who had not experienced the composite outcome by the end of year 10 were labeled as censored. A 5-fold stratified random split was applied as the cross-validation strategy. Results: A total of 6581 T2DM participants were included in our final sample, with a mean age of 62.9 ± 5.9 years (range 51–79 years); 44.1% were female and 60.8% were white. Among those, 12.2% (n=802) developed composite ASCVD during a median follow-up of 9.1 years. The performance of the AHA/ACC 10-year Risk Calculator was modest, with AUC=0.604. In contrast, the ML models showed better performance on validation data, with LM AUC=0.854, SVM AUC=0.848, and RF AUC=0.866 (Figure). Conclusion: The ML ASCVD Risk Calculator outperforms the AHA/ACC pooled 10-year ASCVD risk calculator in predicting the composite ASCVD outcomes among those with DM from the ACCORD trial. Future studies need to validate ML algorithms in other cohorts and further explore other potentially valuable predictors.
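
The 5-fold stratified split mentioned in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `stratified_kfold` is hypothetical, the outcome is treated as a binary label, and records of each class are dealt round-robin into folds after shuffling so that every fold keeps roughly the overall event rate.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which each class is spread
    as evenly as possible across the k folds."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin into the folds.
        for j, i in enumerate(members):
            folds[j % k].append(i)

    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each record appears in exactly one test fold, so averaging a metric such as the AUC over the five test folds uses every participant once for validation.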


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
I Korsakov ◽  
A Gusev ◽  
T Kuznetsova ◽  
D Gavrilov ◽  
R Novitskiy

Abstract Background Advances in precision medicine will require an increasingly individualized prognostic evaluation of patients in order to provide each patient with appropriate therapy. The traditional statistical methods of predictive modeling recommended by the European guidelines for the prevention of cardiovascular disease, such as SCORE, PROCAM, and Framingham, are not adapted for all patients and require significant human involvement in the selection, transformation, and imputation of predictive variables. In ROC analysis for prediction of significant cardiovascular disease (CVD), the areas under the curve are 0.62–0.72 for Framingham, 0.66–0.73 for SCORE, and 0.60–0.69 for PROCAM. To improve on this, we applied machine learning and deep learning approaches that rely on conventional risk factors to predict 10-year CVD events using longitudinal electronic health record (EHR) data. Methods For machine learning, we applied logistic regression (LR), and as a deep learning algorithm we applied recurrent neural networks with long short-term memory (LSTM) units. We extracted the following features from the longitudinal EHR: demographics, vital signs, diagnoses (ICD-10-CM: I21–I22.9, I61–I63.9), and medication. A problem at this step is that nearly 80 percent of clinical information in the EHR is “unstructured” and contains errors and typos. Handling missing data is important for correctly training deep learning and machine learning algorithms. The study cohort included patients between the ages of 21 and 75 with a dynamic observation window. In total, the dataset contained 31517 individuals, but only 3652 had all features or had missing feature values that could be easily imputed. Among these 3652 individuals, 29.4% had CVD, the mean age was 49.4 years, and 68.2% were female. Evaluation We randomly divided the dataset into a training and a test set with an 80/20 split. The LR was implemented with Python Scikit-Learn and the LSTM model was implemented with Keras using Tensorflow as the backend. Results We applied machine learning and deep learning models using the same features as the traditional risk scales and using longitudinal EHR features for CVD prediction, respectively. With the traditional features, the machine learning model (LR) achieved an AUROC of 0.74–0.76 and the deep learning model (LSTM) 0.75–0.76. Using features from the EHR, the logistic regression and deep learning models improved the AUROC to 0.78–0.79. Conclusion The machine learning models outperformed the traditional clinically used predictive models for CVD risk prediction (i.e. the SCORE, PROCAM, and Framingham equations). This approach was used to create a clinical decision support system (CDSS). It uses both traditional risk scales and models based on neural networks. Especially important is the fact that the system can calculate the risks of cardiovascular disease automatically and recalculate them immediately after new information is added to the EHR. The results are delivered to the user's personal account.


2020 ◽  
Vol 3 (1) ◽  
Author(s):  
Andrew Ward ◽  
Ashish Sarraju ◽  
Sukyung Chung ◽  
Jiang Li ◽  
Robert Harrington ◽  
...  

Abstract The pooled cohort equations (PCE) predict atherosclerotic cardiovascular disease (ASCVD) risk in patients with characteristics within prespecified ranges and have uncertain performance among Asians or Hispanics. It is unknown if machine learning (ML) models can improve ASCVD risk prediction across broader, diverse, real-world populations. We developed ML models for ASCVD risk prediction for multi-ethnic patients using an electronic health record (EHR) database from Northern California. Our cohort included patients aged 18 years or older with no prior CVD and not on statins at baseline (n = 262,923), stratified into PCE-eligible (n = 131,721) and PCE-ineligible patients based on missing or out-of-range variables. We trained ML models [logistic regression with L2 penalty and L1 lasso penalty, random forest, gradient boosting machine (GBM), extreme gradient boosting] and determined 5-year ASCVD risk prediction, with and without incorporation of additional EHR variables, and in Asian and Hispanic subgroups. A total of 4309 patients had ASCVD events, with 2077 in PCE-ineligible patients. GBM performance in the full cohort, including PCE-ineligible patients (area under receiver-operating characteristic curve (AUC) 0.835, 95% confidence interval (CI): 0.825–0.846), was significantly better than that of the PCE in the PCE-eligible cohort (AUC 0.775, 95% CI: 0.755–0.794). Among patients aged 40–79, GBM performed similarly before (AUC 0.784, 95% CI: 0.759–0.808) and after (AUC 0.790, 95% CI: 0.765–0.814) incorporating additional EHR data. Overall, ML models achieved comparable or improved performance compared to the PCE while allowing risk discrimination in a larger group of patients, including PCE-ineligible patients. EHR-trained ML models may help bridge important gaps in ASCVD risk prediction.
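
The abstract reports AUCs with 95% confidence intervals but does not state how the intervals were obtained. One common approach, shown here purely as an assumption, is a percentile bootstrap over the test set. The function names `auc` and `bootstrap_auc_ci` are hypothetical, and the sketch uses only the Python standard library.

```python
import random


def auc(labels, scores):
    """Probability that a random positive case outranks a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample records with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[max(0, int((1 - alpha / 2) * n_boot) - 1)]
    return lo, hi
```

Comparing two models on the same patients would usually call for a paired resampling scheme (the same bootstrap indices for both models) rather than comparing two separately computed intervals.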


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


Author(s):  
Nghia H Nguyen ◽  
Dominic Picetti ◽  
Parambir S Dulai ◽  
Vipul Jairath ◽  
William J Sandborn ◽  
...  

Abstract Background and Aims There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods Through a systematic review through January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results We included 13 studies on machine learning-based prediction models in IBD, encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and predicting outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.


Author(s):  
Chenxi Huang ◽  
Shu-Xia Li ◽  
César Caraballo ◽  
Frederick A. Masoudi ◽  
John S. Rumsfeld ◽  
...  

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
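
To make a few of the metric families named above concrete, the sketch below computes a Brier score (overall performance), a binned calibration table (calibration), and a two-category net reclassification improvement (reclassification). These are standard textbook definitions chosen for illustration; the abstract does not list the specific metrics the article proposes, and all function names here are hypothetical.

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def calibration_table(labels, probs, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) for each
    non-empty equal-width risk bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[b].append((y, p))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            rows.append((sum(p for _, p in members) / n,
                         sum(y for y, _ in members) / n,
                         n))
    return rows


def net_reclassification_improvement(labels, old_probs, new_probs, threshold):
    """Two-category NRI at one risk threshold: net upward moves among events
    plus net downward moves among non-events.
    Assumes both events and non-events are present."""
    n_events = sum(labels)
    n_nonevents = len(labels) - n_events
    up_e = down_e = up_n = down_n = 0
    for y, p_old, p_new in zip(labels, old_probs, new_probs):
        old_high, new_high = p_old >= threshold, p_new >= threshold
        if new_high and not old_high:
            if y == 1:
                up_e += 1
            else:
                up_n += 1
        elif old_high and not new_high:
            if y == 1:
                down_e += 1
            else:
                down_n += 1
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents
```

Two models with nearly identical AUCs can still differ on these measures, for example when one assigns well-calibrated risks to a subgroup that the other systematically underestimates.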

