Abstract P280: Revisiting CVD Risk Prediction Using Machine Learning Approaches: A Case Study

Circulation ◽  
2020 ◽  
Vol 141 (Suppl_1) ◽  
Author(s):  
Hesam Dashti ◽  
Yanyan Liu ◽  
Robert J Glynn ◽  
Paul M Ridker ◽  
Samia Mora ◽  
...  

Introduction: Applications of machine learning (ML) methods have been demonstrated by the recent FDA approval of new ML-based biomedical image processing methods. In this study, we examine applications of ML, specifically artificial neural networks (ANN), for predicting risk of cardiovascular (CV) events. Hypothesis: We hypothesized that using the same CV risk factors, ML-based CV prediction models can improve the performance of current predictive models. Methods: Justification for the Use of Statins in Prevention: An Intervention Trial Evaluating Rosuvastatin (JUPITER; NCT00239681) is a multi-ethnic trial that randomized non-diabetic participants with LDL-C<130 mg/dL and hsCRP≥2 mg/L to rosuvastatin versus placebo. We restricted the analysis to white and black participants allocated to the placebo arm, and estimated the race- and sex-specific Pooled Cohorts Equations (PCE) 5-year risk score using race, sex, age, HDL-C, total cholesterol, systolic BP, antihypertensive medications, and smoking. A total of 218 incident CV cases occurred (maximum follow-up 5 years). For every participant in the case group, we randomly selected 4 controls from the placebo arm after stratifying for the baseline risk factors (Table 1). The risk factors from a total of n=1,090 participants were used to train and test the ANN model. We used 80% of the participants (n=872) for designing the network and left out 20% of the data (n=218) for testing the predictive model. We used the TensorFlow software to design, train, and evaluate the ANN model. Results: We compared the performances of the ANN and the PCE score on the 218 test subjects (Figure 1). The high AUC of the neural network (0.85; 95% CI 0.78-0.91) on this dataset suggests advantages of machine learning methods compared to the current methods. 
Conclusions: This result demonstrates the potential of machine learning methods for enhancing and improving the current techniques used in cardiovascular risk prediction and should be evaluated in other cohorts.
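The sample sizes quoted in the Methods follow directly from the 1:4 case-control design and the 80/20 split. The sketch below reproduces that arithmetic. The participant IDs, the random seed, and the use of a simple (unstratified) random split are assumptions made for illustration; the abstract does not say how the split was drawn.

```python
import random

N_CASES = 218            # incident CV cases reported in the abstract
CONTROLS_PER_CASE = 4    # controls selected per case
TEST_FRACTION = 0.20     # share of participants held out for testing

# Each case contributes itself plus its matched controls.
n_total = N_CASES * (1 + CONTROLS_PER_CASE)
n_test = round(n_total * TEST_FRACTION)
n_train = n_total - n_test

# Hypothetical participant IDs standing in for the real records.
ids = list(range(n_total))
rng = random.Random(0)   # arbitrary seed, chosen only for repeatability
rng.shuffle(ids)
test_ids, train_ids = ids[:n_test], ids[n_test:]

print(n_total, n_train, n_test)
```

With these inputs the counts come out to 1,090 participants in total, 872 for training, and 218 for testing, matching the figures in the abstract.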

2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
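The trade-off described above (higher sensitivity at the cost of lower specificity and precision) can be made concrete with confusion-matrix counts. The following is a minimal sketch; the counts are hypothetical and are not taken from the study. They describe an unbalanced test set with 20 bloom events and 180 non-events.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of true events that were detected
    specificity = tn / (tn + fp)   # share of non-events correctly rejected
    precision = tp / (tp + fp)     # share of alarms that were real events
    return sensitivity, specificity, precision

# Hypothetical model that flags aggressively: few misses, many false alarms.
high_sensitivity = classification_metrics(tp=18, fp=54, tn=126, fn=2)

# Hypothetical model with a better balance between the two error types.
balanced = classification_metrics(tp=15, fp=9, tn=171, fn=5)

print(high_sensitivity)
print(balanced)
```

In this illustration the first model reaches a sensitivity of 0.90 but a precision of only 0.25, while the second gives up some sensitivity (0.75) for a specificity of 0.95 and a precision of 0.625.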


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to a lack of reproducibility and the difficulty of understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods: decision trees and random forests. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of the decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. A decision tree is a machine learning method that produces a model resembling a flow chart. A random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In data from the LIFE study, the performance of prediction models for serious fall injury was moderate at best (area under the receiver operating characteristic curve of 0.54 for the decision tree and 0.66 for the random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
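To help read the reported values of 0.54 and 0.66, recall that the area under the ROC curve equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, so 0.5 corresponds to chance. The sketch below computes it by direct pairwise comparison. The scores are hypothetical and the function name is illustrative; nothing here comes from the LIFE data or the tutorial's own code.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (case, non-case) pairs in which the case has the
    higher score; tied pairs count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted risks for participants with and without the outcome.
cases = [0.9, 0.4]
non_cases = [0.5, 0.3, 0.4]

example_auc = auc(cases, non_cases)
print(example_auc)
```

Of the six pairs here, four are won outright by the case and one is tied, giving 4.5 / 6 = 0.75. The quadratic loop is fine for small examples; rank-based formulas are the usual choice for large samples.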


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 601
Author(s):  
Nelson K. Dumakor-Dupey ◽  
Sampurna Arya ◽  
Ankit Jha

Rock fragmentation in the mining and construction industries is widely achieved using the drilling and blasting technique, which remains the most effective and efficient means of breaking down rock mass into smaller pieces. However, apart from its intended purpose of rock breakage, throw, and heave, blasting generates adverse impacts, such as ground vibration, airblast, flyrock, fumes, and noise, that have significant operational and environmental implications for mining activities. Consequently, blast impact studies are conducted to determine an optimum blast design that can maximize the desirable impacts and minimize the undesirable ones. To achieve this objective, several empirical models for estimating blast impacts have been developed. However, despite being the industry benchmark, empirical models are based on a limited number of the factors that affect the outcomes of a blast. As a result, researchers are now employing machine learning (ML) techniques for blast impact prediction. The ML approach can incorporate several factors affecting the outcomes of a blast and is therefore preferred over empirical and other statistical methods. This paper reviews the various blast impacts and their prediction models, with a focus on empirical and machine learning methods. The details of the prediction methods for the various blast impacts, including their applications, advantages, and limitations, are discussed. The literature reveals that machine learning methods are better predictors than the empirical models. However, we observed that these ML models are presently applied mainly in academic research.


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Matthew W Segar ◽  
Byron Jaeger ◽  
Kershaw V Patel ◽  
Vijay Nambi ◽  
Chiadi E Ndumele ◽  
...  

Introduction: Heart failure (HF) risk and the underlying biological risk factors vary by race. Machine learning (ML) may improve race-specific HF risk prediction but this has not been fully evaluated. Methods: The study included participants from 4 cohorts (ARIC, DHS, JHS, and MESA) aged > 40 years, free of baseline HF, and with adjudicated HF event follow-up. Black adults from JHS and white adults from ARIC were used to derive race-specific ML models to predict 10-year HF risk. The ML models were externally validated in subgroups of black and white adults from ARIC (excluding JHS participants) and pooled MESA/DHS cohorts and compared to prior established HF risk scores developed in ARIC and MESA. Harrell’s C-index and Greenwood-Nam-D’Agostino chi-square were used to assess discrimination and calibration, respectively. Results: In the derivation cohorts, 288 of 4141 (7.0%) black and 391 of 8242 (4.7%) white adults developed HF over 10 years. The ML models had excellent discrimination in both black and white participants (C-indices = 0.88 and 0.89). In the external validation cohorts for black participants from ARIC (excluding JHS, N = 1072) and MESA/DHS pooled cohorts (N = 2821), 131 (12.2%) and 115 (4.1%) developed HF. The ML model had adequate calibration and demonstrated superior discrimination compared to established HF risk models (Fig A). A consistent pattern was also observed in the external validation cohorts of white participants from the MESA/DHS pooled cohorts (N=3236; 100 [3.1%] HF events) (Fig A). The most important predictors of HF in both races were NP levels. Cardiac biomarkers and glycemic parameters were most important among blacks while LV hypertrophy and prevalent CVD and traditional CV risk factors were the strongest predictors among whites (Fig B). Conclusions: Race-specific and ML-based HF risk models that integrate clinical, laboratory, and biomarker data demonstrated superior performance when compared to traditional risk prediction models.
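Harrell's C-index, the discrimination measure used above, is the fraction of comparable participant pairs in which the participant who developed HF earlier was also assigned the higher predicted risk. The following is a minimal sketch of that definition, not the authors' implementation. The data are hypothetical, and pairs with tied follow-up times are simply skipped, which is a simplifying assumption.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    times  : follow-up time per participant
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical follow-up data for four participants.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.3, 0.5, 0.1]

c_index = harrell_c_index(times, events, risks)
print(c_index)
```

In this example five pairs are comparable and four of them are ordered correctly, so the C-index is 0.8. A value of 0.5 would indicate no discrimination and 1.0 perfect ranking.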


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict the presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.


2021 ◽  
Author(s):  
Cathy C. Westhues ◽  
Henner Simianer ◽  
Timothy M. Beissinger

We introduce the R-package learnMET, developed as a flexible framework to enable a collection of analyses on multi-environment trial (MET) breeding data with machine learning-based models. learnMET allows the combination of genomic information with environmental data such as climate and/or soil characteristics. Notably, the package offers the possibility of incorporating weather data from field weather stations, or can retrieve global meteorological datasets from a NASA database. Daily weather data can be aggregated in daily windows based on naive (for instance, daily windows with a fixed number of days) or phenological approaches. Different machine learning methods for genomic prediction are implemented, including gradient boosted trees, random forests, stacked ensemble models, and multi-layer perceptrons. These prediction models can be evaluated via a collection of cross-validation schemes that mimic typical scenarios encountered by plant breeders working with MET experimental data in a user-friendly way. The package is fully open source and accessible on GitHub.
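The "naive" aggregation mentioned above, in which daily weather values are grouped into windows of a fixed number of days, can be sketched as follows. This is written in Python purely for illustration and is not learnMET's R interface; the function name, the choice of the mean as the summary statistic, and the decision to keep a shorter final window are all assumptions.

```python
def aggregate_fixed_windows(daily_values, window_days):
    """Average a daily series over consecutive windows of fixed length.

    The last window may contain fewer than `window_days` values; it is kept
    rather than dropped (an assumption made for this sketch).
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    means = []
    for start in range(0, len(daily_values), window_days):
        window = daily_values[start:start + window_days]
        means.append(sum(window) / len(window))
    return means

# Hypothetical daily mean temperatures (degrees Celsius) for one week.
daily_temperature = [10, 12, 14, 16, 18, 20, 22]

window_means = aggregate_fixed_windows(daily_temperature, window_days=3)
print(window_means)
```

For the seven hypothetical values above and a window of three days, this yields one mean per window: 12.0, 18.0, and 22.0 (the last from a single day). A phenological approach would instead set the window boundaries from crop growth stages.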


Author(s):  
Matthew W. Segar ◽  
Byron C. Jaeger ◽  
Kershaw V. Patel ◽  
Vijay Nambi ◽  
Chiadi E. Ndumele ◽  
...  

Background: Heart failure (HF) risk and the underlying risk factors vary by race. Traditional models for HF risk prediction treat race as a covariate in risk prediction and do not account for significant parameters such as cardiac biomarkers. Machine learning (ML) may offer advantages over traditional modeling techniques to develop race-specific HF risk prediction models and elucidate important contributors of HF development across races. Methods: We performed a retrospective analysis of four large, community cohort studies (ARIC, DHS, JHS, and MESA) with adjudicated HF events. Participants were aged >40 years and free of HF at baseline. Race-specific ML models for HF risk prediction were developed in the JHS cohort (for the Black race-specific model) and in White adults from ARIC (for the White race-specific model). The models included 39 candidate variables across demographic, anthropometric, medical history, laboratory, and electrocardiographic domains. The ML models were externally validated and compared with prior established traditional and non-race specific ML models in race-specific subgroups of the pooled MESA/DHS cohort and Black participants of ARIC. Harrell's C-index and Greenwood-Nam-D'Agostino chi-square tests were used to assess discrimination and calibration, respectively. Results: The ML models had excellent discrimination in the derivation cohorts for Black (N=4,141 in JHS, C-index=0.88) and White (N=7,858 in ARIC, C-index=0.89) participants. In the external validation cohorts, the race-specific ML model demonstrated adequate calibration and superior discrimination (C-indices=0.80-0.83 [for Black individuals] and 0.82 [for White individuals]) compared with established HF risk models or with non-race specific ML models derived using race as a covariate. Among the risk factors, natriuretic peptide levels were the most important predictor of HF risk across both races, followed by troponin levels in Black and EKG-based Cornell voltage in White individuals. Other key predictors of HF risk among Black individuals were glycemic parameters and socioeconomic factors. In contrast, prevalent cardiovascular (CV) disease and traditional CV risk factors were stronger predictors of HF risk in White adults. Conclusions: Race-specific and ML-based HF risk models that integrate clinical, laboratory, and biomarker data demonstrated superior performance when compared with traditional HF risk and non-race specific ML models. This approach identifies distinct race-specific contributors of HF.

