Performance Metrics for the Comparative Analysis of Clinical Risk Prediction Models Employing Machine Learning

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
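
As a rough illustration of the metric families named in this abstract, the sketch below computes one simple representative each for overall performance (Brier score), discrimination (AUROC), calibration (calibration-in-the-large), and reclassification (a two-category net reclassification improvement). The function names, the single 0.5 risk threshold, and the toy data are assumptions made for this example; they are not taken from the article, which may define or select these metrics differently.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Overall performance: mean squared difference between risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def auroc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Discrimination: chance that a random event is ranked above a random
    non-event (ties count as one half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Calibration: mean predicted risk minus observed event rate."""
    n = len(y_true)
    return sum(p_pred) / n - sum(y_true) / n


def two_category_nri(y_true, p_old, p_new, threshold: float) -> float:
    """Reclassification: net share of events moved up plus net share of
    non-events moved down when switching from the old to the new model."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    for y, old, new in zip(y_true, p_old, p_new):
        old_high, new_high = old >= threshold, new >= threshold
        if new_high and not old_high:
            up_event += y == 1
            up_nonevent += y == 0
        elif old_high and not new_high:
            down_event += y == 1
            down_nonevent += y == 0
    n_event = sum(y_true)
    n_nonevent = len(y_true) - n_event
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_event - down_event) / n_event + (down_nonevent - up_nonevent) / n_nonevent


# Hypothetical outcomes and predicted risks from two models (toy data only).
y = [0, 0, 1, 1, 0, 1]
p_old = [0.10, 0.40, 0.35, 0.80, 0.20, 0.30]
p_new = [0.10, 0.30, 0.60, 0.80, 0.20, 0.55]

if __name__ == "__main__":
    for name, p in (("old", p_old), ("new", p_new)):
        print(name, brier_score(y, p), auroc(y, p), calibration_in_the_large(y, p))
    print("NRI at 0.5:", two_category_nri(y, p_old, p_new, 0.5))
```

Reporting these side by side is the point of the sketch: a model can move patients across a decision threshold (a nonzero reclassification value) while a single summary such as the AUROC changes only a little.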

Machine Learning-based Prediction Models for Diagnosis and Prognosis in Inflammatory Bowel Diseases: A Systematic Review

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab155 ◽

2021 ◽

Author(s):

Nghia H Nguyen ◽

Dominic Picetti ◽

Parambir S Dulai ◽

Vipul Jairath ◽

William J Sandborn ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Risk Prediction ◽

Statistical Models ◽

Prediction Models ◽

Risk Of Bias ◽

Learning Models ◽

Bowel Diseases ◽

Inflammatory Bowel ◽

Machine Learning Models

Abstract Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods: Through a systematic review up to January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and that reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results: We included 13 studies on machine learning-based prediction models in IBD, encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and predicting outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions: Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.

Development and validation of a prognostic COVID-19 severity assessment (COSA) score and machine learning models for patient triage at a tertiary hospital

Journal of Translational Medicine ◽

10.1186/s12967-021-02720-w ◽

2021 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Verena Schöning ◽

Evangelia Liakoni ◽

Christine Baumgartner ◽

Aristomenis K. Exadaktylos ◽

Wolf E. Hautz ◽

...

Keyword(s):

Machine Learning ◽

Risk Stratification ◽

Clinical Outcomes ◽

Predictive Value ◽

Validation Cohort ◽

Tertiary Hospital ◽

Learning Models ◽

Clinical Risk ◽

Machine Learning Models ◽

Risk Stratification Score

Abstract Background: Clinical risk scores and machine learning models based on routine laboratory values could assist in automated early identification of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) patients at risk for severe clinical outcomes. They can guide patient triage, inform allocation of health care resources, and contribute to the improvement of clinical outcomes. Methods: In- and outpatients who tested positive for SARS-CoV-2 at the Insel Hospital Group Bern, Switzerland, between February 1st and August 31st (‘first wave’, n = 198) and September 1st through November 16th 2020 (‘second wave’, n = 459) were used as training and prospective validation cohorts, respectively. A clinical risk stratification score and machine learning (ML) models were developed using demographic data, medical history, and laboratory values taken up to 3 days before, or 1 day after, positive testing to predict severe outcomes of hospitalization (a composite endpoint of admission to intensive care, or death from any cause). Test accuracy was assessed using the area under the receiver operating characteristic curve (AUROC). Results: Sex, C-reactive protein, sodium, hemoglobin, glomerular filtration rate, glucose, and leucocytes around the time of first positive testing (−3 to +1 days) were the most predictive parameters. The performance of the risk stratification score on training data (AUROC = 0.94, positive predictive value (PPV) = 0.97, negative predictive value (NPV) = 0.80) was comparable to that in the prospective validation cohort (AUROC = 0.85, PPV = 0.91, NPV = 0.81). The most successful ML algorithm with respect to AUROC was support vector machines (median = 0.96, interquartile range = 0.85–0.99, PPV = 0.90, NPV = 0.58). Conclusion: With a small set of easily obtainable parameters, both the clinical risk stratification score and the ML models were predictive for severe outcomes at our tertiary hospital center, and performed well in prospective validation.
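
The evaluation design described here, a cohort split by calendar date followed by PPV and NPV on the later cohort, can be sketched as follows. This is a minimal illustration only: the record layout, the score values, and the cutoff of 4 points are hypothetical and do not come from the COSA score itself; only the 1 September 2020 boundary between the two waves is taken from the abstract.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Earlier records form the training cohort, later ones the validation cohort."""
    train = [r for r in records if r["test_date"] < cutoff]
    valid = [r for r in records if r["test_date"] >= cutoff]
    return train, valid


def ppv_npv(y_true, flagged):
    """PPV = TP / (TP + FP); NPV = TN / (TN + FN)."""
    tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
    tn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 0)
    fn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical patients: date of positive test, risk score, severe outcome (1/0).
records = [
    {"test_date": date(2020, 3, 10), "score": 5, "severe": 1},
    {"test_date": date(2020, 6, 2), "score": 1, "severe": 0},
    {"test_date": date(2020, 9, 15), "score": 6, "severe": 1},
    {"test_date": date(2020, 10, 1), "score": 4, "severe": 0},
    {"test_date": date(2020, 10, 20), "score": 2, "severe": 0},
    {"test_date": date(2020, 11, 5), "score": 1, "severe": 1},
    {"test_date": date(2020, 11, 10), "score": 7, "severe": 1},
]

SCORE_CUTOFF = 4  # hypothetical threshold for flagging a patient as high risk

train, valid = temporal_split(records, cutoff=date(2020, 9, 1))
valid_ppv, valid_npv = ppv_npv(
    [r["severe"] for r in valid],
    [r["score"] >= SCORE_CUTOFF for r in valid],
)

if __name__ == "__main__":
    print(len(train), len(valid), valid_ppv, valid_npv)
```

Splitting by date rather than at random keeps every validation patient later in time than every training patient, which is what makes the validation prospective in the sense used above.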

Ensemble machine learning models for aviation incident risk prediction

Decision Support Systems ◽

10.1016/j.dss.2018.10.009 ◽

2019 ◽

Vol 116 ◽

pp. 48-63 ◽

Cited By ~ 21

Author(s):

Xiaoge Zhang ◽

Sankaran Mahadevan

Keyword(s):

Machine Learning ◽

Risk Prediction ◽

Learning Models ◽

Ensemble Machine Learning ◽

Machine Learning Models

Machine learning models predicting returns: why most popular performance metrics are misleading and proposal for an efficient metric

SSRN Electronic Journal ◽

10.2139/ssrn.3927058 ◽

2021 ◽

Author(s):

Jean Dessain

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Learning Models ◽

Popular Performance ◽

Machine Learning Models

Abstract P4-08-28: Clinical risk prediction models for breast cancer: A review of models developed between 2010 and 2018

10.1158/1538-7445.sabcs18-p4-08-28 ◽

2019 ◽

Author(s):

S Siesling ◽

T Hueting ◽

B Tip ◽

R Mentink ◽

E Koffijberg

Keyword(s):

Breast Cancer ◽

Risk Prediction ◽

Prediction Models ◽

Risk Prediction Models ◽

Clinical Risk

Development of Combined Heavy Rain Damage Prediction Models with Machine Learning

Water ◽

10.3390/w11122516 ◽

2019 ◽

Vol 11 (12) ◽

pp. 2516 ◽

Cited By ~ 1

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jungwook Kim ◽

Hung Soo Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Prediction Model ◽

Prediction Models ◽

Predictive Performance ◽

Heavy Rain ◽

Learning Models ◽

Damage Prediction ◽

Natural Disaster Management ◽

Machine Learning Models

Adequate forecasting and preparation for heavy rain can minimize loss of life and property damage. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most of their models are limited to a linear regression model that simply explains the linear relation between rainfall data and damage. This study develops the combined heavy rain damage prediction model (CHDPM), in which a residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM is found to be 4–14% higher than that of the HDPM. Through this, we confirmed that the predictive performance of the model is improved by combining it with an RPM built from machine learning models, which compensates for the linearity of the HDPM. The results of this study can be used as basic data beneficial for natural disaster management.
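
The idea of adding a residual model on top of a linear one can be sketched in a few lines. The example below is an assumption-laden toy: it uses a single predictor, a quadratic synthetic "damage" curve, and a 3-nearest-neighbour average as a stand-in for the machine learning RPM. None of these choices come from the study, and the errors are computed on the fitting data, so they are optimistic.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (linear stage)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_residual_model(xs, residuals, k=3):
    """Stand-in residual model: average residual of the k nearest training points."""
    pairs = list(zip(xs, residuals))

    def predict(x):
        nearest = sorted(pairs, key=lambda pair: abs(pair[0] - x))[:k]
        return sum(r for _, r in nearest) / len(nearest)

    return predict


def rmse(y_true, y_pred):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic data: damage grows nonlinearly with rainfall (hypothetical units).
rain = [float(i) for i in range(10)]
damage = [x ** 2 for x in rain]

# Stage 1: linear model. Stage 2: model what the linear model missed.
slope, intercept = fit_line(rain, damage)
linear_pred = [slope * x + intercept for x in rain]
residuals = [y - p for y, p in zip(damage, linear_pred)]
residual_model = knn_residual_model(rain, residuals, k=3)

# Combined prediction = linear prediction + predicted residual.
combined_pred = [p + residual_model(x) for x, p in zip(rain, linear_pred)]

if __name__ == "__main__":
    print("linear RMSE:  ", rmse(damage, linear_pred))
    print("combined RMSE:", rmse(damage, combined_pred))
```

Because the second stage only has to learn the structure left in the residuals, it can stay small while still correcting the curvature the straight line cannot represent.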

Storm-Based Probabilistic Hail Forecasting with Machine Learning Applied to Convection-Allowing Ensembles

Weather and Forecasting ◽

10.1175/waf-d-17-0010.1 ◽

2017 ◽

Vol 32 (5) ◽

pp. 1819-1840 ◽

Cited By ~ 48

Author(s):

David John Gagne ◽

Amy McGovern ◽

Sue Ellen Haupt ◽

Ryan A. Sobash ◽

John K. Williams ◽

...

Keyword(s):

Machine Learning ◽

Size Distribution ◽

Prediction Models ◽

Weather Prediction ◽

Radar Data ◽

Object Identification ◽

Atmospheric Conditions ◽

Learning Models ◽

Probabilistic Machine Learning ◽

Machine Learning Models

Abstract Forecasting severe hail accurately requires predicting how well atmospheric conditions support the development of thunderstorms, the growth of large hail, and the minimal loss of hail mass to melting before reaching the surface. Existing hail forecasting techniques incorporate information about these processes from proximity soundings and numerical weather prediction models, but they make many simplifying assumptions, are sensitive to differences in numerical model configuration, and are often not calibrated to observations. In this paper a storm-based probabilistic machine learning hail forecasting method is developed to overcome the deficiencies of existing methods. An object identification and tracking algorithm locates potential hailstorms in convection-allowing model output and gridded radar data. Forecast storms are matched with observed storms to determine hail occurrence and the parameters of the radar-estimated hail size distribution. The database of forecast storms contains information about storm properties and the conditions of the prestorm environment. Machine learning models are used to synthesize that information to predict the probability of a storm producing hail and the radar-estimated hail size distribution parameters for each forecast storm. Forecasts from the machine learning models are produced using two convection-allowing ensemble systems and the results are compared to other hail forecasting methods. The machine learning forecasts have a higher critical success index (CSI) at most probability thresholds and greater reliability for predicting both severe and significant hail.
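
For readers unfamiliar with the critical success index, the sketch below evaluates it at several probability thresholds using the standard definition CSI = hits / (hits + misses + false alarms). The forecast probabilities and observations are invented for the example and have no connection to the paper's data.

```python
def critical_success_index(probs, observed, threshold):
    """CSI for yes/no forecasts obtained by thresholding probabilities.

    Correct negatives are ignored, which suits rare events such as severe hail.
    """
    hits = misses = false_alarms = 0
    for p, event in zip(probs, observed):
        forecast_yes = p >= threshold
        if forecast_yes and event:
            hits += 1
        elif forecast_yes:
            false_alarms += 1
        elif event:
            misses += 1
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else float("nan")


# Hypothetical storm-level hail probabilities and whether hail was observed.
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
observed = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for threshold in (0.3, 0.5, 0.8):
        print(threshold, critical_success_index(probs, observed, threshold))
```

Sweeping the threshold, as in the loop above, is what allows two forecasting systems to be compared "at most probability thresholds" rather than at a single operating point.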

EFFECTIVE COMMUNICATION OF MACHINE LEARNING MODELS IN CLINICAL DECISION SUPPORT TOOLS FOR PAH RISK PREDICTION

CHEST Journal ◽

10.1016/j.chest.2021.07.1277 ◽

2021 ◽

Vol 160 (4) ◽

pp. A1396-A1397

Author(s):

Raymond Benza ◽

Manreet Kanwar ◽

James Antaki ◽

Aditi Dhabalia ◽

Mia Manavalan ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Risk Prediction ◽

Clinical Decision Support ◽

Clinical Decision ◽

Decision Support Tools ◽

Learning Models ◽

Support Tools ◽

Clinical Decision Support Tools ◽

Machine Learning Models

Epidemic Models for Personalised COVID-19 Isolation and Exit Policies Using Clinical Risk Predictions

10.1101/2020.04.29.20084707 ◽

2020 ◽

Cited By ~ 2

Author(s):

Theodoros Evgeniou ◽

Mathilde Fekom ◽

Anton Ovchinnikov ◽

Raphael Porcher ◽

Camille Pouchol ◽

...

Keyword(s):

Risk Prediction ◽

Prediction Models ◽

Epidemic Models ◽

Sensitivity Analyses ◽

Model Discrimination ◽

Risk Models ◽

Risk Prediction Models ◽

Seir Model ◽

Clinical Risk ◽

Time Sensitivity

Background: In early May 2020, following social distancing measures due to COVID-19, governments are considering relaxing lockdown. We combined individual clinical risk predictions with epidemic modelling to examine simulations of risk-based differential isolation and exit policies. Methods: We extended a standard susceptible-exposed-infected-removed (SEIR) model to account for personalised predictions of severity, defined by the risk of an individual needing intensive care if infected, and simulated differential isolation policies using COVID-19 data and estimates in France as of early May 2020. We also performed sensitivity analyses. The framework may be used with other epidemic models, with other risk predictions, and for other epidemic outbreaks. Findings: Simulations indicated that, assuming everything else the same, an exit policy considering clinical risk predictions starting on May 11, as planned by the French government, could allow restrictions to be relaxed immediately for an extra 10% (6 700 000 people) or more of the lowest-risk population, and consequently allow the restrictions on the remaining population to be relaxed significantly faster, while abiding by the current ICU capacity. Similar exit policies without risk predictions would exceed the ICU capacity by a multiple. Sensitivity analyses showed that when the assumed percentage of severe patients among the population decreased, or the prediction model discrimination improved, or the ICU capacity increased, policies based on risk models had a greater impact on the results of epidemic simulations. At the same time, sensitivity analyses also showed that differential isolation policies require the higher-risk individuals to comply with recommended restrictions. In general, our simulations demonstrated that risk prediction models could improve policy effectiveness, keeping everything else constant.
Interpretation: Clinical risk prediction models can inform new personalised isolation and exit policies, which may lead to both safer and faster outcomes than what can be achieved without such prediction models.
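
To make the modelling idea concrete, here is a deliberately simplified sketch of an SEIR model with two risk groups and group-specific contact levels, integrated with a forward Euler step. It is not the authors' model: the two-group structure, the parameter values (beta, sigma, gamma), the population sizes, and the contact multipliers are all hypothetical and chosen only to show how differential isolation can enter the equations.

```python
def simulate_seir(pop, contact, beta, sigma, gamma, days, dt=0.1, seed_infectious=10.0):
    """Forward-Euler SEIR with one S/E/I/R set per risk group.

    pop:     group name -> population size
    contact: group name -> contact multiplier (1.0 = no isolation)
    Returns the final state and the peak number of infectious people per group.
    """
    total = float(sum(pop.values()))
    state = {}
    for group, size in pop.items():
        i0 = seed_infectious * size / total  # seed infections proportionally
        state[group] = {"S": size - i0, "E": 0.0, "I": i0, "R": 0.0}
    peak = {group: state[group]["I"] for group in pop}

    for _ in range(int(round(days / dt))):
        # Shared infection pressure, computed before any group is updated.
        pressure = beta * sum(contact[g] * state[g]["I"] for g in pop) / total
        for group in pop:
            c = state[group]
            new_exposed = contact[group] * pressure * c["S"] * dt
            new_infectious = sigma * c["E"] * dt
            new_removed = gamma * c["I"] * dt
            c["S"] -= new_exposed
            c["E"] += new_exposed - new_infectious
            c["I"] += new_infectious - new_removed
            c["R"] += new_removed
            peak[group] = max(peak[group], c["I"])
    return state, peak


# Hypothetical setting: 10% of people are predicted to be at high clinical risk.
pop = {"low": 900_000, "high": 100_000}
params = dict(beta=0.5, sigma=0.2, gamma=0.2, days=300)

# Policy A: nobody isolates. Policy B: only the high-risk group reduces contacts.
open_state, open_peak = simulate_seir(pop, {"low": 1.0, "high": 1.0}, **params)
shield_state, shield_peak = simulate_seir(pop, {"low": 1.0, "high": 0.2}, **params)

if __name__ == "__main__":
    print("peak infectious, high-risk group, no isolation:", round(open_peak["high"]))
    print("peak infectious, high-risk group, isolating:   ", round(shield_peak["high"]))
```

In a fuller model, each group's infections would be multiplied by its predicted probability of needing intensive care and compared against ICU capacity; that step is left out here to keep the sketch short.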

Machine Learning Approach to Reduce Alert Fatigue Using a Disease Medication–Related Clinical Decision Support System: Model Development and Validation (Preprint)

10.2196/preprints.19489 ◽

2020 ◽

Author(s):

Tahmina Nasrin Poly ◽

Md. Mohaimenul Islam ◽

Muhammad Solihuddin Muhtar ◽

Hsuan-Chia Yang ◽

Phung Anh (Alex) Nguyen ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Clinical Decision Support ◽

Prediction Models ◽

Clinical Decision ◽

Gradient Boosting ◽

Support Vector ◽

Learning Models ◽

Alert Fatigue ◽

Machine Learning Models

BACKGROUND: Computerized physician order entry (CPOE) systems are incorporated into clinical decision support systems (CDSSs) to reduce medication errors and improve patient safety. Automatic alerts generated from CDSSs can directly assist physicians in making useful clinical decisions and can help shape prescribing behavior. Multiple studies reported that approximately 90%-96% of alerts are overridden by physicians, which raises questions about the effectiveness of CDSSs. There is intense interest in developing sophisticated methods to combat alert fatigue, but there is no consensus on the optimal approaches so far. OBJECTIVE: Our objective was to develop machine learning prediction models to predict physicians’ responses in order to reduce alert fatigue from disease medication–related CDSSs. METHODS: We collected data from a disease medication–related CDSS from a university teaching hospital in Taiwan. We considered prescriptions that triggered alerts in the CDSS between August 2018 and May 2019. Machine learning models, such as artificial neural network (ANN), random forest (RF), naïve Bayes (NB), gradient boosting (GB), and support vector machine (SVM), were used to develop prediction models. The data were randomly split into training (80%) and testing (20%) datasets. RESULTS: A total of 6453 prescriptions were used in our model. The ANN machine learning prediction model demonstrated excellent discrimination (area under the receiver operating characteristic curve [AUROC] 0.94; accuracy 0.85), whereas the RF, NB, GB, and SVM models had AUROCs of 0.93, 0.91, 0.91, and 0.80, respectively. The sensitivity and specificity of the ANN model were 0.87 and 0.83, respectively. CONCLUSIONS: In this study, ANN showed substantially better performance in predicting individual physician responses to an alert from a disease medication–related CDSS, as compared to the other models.
To our knowledge, this is the first study to use machine learning models to predict physician responses to alerts; furthermore, it can help to develop sophisticated CDSSs in real-world clinical settings.
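
The random 80%/20% split and the sensitivity, specificity, and accuracy figures quoted above follow standard definitions, sketched below. The seed, the helper names, and the toy labels are assumptions for illustration; only the 6453 record count and the 80/20 proportion come from the abstract, and the exact split sizes the authors obtained are not stated.

```python
import random


def random_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity_accuracy(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = correct/all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


# Indices standing in for the 6453 prescriptions mentioned in the abstract.
train_ids, test_ids = random_split(range(6453), train_fraction=0.8, seed=0)

# Hypothetical labels (1 = alert accepted) and model predictions on a tiny test set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
sens, spec, acc = sensitivity_specificity_accuracy(y_true, y_pred)

if __name__ == "__main__":
    print(len(train_ids), len(test_ids), sens, spec, acc)
```

Fixing the seed makes the split reproducible, so that every candidate model (ANN, RF, NB, GB, SVM) can be trained and tested on exactly the same records.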
