Machine Learning-based In-hospital Mortality Prediction Models for Patients With Acute Coronary Syndrome

Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background: The purpose of this study was to identify risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods: Data on ACS patients who presented to the emergency department of Fujian Provincial Hospital with chest pain between January 1, 2017 and March 31, 2020 were retrospectively collected. Univariate and multivariate logistic regression analyses were used to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results: A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), and high-density lipoprotein (HDL) cholesterol, as well as use of calcium channel blockers, were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the three variables contributing most to the prediction performance of the GBDT and random forest models. Conclusions: Predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death in ACS patients. Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve clinical outcomes for ACS patients.
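
As a hedged illustration of the comparison this abstract describes (not the authors' code), the sketch below fits the four named classifiers on simulated data with a ~1.75% event rate, compares them by AUC on a held-out split, and ranks features by GBDT importance; the feature names are placeholders standing in for the predictors listed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

features = ["age", "d_dimer", "ctni", "nt_probnp", "ldh", "hdl_c", "ccb_use"]
# Simulated cohort: 7810 patients, ~1.75% in-hospital mortality.
X, y = make_classification(n_samples=7810, n_features=len(features),
                           weights=[0.9825], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "GBDT": GradientBoostingClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),  # probabilities needed for AUC
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "AUC =", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))

# Rank features by their contribution to the GBDT model, as in the abstract.
gbdt = models["GBDT"]
for idx in np.argsort(gbdt.feature_importances_)[::-1]:
    print(features[idx], round(gbdt.feature_importances_[idx], 3))
```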

2018 ◽  
Vol 26 (1) ◽  
pp. 34-44 ◽  
Author(s):  
Muhammad Faisal ◽  
Andy Scally ◽  
Robin Howes ◽  
Kevin Beatson ◽  
Donald Richardson ◽  
...  

We compare the performance of logistic regression with several alternative machine learning methods for estimating the risk of death following an emergency admission to hospital, based on patients' first blood test results and physiological measurements, using an external validation approach. We trained and tested each model using data from one hospital (n = 24,696) and compared the performance of these models on data from another hospital (n = 13,477). We used two performance measures: the calibration slope and the area under the receiver operating characteristic curve. The logistic regression model performed well (calibration slope 0.90; area under the receiver operating characteristic curve 0.847) compared with the other machine learning methods. Given the complexity of choosing tuning parameters for these methods, the performance of logistic regression with transformations for in-hospital mortality prediction was competitive with the best-performing alternative machine learning methods, with no evidence of overfitting.
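
The two validation measures above can be computed as in this minimal sketch; the predicted risks are simulated stand-ins for an external validation set, and a calibration slope near 1 is the target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 13477                               # size of the external validation set
lin_pred = rng.normal(-3.5, 1.5, n)     # simulated linear predictor (log-odds)
p_hat = 1 / (1 + np.exp(-lin_pred))     # the model's predicted risks of death
y = rng.binomial(1, p_hat)              # outcomes, well calibrated by construction

# Calibration slope: refit the observed outcome on the linear predictor.
# A slope near 1 indicates good calibration; a slope below 1 suggests the
# predictions are too extreme, a typical symptom of overfitting.
cal = LogisticRegression(C=1e9)         # huge C makes this effectively unpenalized
cal.fit(lin_pred.reshape(-1, 1), y)
print("calibration slope:", round(cal.coef_[0, 0], 2))
print("AUC:", round(roc_auc_score(y, p_hat), 3))
```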


2020 ◽  
Vol 58 (6) ◽  
pp. 1130-1136
Author(s):  
Umberto Benedetto ◽  
Shubhra Sinha ◽  
Matt Lyon ◽  
Arnaldo Dimagli ◽  
Tom R Gaunt ◽  
...  

Abstract OBJECTIVES Interest in the clinical usefulness of machine learning for risk prediction has grown rapidly in recent years. Cardiac surgery patients are at high risk of complications, so presurgical risk assessment is of crucial relevance. We aimed to compare the performance of machine learning algorithms against a traditional logistic regression (LR) model for predicting in-hospital mortality following cardiac surgery. METHODS A single-centre data set of prospectively collected information from patients undergoing adult cardiac surgery from 1996 to 2017 was split into a 70% training set and a 30% testing set. Prediction models were developed using a neural network, random forest, naive Bayes, and retrained LR, based on the features included in the EuroSCORE. Discrimination was assessed using the area under the receiver operating characteristic curve, and calibration analysis was undertaken using the calibration belt method. Model calibration drift was assessed by comparing goodness-of-fit χ2 statistics observed in 2 equal bins from the testing sample ordered by procedure date. RESULTS A total of 28 761 cardiac procedures were performed during the study period. The in-hospital mortality rate was 2.7%. Retrained LR [area under the receiver operating characteristic curve 0.80; 95% confidence interval (CI) 0.77–0.83] and the random forest model (0.80; 95% CI 0.76–0.83) showed the best discrimination. All models showed significant miscalibration. Retrained LR showed the least calibration drift. CONCLUSIONS Our findings do not support the hypothesis that machine learning methods provide an advantage over the LR model in predicting operative mortality after cardiac surgery.
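
A simplified sketch of the calibration-drift check described above (not the authors' implementation, which also used the calibration belt method for calibration itself): a Hosmer-Lemeshow-style goodness-of-fit χ2 is computed in two equal, date-ordered bins of simulated test-set predictions.

```python
import numpy as np

def goodness_of_fit_chi2(y, p, groups=10):
    """Hosmer-Lemeshow-style chi-square over risk deciles."""
    order = np.argsort(p)
    chi2 = 0.0
    for g in np.array_split(order, groups):
        n, obs, exp = len(g), y[g].sum(), p[g].sum()
        # event and non-event terms of the chi-square for this decile
        chi2 += (obs - exp) ** 2 / exp + (obs - exp) ** 2 / (n - exp)
    return chi2

rng = np.random.default_rng(1)
p = rng.beta(1, 35, 8628)                      # predicted risks, mean ~2.7%
y = rng.binomial(1, np.clip(p * 1.2, 0, 1))    # outcomes, mildly miscalibrated

# The testing sample, ordered by procedure date, is split into two equal bins;
# here array order stands in for procedure date.
early, late = np.array_split(np.arange(len(p)), 2)
print("chi2, earlier half:", round(goodness_of_fit_chi2(y[early], p[early]), 1))
print("chi2, later half:  ", round(goodness_of_fit_chi2(y[late], p[late]), 1))
# A growing statistic from the earlier to the later bin indicates calibration drift.
```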


Author(s):  
Kazutaka Uchida ◽  
Junichi Kouno ◽  
Shinichi Yoshimura ◽  
Norito Kinjo ◽  
Fumihiro Sakakibara ◽  
...  

Abstract In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance, and we sought to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, and XGBoost) to develop the models. The main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. Predictive ability was validated in the test cohort using accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracy was 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. Classification ability was also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and type of stroke at the prehospital stage showed superior predictive ability.
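
A minimal sketch, under assumed data, of the multiclass setup described: one model scores the four stroke types plus an assumed residual "other/no stroke" class, evaluated by one-vs-rest AUC per outcome on a held-out split that stands in for the separate test cohort.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

classes = ["other/no stroke", "LVO", "ICH", "SAH", "CI (non-LVO)"]
# Simulated pool sized like the two cohorts combined (3178 + 3127 patients).
X, y = make_classification(n_samples=6305, n_features=20, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=3127, stratify=y,
                                          random_state=0)

model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)              # one probability column per class
for k, name in enumerate(classes):
    auc = roc_auc_score((y_te == k).astype(int), proba[:, k])
    print(f"{name}: one-vs-rest AUC = {auc:.2f}")
```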


2018 ◽  
Vol 26 (1) ◽  
pp. 141-155 ◽  
Author(s):  
Li Luo ◽  
Fengyi Zhang ◽  
Yao Yao ◽  
RenRong Gong ◽  
Martina Fu ◽  
...  

Surgery cancellations waste scarce operative resources and hinder patients' access to operative services. In this study, the Wilcoxon and chi-square tests were used for predictor selection, and three machine learning models (random forest, support vector machine, and XGBoost) were used to identify surgeries at high risk of cancellation. The optimal performance of the identification models was as follows: sensitivity, 0.615; specificity, 0.957; positive predictive value, 0.454; negative predictive value, 0.904; accuracy, 0.647; and area under the receiver operating characteristic curve, 0.682. Of the three models, the random forest model achieved the best performance. Effective identification of surgeries at high risk of cancellation is therefore feasible with stable performance. Models and sampling methods significantly affect identification performance. This study is a new application of machine learning to the identification of surgeries at high risk of cancellation and the facilitation of surgical resource management.
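
The operating characteristics listed above all derive from a confusion matrix at a chosen decision threshold; this short sketch (with simulated labels and scores, not the study's data) shows the arithmetic.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.12, 2000)        # 1 = cancelled surgery (illustrative rate)
scores = np.clip(0.3 * y_true + rng.normal(0.3, 0.2, 2000), 0, 1)
y_pred = (scores >= 0.5).astype(int)        # decision threshold fixed in advance

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("sensitivity:", round(tp / (tp + fn), 3))  # cancelled surgeries caught
print("specificity:", round(tn / (tn + fp), 3))
print("PPV:        ", round(tp / (tp + fp), 3))  # positive predictive value
print("NPV:        ", round(tn / (tn + fn), 3))  # negative predictive value
print("accuracy:   ", round((tp + tn) / len(y_true), 3))
print("AUC:        ", round(roc_auc_score(y_true, scores), 3))
```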


2019 ◽  
Author(s):  
Karen-Inge Karstoft ◽  
Ioannis Tsamardinos ◽  
Kasper Eskelund ◽  
Søren Bo Andersen ◽  
Lars Ravnborg Nissen

BACKGROUND Posttraumatic stress disorder (PTSD) is a relatively common consequence of deployment to war zones. Early postdeployment screening aimed at identifying those at risk for PTSD in the years following deployment would help deliver interventions to those in need, but such screening has so far proved unsuccessful. OBJECTIVE This study aimed to test the applicability of automated model selection and the ability of automated machine learning prediction models to transfer across cohorts and predict screening-level PTSD 2.5 years and 6.5 years after deployment. METHODS Automated machine learning was applied to data routinely collected 6-8 months after return from deployment from 3 different cohorts of Danish soldiers deployed to Afghanistan in 2009 (cohort 1, N=287 or N=261 depending on the timing of the outcome assessment), 2010 (cohort 2, N=352), and 2013 (cohort 3, N=232). RESULTS Models transferred well between cohorts. For screening-level PTSD 2.5 and 6.5 years after deployment, random forest models provided the highest accuracy as measured by area under the receiver operating characteristic curve (AUC): 2.5 years, AUC=0.77, 95% CI 0.71-0.83; 6.5 years, AUC=0.78, 95% CI 0.73-0.83. Linear models performed equally well. Military rank, hyperarousal symptoms, and total level of PTSD symptoms were highly predictive. CONCLUSIONS Automated machine learning provided validated models that can be readily implemented in future deployment cohorts in the Danish Defense, with the aim of targeting postdeployment support interventions to those at highest risk of developing PTSD, provided the cohorts are deployed on similar missions.
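
The study used an automated machine learning framework for model selection; as a simplified, hypothetical stand-in for that tooling, the sketch below tunes a random forest by cross-validated grid search on one simulated cohort and then checks how the selected model transfers to a second cohort.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# One pool of simulated soldiers split into two "cohorts" of the study's sizes.
X, y = make_classification(n_samples=639, n_features=30, n_informative=10,
                           weights=[0.85], random_state=0)
X1, X2, y1, y2 = train_test_split(X, y, test_size=352, stratify=y,
                                  random_state=0)  # cohort 1 trains, cohort 2 validates

# Automated-model-selection stand-in: cross-validated grid search over a
# random forest (hypothetical grid; the study's AutoML tooling differs).
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={"n_estimators": [100, 300],
                                  "max_depth": [3, 5, None]},
                      scoring="roc_auc", cv=5)
search.fit(X1, y1)

auc = roc_auc_score(y2, search.best_estimator_.predict_proba(X2)[:, 1])
print("cohort-1 cross-validated AUC:", round(search.best_score_, 2))
print("cohort-2 transfer AUC:", round(auc, 2))
```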


mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases, such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance, with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without discussion of how difficult such models are to interpret when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
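
A brief sketch of the interpretability contrast drawn above: an L2-regularized logistic regression exposes one signed coefficient per input feature (here per taxon, with placeholder OTU names and simulated abundances), which a random forest does not.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

taxa = [f"OTU_{i:03d}" for i in range(50)]     # placeholder taxon identifiers
X, y = make_classification(n_samples=490, n_features=50, n_informative=10,
                           random_state=0)     # simulated abundances, SRN labels

# Standardize so the L2-penalized coefficients are comparable across taxa.
clf = make_pipeline(StandardScaler(),
                    LogisticRegression(penalty="l2", C=1.0, max_iter=1000))
clf.fit(X, y)

coefs = clf.named_steps["logisticregression"].coef_[0]
for idx in np.argsort(np.abs(coefs))[::-1][:5]:   # five strongest associations
    print(f"{taxa[idx]}: coefficient = {coefs[idx]:+.2f}")
```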


2020 ◽  
Author(s):  
Victoria Garcia-Montemayor ◽  
Alejandro Martin-Malo ◽  
Carlo Barbieri ◽  
Francesco Bellocchio ◽  
Sagrario Soriano ◽  
...  

Abstract Background Besides classic logistic regression analysis, non-parametric methods based on machine learning techniques, such as random forest, are now used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Mortality at 6 months, 1 year, and 2 years of haemodialysis was predicted using random forest, and accuracy was compared with logistic regression. Baseline data were constructed from the information obtained during the initial period of regular haemodialysis. To improve the accuracy of each patient's baseline information, the data-collection window was set at 30, 60, and 90 days after the first haemodialysis session. Results A total of 1571 incident haemodialysis patients were included. The mean age was 62.3 years, and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to the logistic regression models (ΔAUC 0.007–0.046). The results indicate that random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, superior to logistic regression, for generating mortality prediction models in haemodialysis patients.
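
A hedged sketch of the ΔAUC comparison reported above: random forest and logistic regression are fit on the same simulated training data, and the test-set AUC difference is bootstrapped to gauge its stability; this mirrors the reported ΔAUC range in spirit only, not in its data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1571, n_features=25, weights=[0.8],
                           random_state=0)     # simulated haemodialysis cohort
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

p_rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p_lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

rng = np.random.default_rng(0)
deltas = []
for _ in range(500):                           # bootstrap the test set
    idx = rng.integers(0, len(y_te), len(y_te))
    if y_te[idx].min() == y_te[idx].max():
        continue                               # resample must contain both classes
    deltas.append(roc_auc_score(y_te[idx], p_rf[idx])
                  - roc_auc_score(y_te[idx], p_lr[idx]))
print("median dAUC (RF - LR):", round(float(np.median(deltas)), 3))
print("2.5-97.5 percentile:", np.round(np.percentile(deltas, [2.5, 97.5]), 3))
```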


2021 ◽  
Vol 11 ◽  
Author(s):  
Ximing Nie ◽  
Yuan Cai ◽  
Jingyi Liu ◽  
Xiran Liu ◽  
Jiahui Zhao ◽  
...  

Objectives: This study aimed to investigate whether machine learning algorithms could provide better early mortality prediction than existing scoring systems for patients with cerebral hemorrhage in intensive care units in clinical practice. Methods: All cerebral hemorrhage patients admitted to intensive care units between 2008 and 2012 and monitored with the MetaVision system in the Medical Information Mart for Intensive Care III (MIMIC-III) database were enrolled in this study. The calibration, discrimination, and risk classification of predicted hospital mortality based on machine learning algorithms were assessed. The primary outcome was hospital mortality. Model performance was assessed with accuracy and receiver operating characteristic curve analysis. Results: Of the 760 cerebral hemorrhage patients enrolled from the MIMIC database [mean age, 68.2 years (SD, ±15.5)], 383 (50.4%) died in hospital and 377 (49.6%) survived. The areas under the receiver operating characteristic curve (AUC) of the six machine learning algorithms were 0.600 (nearest neighbors), 0.617 (decision tree), 0.655 (neural net), 0.671 (AdaBoost), 0.819 (random forest), and 0.725 (gcForest). The AUC was 0.423 for the Acute Physiology and Chronic Health Evaluation II score. The random forest had the highest specificity and accuracy, as well as the greatest AUC, showing the best ability to predict in-hospital mortality. Conclusions: Compared with the conventional scoring system and the other five machine learning algorithms in this study, the random forest algorithm performed better at predicting in-hospital mortality for cerebral hemorrhage patients in intensive care units; further research on the random forest algorithm is therefore warranted.
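
One detail worth making concrete: a conventional severity score needs no model fitting to be compared with the learned models, since its raw value can be passed to an AUC routine directly (an AUC below 0.5, as reported for APACHE II here, means the score ranked patients worse than chance in this sample). The values below are simulated, not drawn from MIMIC-III.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.504, 760)             # in-hospital deaths, ~50.4% as above
apache_ii = rng.normal(18, 6, 760) + 3 * y  # simulated severity scores
print("APACHE II AUC:", round(roc_auc_score(y, apache_ii), 3))
```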


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Stephanie O Frisch ◽  
Zeineb Bouzid ◽  
Jessica Zègre-Hemsey ◽  
Clifton W. Callaway ◽  
Holli A Devon ◽  
...  

Introduction: Overcrowded emergency departments (ED) and undifferentiated patients make the provision of care and resources challenging. We examined whether machine learning algorithms could identify ED patients' disposition (hospitalization and critical care admission) using readily available objective triage data among patients with symptoms suggestive of acute coronary syndrome (ACS). Methods: This was a retrospective observational cohort study of adult patients triaged at the ED for a suspected coronary event. A total of 162 input variables (k) were extracted from the electronic health record: demographics (k=3), mode of transportation (k=1), past medical/surgical history (k=57), first ED vital signs (k=7), home medications (k=31), symptomology (k=40), and the computer-generated automatic interpretation of the 12-lead electrocardiogram (k=23). The primary outcomes were hospitalization and critical care admission (i.e., admission to an intensive or step-down care unit). We used 10-fold stratified cross-validation to evaluate the performance of five machine learning algorithms in predicting the study outcomes: logistic regression, naïve Bayes, random forest, gradient boosting, and artificial neural network classifiers. We determined the best model by comparing the area under the receiver operating characteristic curve (AUC) of all models. Results: Included were 1201 patients (age 64±14; 39% female; 10% Black) with a total of 956 hospitalizations and 169 critical care admissions. The best-performing machine learning classifier for the outcome of hospitalization was the gradient boosting machine, with an AUC of 0.85 (95% CI, 0.82–0.89), 89% sensitivity, and an F-score of 0.83; the random forest classifier performed best for the outcome of critical care admission, with an AUC of 0.73 (95% CI, 0.70–0.77), 76% sensitivity, and an F-score of 0.56. Conclusion: Predictive machine learning algorithms demonstrated excellent and good discriminative power for predicting hospitalization and critical care admission, respectively. Administrators and clinicians could benefit from machine learning approaches that predict hospitalization and critical care admission in order to optimize the allocation of scarce ED and hospital resources and provide optimal care.
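
A minimal sketch of the evaluation scheme described above: 10-fold stratified cross-validation reporting AUC, sensitivity (recall), and F-score for one of the five named classifiers. The data are simulated stand-ins for the 162 triage variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# ~80% of the simulated patients are hospitalized, echoing 956 of 1201 above.
X, y = make_classification(n_samples=1201, n_features=162, n_informative=20,
                           weights=[0.2], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(GradientBoostingClassifier(random_state=0), X, y,
                        cv=cv, scoring=["roc_auc", "recall", "f1"])
for metric in ("test_roc_auc", "test_recall", "test_f1"):
    print(metric, round(scores[metric].mean(), 2))
```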


Diagnostics ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 212
Author(s):  
Sunmin Park ◽  
Chaeyeon Kim ◽  
Xuangao Wu

Background: Insulin resistance is a common etiology of metabolic syndrome, but receiver operating characteristic (ROC) curve analysis shows only a weak association in Koreans. Using a machine learning (ML) approach, we aimed to generate the best model for predicting insulin resistance in Korean adults aged over 40 in the Ansan/Ansung cohort. Methods: Demographic, anthropometric, biochemical, genetic, nutrient, and lifestyle variables of 8842 participants were included. Polygenic risk scores (PRS) generated by a genome-wide association study were added to represent the genetic impact on insulin resistance. Participants were divided randomly into training (n = 7037) and test (n = 1769) sets. Potentially important features were selected from 99 candidates by the highest area under the curve (AUC) of the ROC curve, using seven different ML algorithms. The AUC target was ≥0.85 for the best prediction of insulin resistance with the lowest number of features. Results: The cutoff for insulin resistance defined with HOMA-IR was 2.31, determined by logistic regression before conducting ML. The XGBoost and logistic regression algorithms generated the highest AUC (0.86) among the prediction models using all 99 features, while the random forest algorithm generated a model with an AUC of 0.82. These models showed high accuracy and k-fold values (>0.85). The prediction model containing 15 features had the highest AUC of the ROC curve with the XGBoost and random forest algorithms; PRS was one of the 15 features. The final prediction models for insulin resistance were generated with the same nine features in the XGBoost (AUC = 0.86), random forest (AUC = 0.84), and artificial neural network (AUC = 0.86) algorithms. The models included fasting serum glucose, ALT, total bilirubin, and HDL concentrations, waist circumference, body fat, pulse, season of enrollment in the study, and gender. Conclusion: Liver function, pulse, and seasonal variation, in addition to metabolic syndrome components, should be considered when predicting insulin resistance in Koreans aged over 40 years.
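
A hedged sketch of the feature-reduction step the abstract describes, assuming an XGBoost importance ranking on simulated data: models are refit on growing feature subsets until the ≥0.85 AUC target is met with the fewest features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Simulated cohort with the study's sizes: 8842 participants, 99 features.
X, y = make_classification(n_samples=8842, n_features=99, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1769, stratify=y,
                                          random_state=0)

full = XGBClassifier(eval_metric="logloss").fit(X_tr, y_tr)
ranked = np.argsort(full.feature_importances_)[::-1]   # most important first

for k in range(1, 100):                                # grow the feature subset
    cols = ranked[:k]
    model = XGBClassifier(eval_metric="logloss").fit(X_tr[:, cols], y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te[:, cols])[:, 1])
    if auc >= 0.85:
        print(f"target AUC reached with {k} features (AUC = {auc:.3f})")
        break
```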

