Predicting mortality in hemodialysis patients using machine learning analysis

Abstract Background In addition to classic logistic regression analysis, non-parametric methods based on machine learning techniques, such as random forest, are now used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Mortality at 6 months, 1 year and 2 years of haemodialysis was predicted using random forest, and the accuracy was compared with that of logistic regression. Baseline data were constructed from the information obtained during the initial period of regular haemodialysis. To capture each patient's baseline information more accurately, the data collection period was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the mean Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). The results also indicate that random forest and logistic regression build their mortality prediction models from different variables. Conclusions Random forest is an adequate method for generating mortality prediction models in haemodialysis patients, and it is superior to logistic regression.
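The comparison above rests on AUC and on the difference ΔAUC between two models. As an illustration only, the sketch below computes the AUC with the Mann-Whitney (pairwise ranking) estimator and takes the difference between two sets of risk scores. The outcomes and scores are hypothetical, and this is not presented as the study's own code.

```python
from itertools import product


def auc(labels, scores):
    """Mann-Whitney estimate of the AUC: the probability that a randomly chosen
    death (label 1) receives a higher risk score than a randomly chosen survivor
    (label 0). Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical 1-year outcomes (1 = died) and predicted risks from two models.
died = [1, 0, 1, 0, 0, 1, 0, 0]
rf_risk = [0.81, 0.30, 0.62, 0.45, 0.20, 0.55, 0.58, 0.10]  # "random forest"
lr_risk = [0.70, 0.35, 0.38, 0.52, 0.25, 0.60, 0.40, 0.15]  # "logistic regression"

delta_auc = auc(died, rf_risk) - auc(died, lr_risk)
print(f"RF AUC={auc(died, rf_risk):.3f}  LR AUC={auc(died, lr_risk):.3f}  dAUC={delta_auc:.3f}")
```

A positive ΔAUC only says that one model ranks deaths above survivors more often on this sample. Whether a difference as small as 0.007 matters would need a paired test or a confidence interval.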

Comparing Logistic Regression Models with Alternative Machine Learning Methods to Predict the Risk of Drug Intoxication Mortality

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17030897 ◽

2020 ◽

Vol 17 (3) ◽

pp. 897

Author(s):

YoungJin Choi ◽

YooKyung Boo

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Prediction Models ◽

Toxic Substance ◽

Area Under The Curve ◽

Mortality Prediction ◽

Brier Score ◽

Tree Model ◽

Drug Intoxication ◽

Testing Phase

(1) Medical research has shown an increasing interest in machine learning, which permits massive multivariate data analysis. We therefore developed drug intoxication mortality prediction models and compared machine learning models with traditional logistic regression. (2) A total of 8,937 samples categorized as drug intoxication were extracted from the Korea Centers for Disease Control and Prevention (2008–2017). We trained, validated, and tested each model on these data and compared their performance using three measures: Brier score, calibration slope, and calibration-in-the-large. (3) A chi-square test demonstrated that mortality risk differed significantly according to severity, intent, toxic substance, age, and sex. The multilayer perceptron model (MLP) had the highest area under the curve (AUC) and the lowest Brier score in the training and validation phases, while the logistic regression model (LR) showed the highest AUC (0.827) and the lowest Brier score (0.0307) in the testing phase. MLP also had the second-highest AUC (0.816) and the second-lowest Brier score (0.003258) in the testing phase, performing better than the decision tree model. (4) Given the complexity of choosing tuning parameters, LR proved competitive on medical datasets, which require strict accuracy.
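The three measures named above can be computed from predicted risks and observed outcomes. The sketch below assumes one common formulation, which the abstract does not spell out. The Brier score is the mean squared error of the predicted probabilities. The calibration slope is the coefficient b in a logistic recalibration model logit P(y=1) = a + b·logit(p). Calibration-in-the-large is the intercept a when the slope is fixed at 1. Both are fitted here by plain Newton-Raphson. All function names and data are illustrative.

```python
import math


def brier(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope_intercept(y, p, iters=50):
    """Fit logit P(y=1) = a + b * logit(p) by Newton-Raphson; b is the calibration slope."""
    x = [logit(pi) for pi in p]
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            m = sigmoid(a + b * xi)
            w = m * (1.0 - m)
            g_a += yi - m          # score for the intercept
            g_b += (yi - m) * xi   # score for the slope
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step using the inverse of the 2x2 information matrix
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def calibration_in_the_large(y, p, iters=50):
    """Intercept a in logit P(y=1) = a + logit(p), i.e. slope fixed at 1 (offset model)."""
    offsets = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(iters):
        g = sum(yi - sigmoid(a + oi) for yi, oi in zip(y, offsets))
        h = sum(sigmoid(a + oi) * (1.0 - sigmoid(a + oi)) for oi in offsets)
        a += g / h
    return a


# Hypothetical test-phase outcomes (1 = died) and predicted mortality risks.
y_obs = [1, 0, 1, 0, 0, 1, 0, 0]
p_hat = [0.8, 0.3, 0.6, 0.5, 0.2, 0.4, 0.7, 0.1]
print(brier(y_obs, p_hat), calibration_slope_intercept(y_obs, p_hat)[1],
      calibration_in_the_large(y_obs, p_hat))
```

A slope near 1 and an intercept near 0 indicate good calibration. A slope below 1 suggests predictions that are too extreme, which is a common sign of overfitting.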

Fluid Overload Phenotypes in Critical Illness—A Machine Learning Approach

Journal of Clinical Medicine ◽

10.3390/jcm11020336 ◽

2022 ◽

Vol 11 (2) ◽

pp. 336

Author(s):

Anna S. Messmer ◽

Michel Moser ◽

Patrick Zuercher ◽

Joerg C. Schefold ◽

Martin Müller ◽

...

Keyword(s):

Machine Learning ◽

Septic Shock ◽

Logistic Regression ◽

Intensive Care ◽

Random Forest ◽

Fluid Overload ◽

Area Under The Curve ◽

Machine Learning Techniques ◽

Intensive Care Patients ◽

Multivariable Logistic Regression

Background: The detrimental impact of fluid overload (FO) on intensive care unit (ICU) morbidity and mortality is well known. However, research to identify subgroups of patients particularly prone to fluid overload is scarce. The aim of this cohort study was to derive “FO phenotypes” in the critically ill by using machine learning techniques. Methods: Retrospective single-center study including adult intensive care patients with a length of stay of ≥3 days and sufficient data to compute FO. Data were analyzed by multivariable logistic regression, fast and frugal trees (FFT), classification decision trees (DT), and a random forest (RF) model. Results: Out of 1772 included patients, 387 (21.8%) met the FO definition. The random forest model had the highest area under the curve (AUC) (0.84, 95% CI 0.79–0.86), followed by multivariable logistic regression (0.81, 95% CI 0.77–0.86), FFT (0.75, 95% CI 0.69–0.79) and DT (0.73, 95% CI 0.68–0.78) to predict FO. The most important predictors identified in all models were lactate and bicarbonate at admission and postsurgical ICU admission. Sepsis/septic shock was identified as a risk factor in the multivariable logistic regression and RF analyses. Conclusion: The FO phenotypes consist of patients admitted after surgery or with sepsis/septic shock with high lactate and low bicarbonate.
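A fast and frugal tree checks one cue at a time. Every node except the last has a single exit, and the last node exits on both branches. The sketch below shows that structure using the predictors named in the abstract (lactate, postsurgical admission, bicarbonate). The cue order, exit directions and cut-offs are hypothetical and chosen only for illustration. They are not the tree reported by the study.

```python
from typing import Callable, List, Tuple

# Each node: (cue description, test that is True in the high-risk direction, exit side).
# exit side "yes": predict FO as soon as the test is True.
# exit side "no":  predict no FO as soon as the test is False.
Node = Tuple[str, Callable[[dict], bool], str]


def fft_predict(nodes: List[Node], patient: dict) -> bool:
    """Return True if fluid overload is predicted, using a fast-and-frugal tree."""
    for _name, test, exit_side in nodes[:-1]:
        result = test(patient)
        if exit_side == "yes" and result:
            return True
        if exit_side == "no" and not result:
            return False
    # The final node exits on both branches, so its test decides directly.
    _name, last_test, _exit = nodes[-1]
    return last_test(patient)


# Hypothetical cues and thresholds (illustrative only, not from the study).
tree: List[Node] = [
    ("lactate > 4 mmol/L", lambda p: p["lactate"] > 4.0, "yes"),
    ("postsurgical ICU admission", lambda p: p["postsurgical"], "no"),
    ("bicarbonate < 20 mmol/L", lambda p: p["bicarbonate"] < 20.0, "yes"),
]

print(fft_predict(tree, {"lactate": 2.0, "postsurgical": True, "bicarbonate": 18.0}))
```

The appeal of this design is that a clinician can apply it at the bedside by asking at most three questions, at the cost of some discrimination compared with an ensemble such as random forest.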

Machine Learning-based in-hospital Mortality Prediction Models for Patients With Acute Coronary Syndrome

10.21203/rs.3.rs-134944/v1 ◽

2020 ◽

Author(s):

Jun Ke ◽

Yiwei Chen ◽

Xiaoping Wang ◽

Zhiyong Wu ◽

qiongyao Zhang ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Hospital Mortality ◽

Operating Characteristic ◽

Prediction Models ◽

Characteristic Curve ◽

Multivariate Logistic Regression Analysis ◽

Hdl Cholesterol ◽

Coronary Syndrome

Abstract Background The purpose of this study is to identify the risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods The data of ACS patients who presented with chest pain to the emergency department of Fujian Provincial Hospital from January 1, 2017 to March 31, 2020 were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age; levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), and high-density lipoprotein (HDL) cholesterol; and calcium channel blockers were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the three variables that contributed most to the prediction performance of the GBDT and random forest models. Conclusions The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death in ACS patients.
Based on our findings, we recommend that clinicians monitor changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
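The receiver operating characteristic curve used above plots sensitivity against 1 − specificity as the decision threshold moves. The sketch below traces those points for a set of risk scores and integrates them with the trapezoidal rule. The data are hypothetical, and the helper names are illustrative rather than drawn from any particular library.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per distinct threshold,
    calling a case positive when its score is >= the threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical in-hospital deaths (1) and model risk scores.
deaths = [1, 0, 1, 0, 0]
risk = [0.9, 0.4, 0.6, 0.7, 0.1]
pts = roc_points(deaths, risk)
print(pts, trapezoid_auc(pts))
```

With a mortality rate as low as 1.75%, a high AUC can coexist with a low positive predictive value, so the operating threshold still has to be chosen with the clinical trade-off in mind.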

Clinical and Laboratory Predictors of In-hospital Mortality in Patients With Coronavirus Disease-2019: A Cohort Study in Wuhan, China

Clinical Infectious Diseases ◽

10.1093/cid/ciaa538 ◽

2020 ◽

Vol 71 (16) ◽

pp. 2079-2088 ◽

Cited By ~ 52

Author(s):

Kun Wang ◽

Peiyuan Zuo ◽

Yuwei Liu ◽

Meng Zhang ◽

Xiaofang Zhao ◽

...

Keyword(s):

Hospital Mortality ◽

Prediction Models ◽

Area Under The Curve ◽

Mortality Prediction ◽

Gradient Boosting ◽

Laboratory Model ◽

Training Cohort ◽

Clinical Model ◽

Extreme Gradient Boosting ◽

Mortality Prediction Models

Abstract Background This study aimed to develop mortality-prediction models for patients with coronavirus disease-2019 (COVID-19). Methods The training cohort included consecutive COVID-19 patients at the First People’s Hospital of Jiangxia District in Wuhan, China, from 7 January 2020 to 11 February 2020. We selected baseline data through the stepwise Akaike information criterion and an ensemble XGBoost (extreme gradient boosting) model to build mortality-prediction models. We then validated these models in randomly selected COVID-19 patients at Union Hospital, Wuhan, from 1 January 2020 to 20 February 2020. Results A total of 296 COVID-19 patients were enrolled in the training cohort; 19 died during hospitalization and 277 were discharged from the hospital. The clinical model developed using age, history of hypertension, and coronary heart disease showed area under the curve (AUC), 0.88 (95% confidence interval [CI], .80–.95); threshold, −2.6551; sensitivity, 92.31%; specificity, 77.44%; and negative predictive value (NPV), 99.34%. The laboratory model developed using age, high-sensitivity C-reactive protein, peripheral capillary oxygen saturation, neutrophil and lymphocyte count, d-dimer, aspartate aminotransferase, and glomerular filtration rate had a significantly stronger discriminatory power than the clinical model (P = .0157), with AUC, 0.98 (95% CI, .92–.99); threshold, −2.998; sensitivity, 100.00%; specificity, 92.82%; and NPV, 100.00%. In the subsequent validation cohort (N = 44), the AUC (95% CI) was 0.83 (.68–.93) and 0.88 (.75–.96) for the clinical model and laboratory model, respectively. Conclusions We developed 2 predictive models for the in-hospital mortality of patients with COVID-19 in Wuhan that were validated in patients from another center.
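Sensitivity, specificity and NPV are reported at a single threshold (for example −2.6551). A negative threshold suggests that it is applied to the model's linear predictor (log-odds) rather than to a probability, but that is an assumption, not something the abstract states. The sketch below applies such a cut-off to hypothetical linear-predictor values and derives the four standard confusion-matrix metrics.

```python
def safe_div(num, den):
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def threshold_metrics(labels, linear_predictor, threshold):
    """Classify 'predicted death' when the linear predictor is >= threshold
    (assumed log-odds scale) and summarize the resulting confusion matrix."""
    tp = fp = tn = fn = 0
    for y, lp in zip(labels, linear_predictor):
        predicted_death = lp >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }


# Hypothetical outcomes (1 = died) and linear-predictor values.
outcomes = [1, 1, 0, 0, 0, 0]
lp_values = [-1.2, -2.0, -2.5, -3.1, -4.0, -2.9]
print(threshold_metrics(outcomes, lp_values, -2.6551))
```

With only 19 deaths among 296 patients, NPV is dominated by the large number of survivors, which is one reason the reported NPVs are close to 100%.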

Evaluation of crowdsourced mortality prediction models as a framework for assessing AI in medicine

10.1101/2021.01.18.21250072 ◽

2021 ◽

Author(s):

Timothy Bergquist ◽

Thomas Schaffter ◽

Yao Yan ◽

Thomas Yu ◽

Justin Prosser ◽

...

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Improve Patient Care ◽

Mortality Prediction ◽

Direct Access ◽

Healthcare Outcomes ◽

Patient Privacy ◽

Mortality Prediction Models ◽

And Performance ◽

Similar Accuracy

Abstract Applications of machine learning in healthcare are of high interest and have the potential to significantly improve patient care. Yet the real-world accuracy and performance of these models on different patient subpopulations remain unclear. To address these important questions, we hosted a community challenge to evaluate different methods that predict healthcare outcomes. To overcome patient privacy concerns, we employed a Model-to-Data approach, allowing citizen scientists and researchers to train and evaluate machine learning models on private health data without direct access to that data. We focused on the prediction of all-cause mortality as the community challenge question. In total, we had 345 registered participants, coalescing into 25 independent teams spread over 3 continents and 10 countries. The top-performing team achieved a final area under the receiver operating characteristic curve of 0.947 (95% CI 0.942, 0.951) and an area under the precision-recall curve of 0.487 (95% CI 0.458, 0.499) on patients prospectively collected over a one-year observation period in a large health system. Post-hoc analysis after the challenge revealed that models differ in accuracy on subpopulations delineated by race or gender, even when they are trained on the same data and have similar accuracy on the overall population. This is the largest community challenge to date focused on evaluating state-of-the-art machine learning methods in a healthcare system, revealing both opportunities and pitfalls of clinical AI.
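The gap between an AUROC of 0.947 and an AUPRC of 0.487 is typical when the outcome is rare. A random classifier's expected AUPRC equals the event rate, whereas its AUROC is 0.5. The sketch below estimates AUPRC with average precision, the mean of the precision values at the rank of each positive case. Tied scores are not treated specially here, and the data are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: mean precision
    measured at the rank of each positive case (ties broken by sort order)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    tp = 0
    total = 0.0
    for k, (_score, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / k  # precision among the top-k predictions
    return total / n_pos


# Hypothetical outcomes (1 = died within the observation window) and risk scores.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Because precision depends on prevalence, AUPRC values are best compared against the event rate of the evaluation population rather than against 0.5.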

Improving Earnings Predictions and Abnormal Returns with Machine Learning

Accounting Horizons ◽

10.2308/horizons-19-125 ◽

2021 ◽

Author(s):

Joshua O.S. Hunt ◽

James N. Myers ◽

Linda A. Myers

Keyword(s):

Machine Learning ◽

Random Forest ◽

Prediction Models ◽

Abnormal Returns ◽

Forecast Accuracy ◽

Trading Strategy ◽

Machine Learning Techniques ◽

Binary Outcomes ◽

High Tech ◽

Out Of Sample

Using stepwise logit regression, Ou and Penman (1989) predict the sign of future earnings changes and use these predictions to form a profitable hedge portfolio. Dramatic increases in computing power and recent advances in machine learning allow us to extend Ou and Penman (1989) using a larger dataset, more computer-intensive forecasting algorithms, and modern prediction models. We find that stepwise logit continues to provide good out-of-sample predictions and can be used to form a trading strategy that generates small abnormal returns, but a nonparametric machine learning technique (random forest) significantly improves out-of-sample forecast accuracy and trading strategy returns. We also find that the models identify different independent variables as important for prediction in the High Tech and Manufacturing industries, but this does not lead to better predictions or higher trading strategy returns. Overall, the most profitable strategy is based on earnings predictions from a random forest model using our full sample. Our results confirm the Ou and Penman (1989) finding that financial statement information can be useful for investment decisions, and suggest that recent nonparametric machine learning techniques could be useful in a variety of accounting contexts where predictions of binary outcomes are needed.
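The hedge-portfolio idea can be sketched as follows: go long firms whose predicted probability of an earnings increase is high, go short those where it is low, and measure the difference in mean subsequent returns. The 0.6/0.4 cut-offs, equal weighting and raw (not risk-adjusted) returns are assumptions for illustration only. The paper's abnormal-return measure and portfolio rules are not specified here.

```python
def hedge_return(prob_increase, next_returns, long_cut=0.6, short_cut=0.4):
    """Equal-weighted long-minus-short return.

    Long: firms with predicted P(earnings increase) >= long_cut.
    Short: firms with predicted P(earnings increase) <= short_cut.
    Cut-offs are illustrative assumptions, not the study's values.
    """
    longs = [r for p, r in zip(prob_increase, next_returns) if p >= long_cut]
    shorts = [r for p, r in zip(prob_increase, next_returns) if p <= short_cut]
    if not longs or not shorts:
        raise ValueError("both the long and the short leg need at least one firm")
    return sum(longs) / len(longs) - sum(shorts) / len(shorts)


# Hypothetical model probabilities (e.g. from a random forest) and next-year returns.
probs = [0.70, 0.65, 0.50, 0.30, 0.20]
rets = [0.10, 0.06, 0.00, -0.02, -0.04]
print(hedge_return(probs, rets))
```

In practice each firm's return would first be adjusted for a benchmark (for example market or size-matched returns) to obtain abnormal returns before forming the long-minus-short difference.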

Predicting lethal courses in critically ill COVID-19 patients using a machine learning model trained on patients with non-COVID-19 viral pneumonia

Scientific Reports ◽

10.1038/s41598-021-92475-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Gregor Lichtner ◽

Felix Balzer ◽

Stefan Haufe ◽

Niklas Giesa ◽

Fridtjof Schiefenhövel ◽

...

Keyword(s):

Machine Learning ◽

Critically Ill ◽

Prediction Models ◽

Predictive Performance ◽

Learning Model ◽

Mortality Prediction ◽

Viral Pneumonia ◽

Machine Learning Model ◽

Mortality Prediction Models ◽

Time Courses

Abstract In a pandemic with a novel disease, disease-specific prognosis models become available only with a delay. To bridge the critical early phase, models built for similar diseases might be applied. To test the accuracy of such a knowledge transfer, we investigated how precisely lethal courses in critically ill COVID-19 patients can be predicted by a model trained on critically ill patients with non-COVID-19 viral pneumonia. We trained gradient boosted decision tree models on 718 (245 deceased) non-COVID-19 viral pneumonia patients to predict individual ICU mortality and applied them to 1054 (369 deceased) COVID-19 patients. Our model showed a significantly better predictive performance (AUROC 0.86 [95% CI 0.86–0.87]) than the clinical scores APACHE2 (0.63 [95% CI 0.61–0.65]), SAPS2 (0.72 [95% CI 0.71–0.74]) and SOFA (0.76 [95% CI 0.75–0.77]), the COVID-19-specific mortality prediction models of Zhou (0.76 [95% CI 0.73–0.78]) and Wang (laboratory: 0.62 [95% CI 0.59–0.65]; clinical: 0.56 [95% CI 0.55–0.58]) and the 4C COVID-19 Mortality score (0.71 [95% CI 0.70–0.72]). We conclude that lethal courses in critically ill COVID-19 patients can be predicted by a machine learning model trained on non-COVID-19 patients. Our results suggest that in a pandemic with a novel disease, prognosis models built for similar diseases can be applied, even when the diseases differ in time courses and in rates of critical and lethal courses.
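Each AUROC above is reported with a 95% CI. One common way to obtain such an interval is the percentile bootstrap: resample patients with replacement, recompute the AUC, and take the 2.5th and 97.5th percentiles. The abstract does not say which CI method was used, so this is a generic sketch on hypothetical data. Resamples that contain only one outcome class are redrawn, which is a simplification.

```python
import random
from itertools import product


def auc(labels, scores):
    """Mann-Whitney AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # the resample must contain both deaths and survivors
            stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical ICU deaths (1) and predicted risks in an external (transfer) cohort.
icu_death = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
pred_risk = [0.81, 0.30, 0.62, 0.45, 0.20, 0.55, 0.58, 0.10, 0.40, 0.35]
print(auc(icu_death, pred_risk), bootstrap_auc_ci(icu_death, pred_risk))
```

With cohorts as large as 1054 patients, the interval becomes narrow, which is consistent with the tight CIs quoted in the abstract.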

Predicting in-Hospital Mortality of Patients with COVID-19 Using Machine Learning Techniques

Journal of Personalized Medicine ◽

10.3390/jpm11050343 ◽

2021 ◽

Vol 11 (5) ◽

pp. 343

Author(s):

Fabiana Tezza ◽

Giulia Lorenzoni ◽

Danila Azzolina ◽

Sofia Barbar ◽

Lucia Anna Carmela Leone ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Hospital Mortality ◽

Learning Algorithm ◽

Vital Signs ◽

Mortality Prediction ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Learning Techniques

The present work aims to identify the predictors of COVID-19 in-hospital mortality by testing a set of Machine Learning Techniques (MLTs) and comparing their ability to predict the outcome of interest. The model with the best performance will be used to identify in-hospital mortality predictors and to build an in-hospital mortality prediction tool. The study involved patients with COVID-19, confirmed by PCR test, admitted to the “Ospedali Riuniti Padova Sud” COVID-19 referral center in the Veneto region, Italy. The algorithms considered were the Recursive Partition Tree (RPART), the Support Vector Machine (SVM), the Gradient Boosting Machine (GBM), and Random Forest. The resampled performance of each MLT was reported in terms of sensitivity, specificity, and the area under the Receiver Operating Characteristic (ROC) curve. The study enrolled 341 patients. The median age was 74 years, and most patients were male. The Random Forest algorithm outperformed the other MLTs in predicting in-hospital mortality, with an area under the ROC curve of 0.84 (95% C.I. 0.78–0.9). Age, together with vital signs (oxygen saturation and the quick SOFA) and lab parameters (creatinine, AST, lymphocytes, platelets, and hemoglobin), were found to be the strongest predictors of in-hospital mortality. The present work provides insights into the prediction of in-hospital mortality of COVID-19 patients using a machine learning algorithm.
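"Resampled performance" usually means that each model is scored on data it was not trained on, for example through k-fold cross-validation. The exact resampling scheme is not given in the abstract. The sketch below shows one possible version, a stratified k-fold split in which each outcome class is dealt round-robin across folds so that every fold has a similar mortality rate. The function name is illustrative.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k folds. Each outcome class is
    shuffled and dealt round-robin across folds, so every fold has a similar
    event rate."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    counter = 0  # keeps dealing across classes so fold sizes stay balanced
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:
            folds[counter % k].append(i)
            counter += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Usage: fit a model on `train`, score it on `test`, then average the k scores.
for train_idx, test_idx in stratified_kfold([1] * 4 + [0] * 8, k=4):
    print(len(train_idx), test_idx)
```

Stratification matters most when deaths are a minority class, because an unstratified split can leave some folds with very few events and make fold-level AUCs unstable.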

Development and Validation of an Insulin Resistance Predicting Model Using a Machine-Learning Approach in a Population-Based Cohort in Korea

Diagnostics ◽

10.3390/diagnostics12010212 ◽

2022 ◽

Vol 12 (1) ◽

pp. 212

Author(s):

Sunmin Park ◽

Chaeyeon Kim ◽

Xuangao Wu

Keyword(s):

Machine Learning ◽

Insulin Resistance ◽

Metabolic Syndrome ◽

Logistic Regression ◽

Random Forest ◽

Roc Curve ◽

Genome Wide Association Study ◽

Prediction Models ◽

Risk Scores ◽

A Genome

Background: Insulin resistance is a common etiology of metabolic syndrome, but receiver operating characteristic (ROC) curve analysis shows a weak association in Koreans. Using a machine learning (ML) approach, we aimed to generate the best model for predicting insulin resistance in Korean adults aged > 40 years in the Ansan/Ansung cohort. Methods: The demographic, anthropometric, biochemical, genetic, nutrient, and lifestyle variables of 8842 participants were included. Polygenic risk scores (PRS) generated by a genome-wide association study were added to represent the genetic impact on insulin resistance. Participants were randomly divided into training (n = 7037) and test (n = 1769) sets. Potentially important features were selected from 99 candidate features, based on the highest area under the ROC curve (AUC), using seven different ML algorithms. The AUC target was ≥0.85 for the best prediction of insulin resistance with the lowest number of features. Results: Before ML was conducted, the HOMA-IR cutoff defining insulin resistance was determined as 2.31 using logistic regression. XGBoost and logistic regression algorithms generated the highest AUC (0.86) among the prediction models using 99 features, while the random forest algorithm generated a model with an AUC of 0.82. These models showed high accuracy and k-fold values (>0.85). In the XGBoost and random forest algorithms, the prediction model containing 15 features had the highest AUC. PRS was one of the 15 features. The final prediction models for insulin resistance were generated with the same nine features in the XGBoost (AUC = 0.86), random forest (AUC = 0.84), and artificial neural network (AUC = 0.86) algorithms. The nine features were fasting serum glucose, ALT, total bilirubin, HDL concentration, waist circumference, body fat, pulse, season of enrollment, and gender.
Conclusion: Liver function, regular pulse checks, and seasonal variation, in addition to metabolic syndrome components, should be considered when predicting insulin resistance in Koreans aged over 40 years.
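The outcome label in this study is insulin resistance defined by HOMA-IR with a cutoff of 2.31. The standard HOMA-IR formula is fasting insulin (µU/mL) × fasting glucose (mg/dL) / 405, or equivalently insulin × glucose (mmol/L) / 22.5. The sketch below computes it and applies the 2.31 cutoff. Treating values at or above the cutoff as insulin resistant is an assumption, and the participant values are hypothetical.

```python
HOMA_IR_CUTOFF = 2.31  # cutoff reported in the abstract


def homa_ir_mgdl(fasting_insulin_uu_ml: float, fasting_glucose_mg_dl: float) -> float:
    """HOMA-IR with glucose in mg/dL."""
    return fasting_insulin_uu_ml * fasting_glucose_mg_dl / 405.0


def homa_ir_mmol(fasting_insulin_uu_ml: float, fasting_glucose_mmol_l: float) -> float:
    """HOMA-IR with glucose in mmol/L."""
    return fasting_insulin_uu_ml * fasting_glucose_mmol_l / 22.5


def is_insulin_resistant(homa_ir: float, cutoff: float = HOMA_IR_CUTOFF) -> bool:
    """Binary label used as the prediction target (>= cutoff is an assumption)."""
    return homa_ir >= cutoff


# Hypothetical participants: (fasting insulin in uU/mL, fasting glucose in mg/dL).
for insulin, glucose in [(10.0, 90.0), (12.0, 100.0)]:
    value = homa_ir_mgdl(insulin, glucose)
    print(round(value, 2), is_insulin_resistant(value))
```

Note that fasting serum glucose is both one of the nine predictors and part of the HOMA-IR formula, which helps explain its prominence among the selected features.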

Prediction Models of Early Childhood Caries Based on Machine Learning Algorithms

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18168613 ◽

2021 ◽

Vol 18 (16) ◽

pp. 8613

Author(s):

You-Hyun Park ◽

Sung-Hwa Kim ◽

Yoon-Young Choi

Keyword(s):

Machine Learning ◽

Early Childhood ◽

Logistic Regression ◽

Random Forest ◽

Early Childhood Caries ◽

Prediction Models ◽

Risk Groups ◽

Machine Learning Algorithms ◽

Significant Difference ◽

Childhood Caries

In this study, we developed machine learning-based prediction models for early childhood caries and compared their performances with the traditional regression model. We analyzed the data of 4195 children aged 1–5 years from the Korea National Health and Nutrition Examination Survey data (2007–2018). Moreover, we developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two different methods were applied for variable selection, including a regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values ranging between 0.774 and 0.785. Furthermore, no significant difference was observed between the AUROC values of the four models. Based on the results, we can confirm that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict early childhood caries, identify ECC high-risk groups, and implement active preventive treatments. However, further research is essential to improving the performance of the prediction model using recent methods, such as deep learning.
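Permutation importance scores a variable by shuffling its values across children and measuring how much the model's performance drops. The study used a random forest for this step. The sketch below instead uses a fixed, hypothetical scoring function so that it stays self-contained, and it measures the mean drop in AUC. The feature layout, model and data are illustrative only.

```python
import random
from itertools import product


def auc(labels, scores):
    """Mann-Whitney AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC when one feature column is shuffled, for each feature."""
    rng = random.Random(seed)
    base = auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - auc(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return base, importances


# Hypothetical data: column 0 carries the signal, column 1 is ignored by the model.
X_ecc = [[0.9, 0.1], [0.8, 0.7], [0.7, 0.3], [0.3, 0.9],
         [0.2, 0.5], [0.1, 0.2], [0.4, 0.8], [0.25, 0.4]]
y_ecc = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = caries present
model = lambda row: 2.0 * row[0]   # hypothetical fitted risk score
print(permutation_importance(model, X_ecc, y_ecc))
```

Unlike impurity-based importance, this measure is computed on model performance directly, so it can be applied to any of the four models compared in the study.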