554 Machine learning for prediction of in-hospital mortality in COVID-19 patients: results from an Italian multicentre study

2021 ◽  
Vol 23 (Supplement_G) ◽  
Author(s):  
Sara Paris ◽  
Riccardo Maria Inciardi ◽  
Claudia Specchia ◽  
Marika Vezzoli ◽  
Chiara Oriecuia ◽  
...  

Abstract Aims Several risk factors have been identified that predict worse outcomes in patients with SARS-CoV-2 infection. Prediction models are needed to optimize clinical management and to stratify patients at higher mortality risk early. Machine learning (ML) algorithms offer a novel approach to building a prediction model with good discriminatory capacity that can be easily used in clinical practice. Methods and results Cardio-COVID is a multicentre observational study of a cohort of consecutive adult Caucasian patients with laboratory-confirmed COVID-19 [by real-time reverse transcriptase polymerase chain reaction (RT-PCR)] who were hospitalized in 13 Italian cardiology units from 1 March to 9 April 2020. Patients were followed up after the COVID-19 diagnosis, and all-cause in-hospital mortality or discharge was ascertained until 23 April 2020. Variables with more than 20% missing values were excluded. The Lasso procedure was used with λ = 0.07 to reduce the number of covariates. Mortality was estimated by means of a Random Forest (RF). The dataset was randomly divided into two subsamples with the same proportion of deceased and surviving patients as the entire sample: the training set contained 80% of the data and the test set the remaining 20%. The training set was used in the calibration procedure, in which an RF modelled in-hospital mortality with the covariates selected by the Lasso. Its accuracy was measured by means of the ROC curve, obtaining the AUC, sensitivity, and specificity, with 95% confidence intervals (CI) computed from 10 000 stratified bootstrap replicates. The relative Variable Importance Measure (relVIM) was extracted from the RF to understand which of the selected variables had the greatest impact on the outcome, providing a ranking from the most important (relVIM = 100) to the least important variable. The resulting model was compared with the Gradient Boosting Machine (GBM) and with logistic regression, where the predictions were cross-validated. 
Finally, to understand whether each model had the same performance in sample (training) and out of sample (test), the two AUCs were compared by means of DeLong’s test. Among 701 patients enrolled (mean age 67.2 ± 13.2 years, 69.5% males), 165 (23.5%) died during a median hospitalization of 15 (IQR, 9–24) days. The variables selected by the Lasso were age, oxygen saturation, PaO2/FiO2, creatinine clearance, and elevated troponin. Compared with those who survived, deceased patients were older and had lower blood oxygenation, lower creatinine clearance, and a higher prevalence of elevated troponin (all P < 0.001). The training set included 561 patients and the test set 140 patients. The best out-of-sample performance was provided by the RF, with an AUC of 0.78 (95% CI: 0.68–0.88) and a sensitivity of 0.88 (95% CI: 0.58–1.00). Moreover, RF was the only method that provided similar performance in sample and out of sample (DeLong test P = 0.78). In contrast, the GBM and logistic regression models were less accurate. The relVIM ranked the variables from the most to the least important in predicting the outcome as follows: creatinine clearance, PaO2/FiO2, age, oxygen saturation, and elevated troponin. Conclusions In a large COVID-19 population, we showed that a customizable ML-based score derived from clinical variables is feasible and effective for the prediction of in-hospital mortality.
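The AUC with a stratified bootstrap confidence interval described in the abstract above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names, the toy labels and scores, and the default of 2000 replicates (the study used 10 000) are all assumptions made for the example.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC; positives and negatives are resampled separately."""
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    stats = []
    for _ in range(n_boot):
        bp = rng.choices(pos, k=len(pos))  # resample deaths with replacement
        bn = rng.choices(neg, k=len(neg))  # resample survivors with replacement
        stats.append(auc([1] * len(bp) + [0] * len(bn), bp + bn))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = died, 0 = survived, scores = predicted risk
labels = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1, 0.7, 0.6, 0.05]
point = auc(labels, scores)
low, high = stratified_bootstrap_ci(labels, scores)
print(f"AUC = {point:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

Resampling each outcome class separately keeps the death/survivor ratio fixed in every replicate, which is what "stratified" means here.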

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Zhixuan Zeng ◽  
Shuo Yao ◽  
Jianfei Zheng ◽  
Xun Gong

Abstract Background Early prediction of hospital mortality is crucial for ICU patients with sepsis. This study aimed to develop a novel blending machine learning (ML) model for hospital mortality prediction in ICU patients with sepsis. Methods Two ICU databases were employed: the eICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III). All adult patients who fulfilled the Sepsis-3 criteria were identified. Samples from eICU-CRD constituted the training set and samples from MIMIC-III constituted the test set. A stepwise logistic regression model was used for predictor selection. A blending ML model that integrated nine types of basic ML models was developed for hospital mortality prediction in ICU patients with sepsis. Model performance was evaluated using several measures of discrimination and calibration. Results A total of 12,558 patients from eICU-CRD were included as the training set, and 12,095 patients from MIMIC-III were included as the test set. Both the training set and the test set showed a hospital mortality of 17.9%. Maximum and minimum lactate, maximum and minimum albumin, minimum PaO2/FiO2, and age were important predictors identified by both the random forest and extreme gradient boosting algorithms. Blending ML models based on the corresponding sets of predictors showed better discrimination than SAPS II (AUROC, 0.806 vs. 0.771; AUPRC, 0.515 vs. 0.429) and SOFA (AUROC, 0.742 vs. 0.706; AUPRC, 0.428 vs. 0.381) on the test set. In addition, calibration curves showed that the blending ML models had better calibration than SAPS II. Conclusions The blending ML model is capable of integrating different types of basic ML models efficiently, and it outperforms conventional severity scores in predicting hospital mortality among septic patients in the ICU.
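The general idea of blending, in which a combiner is fitted on hold-out predictions from base models, can be illustrated with a deliberately small sketch. It assumes only two base models and a single convex weight chosen by Brier score, whereas the study combined nine base models with a method the abstract does not detail. All names and numbers below are hypothetical.

```python
def brier(labels, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def fit_blend_weight(labels, probs_a, probs_b, steps=100):
    """Pick w in [0, 1] minimising the Brier score of w*a + (1-w)*b on a hold-out set."""
    best_w, best_score = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(probs_a, probs_b)]
        score = brier(labels, blended)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Hypothetical hold-out outcomes and predictions from two base models
holdout_labels = [1, 0, 1, 0, 1, 0]
probs_a = [0.9, 0.4, 0.6, 0.2, 0.3, 0.1]
probs_b = [0.6, 0.1, 0.8, 0.5, 0.7, 0.3]

weight, blended_score = fit_blend_weight(holdout_labels, probs_a, probs_b)
print(f"weight on model A = {weight:.2f}, blended Brier = {blended_score:.4f}")
```

Because the grid includes w = 0 and w = 1, the blend can never score worse on the hold-out set than the better of the two base models; whether that advantage carries over to new data is what an external test set such as MIMIC-III is for.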


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Samantha E Berger ◽  
Gordon S Huggins ◽  
Jeanne M McCaffery ◽  
Alice H Lichtenstein

Introduction: The development of type 2 diabetes is strongly associated with excess weight gain and can often be partially ameliorated or reversed by weight loss. While many lifestyle interventions have resulted in successful weight loss, strategies to maintain the weight loss have been considerably less successful. Prior studies have identified multiple predictors of weight regain, but none have synthesized them into one analytic stream. Methods: We developed a prediction model of 4-year weight regain after a one-year lifestyle-induced weight loss intervention followed by a 3-year maintenance intervention in 1791 overweight or obese adults with type 2 diabetes from the Action for Health in Diabetes (Look AHEAD) trial who lost ≥3% of initial weight by the end of year 1. Weight regain was defined as regaining <50% of the weight lost during the intervention by year 4. Using machine learning, we integrated factors from several domains, including demographics, psychosocial metrics, health status, and behaviors (e.g. physical activity, self-monitoring, medication use, and intervention adherence). We used classification trees and stochastic gradient boosting with 10-fold cross-validation to develop and internally validate the prediction model. Results: At the end of four years, 928 individuals maintained ≥50% of their initial weight lost (maintainers), whereas 863 did not meet that criterion (regainers). We identified an interaction between age and several variables in the model, as well as percent initial weight loss. Several factors were significant predictors of weight regain based on variable importance plots, regardless of age or initial weight loss, such as insurance status, physical function score, baseline BMI, meal replacement use, and minutes of exercise recorded during year 1. We also identified several factors that were significant predictors depending on age group (45-55y/56-65y/66-76y) and initial weight loss (lost 3-9% vs. ≥10% of initial weight). 
When the variables identified from machine learning were added to a logistic regression model stratified by age and initial weight loss groups, the models showed good prediction (3-9% initial weight loss, ages 45-55y (n=293): ROC AUC=0.78; ≥10% initial weight loss, ages 45-55y (n=242): ROC AUC=0.78; 3-9% initial weight loss, ages 56-65y (n=484): ROC AUC=0.70; ≥10% initial weight loss, ages 56-65y (n=455): ROC AUC=0.74; 3-9% initial weight loss, ages 66-76y (n=150): ROC AUC=0.84; ≥10% initial weight loss, ages 66-76y (n=167): ROC AUC=0.86). Conclusion: The combination of machine learning methodology and logistic regression generates a prediction model that can consider numerous factors simultaneously, can be used to predict weight regain in other populations, and can assist in the development of better strategies to prevent post-loss regain.
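The 10-fold cross-validation used for internal validation in the abstract above amounts to partitioning the participants into ten folds and holding each one out in turn. A minimal sketch of the index bookkeeping is shown below; the function name and the random seed are assumptions, and only the cohort size of 1791 comes from the abstract.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 1791 participants, as in the abstract; the model fitting itself is omitted
splits = list(k_fold_indices(1791, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

Each participant appears in exactly one test fold, so every prediction used for validation comes from a model that never saw that participant during fitting.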


2021 ◽  
Author(s):  
Nicolai Ree ◽  
Andreas H. Göller ◽  
Jan H. Jensen

We present RegioML, an atom-based machine learning model for predicting the regioselectivities of electrophilic aromatic substitution reactions. The model relies on CM5 atomic charges computed using semiempirical tight binding (GFN1-xTB) combined with the ensemble decision tree variant light gradient boosting machine (LightGBM). The model is trained and tested on 21,201 bromination reactions with 101K reaction centers, which are split into training, test, and out-of-sample datasets with 58K, 15K, and 27K reaction centers, respectively. The accuracy is 93% for the test set and 90% for the out-of-sample set, while the precision (the percentage of positive predictions that are correct) is 88% and 80%, respectively. The test-set performance is very similar to that of the graph-based WLN method developed by Struble et al. (React. Chem. Eng. 2020, 5, 896), though the comparison is complicated by the possibility that some of the test and out-of-sample molecules were used to train WLN. RegioML outperforms our physics-based RegioSQM20 method (J. Cheminform. 2021, 13:10), for which the precision is only 75%. Even for the out-of-sample dataset, RegioML slightly outperforms RegioSQM20. The good performance of RegioML and WLN is in large part due to the large datasets available for this type of reaction. However, for reactions where there is little experimental data, physics-based approaches like RegioSQM20 can be used to generate synthetic data for model training. We demonstrate this by showing that the performance of RegioSQM20 can be reproduced by an ML model trained on RegioSQM20-generated data.
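The two metrics quoted in the abstract above differ in their denominators: accuracy counts all correct predictions over all reaction centers, while precision counts correct positive predictions over all positive predictions. A small sketch with made-up labels (the data and function name are hypothetical, not taken from RegioML):

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels where 1 marks a reactive site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    # Precision is undefined with no positive predictions; 0.0 is used here by convention
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision

# Hypothetical reaction centers: 1 = brominated, 0 = not brominated
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
acc, prec = accuracy_and_precision(y_true, y_pred)
print(acc, prec)
```

When most atoms in a molecule are non-reactive, accuracy can stay high even if many predicted sites are wrong, which is why the abstract reports precision alongside it.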


2020 ◽  
Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background The purpose of this study is to identify the risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods The data of ACS patients who presented to the emergency department of Fujian Provincial Hospital with chest pain from January 1, 2017 to March 31, 2020 were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), high-density lipoprotein (HDL) cholesterol, and calcium channel blockers were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing most to the prediction performance of the GBDT and random forest models. Conclusions The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. 
Based on our findings, we recommend that clinicians focus on monitoring the changes of NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.


2021 ◽  
Author(s):  
Ning Zhang ◽  
Rui Fan ◽  
Jing Ke ◽  
Qinghua Cui ◽  
Dong Zhao

Abstract Background Microalbuminuria is the main characteristic of diabetic kidney disease (DKD), but it fluctuates greatly under the influence of blood glucose. Our aim was to identify common, easily collected clinical variables that could predict the risk of DKD in patients with type 2 diabetes. Methods and results We built an artificial intelligence (AI) model to quantitatively predict the risk of DKD based on the biomedical parameters of 1239 patients. An information entropy-based feature selection method was applied to screen for the risk factors of DKD. The dataset was divided into a training set (4/5) and a test set (1/5). Using the selected risk factors, 5-fold cross-validation was applied to train the prediction model, which achieved an AUC of 0.72 in the training set and 0.71 in the test set. In addition, we provide a method for calculating each risk factor’s contribution for an individual, to provide personalized guidance for treatment. We set up a web-based application, available at http://www.cuilab.cn/dkd, for self-check and early warning. Conclusions We established a feasible prediction model for DKD that indicates the degree of risk contribution of each indicator for each individual, which has clinical significance for early intervention and prevention.
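The abstract above does not say which entropy-based criterion was used for feature selection. One common choice is information gain, the reduction in outcome entropy after splitting on a feature, and the sketch below assumes that criterion. The feature names and values are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Outcome entropy minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: 1 = DKD, 0 = no DKD
dkd = [1, 1, 0, 0]
informative = ["high", "high", "low", "low"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]           # carries no information about DKD

gain_informative = information_gain(informative, dkd)
gain_uninformative = information_gain(uninformative, dkd)
print(gain_informative, gain_uninformative)
```

Features would then be ranked by gain and the top-ranked ones kept. Continuous clinical measurements would need to be binned first, a step this sketch leaves out.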


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Suleman Ilyas ◽  
Wasiq Sheikh ◽  
Anshul Parulkar ◽  
Malik B Ahmed ◽  
Gerry Ovide ◽  
...  

Introduction: Transcatheter aortic valve replacement (TAVR) is increasingly being performed because several recent large-scale clinical trials support its use across a range of patient sub-classes. Although it provides life-altering relief of severe aortic stenosis, adverse outcomes are not uncommon, including paravalvular leak, life-threatening bleeding, acute kidney injury, stroke, and permanent pacemaker implantation (PPMI). Recursive feature elimination (RFE) can help classification models such as logistic regression better predict binary variables such as PPMI by creating a model based on a set of predictors and then progressively eliminating variables to optimize the model. Objective: To determine whether RFE applied to logistic regression would provide discriminatory ability in the prediction of PPMI in patients undergoing TAVR. Methods: Pre- and postoperative data from a single institution were collected for all patients without a history of PPMI who underwent TAVR between January 2016 and December 2019. EKG data obtained included QRS duration and the presence of atrioventricular block, left anterior and posterior fascicular block, and right bundle branch block (RBBB). Data were imported into Python, and a stratified 5-fold cross-validation with SMOTE oversampling was run, with RFE running at every fold to avoid overfitting. Upon completion, a rank score was tabulated for each predictor, and a final logistic regression model with the highest area under the receiver operating characteristic curve (ROC AUC) was exported and applied to a test set. The ROC AUC was calculated for the training and test sets, and the variables of importance were identified using RFE. Results: The total sample size for this cohort was 513 patients, with a PPMI incidence of 8.58%. The training set consisted of 40 variables and 384 patients, and the test set had 129 patients. The final optimized model on the training set had an ROC AUC of 0.75 and used three of the forty features. 
The three features identified by RFE were implanted TAVR valve size, QRS duration, and presence of RBBB. The model had an ROC AUC of 0.63 on the test set. Conclusions: Our results show that the presence of pre-op RBBB, prolonged QRS duration, and valve size were risk factors for PPMI. Logistic regression classification had modest ability to predict the need for PPMI after TAVR.
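SMOTE, the oversampling step mentioned in the methods above, creates synthetic minority-class samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified plain-Python sketch of that interpolation step, not the implementation the authors ran; the function name, the two illustrative features, and all values are assumptions.

```python
import random

def smote_sample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic

# Hypothetical minority-class patients described by two scaled features
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_sample(minority_points, n_new=5)
print(new_points)
```

In a cross-validated pipeline, oversampling should be applied only to the training portion of each fold, since synthetic points derived from validation patients would leak information into the model and inflate the estimated performance.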


2018 ◽  
Vol 7 (2.21) ◽  
pp. 339 ◽  
Author(s):  
K Ulaga Priya ◽  
S Pushpa ◽  
K Kalaivani ◽  
A Sartiha

In the banking industry, identifying customers who are likely to default is a tedious part of loan processing. Manual prediction of default customers can lead to bad loans in the future. Banks possess huge volumes of behavioral data but are unable to use them to judge who will default. Modern techniques such as machine learning support analytical processing through supervised and unsupervised learning. A data model for predicting default customers using the random forest technique is proposed. The data model is evaluated on the training set, and based on the performance parameters the final prediction is made on the test set. The results provide evidence that the random forest technique can help banks predict loan defaulters with high accuracy.  


2020 ◽  
Vol 10 (12) ◽  
pp. 4199
Author(s):  
Myoung-Young Choi ◽  
Sunghae Jun

Accurately predicting the occurrence of a fire is very difficult, yet it is very important for protecting human life and property. We therefore study fire hazard prediction and evaluation methods to cope with fire risks. In this paper, we propose three models based on statistical machine learning and optimized risk indexing for fire risk assessment. We build logistic regression, deep neural network (DNN), and fire risk indexing models, and compare the performance of the proposed and traditional models using real investigated data on fire occurrence in Korea. In general, fire prediction models currently in use do not provide satisfactory levels of accuracy, because the factors affecting fire occurrence are very diverse and fires occur very sparsely. To improve the accuracy of fire occurrence prediction, we first build logistic regression and DNN models. In addition, we construct a fire risk indexing model to further improve fire prediction. To compare our research models with the current fire prediction model, we use real fire data investigated in Korea between 2011 and 2017. The experimental results confirm that the prediction accuracy of the proposed method is superior to that of the existing fire occurrence prediction model. We therefore expect the proposed model to contribute to evaluating fire risk in buildings and factories in the field of fire insurance and to calculating fire insurance premiums.


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 332
Author(s):  
Ernest Kwame Ampomah ◽  
Zhiguang Qin ◽  
Gabriel Nyame

Forecasting the direction and trend of stock prices is an important task that helps investors make prudent financial decisions in the stock market. Investment in the stock market carries substantial risk, and minimizing prediction error reduces that risk. Machine learning (ML) models typically perform better than statistical and econometric models. Ensemble ML models have also been shown in the literature to outperform single ML models. In this work, we compare the effectiveness of tree-based ensemble ML models (Random Forest (RF), XGBoost Classifier (XG), Bagging Classifier (BC), AdaBoost Classifier (Ada), Extra Trees Classifier (ET), and Voting Classifier (VC)) in forecasting the direction of stock price movement. Eight stock data sets from three stock exchanges (NYSE, NASDAQ, and NSE) were randomly collected and used for the study. Each data set is split into a training set and a test set. Ten-fold cross-validation accuracy is used to evaluate the ML models on the training set. In addition, the ML models are evaluated on the test set using accuracy, precision, recall, F1-score, specificity, and area under the receiver operating characteristic curve (AUC-ROC). Kendall's W test of concordance is used to rank the performance of the tree-based ML algorithms. For the training set, the AdaBoost model performed better than the rest of the models. For the test set, the accuracy, precision, F1-score, and AUC metrics produced results significant enough to rank the models, and the Extra Trees classifier outperformed the other models in all the rankings.
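Kendall's W, used in the abstract above to rank the models, measures how consistently several rankings agree, from 0 (no agreement) to 1 (identical rankings). A minimal sketch without tie correction is given below. Treating each stock data set as one "rater" that ranks the models is an assumption about the setup, and the rank values are invented.

```python
def kendalls_w(rank_matrix):
    """Kendall's coefficient of concordance for m rankings of n items (no tie correction)."""
    m = len(rank_matrix)        # number of rankings, e.g. one per data set
    n = len(rank_matrix[0])     # number of ranked items, e.g. models
    totals = [sum(row[j] for row in rank_matrix) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical ranks of three models on three data sets (1 = best)
perfect_agreement = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
opposite_rankings = [[1, 2, 3], [3, 2, 1]]

w_perfect = kendalls_w(perfect_agreement)
w_opposite = kendalls_w(opposite_rankings)
print(w_perfect, w_opposite)
```

A W close to 1 indicates that the ordering of models is stable across data sets, so the average rank is a meaningful summary; a W near 0 indicates that the ordering depends heavily on the data set.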


2020 ◽  
Vol 10 (21) ◽  
pp. 7741
Author(s):  
Sang Yeob Kim ◽  
Gyeong Hee Nam ◽  
Byeong Mun Heo

Metabolic syndrome (MS) is an aggregation of coexisting conditions that can indicate an individual’s high risk of major diseases, including cardiovascular disease, stroke, cancer, and type 2 diabetes. We conducted a cross-sectional survey to evaluate potential risk factor indicators by identifying relationships between MS and anthropometric and spirometric factors, along with blood parameters, among Korean adults. A total of 13,978 subjects were enrolled from the Korea National Health and Nutrition Examination Survey. Statistical analysis was performed using a complex sampling design to represent the entire Korean population. We conducted binary logistic regression analysis to evaluate and compare potential associations of all included factors. We constructed prediction models based on Naïve Bayes and logistic regression algorithms. The performance of the prediction models was evaluated using the area under the curve (AUC) and calibration curves. Among all factors, triglyceride exhibited a strong association with MS in both men (odds ratio (OR) = 2.711, 95% confidence interval (CI) [2.328–3.158]) and women (OR = 3.515 [3.042–4.062]). Regarding anthropometric factors, the waist-to-height ratio demonstrated a strong association in men (OR = 1.511 [1.311–1.742]), whereas waist circumference was the strongest indicator in women (OR = 2.847 [2.447–3.313]). Forced expiratory volume in 6 s and forced expiratory flow 25–75% were strongly associated with MS in both men (OR = 0.822 [0.749–0.903]) and women (OR = 1.150 [1.060–1.246]). The wrapper-based logistic regression prediction model showed the highest predictive power in both men and women (AUC = 0.868 and 0.932, respectively). Our findings revealed that several factors were associated with MS and suggested the potential of employing machine learning models to support the diagnosis of MS.
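The odds ratios and confidence intervals reported above are related to the underlying logistic regression coefficients by OR = exp(β) and CI = exp(β ± z·SE). The sketch below assumes a Wald interval with z = 1.96; the study used a complex sampling design, so its exact interval method may differ. It recovers β and SE from the reported triglyceride result for men and rebuilds the interval. The function names are hypothetical.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_or(odds_ratio, lower, upper, z=1.96):
    """Recover the coefficient and its standard error from a reported OR and CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported in the abstract for triglyceride in men: OR = 2.711 [2.328-3.158]
beta, se = coef_from_or(2.711, 2.328, 3.158)
or_value, ci_low, ci_high = odds_ratio_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_value:.3f} [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval is symmetric on the log scale rather than on the odds ratio scale, which is why the reported upper bound lies further from the OR than the lower bound does.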

