Development and validation of a novel blending machine learning model for hospital mortality prediction in ICU patients with Sepsis

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Zhixuan Zeng ◽  
Shuo Yao ◽  
Jianfei Zheng ◽  
Xun Gong

Abstract Background: Early prediction of hospital mortality is crucial for ICU patients with sepsis. This study aimed to develop a novel blending machine learning (ML) model for hospital mortality prediction in ICU patients with sepsis. Methods: Two ICU databases were employed: the eICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III). All adult patients who fulfilled Sepsis-3 criteria were identified. Samples from eICU-CRD constituted the training set and samples from MIMIC-III constituted the test set. A stepwise logistic regression model was used for predictor selection. A blending ML model, which integrated nine kinds of basic ML models, was developed for hospital mortality prediction in ICU patients with sepsis. Model performance was evaluated by various measures of discrimination and calibration. Results: A total of 12,558 patients from eICU-CRD were included as the training set, and 12,095 patients from MIMIC-III were included as the test set. Both the training set and the test set showed a hospital mortality of 17.9%. Maximum and minimum lactate, maximum and minimum albumin, minimum PaO2/FiO2 and age were important predictors identified by both the random forest and the extreme gradient boosting algorithms. Blending ML models based on the corresponding sets of predictors presented better discrimination than SAPS II (AUROC, 0.806 vs. 0.771; AUPRC, 0.515 vs. 0.429) and SOFA (AUROC, 0.742 vs. 0.706; AUPRC, 0.428 vs. 0.381) on the test set. In addition, calibration curves showed that the blending ML models had better calibration than SAPS II. Conclusions: The blending ML model is capable of integrating different kinds of basic ML models efficiently, and outperforms conventional severity scores in predicting hospital mortality among septic patients in the ICU.
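
The abstract does not spell out how the blending step was implemented. The following is a minimal sketch of one common form of blending, assuming a hold-out scheme: base models are fitted on part of the training data, and their predicted probabilities on a held-out "blend" portion become the inputs of a logistic-regression meta-learner. The synthetic data, the three base models (rather than the nine used in the study), and every variable name are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for a cohort (assumption: roughly 18% positive class).
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6,
                           weights=[0.82, 0.18], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hold out part of the training data; the meta-learner is fitted on it.
X_base, X_blend, y_base, y_blend = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=0)

# Three illustrative base models (the study integrated nine).
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=100, random_state=0),
    KNeighborsClassifier(n_neighbors=15),
]
for model in base_models:
    model.fit(X_base, y_base)


def base_probabilities(models, features):
    """Stack each base model's predicted positive-class probability as a column."""
    return np.column_stack([m.predict_proba(features)[:, 1] for m in models])


# The meta-learner sees only the base models' predictions on the blend set.
meta = LogisticRegression()
meta.fit(base_probabilities(base_models, X_blend), y_blend)

blend_test_prob = meta.predict_proba(base_probabilities(base_models, X_test))[:, 1]
blend_auroc = roc_auc_score(y_test, blend_test_prob)
print(f"Blended AUROC on the held-out test set: {blend_auroc:.3f}")
```

Fitting the meta-learner on data the base models never saw keeps it from simply rewarding whichever base model overfits the most.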

2021 ◽  
Vol 8 ◽  
Author(s):  
Yibing Zhu ◽  
Jin Zhang ◽  
Guowei Wang ◽  
Renqi Yao ◽  
Chao Ren ◽  
...  

Background: Mechanically ventilated patients in the intensive care unit (ICU) have high mortality rates. Multiple prediction scores, such as the Simplified Acute Physiology Score II (SAPS II), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA), are widely used in the general ICU population. We aimed to establish prediction scores for mechanically ventilated patients by combining these disease severity scores with other features available on the first day of admission. Methods: A retrospective administrative database study was conducted using the Medical Information Mart for Intensive Care (MIMIC-III) database. The exposures of interest consisted of demographics, pre-ICU comorbidity, ICU diagnosis, disease severity scores, vital signs, and laboratory test results on the first day of ICU admission. Hospital mortality was used as the outcome. We used the machine learning methods of k-nearest neighbors (KNN), logistic regression, bagging, decision tree, random forest, Extreme Gradient Boosting (XGBoost), and neural network to build the models. A sample of 70% of the cohort was used as the training set; the remaining 30% was used for testing. Areas under the receiver operating characteristic curves (AUCs) and calibration plots were constructed to evaluate and compare the models' performance. The significance of the risk factors was identified through the models, and the top factors were reported. Results: A total of 28,530 subjects were enrolled through screening of the MIMIC-III database. After data preprocessing, 25,659 adult patients with 66 predictors were included in the model analyses. The KNN, logistic regression, decision tree, random forest, neural network, bagging, and XGBoost models were established on the training set and obtained AUCs of 0.806, 0.818, 0.743, 0.819, 0.780, 0.803, and 0.821, respectively, on the testing set. The calibration curves of all the models, except for the neural network, performed well. The XGBoost model performed best among the seven models. The top five predictors were age, respiratory dysfunction, SAPS II score, maximum hemoglobin, and minimum lactate. Conclusion: The current study indicates that models using risk factors from the first day can be successfully established for predicting mortality in ventilated patients. The XGBoost model performed best among the seven machine learning models.
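
A calibration plot of the kind mentioned above compares predicted risk with the observed event rate within bins of predicted probability. The sketch below shows that computation under the assumption of equal-width bins; the function name `calibration_bins` and the toy labels and probabilities are invented for illustration and are not taken from the study.

```python
def calibration_bins(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins instead of dividing by zero
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(l for l, _ in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


# Toy example: outcome labels (1 = died) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.30, 0.35, 0.55, 0.60, 0.85, 0.95]

rows = calibration_bins(y_true, y_prob, n_bins=4)
for mean_pred, observed, count in rows:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

A well-calibrated model yields points close to the diagonal, that is, mean predicted risk close to the observed rate in every bin.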


2021 ◽  
Vol 8 (1) ◽  
pp. e000761
Author(s):  
Hao Du ◽  
Kewin Tien Ho Siah ◽  
Valencia Zhang Ru-Yan ◽  
Readon Teh ◽  
Christopher Yu En Tan ◽  
...  

Research objectives: Clostridioides difficile infection (CDI) is a major cause of healthcare-associated diarrhoea with high mortality. There is a lack of validated predictors for severe outcomes in CDI. The aim of this study was to derive and validate a clinical prediction tool for CDI in-hospital mortality using a large critical care database. Methodology: The demographics, clinical parameters, laboratory results and mortality of CDI cases were extracted from the Medical Information Mart for Intensive Care-III (MIMIC-III) database. We subsequently trained three machine learning models, logistic regression (LR), random forest (RF) and gradient boosting machine (GBM), to predict in-hospital mortality. The individual performances of the models were compared against current severity scores (Clostridioides difficile Associated Risk of Death Score (CARDS) and ATLAS (Age, Treatment with systemic antibiotics, Leukocyte count, Albumin and Serum creatinine as a measure of renal function)) by calculating the area under the receiver operating characteristic curve (AUROC). We identified factors associated with higher mortality risk in each model. Summary of results: From 61 532 intensive care unit stays in the MIMIC-III database, there were 1315 CDI cases. The mortality rate for CDI in the study cohort was 18.33%. AUROC was 0.69 (95% CI, 0.60 to 0.76) for LR, 0.71 (95% CI, 0.62 to 0.77) for RF and 0.72 (95% CI, 0.64 to 0.78) for GBM, whereas the AUROC of the existing scores was 0.57 (95% CI, 0.51 to 0.65) for CARDS and 0.63 (95% CI, 0.54 to 0.70) for ATLAS. Albumin, lactate and bicarbonate were significant mortality factors in all the models. Free calcium, potassium, white blood cell count, urea, platelet count and mean blood pressure were present in at least two of the three models. Conclusion: Our machine learning derived CDI in-hospital mortality prediction model identified pertinent factors that can assist critical care clinicians in identifying patients at high risk of dying from CDI.
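
The abstract reports each AUROC with a 95% confidence interval but does not say how the intervals were obtained. As an illustration only, the sketch below computes AUROC from its rank interpretation and attaches a percentile bootstrap interval, which is one common choice and an assumption here. The function names and the toy data are hypothetical.

```python
import random


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval; resamples lacking one class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # need at least one positive and one negative
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: 1 = died, with a hypothetical model's risk scores.
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.75, 0.9]

point = auroc(y_true, scores)
lower, upper = bootstrap_ci(y_true, scores)
print(f"AUROC {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With a cohort as small as this toy example the interval is wide, which mirrors why modest differences between models with overlapping intervals should be read cautiously.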


2020 ◽  
pp. bmjspcare-2020-002602 ◽  
Author(s):  
Prathamesh Parchure ◽  
Himanshu Joshi ◽  
Kavita Dharmarajan ◽  
Robert Freeman ◽  
David L Reich ◽  
...  

Objectives: To develop and validate a model for prediction of near-term in-hospital mortality among patients with COVID-19 by applying a machine learning (ML) algorithm to time-series inpatient data from electronic health records. Methods: A cohort of 567 patients with COVID-19 at a large acute care healthcare system between 10 February 2020 and 7 April 2020 was observed until either death or discharge. A random forest (RF) model was developed on a randomly drawn 70% of the cohort (the training set) and its performance was evaluated on the remaining 30% (the test set). The outcome variable was in-hospital mortality within 20–84 hours from the time of prediction. Input features included patients' vital signs, laboratory data and ECG results. Results: Patients had a median age of 60.2 years (IQR 26.2 years); 54.1% were men. The in-hospital mortality rate was 17.0% and the overall median time to death was 6.5 days (range 1.3–23.0 days). In the test set, the RF classifier yielded a sensitivity of 87.8% (95% CI: 78.2% to 94.3%), specificity of 60.6% (95% CI: 55.2% to 65.8%), accuracy of 65.5% (95% CI: 60.7% to 70.0%), area under the receiver operating characteristic curve of 85.5% (95% CI: 80.8% to 90.2%) and area under the precision recall curve of 64.4% (95% CI: 53.5% to 75.3%). Conclusions: Our ML-based approach can be used to analyse electronic health record data and reliably predict near-term mortality. Using such a model in hospitals could help improve care, thereby better aligning clinical decisions with prognosis in critically ill patients with COVID-19.
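
Sensitivity, specificity and accuracy all derive from the same confusion-matrix counts. The sketch below shows the arithmetic on toy labels; the function name and the data are illustrative assumptions and do not reproduce the study's predictions.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy from binary labels.

    Assumes both classes are present in y_true (otherwise a rate is undefined).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),   # share of deaths that were flagged
        "specificity": tn / (tn + fp),   # share of survivors correctly cleared
        "accuracy": (tp + tn) / len(y_true),
        "prevalence": (tp + fn) / len(y_true),
    }


# Toy example: 4 deaths and 6 survivors, with a classifier's binary predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

m = classification_metrics(y_true, y_pred)
print(m)

# Accuracy is the prevalence-weighted average of sensitivity and specificity.
weighted = (m["prevalence"] * m["sensitivity"]
            + (1 - m["prevalence"]) * m["specificity"])
print(f"weighted average: {weighted:.3f}")
```

Because accuracy is a prevalence-weighted average, a classifier tuned for high sensitivity in a cohort where most patients survive will show an accuracy close to its specificity, which is the pattern in the figures reported above.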


2021 ◽  
Author(s):  
Koji Hosokawa ◽  
Nobuaki Shime

Abstract Background: The predictive value of disease severity scores for intensive care unit (ICU) patients is occasionally inaccurate because ICU patients with mild symptoms are also considered. We therefore aimed to evaluate the accuracy of severity scores in predicting mortality of patients with complicated conditions admitted for > 24 hours. Methods: Overall, 35,353 adult patients from nationwide ICU data were divided into two groups: (1) overnight ICU stay after elective surgery and alive on discharge within 24 hours and (2) death within 24 hours or prolonged stay. The performance and accuracy of Sequential Organ Failure Assessment (SOFA), Acute Physiology and Chronic Health Evaluation (APACHE) II and III, and Simplified Acute Physiology Score (SAPS) II scores in predicting in-hospital mortality were evaluated. Results: In the overnight stay group, the correlation between SOFA and APACHE III or SAPS II scores was low because many patients had a SOFA score of 0. In the prolonged stay group, SAPS II and APACHE II and III showed high predictive accuracy, whereas that of SOFA was moderate. Conclusions: When overnight ICU stay patients were excluded, the high predictive value of SAPS II and APACHE II and III for in-hospital mortality was evident.


2021 ◽  
Vol 23 (Supplement_G) ◽  
Author(s):  
Sara Paris ◽  
Riccardo Maria Inciardi ◽  
Claudia Specchia ◽  
Marika Vezzoli ◽  
Chiara Oriecuia ◽  
...  

Abstract Aims: Several risk factors have been identified to predict worse outcomes in patients affected by SARS-CoV-2 infection. Prediction models are needed to optimize clinical management and to stratify patients at higher mortality risk early. Machine learning (ML) algorithms represent a novel approach to identifying a prediction model with good discriminatory capacity that can be easily used in clinical practice. Methods and results: Cardio-COVID is a multicentre observational study that involved a cohort of consecutive adult Caucasian patients with laboratory-confirmed COVID-19 [by real time reverse transcriptase-polymerase chain reaction (RT-PCR)] who were hospitalized in 13 Italian cardiology units from 1 March to 9 April 2020. Patients were followed up after the COVID-19 diagnosis, and all-cause in-hospital mortality or discharge was ascertained until 23 April 2020. Variables with more than 20% missing values were excluded. The Lasso procedure was used with λ = 0.07 to reduce the number of covariates. Mortality was estimated by means of a Random Forest (RF). The dataset was randomly divided into two subsamples with the same percentage of deceased and surviving patients as the entire sample: the training set contained 80% of the data and the test set the remaining 20%. The training set was used in the calibration procedure, in which an RF modelled in-hospital mortality with the covariates selected by Lasso. Its accuracy was measured by means of the ROC curve, obtaining AUC, sensitivity, specificity, and the related 95% confidence intervals (CI) computed with 10 000 stratified bootstrap replicates. The relative Variable Importance Measure (relVIM) was extracted from the RF to understand which of the selected variables had the greatest impact on outcome, providing a ranking from the most important (relVIM = 100) to the least important variable. The model obtained was compared with the Gradient Boosting Machine (GBM) and with logistic regression, for which the predictions were cross-validated. Finally, to understand whether each model had the same performance in sample (training) and out of sample (test), the two AUCs were compared by means of DeLong's test. Among 701 patients enrolled (mean age 67.2 ± 13.2 years, 69.5% males), 165 (23.5%) died during a median hospitalization of 15 (IQR, 9–24) days. The variables selected by the Lasso were age, oxygen saturation, PaO2/FiO2, creatinine clearance and elevated troponin. Compared with those who survived, deceased patients were older and had lower blood oxygenation, lower creatinine clearance and a higher prevalence of elevated troponin (all P < 0.001). The training set included 561 patients and the test set 140 patients. The best out-of-sample performance was provided by the RF, with an AUC of 0.78 (95% CI: 0.68–0.88) and a sensitivity of 0.88 (95% CI: 0.58–1.00). Moreover, RF was the only method that provided similar performance in sample and out of sample (DeLong test P = 0.78). In contrast, the prediction model was less accurate when using GBM and logistic regression. The relVIM ranked the variables from the most to the least important in predicting the outcome as follows: creatinine clearance, PaO2/FiO2, age, oxygen saturation, and elevated troponin. Conclusions: In a large COVID-19 population, we showed that a customizable ML-based score derived from clinical variables is feasible and effective for the prediction of in-hospital mortality.
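
Two steps described above lend themselves to a short illustration: the outcome-stratified 80/20 split and the rescaling of variable importances so that the top variable scores 100. The sketch below uses the cohort counts stated in the abstract (701 patients, 165 deaths); the function names, the random seed, and the raw importance values are invented for illustration and are not the study's figures.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class mix in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def relative_importance(raw_importance):
    """Rescale raw importances so that the largest one becomes 100."""
    top = max(raw_importance.values())
    return {name: 100 * value / top for name, value in raw_importance.items()}


# Cohort sizes from the abstract: 165 deaths (1) among 701 patients.
labels = [1] * 165 + [0] * 536
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_deaths = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_deaths)

# Hypothetical raw importances (invented values, ordered as in the abstract's ranking).
raw = {
    "creatinine_clearance": 0.30,
    "pao2_fio2": 0.24,
    "age": 0.21,
    "oxygen_saturation": 0.15,
    "elevated_troponin": 0.10,
}
rel_vim = relative_importance(raw)
for name, value in rel_vim.items():
    print(f"{name}: {value:.0f}")
```

Splitting within each outcome class yields 561 training and 140 test patients, matching the sizes reported in the abstract, and keeps the death rate near 23.5% in both parts.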


2021 ◽  
Author(s):  
Yanrong Cai ◽  
Xiang Jiang ◽  
Weifan Dai ◽  
Qinyuan Yu

Abstract Background: Fractures of the pelvis and/or acetabulum are leading risks of death worldwide. However, the ability of conventional systems to predict in-hospital mortality has so far been limited. Here, we hypothesize that machine learning (ML) algorithms could provide better predictive performance than the traditional scoring system, Simplified Acute Physiology Score (SAPS) II, for patients with pelvic and acetabular trauma in the intensive care unit (ICU). Methods: We developed customized mortality prediction models with ML techniques based on MIMIC-III, an open-access de-identified database consisting of data from more than 25,000 patients who were admitted to the Beth Israel Deaconess Medical Center (BIDMC). We enrolled 307 patients who had an ICD-9 diagnosis of pelvic, acetabular or combined pelvic and acetabular fractures and an ICU stay of more than 72 hours. ML models including decision tree, logistic regression and random forest were established using the SAPS II features from the first 72 hours after ICU admission, and the traditional first-24-hour features were used to build the respective control models. We evaluated and compared each model's performance through the area under the receiver operating characteristic curve (AUROC). A feature importance method was used to visualize the top risk factors for mortality. Results: All the ML models outperformed the traditional scoring system SAPS II (AUROC = 0.73), among which the best-fitted random forest model had the best performance (AUROC of 0.90). When the evolution of physiological features over time was used rather than 24-hour snapshots, all the ML models performed better than their respective controls. Age remained the most important feature for all classifiers. Age, BUN (minimum value on day 2), and BUN (maximum value on day 3) were the top 3 predictor variables in the optimal random forest experiment model. In the best decision tree model, the top 3 risk factors, in decreasing order of contribution, were age, the lowest systolic blood pressure on day 1 and the same value on day 3. Conclusion: The results suggest that mortality modeling with ML techniques could improve predictive performance in the context of pelvic and acetabular trauma and potentially support decision-making for orthopedic and ICU practitioners.
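
Features such as "BUN (minimum value on day 2)" imply summarizing each measurement series per ICU day rather than over a single 24-hour snapshot. The abstract does not give the extraction code, so the sketch below is only an assumed illustration: it buckets timestamped values into 24-hour days after admission and takes the minimum and maximum per day. The function name and the BUN readings are hypothetical.

```python
from collections import defaultdict


def daily_min_max(measurements, n_days=3):
    """Summarize (hours_since_admission, value) pairs as per-day min/max features."""
    by_day = defaultdict(list)
    for hours, value in measurements:
        day = int(hours // 24) + 1  # hours 0-23.99 fall on day 1, and so on
        if 1 <= day <= n_days:      # ignore readings outside the 72-hour window
            by_day[day].append(value)
    features = {}
    for day in range(1, n_days + 1):
        values = by_day.get(day)
        # None marks a day without any reading; imputation is left to the caller.
        features[f"min_day{day}"] = min(values) if values else None
        features[f"max_day{day}"] = max(values) if values else None
    return features


# Hypothetical BUN readings: (hours since ICU admission, value).
bun = [(2, 18.0), (20, 25.0), (30, 31.0), (44, 28.0),
       (50, 40.0), (70, 36.0), (80, 33.0)]

bun_features = daily_min_max(bun)
print(bun_features)
```

The 24-hour control models described in the abstract would correspond to keeping only the day 1 entries of such a feature set.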


2020 ◽  
Author(s):  
Xie Wu ◽  
Zhanhao Su ◽  
Qipeng Luo ◽  
Yinan Li ◽  
Hongbai Wang ◽  
...  

Abstract Background: Identifying high-risk patients in the intensive care unit (ICU) is very important because of the high mortality rate. Existing scoring systems are numerous but lack effective inflammatory markers. Our objective was to identify and evaluate a low-cost, easily accessible and effective inflammatory marker that can predict mortality in ICU patients. Methods: We conducted a retrospective study using data from the Medical Information Mart for Intensive Care III database. We first divided the patients into a survival group and a death group based on in-hospital mortality. Receiver operating characteristic analyses were performed to find the best inflammatory marker (i.e. neutrophil-to-lymphocyte ratio, NLR). We then re-divided the patients into three groups based on NLR levels. Univariate and multivariate logistic regression were performed to evaluate the association between NLR and mortality. The area under the curve (AUC), Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) were used to assess whether the incorporation of NLR can improve the predictive power of the existing predictive model. Results: A total of 21,822 patients were included in this study, with an in-hospital mortality rate of 14.43%. Among all inflammatory markers in routine blood test results, NLR had the best predictive ability, with a median (interquartile range) NLR of 5.40 (2.95, 10.46) in the survival group and 8.32 (4.25, 14.75) in the death group. We then re-divided the patients into low (≤1), medium (1-6) and high (≥6) groups based on NLR levels. Compared with the medium NLR group, the in-hospital mortality rates were significantly higher in the low (odds ratio [OR] = 2.09; 95% confidence interval [CI], 1.64 to 2.66) and high (OR = 1.64; 95% CI, 1.50 to 1.80) NLR groups. The addition of NLR to the Simplified Acute Physiology Score II (SAPS II) improved the AUC from 0.789 to 0.798 (P<0.001), with an NRI of 16.64% (P<0.001) and an IDI of 0.27% (P<0.001). Conclusion: NLR is a good predictor of mortality in ICU patients; both low and high levels of NLR are associated with an elevated mortality rate. The inclusion of NLR might improve the predictive power of SAPS II.
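
NRI and IDI both compare the predicted risks of a baseline model with those of an extended model, separately for patients with and without the event. The abstract does not state which NRI variant was used; the sketch below assumes the category-free (continuous) form, and the function names and toy risks are invented for illustration.

```python
def continuous_nri(y_true, p_old, p_new):
    """Category-free NRI: net upward moves for events plus net downward moves for non-events."""
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    up_e = sum(1 for o, n in events if n > o) / len(events)
    down_e = sum(1 for o, n in events if n < o) / len(events)
    up_ne = sum(1 for o, n in nonevents if n > o) / len(nonevents)
    down_ne = sum(1 for o, n in nonevents if n < o) / len(nonevents)
    return (up_e - down_e) + (down_ne - up_ne)


def idi(y_true, p_old, p_new):
    """IDI: gain in mean predicted risk for events minus the gain for non-events."""
    def mean(values):
        return sum(values) / len(values)

    gain_events = mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 1])
    gain_nonevents = mean([n - o for y, o, n in zip(y_true, p_old, p_new) if y == 0])
    return gain_events - gain_nonevents


# Toy example: risks from a baseline score and from the score plus a new marker.
y_true = [1, 1, 1, 0, 0, 0, 0]
p_old = [0.4, 0.5, 0.6, 0.3, 0.2, 0.5, 0.4]
p_new = [0.5, 0.6, 0.5, 0.2, 0.2, 0.4, 0.5]

nri_value = continuous_nri(y_true, p_old, p_new)
idi_value = idi(y_true, p_old, p_new)
print(f"NRI = {nri_value:.4f}, IDI = {idi_value:.4f}")
```

The two measures sit on different scales: NRI counts the direction of each patient's change, while IDI averages its size, which is why a sizeable NRI can accompany a small IDI, as in the figures reported above.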


2018 ◽  
Vol 7 (2.21) ◽  
pp. 339 ◽  
Author(s):  
K Ulaga Priya ◽  
S Pushpa ◽  
K Kalaivani ◽  
A Sartiha

In the banking industry, identifying customers likely to default is a tedious part of loan processing. Manual prediction of default customers may result in bad loans in the future. Banks hold huge volumes of behavioral data but are unable to use them to judge which customers will default on loans. Modern techniques such as machine learning support analytical processing through supervised and unsupervised learning. A data model for predicting default customers using the random forest technique is proposed. The data model is evaluated on the training set and, based on the performance parameters, the final prediction is made on the test set. The results provide evidence that the random forest technique can help banks predict loan defaulters with high accuracy.

