Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19

2020 · pp. bmjspcare-2020-002602
Author(s): Prathamesh Parchure, Himanshu Joshi, Kavita Dharmarajan, Robert Freeman, David L Reich, ...

Objectives To develop and validate a model for predicting near-term in-hospital mortality among patients with COVID-19 by applying a machine learning (ML) algorithm to time-series inpatient data from electronic health records. Methods The cohort comprised 567 patients with COVID-19 at a large acute care healthcare system between 10 February 2020 and 7 April 2020, who were observed until either death or discharge. A random forest (RF) model was developed on a randomly drawn 70% of the cohort (training set), and its performance was evaluated on the remaining 30% (test set). The outcome variable was in-hospital mortality within 20–84 hours of the time of prediction. Input features included patients’ vital signs, laboratory data and ECG results. Results Patients had a median age of 60.2 years (IQR 26.2 years); 54.1% were men. The in-hospital mortality rate was 17.0%, and the overall median time to death was 6.5 days (range 1.3–23.0 days). In the test set, the RF classifier yielded a sensitivity of 87.8% (95% CI: 78.2% to 94.3%), specificity of 60.6% (95% CI: 55.2% to 65.8%), accuracy of 65.5% (95% CI: 60.7% to 70.0%), area under the receiver operating characteristic curve of 85.5% (95% CI: 80.8% to 90.2%) and area under the precision-recall curve of 64.4% (95% CI: 53.5% to 75.3%). Conclusions Our ML-based approach can be used to analyse electronic health record data and reliably predict near-term mortality. Using such a model in hospitals could help improve care by better aligning clinical decisions with prognosis in critically ill patients with COVID-19.
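The test-set sensitivity, specificity and accuracy all come from one binary confusion matrix. As a minimal standard-library sketch (the function names are hypothetical, not from the study), they can be computed from true and predicted labels. Accuracy also equals a prevalence-weighted average of sensitivity and specificity, which is why, with roughly 17% mortality, the reported accuracy sits much closer to specificity than to sensitivity. The study's exact test-set prevalence is not given, so the 0.17 used below is an approximation.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = died, 0 = survived)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # share of deaths the model flagged
    specificity = tn / (tn + fp)  # share of survivors correctly not flagged
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(prevalence, sensitivity, specificity):
    """Accuracy written as a prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity


# Approximate check against the reported rates (assumes ~17% prevalence in the test set).
print(round(accuracy_from_rates(0.17, 0.878, 0.606), 3))  # 0.652
```

The small gap between 0.652 and the reported 65.5% is expected, since the test-set mortality rate need not equal the cohort's 17.0%.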

2021 · Vol 11
Author(s): Ximing Nie, Yuan Cai, Jingyi Liu, Xiran Liu, Jiahui Zhao, ...

Objectives: This study aimed to investigate whether machine learning algorithms could provide an optimal method for early mortality prediction, compared with other scoring systems, for patients with cerebral hemorrhage in intensive care units in clinical practice. Methods: All patients with cerebral hemorrhage who were monitored with the MetaVision system and admitted to intensive care units between 2008 and 2012 were enrolled from the Medical Information Mart for Intensive Care III (MIMIC-III) database. The calibration, discrimination, and risk classification of hospital mortality predicted by the machine learning algorithms were assessed. The primary outcome was hospital mortality. Model performance was assessed with accuracy and receiver operating characteristic curve analysis. Results: Of the 760 patients with cerebral hemorrhage enrolled from the MIMIC database [mean age, 68.2 years (SD, ±15.5)], 383 (50.4%) died in hospital and 377 (49.6%) survived. The areas under the receiver operating characteristic curve (AUCs) of the six machine learning algorithms were 0.600 (nearest neighbors), 0.617 (decision tree), 0.655 (neural net), 0.671 (AdaBoost), 0.819 (random forest), and 0.725 (gcForest). The AUC of the Acute Physiology and Chronic Health Evaluation II score was 0.423. The random forest had the highest specificity and accuracy, as well as the greatest AUC, showing the best ability to predict in-hospital mortality. Conclusions: Compared with the conventional scoring system and the other five machine learning algorithms in this study, the random forest algorithm performed better in predicting in-hospital mortality for patients with cerebral hemorrhage in intensive care units; further research on the random forest algorithm is therefore warranted.
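An AUC can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. On that reading 0.5 is chance level, so the reported APACHE II AUC of 0.423 ranks patients slightly worse than chance. A minimal standard-library sketch of this pairwise definition follows; the function name is hypothetical, and tied scores count as half a win.

```python
def auc_pairwise(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    Compares every positive with every negative: O(n_pos * n_neg), fine for illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]  # 1 = died in hospital
print(auc_pairwise([0.9, 0.8, 0.3, 0.1], labels))  # perfect ranking: 1.0
print(auc_pairwise([0.1, 0.3, 0.8, 0.9], labels))  # reversed ranking: 0.0
```

For large cohorts, a rank-based (sorting) implementation gives the same value far more efficiently.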


2020 · Vol 10 (5) · pp. 998-1004
Author(s): Binhua Wang, Xiao Ma, Yifei Wang, Wei Dong, Chengyu Liu, ...

An improved bagging algorithm, combined with a resampling strategy, a neural network, and a support vector machine (SVM), is proposed for in-hospital mortality prediction from imbalanced data with a highly uneven ratio of positive to negative samples. The approach was compared with other machine learning algorithms, such as SVM, neural network, and gradient boosting decision tree (GBDT), to evaluate its effectiveness. The permutation importance algorithm was used to assess risk factors for patients with heart failure. Experimental validation was conducted on medical data from the Chinese PLA General Hospital comprising 207 positive and 5975 negative samples, on which the method achieved an area under the curve (AUC), sensitivity, and specificity of 0.850, 0.800, and 0.752, respectively. The top five risk factors identified were creatinine, serum albumin, lactate dehydrogenase, platelet count, and lymphocytes. These results suggest that the proposed method could be a valuable new tool for in-hospital mortality prediction from electronic health record data.
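Permutation importance scores a feature by how much a fitted model's performance drops when that feature's values are shuffled across patients. Shuffling breaks the feature's link to the outcome while leaving its distribution unchanged. The standard-library sketch below illustrates the idea. The toy model, the use of accuracy as the metric (rather than AUC), the data and the 1.5 threshold are all hypothetical and do not come from the study.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's 0/1 prediction matches the label."""
    return sum(1 for row, t in zip(X, y) if model(row) == t) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled, one at a time.

    `model` is any callable mapping a feature row (list) to a 0/1 label.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical rule: predicts death when feature 0 (e.g. creatinine) exceeds 1.5;
    # feature 1 is ignored, so its importance should come out as zero.
    return int(row[0] > 1.5)


X = [[0.8, 40.0], [1.0, 35.0], [2.1, 38.0], [2.6, 30.0], [0.9, 42.0], [3.0, 33.0]]
y = [0, 0, 1, 1, 0, 1]
print(permutation_importance(toy_model, X, y))
```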


2015 · Vol 23 (3) · pp. 553-561
Author(s): Xiongcai Cai, Oscar Perez-Concha, Enrico Coiera, Fernando Martin-Sanchez, Richard Day, ...

Objective To develop a predictive model for real-time prediction of length of stay, mortality, and readmission for hospitalized patients using electronic health records (EHRs). Materials and Methods A Bayesian Network model was built to estimate the probability of a hospitalized patient being “at home,” in the hospital, or dead for each of the next 7 days. The network uses patient-specific administrative and laboratory data and is updated each time a new pathology test result becomes available. Electronic health records of 32 634 patients admitted to a Sydney metropolitan hospital via the emergency department from July 2008 through December 2011 were used. The model was trained on data from the earlier years and tested on 2011 data. Results The model achieved an average daily accuracy of 80% and an area under the receiver operating characteristic curve (AUROC) of 0.82. Its predictive ability was highest within 24 hours of prediction (AUROC = 0.83) and decreased slightly over time. Death was the most predictable outcome, with an average daily accuracy of 93% and an AUROC of 0.84. Discussion We developed the first non–disease-specific model that simultaneously predicts remaining days of hospitalization, death, and readmission as part of the same outcome. By providing a daily probability for each outcome class over the coming days, we enable the visualization of future patient trajectories, among which it is possible to identify trajectories indicating expected discharge, expected continued hospitalization, expected death, and possible readmission. Conclusions Bayesian Networks can model EHRs to provide real-time forecasts of patient outcomes; these forecasts provide richer information than traditional independent point predictions of length of stay, death, or readmission and can therefore better support decision making.
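The Bayesian network itself is not reproduced here. The sketch below only shows one hypothetical way to turn the model's per-day class probabilities into a trajectory and to compute an "average daily accuracy". We assume that metric means per-day accuracy averaged over the forecast days; the abstract does not define it. The three state names mirror the abstract's outcome classes. Ties are broken by the order of STATES.

```python
STATES = ("home", "hospital", "dead")


def most_likely_trajectory(daily_probs):
    """Pick the most probable state for each future day.

    daily_probs: one dict per day mapping every state in STATES to a probability.
    """
    return [max(STATES, key=lambda s: day[s]) for day in daily_probs]


def average_daily_accuracy(predicted, actual):
    """Per-day accuracy across patients, averaged over days.

    predicted, actual: lists of per-patient trajectories, all of the same length.
    """
    n_days = len(actual[0])
    per_day = []
    for d in range(n_days):
        correct = sum(1 for p, a in zip(predicted, actual) if p[d] == a[d])
        per_day.append(correct / len(actual))
    return sum(per_day) / n_days


# Hypothetical 3-day forecast for one patient: the trajectory reads as expected discharge.
forecast = [
    {"home": 0.25, "hospital": 0.70, "dead": 0.05},
    {"home": 0.60, "hospital": 0.35, "dead": 0.05},
    {"home": 0.80, "hospital": 0.15, "dead": 0.05},
]
traj = most_likely_trajectory(forecast)
print(traj)  # ['hospital', 'home', 'home']
```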


2020
Author(s): Akhil Vaid, Suraj K Jaladanki, Jie Xu, Shelly Teng, Arvind Kumar, ...

BACKGROUND Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability. OBJECTIVE We aimed to use federated learning, a machine learning technique that avoids aggregating raw clinical data from multiple institutions in one place, to predict mortality within 7 days in hospitalized patients with COVID-19. METHODS Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization/least absolute shrinkage and selection operator (LASSO) and multilayer perceptron (MLP) models were trained using local data at each site. We developed a pooled model with combined data from all 5 sites and a federated model that shared only parameters with a central aggregator. RESULTS As measured by the area under the receiver operating characteristic curve, the federated LASSO model outperformed the local LASSO model at 3 hospitals, and the federated MLP model outperformed the local MLP model at all 5 hospitals. The pooled LASSO model outperformed the federated LASSO model at all hospitals, and the federated MLP model outperformed the pooled MLP model at 2 hospitals. CONCLUSIONS Federated learning on COVID-19 electronic health record data shows promise for developing robust predictive models without compromising patient privacy.
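In federated learning, raw records stay at each hospital and only model parameters travel to the aggregator. The sketch below illustrates this under two assumptions. First, aggregation is FedAvg-style, a sample-size-weighted average of locally trained parameters; the abstract does not state the aggregation rule. Second, the local LASSO model is an L1-penalised logistic regression fitted by proximal gradient descent. The site data, learning rate, penalty and number of rounds are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks weights toward zero (the L1 / LASSO penalty).
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def local_update(weights, bias, X, y, lr=0.1, lam=0.01, epochs=20):
    """Proximal gradient descent on L1-penalised logistic loss, using only this site's rows."""
    w, b, n = list(weights), bias, len(y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        # Gradient step, then soft-threshold the weights (the bias is not penalised).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def federated_round(global_w, global_b, sites):
    """One round: each site trains locally and returns parameters only, never raw rows."""
    updates = [local_update(global_w, global_b, X, y) + (len(y),) for X, y in sites]
    total = sum(n for _, _, n in updates)
    new_w = [sum(w[j] * n for w, _, n in updates) / total for j in range(len(global_w))]
    new_b = sum(b * n for _, b, n in updates) / total
    return new_w, new_b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Two hypothetical sites with one standardised feature each.
sites = [
    ([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1]),
    ([[-3.0], [-1.5], [1.5], [3.0]], [0, 0, 1, 1]),
]
w, b = [0.0], 0.0
for _ in range(10):
    w, b = federated_round(w, b, sites)
print(w, b, predict_proba(w, b, [2.0]))
```

A pooled model, by contrast, would run `local_update` once on the concatenated rows of every site, which is exactly the data sharing the federated setup avoids.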


2021 · pp. 1-13
Author(s): Tingyi Wanyan, Hossein Honarvar, Ariful Azad, Ying Ding, Benjamin S. Glicksberg

Abstract Computational prediction of in-hospital mortality in the intensive care unit can help clinical practitioners guide care and make early decisions about interventions. Because clinical data are complex and varied in their structure and components, continued innovation in modelling strategies is required to identify the architectures that best model outcomes. In this work, we train a Heterogeneous Graph Model (HGM) on Electronic Health Record (EHR) data and use the resulting embedding vector as additional input to a Convolutional Neural Network (CNN) model for predicting in-hospital mortality. We show that the additional information provided by including time as a vector in the embedding captures the relationships between medical concepts, lab tests, and diagnoses, which enhances predictive performance. We find that adding the HGM to a CNN model increases mortality prediction accuracy by up to 4%. This framework serves as a foundation for future experiments involving different EHR data types on important healthcare prediction tasks.
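The abstract does not describe the architecture in detail. As a rough, hypothetical sketch of the general idea of feeding a precomputed embedding into a CNN, the code below does the following: it runs 1-D convolution filters over a single lab-value time series, applies ReLU and global max pooling, concatenates the pooled features with a patient embedding vector (standing in for the HGM output), and passes the combined vector through a logistic output unit. All weights are untrained placeholders. The study's actual layers, and how time is encoded in its embedding, are not reproduced.

```python
import math


def conv1d_valid(series, kernel):
    """'Valid' 1-D cross-correlation, the operation a CNN convolution layer computes."""
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k)) for i in range(len(series) - k + 1)]


def cnn_features(series, kernels):
    """One feature per filter: ReLU activation followed by global max pooling over time."""
    if any(len(kern) > len(series) for kern in kernels):
        raise ValueError("each kernel must be no longer than the series")
    return [max(max(0.0, v) for v in conv1d_valid(series, kern)) for kern in kernels]


def predict_mortality(series, kernels, embedding, out_weights, out_bias):
    # Concatenate CNN features with the precomputed (hypothetical) graph embedding.
    features = cnn_features(series, kernels) + list(embedding)
    if len(features) != len(out_weights):
        raise ValueError("out_weights must match CNN features + embedding length")
    z = sum(w * f for w, f in zip(out_weights, features)) + out_bias
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder inputs: one lab series, two filters, a 3-dimensional embedding.
series = [1.0, 2.0, 3.0, 2.0, 1.0]
kernels = [[1.0, -1.0], [0.5, 0.5]]
embedding = [0.2, -0.1, 0.4]
out_weights = [0.3, 0.1, 0.5, -0.2, 0.4]
print(cnn_features(series, kernels))  # [1.0, 2.5]
print(predict_mortality(series, kernels, embedding, out_weights, out_bias=-1.0))
```

Concatenating before the output layer is the simplest way to let the classifier weigh the CNN's learned time-series features and the embedding side by side.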


2021 · Vol 14 (1)
Author(s): Zhixuan Zeng, Shuo Yao, Jianfei Zheng, Xun Gong

Abstract Background Early prediction of hospital mortality is crucial for ICU patients with sepsis. This study aimed to develop a novel blending machine learning (ML) model for hospital mortality prediction in ICU patients with sepsis. Methods Two ICU databases were used: the eICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III). All adult patients who fulfilled the Sepsis-3 criteria were identified. Samples from eICU-CRD constituted the training set and samples from MIMIC-III constituted the test set. A stepwise logistic regression model was used for predictor selection. A blending ML model integrating nine types of base ML models was developed for hospital mortality prediction in ICU patients with sepsis. Model performance was evaluated with various measures of discrimination and calibration. Results A total of 12,558 patients from eICU-CRD were included as the training set, and 12,095 patients from MIMIC-III were included as the test set. Both the training set and the test set had a hospital mortality of 17.9%. Maximum and minimum lactate, maximum and minimum albumin, minimum PaO2/FiO2, and age were important predictors identified by both the random forest and the extreme gradient boosting algorithms. On the test set, blending ML models based on the corresponding sets of predictors showed better discrimination than SAPS II (AUROC 0.806 vs. 0.771; AUPRC 0.515 vs. 0.429) and SOFA (AUROC 0.742 vs. 0.706; AUPRC 0.428 vs. 0.381). In addition, calibration curves showed that the blending ML models were better calibrated than SAPS II. Conclusions The blending ML model can efficiently integrate different types of base ML models, and it outperforms conventional severity scores in predicting hospital mortality among septic ICU patients.
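In blending, base models are fitted on one part of the training data. Their predicted probabilities on a held-out "blending" part then become the inputs of a second-level meta-model. The abstract does not name the meta-learner. The standard-library sketch below assumes a plain logistic regression trained by gradient descent, a common but not the only choice, and uses hypothetical outputs from two already-fitted base models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_preds, y, lr=1.0, epochs=1000):
    """Logistic regression on base-model probabilities from a held-out blending set.

    base_preds: one row per patient, one column per base model.
    """
    k, n = len(base_preds[0]), len(y)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for row, t in zip(base_preds, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - t
            for i, xi in enumerate(row):
                gw[i] += err * xi / n
            gb += err / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def blend_predict(w, b, row):
    """Blended mortality probability for one patient's vector of base-model outputs."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)


# Hypothetical blending set: base model A is informative, base model B always outputs 0.5.
blend_X = [[0.9, 0.5], [0.8, 0.5], [0.2, 0.5], [0.1, 0.5]]
blend_y = [1, 1, 0, 0]
w, b = fit_meta_learner(blend_X, blend_y)
print(w, b)
```

Training the meta-model on held-out predictions, rather than on the base models' own training-set outputs, keeps it from simply rewarding whichever base model overfits most.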


Author(s): Jeffrey G Klann, Griffin M Weber, Hossein Estiri, Bertrand Moal, Paul Avillach, ...

Abstract Introduction The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 through federated analyses of electronic health record (EHR) data. Objective We sought to develop and validate a computable phenotype for COVID-19 severity. Methods Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes and validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity with the 4CE phenotype at one site. Results The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely, by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared with an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies for ICU admission, with precision and recall as low as 49% compared with chart review. Discussion We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit to hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions. Conclusion We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
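Two of the reported quantities can be made concrete: precision and recall of a billing-code flag against chart review, and a sensitivity and specificity pooled across sites. The abstract does not say how pooling was done. The sketch below assumes the simplest option, summing per-site confusion counts before dividing, which weights larger sites more heavily. Meta-analytic pooling of per-site estimates is a common alternative. Function names and counts are hypothetical.

```python
def precision_recall(flagged, truth):
    """Precision and recall of a binary flag (e.g. an ICU billing code) against chart review."""
    tp = sum(1 for f, t in zip(flagged, truth) if f and t)
    fp = sum(1 for f, t in zip(flagged, truth) if f and not t)
    fn = sum(1 for f, t in zip(flagged, truth) if not f and t)
    return tp / (tp + fp), tp / (tp + fn)


def pooled_sens_spec(site_counts):
    """Pool per-site confusion counts by summing them before dividing.

    site_counts: list of (tp, fn, tn, fp) tuples, one per site.
    """
    tp = sum(c[0] for c in site_counts)
    fn = sum(c[1] for c in site_counts)
    tn = sum(c[2] for c in site_counts)
    fp = sum(c[3] for c in site_counts)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for two sites: (tp, fn, tn, fp).
print(pooled_sens_spec([(30, 10, 80, 20), (10, 10, 50, 10)]))
print(precision_recall([1, 1, 0, 1], [1, 0, 1, 1]))
```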


2020
Author(s): Jun Ke, Yiwei Chen, Xiaoping Wang, Zhiyong Wu, qiongyao Zhang, ...

Abstract Background The purpose of this study was to identify risk factors for in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods Data on ACS patients who presented with chest pain to the emergency department of Fujian Provincial Hospital from January 1, 2017 to March 31, 2020 were retrospectively collected. Univariate and multivariate logistic regression analyses were used to identify risk factors for in-hospital mortality in ACS patients. Traditional regression and machine learning algorithms were used to develop predictive models, and sensitivity, specificity, and the receiver operating characteristic curve were used to evaluate the performance of each model. Results A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age, levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), and high-density lipoprotein (HDL) cholesterol, and use of calcium channel blockers were independent predictors of in-hospital mortality. The areas under the receiver operating characteristic curve of the models developed with logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing most to the prediction performance of the GBDT and random forest models. Conclusions The predictive models developed with logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death in ACS patients.
Based on our findings, we recommend that clinicians focus on monitoring changes in NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.
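The abstract names the independent predictors from the multivariable logistic regression but not their effect sizes. In such models, each coefficient β is usually reported as an odds ratio exp(β), with a Wald 95% confidence interval exp(β ± 1.96·SE). The coefficient and standard error below are hypothetical, not values from the study.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a standardised NT-proBNP level (not from the study).
or_, lower, upper = odds_ratio_ci(beta=0.693, se=0.2)
print(f"OR {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A predictor is conventionally called significant at the 5% level when this interval excludes 1. With only 1.75% mortality, a model that predicts survival for everyone would already be 98.25% accurate, so discrimination measures such as the AUROC are more informative than accuracy in this cohort.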


Author(s): Emily Kogan, Kathryn Twyman, Jesse Heap, Dejan Milentijevic, Jennifer H. Lin, ...

Abstract Background Stroke severity is an important predictor of patient outcomes and is commonly measured with National Institutes of Health Stroke Scale (NIHSS) scores. Because these scores are often recorded as free text in physician reports, structured real-world evidence databases seldom include severity. The aim of this study was to use machine learning models to impute NIHSS scores for all patients with newly diagnosed stroke from multi-institution electronic health record (EHR) data. Methods NIHSS scores available in the Optum© de-identified Integrated Claims-Clinical dataset were extracted from physician notes using natural language processing (NLP) methods. The cohort analyzed in the study consisted of 7149 patients with an inpatient or emergency room diagnosis of ischemic stroke, hemorrhagic stroke, or transient ischemic attack and a corresponding NLP-extracted NIHSS score. A subset of these patients (n = 1033, 14%) was held out for independent validation of model performance, and the remaining patients (n = 6116, 86%) were used to train the model. Several machine learning models were evaluated, and their parameters were optimized using cross-validation on the training set. The best-performing model, a random forest, was then evaluated on the holdout set. Results Using machine learning, we identified the main factors in EHR data for assessing stroke severity, including death within the same month as the stroke, length of hospital stay following the stroke, aphagia/dysphagia diagnosis, hemiplegia diagnosis, and whether the patient was discharged to home or self-care. Comparing the imputed NIHSS scores with the NLP-extracted NIHSS scores on the holdout set yielded an R2 (coefficient of determination) of 0.57, an R (Pearson correlation coefficient) of 0.76, and a root-mean-squared error of 4.5.
Conclusions Machine learning models built on EHR data can be used to derive proxies for stroke severity. This enables severity to be incorporated into studies of stroke patient outcomes that use administrative and EHR databases.
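The three agreement measures capture different things. The coefficient of determination R2 is not in general the square of the Pearson R. Here 0.76² ≈ 0.58 is close to the reported 0.57, but the two diverge when imputed scores are systematically shifted. A minimal standard-library sketch with hypothetical scores:

```python
import math


def regression_metrics(y_true, y_pred):
    """Coefficient of determination (R^2), Pearson r and RMSE of imputed vs reference scores."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / math.sqrt(ss_tot * var_p)
    rmse = math.sqrt(ss_res / n)
    return r2, r, rmse


# A constant +1 offset keeps r at 1.0 but lowers R^2 (to about 0.2) and adds RMSE.
print(regression_metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```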

