Inter-operator variability of machine learning researchers predicting all-cause mortality in patients admitted to the intensive care unit

2021
Vol 2 (4)
Author(s):
Y Jones
J Cleland
C Li
P Pellicori
J Friday

Abstract Background The number of publications using machine learning (ML) to predict cardiovascular outcomes and identify clusters of patients at greater risk has risen dramatically in recent years. However, research papers which use ML often fail to provide sufficient information about their algorithms to enable results to be replicated by others in the same or different datasets. Aim To test the reproducibility of results from ML algorithms given three different levels of information commonly found in publications: model type alone, a description of the model, and the complete algorithm. Methods MIMIC-III is a healthcare dataset comprising detailed information from over 60,000 intensive care unit (ICU) admissions from the Beth Israel Deaconess Medical Centre between 2001 and 2012. Access is available to everyone pending approval and completion of a short training course. Using this dataset, three models for predicting all-cause in-hospital mortality were created: two by a PhD student working in ML and one taken from an existing research paper which used the same dataset and provided complete model information. A second researcher (a PhD student in ML and cardiology) was given the same dataset and tasked with reproducing their results. Initially, this second researcher was told only what type of model was created in each case, then given a brief description of the algorithms, and finally provided with the complete algorithms from each participant. In all three scenarios, recreated models were compared to the original models using the Area Under the Receiver Operating Characteristic Curve (AUC). Results After excluding those younger than 18 years and admissions with missing or invalid entries, 21,139 ICU admissions remained from 18,094 patients between 2001 and 2012, including 2,797 in-hospital deaths. Three models were produced: two Recurrent Neural Networks (RNNs), which differed substantially in internal weights and variables, and a Boosted Tree Classifier (BTC). The AUC of the first reproduced RNN matched that of the original RNN (Figure 1); however, the second RNN and the BTC could not be reproduced given model type alone. As more information was provided about these algorithms, the results from the reproduced models matched the original results more closely. Conclusions To create clinically useful ML tools with results that are reproducible and consistent, it is vital that researchers share enough detail about their models. Model type alone is not enough to guarantee reproducibility. Although some models can be recreated with limited information, this is not always the case, and the best results are obtained when the complete algorithm is shared. These findings are highly relevant to the application of ML in clinical practice. Funding Acknowledgement Type of funding sources: None.
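A minimal sketch of the kind of comparison this abstract describes: an "original" and a "reproduced" classifier are scored on the same held-out data and their AUCs compared. The dataset, features, and hyperparameters below are synthetic placeholders, not the authors' models.

```python
# Sketch: comparing an original and a reproduced classifier by AUC on shared held-out data.
# Synthetic data stands in for MIMIC-III; both models are illustrative, not the authors'.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, weights=[0.87], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# "Original" model: hyperparameters as they might appear in a full published algorithm.
original = GradientBoostingClassifier(n_estimators=200, max_depth=3, random_state=0)
original.fit(X_train, y_train)

# "Reproduced" model: rebuilt from a brief description only, so some settings differ.
reproduced = GradientBoostingClassifier(n_estimators=100, max_depth=2, random_state=42)
reproduced.fit(X_train, y_train)

auc_orig = roc_auc_score(y_test, original.predict_proba(X_test)[:, 1])
auc_repr = roc_auc_score(y_test, reproduced.predict_proba(X_test)[:, 1])
print(f"original AUC={auc_orig:.3f}  reproduced AUC={auc_repr:.3f}  gap={abs(auc_orig - auc_repr):.3f}")
```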

2021
Author(s):
Stefan Hegselmann
Christian Ertmer
Thomas Volkert
Antje Gottschalk
Martin Dugas
...  

Intensive care unit readmissions are associated with mortality and adverse outcomes. Machine learning could help to identify patients at risk and thereby improve discharge decisions. However, many models are black boxes, so dangerous properties might remain unnoticed. In this study, an inherently interpretable model for 3-day ICU readmission prediction was developed. We used a retrospective cohort of 15,589 ICU stays and 169 variables collected between 2006 and 2019. A team of doctors inspected the model, checked the plausibility of each component, and removed problematic parts. Qualitative feedback revealed several challenges for interpretable machine learning in healthcare. The resulting model used 67 features and showed an area under the precision-recall curve of 0.119 ± 0.020 and an area under the receiver operating characteristic curve of 0.680 ± 0.025. This is on par with state-of-the-art gradient boosting machines and outperforms the Simplified Acute Physiology Score II. External validation with the Medical Information Mart for Intensive Care database version IV confirmed our findings. Hence, a machine learning model for readmission prediction with a high level of human control is feasible without sacrificing performance.
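The abstract does not specify the interpretable model class, so the sketch below uses logistic regression as a stand-in for "inherently interpretable" and compares it to a gradient boosting machine on AUPRC and AUROC for a rare-event task; data, feature count, and prevalence are synthetic assumptions.

```python
# Sketch: interpretable stand-in (logistic regression) vs gradient boosting on a rare outcome,
# evaluated by AUPRC and AUROC. Synthetic data only; not the authors' model or cohort.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=15000, n_features=67, weights=[0.96], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "interpretable (logistic regression)": make_pipeline(StandardScaler(),
                                                          LogisticRegression(max_iter=1000)),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]
    print(f"{name}: AUPRC={average_precision_score(y_te, p):.3f}  "
          f"AUROC={roc_auc_score(y_te, p):.3f}")
```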


2021
Vol 11 (1)
Author(s):
Bongjin Lee
Kyunghoon Kim
Hyejin Hwang
You Sun Kim
Eun Hee Chung
...  

Abstract The aim of this study was to develop a predictive model of pediatric mortality in the early stages of intensive care unit (ICU) admission using machine learning. Patients less than 18 years old who were admitted to ICUs at four tertiary referral hospitals were enrolled. Three hospitals were designated as the derivation cohort for machine learning model development and internal validation, and the other hospital was designated as the validation cohort for external validation. We developed a random forest (RF) model that predicts pediatric mortality within 72 h of ICU admission, evaluated its performance, and compared it with the Pediatric Index of Mortality 3 (PIM 3). The area under the receiver operating characteristic curve (AUROC) of the RF model was 0.942 (95% confidence interval [CI] = 0.912–0.972) in the derivation cohort and 0.906 (95% CI = 0.900–0.912) in the validation cohort. In contrast, the AUROC of PIM 3 was 0.892 (95% CI = 0.878–0.906) in the derivation cohort and 0.845 (95% CI = 0.817–0.873) in the validation cohort. The RF model in our study showed improved predictive performance in both internal and external validation and was superior to PIM 3.
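As a hedged illustration of the validation style reported here (a random forest scored by AUROC with a 95% confidence interval), the sketch below trains a forest on synthetic data and bootstraps the AUROC; the cohort, features, and forest size are assumptions.

```python
# Sketch: random forest mortality classifier with a bootstrap 95% CI around its AUROC.
# Data are synthetic placeholders, not the pediatric ICU cohort.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=8000, n_features=30, weights=[0.95], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=300, random_state=1).fit(X_tr, y_tr)
scores = rf.predict_proba(X_te)[:, 1]

rng = np.random.default_rng(1)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_te), len(y_te))
    if len(np.unique(y_te[idx])) < 2:   # skip resamples missing one of the classes
        continue
    boot.append(roc_auc_score(y_te[idx], scores[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUROC={roc_auc_score(y_te, scores):.3f} (95% CI {lo:.3f}-{hi:.3f})")
```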


2020
Vol 7 (Supplement_1)
pp. S162-S163
Author(s):
Guillermo Rodriguez-Nava
Daniela Patricia Trelles-Garcia
Maria Adriana Yanez-Bello
Chul Won Chung
Sana Chaudry
...  

Abstract Background As the ongoing COVID-19 pandemic develops, there is a need for prediction rules to guide clinical decisions. Previous reports have identified risk factors using statistical inference models. The primary goal of those models is to characterize the relationship between variables and outcomes, not to make predictions. In contrast, the primary purpose of machine learning is to obtain a model that can make repeatable predictions. The objective of this study is to develop decision rules tailored to our patient population to predict ICU admission and death in patients with COVID-19. Methods We used a de-identified dataset of hospitalized adults with COVID-19 admitted to our community hospital between March 2020 and June 2020. We used a Random Forest algorithm to build the prediction models for ICU admission and death. Random Forest is one of the most powerful machine learning algorithms; it aggregates the votes of many randomly constructed decision trees. Results 313 patients were included; 237 patients were used to train each model, 26 were used for testing, and 50 for validation. A total of 16 variables, selected according to their availability in the Emergency Department, were fit into the models. For the survival model, the combination of age >57 years, the presence of altered mental status, procalcitonin ≥3.0 ng/mL, a respiratory rate >22, and a blood urea nitrogen >32 mg/dL resulted in a decision rule with an accuracy of 98.7% in the training model, 73.1% in the testing model, and 70% in the validation model (Table 1, Figure 1). For the ICU admission model, the combination of age <82 years, a systolic blood pressure of ≤94 mm Hg, oxygen saturation of ≤93%, a lactate dehydrogenase >591 IU/L, and a lactic acid >1.5 mmol/L resulted in a decision rule with an accuracy of 99.6% in the training model, 80.8% in the testing model, and 82% in the validation model (Table 2, Figure 2). Conclusion We created decision rules using machine learning to predict ICU admission or death in patients with COVID-19. Although there are variables previously described with statistical inference, these decision rules are customized to our patient population; furthermore, we can continue to train the models by fitting new patient data to create even more accurate prediction rules. Table 1. Measures of Performance in Predicting Inpatient Mortality. Figure 1. Receiver Operating Characteristic (ROC) Curve for Inpatient Mortality. Table 2. Measures of Performance in Predicting Intensive Care Unit Admission. Figure 2. Receiver Operating Characteristic (ROC) Curve for Intensive Care Unit Admission. Disclosures All Authors: No reported disclosures
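The abstract does not state how the thresholds in each decision rule are combined, so the sketch below assumes an "all criteria met" rule using the published mortality cut-offs and shows how such a rule's accuracy would be checked across train, test, and validation splits; the patient data are randomly generated placeholders.

```python
# Sketch: evaluating a threshold-based decision rule on train/test/validation splits.
# The thresholds mirror the mortality rule above; combining them with AND is an assumption,
# and all patient values and outcomes are random placeholders.
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 313
df = pd.DataFrame({
    "age": rng.normal(60, 15, n),
    "altered_mental_status": rng.integers(0, 2, n),
    "procalcitonin": rng.gamma(1.5, 2.0, n),
    "respiratory_rate": rng.normal(20, 5, n),
    "bun": rng.normal(25, 12, n),
    "died": rng.integers(0, 2, n),                                   # placeholder outcome
    "split": rng.choice(["train", "test", "validation"], n, p=[0.76, 0.08, 0.16]),
})

def rule(row):
    # assumption: predict death only when every criterion is met
    return int(row.age > 57 and row.altered_mental_status == 1 and row.procalcitonin >= 3.0
               and row.respiratory_rate > 22 and row.bun > 32)

df["pred"] = df.apply(rule, axis=1)
for split, part in df.groupby("split"):
    print(split, f"accuracy={accuracy_score(part.died, part.pred):.2f}")
```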


2021
Vol 11 (1)
Author(s):
Eyal Klang
Benjamin R. Kummer
Neha S. Dangayach
Amy Zhong
M. Arash Kia
...  

Abstract Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, and bed movement record data, and free text data from triage notes, were extracted from our institutional data warehouse. A machine learning model was trained to predict likelihood of NSICU admission at 30 min from arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256) and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only with text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival. This may potentially improve ED and NSICU resource allocation. Further studies should validate our findings.
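One common way to combine free-text notes with structured fields in a single classifier is a column transformer that vectorizes the text and passes the numeric columns through; the sketch below shows that pattern and how sensitivity at 99% specificity can be read off the ROC curve. The feature names, triage notes, and model are illustrative assumptions, not the authors' pipeline.

```python
# Sketch: combining free-text triage notes (TF-IDF) with structured features, then reporting
# AUC and sensitivity at a 1:100 false positive rate. All data below are simulated.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "age": rng.normal(55, 20, n),
    "systolic_bp": rng.normal(130, 20, n),
    "note": rng.choice(["sudden weakness and slurred speech", "abdominal pain",
                        "worst headache of life", "ankle injury"], n),
})
# placeholder outcome loosely tied to the note text
y = (df.note.str.contains("weakness|headache") & (rng.random(n) < 0.1)).astype(int)

pre = ColumnTransformer([
    ("text", TfidfVectorizer(), "note"),            # free-text triage note
    ("num", "passthrough", ["age", "systolic_bp"]),  # structured fields
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])

X_tr, X_te, y_tr, y_te = train_test_split(df, y, stratify=y, random_state=0)
model.fit(X_tr, y_tr)
p = model.predict_proba(X_te)[:, 1]
fpr, tpr, _ = roc_curve(y_te, p)
sens_at_99spec = tpr[fpr <= 0.01].max()             # sensitivity at 99% specificity
print(f"AUC={roc_auc_score(y_te, p):.3f}  sensitivity@99%spec={sens_at_99spec:.2f}")
```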


2016
Vol 45 (6)
pp. 241
Author(s):
Mia R A
Risa Etika
Agus Harianto
Fatimah Indarso
Sylviati M Damanik

Background Scoring systems which quantify initial risks have an important role in aiding execution of optimum health services by predicting morbidity and mortality. One of these is the score for neonatal acute physiology perinatal extension (SNAPPE), developed by Richardson in 1993 and simplified in 2001. It is derived from 6 variables from the physical and laboratory observation within the first 12 hours of admission, and 3 variables of perinatal risks of mortality. Objectives To assess the validity of SNAPPE II in predicting mortality at the neonatal intensive care unit (NICU), Soetomo Hospital, Surabaya. The study was also undertaken to derive the best cut-off score for predicting mortality. Methods Eighty newborns were admitted during a four-month period and were evaluated with the investigations required for the specifications of SNAPPE II. Neonates admitted >48 hours of age or after having been discharged, those who were moved to lower newborn care <24 hours, and those who were discharged on request were excluded. Receiver operating characteristic (ROC) curves were constructed to derive the best cut-off score, with the Kappa and McNemar tests. Results Twenty-eight (35%) neonates died during the study, 22 (82%) of them within the first six days. The mean SNAPPE II score was 26.3±19.84 (range 0-81). The SNAPPE II score of the nonsurvivors was significantly higher than that of the survivors (42.75±18.59 vs 17.4±14.05; P=0.0001). SNAPPE II had a good performance in predicting overall mortality and first-6-days mortality, with areas under the ROC curve of 0.863 and 0.889. The best cut-off score for predicting mortality was 30, with sensitivity 81.8%, specificity 76.9%, positive predictive value 60.0% and negative predictive value 90.0%. Conclusions SNAPPE II is a measurement of illness severity which correlates well with neonatal mortality at the NICU, Soetomo Hospital. A score of more than 30 is associated with higher mortality.
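A brief sketch of the evaluation described here: a continuous severity score is dichotomised at a cut-off (30, as above) and sensitivity, specificity, PPV, and NPV are computed from the resulting 2x2 table, alongside the ROC AUC of the raw score. The scores and outcomes below are random placeholders shaped only loosely like the reported summary statistics.

```python
# Sketch: operating characteristics of a severity score dichotomised at a cut-off of 30,
# plus the AUC of the raw score. Simulated values, not the study data.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
died = rng.integers(0, 2, 80)
score = np.where(died == 1, rng.normal(43, 19, 80), rng.normal(17, 14, 80)).clip(0, 81)

pred_death = (score > 30).astype(int)                 # apply the cut-off
tn, fp, fn, tp = confusion_matrix(died, pred_death).ravel()
print(f"AUC={roc_auc_score(died, score):.3f}")
print(f"sensitivity={tp/(tp+fn):.1%}  specificity={tn/(tn+fp):.1%}  "
      f"PPV={tp/(tp+fp):.1%}  NPV={tn/(tn+fn):.1%}")
```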


2019
Author(s):
Longxiang Su
Chun Liu
Dongkai Li
Jie He
Fanglan Zheng
...  

BACKGROUND Heparin is one of the most commonly used medications in intensive care units. In clinical practice, the use of a weight-based heparin dosing nomogram is standard practice for the treatment of thrombosis. Recently, machine learning techniques have dramatically improved the ability of computers to provide clinical decision support and have allowed for the possibility of computer generated, algorithm-based heparin dosing recommendations. OBJECTIVE The objective of this study was to predict the effects of heparin treatment using machine learning methods and to optimize heparin dosing in intensive care units based on the predictions. Patient state predictions were based upon activated partial thromboplastin time in 3 different ranges: subtherapeutic, normal therapeutic, and supratherapeutic. METHODS Retrospective data from 2 intensive care unit research databases (Multiparameter Intelligent Monitoring in Intensive Care III, MIMIC-III; e–Intensive Care Unit Collaborative Research Database, eICU) were used for the analysis. Candidate machine learning models (random forest, support vector machine, adaptive boosting, extreme gradient boosting, and shallow neural network) were compared in 3 patient groups to evaluate the classification performance for predicting the subtherapeutic, normal therapeutic, and supratherapeutic patient states. The model results were evaluated using precision, recall, F1 score, and accuracy. RESULTS Data from the MIMIC-III database (n=2789 patients) and from the eICU database (n=575 patients) were used. In 3-class classification, the shallow neural network algorithm performed best (F1 scores of 87.26%, 85.98%, and 87.55% for data sets 1, 2, and 3, respectively). The shallow neural network algorithm also achieved the highest F1 scores within each patient state group: subtherapeutic (data set 1: 79.35%; data set 2: 83.67%; data set 3: 83.33%), normal therapeutic (data set 1: 93.15%; data set 2: 87.76%; data set 3: 84.62%), and supratherapeutic (data set 1: 88.00%; data set 2: 86.54%; data set 3: 95.45%). CONCLUSIONS The most appropriate model for predicting the effects of heparin treatment was found by comparing multiple machine learning models and can be used to further guide optimal heparin dosing. Using multicenter intensive care unit data, our study demonstrates the feasibility of predicting the outcomes of heparin treatment using data-driven methods and, thus, how machine learning–based models can be used to optimize and personalize heparin dosing to improve patient safety. Manual analysis and validation suggested that the model outperformed standard-practice heparin dosing.
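A minimal sketch of the 3-class setup described above (subtherapeutic / therapeutic / supratherapeutic aPTT states) using a shallow neural network and per-class precision, recall, and F1; the features, labels, and network size are assumptions on synthetic data.

```python
# Sketch: shallow neural network for a 3-class patient state problem, with per-class metrics.
# Synthetic features and labels; the hidden layer size is an illustrative assumption.
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=3000, n_features=25, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0))
clf.fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["subtherapeutic", "therapeutic", "supratherapeutic"]))
```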


BMJ Open
2021
Vol 11 (9)
pp. e051468
Author(s):
David van Klaveren
Alexandros Rekkas
Jelmer Alsma
Rob J C G Verdonschot
Dick T J J Koning
...  

Objectives Develop simple and valid models for predicting mortality and need for intensive care unit (ICU) admission in patients who present at the emergency department (ED) with suspected COVID-19. Design Retrospective. Setting Secondary care in four large Dutch hospitals. Participants Patients who presented at the ED and were admitted to hospital with suspected COVID-19. We used 5831 first-wave patients who presented between March and August 2020 for model development and 3252 second-wave patients who presented between September and December 2020 for model validation. Outcome measures We developed separate logistic regression models for in-hospital death and for need for ICU admission, both within 28 days after hospital admission. Based on prior literature, we considered quickly and objectively obtainable patient characteristics, vital parameters and blood test values as predictors. We assessed model performance by the area under the receiver operating characteristic curve (AUC) and by calibration plots. Results Of 5831 first-wave patients, 629 (10.8%) died within 28 days after admission. ICU admission was fully recorded for 2633 first-wave patients in 2 hospitals, with 214 (8.1%) ICU admissions within 28 days. A simple model, COVID outcome prediction in the emergency department (COPE), with age, respiratory rate, C reactive protein, lactate dehydrogenase, albumin and urea captured most of the ability to predict death. COPE was well calibrated and showed good discrimination for mortality in second-wave patients (AUC in four hospitals: 0.82 (95% CI 0.78 to 0.86); 0.82 (95% CI 0.74 to 0.90); 0.79 (95% CI 0.70 to 0.88); 0.83 (95% CI 0.79 to 0.86)). COPE was also able to identify patients at high risk of needing ICU admission in second-wave patients (AUC in two hospitals: 0.84 (95% CI 0.78 to 0.90); 0.81 (95% CI 0.66 to 0.95)). Conclusions COPE is a simple tool that is well able to predict mortality and need for ICU admission in patients who present to the ED with suspected COVID-19 and may help patients and doctors in decision making.
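The sketch below illustrates the COPE-style workflow (a six-predictor logistic regression developed on a first wave and validated on a second wave by AUC and calibration); the predictor names come from the abstract, but the simulated data-generating process, coefficients, and resulting performance are assumptions for illustration only.

```python
# Sketch: a simple logistic regression mortality model validated on a later wave.
# Simulated data only; the true COPE coefficients and cohorts are not reproduced here.
import numpy as np
import pandas as pd
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def simulate(n):
    X = pd.DataFrame({
        "age": rng.normal(65, 15, n), "respiratory_rate": rng.normal(22, 6, n),
        "crp": rng.gamma(2, 40, n), "ldh": rng.normal(350, 120, n),
        "albumin": rng.normal(38, 5, n), "urea": rng.gamma(2, 4, n),
    })
    logit = -7 + 0.05 * X.age + 0.05 * X.respiratory_rate + 0.003 * X.crp  # assumed process
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y

X_wave1, y_wave1 = simulate(5831)   # development (first wave)
X_wave2, y_wave2 = simulate(3252)   # validation (second wave)

cope = LogisticRegression(max_iter=1000).fit(X_wave1, y_wave1)
p = cope.predict_proba(X_wave2)[:, 1]
print(f"validation AUC={roc_auc_score(y_wave2, p):.2f}")
obs, pred = calibration_curve(y_wave2, p, n_bins=10)   # observed vs predicted risk per bin
print(np.round(np.c_[pred, obs], 3))
```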


Author(s):  
Shifa Nismath
Suchetha S. Rao
B. S. Baliga
Vaman Kulkarni
Gayatri M. Rao

Abstract Background Predicting morbidity and mortality in a pediatric intensive care unit (PICU) is of extreme importance for making precise decisions and achieving better outcomes. Aim We compared the urine albumin creatinine ratio (ACR) with an established PICU score, the pediatric index of mortality 2 (PIM 2), for predicting PICU outcomes. Methods This cross-sectional study enrolled 67 patients admitted to the PICU with systemic inflammatory response syndrome. Urine ACR was estimated on admission, and the PIM 2 score was calculated. ACR was compared with PIM 2 for the PICU outcome measures: the need for inotropes, development of multiple organ dysfunction syndrome (MODS), duration of PICU stay, and survival. Results Microalbuminuria was found in 77.6% of patients, with a median ACR of 80 mg/g. ACR showed a significant association with the need for inotropes (p < 0.001) and MODS (p = 0.001), and a significant correlation with the duration of PICU stay (p 0.001, rho = 0.361). The area under the receiver operating characteristic curve for ACR (0.798) was comparable to that of PIM 2 (0.896). The cutoff value of ACR derived to predict mortality was 110 mg/g. The study subjects were divided into 2 groups: below the cutoff and above the cutoff. Differences in the outcome variables (inotrope use, MODS, mortality, and PICU stay) between these subgroups were statistically significant. Conclusion ACR is a good predictor of PICU outcomes and is comparable to PIM 2 for mortality prediction.
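A short sketch of the two comparisons made here: the AUCs of a single biomarker (urine ACR) and an existing score (PIM 2) for mortality, and the Spearman correlation between the biomarker and length of stay. All values below are simulated placeholders, not the 67-patient cohort.

```python
# Sketch: AUC comparison of a biomarker vs an established score, plus a Spearman correlation
# with length of stay. Simulated data only.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 67
mortality = rng.integers(0, 2, n)
acr = np.where(mortality == 1, rng.lognormal(5.0, 0.8, n), rng.lognormal(4.0, 0.8, n))
pim2 = np.where(mortality == 1, rng.normal(0.3, 0.15, n), rng.normal(0.1, 0.1, n)).clip(0, 1)
picu_stay = np.maximum(1, 3 + 0.02 * acr + rng.normal(0, 3, n)).round()

print(f"AUC(ACR)={roc_auc_score(mortality, acr):.3f}  "
      f"AUC(PIM 2)={roc_auc_score(mortality, pim2):.3f}")
rho, p = spearmanr(acr, picu_stay)
print(f"Spearman rho (ACR vs PICU stay)={rho:.3f}, p={p:.3g}")
```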

