Investigation of Machine Learning Models and Different Feature Sets for the Efficiency of Early Sepsis Prediction from Highly Unbalanced Data

Author(s):  
Vytautas Abromavičius ◽  
Darius Plonis ◽  
Deividas Tarasevičius ◽  
Artūras Serackis

The presented research addresses the problem of early sepsis detection for patients in the Intensive Care Unit (ICU). The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data. A labeled dataset of clinical records for training and verifying the algorithms was provided by the challenge organizers. However, the relatively small number of records with sepsis, as defined by the Sepsis-3 clinical criteria, led to a highly unbalanced dataset (only 2% of records carry a sepsis label). Such strongly unbalanced data pose a major challenge for machine learning model training and are not suitable for training classical classifiers. To address these issues, various models were investigated, and a solution combining feature selection with data balancing techniques is proposed in this paper. In addition, several performance metrics were investigated. The results show that, for successful prediction, a particular model, with fewer or more predictors depending on the patient's length of stay in the ICU, should be applied.
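The abstract does not name the balancing technique it uses. As an illustration only, random undersampling of the majority class is one common way to rebalance a set in which about 2% of records are positive. The helper below (`undersample`, with an assumed `ratio` parameter for the target negative-to-positive ratio) is hypothetical and is not the authors' method.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def undersample(records: Sequence[T], labels: Sequence[int],
                ratio: float = 1.0, seed: int = 0) -> Tuple[List[T], List[int]]:
    """Randomly drop majority-class (label 0) records until negatives
    number about `ratio` times the positives (label 1)."""
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))  # keep original record order
    return [records[i] for i in kept], [labels[i] for i in kept]
```

With 100 records of which 2 are positive, `undersample(records, labels)` returns 4 records (2 positive, 2 negative). Undersampling discards most negative records, so class weighting is a common alternative when data are scarce.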

Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1133
Author(s):  
Vytautas Abromavičius ◽  
Darius Plonis ◽  
Deividas Tarasevičius ◽  
Artūras Serackis

The presented research addresses the problem of early sepsis detection for patients in the Intensive Care Unit. The PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data. A labeled dataset of clinical records for training and verifying the algorithms was provided by the challenge organizers. However, the relatively small number of records with sepsis, as defined by the Sepsis-3 clinical criteria, led to a highly unbalanced dataset (only 2% of records carry a sepsis label). Such strongly unbalanced data pose a major challenge for machine learning model training and are not suitable for training classical classifiers. To address these issues, a method that takes into account the amount of time patients spent in the intensive care unit (ICU) was proposed. The proposed method uses two separate ensemble models: one trained on records of patients who spent under 56 h in the ICU, and another for patients who stayed longer than 56 h. A solution including feature selection and weighting-based training on imbalanced data is proposed in this paper. In addition, several performance metrics were investigated. The results show that, for successful prediction, a particular model, with fewer or more predictors depending on the patient's length of stay in the ICU, should be applied.
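The two ideas in this abstract, weighting classes during training and routing each record to a short-stay or long-stay model at 56 h, can be sketched as follows. The inverse-frequency weight formula is an assumption (it matches scikit-learn's `"balanced"` mode, n / (k * n_c)); the abstract does not state which weighting was used. The names `class_weights` and `predict` are hypothetical, the two models are stand-in callables, and sending a stay of exactly 56 h to the long-stay model is an assumed tie-break.

```python
from collections import Counter
from typing import Any, Callable, Dict, Hashable, Sequence

LOS_THRESHOLD_H = 56.0  # ICU length-of-stay split reported in the abstract

def class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights w_c = n / (k * n_c), so that every class
    carries the same total weight across the training set."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def predict(hours_in_icu: float, features: Any,
            short_stay_model: Callable[[Any], Any],
            long_stay_model: Callable[[Any], Any]) -> Any:
    """Route a record to the model trained on short or long ICU stays.
    Assumption: a stay of exactly 56 h goes to the long-stay model."""
    if hours_in_icu < LOS_THRESHOLD_H:
        return short_stay_model(features)
    return long_stay_model(features)
```

For a 2% positive rate (98 negatives, 2 positives), `class_weights` gives each positive a weight of 25.0 and each negative about 0.51, so both classes sum to 50.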


2018 ◽  
Vol 39 (3) ◽  
pp. 035004 ◽  
Author(s):  
Jooyoung Oh ◽  
Dongrae Cho ◽  
Jaesub Park ◽  
Se Hee Na ◽  
Jongin Kim ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bongjin Lee ◽  
Kyunghoon Kim ◽  
Hyejin Hwang ◽  
You Sun Kim ◽  
Eun Hee Chung ◽  
...  

The aim of this study was to develop a predictive model of pediatric mortality in the early stages of intensive care unit (ICU) admission using machine learning. Patients younger than 18 years who were admitted to ICUs at four tertiary referral hospitals were enrolled. Three hospitals were designated as the derivation cohort for machine learning model development and internal validation, and the remaining hospital was designated as the validation cohort for external validation. We developed a random forest (RF) model that predicts pediatric mortality within 72 h of ICU admission, evaluated its performance, and compared it with the Pediatric Index of Mortality 3 (PIM 3). The area under the receiver operating characteristic curve (AUROC) of the RF model was 0.942 (95% confidence interval [CI] = 0.912–0.972) in the derivation cohort and 0.906 (95% CI = 0.900–0.912) in the validation cohort. In contrast, the AUROC of PIM 3 was 0.892 (95% CI = 0.878–0.906) in the derivation cohort and 0.845 (95% CI = 0.817–0.873) in the validation cohort. The RF model showed better predictive performance than PIM 3 in both internal and external validation.
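An AUROC such as 0.942 can be read as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name `auroc` is illustrative, and ties count as one half. It is quadratic in the number of patients, whereas library implementations typically use a rank-based formula. Confidence intervals like those reported require an additional procedure (for example, bootstrapping), which is not shown here.

```python
from typing import Sequence

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2); equal to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])` returns 0.75, because three of the four positive/negative pairs are ranked correctly.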


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S162-S163
Author(s):  
Guillermo Rodriguez-Nava ◽  
Daniela Patricia Trelles-Garcia ◽  
Maria Adriana Yanez-Bello ◽  
Chul Won Chung ◽  
Sana Chaudry ◽  
...  

Background As the ongoing COVID-19 pandemic develops, there is a need for prediction rules to guide clinical decisions. Previous reports have identified risk factors using statistical inference models. The primary goal of these models is to characterize the relationship between variables and outcomes, not to make predictions. In contrast, the primary purpose of machine learning is to obtain a model that makes repeatable predictions. The objective of this study was to develop decision rules tailored to our patient population to predict ICU admission and death in patients with COVID-19. Methods We used a de-identified dataset of hospitalized adults with COVID-19 admitted to our community hospital between March 2020 and June 2020. We used a Random Forest algorithm to build the prediction models for ICU admission and death. Random Forest is an ensemble machine learning algorithm that combines the decisions of multiple randomly created decision trees. Results A total of 313 patients were included; 237 patients were used to train each model, 26 for testing, and 50 for validation. Sixteen variables, selected according to their availability in the Emergency Department, were fitted into the models. For the survival model, the combination of age >57 years, altered mental status, procalcitonin ≥3.0 ng/mL, respiratory rate >22, and blood urea nitrogen >32 mg/dL yielded a decision rule with an accuracy of 98.7% in the training set, 73.1% in the testing set, and 70% in the validation set (Table 1, Figure 1). For the ICU admission model, the combination of age <82 years, systolic blood pressure ≤94 mm Hg, oxygen saturation ≤93%, lactate dehydrogenase >591 IU/L, and lactic acid >1.5 mmol/L yielded a decision rule with an accuracy of 99.6% in the training set, 80.8% in the testing set, and 82% in the validation set (Table 2, Figure 2). Conclusion We created decision rules using machine learning to predict ICU admission or death in patients with COVID-19. Although some of these variables have previously been described with statistical inference, these decision rules are customized to our patient population; furthermore, the models can be retrained as data from new patients become available, which may yield more accurate prediction rules.
Table 1. Measures of Performance in Predicting Inpatient Mortality
Figure 1. Receiver Operating Characteristic (ROC) Curve for Inpatient Mortality
Table 2. Measures of Performance in Predicting Intensive Care Unit Admission
Figure 2. Receiver Operating Characteristic (ROC) Curve for Intensive Care Unit Admission
Disclosures All Authors: No reported disclosures
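The abstract describes Random Forest as combining many randomly created decision trees. The core mechanism can be illustrated with a deliberately simplified sketch: each tree is a depth-1 "stump" fitted on a bootstrap sample using a random subset of features, and predictions are made by majority vote. All names and parameters (`fit_stump`, `fit_forest`, `n_trees`, `max_features`) are hypothetical. Real Random Forest implementations grow deep trees and resample features at every split, and this is not the authors' model.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
Stump = Callable[[Row], int]

def fit_stump(X: List[Row], y: List[int], feature_ids: Sequence[int]) -> Stump:
    """Depth-1 tree: the single threshold split (among `feature_ids`)
    with the highest training accuracy."""
    best = None  # (n_correct, feature, threshold, left_label, right_label)
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]  # left is never empty
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            correct = sum(v == l_lab for v in left) + sum(v == r_lab for v in right)
            if best is None or correct > best[0]:
                best = (correct, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda row: l_lab if row[f] <= t else r_lab

def fit_forest(X: List[Row], y: List[int], n_trees: int = 25,
               max_features: int = 1, seed: int = 0) -> List[Stump]:
    """Fit each stump on a bootstrap sample with a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(n_features), max_features)   # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return trees

def predict_forest(trees: List[Stump], row: Row) -> int:
    """Majority vote over all trees."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]
```

The randomness in the bootstrap samples and feature choices makes the individual trees differ. Voting then averages out their individual errors, which is the property the abstract alludes to.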


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Eyal Klang ◽  
Benjamin R. Kummer ◽  
Neha S. Dangayach ◽  
Amy Zhong ◽  
M. Arash Kia ◽  
...  

Early admission to the neurosciences intensive care unit (NSICU) is associated with improved patient outcomes. Natural language processing offers new possibilities for mining free text in electronic health record data. We sought to develop a machine learning model using both tabular and free text data to identify patients requiring NSICU admission shortly after arrival to the emergency department (ED). We conducted a single-center, retrospective cohort study of adult patients at the Mount Sinai Hospital, an academic medical center in New York City. All patients presenting to our institutional ED between January 2014 and December 2018 were included. Structured (tabular) demographic, clinical, and bed movement record data, together with free text data from triage notes, were extracted from our institutional data warehouse. A machine learning model was trained to predict the likelihood of NSICU admission 30 min after arrival to the ED. We identified 412,858 patients presenting to the ED over the study period, of whom 1900 (0.5%) were admitted to the NSICU. The daily median number of ED presentations was 231 (IQR 200–256), and the median time from ED presentation to the decision for NSICU admission was 169 min (IQR 80–324). A model trained only on text data had an area under the receiver operating characteristic curve (AUC) of 0.90 (95% confidence interval (CI) 0.87–0.91). A structured data-only model had an AUC of 0.92 (95% CI 0.91–0.94). A combined model trained on structured and text data had an AUC of 0.93 (95% CI 0.92–0.95). At a false positive rate of 1:100 (99% specificity), the combined model was 58% sensitive for identifying NSICU admission. A machine learning model using structured and free text data can predict NSICU admission soon after ED arrival, which may improve ED and NSICU resource allocation. Further studies should validate our findings.
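An operating point such as "58% sensitive at 99% specificity" is obtained by choosing a score threshold from the negatives and then measuring sensitivity on the positives. The sketch below assumes that a patient is flagged when their score is at or above the threshold. It picks the lowest threshold that meets the specificity target, which maximizes sensitivity under that constraint. The helper name and its inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple

def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               target_spec: float = 0.99) -> Tuple[float, float]:
    """Return (threshold, sensitivity) for the lowest threshold whose
    specificity reaches `target_spec`. A case is flagged if score >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    for t in sorted(set(scores)):
        spec = sum(s < t for s in neg) / len(neg)  # negatives below t are true negatives
        if spec >= target_spec:
            return t, sum(s >= t for s in pos) / len(pos)
    return math.inf, 0.0  # flag nobody: specificity 1, sensitivity 0
```

Specificity can only rise as the threshold increases, while sensitivity can only fall, so the first threshold that reaches the target is the best available trade-off.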


2019 ◽  
Author(s):  
Longxiang Su ◽  
Chun Liu ◽  
Dongkai Li ◽  
Jie He ◽  
Fanglan Zheng ◽  
...  

BACKGROUND Heparin is one of the most commonly used medications in intensive care units. In clinical practice, a weight-based heparin dosing nomogram is the standard approach to treating thrombosis. Recently, machine learning techniques have dramatically improved the ability of computers to provide clinical decision support, raising the possibility of computer-generated, algorithm-based heparin dosing recommendations. OBJECTIVE The objective of this study was to predict the effects of heparin treatment using machine learning methods and to use these predictions to optimize heparin dosing in intensive care units. Patient states were predicted from the activated partial thromboplastin time, grouped into 3 ranges: subtherapeutic, normal therapeutic, and supratherapeutic. METHODS Retrospective data from 2 intensive care unit research databases (Multiparameter Intelligent Monitoring in Intensive Care III, MIMIC-III; e–Intensive Care Unit Collaborative Research Database, eICU) were used for the analysis. Candidate machine learning models (random forest, support vector machine, adaptive boosting, extreme gradient boosting, and shallow neural network) were compared in 3 patient groups to evaluate their performance in classifying the subtherapeutic, normal therapeutic, and supratherapeutic patient states. The models were evaluated using precision, recall, F1 score, and accuracy. RESULTS Data from the MIMIC-III database (n=2789 patients) and the eICU database (n=575 patients) were used. In 3-class classification, the shallow neural network algorithm performed best (F1 scores of 87.26%, 85.98%, and 87.55% for data sets 1, 2, and 3, respectively).
The shallow neural network algorithm achieved the highest F1 scores within each therapeutic state group: subtherapeutic (data set 1: 79.35%; data set 2: 83.67%; data set 3: 83.33%), normal therapeutic (data set 1: 93.15%; data set 2: 87.76%; data set 3: 84.62%), and supratherapeutic (data set 1: 88.00%; data set 2: 86.54%; data set 3: 95.45%). CONCLUSIONS The most appropriate model for predicting the effects of heparin treatment was identified by comparing multiple machine learning models and can be used to further guide optimal heparin dosing. Using multicenter intensive care unit data, our study demonstrates the feasibility of predicting the outcomes of heparin treatment using data-driven methods, and thus how machine learning–based models can be used to optimize and personalize heparin dosing to improve patient safety. Manual analysis and validation suggested that the model outperformed standard heparin dosing practice.
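The per-state F1 scores above come from treating each therapeutic state in turn as the positive class. The sketch below computes per-class precision, recall, and F1 for a 3-class problem, plus a macro average. The abstract does not say how its overall F1 was averaged (macro, weighted, or micro), so the macro average here is an assumption. The class labels and function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence

def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 classes: Sequence[Hashable]) -> Dict[Hashable, Dict[str, float]]:
    """One-vs-rest precision, recall, and F1 for each class."""
    out = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[c] = {"precision": precision, "recall": recall, "f1": f1}
    return out

def macro_f1(report: Dict[Hashable, Dict[str, float]]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(m["f1"] for m in report.values()) / len(report)
```

For example, with `y_true = ["sub", "normal", "supra", "sub", "normal", "supra"]` and one "sub" case mispredicted as "normal", the per-class F1 scores are about 0.667, 0.8, and 1.0.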


2020 ◽  
Author(s):  
Sujeong Hur ◽  
Ji Young Min ◽  
Junsang Yoo ◽  
Kyunga Kim ◽  
Chi Ryang Chung ◽  
...  

BACKGROUND Patient safety in the intensive care unit (ICU) is one of the most critical issues, and unplanned extubation (UE) is considered one of the most serious adverse events affecting patient safety. Prevention and early detection of such events are essential but difficult components of quality care. OBJECTIVE This study aimed to develop and validate prediction models for UE in ICU patients using machine learning. METHODS This study was conducted at an academic tertiary hospital in Seoul. The hospital had approximately 2,000 inpatient beds and 120 ICU beds, served approximately 9,000 outpatients per day, and recorded approximately 10,000 ICU admissions per year. We conducted a retrospective study covering the period from January 1, 2010 to December 31, 2018. A total of 6,914 extubation cases were included. We developed UE prediction models using machine learning algorithms, including random forest (RF), logistic regression (LR), artificial neural network (ANN), and support vector machine (SVM). Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Sensitivity, specificity, positive predictive value, negative predictive value, and F1-score were also determined for each model. For performance evaluation, we also used calibration curves, the Brier score, and the Hosmer-Lemeshow goodness-of-fit statistic. RESULTS Among the 6,914 extubation cases, 248 underwent UE. In the UE group, there were more males than females, more frequent use of physical restraints, and fewer surgeries. UE was more likely than planned extubation to occur during the night shift. Rates of reintubation within 24 hours and of hospital mortality were higher in the UE group. For the developed UE prediction models, the AUROC was 0.787 for RF, 0.762 for LR, 0.762 for ANN, and 0.740 for SVM.
CONCLUSIONS We successfully developed and validated machine learning-based prediction models to predict UE in ICU patients using electronic health record data. The best AUROC was 0.787, which was obtained using RF. CLINICALTRIAL N/A
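Alongside the AUROC, the study reports threshold-based metrics and the Brier score. The sketch below shows how these metrics are defined from a confusion matrix and from predicted probabilities. The counts and probabilities used with it are hypothetical, the Hosmer-Lemeshow test is not shown, and all denominators are assumed to be non-zero.

```python
from typing import Dict, Sequence

def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome
    (lower is better; 0 means perfect probabilistic predictions)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Threshold-based metrics from a 2x2 confusion matrix.
    Assumes every denominator below is non-zero."""
    sens = tp / (tp + fn)   # recall / sensitivity
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)    # precision / positive predictive value
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv, "f1": f1}
```

For example, `binary_metrics(tp=8, fp=2, tn=85, fn=5)` gives a PPV of 0.8 and an F1 of 16/23, the same value as the equivalent form 2·TP / (2·TP + FP + FN).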

