Machine Learning Identifies ICU Outcome Predictors in a Multicenter COVID-19 Cohort.

Author(s):  
Harry Magunia ◽  
Simone Lederer ◽  
Raphael Verbuecheln ◽  
Bryant Joseph Gilot ◽  
Michael Koeppen ◽  
...  

Abstract Background. Intensive care resources have been heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of clinical outcomes of SARS-CoV-2 patients upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective and prospective clinical data, to stratify patient risk and predict ICU survival and outcomes. Methods. A Germany-wide electronic registry was established to pseudonymously collect admission, therapeutic and discharge information on SARS-CoV-2 ICU patients, both retrospectively and prospectively. Machine learning approaches were evaluated for the accuracy and interpretability of their predictions, and the Explainable Boosting Machine approach was selected as the most suitable method. Individual, non-linear shape functions for predictive parameters and parameter interactions are reported. Results. 1,039 patients were included in the Explainable Boosting Machine model: 596 collected retrospectively and 443 collected prospectively. The model for prediction of general ICU outcome was more reliable at predicting “survival”. Age, inflammatory and thrombotic activity, and severity of ARDS at ICU admission were predictive of ICU survival. Patients’ age, pulmonary dysfunction and transfer from an external institution were predictors for ECMO therapy. The interaction of patient age with D-dimer levels on admission, and that of creatinine levels with the SOFA score without GCS, were predictors for renal replacement therapy. Conclusions. Using Explainable Boosting Machine analysis, we confirmed and weighed previously reported predictors and identified novel predictors of outcome in critically ill COVID-19 patients. With this strategy, predictive modeling of COVID-19 ICU patient outcomes can be performed while overcoming the limitations of linear regression models. Trial registration. “ClinicalTrials” (clinicaltrials.gov) under NCT04455451.

Critical Care ◽  
2021 ◽  
Vol 25 (1) ◽  
Author(s):  
Harry Magunia ◽  
Simone Lederer ◽  
Raphael Verbuecheln ◽  
Bryant Joseph Gilot ◽  
Michael Koeppen ◽  
...  

Abstract Background Intensive care resources have been heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of clinical outcomes of SARS-CoV-2 patients upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective and prospective clinical data, to stratify patient risk and predict ICU survival and outcomes. Methods A Germany-wide electronic registry was established to pseudonymously collect admission, therapeutic and discharge information on SARS-CoV-2 ICU patients, both retrospectively and prospectively. Machine learning approaches were evaluated for the accuracy and interpretability of their predictions, and the Explainable Boosting Machine approach was selected as the most suitable method. Individual, non-linear shape functions for predictive parameters and parameter interactions are reported. Results 1039 patients were included in the Explainable Boosting Machine model: 596 collected retrospectively and 443 collected prospectively. The model for prediction of general ICU outcome was more reliable at predicting “survival”. Age, inflammatory and thrombotic activity, and severity of ARDS at ICU admission were predictive of ICU survival. Patients’ age, pulmonary dysfunction and transfer from an external institution were predictors for ECMO therapy. The interaction of patient age with D-dimer levels on admission, and that of creatinine levels with the SOFA score without GCS, were predictors for renal replacement therapy. Conclusions Using Explainable Boosting Machine analysis, we confirmed and weighed previously reported predictors and identified novel predictors of outcome in critically ill COVID-19 patients. With this strategy, predictive modeling of COVID-19 ICU patient outcomes can be performed while overcoming the limitations of linear regression models. Trial registration “ClinicalTrials” (clinicaltrials.gov) under NCT04455451.
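An Explainable Boosting Machine is a generalized additive model with selected pairwise interaction terms: a prediction is the sum of per-feature shape functions, plus pairwise terms such as the age × D-dimer interaction mentioned above, on the logit scale. The sketch below shows only how such a fitted model scores one patient. The bin edges, scores, intercept and feature names (`age`, `d_dimer`) are made-up values for illustration, not the study's shape functions, and the boosting procedure that learns them (provided, for example, by the InterpretML package) is not reproduced.

```python
import bisect
import math

# Hypothetical fitted terms (illustrative numbers only).
# A main term maps feature -> (edges, scores), with len(scores) == len(edges) + 1.
# A pair term maps (f1, f2) -> (edges1, edges2, table), with table[i][j] per bin pair.


def bin_index(edges, x):
    """Index of the bin containing x (a value equal to an edge goes to the upper bin)."""
    return bisect.bisect_right(edges, x)


def additive_logit(sample, intercept, main_terms, pair_terms):
    """Intercept + each main-effect shape function + each pairwise interaction term."""
    logit = intercept
    for feature, (edges, scores) in main_terms.items():
        logit += scores[bin_index(edges, sample[feature])]
    for (f1, f2), (edges1, edges2, table) in pair_terms.items():
        logit += table[bin_index(edges1, sample[f1])][bin_index(edges2, sample[f2])]
    return logit


def predict_proba(sample, intercept, main_terms, pair_terms):
    """Map the additive logit to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-additive_logit(sample, intercept, main_terms, pair_terms)))


# Made-up terms loosely echoing an age main effect and an age x D-dimer interaction.
main_terms = {"age": ([50.0, 70.0], [-0.5, 0.0, 0.8])}
pair_terms = {("age", "d_dimer"): ([65.0], [2.0], [[0.0, 0.2], [0.1, 0.6]])}
patient = {"age": 72.0, "d_dimer": 3.1}
p = predict_proba(patient, -1.0, main_terms, pair_terms)
```

Because each term is a lookup in a small table, every term's contribution to a single prediction can be read off directly, which is the interpretability property the abstract relies on.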


Author(s):  
Jeffrey G Klann ◽  
Griffin M Weber ◽  
Hossein Estiri ◽  
Bertrand Moal ◽  
Paul Avillach ◽  
...  

Abstract Introduction The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data. Objective We sought to develop and validate a computable phenotype for COVID-19 severity. Methods Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes and validated it on patient hospitalization data from the 12 participating 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity to the 4CE phenotype at one site. Results The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity was highly variable, differing by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared to an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies of ICU admission, with precision and recall as low as 49% compared to chart review. Discussion We developed a severity phenotype using six code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions. Conclusion We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
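The validation above reduces to a 2×2 table per site: phenotype flagged or not, against ICU admission/death or not. The sketch below computes sensitivity, specificity and precision from such counts, pools sites by summing their cells, and reports the spread of per-site sensitivity. Summing counts is only one simple pooling choice and is an assumption here; the abstract does not say how the study pooled sites. All counts are invented.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary phenotype (flagged severe?) against an outcome (ICU/death?)."""
    tp: int
    fp: int
    tn: int
    fn: int


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)  # recall: share of true outcomes the phenotype flags


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)  # share of non-outcomes the phenotype leaves unflagged


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)  # share of flagged patients who truly had the outcome


def pool_counts(sites: Iterable[Confusion]) -> Confusion:
    """Simple pooling (an assumption): sum each cell across sites, then compute rates."""
    site_list = list(sites)
    return Confusion(
        tp=sum(s.tp for s in site_list),
        fp=sum(s.fp for s in site_list),
        tn=sum(s.tn for s in site_list),
        fn=sum(s.fn for s in site_list),
    )


def sensitivity_spread(sites: Iterable[Confusion]) -> float:
    """Largest minus smallest per-site sensitivity, i.e. cross-site variability."""
    values = [sensitivity(s) for s in sites]
    return max(values) - min(values)


# Hypothetical counts for two sites.
site_a = Confusion(tp=30, fp=20, tn=80, fn=10)
site_b = Confusion(tp=20, fp=10, tn=90, fn=10)
pooled = pool_counts([site_a, site_b])
```

The same `precision` and `sensitivity` (recall) functions apply when billing codes are scored against chart review as the reference standard.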


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bongjin Lee ◽  
Kyunghoon Kim ◽  
Hyejin Hwang ◽  
You Sun Kim ◽  
Eun Hee Chung ◽  
...  

Abstract The aim of this study was to develop a predictive model of pediatric mortality in the early stages of intensive care unit (ICU) admission using machine learning. Patients younger than 18 years who were admitted to ICUs at four tertiary referral hospitals were enrolled. Three hospitals were designated as the derivation cohort for machine learning model development and internal validation, and the fourth hospital was designated as the validation cohort for external validation. We developed a random forest (RF) model that predicts pediatric mortality within 72 h of ICU admission, evaluated its performance, and compared it with the Pediatric Index of Mortality 3 (PIM 3). The area under the receiver operating characteristic curve (AUROC) of the RF model was 0.942 (95% confidence interval [CI] = 0.912–0.972) in the derivation cohort and 0.906 (95% CI = 0.900–0.912) in the validation cohort. In contrast, the AUROC of PIM 3 was 0.892 (95% CI = 0.878–0.906) in the derivation cohort and 0.845 (95% CI = 0.817–0.873) in the validation cohort. The RF model showed improved predictive performance in both internal and external validation and outperformed PIM 3.
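AUROC, the metric used to compare the RF model with PIM 3, equals the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counting half. A minimal pure-Python sketch of that pairwise (Mann-Whitney) formulation follows. The labels and scores are invented, and the confidence intervals reported in the abstract (which could come from bootstrapping or an analytic method) are not computed here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5.

    labels: 1 for the event (e.g. death), 0 otherwise; scores: predicted risk.
    O(P * N) comparisons, which is fine for a sketch but slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: three deaths, three survivors.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.4]
auc = auroc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the derivation-to-validation drop (0.942 to 0.906 for RF) reads as a modest loss of discrimination.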


2020 ◽  
Vol 28 (4) ◽  
pp. 532-551
Author(s):  
Blake Miller ◽  
Fridolin Linder ◽  
Walter R. Mebane

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which the data to be labeled by human coders are not chosen at random but are targeted so as to minimize the amount of labeled data required to train a machine learning model. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate the conditions under which active learning can reduce the cost of labeling text data, using three corpora that vary in size, document length, and domain. We find that when the document class of interest is imbalanced, researchers need to label only a fraction of the documents that random sampling (or “passive” learning) would require to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect active learning and find that even with low reliability, active learning is more efficient than random sampling.
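A common way to "target" documents for labeling is uncertainty sampling: retrain the classifier, then send coders the unlabeled documents whose predicted probability is closest to 0.5. The sketch below shows a pool-based loop of that kind. Uncertainty sampling is one standard query strategy, assumed here for illustration rather than taken from the paper; `oracle` (standing in for a human coder) and `fit` are hypothetical hooks, and `midpoint_fit` is a toy one-dimensional classifier used only so the loop runs.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Doc = TypeVar("Doc")
Proba = Callable[[Doc], float]


def uncertainty_sample(pool: Sequence[Doc], predict_proba: Proba, batch_size: int) -> List[int]:
    """Indices of the batch_size items whose P(class of interest) is closest to 0.5."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(predict_proba(pool[i]) - 0.5))
    return ranked[:batch_size]


def active_learning(
    pool: Sequence[Doc],
    oracle: Callable[[Doc], int],
    fit: Callable[[List[Tuple[Doc, int]]], Proba],
    seed: Sequence[int],
    rounds: int,
    batch_size: int,
) -> Dict[int, int]:
    """Pool-based active learning: label a seed set, then repeatedly query uncertain items."""
    labeled: Dict[int, int] = {i: oracle(pool[i]) for i in seed}
    for _ in range(rounds):
        predict_proba = fit([(pool[i], y) for i, y in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        picks = uncertainty_sample([pool[i] for i in unlabeled], predict_proba, batch_size)
        for j in picks:
            labeled[unlabeled[j]] = oracle(pool[unlabeled[j]])  # ask the "coder"
    return labeled


def midpoint_fit(pairs: List[Tuple[float, int]]) -> Proba:
    """Toy classifier: threshold halfway between class means, logistic around it."""
    pos = [x for x, y in pairs if y == 1]
    neg = [x for x, y in pairs if y == 0]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1.0 / (1.0 + math.exp(-10.0 * (x - t)))


pool = [i / 10 for i in range(11)]
labels = active_learning(pool, lambda x: int(x > 0.5), midpoint_fit, seed=[0, 10], rounds=2, batch_size=2)
```

Replacing `uncertainty_sample` with a random draw turns the same loop into the "passive" baseline the paper compares against.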


2017 ◽  
Vol 36 (3) ◽  
pp. 267-269 ◽  
Author(s):  
Matt Hall ◽  
Brendon Hall

The Geophysical Tutorial in the October issue of The Leading Edge was the first we've done on the topic of machine learning. Brendon Hall's article (Hall, 2016) showed readers how to take a small data set, consisting of wireline logs and geologic facies data from nine wells in the Hugoton natural gas and helium field of southwest Kansas (Dubois et al., 2007), and predict the facies in two wells for which facies data were not available. The article demonstrated with 25 lines of code how to explore the data set, create, train and test a machine learning model for facies classification, and finally visualize the results. The workflow took a deliberately naive approach using a support vector machine model. It achieved a baseline accuracy of 0.42, a first-order prediction, if you will. That might sound low, but it is not unusual for a naive approach to this kind of problem. For comparison, random draws from the facies distribution score 0.16, which is therefore the true baseline.
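The 0.16 figure is the accuracy expected from guessing. Assuming "random draws from the facies distribution" means each prediction is drawn independently from the observed class frequencies, the expected accuracy is the sum of the squared class proportions: the chance that a guess and the true label land on the same facies. The sketch computes this analytically and checks it by simulation; the facies labels are invented and do not reflect the Hugoton proportions.

```python
import random
from collections import Counter
from typing import Hashable, Sequence


def expected_random_accuracy(labels: Sequence[Hashable]) -> float:
    """Sum of squared class proportions: P(guess == truth) when guesses follow the class mix."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


def simulated_random_accuracy(labels: Sequence[Hashable], trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo check: draw guesses with replacement from the labels themselves."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        guesses = rng.choices(labels, k=len(labels))
        total += sum(g == y for g, y in zip(guesses, labels)) / len(labels)
    return total / trials


# Invented facies mix: 50% "A", 30% "B", 20% "C" -> 0.25 + 0.09 + 0.04 = 0.38.
facies = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
analytic = expected_random_accuracy(facies)
simulated = simulated_random_accuracy(facies)
```

A uniform mix over nine facies would give 1/9 (about 0.11); any imbalance raises the chance baseline, which is why it is worth computing before judging a score such as 0.42.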


Author(s):  
Jeffrey G Klann ◽  
Griffin M Weber ◽  
Hossein Estiri ◽  
Bertrand Moal ◽  
Paul Avillach ◽  
...  

Abstract Introduction The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) includes hundreds of hospitals internationally that use a federated computational approach to COVID-19 research with EHR data. Objective We sought to develop and validate a standard definition of COVID-19 severity from readily accessible EHR data across the Consortium. Methods We developed an EHR-based severity algorithm and validated it on patient hospitalization data from 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also used a machine learning approach to compare selected predictors of severity to the 4CE algorithm at one site. Results The 4CE severity algorithm performed with a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. Single code categories for acuity were unacceptably inaccurate, with sensitivity varying by up to 0.65 across sites. A multivariate machine learning approach identified codes resulting in a mean AUC of 0.956 (95% CI: 0.952, 0.959), compared to 0.903 (95% CI: 0.886, 0.921) using expert-derived codes. Billing codes were poor proxies of ICU admission, with 49% precision and recall compared against chart review at one partner institution. Discussion We developed a proxy measure of severity that proved resilient to coding variability internationally by using a set of 6 code classes. In contrast, machine-learning approaches may tend to overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to pandemic conditions. Conclusion We developed an EHR-based algorithm for COVID-19 severity and validated it at 12 international sites.


Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 219 ◽  
Author(s):  
Sweta Bhattacharya ◽  
Siva Rama Krishnan S ◽  
Praveen Kumar Reddy Maddikunta ◽  
Rajesh Kaluri ◽  
Saurabh Singh ◽  
...  

The enormous popularity of the internet across all spheres of human life has introduced various risks of malicious attacks on networks. Activities performed over a network can proliferate effortlessly, which has led to the emergence of intrusion detection systems. Attack patterns are also dynamic, which necessitates efficient classification and prediction of cyber attacks. In this paper, we propose a hybrid principal component analysis (PCA)-firefly based machine learning model to classify intrusion detection system (IDS) datasets. The dataset used in the study was collected from Kaggle. The model first applies one-hot encoding to transform the IDS datasets. The hybrid PCA-firefly algorithm is then used for dimensionality reduction, and the XGBoost algorithm is applied to the reduced dataset for classification. The model is comprehensively evaluated against state-of-the-art machine learning approaches to justify the superiority of the proposed approach. The experimental results confirm that the proposed model outperforms the existing machine learning models.
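The first two preprocessing stages described above, one-hot encoding of categorical fields and PCA-based dimensionality reduction, can be sketched in dependency-free Python. This covers only standard PCA (top components found by power iteration with deflation); the firefly-based optimization and the XGBoost classifier from the paper are not reproduced, and the example rows (a protocol name plus a numeric field) are hypothetical stand-ins for IDS records.

```python
import math
import random
from typing import List, Sequence, Tuple


def one_hot(rows: Sequence[Sequence], column: int) -> Tuple[List[List[float]], List[str]]:
    """Drop categorical `column` and append one 0/1 indicator per category (sorted)."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        numeric = [float(v) for j, v in enumerate(row) if j != column]
        encoded.append(numeric + [1.0 if row[column] == c else 0.0 for c in categories])
    return encoded, categories


def pca(X: List[List[float]], k: int, iters: int = 500, seed: int = 0):
    """Top-k principal components of X via power iteration with deflation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]  # center each column
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate so the next power iteration finds the next-largest direction.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]
    return components, scores


# Hypothetical records: (protocol, bytes) -> encode, then project onto 1 component.
rows = [["tcp", 1], ["udp", 2], ["tcp", 3]]
encoded, categories = one_hot(rows, 0)
components, reduced = pca(encoded, k=1)
```

The reduced `scores` would then be passed to a classifier; in the paper that role is played by XGBoost.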


2020 ◽  
Vol 13 (2) ◽  
pp. 89-96
Author(s):  
Muhammad Fachrie

In this paper, we discuss the implementation of Machine Learning (ML) to predict the victory of candidates in Regional Elections in Indonesia based on data from the General Election Commission (KPU). The data consist of the composition of the political parties that support each candidate. The purpose of this research is to develop a Machine Learning model based on verified data provided by an official institution to predict the victory of each candidate in a Regional Election, instead of using social media data as in previous studies. The prediction itself is simply a binary classification task between two classes, i.e. ‘win’ and ‘lose’. Several Machine Learning algorithms were applied to find the best model, i.e. k-Nearest Neighbors, Naïve Bayes Classifier, Decision Tree (C4.5), and Neural Networks (Multilayer Perceptron), each of which was validated using 10-fold cross-validation. These algorithms were selected to observe how the data behave under different Machine Learning approaches. This research also aims to find the combination of features that yields the highest performance. We found that the Multilayer Perceptron neural network is the best model, with 74.20% accuracy.
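10-fold cross-validation, as used to compare the classifiers above, trains on nine folds, tests on the held-out fold, and rotates until every candidate has been predicted exactly once. The sketch pairs it with k-Nearest Neighbors, one of the listed algorithms. The numeric feature vectors stand in for the party-composition features, the choice of k = 3 and squared Euclidean distance are assumptions, and the toy data are invented.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

Example = Tuple[Sequence[float], Hashable]  # (features, label such as "win"/"lose")


def knn_predict(train: List[Example], x: Sequence[float], k: int) -> Hashable:
    """Majority label among the k training examples closest to x (squared Euclidean)."""
    nearest = sorted(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(data: List[Example], k_folds: int = 10, k_neighbors: int = 3, seed: int = 0) -> float:
    """Shuffle once, split into k_folds folds, and score every example while it is held out."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k_folds] for f in range(k_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            features, label = data[i]
            correct += knn_predict(train, features, k_neighbors) == label
    return correct / len(data)  # pooled accuracy over all held-out predictions


# Invented, well-separated toy data with two features per candidate.
toy = [((0.0, i * 0.01), "lose") for i in range(20)] + [((5.0, 5.0 + i * 0.01), "win") for i in range(20)]
accuracy = k_fold_accuracy(toy)
```

This returns accuracy pooled over all held-out predictions; averaging the ten per-fold accuracies instead gives nearly the same number when folds are equal in size.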


2021 ◽  
Author(s):  
Itai Guez ◽  
Gili Focht ◽  
Mary-Louise C.Greer ◽  
Ruth Cytter-Kuint ◽  
Li-tal Pratt ◽  
...  

Background and Aims: Endoscopic healing (EH) is a major treatment goal for Crohn's disease (CD). However, terminal ileum (TI) intubation failure is common, especially in children. We evaluated the added value of machine-learning models in imputing a TI Simple Endoscopic Score for CD (SES-CD) from Magnetic Resonance Enterography (MRE) data of pediatric CD patients. Methods: This is a sub-study of the prospective ImageKids study. We developed machine-learning and baseline linear-regression models to predict the TI SES-CD score from the Magnetic Resonance Index of Activity (MaRIA) and the Pediatric Inflammatory Crohn's MRE Index (PICMI) variables. We assessed the accuracy of TI SES-CD predictions for intubated patients with a stratified 2-fold validation setup, repeated 50 times. We determined clinical impact by imputing TI SES-CD in patients with ileal intubation failure during ileocolonoscopy. Results: A total of 223 children were included (mean age 14.1 ± 2.5 years), of whom 132 had all relevant variables (107 with TI intubation and 25 with TI intubation failure). The combination of a machine-learning model with the PICMI variables achieved the lowest SES-CD prediction error compared to a baseline MaRIA-based linear regression model for the intubated patients (N=107, 11.7 (10.5-12.5) vs. 12.1 (11.4-12.9), p<0.05). The PICMI-based models suggested a higher rate of TI disease among the non-intubated patients than the baseline MaRIA-based linear regression model (N=25, up to 25/25 (100%) vs. 23/25 (92%)). Conclusions: Machine-learning models with clinically relevant variables as input are more accurate than linear-regression models in predicting TI SES-CD and EH when using the same MRE-based variables.
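The "stratified 2-fold validation, repeated 50 times" setup can be sketched as a split generator: each repeat shuffles patients within each stratum, halves every stratum, and yields the two halves as train/test in both directions, giving 100 evaluations in total. The abstract does not say which variable defined the strata, so `strata` here is a hypothetical per-patient label (for example, a binned SES-CD score); model fitting and the error metric are left out.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def repeated_stratified_two_fold(
    strata: Sequence[Hashable], repeats: int = 50, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs: 2 per repeat, with each stratum split in half."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, s in enumerate(strata):
        by_stratum[s].append(i)
    for _ in range(repeats):
        fold_a: List[int] = []
        fold_b: List[int] = []
        for members in by_stratum.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            if len(shuffled) % 2 and rng.random() < 0.5:
                half += 1  # randomly decide which fold receives the odd member
            fold_a.extend(shuffled[:half])
            fold_b.extend(shuffled[half:])
        yield fold_a, fold_b  # train on A, test on B
        yield fold_b, fold_a  # then swap roles


# Hypothetical strata for 16 patients: 10 in stratum 0, 6 in stratum 1.
strata = [0] * 10 + [1] * 6
splits = list(repeated_stratified_two_fold(strata, repeats=50))
```

Summarizing the 100 per-split errors (for instance by their median and spread) is one way to obtain interval-style figures like those in the Results; the abstract does not specify the exact summary used.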


2021 ◽  
Vol 28 (1) ◽  
pp. 22-30
Author(s):  
Hon Fung Chow

This paper proposes and discusses the viability of a short-term grid maximum demand forecasting model combining autoregressive integrated moving average with regressors (ARIMAX) and support vector regression (SVR). Grid demand forecasting is essential to generation unit scheduling, maintenance planning and system security. Traditionally, grid demand is forecast using multivariate linear regression models with parameters fitted to past data. A disadvantage of linear regression models is that their parameters require regular adjustment; otherwise, prediction accuracy deteriorates over time. With recent advances in machine learning and lower computational costs, the use of machine learning in the power industry has become increasingly practicable. The proposed model combines ARIMAX and SVR to exploit their respective strengths in predicting linear and non-linear data. In contrast to linear regression models, the machine learning model automatically updates itself when new data are included. Benchmarked against other forecasting models, the hybrid model showed a marked improvement in accuracy, achieving an RMSE of 67.7 MW and a MAPE of 1.32% in a seven-day forecast.
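The two accuracy figures quoted above measure different things: RMSE is in the units of the forecast (MW) and weights large misses heavily, while MAPE is scale-free, averaging absolute errors as a percentage of the actual demand. A minimal sketch of both follows; the demand values are invented, and MAPE is undefined whenever an actual value is zero.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error, in the units of the series (e.g. MW)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Invented daily maximum demand (MW) and forecasts.
actual_mw = [100.0, 200.0]
forecast_mw = [110.0, 190.0]
error_rmse = rmse(actual_mw, forecast_mw)  # sqrt((10^2 + 10^2) / 2) = 10 MW
error_mape = mape(actual_mw, forecast_mw)  # (10% + 5%) / 2 = 7.5%
```

The same 10 MW miss counts for 10% of a 100 MW peak but only 5% of a 200 MW peak, which is why reporting both metrics, as the paper does, gives a fuller picture.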
