Fluid Overload Phenotypes in Critical Illness—A Machine Learning Approach

Background: The detrimental impact of fluid overload (FO) on intensive care unit (ICU) morbidity and mortality is well known. However, research to identify subgroups of patients particularly prone to fluid overload is scarce. The aim of this cohort study was to derive “FO phenotypes” in the critically ill by using machine learning techniques. Methods: Retrospective single center study including adult intensive care patients with a length of stay of ≥3 days and sufficient data to compute FO. Data was analyzed by multivariable logistic regression, fast and frugal trees (FFT), classification decision trees (DT), and a random forest (RF) model. Results: Out of 1772 included patients, 387 (21.8%) met the FO definition. The random forest model had the highest area under the curve (AUC) (0.84, 95% CI 0.79–0.86), followed by multivariable logistic regression (0.81, 95% CI 0.77–0.86), FFT (0.75, 95% CI 0.69–0.79) and DT (0.73, 95% CI 0.68–0.78) to predict FO. The most important predictors identified in all models were lactate and bicarbonate at admission and postsurgical ICU admission. Sepsis/septic shock was identified as a risk factor in the MV and RF analysis. Conclusion: The FO phenotypes consist of patients admitted after surgery or with sepsis/septic shock with high lactate and low bicarbonate.

Download Full-text

Predicting mortality in hemodialysis patients using machine learning analysis

Clinical Kidney Journal ◽

10.1093/ckj/sfaa126 ◽

2020 ◽

Author(s):

Victoria Garcia-Montemayor ◽

Alejandro Martin-Malo ◽

Carlo Barbieri ◽

Francesco Bellocchio ◽

Sagrario Soriano ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Prediction Models ◽

Initial Period ◽

Area Under The Curve ◽

Mortality Prediction ◽

Machine Learning Techniques ◽

Prediction Of Mortality ◽

Mortality Prediction Models

Abstract Background Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.

Download Full-text

Predictive Modeling of Pressure Injury Risk in Patients Admitted to an Intensive Care Unit

American Journal of Critical Care ◽

10.4037/ajcc2020237 ◽

2020 ◽

Vol 29 (4) ◽

pp. e70-e80

Author(s):

Mireia Ladios-Martin ◽

José Fernández-de-Maya ◽

Francisco-Javier Ballesta-López ◽

Adrián Belso-Garzas ◽

Manuel Mas-Asencio ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care Unit ◽

Intensive Care ◽

Injury Risk ◽

Area Under The Curve ◽

University Hospital ◽

Machine Learning Techniques ◽

Real Environment ◽

Learning Techniques ◽

Pressure Injury

Background Pressure injuries are an important problem in hospital care. Detecting the population at risk for pressure injuries is the first step in any preventive strategy. Available tools such as the Norton and Braden scales do not take into account all of the relevant risk factors. Data mining and machine learning techniques have the potential to overcome this limitation. Objectives To build a model to detect pressure injury risk in intensive care unit patients and to put the model into production in a real environment. Methods The sample comprised adult patients admitted to an intensive care unit (N = 6694) at University Hospital of Torrevieja and University Hospital of Vinalopó. A retrospective design was used to train (n = 2508) and test (n = 1769) the model and then a prospective design was used to test the model in a real environment (n = 2417). Data mining was used to extract variables from electronic medical records and a predictive model was built with machine learning techniques. The sensitivity, specificity, area under the curve, and accuracy of the model were evaluated. Results The final model used logistic regression and incorporated 23 variables. The model had sensitivity of 0.90, specificity of 0.74, and area under the curve of 0.89 during the initial test, and thus it outperformed the Norton scale. The model performed well 1 year later in a real environment. Conclusions The model effectively predicts risk of pressure injury. This allows nurses to focus on patients at high risk for pressure injury without increasing workload.

Download Full-text

Don’t Dismiss Logistic Regression: The Case for Sensible Extraction of Interactions in the Era of Machine Learning

10.1101/2019.12.15.877134 ◽

2019 ◽

Cited By ~ 1

Author(s):

Joshua J. Levy ◽

A. James O’Malley

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Model Building ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Statistical Machine Learning ◽

Forest Model ◽

Learning Techniques ◽

Modeling Techniques

AbstractBackgroundMachine learning approaches have become increasingly popular modeling techniques, relying on data-driven heuristics to arrive at its solutions. Recent comparisons between these algorithms and traditional statistical modeling techniques have largely ignored the superiority gained by the former approaches due to involvement of model-building search algorithms. This has led to alignment of statistical and machine learning approaches with different types of problems and the under-development of procedures that combine their attributes. In this context, we hoped to understand the domains of applicability for each approach and to identify areas where a marriage between the two approaches is warranted. We then sought to develop a hybrid statistical-machine learning procedure with the best attributes of each.MethodsWe present three simple examples to illustrate when to use each modeling approach and posit a general framework for combining them into an enhanced logistic regression model building procedure that aids interpretation. We study 556 benchmark machine learning datasets to uncover when machine learning techniques outperformed rudimentary logistic regression models and so are potentially well-equipped to enhance them. We illustrate a software package, InteractionTransformer, which embeds logistic regression with advanced model building capacity by using machine learning algorithms to extract candidate interaction features from a random forest model for inclusion in the model. Finally, we apply our enhanced logistic regression analysis to two real-word biomedical examples, one where predictors vary linearly with the outcome and another with extensive second-order interactions.ResultsPreliminary statistical analysis demonstrated that across 556 benchmark datasets, the random forest approach significantly outperformed the logistic regression approach. We found a statistically significant increase in predictive performance when using hybrid procedures and greater clarity in the association with the outcome of terms acquired compared to directly interpreting the random forest output.ConclusionsWhen a random forest model is closer to the true model, hybrid statistical-machine learning procedures can substantially enhance the performance of statistical procedures in an automated manner while preserving easy interpretation of the results. Such hybrid methods may help facilitate widespread adoption of machine learning techniques in the biomedical setting.

Download Full-text

Alzheimer's Disease Early Detection Using Machine Learning Techniques

10.21203/rs.3.rs-624520/v1 ◽

2021 ◽

Author(s):

Roobaea Alroobaea ◽

Seifeddine Mechti ◽

Mariem Haoues ◽

Saeed Rubaiee ◽

Anas Ahmed ◽

...

Keyword(s):

Machine Learning ◽

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Support Vector Machine ◽

Logistic Regression ◽

Random Forest ◽

Machine Learning Techniques ◽

Support Vector ◽

Disease Detection ◽

Learning Techniques

Abstract Alzheimer's is the main reason for dementia, that affects frequently older adults. This disease is costly especially, in terms of treatment. In addition, Alzheimer's is one of the deaths causes in the old-age citizens. Early Alzheimer's detection helps medical staffs in this disease diagnosis, which will certainly decrease the risk of death. This made the early Alzheimer's disease detection a crucial problem in the healthcare industry. The objective of this research study is to introduce a computer-aided diagnosis system for Alzheimer's disease detection using machine learning techniques. We employed data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS) brain datasets. Common supervised machine learning techniques have been applied for automatic Alzheimer’s disease detection such as: logistic regression, support vector machine, random forest, linear discriminant analysis, etc. The best accuracy values provided by the machine learning classifiers are 99.43% and 99.10% given by respectively, logistic regression and support vector machine using ADNI dataset, whereas for the OASIS dataset, we obtained 84.33% and 83.92% given by respectively logistic regression and random forest.

Download Full-text

Complex Machine-Learning Algorithms and Multivariable Logistic Regression on Par in the Prediction of Insufficient Clinical Response to Methotrexate in Rheumatoid Arthritis

Journal of Personalized Medicine ◽

10.3390/jpm11010044 ◽

2021 ◽

Vol 11 (1) ◽

pp. 44

Author(s):

Helen R. Gosselt ◽

Maxime M. A. Verhoeven ◽

Maja Bulatović-Ćalasan ◽

Paco M. Welsing ◽

Maurits C. F. J. de Rotte ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Combination Therapy ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Multivariable Logistic Regression ◽

Extreme Gradient Boosting ◽

Insufficient Response

The goals of this study were to examine whether machine-learning algorithms outperform multivariable logistic regression in the prediction of insufficient response to methotrexate (MTX); secondly, to examine which features are essential for correct prediction; and finally, to investigate whether the best performing model specifically identifies insufficient responders to MTX (combination) therapy. The prediction of insufficient response (3-month Disease Activity Score 28-Erythrocyte-sedimentation rate (DAS28-ESR) > 3.2) was assessed using logistic regression, least absolute shrinkage and selection operator (LASSO), random forest, and extreme gradient boosting (XGBoost). The baseline features of 355 rheumatoid arthritis (RA) patients from the “treatment in the Rotterdam Early Arthritis CoHort” (tREACH) and the U-Act-Early trial were combined for analyses. The model performances were compared using area under the curve (AUC) of receiver operating characteristic (ROC) curves, 95% confidence intervals (95% CI), and sensitivity and specificity. Finally, the best performing model following feature selection was tested on 101 RA patients starting tocilizumab (TCZ)-monotherapy. Logistic regression (AUC = 0.77 95% CI: 0.68–0.86) performed as well as LASSO (AUC = 0.76, 95% CI: 0.67–0.85), random forest (AUC = 0.71, 95% CI: 0.61 = 0.81), and XGBoost (AUC = 0.70, 95% CI: 0.61–0.81), yet logistic regression reached the highest sensitivity (81%). The most important features were baseline DAS28 (components). For all algorithms, models with six features performed similarly to those with 16. When applied to the TCZ-monotherapy group, logistic regression’s sensitivity significantly dropped from 83% to 69% (p = 0.03). In the current dataset, logistic regression performed equally well compared to machine-learning algorithms in the prediction of insufficient response to MTX. Models could be reduced to six features, which are more conducive for clinical implementation. Interestingly, the prediction model was specific to MTX (combination) therapy response.

Download Full-text

Early Detection of Septic Shock Onset Using Interpretable Machine Learners

Journal of Clinical Medicine ◽

10.3390/jcm10020301 ◽

2021 ◽

Vol 10 (2) ◽

pp. 301

Author(s):

Debdipto Misra ◽

Venkatesh Avula ◽

Donna M. Wolk ◽

Hosam A. Farag ◽

Jiang Li ◽

...

Keyword(s):

Machine Learning ◽

Septic Shock ◽

Logistic Regression ◽

Decision Support ◽

Random Forest ◽

Decision Support System ◽

Support System ◽

Machine Learning Algorithms ◽

Sepsis Definition ◽

Time Of Admission

Background: Developing a decision support system based on advances in machine learning is one area for strategic innovation in healthcare. Predicting a patient’s progression to septic shock is an active field of translational research. The goal of this study was to develop a working model of a clinical decision support system for predicting septic shock in an acute care setting for up to 6 h from the time of admission in an integrated healthcare setting. Method: Clinical data from Electronic Health Record (EHR), at encounter level, were used to build a predictive model for progression from sepsis to septic shock up to 6 h from the time of admission; that is, T = 1, 3, and 6 h from admission. Eight different machine learning algorithms (Random Forest, XGBoost, C5.0, Decision Trees, Boosted Logistic Regression, Support Vector Machine, Logistic Regression, Regularized Logistic, and Bayes Generalized Linear Model) were used for model development. Two adaptive sampling strategies were used to address the class imbalance. Data from two sources (clinical and billing codes) were used to define the case definition (septic shock) using the Centers for Medicare & Medicaid Services (CMS) Sepsis criteria. The model assessment was performed using Area under Receiving Operator Characteristics (AUROC), sensitivity, and specificity. Model predictions for each feature window (1, 3 and 6 h from admission) were consolidated. Results: Retrospective data from April 2005 to September 2018 were extracted from the EHR, Insurance Claims, Billing, and Laboratory Systems to create a dataset for septic shock detection. The clinical criteria and billing information were used to label patients into two classes-septic shock patients and sepsis patients at three different time points from admission, creating two different case-control cohorts. Data from 45,425 unique in-patient visits were used to build 96 prediction models comparing clinical-based definition versus billing-based information as the gold standard. Of the 24 consolidated models (based on eight machine learning algorithms and three feature windows), four models reached an AUROC greater than 0.9. Overall, all the consolidated models reached an AUROC of at least 0.8820 or higher. Based on the AUROC of 0.9483, the best model was based on Random Forest, with a sensitivity of 83.9% and specificity of 88.1%. The sepsis detection window at 6 h outperformed the 1 and 3-h windows. The sepsis definition based on clinical variables had improved performance when compared to the sepsis definition based on only billing information. Conclusion: This study corroborated that machine learning models can be developed to predict septic shock using clinical and administrative data. However, the use of clinical information to define septic shock outperformed models developed based on only administrative data. Intelligent decision support tools can be developed and integrated into the EHR and improve clinical outcomes and facilitate the optimization of resources in real-time.

Download Full-text

Early Prediction of Seven-Day Mortality in Intensive Care Unit Using a Machine Learning Model: Results from the SPIN-UTI Project

Journal of Clinical Medicine ◽

10.3390/jcm10050992 ◽

2021 ◽

Vol 10 (5) ◽

pp. 992

Author(s):

Martina Barchitta ◽

Andrea Maugeri ◽

Giuliana Favara ◽

Paolo Marco Riela ◽

Giovanni Gallo ◽

...

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Learning Algorithm ◽

Area Under The Curve ◽

Support Vector ◽

Icu Admission ◽

Risk Of Death ◽

Saps Ii ◽

Svm Algorithm

Patients in intensive care units (ICUs) were at higher risk of worsen prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm which combines the SAPS II with additional patients’ characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. Support Vector Machines (SVM) algorithm was used to classify 3782 patients according to sex, patient’s origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy and onset of HAI. The accuracy of SAPS II for predicting patients who died from those who did not was 69.3%, with an Area Under the Curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and AUC of 0.896. Notably, SAPS II was the variable that weighted more on the model and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest the present SVM model as a useful tool to early predict patients at higher risk of death at ICU admission.

Download Full-text

Hyperoxia is associated with adverse outcomes in the cardiac intensive care unit: insights from the Medical Information Mart for Intensive Care (MIMI-III) database

European Heart Journal ◽

10.1093/ehjci/ehaa946.1837 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

A.Y Lui ◽

L Garber ◽

M Vincent ◽

L Celi ◽

J Masip ◽

...

Keyword(s):

Intensive Care Unit ◽

Logistic Regression ◽

Intensive Care ◽

Hospital Mortality ◽

Adverse Outcomes ◽

Rank Test ◽

Positive Pressure ◽

Cardiac Intensive Care Unit ◽

Multivariable Logistic Regression ◽

Cardiac Intensive Care

Abstract Background Hyperoxia produces reactive oxygen species, apoptosis, and vasoconstriction, and is associated with adverse outcomes in patients with heart failure and cardiac arrest. Our aim was to evaluate the association between hyperoxia and mortality in patients (pts) receiving positive pressure ventilation (PPV) in the cardiac intensive care unit (CICU). Methods Patients admitted to our medical center CICU who received any PPV (invasive or non-invasive) from 2001 through 2012 were included. Hyperoxia was defined as time-weighted mean of PaO2 >120mmHg and non-hyperoxia as PaO2 ≤120mmHg during CICU admission. Primary outcome was in-hospital mortality. Multivariable logistic regression was used to assess the association between hyperoxia and in-hospital mortality adjusted for age, female sex, Oxford Acute Severity of Illness Score, creatinine, lactate, pH, PaO2/FiO2 ratio, PCO2, PEEP, and estimated time spent on PEEP. Results Among 1493 patients, hyperoxia (median PaO2 147mmHg) during the CICU admission was observed in 702 (47.0%) pts. In-hospital mortality was 29.7% in the non-hyperoxia group and 33.9% in the hyperoxia group ((log rank test, p=0.0282, see figure). Using multivariable logistic regression, hyperoxia was independently associated with in-hospital mortality (OR 1.507, 95% CI 1.311–2.001, p=0.00508). Post-hoc analysis with PaO2 as a continuous variable was consistent with the primary analysis (OR 1.053 per 10mmHg increase in PaO2, 95% CI 1.024–1.082, p=0.0002). Conclusions In a large CICU cohort, hyperoxia was associated with increased mortality. Trials of titration of supplemental oxygen across the full spectrum of critically ill cardiac patients are warranted. Funding Acknowledgement Type of funding source: None

Download Full-text

Prediction of Short-Distance Aerial Movement of Phakopsora pachyrhizi Urediniospores Using Machine Learning

Phytopathology ◽

10.1094/phyto-04-17-0138-fi ◽

2017 ◽

Vol 107 (10) ◽

pp. 1187-1198 ◽

Cited By ~ 7

Author(s):

L. Wen ◽

C. R. Bowen ◽

G. L. Hartman

Keyword(s):

Machine Learning ◽

Random Forest ◽

Short Distance ◽

Soybean Rust ◽

Machine Learning Techniques ◽

Phakopsora Pachyrhizi ◽

Primary Means ◽

Soybean Plants ◽

Selection Operator ◽

Active Trap

Dispersal of urediniospores by wind is the primary means of spread for Phakopsora pachyrhizi, the cause of soybean rust. Our research focused on the short-distance movement of urediniospores from within the soybean canopy and up to 61 m from field-grown rust-infected soybean plants. Environmental variables were used to develop and compare models including the least absolute shrinkage and selection operator regression, zero-inflated Poisson/regular Poisson regression, random forest, and neural network to describe deposition of urediniospores collected in passive and active traps. All four models identified distance of trap from source, humidity, temperature, wind direction, and wind speed as the five most important variables influencing short-distance movement of urediniospores. The random forest model provided the best predictions, explaining 76.1 and 86.8% of the total variation in the passive- and active-trap datasets, respectively. The prediction accuracy based on the correlation coefficient (r) between predicted values and the true values were 0.83 (P < 0.0001) and 0.94 (P < 0.0001) for the passive and active trap datasets, respectively. Overall, multiple machine learning techniques identified the most important variables to make the most accurate predictions of movement of P. pachyrhizi urediniospores short-distance.

Download Full-text

A proof-of-concept study applying machine learning methods to putative risk factors for eating disorders: results from the multi-centre European project on healthy eating

Psychological Medicine ◽

10.1017/s003329172100489x ◽

2021 ◽

pp. 1-10

Author(s):

I. Krug ◽

J. Linardon ◽

C. Greenwood ◽

G. Youssef ◽

J. Treasure ◽

...

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Logistic Regression ◽

Predictive Accuracy ◽

Area Under The Curve ◽

Prediction Rule ◽

Predictive Performance ◽

Individual Risk ◽

European Project ◽

Wide Range

Abstract Background Despite a wide range of proposed risk factors and theoretical models, prediction of eating disorder (ED) onset remains poor. This study undertook the first comparison of two machine learning (ML) approaches [penalised logistic regression (LASSO), and prediction rule ensembles (PREs)] to conventional logistic regression (LR) models to enhance prediction of ED onset and differential ED diagnoses from a range of putative risk factors. Method Data were part of a European Project and comprised 1402 participants, 642 ED patients [52% with anorexia nervosa (AN) and 40% with bulimia nervosa (BN)] and 760 controls. The Cross-Cultural Risk Factor Questionnaire, which assesses retrospectively a range of sociocultural and psychological ED risk factors occurring before the age of 12 years (46 predictors in total), was used. Results All three statistical approaches had satisfactory model accuracy, with an average area under the curve (AUC) of 86% for predicting ED onset and 70% for predicting AN v. BN. Predictive performance was greatest for the two regression methods (LR and LASSO), although the PRE technique relied on fewer predictors with comparable accuracy. The individual risk factors differed depending on the outcome classification (EDs v. non-EDs and AN v. BN). Conclusions Even though the conventional LR performed comparably to the ML approaches in terms of predictive accuracy, the ML methods produced more parsimonious predictive models. ML approaches offer a viable way to modify screening practices for ED risk that balance accuracy against participant burden.

Download Full-text