A machine-learning approach for dynamic prediction of sepsis-induced coagulopathy in critically ill patients with sepsis: an integrated analysis of the MIMIC-IV and eICU-CRD databases

Abstract Background Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients. Methods Machine-learning models were developed based on septic patients who were older than 18 years and stayed in intensive care units (ICUs) for more than 24 hours in Medical Information Mart for Intensive Care (MIMIC)-IV. Eighty-eight potential predictors were extracted, and 15 various machine-learning models assessed the daily risk of SIC. The most potent model was selected based on its accuracy and Area Under the receiver operating characteristic Curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. The effects of features on prediction scores were measured using the SHapley Additive exPlanations (SHAP) values. A compact model was developed, based on 15 features selected according to their importance and clinical availability. Two models were compared with Logistic Regression and SIC scores in terms of SIC prediction. Additionally, an external validation was performed in the eICU Collaborative Research Database (eICU-CRD). Results Of 11362 patients in MIMIC-IV included in the final cohort, a total of 6744 (59%) patients had SIC during sepsis, and 16183 samples were extracted. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869 [0.850, 0.886]). Coagulation profile and renal function indicators are the most important features to predict SIC. A compact model was developed with the AUC of 0.854 [0.832, 0.872], while the AUCs of Logistic Regression and SIC scores were 0.746 [0.735, 0.755] and 0.709 [0.687, 0.733], respectively. A cohort of 35252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in external validation were 0.842 [0.837, 0.846] and 0.803 [0.798, 0.809], respectively, which were still larger than those of Logistic Regression (0.660 [0.653, 0.667]) and SIC scores (0.752 [0.747, 0.757]). Prediction results can be illustrated by using SHAP values in the instance level, which makes our models clinically interpretable. Conclusions We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores. Prediction results of our two models can be interpreted by using SHAP values.

Download Full-text

A Machine-Learning Approach for Dynamic Prediction of Sepsis-Induced Coagulopathy in Critically Ill Patients With Sepsis

Frontiers in Medicine ◽

10.3389/fmed.2020.637434 ◽

2021 ◽

Vol 7 ◽

Author(s):

Qin-Yu Zhao ◽

Le-Ping Liu ◽

Jing-Chao Luo ◽

Yan-Wei Luo ◽

Huan Wang ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Critically Ill ◽

Critically Ill Patients ◽

Compact Model ◽

Bayesian Optimization ◽

Research Database ◽

Learning Models ◽

Dynamic Prediction ◽

Machine Learning Models

Background: Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients.Objectives: Our study aimed to develop and validate machine-learning models to dynamically predict the risk of SIC in critically ill patients with sepsis.Methods: Machine-learning models were developed and validated based on two public databases named Medical Information Mart for Intensive Care (MIMIC)-IV and the eICU Collaborative Research Database (eICU-CRD). Dynamic prediction of SIC involved an evaluation of the risk of SIC each day after the diagnosis of sepsis using 15 predictive models. The best model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. A compact model was developed, based on 15 features selected according to their importance and clinical availability. These two models were compared with Logistic Regression and SIC scores in terms of SIC prediction.Results: Of 11,362 patients in MIMIC-IV included in the final cohort, a total of 6,744 (59%) patients developed SIC during sepsis. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869; 95% CI: 0.850–0.886). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 (95% CI: 0.832–0.872), while the AUCs of Logistic Regression and SIC scores were 0.746 (95% CI: 0.735–0.755) and 0.709 (95% CI: 0.687–0.733), respectively. A cohort of 35,252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in the external validation were 0.842 (95% CI: 0.837–0.846) and 0.803 (95% CI: 0.798–0.809), respectively, which were still larger than those of Logistic Regression (0.660; 95% CI: 0.653–0.667) and SIC scores (0.752; 95% CI: 0.747–0.757). Prediction results were illustrated by SHapley Additive exPlanations (SHAP) values, which made our models clinically interpretable.Conclusions: We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores.

Download Full-text

Application of Machine Learning to Predict Acute Kidney Disease in Patients With Sepsis Associated Acute Kidney Injury

Frontiers in Medicine ◽

10.3389/fmed.2021.792974 ◽

2021 ◽

Vol 8 ◽

Author(s):

Jiawei He ◽

Jin Lin ◽

Meili Duan

Keyword(s):

Machine Learning ◽

Acute Kidney Injury ◽

Logistic Regression ◽

Intensive Care ◽

Kidney Injury ◽

Learning Models ◽

Short Term ◽

Acute Kidney Disease ◽

Mimic Iii ◽

Machine Learning Models

Background: Sepsis-associated acute kidney injury (AKI) is frequent in patients admitted to intensive care units (ICU) and may contribute to adverse short-term and long-term outcomes. Acute kidney disease (AKD) reflects the adverse events developing after AKI. We aimed to develop and validate machine learning models to predict the occurrence of AKD in patients with sepsis-associated AKI.Methods: Using clinical data from patients with sepsis in the ICU at Beijing Friendship Hospital (BFH), we studied whether the following three machine learning models could predict the occurrence of AKD using demographic, laboratory, and other related variables: Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM), decision trees, and logistic regression. In addition, we externally validated the results in the Medical Information Mart for Intensive Care III (MIMIC III) database. The outcome was the diagnosis of AKD when defined as AKI prolonged for 7–90 days according to Acute Disease Quality Initiative-16.Results: In this study, 209 patients from BFH were included, with 55.5% of them diagnosed as having AKD. Furthermore, 509 patients were included from the MIMIC III database, of which 46.4% were diagnosed as having AKD. Applying machine learning could successfully achieve very high accuracy (RNN-LSTM AUROC = 1; decision trees AUROC = 0.954; logistic regression AUROC = 0.728), with RNN-LSTM showing the best results. Further analyses revealed that the change of non-renal Sequential Organ Failure Assessment (SOFA) score between the 1st day and 3rd day (Δnon-renal SOFA) is instrumental in predicting the occurrence of AKD.Conclusion: Our results showed that machine learning, particularly RNN-LSTM, can accurately predict AKD occurrence. In addition, Δ SOFAnon−renal plays an important role in predicting the occurrence of AKD.

Download Full-text

High performance logistic regression for privacy-preserving genome analysis

BMC Medical Genomics ◽

10.1186/s12920-020-00869-9 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Martine De Cock ◽

Rafael Dowsley ◽

Anderson C. A. Nascimento ◽

Davis Railsback ◽

Jianwei Shen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Genome Analysis ◽

Local Area Network ◽

Local Area ◽

Activation Function ◽

Area Network ◽

Learning Models ◽

Data Set ◽

Machine Learning Models

Abstract Background In biomedical applications, valuable data is often split between owners who cannot openly share the data because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technology challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the data of the owners private is an increase in computational cost and runtime. A careful choice of machine learning techniques, algorithmic and implementation optimizations are a necessity to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and Machine Learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic regression like model with a clipped ReLu activation function, and we break down the algorithm into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.

Download Full-text

Prospective and external validation of stroke discharge planning machine learning models

Journal of Clinical Neuroscience ◽

10.1016/j.jocn.2021.12.031 ◽

2022 ◽

Vol 96 ◽

pp. 80-84

Author(s):

Stephen Bacchi ◽

Luke Oakden-Rayner ◽

David K Menon ◽

Andrew Moey ◽

Jim Jannes ◽

...

Keyword(s):

Machine Learning ◽

Discharge Planning ◽

External Validation ◽

Learning Models ◽

Machine Learning Models

Download Full-text

Comparison of the Performance of Machine Learning Algorithms in Predicting Heart Disease

Frontiers in Health Informatics ◽

10.30699/fhi.v10i1.349 ◽

2021 ◽

Vol 10 (1) ◽

pp. 99

Author(s):

Sajad Yousefi

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Heart Disease ◽

Decision Tree ◽

Roc Curve ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Learning Models ◽

Algorithm Performance ◽

Machine Learning Models

Introduction: Heart disease is often associated with conditions such as clogged arteries due to the sediment accumulation which causes chest pain and heart attack. Many people die due to the heart disease annually. Most countries have a shortage of cardiovascular specialists and thus, a significant percentage of misdiagnosis occurs. Hence, predicting this disease is a serious issue. Using machine learning models performed on multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction.Material and Methods: Several algorithms were utilized to predict heart disease among which Decision Tree, Random Forest and KNN supervised machine learning are highly mentioned. The algorithms are applied to the dataset taken from the UCI repository including 294 samples. The dataset includes heart disease features. To enhance the algorithm performance, these features are analyzed, the feature importance scores and cross validation are considered.Results: The algorithm performance is compared with each other, so that performance based on ROC curve and some criteria such as accuracy, precision, sensitivity and F1 score were evaluated for each model. As a result of evaluation, Accuracy, AUC ROC are 83% and 99% respectively for Decision Tree algorithm. Logistic Regression algorithm with accuracy and AUC ROC are 88% and 91% respectively has better performance than other algorithms. Therefore, these techniques can be useful for physicians to predict heart disease patients and prescribe them correctly.Conclusion: Machine learning technique can be used in medicine for analyzing the related data collections to a disease and its prediction. The area under the ROC curve and evaluating criteria related to a number of classifying algorithms of machine learning to evaluate heart disease and indeed, the prediction of heart disease is compared to determine the most appropriate classification. As a result of evaluation, better performance was observed in both Decision Tree and Logistic Regression models.

Download Full-text

Predictive modelling of turbofan engine components condition using machine and deep learning methods

Eksploatacja i Niezawodnosc - Maintenance and Reliability ◽

10.17531/ein.2021.2.16 ◽

2021 ◽

Vol 23 (2) ◽

pp. 359-370

Author(s):

Michał Matuszczak ◽

Mateusz Żbikowski ◽

Andrzej Teodorczyk

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Predictive Modelling ◽

Continuous Variable ◽

Environmental Data ◽

Bayesian Optimization ◽

Aviation Industry ◽

Learning Models ◽

Turbofan Engine ◽

Machine Learning Models

The article proposes an approach based on deep and machine learning models to predict a component failure as an enhancement of condition based maintenance scheme of a turbofan engine and reviews currently used prognostics approaches in the aviation industry. Component degradation scale representing its life consumption is proposed and such collected condition data are combined with engines sensors and environmental data. With use of data manipulation techniques, a framework for models training is created and models' hyperparameters obtained through Bayesian optimization. Models predict the continuous variable representing condition based on the input. Best performed model is identified by detemining its score on the holdout set. Deep learning models achieved 0.71 MSE score (ensemble meta-model of neural networks) and outperformed significantly machine learning models with their best score at 1.75. The deep learning models shown their feasibility to predict the component condition within less than 1 unit of the error in the rank scale.

Download Full-text

Machine Learning Models Have Better Performance than Traditional Logistic Regression Models in Predicting the Risk of Diabetes

SSRN Electronic Journal ◽

10.2139/ssrn.3854672 ◽

2021 ◽

Author(s):

Yaqian Mao ◽

Shuyao Pan ◽

Zheng Zhu ◽

Wei Lin ◽

Junping Wen ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Regression Models ◽

Learning Models ◽

Logistic Regression Models ◽

Machine Learning Models

Download Full-text

Identification of Primary Antimicrobial Resistance Drivers in Agricultural Nontyphoidal Salmonella enterica Serovars by Using Machine Learning

mSystems ◽

10.1128/msystems.00211-19 ◽

2019 ◽

Vol 4 (4) ◽

Cited By ~ 1

Author(s):

Finlay Maguire ◽

Muhammad Attiq Rehman ◽

Catherine Carrillo ◽

Moussa S. Diarra ◽

Robert G. Beiko

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Antimicrobial Resistance ◽

Broiler Chicken ◽

Genomic Data ◽

Set Covering ◽

Learning Models ◽

Commercial Chicken ◽

The Impact ◽

Machine Learning Models

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken in farms in British Columbia between 2005 and 2008. Through application of machine learning (ML) models to predict the observed AMR phenotype from this genomic data, we were able to generate highly (0.92 to 0.99) precise logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained “reference-free” k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can thus confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″-Ib) are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR. This type of surveillance is very important to try to reduce the impact of AMR. Machine learning models are typically trained using genomic data, but the aspects of the genomes that they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading cause of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of the AMR in NTS isolates from commercial chicken farms and identified some priority AMR genes for surveillance.

Download Full-text

A comparison of regularized logistic regression and random forest machine learning models for daytime diagnosis of obstructive sleep apnea

Medical & Biological Engineering & Computing ◽

10.1007/s11517-020-02206-9 ◽

2020 ◽

Vol 58 (10) ◽

pp. 2517-2529

Author(s):

Farahnaz Hajipour ◽

Mohammad Jafari Jozani ◽

Zahra Moussavi

Keyword(s):

Machine Learning ◽

Obstructive Sleep Apnea ◽

Logistic Regression ◽

Sleep Apnea ◽

Random Forest ◽

Learning Models ◽

Obstructive Sleep ◽

Machine Learning Models

Download Full-text

Machine learning to predict early TNF inhibitor users in patients with ankylosing spondylitis

Scientific Reports ◽

10.1038/s41598-020-75352-7 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Seulkee Lee ◽

Yeonghee Eun ◽

Hyungjin Kim ◽

Hoon-Suk Cha ◽

Eun-Mi Koh ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Ankylosing Spondylitis ◽

Tnf Inhibitor ◽

Ann Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

AbstractWe aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data. Feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The performance of the ANN model, based on the area under curve (AUC) for a receiver operating characteristic curve (ROC) of 0.783, was superior to logistic regression, support vector machine, random forest, and XGBoost models (for an ROC curve of 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed CRP and ESR as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.

Download Full-text