A Machine-Learning Approach for Dynamic Prediction of Sepsis-Induced Coagulopathy in Critically Ill Patients With Sepsis

2021 ◽  
Vol 7 ◽  
Author(s):  
Qin-Yu Zhao ◽  
Le-Ping Liu ◽  
Jing-Chao Luo ◽  
Yan-Wei Luo ◽  
Huan Wang ◽  
...  

Background: Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients. Objectives: Our study aimed to develop and validate machine-learning models to dynamically predict the risk of SIC in critically ill patients with sepsis. Methods: Machine-learning models were developed and validated based on two public databases named Medical Information Mart for Intensive Care (MIMIC)-IV and the eICU Collaborative Research Database (eICU-CRD). Dynamic prediction of SIC involved an evaluation of the risk of SIC each day after the diagnosis of sepsis using 15 predictive models. The best model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. A compact model was developed, based on 15 features selected according to their importance and clinical availability. These two models were compared with Logistic Regression and SIC scores in terms of SIC prediction. Results: Of 11,362 patients in MIMIC-IV included in the final cohort, a total of 6,744 (59%) patients developed SIC during sepsis. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869; 95% CI: 0.850–0.886). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 (95% CI: 0.832–0.872), while the AUCs of Logistic Regression and SIC scores were 0.746 (95% CI: 0.735–0.755) and 0.709 (95% CI: 0.687–0.733), respectively. A cohort of 35,252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in the external validation were 0.842 (95% CI: 0.837–0.846) and 0.803 (95% CI: 0.798–0.809), respectively, which were still larger than those of Logistic Regression (0.660; 95% CI: 0.653–0.667) and SIC scores (0.752; 95% CI: 0.747–0.757). Prediction results were illustrated by SHapley Additive exPlanations (SHAP) values, which made our models clinically interpretable. Conclusions: We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores.
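The following is a minimal sketch of the core modelling step this abstract describes: fitting a CatBoost classifier to tabular ICU features and scoring it by ROC AUC. The data and feature counts are synthetic placeholders, not the MIMIC-IV variables used in the study, and the hyperparameters are fixed for brevity rather than tuned with Bayesian optimization.

```python
# Illustrative CatBoost pipeline on synthetic tabular data (not the study's data).
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                                   # stand-in ICU features
y = (X[:, 0] + X[:, 1] + rng.normal(size=1000) > 0).astype(int)   # stand-in SIC label

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# In the study, depth and learning_rate would be tuned with a Bayesian optimizer;
# fixed values are used here for brevity.
model = CatBoostClassifier(iterations=300, depth=6, learning_rate=0.1, verbose=0)
model.fit(X_train, y_train)

print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```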

2020 ◽  
Author(s):  
Qin-Yu Zhao ◽  
Le-Ping Liu ◽  
Jing-Chao Luo ◽  
Yan-Wei Luo ◽  
Huan Wang ◽  
...  

Abstract Background Sepsis-induced coagulopathy (SIC) denotes an increased mortality rate and poorer prognosis in septic patients. Methods Machine-learning models were developed based on septic patients who were older than 18 years and stayed in intensive care units (ICUs) for more than 24 hours in the Medical Information Mart for Intensive Care (MIMIC)-IV database. Eighty-eight potential predictors were extracted, and 15 machine-learning models were used to assess the daily risk of SIC. The best-performing model was selected based on its accuracy and area under the receiver operating characteristic curve (AUC), followed by fine-grained hyperparameter adjustment using the Bayesian Optimization Algorithm. The effects of features on prediction scores were measured using SHapley Additive exPlanations (SHAP) values. A compact model was developed, based on 15 features selected according to their importance and clinical availability. The two models were compared with Logistic Regression and SIC scores in terms of SIC prediction. Additionally, an external validation was performed in the eICU Collaborative Research Database (eICU-CRD). Results Of 11,362 patients in MIMIC-IV included in the final cohort, a total of 6,744 (59%) patients had SIC during sepsis, and 16,183 samples were extracted. The model named Categorical Boosting (CatBoost) had the greatest AUC in our study (0.869 [0.850, 0.886]). Coagulation profile and renal function indicators were the most important features for predicting SIC. A compact model was developed with an AUC of 0.854 [0.832, 0.872], while the AUCs of Logistic Regression and SIC scores were 0.746 [0.735, 0.755] and 0.709 [0.687, 0.733], respectively. A cohort of 35,252 septic patients in eICU-CRD was analyzed. The AUCs of the full and the compact models in external validation were 0.842 [0.837, 0.846] and 0.803 [0.798, 0.809], respectively, which were still larger than those of Logistic Regression (0.660 [0.653, 0.667]) and SIC scores (0.752 [0.747, 0.757]). Prediction results can be illustrated with SHAP values at the instance level, which makes our models clinically interpretable. Conclusions We developed two models which were able to dynamically predict the risk of SIC in septic patients better than conventional Logistic Regression and SIC scores. The prediction results of both models can be interpreted using SHAP values.
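As a companion to the CatBoost sketch above, the following illustrates instance-level interpretation with SHAP values, assuming a fitted tree-based classifier; the features are synthetic placeholders rather than the study's clinical variables.

```python
# Illustrative instance-level SHAP explanation for a tree-based model (synthetic data).
import numpy as np
import shap
from catboost import CatBoostClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] - X[:, 3] > 0).astype(int)

model = CatBoostClassifier(iterations=200, depth=4, verbose=0).fit(X, y)

# TreeExplainer computes per-feature contributions for each prediction,
# which is how an individual patient's risk score can be explained.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print("Feature contributions for the first sample:", shap_values[0])
```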


2022 ◽  
Vol 8 ◽  
Author(s):  
Boshen Yang ◽  
Sixuan Xu ◽  
Di Wang ◽  
Yu Chen ◽  
Zhenfa Zhou ◽  
...  

Background: Hypertension is a rather common comorbidity among critically ill patients, and hospital mortality might be higher among critically ill patients with hypertension (SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg). This study aimed to explore the association between ACEI/ARB medication during the ICU stay and all-cause in-hospital mortality in these patients. Methods: A retrospective cohort study was conducted based on data from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, which comprises more than 40,000 ICU patients treated between 2008 and 2019 at Beth Israel Deaconess Medical Center. Adults diagnosed with hypertension on admission and those who had high blood pressure (SBP ≥ 140 mmHg and/or DBP ≥ 90 mmHg) during the ICU stay were included. The primary outcome was all-cause in-hospital mortality. Patients were divided into ACEI/ARB-treated and non-treated groups during the ICU stay. Propensity score matching (PSM) was used to adjust for potential confounders. Nine machine learning models were developed and validated based on 37 clinical and laboratory features of all patients. The model with the best performance was selected based on the area under the receiver operating characteristic curve (AUC) followed by 5-fold cross-validation. After hyperparameter optimization using grid and random hyperparameter search, a final LightGBM model was developed, and SHapley Additive exPlanations (SHAP) values were calculated to evaluate the importance of each feature. The features most closely associated with hospital mortality were presented as significant features. Results: A total of 15,352 patients were enrolled in this study, among whom 5,193 (33.8%) were treated with ACEI/ARB. A significantly lower all-cause in-hospital mortality was observed among patients treated with ACEI/ARB (3.9 vs. 12.7%), as well as a lower 28-day mortality (3.6 vs. 12.2%). The outcome remained consistent after propensity score matching. Among the nine machine learning models, the LightGBM model had the highest AUC (0.9935). SHAP plots were used to interpret the LightGBM model after hyperparameter optimization, showing that ACEI/ARB use was among the top five significant features associated with hospital mortality. Conclusions: The use of ACEI/ARB in critically ill patients with hypertension during the ICU stay was related to lower all-cause in-hospital mortality and was independently associated with increased survival in a large and heterogeneous cohort of critically ill hypertensive patients with or without kidney dysfunction.
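Below is a minimal sketch of the confounder-adjustment step this abstract describes: 1:1 nearest-neighbour propensity score matching of treated and untreated patients. The covariates and the treatment indicator are synthetic placeholders, not the MIMIC-IV variables used in the study.

```python
# Illustrative 1:1 nearest-neighbour propensity score matching on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))                       # baseline covariates
treated = (X[:, 0] + rng.normal(size=2000) > 0.5)    # stand-in ACEI/ARB exposure

# Step 1: estimate the propensity score P(treated | covariates).
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated patient to the control with the closest score.
treated_idx = np.where(treated)[0]
control_idx = np.where(~treated)[0]
nn = NearestNeighbors(n_neighbors=1).fit(ps[control_idx].reshape(-1, 1))
_, matches = nn.kneighbors(ps[treated_idx].reshape(-1, 1))
matched_controls = control_idx[matches.ravel()]
print("Matched pairs:", len(treated_idx))
```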


Author(s):  
Jiancheng Ye ◽  
Liang Yao ◽  
Jiahong Shen ◽  
Rethavathi Janarthanam ◽  
Yuan Luo

Abstract Background Diabetes mellitus is a prevalent metabolic disease characterized by chronic hyperglycemia. The avalanche of healthcare data is accelerating precision and personalized medicine, and artificial intelligence and algorithm-based approaches are becoming increasingly vital to support clinical decision-making. These methods can augment health care providers by taking over some of their routine work, enabling them to focus on critical issues. However, few studies have used predictive modeling to uncover associations between comorbidities in ICU patients and diabetes. This study aimed to use Unified Medical Language System (UMLS) resources together with machine learning and natural language processing (NLP) approaches to predict the risk of mortality. Methods We conducted a secondary analysis of Medical Information Mart for Intensive Care III (MIMIC-III) data. Different machine learning models and NLP approaches were applied. Domain knowledge in health care is built on dictionaries created by experts who define clinical terminologies such as medications or clinical symptoms; this knowledge is valuable for identifying information in text notes that asserts a certain disease. Knowledge-guided models can automatically extract knowledge from clinical notes or biomedical literature containing conceptual entities and the relationships among these concepts. Mortality classification was based on the combination of knowledge-guided features and rules. UMLS entity embedding and a convolutional neural network (CNN) with word embeddings were applied, and Concept Unique Identifiers (CUIs) with entity embeddings were used to build clinical text representations. Results The best configuration of the employed machine learning models yielded a competitive AUC of 0.97. Machine learning models combined with NLP of clinical notes show promise for helping health care providers predict the risk of mortality in critically ill patients. Conclusion UMLS resources and clinical notes are powerful and important tools for predicting mortality in diabetic patients in the critical care setting. The knowledge-guided CNN model is effective (AUC = 0.97) for learning hidden features.
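The following is a minimal sketch, in the spirit of the approach above, of a 1D CNN classifier over embedded concept identifiers (CUIs) predicting mortality. The vocabulary size, sequence length, and data are illustrative placeholders, not the MIMIC-III inputs or the knowledge-guided features used in the study.

```python
# Illustrative CNN over encoded concept-identifier sequences (synthetic data).
import numpy as np
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 5000, 200, 64                 # assumed sizes
X = np.random.randint(0, vocab_size, size=(512, seq_len))      # encoded CUI sequences
y = np.random.randint(0, 2, size=(512,))                       # stand-in mortality labels

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim),     # learn an embedding per concept
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # mortality probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```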


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Martine De Cock ◽  
Rafael Dowsley ◽  
Anderson C. A. Nascimento ◽  
Davis Railsback ◽  
Jianwei Shen ◽  
...  

Abstract Background In biomedical applications, valuable data are often split between owners who cannot openly share them because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When collaboratively training machine learning models with the cryptographic technique named secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to enable practical secure machine learning over distributed data sets. Such optimizations can be tailored to the kind of data and machine learning problem at hand. Methods Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm for training a logistic-regression-like model with a clipped ReLU activation function, and we break down the algorithm into the corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao's garbled circuits, and a series of cryptographic engineering optimizations to improve the performance. Results For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition. Conclusions In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high-dimensional genome data distributed across a local area network.
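Below is a minimal plaintext sketch of the model this abstract describes: a logistic-regression-like classifier whose sigmoid is replaced by a clipped ReLU activation and trained by gradient descent. The secure multi-party computation layer (secret sharing, correlated randomness) is omitted, the gradient update is a simple logistic-regression-style surrogate, and the data are synthetic placeholders, not the gene expression sets used in the paper.

```python
# Plaintext reference sketch of a clipped-ReLU "logistic regression" trained by gradient descent.
import numpy as np

def clipped_relu(z):
    # Piecewise-linear surrogate for the sigmoid: 0 below -0.5, 1 above 0.5,
    # linear in between. Cheap to evaluate inside cryptographic protocols.
    return np.clip(z + 0.5, 0.0, 1.0)

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 30))                 # stand-in gene expression features
y = (X[:, 0] - X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(X.shape[1]), 0.0, 0.1
for _ in range(200):                           # plain gradient descent
    pred = clipped_relu(X @ w + b)
    grad = X.T @ (pred - y) / len(y)
    w -= lr * grad
    b -= lr * np.mean(pred - y)

print("training accuracy:", np.mean((clipped_relu(X @ w + b) > 0.5) == y))
```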


2021 ◽  
Vol 10 (1) ◽  
pp. 99
Author(s):  
Sajad Yousefi

Introduction: Heart disease is often associated with conditions such as clogged arteries caused by sediment accumulation, which leads to chest pain and heart attacks. Many people die of heart disease annually. Most countries have a shortage of cardiovascular specialists, so a significant percentage of misdiagnoses occur. Hence, predicting this disease is a serious issue. Using machine learning models applied to a multidimensional dataset, this article aims to find the most efficient and accurate machine learning models for disease prediction. Material and Methods: Several algorithms were utilized to predict heart disease, notably the Decision Tree, Random Forest, and KNN supervised machine learning algorithms. The algorithms were applied to a dataset taken from the UCI repository containing 294 samples with heart disease features. To enhance algorithm performance, these features were analyzed, and feature importance scores and cross-validation were considered. Results: The algorithms' performance was compared using the ROC curve and criteria such as accuracy, precision, sensitivity, and F1 score for each model. The Decision Tree algorithm achieved an accuracy of 83% and an AUC ROC of 99%. The Logistic Regression algorithm, with an accuracy of 88% and an AUC ROC of 91%, performed better than the other algorithms. Therefore, these techniques can be useful for physicians to identify heart disease patients and treat them correctly. Conclusion: Machine learning techniques can be used in medicine to analyze disease-related data collections and predict disease. The area under the ROC curve and the evaluation criteria of a number of machine learning classification algorithms were compared to determine the most appropriate classifier for heart disease prediction. As a result of this evaluation, better performance was observed in both the Decision Tree and Logistic Regression models.
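The following is a minimal sketch of the comparison this abstract describes: several supervised classifiers evaluated by cross-validated ROC AUC. Synthetic features stand in for the UCI heart disease attributes, and the reported scores will differ from the article's results.

```python
# Illustrative comparison of classifiers by 5-fold cross-validated ROC AUC (synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(294, 13))                                   # 294 samples, 13 features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=294) > 0).astype(int)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4),
    "Random Forest": RandomForestClassifier(n_estimators=200),
    "KNN": KNeighborsClassifier(n_neighbors=7),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
for name, clf in models.items():
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```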


2021 ◽  
Vol 23 (2) ◽  
pp. 359-370
Author(s):  
Michał Matuszczak ◽  
Mateusz Żbikowski ◽  
Andrzej Teodorczyk

The article proposes an approach based on deep learning and machine learning models to predict component failure as an enhancement of the condition-based maintenance scheme of a turbofan engine, and reviews prognostics approaches currently used in the aviation industry. A component degradation scale representing life consumption is proposed, and the collected condition data are combined with engine sensor and environmental data. Using data manipulation techniques, a framework for model training is created, and the models' hyperparameters are obtained through Bayesian optimization. The models predict the continuous variable representing condition based on the input. The best-performing model is identified by determining its score on the holdout set. Deep learning models achieved an MSE score of 0.71 (an ensemble meta-model of neural networks) and significantly outperformed the machine learning models, whose best score was 1.75. The deep learning models demonstrated their feasibility for predicting component condition within less than 1 unit of error on the rank scale.
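Below is a minimal sketch of the evaluation scheme the article describes: a neural-network regressor and a classical machine-learning regressor predicting a continuous condition variable from sensor inputs, compared by MSE on a holdout set. The sensor data are synthetic placeholders, not the engine data from the article, and no Bayesian hyperparameter tuning is performed here.

```python
# Illustrative regressor comparison by holdout MSE on synthetic sensor data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 24))                                        # stand-in sensor readings
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=2000)   # stand-in condition

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for name, reg in [("neural network", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500)),
                  ("gradient boosting", GradientBoostingRegressor())]:
    reg.fit(X_tr, y_tr)
    print(name, "holdout MSE:", mean_squared_error(y_te, reg.predict(X_te)))
```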


mSystems ◽  
2019 ◽  
Vol 4 (4) ◽  
Author(s):  
Finlay Maguire ◽  
Muhammad Attiq Rehman ◽  
Catherine Carrillo ◽  
Moussa S. Diarra ◽  
Robert G. Beiko

ABSTRACT Nontyphoidal Salmonella (NTS) is a leading global cause of bacterial foodborne morbidity and mortality. Our ability to treat severe NTS infections has been impaired by increasing antimicrobial resistance (AMR). To understand and mitigate the global health crisis AMR represents, we need to link the observed resistance phenotypes with their underlying genomic mechanisms. Broiler chickens represent a key reservoir and vector for NTS infections, but isolates from this setting have been characterized in only very low numbers relative to clinical isolates. In this study, we sequenced and assembled 97 genomes encompassing 7 serotypes isolated from broiler chicken farms in British Columbia between 2005 and 2008. By applying machine learning (ML) models to predict the observed AMR phenotype from these genomic data, we were able to generate highly precise (0.92 to 0.99) logistic regression models using known AMR gene annotations as features for 7 antibiotics (amoxicillin-clavulanic acid, ampicillin, cefoxitin, ceftiofur, ceftriaxone, streptomycin, and tetracycline). Similarly, we also trained "reference-free" k-mer-based set-covering machine phenotypic prediction models (0.91 to 1.0 precision) for these antibiotics. By combining the inferred k-mers and logistic regression weights, we identified the primary drivers of AMR for the 7 studied antibiotics in these isolates. With our research representing one of the largest studies of a diverse set of NTS isolates from broiler chicken, we can confirm that the AmpC-like CMY-2 β-lactamase is a primary driver of β-lactam resistance and that the phosphotransferases APH(6)-Id and APH(3″)-Ib are the principal drivers of streptomycin resistance in this important ecosystem. IMPORTANCE Antimicrobial resistance (AMR) represents an existential threat to the function of modern medicine. Genomics and machine learning methods are being increasingly used to analyze and predict AMR, and this type of surveillance is very important for reducing the impact of AMR. Machine learning models are typically trained on genomic data, but the aspects of the genomes they use to make predictions are rarely analyzed. In this work, we showed how, by using different types of machine learning models and performing this analysis, it is possible to identify the key genes underlying AMR in nontyphoidal Salmonella (NTS). NTS is among the leading causes of foodborne illness globally; however, AMR in NTS has not been heavily studied within the food chain itself. Therefore, in this work we performed a broad-scale analysis of AMR in NTS isolates from commercial chicken farms and identified priority AMR genes for surveillance.
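The following is a minimal sketch of one of the modelling strategies this abstract describes: a logistic regression that predicts a resistance phenotype from binary presence/absence of annotated AMR genes, scored by precision, with the learned coefficients used to inspect which genes drive the calls. The gene columns and labels are synthetic placeholders, not the Salmonella genomes analysed in the study.

```python
# Illustrative logistic regression on AMR gene presence/absence (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(6)
n_isolates, n_genes = 97, 40
X = rng.integers(0, 2, size=(n_isolates, n_genes))   # AMR gene presence/absence matrix
y = X[:, 0] | (X[:, 1] & X[:, 2])                     # stand-in resistance phenotype

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("precision:", precision_score(y_te, clf.predict(X_te)))

# The magnitude of each coefficient indicates how strongly a gene drives the prediction.
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:5]
print("most influential gene columns:", top)
```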


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Seulkee Lee ◽  
Yeonghee Eun ◽  
Hyungjin Kim ◽  
Hoon-Suk Cha ◽  
Eun-Mi Koh ◽  
...  

Abstract We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. The baseline demographic and laboratory data of patients who visited the Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early-TNF and non-early-TNF users. Machine learning models were formulated to predict the early-TNF users using the baseline data, and feature importance analysis was performed to delineate significant baseline characteristics. The numbers of early-TNF and non-early-TNF users were 90 and 505, respectively. The performance of the ANN model, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.783, was superior to that of the logistic regression, support vector machine, random forest, and XGBoost models (AUC of 0.719, 0.699, 0.761, and 0.713, respectively) in predicting early-TNF users. Feature importance analysis revealed CRP and ESR as the top significant baseline characteristics for predicting early-TNF users. Our model displayed superior performance in predicting early-TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases.
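Below is a minimal sketch of the workflow this abstract describes: a small neural-network classifier scored by ROC AUC, followed by permutation feature importance to surface the most influential baseline variables. The features are synthetic placeholders (e.g., columns 0 and 1 merely stand in for variables such as CRP and ESR), not the clinic data used in the study.

```python
# Illustrative ANN classifier with permutation feature importance (synthetic data).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(7)
X = rng.normal(size=(595, 12))                    # 595 patients, 12 baseline features
y = (1.5 * X[:, 0] + X[:, 1] + rng.normal(size=595) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, ann.predict_proba(X_te)[:, 1]))

# Rank features by how much shuffling each one degrades the AUC on held-out data.
imp = permutation_importance(ann, X_te, y_te, scoring="roc_auc", n_repeats=10)
print("top features by importance:", np.argsort(imp.importances_mean)[::-1][:3])
```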

