A machine learning predictive model of in-hospital mortality in patients with sepsis complicated by anemia: a retrospective study based on the MIMIC-III database

Author(s):  
Xiaobin Liu ◽  
Yu Zhao ◽  
Yingyi Qin ◽  
Dan Wang ◽  
Xi Yin ◽  
...  

Abstract Background Patients with sepsis complicated by anemia have a higher risk of mortality. It is clinically important to study the risk factors associated with the prognosis of this disease. The aim of this study was to establish a predictive model of mortality during hospitalization by extracting clinical data from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Methods The clinical data of patients with sepsis complicated by anemia in the MIMIC-III database were retrospectively analyzed. Indexes were screened by stepwise logistic regression (LR), and machine learning predictive models such as Decision Tree (DT), Random Forests (RF), and eXtreme Gradient Boosting (XGBoost) were developed and compared, identifying advantages and disadvantages of each model. Results A total of 13,547 patients with sepsis complicated by anemia were included in the study, among which 1,827 died during hospitalization and 11,720 were still alive at discharge. The preliminary stepwise regression model selected 20 clinical indexes, including Elixhauser comorbidity index, maximum blood urea nitrogen (BUN), and maximum hemoglobin reduction. The predictive models showed good discriminative ability (area under the receiver operating characteristic curve [AUROC]: LR, 0.777; DT, 0.726; RF, 0.788; XGBoost, 0.815) and goodness of fit (area under the precision-recall curve [AUPRC]: LR, 0.350; DT, 0.290; RF, 0.400; XGBoost, 0.428). The Shapley Additive exPlanation (SHAP) values in the XGBoost model showed that Elixhauser comorbidity index, maximum BUN, maximum hemoglobin reduction, ventilator use within 24 hours of admission, and age were significant features for predicting in-hospital mortality in patients with sepsis complicated by anemia. Conclusions The XGBoost model had better discrimination ability and goodness of fit when compared with other models. 
Machine learning algorithms have significant practical value in the development of an early warning system for patients with sepsis complicated by anemia.
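
The two metrics reported above can be illustrated with a minimal sketch. It is not the authors' code: the function names `auroc` and `average_precision` are hypothetical, the toy labels and scores are made up, and average precision is used here as one common estimate of the area under the precision-recall curve.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each positive.

    Tied scores are ranked in arbitrary order in this simplified version.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return total / hits


# Toy data (illustrative only): 1 = died in hospital, 0 = survived.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

When reading AUPRC values, it helps to compare them with the event rate: a classifier with no skill has an expected AUPRC close to the prevalence, which in this cohort is 1,827/13,547, or about 0.135.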

2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
J A Ortiz ◽  
R Morales ◽  
B Lledo ◽  
E Garcia-Hernandez ◽  
A Cascales ◽  
...  

Abstract Study question Is it possible to predict the likelihood of an IVF embryo being aneuploid and/or mosaic using a machine learning algorithm? Summary answer There are paternal, maternal, embryonic and IVF-cycle factors that are associated with embryonic chromosomal status that can be used as predictors in machine learning models. What is known already The factors associated with embryonic aneuploidy have been extensively studied. Maternal age, and to a lesser extent male factor and ovarian stimulation, have been related to the occurrence of chromosomal alterations in the embryo. On the other hand, the main factors that may increase the incidence of embryo mosaicism have not yet been established. The models obtained using classical statistical methods to predict embryonic aneuploidy and mosaicism are not of high reliability. As an alternative to traditional methods, different machine and deep learning algorithms are being used to generate predictive models in different areas of medicine, including human reproduction. Study design, size, duration The study design is observational and retrospective. A total of 4654 embryos from 1558 PGT-A cycles were included (January-2017 to December-2020). The trophectoderm biopsies on D5, D6 or D7 blastocysts were analysed by NGS. Embryos with ≤25% aneuploid cells were considered euploid, those with 25–50% were classified as mosaic, and those with >50% as aneuploid. The variables of the PGT-A were recorded in a database from which predictive models of embryonic aneuploidy and mosaicism were developed. Participants/materials, setting, methods The main indications for PGT-A were advanced maternal age, abnormal sperm FISH and recurrent miscarriage or implantation failure. Embryo analyses were performed using Veriseq-NGS (Illumina). The software used to carry out all the analyses was R (RStudio). The library used to implement the different algorithms was caret. 
In the machine learning models, 22 predictor variables were introduced, which can be classified into 4 categories: maternal, paternal, embryonic and those specific to the IVF cycle. Main results and the role of chance The different couple, embryo and stimulation cycle variables were recorded in a database (22 predictor variables). Two different predictive models were built, one for aneuploidy and the other for mosaicism. The predicted variable was of multi-class type since it included the segmental and whole chromosome alteration categories. The data frame was first preprocessed and the different classes to be predicted were balanced. Of the data, 80% were used for training the model and 20% were reserved for further testing. The classification algorithms applied include multinomial regression, neural networks, support vector machines, neighborhood-based methods, classification trees, gradient boosting, ensemble methods, Bayesian and discriminant analysis-based methods. The algorithms were optimized by minimizing the log loss (Log_Loss), which measures accuracy while penalizing misclassifications. The best predictive models were achieved with the XG-Boost and random forest algorithms. The AUC of the predictive model for aneuploidy was 80.8% (Log_Loss: 1.028) and for mosaicism 84.1% (Log_Loss: 0.929). The best predictor variables of the models were maternal age, embryo quality, day of biopsy and whether or not the couple had a history of pregnancies with chromosomopathies. The male factor only played a relevant role in the mosaicism model but not in the aneuploidy model. Limitations, reasons for caution Although the predictive models obtained can be very useful to know the probabilities of achieving euploid embryos in an IVF cycle, increasing the sample size and including additional variables could improve the models and thus increase their predictive capacity. 
Wider implications of the findings Machine learning can be a very useful tool in reproductive medicine since it can allow the determination of factors associated with embryonic aneuploidies and mosaicism in order to establish a predictive model for both. Identifying couples at risk of embryo aneuploidy/mosaicism could help them benefit from the use of PGT-A. Trial registration number Not Applicable
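
The 80%/20% split and the multi-class log loss mentioned in the abstract can be sketched as follows. The study itself used R with the caret library; this Python version is only an illustration, and the names `train_test_split` and `log_loss`, the seed, and the probability clipping value are assumptions made for the example.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def log_loss(true_classes, predicted_probs, eps=1e-15):
    """Mean negative log of the probability assigned to the true class.

    true_classes holds class indices; predicted_probs holds one list of
    class probabilities per observation. Probabilities are clipped to
    [eps, 1 - eps] so that log(0) never occurs.
    """
    total = 0.0
    for cls, probs in zip(true_classes, predicted_probs):
        p = min(max(probs[cls], eps), 1 - eps)
        total -= math.log(p)
    return total / len(true_classes)


train, test = train_test_split(range(100))
print(len(train), len(test))
print(log_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]]))
```

As a reference point for interpreting log loss values, a model that always predicts a uniform distribution over K classes scores ln K, so lower values indicate that more probability is being placed on the correct class.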


2021 ◽  
Author(s):  
Yue Yu ◽  
Chi Peng ◽  
Zhiyuan Zhang ◽  
Kejia Shen ◽  
Yufeng Zhang ◽  
...  

Abstract Background Establishing a mortality prediction model of patients undergoing cardiac surgery might be useful for clinicians for alerting, judgment, and intervention, yet few predictive tools for long-term mortality have been developed targeting patients post-cardiac surgery. Objective We aimed to construct and validate several machine learning (ML) algorithms to predict long-term mortality and identify risk factors in unselected patients after cardiac surgery during a 4-year follow-up. Methods The Medical Information Mart for Intensive Care (MIMIC-III) database was used to perform a retrospective administrative database study. Candidate predictors consisted of the demographics, comorbidity, vital signs, laboratory test results, prognostic scoring systems, and treatment information on the first day of ICU admission. 4-year mortality was set as the study outcome. We used the ML methods of logistic regression (LR), artificial neural network (NNET), naïve Bayes (NB), gradient boosting machine (GBM), adaptive boosting (Ada), random forest (RF), bagged trees (BT), and eXtreme Gradient Boosting (XGB). The prognostic capacity and clinical utility of these ML models were compared using the area under the receiver operating characteristic curves (AUC), calibration curves, and decision curve analysis (DCA). Results Of 7,368 patients in MIMIC-III included in the final cohort, a total of 1,337 (18.15%) patients died during a 4-year follow-up. Among 65 variables extracted from the database, a total of 25 predictors were selected using recursive feature elimination (RFE) and included in the subsequent analysis. The Ada model performed best among eight models in both discriminatory ability with the highest AUC of 0.801 and goodness of fit (visualized by calibration curve). Moreover, the DCA shows that the net benefit of the RF, Ada, and BT models surpassed that of other ML models for almost all threshold probability values. 
Additionally, through the Ada technique, we determined that red blood cell distribution width (RDW), blood urea nitrogen (BUN), SAPS II, anion gap (AG), age, urine output, chloride, creatinine, congestive heart failure, and SOFA were the Top 10 predictors in the feature importance rankings. Conclusions The Ada model performs best in predicting long-term mortality after cardiac surgery among the eight ML models. The ML-based algorithms might have significant application in the development of early warning systems for patients following operations.
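
Decision curve analysis, as used above, compares models by their net benefit across threshold probabilities. A minimal sketch of the standard net benefit formula follows; the function names and the toy data are hypothetical and are not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    Computed as TP/n - FP/n * threshold / (1 - threshold), where the last
    factor is the odds at the threshold probability.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (illustrative only): 1 = died during follow-up.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.4, 0.6, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(toy_labels, toy_probs, pt),
          net_benefit_treat_all(toy_labels, pt))
```

A decision curve plots these values over a range of thresholds; a model is useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.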


2021 ◽  
Vol 8 (1) ◽  
pp. e000761
Author(s):  
Hao Du ◽  
Kewin Tien Ho Siah ◽  
Valencia Zhang Ru-Yan ◽  
Readon Teh ◽  
Christopher Yu En Tan ◽  
...  

Research objectives Clostridioides difficile infection (CDI) is a major cause of healthcare-associated diarrhoea with high mortality. There is a lack of validated predictors for severe outcomes in CDI. The aim of this study is to derive and validate a clinical prediction tool for CDI in-hospital mortality using a large critical care database. Methodology The demographics, clinical parameters, laboratory results and mortality of CDI were extracted from the Medical Information Mart for Intensive Care-III (MIMIC-III) database. We subsequently trained three machine learning models: logistic regression (LR), random forest (RF) and gradient boosting machine (GBM) to predict in-hospital mortality. The individual performances of the models were compared against current severity scores (Clostridioides difficile Associated Risk of Death Score (CARDS) and ATLAS (Age, Treatment with systemic antibiotics, Leukocyte count, Albumin and Serum creatinine as a measure of renal function)) by calculating the area under the receiver operating characteristic curve (AUROC). We identified factors associated with higher mortality risk in each model. Summary of results From 61 532 intensive care unit stays in the MIMIC-III database, there were 1315 CDI cases. The mortality rate for CDI in the study cohort was 18.33%. AUROC was 0.69 (95% CI, 0.60 to 0.76) for LR, 0.71 (95% CI, 0.62 to 0.77) for RF and 0.72 (95% CI, 0.64 to 0.78) for GBM, while the AUROC of the existing scores was 0.57 (95% CI, 0.51 to 0.65) for CARDS and 0.63 (95% CI, 0.54 to 0.70) for ATLAS. Albumin, lactate and bicarbonate were significant mortality factors for all the models. Free calcium, potassium, white blood cell, urea, platelet and mean blood pressure were present in at least two of the three models. Conclusion Our machine learning derived CDI in-hospital mortality prediction model identified pertinent factors that can assist critical care clinicians in identifying patients at high risk of dying from CDI.


Author(s):  
Chunsheng Yang ◽  
Yanni Zou ◽  
Jie Liu ◽  
Kyle R Mulligan

In the past decades, machine learning techniques and algorithms, particularly classifiers, have been widely applied to various real-world applications such as PHM. In developing high-performance classifiers, or machine learning-based models, i.e. predictive models for PHM, predictive model evaluation remains a challenge. Generic methods such as accuracy may not fully meet the needs of model evaluation for prognostic applications. This paper addresses this issue from the point of view of PHM systems. Generic methods are first reviewed while outlining their limitations or deficiencies with respect to PHM. Then, two approaches developed for evaluating predictive models are presented with emphasis on the specificities and requirements of PHM. A real prognostic application is studied to demonstrate the usefulness of the two proposed methods for predictive model evaluation. We argue that predictive models for PHM must be evaluated not only using generic methods, but also domain-oriented approaches in order to deploy the models in real-world applications.


2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
F Garcia-Rodeja Arias ◽  
M Perez Dominguez ◽  
J Martinon Martinez ◽  
J M Garcia Acuna ◽  
C Abou Joch Casas ◽  
...  

Abstract Introduction and objectives Cardiogenic shock is a condition caused by reduced cardiac output and hypotension, resulting in end-organ damage and multiorgan failure. Although prognosis has improved in recent years, this state is still associated with high morbidity and mortality. The aim of our study was to develop a predictive model for in-hospital mortality that allows stratifying the risk of death in patients with cardiogenic shock. Methods This is a retrospective analysis of a prospective registry that included 135 patients from one Spanish university hospital between 2011 and 2020. Multivariate analysis was performed among those variables associated with short-term outcome in univariate analysis with a p-value <0.2. Variables with a p-value >0.1 in the multivariable analysis were excluded from the final model. The model was assessed using the area under the ROC curve (AUC). Goodness of fit was tested using the Hosmer-Lemeshow test. Finally, we built a risk score using the weighted coefficients of a simplified model created after categorizing the continuous quantitative variables included in the final model, giving a maximum of 16 points and creating three categories of risk. Results The in-hospital mortality rate was 41.5%, the mean age was 74.2 years, 35.6% were female and acute coronary syndrome (ACS) was the main cause of shock (60.7%). Mitral regurgitation (moderate-severe), age, ACS etiology, NT-proBNP, blood hemoglobin and lactate at admission were included in the final model. The risk-adjustment model had good accuracy in predicting in-hospital mortality (AUC 0.85; 95% CI 0.78–0.90) and the goodness-of-fit test gave a p-value >0.10. 
According to the risk score made with the simplified model, these patients were stratified into three categories: low (scores 0–6), intermediate (scores 7–10), and high (scores 11–16) risk, with observed mortality of 12.9%, 49.1% and 87.5% respectively (p<0.001). Conclusions Our predictive model, using six variables, shows good discrimination for in-hospital mortality, and the risk score has identified three groups with significant differences in prognosis. This model could help in guiding treatments and clinical decision-making, but it needs external validation and comparison with other published models. Funding Acknowledgement Type of funding sources: None. Figures: ROC curve; Risk Score
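
The three risk categories reported above can be expressed as a small lookup. This sketch uses only the cut-offs and observed mortality figures stated in the abstract; the point weights of the individual variables are not given there, so the total score is taken as an input. The names `risk_category` and `OBSERVED_MORTALITY` are hypothetical.

```python
# Observed in-hospital mortality per category, as reported in the abstract.
OBSERVED_MORTALITY = {"low": 0.129, "intermediate": 0.491, "high": 0.875}


def risk_category(score):
    """Map a total risk score (0 to 16 points) to its reported category."""
    if not 0 <= score <= 16:
        raise ValueError("score must be between 0 and 16")
    if score <= 6:
        return "low"
    if score <= 10:
        return "intermediate"
    return "high"


for total_points in (3, 8, 14):
    category = risk_category(total_points)
    print(total_points, category, OBSERVED_MORTALITY[category])
```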


2020 ◽  
Vol 10 (2) ◽  
pp. 21 ◽  
Author(s):  
Gopi Battineni ◽  
Getu Gamo Sagaro ◽  
Nalini Chinatalapudi ◽  
Francesco Amenta

This paper reviews applications of machine learning (ML) predictive models in the diagnosis of chronic diseases. Chronic diseases (CDs) are responsible for a major portion of global health costs. Patients who suffer from these diseases need lifelong treatment. Nowadays, predictive models are frequently applied in the diagnosis and forecasting of these diseases. In this study, we reviewed the state-of-the-art approaches that encompass ML models in the primary diagnosis of CD. This analysis covers 453 papers published between 2015 and 2019, and our document search was conducted in the PubMed (Medline) and Cumulative Index to Nursing and Allied Health Literature (CINAHL) libraries. Ultimately, 22 studies were selected to present all modeling methods in a precise way that explains CD diagnosis and usage models of individual pathologies with associated strengths and limitations. Our outcomes suggest that there are no standard methods to determine the best approach in real-time clinical practice since each method has its advantages and disadvantages. Among the methods considered, support vector machines (SVM), logistic regression (LR), and clustering were the most commonly used. These models are highly applicable in the classification and diagnosis of CD and are expected to become more important in medical practice in the near future.


2018 ◽  
Vol 24 (1) ◽  
pp. 214-228 ◽  
Author(s):  
Kush Aggarwal ◽  
R.J. Urbanic ◽  
Syed Mohammad Saqib

Purpose The purpose of this work is to explore predictive model approaches for selecting laser cladding process settings for a desired bead geometry/overlap strategy. Complementing the modelling challenges is the development of a framework and methodologies to minimize data collection while maximizing the goodness of fit for the predictive models. This is essential for developing a foundation for metallic additive manufacturing process planning solutions. Design/methodology/approach Using the coaxial powder flow laser cladding method, 420 steel cladding powder is deposited on low carbon structural steel plates. A design of experiments (DOE) approach is taken using the response surface methodology (RSM) to establish the experimental configuration. Five process parameters, such as laser power and travel speed, are varied to explore their impact on the bead geometry. A total of three replicate experiments are performed and the collected data are assessed using a variety of methods to determine the process trends and the best modelling approaches. Findings There exist unpredictable, non-linear relationships between the process parameters and the bead geometry. The best fit for a predictive model is achieved with the artificial neural network (ANN) approach. Using the RSM, the experimental set is reduced by an order of magnitude; however, a model with R² = 0.96 is generated with ANN. The predictive model goodness of fit for a single bead is similar to that for the overlapping bead geometry using ANN. Originality/value Developing a bead shape to process parameters model is challenging due to the non-linear coupling between the process parameters and the bead geometry and the number of parameters to be considered. The experimental design and modelling approaches presented in this work illustrate how designed experiments can minimize the data collection and produce a robust predictive model. 
The output of this work will provide a solid foundation for process planning operations.


2021 ◽  
Vol 12 ◽  
Author(s):  
Svyatoslav Khamzin ◽  
Arsenii Dokuchaev ◽  
Anastasia Bazhutina ◽  
Tatiana Chumarnaya ◽  
Stepan Zubarev ◽  
...  

Background: Up to 30–50% of chronic heart failure patients who underwent cardiac resynchronization therapy (CRT) do not respond to the treatment. Therefore, patient stratification for CRT and optimization of CRT device settings remain a challenge. Objective: The main goal of our study is to develop a predictive model of CRT outcome using a combination of clinical data recorded in patients before CRT and simulations of the response to biventricular (BiV) pacing in personalized computational models of the cardiac electrophysiology. Materials and Methods: Retrospective data from 57 patients who underwent CRT device implantation was utilized. Positive response to CRT was defined by a 10% increase in the left ventricular ejection fraction in a year after implantation. For each patient, an anatomical model of the heart and torso was reconstructed from MRI and CT images and tailored to ECG recorded in the participant. The models were used to compute ventricular activation time, ECG duration and electrical dyssynchrony indices during intrinsic rhythm and BiV pacing from the sites of implanted leads. For building a predictive model of CRT response, we used clinical data recorded before CRT device implantation together with model-derived biomarkers of ventricular excitation in the left bundle branch block mode of activation and under BiV stimulation. Several Machine Learning (ML) classifiers and feature selection algorithms were tested on the hybrid dataset, and the quality of predictors was assessed using the area under receiver operating curve (ROC AUC). The classifiers on the hybrid data were compared with ML models built on clinical data only. Results: The best ML classifier utilizing a hybrid set of clinical and model-driven data demonstrated ROC AUC of 0.82, an accuracy of 0.82, sensitivity of 0.85, and specificity of 0.78, improving quality over that of ML predictors built on clinical data from much larger datasets by more than 0.1. 
Distance from the LV pacing site to the post-infarction zone and ventricular activation characteristics under BiV pacing were shown as the most relevant model-driven features for CRT response classification. Conclusion: Our results suggest that a combination of clinical and model-driven data increases the accuracy of classification models for CRT outcomes.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0244629
Author(s):  
Ali A. El-Solh ◽  
Yolanda Lawson ◽  
Michael Carter ◽  
Daniel A. El-Solh ◽  
Kari A. Mergenhagen

Objective Our objective is to compare the predictive accuracy of four recently established outcome models of patients hospitalized with coronavirus disease 2019 (COVID-19) published between January 1st and May 1st 2020. Methods We used data obtained from the Veterans Affairs Corporate Data Warehouse (CDW) between January 1st, 2020, and May 1st 2020 as an external validation cohort. The outcome measure was hospital mortality. Areas under the ROC curve (AUC) were used to evaluate discrimination of the four predictive models. The Hosmer–Lemeshow (HL) goodness-of-fit test and calibration curves assessed applicability of the models to individual cases. Results During the study period, 1634 unique patients were identified. The mean age of the study cohort was 68.8±13.4 years. Hypertension, hyperlipidemia, and heart disease were the most common comorbidities. The crude hospital mortality was 29% (95% confidence interval [CI] 0.27–0.31). Evaluation of the predictive models showed an AUC range from 0.63 (95% CI 0.60–0.66) to 0.72 (95% CI 0.69–0.74) indicating fair to poor discrimination across all models. There were no significant differences among the AUC values of the four prognostic systems. All models calibrated poorly, either overestimating or underestimating hospital mortality. Conclusions All four prognostic models examined in this study carry a high risk of bias. The performance of these scores needs to be interpreted with caution in hospitalized patients with COVID-19.
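
The Hosmer–Lemeshow goodness-of-fit test mentioned above groups patients by predicted risk and compares observed with expected deaths in each group. The sketch below computes only the test statistic, under the assumption of equal-sized groups formed by sorting on predicted probability; the function name is hypothetical, and the p-value (conventionally taken from a chi-square distribution with the number of groups minus two degrees of freedom) is left out because it would need a statistics library.

```python
def hosmer_lemeshow_statistic(labels, probs, groups=10):
    """Sum over risk groups of (observed - expected)^2 / variance.

    Patients are sorted by predicted probability and cut into `groups`
    chunks of near-equal size. Within a chunk, expected is the sum of the
    predicted probabilities and the variance term is
    expected * (1 - expected / chunk size).
    """
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    if groups < 1 or n < groups:
        raise ValueError("need at least one observation per group")
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        variance = expected * (1 - expected / size)
        if variance <= 0:
            raise ValueError("group has an expected event rate of 0 or 1")
        statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data (illustrative only): two risk groups of two patients each.
print(hosmer_lemeshow_statistic([0, 1, 1, 1], [0.2, 0.2, 0.8, 0.8], groups=2))
```

Larger values of the statistic indicate a bigger gap between predicted and observed mortality, which is the pattern the abstract describes as poor calibration.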


2020 ◽  
Author(s):  
Patrick Schwab ◽  
August DuMont Schütte ◽  
Benedikt Dietz ◽  
Stefan Bauer

BACKGROUND COVID-19 is a rapidly emerging respiratory disease caused by SARS-CoV-2. Due to the rapid human-to-human transmission of SARS-CoV-2, many health care systems are at risk of exceeding their health care capacities, in particular in terms of SARS-CoV-2 tests, hospital and intensive care unit (ICU) beds, and mechanical ventilators. Predictive algorithms could potentially ease the strain on health care systems by identifying those who are most likely to receive a positive SARS-CoV-2 test, be hospitalized, or admitted to the ICU. OBJECTIVE The aim of this study is to develop, study, and evaluate clinical predictive models that estimate, using machine learning and based on routinely collected clinical data, which patients are likely to receive a positive SARS-CoV-2 test or require hospitalization or intensive care. METHODS Using a systematic approach to model development and optimization, we trained and compared various types of machine learning models, including logistic regression, neural networks, support vector machines, random forests, and gradient boosting. To evaluate the developed models, we performed a retrospective evaluation on demographic, clinical, and blood analysis data from a cohort of 5644 patients. In addition, we determined which clinical features were predictive to what degree for each of the aforementioned clinical tasks using causal explanations. RESULTS Our experimental results indicate that our predictive models identified patients that test positive for SARS-CoV-2 a priori at a sensitivity of 75% (95% CI 67%-81%) and a specificity of 49% (95% CI 46%-51%), patients who are SARS-CoV-2 positive that require hospitalization with 0.92 area under the receiver operator characteristic curve (AUC; 95% CI 0.81-0.98), and patients who are SARS-CoV-2 positive that require critical care with 0.98 AUC (95% CI 0.95-1.00). 
CONCLUSIONS Our results indicate that predictive models trained on routinely collected clinical data could be used to predict clinical pathways for COVID-19 and, therefore, help inform care and prioritize resources.
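
Sensitivity and specificity, the measures reported for the SARS-CoV-2 test prediction task, follow directly from the confusion counts. The sketch below is illustrative only; the function name and the toy labels and predictions are hypothetical and unrelated to the study data.

```python
def sensitivity_specificity(labels, predictions):
    """Return (sensitivity, specificity) for binary labels and predictions.

    Sensitivity is the share of true positives among actual positives;
    specificity is the share of true negatives among actual negatives.
    """
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: 1 = test positive, 0 = test negative.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictions = [1, 1, 1, 0, 1, 1, 0, 0]
print(sensitivity_specificity(toy_labels, toy_predictions))
```

The two measures trade off against each other as the decision threshold moves, which is why a high sensitivity can coexist with a specificity near 50%.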

