Prediction of Sepsis in COVID-19 Using Laboratory Indicators

Author(s):  
Guoxing Tang ◽  
Ying Luo ◽  
Feng Lu ◽  
Wei Li ◽  
Xiongcheng Liu ◽  
...  

Background: The outbreak of coronavirus disease 2019 (COVID-19) has become a global public health concern. Many inpatients with COVID-19 have shown clinical symptoms related to sepsis, which aggravates their condition. We aimed to diagnose viral sepsis caused by SARS-CoV-2 by analyzing laboratory test data from patients with COVID-19 and to establish an early predictive model for sepsis risk among these patients. Methods: This study retrospectively investigated laboratory test data of 2,453 patients with COVID-19 from electronic health records. Extreme gradient boosting (XGBoost) was used to build four models with different subsets of the 69 collected indicators. The explainable SHapley Additive exPlanations (SHAP) method was adopted to interpret the predictions and to analyze the importance of risk factors. Findings: The model for classifying COVID-19 viral sepsis with seven coagulation function indicators achieved an area under the receiver operating characteristic curve (AUC) of 0.9213 (95% CI, 0.8994–0.9431), a sensitivity of 97.17% (95% CI, 94.97–98.46%), and a specificity of 82.05% (95% CI, 77.24–86.06%). The model for identifying COVID-19 coagulation disorders with eight features provided early warning an average of 3.68 ± 4.60 days in advance, with an AUC of 0.9298 (95% CI, 0.8691–0.9904), a sensitivity of 82.22% (95% CI, 67.41–91.49%), and a specificity of 84.00% (95% CI, 63.08–94.75%). Interpretation: We found that abnormal coagulation function was related to the occurrence of sepsis and that other routine laboratory tests, represented by inflammatory factors, had moderate predictive value for coagulopathy. This indicates that our model could provide early warning of sepsis in COVID-19 patients, helping to improve prognosis and reduce mortality.
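The abstract reports sensitivity and specificity with 95% confidence intervals but does not say how those intervals were computed. As a minimal sketch, assuming a Wilson score interval (one common choice for a binomial proportion, not necessarily the authors' method), the functions below turn confusion-matrix counts into a point estimate and interval. The counts in the usage example are hypothetical, not the study's data.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 gives an approximate two-sided 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return p, centre - half_width, centre + half_width


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), each with a Wilson CI."""
    return wilson_interval(tp, tp + fn), wilson_interval(tn, tn + fp)


if __name__ == "__main__":
    # Hypothetical confusion-matrix counts, for illustration only.
    sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
    print("sensitivity %.3f (95%% CI %.3f-%.3f)" % sens)
    print("specificity %.3f (95%% CI %.3f-%.3f)" % spec)
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1], which matters for proportions near 100% such as the 97.17% sensitivity reported above.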

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Louis Ehwerhemuepha ◽  
Theodore Heyming ◽  
Rachel Marano ◽  
Mary Jane Piroutek ◽  
Antonio C. Arrieta ◽  
...  

This study was designed to develop and validate an early warning system for sepsis based on a predictive model of critical decompensation. Data were collected from the electronic medical records of 537,837 visits to a pediatric Emergency Department (ED) from March 2013 to December 2019. A multiclass stochastic gradient boosting model was built to identify early warning signs associated with death, severe sepsis, non-severe sepsis, and bacteremia. Model features included triage vital signs, previous diagnoses, medications, and healthcare utilization within 6 months of the index ED visit. There were 483 patients who had severe sepsis and/or died, 1102 who had non-severe sepsis, and 1103 who had positive bacteremia tests; the remaining patients had none of these events. The most important predictors were age, heart rate, length of stay of previous hospitalizations, temperature, systolic blood pressure, and prior sepsis. The one-versus-all areas under the receiver operating characteristic curve (AUROC) were 0.979 (0.967, 0.991), 0.990 (0.985, 0.995), 0.976 (0.972, 0.981), and 0.968 (0.962, 0.974) for death, severe sepsis, non-severe sepsis, and bacteremia without sepsis, respectively. The multiclass macro-average AUROC and area under the precision-recall curve were 0.977 and 0.316, respectively. The study findings were used to develop an automated early warning decision tool for sepsis. Implementing this model in pediatric EDs will allow sepsis-related critical decompensation to be predicted accurately within a few seconds of triage.
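The one-versus-all and macro-average AUROCs reported above can be illustrated with a small sketch. This is not the authors' code. It assumes each visit has one predicted probability per outcome class, scores each class against all others with the rank-based (Mann-Whitney) AUROC, and averages the per-class values without weighting. The toy probabilities are hypothetical.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows: Sequence[Sequence[float]], y: Sequence[int], n_classes: int):
    """Per-class one-versus-rest AUROC and their unweighted (macro) average."""
    per_class = []
    for k in range(n_classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if yi == k else 0 for yi in y]
        per_class.append(binary_auc(scores, labels))
    return per_class, sum(per_class) / n_classes


if __name__ == "__main__":
    # Hypothetical probabilities for three outcome classes.
    probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
             [0.1, 0.8, 0.1], [0.1, 0.2, 0.7], [0.2, 0.1, 0.7]]
    print(one_vs_rest_auc(probs, [0, 0, 1, 1, 2, 2], 3))
```

The pairwise loop is O(n²). That is fine for a toy example but not for hundreds of thousands of visits, where a sort-based rank computation would be used instead.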


2021 ◽  
Vol 8 ◽  
Author(s):  
Ruixia Cui ◽  
Wenbo Hua ◽  
Kai Qu ◽  
Heran Yang ◽  
Yingmu Tong ◽  
...  

Sepsis-associated coagulation dysfunction greatly increases the mortality of sepsis. Irregular clinical time-series data remain a major challenge for AI medical applications. To detect and manage sepsis-induced coagulopathy (SIC) and sepsis-associated disseminated intravascular coagulation (DIC) early, we developed an interpretable, real-time sequential warning model for real-world irregular data. Eight machine learning models, including novel algorithms, were devised to detect SIC and sepsis-associated DIC 8n hours (1 ≤ n ≤ 6) before onset. Models were developed on data from Xi'an Jiaotong University Medical College (XJTUMC) and validated on data from Beth Israel Deaconess Medical Center (BIDMC). In the training set, a total of 12,154 SIC and 7,878 International Society on Thrombosis and Haemostasis (ISTH) overt-DIC labels were annotated according to the SIC and ISTH overt-DIC scoring systems. The area under the receiver operating characteristic curve (AUROC) was used as the evaluation metric. The eXtreme Gradient Boosting (XGBoost) model predicted SIC and sepsis-associated DIC events up to 48 h in advance with AUROCs of 0.929 and 0.910, respectively, and reached 0.973 and 0.955 at 8 h in advance, achieving the highest performance to date. The novel ODE-RNN model achieved continuous prediction at arbitrary time points, with AUROCs of 0.962 and 0.936 for SIC and DIC predicted 8 h in advance, respectively. In conclusion, our model can predict the onset of sepsis-associated SIC and DIC up to 48 h in advance, which helps maximize the time window for early management by physicians.
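The abstract frames early warning as predicting SIC or DIC onset 8n hours ahead, for 1 ≤ n ≤ 6. The abstract does not give the exact labeling and windowing rules. One plausible way to build such labels from irregularly timed observations is to mark an observation at time t as positive for horizon h when the annotated onset falls within the next h hours. The sketch below takes that approach, which is our assumption rather than the paper's rule, and the function and variable names are hypothetical.

```python
from typing import Optional

# Prediction horizons of 8n hours for 1 <= n <= 6, i.e. 8, 16, ..., 48 h.
HORIZONS_H = [8 * n for n in range(1, 7)]


def horizon_labels(obs_times_h: list[float], onset_h: Optional[float], horizon_h: float) -> list[int]:
    """Label each observation 1 if onset falls in (t, t + horizon], else 0.

    Assumptions: times are hours since admission and may be irregularly
    spaced; a patient with no onset (onset_h is None) is negative at every
    time point; observations at or after onset get label 0 here, although in
    practice they would probably be excluded.
    """
    labels = []
    for t in obs_times_h:
        if onset_h is not None and t < onset_h <= t + horizon_h:
            labels.append(1)
        else:
            labels.append(0)
    return labels


if __name__ == "__main__":
    times = [0.0, 5.5, 20.0, 31.0]  # hypothetical irregular sampling times
    for h in HORIZONS_H:
        print(h, horizon_labels(times, onset_h=40.0, horizon_h=h))
```

Longer horizons turn more early observations into positives, which is one reason performance typically drops as the warning window grows from 8 h to 48 h.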


2021 ◽  
Vol 25 (Spec. issue 1) ◽  
pp. 1-7
Author(s):  
Ahmet Yurttakal

Estimating soil thermal conductivity is an important step in many geothermal applications. However, it is a difficult and complicated process because many factors, such as soil moisture and granular structure, significantly affect the thermal conductivity of soils. In this study, regression was performed with the extreme gradient boosting algorithm to develop a model for estimating thermal conductivity. The model's performance was measured on unseen test data. The proposed algorithm reached an RMSE of 0.18, an R² of 0.99, and an MAE of 3.18%, indicating that the approach is promising.
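The abstract reports RMSE, R², and an MAE quoted as a percentage. MAE normally carries the target's units, so a percentage figure may be a relative error such as the mean absolute percentage error (MAPE). The abstract does not say which was used. The sketch below therefore computes both, alongside RMSE and R², using only the standard library. The input values in any usage are hypothetical.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """RMSE, MAE, MAPE (in %) and R^2 for paired true/predicted values.

    Assumes no true value is zero (for MAPE) and that y_true is not constant (for R^2).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": mae,
        "MAPE_%": mape,
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical thermal-conductivity values (W/(m*K)), not the study's data.
    print(regression_metrics([1.2, 1.5, 0.9, 2.0], [1.1, 1.6, 1.0, 1.9]))
```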


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e14069-e14069
Author(s):  
Oguz Akbilgic ◽  
Ibrahim Karabayir ◽  
Hakan Gunturkun ◽  
Joseph F Pierre ◽  
Ashley C Rashe ◽  
...  

Background: There is growing interest in the links between cancer and the gut microbiome. However, the effect of chemotherapy on the gut microbiome remains unknown. We studied 1) whether machine learning can accurately classify subjects with cancer vs. healthy controls and 2) whether this classification model is affected by chemotherapy exposure status. Methods: We used American Gut Project data to build an extreme gradient boosting (XGBoost) model to distinguish subjects with cancer from healthy controls using simple demographic data and published microbiome data. We then further explored the selected features in cancer subjects according to chemotherapy exposure. Results: The cohort included 7,685 subjects (561 with cancer; 52.5% female; 87.3% White; average age 44.7 years, SD 17.7). The binary outcome variable was cancer status. Of the 561 subjects with cancer, 94 had been treated with chemotherapy agents before microbiome sampling. The predictors were four demographic variables (sex, race, age, BMI) and 1,812 operational taxonomic units (OTUs), each found in at least 2 subjects via RNA sequencing. We randomly split the data into 80% training and 20% hidden test sets. We then built an XGBoost model with 5-fold cross-validation using only the training data, yielding an AUC (95% CI) of 0.79 (0.77, 0.80), and obtained almost the same AUC on the hidden test data. Based on feature importance analysis, we identified the 12 most important features (Age, BMI and 12 OTUs; 4C0d-2, Brachyspirae, Methanosphaera, Geodermatophilaceae, Bifidobacteriaceae, Slackia, Staphylococcus, Acidaminococcus, Devosia, Proteus), rebuilt a model using only these features, and obtained an AUC of 0.80 (0.77, 0.83) on the hidden test data. The average predicted probabilities for controls, cancer patients exposed to chemotherapy, and cancer patients not exposed to chemotherapy were 0.071 (0.070, 0.073), 0.125 (0.110, 0.140), and 0.156 (0.148, 0.164), respectively.
There was no statistically significant difference in the levels of these 12 OTUs between cancer subjects treated with and without chemotherapy. Conclusions: Machine learning achieved moderately high accuracy in identifying patients' cancer status based on the microbiome. Despite the literature on microbiome-chemotherapy interactions, the levels of the 12 OTUs used in our model were not significantly different between cancer patients with and without chemotherapy exposure. This model needs to be tested on other large population databases for broader validation.
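The evaluation protocol above uses an 80/20 split into training and hidden test data, followed by 5-fold cross-validation on the training portion only. It can be sketched as index bookkeeping. This minimal illustration assumes a simple random, unstratified split with a fixed seed. The abstract does not say whether the split was stratified by cancer status, and the helper names are hypothetical.

```python
import random


def train_test_split_indices(n: int, test_frac: float = 0.2, seed: int = 0):
    """Shuffle indices 0..n-1 and hold out test_frac of them as the hidden test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices: list[int], k: int = 5):
    """Yield (train, validation) index lists for k folds over the given indices only."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, val


if __name__ == "__main__":
    train, test = train_test_split_indices(7685)
    print(len(train), len(test))
    for fold_train, fold_val in k_fold_indices(train, 5):
        # The hidden test indices never enter cross-validation.
        print(len(fold_train), len(fold_val))
```

Keeping the test indices out of every fold is what makes the hidden-test AUC a fair check on the cross-validated AUC.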


Author(s):  
He Yang ◽  
Emma Li ◽  
Yi Fang Cai ◽  
Jiapei Li ◽  
George X. Yuan

The purpose of this paper is to establish a framework, based on the XGBoost model and SHAP, for extracting early warning risk features to predict financial distress. How early warning risk features are constructed is very important for predicting companies' financial distress. Compared with traditional statistical methods, data-driven machine learning models for financial early warning achieve better prediction accuracy, but the resulting models can be difficult to explain. Recently, eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has attracted wide attention in machine learning research because of its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies. An early warning system is established using 76 financial risk features from seven categories and 14 non-financial risk features from four categories. In empirical tests measured by AUC, the Kolmogorov–Smirnov (KS) statistic, and the kappa statistic, our XGBoost-based method predicts the financial distress risk of listed companies much better than the logistic model. Moreover, within the SHAP (SHapley Additive exPlanations) framework, we give a reasonable, visual explanation of the important risk features and of how they influence financial distress.
The results of this paper show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable. This is significant for identifying early warnings of financial distress risk among listed companies in practice.
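SHAP attributes a prediction to individual features using Shapley values from cooperative game theory. The SHAP library uses specialised algorithms for tree ensembles such as XGBoost. The sketch below instead computes exact Shapley values by brute force over all feature subsets, which is feasible only for a handful of features. It assumes that an "absent" feature is replaced by a single baseline value. The two-feature risk model and its feature names are hypothetical.

```python
import itertools
import math
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition take their baseline value. Cost grows as 2^d.
    """
    d = len(x)
    phi = [0.0] * d

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(d)]
        return model(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (d - |S| - 1)! / d! for coalitions S not containing i.
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_model(z: Sequence[float]) -> float:
    # Hypothetical score: z[0] = debt ratio, z[1] = return on assets, plus an interaction term.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[1]


if __name__ == "__main__":
    print(shapley_values(toy_risk_model, [1.0, 1.0], [0.0, 0.0]))
```

For x = (1, 1) against a zero baseline this gives 2.25 and -0.75. The two values sum to the change in model output (1.5), which illustrates the additivity property that SHAP explanations rely on.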


2021 ◽  
Vol 10 (9) ◽  
pp. 1875
Author(s):  
I-Min Chiu ◽  
Chi-Yung Cheng ◽  
Wun-Huei Zeng ◽  
Ying-Hsien Huang ◽  
Chun-Hung Richard Lin

Background: The aim of this study was to develop and evaluate a machine learning (ML) model to predict invasive bacterial infections (IBIs) in young febrile infants visiting the emergency department (ED). Methods: This retrospective study was conducted in the EDs of three medical centers across Taiwan from 2011 to 2018. We included patients aged 0–60 days who visited the ED with fever. We developed three ML algorithms, logistic regression (LR), support vector machine (SVM), and extreme gradient boosting (XGBoost), and compared their performance in predicting IBIs with that of a previously validated scoring system (IBI score). Results: During the study period, 4211 patients were included, of whom 126 (3.1%) had an IBI. Feature selection retained eight, five, and seven features for LR, SVM, and XGBoost, respectively. The ML models achieved a better AUROC than the IBI score when predicting IBIs in young infants (LR: 0.85 vs. SVM: 0.84 vs. XGBoost: 0.85 vs. IBI score: 0.70, p-value < 0.001). Using a cost-sensitive learning algorithm, all ML models showed better specificity than an IBI score > 2 in predicting IBIs at a 90% sensitivity level (LR: 0.59 vs. SVM: 0.60 vs. XGBoost: 0.57 vs. IBI score > 2: 0.43, p-value < 0.001). Conclusions: All ML models developed in this study outperformed the traditional scoring system in stratifying low-risk febrile infants at a standardized sensitivity level.
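Comparing models "at a 90% sensitivity level" means choosing, for each model, a decision threshold that catches at least 90% of IBI cases, then reading off the specificity at that threshold. The sketch below shows that threshold search. It assumes higher scores mean higher risk and picks the highest threshold that still meets the target sensitivity. It does not reproduce the study's cost-sensitive learning step, and the scores in any usage are hypothetical.

```python
from typing import Sequence


def specificity_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                               target_sens: float = 0.9):
    """Return (threshold, sensitivity, specificity) for the highest threshold whose
    sensitivity is >= target_sens. A case is predicted positive when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        if tp / n_pos >= target_sens:
            tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
            return thr, tp / n_pos, tn / n_neg
    raise ValueError("target sensitivity cannot be reached")


if __name__ == "__main__":
    # Hypothetical risk scores and IBI labels (1 = IBI).
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
    print(specificity_at_sensitivity(scores, labels))
```

Fixing sensitivity first and then comparing specificity mirrors the clinical priority in febrile infants: missing an IBI is costlier than a false alarm.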


Atmosphere ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 373 ◽  
Author(s):  
Mehdi Zamani Joharestani ◽  
Chunxiang Cao ◽  
Xiliang Ni ◽  
Barjeece Bashir ◽  
Somayeh Talebiesfandarani

In recent years, air pollution has become an important public health concern. High concentrations of fine particulate matter with a diameter of less than 2.5 µm (PM2.5) are known to be associated with lung cancer, cardiovascular disease, respiratory disease, and metabolic disease. Predicting PM2.5 concentrations can help governments warn people at high risk, thus mitigating these complications. Although attempts have been made to predict PM2.5 concentrations, the factors influencing PM2.5 prediction have not been investigated. In this work, we study feature importance for PM2.5 prediction in Tehran's urban area using three machine learning (ML) approaches: random forest, extreme gradient boosting (XGBoost), and deep learning. We use 23 features in the modeling, including satellite and meteorological data, ground-measured PM2.5, and geographical data. The best model performance, R² = 0.81 (R = 0.9), MAE = 9.93 µg/m³, and RMSE = 13.58 µg/m³, was obtained with XGBoost after eliminating unimportant features. All three ML methods performed similarly: R² ranged from 0.63 to 0.67 when Aerosol Optical Depth (AOD) at 3 km resolution was included and from 0.77 to 0.81 when it was excluded. Unlike lagged PM2.5 data, satellite-derived AODs did not improve model performance.
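The study ranks feature importance and eliminates unimportant features, but the abstract does not say which importance measure was used. Permutation importance is one model-agnostic option, sketched here as an assumption rather than the authors' method. The idea is to shuffle one feature column, re-score the fitted model, and record how much the error grows. The "model" is a fixed hypothetical function standing in for a trained regressor, and the data are synthetic.

```python
import random
from typing import Callable, Sequence

Row = list[float]


def mse(model: Callable[[Row], float], X: Sequence[Row], y: Sequence[float]) -> float:
    """Mean squared error of model over rows X against targets y."""
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], X: Sequence[Row],
                           y: Sequence[float], seed: int = 0) -> list[float]:
    """Increase in MSE when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mse(model, X_perm, y) - base)
    return importances


def stand_in_model(r: Row) -> float:
    # Hypothetical fitted model: depends on feature 0 (e.g. lagged PM2.5), ignores feature 1 (e.g. AOD).
    return 3.0 * r[0]


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(50)]
    y = [stand_in_model(r) for r in X]
    print(permutation_importance(stand_in_model, X, y))
```

A feature whose permutation barely changes the error (here, feature 1) is a candidate for elimination, in the spirit of the feature pruning described above.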


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jiaxin Fan ◽  
Mengying Chen ◽  
Jian Luo ◽  
Shusen Yang ◽  
Jinming Shi ◽  
...  

Background: Screening carotid B-mode ultrasonography is a frequently used method to detect subjects with carotid atherosclerosis (CAS). Because CAS progresses asymptomatically in most patients, early identification is challenging for clinicians, and undetected CAS may trigger ischemic stroke. Recently, machine learning has shown a strong ability to classify data and potential for prediction in the medical field. Combining machine learning with patients' electronic health records could give clinicians a more convenient and precise way to identify asymptomatic CAS. Methods: This retrospective cohort study used routine clinical data from medical check-up subjects between April 19, 2010 and November 15, 2019. Six machine learning models (logistic regression [LR], random forest [RF], decision tree [DT], eXtreme Gradient Boosting [XGB], Gaussian Naïve Bayes [GNB], and K-Nearest Neighbour [KNN]) were used to predict asymptomatic CAS, and their predictive performance was compared in terms of the area under the receiver operating characteristic curve (AUCROC), accuracy (ACC), and F1 score (F1). Results: Of the 18,441 subjects, 6553 were diagnosed with asymptomatic CAS. Compared with DT (AUCROC 0.628, ACC 65.4%, and F1 52.5%), the other five models improved the AUCROC: KNN by 7.6 percentage points (AUCROC 0.704, ACC 68.8%, F1 50.9%), GNB by 12.5 (0.753, 67.0%, 46.8%), XGB by 16.0 (0.788, 73.4%, 55.7%), RF by 16.6 (0.794, 74.5%, 56.8%), and LR by 18.1 (0.809, 74.7%, 59.9%). The best-performing model, LR, correctly predicted 1045 of 1966 cases (sensitivity 53.2%) and 3088 of 3566 non-cases (specificity 86.6%). Tenfold cross-validation further verified the predictive ability of LR. Conclusions: Among the machine learning models, LR showed the best performance in predicting asymptomatic CAS.
Our findings set the stage for an early automatic alarm system, allowing CAS prevention measures to be allocated more precisely to the individuals most likely to benefit.
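The LR test-set counts reported above are enough to recover the other threshold-based metrics in the abstract. Treat the 1045 detected cases as true positives and the 3088 correctly classified non-cases as true negatives. Then FN = 1966 − 1045 = 921 and FP = 3566 − 3088 = 478. The short check below shows that these counts reproduce the reported sensitivity (53.2%), specificity (86.6%), accuracy (74.7%), and F1 (59.9%). AUROC cannot be recovered from counts at a single threshold.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Threshold-based metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)  # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


# Counts implied by the abstract for the logistic regression model.
metrics = confusion_metrics(tp=1045, fn=1966 - 1045, tn=3088, fp=3566 - 3088)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

This prints sensitivity 53.2%, specificity 86.6%, precision 68.6%, accuracy 74.7%, and F1 59.9%. The modest F1 despite high specificity reflects the low sensitivity at the chosen threshold.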


2021 ◽  
Vol 271 ◽  
pp. 03030
Author(s):  
Wufan Xuan ◽  
Zhen Han ◽  
Lina Zheng

Silicosis is a fibrotic lung disease caused by inhalation of silica dust, and its early and accurate diagnosis remains a challenge. We aimed to assess the performance of a nanofiber sensor array combined with pattern recognition for prompt, noninvasive detection of silicosis. A total of 210 silicosis cases and 430 non-silicosis controls were enrolled in a cross-sectional study. Exhaled breath was analysed by a portable analytical system incorporating an array of 16 organic nanofiber sensors. Models were established using a Deep Neural Network and eXtreme Gradient Boosting. Linear Discriminant Analysis was used for dimensionality reduction and visual data analysis. The area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity were used to evaluate the models. Results: In the test set, an AUC of 99.3%, accuracy of 96.0%, sensitivity of 94.1%, and specificity of 96.3% were achieved. Silicosis cases presented breath patterns different from those of healthy controls, and classification results based on these patterns were highly consistent with the experts' diagnoses. Breath analysis performed with the sensor array and pattern recognition is expected to provide quick, stable recognition of silicosis. This paper used different feature forms, different algorithms, and datasets collected over long time periods, which provides a reference for exhaled-breath diagnosis schemes for silicosis.
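Linear discriminant analysis is used here for dimensionality reduction and visualisation. For two classes, Fisher's discriminant projects each sample onto the direction w = S_W⁻¹(μ₁ − μ₀), where S_W is the pooled within-class scatter matrix. The sketch below handles only two features so that the 2×2 inverse can be written out explicitly. The readings are hypothetical stand-ins for the 16-sensor responses, and this is not the authors' pipeline.

```python
from typing import Sequence

Point = tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_direction(class0: Sequence[Point], class1: Sequence[Point]) -> Point:
    """Fisher LDA direction w = S_W^{-1} (mu1 - mu0) for two classes of 2-D points."""
    m0, m1 = centroid(class0), centroid(class1)
    # Pooled within-class scatter matrix [[a, b], [b, c]].
    a = b = c = 0.0
    for pts, m in ((class0, m0), (class1, m1)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            c += dy * dy
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    return ((c * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)


def project(points: Sequence[Point], w: Point) -> list[float]:
    """One-dimensional LDA scores: dot product of each point with w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]


if __name__ == "__main__":
    # Hypothetical two-sensor readings for controls (0) and silicosis cases (1).
    controls = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    cases = [(3.0, 0.0), (4.0, 0.0), (3.0, 1.0), (4.0, 1.0)]
    w = fisher_direction(controls, cases)
    print(w, project(controls, w), project(cases, w))
```

With 16 sensors the same formula applies, but it needs a general matrix inverse (for example from a linear-algebra library) rather than the explicit 2×2 expression.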


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion and economic growth is immense. However, low uptake appears to impede the sector's growth, hence the need for a model that robustly predicts insurance uptake among potential clients. In this research, we compared the performance of eight machine learning models in predicting insurance uptake. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines, and Extreme Gradient Boosting. The data came from the 2016 Kenya FinAccess Household Survey. Because the data were imbalanced, performance was compared on both upsampled and downsampled data. For upsampled data, the Random Forest classifier showed the highest accuracy and precision, whereas for downsampled data, gradient boosting was optimal. Notably, for both upsampled and downsampled data, tree-based classifiers were more robust than the others in predicting insurance uptake. Even after hyperparameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest among the tree-based models. The Random Forest confusion matrix also showed the fewest false positives and the most true positives; hence, it could be considered the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, suggesting that bancassurance is a plausible distribution channel for insurance products.
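Upsampling and downsampling rebalance the classes before training. This minimal sketch assumes plain random resampling. The abstract does not say whether a synthetic method such as SMOTE was used instead. Upsampling draws minority-class examples with replacement until the classes match, while downsampling keeps a random subset of the majority class. The function name and toy data are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def rebalance(samples: Sequence[T], labels: Sequence[int], mode: str = "up", seed: int = 0):
    """Randomly up- or downsample a binary dataset so both classes have equal counts."""
    rng = random.Random(seed)
    by_class: dict[int, list[T]] = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    if len(by_class) != 2:
        raise ValueError("expected exactly two classes")
    (minor_label, minor), (major_label, major) = sorted(by_class.items(), key=lambda kv: len(kv[1]))
    if mode == "up":
        # Draw minority-class samples with replacement up to the majority size.
        minor = minor + [rng.choice(minor) for _ in range(len(major) - len(minor))]
    elif mode == "down":
        # Keep a random subset of the majority class, without replacement.
        major = rng.sample(major, len(minor))
    else:
        raise ValueError("mode must be 'up' or 'down'")
    out = [(s, minor_label) for s in minor] + [(s, major_label) for s in major]
    rng.shuffle(out)
    return [s for s, _ in out], [y for _, y in out]


if __name__ == "__main__":
    # Hypothetical respondents: 1 = took up insurance (minority), 0 = did not.
    xs, ys = list(range(10)), [1, 1] + [0] * 8
    print(rebalance(xs, ys, mode="up")[1])
    print(rebalance(xs, ys, mode="down")[1])
```

Resampling is usually applied to the training split only, so that the evaluation data keep their natural class balance.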

