An Artificial Neural Network–Based Pediatric Mortality Risk Score: Development and Performance Evaluation Using Data From a Large North American Registry (Preprint)

2020 ◽  
Author(s):  
Niema Ghanad Poor ◽  
Nicholas C West ◽  
Rama Syamala Sreepada ◽  
Srinivas Murthy ◽  
Matthias Görges

BACKGROUND In the pediatric intensive care unit (PICU), quantifying illness severity can be guided by risk models to enable timely identification and appropriate intervention. Logistic regression models, including the pediatric index of mortality 2 (PIM-2) and pediatric risk of mortality III (PRISM-III), produce a mortality risk score using data that are routinely available at PICU admission. Artificial neural networks (ANNs) outperform regression models in some medical fields. OBJECTIVE In light of this potential, we aim to examine ANN performance, compared to that of logistic regression, for mortality risk estimation in the PICU. METHODS The analyzed data set included patients from North American PICUs whose discharge diagnostic codes indicated evidence of infection and included the data used for the PIM-2 and PRISM-III calculations and their corresponding scores. We stratified the data set into training and test sets, with approximately equal mortality rates, in an effort to replicate real-world data. Data preprocessing included imputing missing data through simple substitution and normalizing data into binary variables using PRISM-III thresholds. A 2-layer ANN model was built to predict pediatric mortality, along with a simple logistic regression model for comparison. Both models used the same features required by PIM-2 and PRISM-III. Alternative ANN models using single-layer or unnormalized data were also evaluated. Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) and their empirical 95% CIs. RESULTS Data from 102,945 patients (including 4068 deaths) were included in the analysis. 
The highest performing ANN (AUROC 0.871, 95% CI 0.862-0.880; AUPRC 0.372, 95% CI 0.345-0.396) that used normalized data performed better than PIM-2 (AUROC 0.805, 95% CI 0.801-0.816; AUPRC 0.234, 95% CI 0.213-0.255) and PRISM-III (AUROC 0.844, 95% CI 0.841-0.855; AUPRC 0.348, 95% CI 0.322-0.367). The performance of this ANN was also significantly better than that of the logistic regression model (AUROC 0.862, 95% CI 0.852-0.872; AUPRC 0.329, 95% CI 0.304-0.351). The performance of the ANN that used unnormalized data (AUROC 0.865, 95% CI 0.856-0.874) was slightly inferior to our highest performing ANN; the single-layer ANN architecture performed poorly and was not investigated further. CONCLUSIONS A simple ANN model performed slightly better than the benchmark PIM-2 and PRISM-III scores and a traditional logistic regression model trained on the same data set. The small performance gains achieved by this two-layer ANN model may not offer clinically significant improvement; however, further research with other or more sophisticated model designs and better imputation of missing data may be warranted.
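The abstract compares models by AUROC with empirical 95% CIs but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap (an assumption, not the authors' stated method), AUROC and its interval could be computed as follows. The function names `auroc` and `bootstrap_ci` are hypothetical, and the pairwise loop is written for clarity rather than for a data set of 100,000 patients.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Non-overlapping intervals, such as those reported for the best ANN and PIM-2, are one informal indication that a difference is unlikely to be a resampling artifact.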

10.2196/24079 ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. e24079
Author(s):  
Niema Ghanad Poor ◽  
Nicholas C West ◽  
Rama Syamala Sreepada ◽  
Srinivas Murthy ◽  
Matthias Görges

Background In the pediatric intensive care unit (PICU), quantifying illness severity can be guided by risk models to enable timely identification and appropriate intervention. Logistic regression models, including the pediatric index of mortality 2 (PIM-2) and pediatric risk of mortality III (PRISM-III), produce a mortality risk score using data that are routinely available at PICU admission. Artificial neural networks (ANNs) outperform regression models in some medical fields. Objective In light of this potential, we aim to examine ANN performance, compared to that of logistic regression, for mortality risk estimation in the PICU. Methods The analyzed data set included patients from North American PICUs whose discharge diagnostic codes indicated evidence of infection and included the data used for the PIM-2 and PRISM-III calculations and their corresponding scores. We stratified the data set into training and test sets, with approximately equal mortality rates, in an effort to replicate real-world data. Data preprocessing included imputing missing data through simple substitution and normalizing data into binary variables using PRISM-III thresholds. A 2-layer ANN model was built to predict pediatric mortality, along with a simple logistic regression model for comparison. Both models used the same features required by PIM-2 and PRISM-III. Alternative ANN models using single-layer or unnormalized data were also evaluated. Model performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC) and their empirical 95% CIs. Results Data from 102,945 patients (including 4068 deaths) were included in the analysis. 
The highest performing ANN (AUROC 0.871, 95% CI 0.862-0.880; AUPRC 0.372, 95% CI 0.345-0.396) that used normalized data performed better than PIM-2 (AUROC 0.805, 95% CI 0.801-0.816; AUPRC 0.234, 95% CI 0.213-0.255) and PRISM-III (AUROC 0.844, 95% CI 0.841-0.855; AUPRC 0.348, 95% CI 0.322-0.367). The performance of this ANN was also significantly better than that of the logistic regression model (AUROC 0.862, 95% CI 0.852-0.872; AUPRC 0.329, 95% CI 0.304-0.351). The performance of the ANN that used unnormalized data (AUROC 0.865, 95% CI 0.856-0.874) was slightly inferior to our highest performing ANN; the single-layer ANN architecture performed poorly and was not investigated further. Conclusions A simple ANN model performed slightly better than the benchmark PIM-2 and PRISM-III scores and a traditional logistic regression model trained on the same data set. The small performance gains achieved by this two-layer ANN model may not offer clinically significant improvement; however, further research with other or more sophisticated model designs and better imputation of missing data may be warranted.
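The Methods describe a stratified train/test split with approximately equal mortality rates, simple substitution for missing values, and conversion of inputs to binary variables using thresholds. A rough sketch of those steps might look like this; the 20% test fraction, the fill value, and the helper names are illustrative assumptions, and no actual PRISM-III threshold values are reproduced here.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly equal event rates in both sets."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        k = round(len(idx) * test_fraction)  # per-class share sent to the test set
        test.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(test)


def binarize(value, threshold, fill=0):
    """Map a measurement to 1 if it exceeds the threshold, else 0.

    Missing values (None) receive a fixed substitute, here assumed to be 0.
    """
    if value is None:
        return fill
    return 1 if value > threshold else 0
```

Splitting within each outcome class keeps the rare outcome (about 4% mortality in this data set) from being over- or under-represented in the test set by chance.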


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields classification probabilities for the class labels. However, it cannot perform biomarker selection on its own. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method achieved AUC values of 89.66% (training) and 90.02% (testing), which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.


2021 ◽  
Author(s):  
Katrin Nissen ◽  
Stefan Rupp ◽  
Björn Guse ◽  
Uwe Ulbrich ◽  
Sergiy Vorogushyn ◽  
...  

In this study we present the results of a logistic regression model aimed at describing changes in probabilities for rockfall events in Germany in response to changes in meteorological and hydrological conditions.

The rockfall events for this study are taken from the landslide database for Germany (Damm and Klose, 2015). The meteorological variables we tested as predictors for the logistic regression model are daily precipitation from the REGNIE data set (Rauthe et al., 2013), hourly precipitation from the RADKLIM radar climatology (Winterrath et al., 2018) and temperature from the E-OBS data set (Cornes et al., 2018). As there is no observational soil moisture data set covering the entire country, we used soil moisture modelled with the state-of-the-art hydrological model mHM (Samaniego et al., 2010), which was calibrated using gauge measurements.

In order to select the best statistical model we tested a large number of physically plausible combinations of meteorological and hydrological predictors. Each model was checked using cross-validation. The decision on the final model was based on the value of the logarithmic skill score and on expert judgement.

The final statistical model includes the local percentile of daily precipitation, total relative soil moisture and freeze-thaw cycles in the previous weeks as predictors. It was found that daily precipitation is the most important parameter in the model. An increase of daily precipitation from its median to its 80th percentile approximately doubles the probability for a rockfall event. Higher soil moisture and the occurrence of freeze-thaw cycles also increase the probability for rockfall events.

Cornes, R. C. et al., 2018: An ensemble version of the E-OBS temperature and precipitation data sets. Journal of Geophysical Research: Atmospheres, 123, 9391–9409.
Damm, B., Klose, M., 2015: The landslide database for Germany: Closing the gap at national level. Geomorphology, 249, 82–93.
Rauthe, M. et al., 2013: A Central European precipitation climatology – Part I: Generation and validation of a high-resolution gridded daily data set (HYRAS), Vol. 22(3), p 235–256.
Samaniego, L. et al., 2010: Multiscale parameter regionalization of a grid-based hydrologic model at the mesoscale. Water Resour. Res., 46, W05523.
Winterrath, T. et al., 2018: RADKLIM Version 2017.002: Reprocessed gauge-adjusted radar data, one-hour precipitation sums (RW), DOI: 10.5676/DWD/RADKLIM_RW_V2017.002.
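The statement that moving daily precipitation from its median to its 80th percentile roughly doubles the rockfall probability can be related to the model coefficients with a short derivation. The abstract reports no coefficients, so the symbols below ($\beta_1$ for the precipitation coefficient, $x_{50}$ and $x_{80}$ for the two predictor values, $p$ for the event probability) are notation introduced here, and the final step assumes that rockfall events are rare on any given day.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 x + \dots
\quad\Longrightarrow\quad
\frac{\text{odds}(x_{80})}{\text{odds}(x_{50})} = \exp\bigl(\beta_1\,(x_{80}-x_{50})\bigr).
\]
\[
\text{For } p \ll 1:\quad \text{odds} \approx p,
\qquad\text{so}\qquad
\frac{p(x_{80})}{p(x_{50})} \approx 2
\;\Longleftrightarrow\;
\beta_1\,(x_{80}-x_{50}) \approx \ln 2 \approx 0.693 .
\]
```

With the other predictors held fixed, a doubling of probability therefore corresponds to a shift of about 0.69 on the log-odds scale.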


2015 ◽  
Vol 32 (1) ◽  
pp. 288 ◽  
Author(s):  
Daniel Lapresa ◽  
Javier Arana ◽  
M.Teresa Anguera ◽  
J.Ignacio Pérez-Castellanos ◽  
Mario Amatria

This study shows how simple and multiple logistic regression can be used in observational methodology and more specifically, in the fields of physical activity and sport. We demonstrate this in a study designed to determine whether three-a-side futsal or five-a-side futsal is more suited to the needs and potential of children aged 6-to-8 years. We constructed a multiple logistic regression model to analyze use of space (depth of play) and three simple logistic regression models to determine which game format is more likely to potentiate effective technical and tactical performance.


2020 ◽  
Vol 8 (2) ◽  
pp. e001314
Author(s):  
Chao Liu ◽  
Li Li ◽  
Kehan Song ◽  
Zhi-Ying Zhan ◽  
Yi Yao ◽  
...  

Background Individualized prediction of mortality risk can inform the treatment strategy for patients with COVID-19 and solid tumors and potentially improve patient outcomes. We aimed to develop a nomogram for predicting in-hospital mortality of patients with COVID-19 with solid tumors. Methods We enrolled patients with COVID-19 with solid tumors admitted to 32 hospitals in China between December 17, 2019, and March 18, 2020. A multivariate logistic regression model was constructed via stepwise regression analysis, and a nomogram was subsequently developed based on the fitted multivariate logistic regression model. Discrimination and calibration of the nomogram were evaluated by estimating the area under the receiver operator characteristic curve (AUC) for the model and by bootstrap resampling, a Hosmer-Lemeshow test, and visual inspection of the calibration curve. Results There were 216 patients with COVID-19 with solid tumors included in the present study, of whom 37 (17%) died and the other 179 all recovered from COVID-19 and were discharged. The median age of the enrolled patients was 63.0 years and 113 (52.3%) were men. Multivariate logistic regression revealed that increasing age (OR=1.08, 95% CI 1.00 to 1.16), receipt of antitumor treatment within 3 months before COVID-19 (OR=28.65, 95% CI 3.54 to 231.97), peripheral white blood cell (WBC) count ≥6.93 ×10⁹/L (OR=14.52, 95% CI 2.45 to 86.14), derived neutrophil-to-lymphocyte ratio (dNLR; neutrophil count/(WBC count minus neutrophil count)) ≥4.19 (OR=18.99, 95% CI 3.58 to 100.65), and dyspnea on admission (OR=20.38, 95% CI 3.55 to 117.02) were associated with elevated mortality risk. The performance of the established nomogram was satisfactory, with an AUC of 0.953 (95% CI 0.908 to 0.997) for the model, non-significant findings on the Hosmer-Lemeshow test, and rough agreement between predicted and observed probabilities as suggested in calibration curves. 
The sensitivity and specificity of the model were 86.4% and 92.5%. Conclusion Increasing age, receipt of antitumor treatment within 3 months before COVID-19 diagnosis, elevated WBC count and dNLR, and having dyspnea on admission were independent risk factors for mortality among patients with COVID-19 and solid tumors. The nomogram based on these factors accurately predicted mortality risk for individual patients.
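The dNLR definition and the two laboratory cutoffs quoted in the abstract translate directly into code. The sketch below only illustrates those two predictors; the function names are hypothetical, both counts are assumed to be in units of 10⁹/L, and this is not the published nomogram, which also weights age, recent antitumor treatment, and dyspnea.

```python
def dnlr(neutrophils, wbc):
    """Derived neutrophil-to-lymphocyte ratio: neutrophils / (WBC - neutrophils)."""
    denominator = wbc - neutrophils
    if denominator <= 0:
        raise ValueError("WBC count must exceed the neutrophil count")
    return neutrophils / denominator


def lab_risk_flags(neutrophils, wbc, wbc_cutoff=6.93, dnlr_cutoff=4.19):
    """Flag the two laboratory predictors using the cutoffs from the abstract."""
    return {
        "high_wbc": wbc >= wbc_cutoff,
        "high_dnlr": dnlr(neutrophils, wbc) >= dnlr_cutoff,
    }
```

For example, a patient with a WBC count of 10.0 and a neutrophil count of 8.0 has a dNLR of 4.0, which exceeds the WBC cutoff but falls just below the dNLR cutoff.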


2020 ◽  
Author(s):  
Qiqiang Liang ◽  
Qinyu Zhao ◽  
Xin Xu ◽  
Yu Zhou ◽  
Man Huang

Abstract Background The prevention and control of carbapenem-resistant gram-negative bacteria (CR-GNB) is both a difficulty and a focus for clinicians in the intensive care unit (ICU). This study constructs a CR-GNB carriage prediction model to predict CR-GNB incidence within one week. Methods The database comprises nearly 10,000 patients. The model is constructed using a multivariate logistic regression model and three machine learning algorithms. We then choose the optimal model and verify its accuracy by predicting and recording, on a daily basis, the occurrence of CR-GNB in all patients admitted over 4 months. Results There are 1385 patients with positive CR-GNB cultures and 1535 negative patients in this study. Forty-five variables show statistically significant differences. We include 17 variables in the multivariate logistic regression model and build three machine learning models on all variables. In terms of accuracy and the area under the receiver operating characteristic (AUROC) curve, the random forest is better than XGBoost and the multivariate logistic regression model, which in turn are better than the decision tree model (accuracy: 84% > 82% > 81% > 72%; AUROC: 0.9089 > 0.8947 ≈ 0.8987 > 0.7845). In the 4-month prospective study, 81 cases were predicted to be positive in CR-GNB culture within 7 days, 146 cases were predicted to be negative, 86 cases were positive, and 120 cases were negative, with an overall accuracy of 84% and AUROC of 91.98%. Conclusions Machine learning prediction models can predict the occurrence of CR-GNB colonization or infection within a one-week period and can guide medical staff in real time to identify high-risk groups more accurately.


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Wentao Hu ◽  
...  

Abstract Background GFR-estimating equations are critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform the revised linear regression used to build the CKD-EPI equation. Methods A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop an estimated GFR equation with linear regression for comparison. The performance of these equations was measured by bias, 30% accuracy, precision, and root mean square error (RMSE). Results Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT), and apolipoprotein B (APOB) were selected by the RFE method. The overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy, and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70; all P<0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P=0.10; 0.8 vs 0.78, P=0.19, respectively). Conclusions Random forest regression models perform better than revised linear regression models for GFR estimation.
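The four performance measures named in the abstract are not defined there. The sketch below uses conventions that are common in the GFR-estimation literature and should be read as assumptions: bias as the median difference between estimated and measured GFR, precision as the interquartile range of those differences, 30% accuracy as the share of estimates within 30% of the measured value, and RMSE in the usual sense. The function name `gfr_metrics` is hypothetical.

```python
import math
import statistics


def gfr_metrics(estimated, measured):
    """Summarize agreement between estimated and measured GFR values."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    q1, _, q3 = statistics.quantiles(diffs, n=4)  # quartiles of the differences
    within_30 = sum(abs(e - m) <= 0.30 * m for e, m in zip(estimated, measured))
    return {
        "bias": statistics.median(diffs),   # median of (estimated - measured)
        "precision": q3 - q1,               # interquartile range of differences
        "p30": within_30 / len(diffs),      # share within 30% of measured GFR
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }
```

Under these definitions, lower bias, precision, and RMSE values and a higher 30% accuracy indicate a better equation, which matches the direction of the comparisons reported in the Results.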


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1517
Author(s):  
Hao Yang Teng ◽  
Zhengjun Zhang

Logistic regression is widely used in the analysis of medical data with binary outcomes to study treatment effects through (absolute) treatment effect parameters in the models. However, logistic regression models do not include parameters that indicate relative treatment effects, which can be a severe problem in efficiently modeling treatment effects and can lead to the wrong conclusions about them. This paper introduces a new enhanced logistic regression model that offers a new way of studying treatment effects by measuring the relative changes in the treatment effects and also incorporates the way in which logistic regression models the treatment effects. The new model, called the Absolute and Relative Treatment Effects (AbRelaTEs) model, is viewed as a generalization of logistic regression and an enhanced model with greater flexibility, interpretability, and applicability in real data applications than logistic regression. The AbRelaTEs model is capable of modeling significant treatment effects in an absolute way, a relative way, or both. The new model can be easily implemented using statistical software, with the logistic regression model being treated as a special case. As a result, the classical logistic regression models can be replaced by the AbRelaTEs model to gain greater applicability and have a new benchmark model for more efficiently studying treatment effects in clinical trials, economic developments, and many applied areas. Moreover, the estimators of the coefficients are consistent and asymptotically normal under regularity conditions. In both simulation and real data applications, the model provides both significant and more meaningful results.


Spinal Cord ◽  
2020 ◽  
Author(s):  
Omar Khan ◽  
Jetan H. Badhiwala ◽  
Michael G. Fehlings

Abstract Study design Retrospective analysis of prospectively collected data. Objectives Recently, logistic regression models were developed to predict independence in bowel function 1 year after spinal cord injury (SCI) on a multicenter European SCI (EMSCI) dataset. Here, we evaluated the external validity of these models against a prospectively accrued North American SCI dataset. Setting Twenty-five SCI centers in the United States and Canada. Methods Two logistic regression models developed by the EMSCI group were applied to data for 277 patients derived from three prospective multicenter SCI studies based in North America. External validation was evaluated for both models by assessing their discrimination, calibration, and clinical utility. Discrimination and calibration were assessed using ROC curves and calibration curves, respectively, while clinical utility was assessed using decision curve analysis. Results The simplified logistic regression model, which used baseline total motor score as the predictor, demonstrated the best performance, with an area under the ROC curve of 0.869 (95% confidence interval: 0.826–0.911), a sensitivity of 75.5%, and a specificity of 88.5%. Moreover, the model was well calibrated across the full range of observed probabilities and displayed superior clinical benefit on the decision curve. Conclusions A logistic regression model using baseline total motor score as a predictor of independent bowel function 1 year after SCI was successfully validated against an external dataset. These findings provide evidence supporting the use of this model to enhance the care for individuals with SCI.
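Decision curve analysis, used in this abstract to assess clinical utility, compares the net benefit of acting on a model's predictions with that of treating everyone or no one across a range of threshold probabilities. A minimal sketch of the standard net benefit formula follows; the function names and the toy inputs in the tests are illustrative and are not data from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold.

    Computed as TP/n - FP/n * threshold / (1 - threshold).
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all value and zero, the latter being the net benefit of treating no one.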


2020 ◽  
Vol 35 (6) ◽  
pp. 933-933
Author(s):  
Rolin S ◽  
Kitchen Andren K ◽  
Mullen C ◽  
Kurniadi N ◽  
Davis J

Abstract Objective Previous research in a Veterans Affairs sample proposed using single items on the Neurobehavioral Symptom Inventory (NSI) to screen for anxiety (item 19) and depression (item 20). This study examined the approach in an outpatient physical medicine and rehabilitation sample. Method Participants (N = 84) underwent outpatient neuropsychological evaluation using the NSI, BDI-II, GAD-7, MMPI-2-RF, and Memory Complaints Inventory (MCI) among other measures. Anxiety and depression were psychometrically determined via cutoffs on the GAD-7 (>4) and MMPI-2-RF ANX (>64 T), and BDI-II (>13) and MMPI-2-RF RC2 (>64 T), respectively. Analyses included receiver operating characteristic analysis (ROC) and logistic regression. Logistic regression models used dichotomous anxiety and depression as outcomes and relevant NSI items and MCI average score as predictors. Results ROC analysis using NSI items to classify cases showed area under the curve (AUC) values of .77 for anxiety and .85 for depression. The logistic regression model predicting anxiety correctly classified 80% of cases with AUC of .86. The logistic regression model predicting depression correctly classified 79% of cases with AUC of .88. Conclusion Findings support the utility of NSI anxiety and depression items as screening measures in a rehabilitation population. Consideration of symptom validity via the MCI improved classification accuracy of the regression models. The approach may be useful in other clinical settings for quick assessment of psychological issues warranting further evaluation.

