Multi-Task Learning with Recurrent Neural Networks for ARDS Prediction using only EHR Data: Model Development and Validation Study (Preprint)

2022 ◽  
Author(s):  
Carson Lam ◽  
Rahul Thapa ◽  
Jenish Maharjan ◽  
Keyvan Rahmani ◽  
Chak Foon Tso ◽  
...  

BACKGROUND Acute respiratory distress syndrome (ARDS) is a condition with broad and often subjective diagnostic criteria, and it is associated with significant mortality and morbidity. Early and accurate prediction of ARDS and related conditions such as hypoxemia and sepsis could allow timely administration of therapies, leading to improved patient outcomes. OBJECTIVE To explore how multi-label classification in the clinical setting can exploit the underlying dependencies between ARDS and related conditions to improve early prediction of ARDS. METHODS The electronic health record dataset included 40,073 patient encounters from 7 hospitals from 4/20/2018 to 3/17/2021. A recurrent neural network (RNN) was trained using data from 5 hospitals, and external validation was conducted on data from the remaining 2 hospitals. In addition to ARDS, 12 target labels for related conditions such as sepsis, hypoxemia, and COVID-19 were used to train the model to classify a total of 13 outputs. As a comparator, XGBoost models were developed for each of the 13 target labels. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Heatmaps of attention scores were generated to provide interpretability for the neural networks. Finally, cluster analysis was performed to identify potential phenotypic subgroups of ARDS patients. RESULTS The single RNN model trained to classify 13 outputs outperformed the XGBoost model for ARDS prediction, achieving an AUROC of 0.842 on the external test sets. Performance improved as models were trained on more tasks. Earlier diagnosis of ARDS nearly doubled the rate of in-hospital survival. Cluster analysis revealed distinct ARDS subgroups, some of which had similar mortality rates but different clinical presentations. 
CONCLUSIONS The RNN model presented in this paper can serve as an early warning system to stratify patients at risk of developing any of the multiple outcomes studied, giving practitioners the means to take early action.
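The multi-task architecture described above, a shared recurrent encoder feeding 13 independent sigmoid heads, can be sketched in a few lines. This is a minimal illustrative forward pass, not the authors' trained model: the sequence length, feature count, and hidden size are hypothetical, and the weights are random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 24 hourly EHR feature vectors of size 40,
# a shared recurrent encoder, and 13 sigmoid output heads (ARDS plus
# 12 related conditions such as sepsis, hypoxemia, and COVID-19).
n_steps, n_feat, n_hidden, n_tasks = 24, 40, 32, 13

Wx = rng.normal(0, 0.1, (n_feat, n_hidden))    # input-to-hidden weights
Wh = rng.normal(0, 0.1, (n_hidden, n_hidden))  # hidden-to-hidden weights
Wo = rng.normal(0, 0.1, (n_hidden, n_tasks))   # shared hidden -> 13 heads

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multitask_rnn_forward(x):
    """Run a vanilla RNN over the time axis and score all 13 labels."""
    h = np.zeros(n_hidden)
    for t in range(x.shape[0]):           # one step per hour of EHR data
        h = np.tanh(x[t] @ Wx + h @ Wh)   # shared representation
    return sigmoid(h @ Wo)                # independent per-label risks

x = rng.normal(size=(n_steps, n_feat))    # one synthetic patient encounter
risks = multitask_rnn_forward(x)
print(risks.shape)  # 13 per-label probabilities
```

The point of the shared encoder is that gradients from all 13 labels shape one representation, which is how the related conditions can improve ARDS prediction.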

2019 ◽  
Vol 98 (10) ◽  
pp. 1088-1095 ◽  
Author(s):  
J. Krois ◽  
C. Graetz ◽  
B. Holtfreter ◽  
P. Brinkmann ◽  
T. Kocher ◽  
...  

Prediction models learn patterns from available data (training) and are then validated on new data (testing). Prediction modeling is increasingly common in dental research. We aimed to evaluate how different model development and validation steps affect the predictive performance of tooth loss prediction models of patients with periodontitis. Two independent cohorts (627 patients, 11,651 teeth) were followed over a mean ± SD 18.2 ± 5.6 y (Kiel cohort) and 6.6 ± 2.9 y (Greifswald cohort). Tooth loss and 10 patient- and tooth-level predictors were recorded. The impact of different model development and validation steps was evaluated: 1) model complexity (logistic regression, recursive partitioning, random forest, extreme gradient boosting), 2) sample size (full data set or 10%, 25%, or 75% of cases dropped at random), 3) prediction periods (maximum 10, 15, or 20 y or uncensored), and 4) validation schemes (internal or external by centers/time). Tooth loss was generally a rare event (880 teeth were lost). All models showed limited sensitivity but high specificity. Patients’ age and tooth loss at baseline as well as probing pocket depths showed high variable importance. More complex models (random forest, extreme gradient boosting) had no consistent advantages over simpler ones (logistic regression, recursive partitioning). Internal validation (in sample) overestimated the predictive power (area under the curve up to 0.90), while external validation (out of sample) found lower areas under the curve (range 0.62 to 0.82). Reducing the sample size decreased the predictive power, particularly for more complex models. Censoring the prediction period had only limited impact. When the model was trained in one period and tested in another, model outcomes were similar to the base case, indicating temporal validation as a valid option. No model showed higher accuracy than the no-information rate. 
In conclusion, despite nominally high accuracy, none of the developed models would be useful in a clinical setting. During modeling, rigorous development and external validation should be applied and reported accordingly.
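The finding that no model beat the no-information rate is easy to illustrate with the abstract's own class balance (880 lost of 11,651 teeth). The sketch below uses synthetic predictions; only the counts come from the study.

```python
import numpy as np

# Tooth loss is a rare event: 880 lost of 11,651 teeth (from the abstract).
n_teeth, n_lost = 11651, 880
y = np.zeros(n_teeth, dtype=int)
y[:n_lost] = 1

# No-information rate: accuracy of always predicting the majority class.
nir = max(y.mean(), 1 - y.mean())

# A high-specificity, low-sensitivity classifier (as in the study) can
# post a "high" raw accuracy without beating the no-information rate.
pred = np.zeros(n_teeth, dtype=int)            # never predicts tooth loss
acc = (pred == y).mean()
print(f"no-information rate = {nir:.3f}, model accuracy = {acc:.3f}")
```

With a 7.6% event rate, a trivial constant predictor already scores about 92% accuracy, which is why accuracy alone is an uninformative headline metric here.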


2019 ◽  
Vol 20 (8) ◽  
pp. 1897 ◽  
Author(s):  
Shuaibing He ◽  
Tianyuan Ye ◽  
Ruiying Wang ◽  
Chenyang Zhang ◽  
Xuelian Zhang ◽  
...  

As one of the leading causes of drug failure in clinical trials, drug-induced liver injury (DILI) has seriously impeded the development of new drugs. Assessing the DILI risk of drug candidates in advance is considered an effective strategy to decrease the rate of attrition in drug discovery. There have been continuous recent attempts at DILI prediction; however, predicting DILI successfully remains a major challenge, and there is an urgent need for a quantitative structure–activity relationship (QSAR) model that predicts DILI with satisfactory performance. In this work, we report a high-quality QSAR model for predicting the DILI risk of xenobiotics, built from eight effective classifiers and molecular descriptors provided by Marvin. For model development, a large-scale and diverse dataset of 1254 DILI compounds was assembled through comprehensive literature retrieval. The optimal model was attained by an ensemble method, averaging the probabilities from the eight classifiers, with accuracy (ACC) of 0.783, sensitivity (SE) of 0.818, specificity (SP) of 0.748, and area under the receiver operating characteristic curve (AUC) of 0.859. For further validation, three external test sets and a large negative dataset were utilized. Both the internal and external validation indicated that our model significantly outperformed prior studies. The data provided by the current study will also be a valuable source for future modeling and data mining.
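The ensemble rule named in the abstract, averaging the predicted probabilities of eight classifiers, is soft voting. A minimal sketch follows; the probabilities are synthetic stand-ins for the real model outputs, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

# Soft-voting ensemble: average the DILI probabilities from eight
# classifiers per compound, then threshold the average.
rng = np.random.default_rng(2)
n_classifiers, n_compounds = 8, 5
probs = rng.uniform(size=(n_classifiers, n_compounds))  # one row per model

ensemble_prob = probs.mean(axis=0)            # average across classifiers
dili_risk = (ensemble_prob >= 0.5).astype(int)  # assumed 0.5 cutoff
print(ensemble_prob.round(3), dili_risk)
```

Averaging probabilities (rather than majority-voting hard labels) preserves each classifier's confidence, which typically smooths out individual models' errors.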


Flooding is a major problem globally, and especially in Surat Thani province, Thailand, where the population density along the lower Tapee River is high. Implementing an early warning system could benefit people living along its banks. In this study, our aim was to build a flood prediction model using an artificial neural network (ANN) that utilizes rainfall and stream levels along the lower Tapee River to predict floods. The model was trained on a dataset of rainfall and stream levels measured at local stations. The developed flood prediction model used 4 input variables: the rainfall amounts and stream levels at stations located in the Phrasaeng district (X.37A), the Khian Sa district (X.217), and the Phunphin district (X.5C). Model performance was evaluated using input data spanning a period of eight years (2011–2018) and compared with a support vector machine (SVM); the ANN had better accuracy, 97.91% versus 97.54% for the SVM. Furthermore, the recall (42.78%) and F-measure (52.24%) were better for our model, although the precision was lower. The designed flood prediction model can therefore estimate the likelihood of floods in the lower Tapee River region.
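A four-input feed-forward ANN of the kind described above can be sketched as a single forward pass. The layer width, weights, and example readings below are illustrative assumptions, not the authors' trained network.

```python
import numpy as np

# Illustrative feed-forward ANN flood classifier with four inputs
# (rainfall and stream levels at stations X.37A, X.217, and X.5C).
rng = np.random.default_rng(3)
W1 = rng.normal(0, 0.5, (4, 8))   # input -> hidden (width 8 assumed)
b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, 8)        # hidden -> flood logit
b2 = 0.0

def flood_probability(x):
    """Score one day's station readings as a flood probability."""
    h = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

x = np.array([55.0, 4.2, 3.8, 5.1])  # hypothetical daily readings
p = flood_probability(x)
print(round(float(p), 3))
```

Note the metric pattern in the abstract (high accuracy, low recall) is typical of rare flood days: a probability threshold tuned for recall rather than accuracy is usually what an early warning system needs.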


2019 ◽  
Vol 4 (6) ◽  
pp. e001801
Author(s):  
Sarah Hanieh ◽  
Sabine Braat ◽  
Julie A Simpson ◽  
Tran Thi Thu Ha ◽  
Thach D Tran ◽  
...  

Introduction: Globally, an estimated 151 million children under 5 years of age still suffer from the adverse effects of stunting. We sought to develop and externally validate an early-life predictive model that could be applied in infancy to accurately predict the risk of stunting in preschool children. Methods: We conducted two separate prospective cohort studies in Vietnam that intensively monitored children from early pregnancy until 3 years of age. They included 1168 and 475 live-born infants for model development and validation, respectively. Logistic regression on child stunting at 3 years of age was performed for model development, and the predicted probabilities for stunting were used to evaluate the performance of this model in the validation data set. Results: Stunting prevalence was 16.9% (172 of 1015) in the development data set and 16.4% (70 of 426) in the validation data set. Key predictors included in the final model were paternal and maternal height, maternal weekly weight gain during pregnancy, infant sex, gestational age at birth, and infant weight and length at 6 months of age. The area under the receiver operating characteristic curve in the validation data set was 0.85 (95% confidence interval, 0.80–0.90). Conclusion: This tool, applied to infants at 6 months of age, provided valid prediction of the risk of stunting at 3 years of age using a readily available set of parental and infant measures. Further research is required to examine the impact of preventive measures introduced at 6 months of age on those identified as being at risk of growth faltering at 3 years of age.
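A logistic model over the predictors named in the abstract has the familiar sigmoid form. The sketch below uses the abstract's predictor list, but every coefficient, the intercept, and the example infant are hypothetical placeholders, not the fitted values from the Vietnamese cohorts.

```python
import numpy as np

# Predictors from the abstract; coefficients below are hypothetical.
predictors = ["paternal_height_cm", "maternal_height_cm",
              "maternal_weekly_weight_gain_kg", "infant_sex_male",
              "gestational_age_wk", "weight_6mo_kg", "length_6mo_cm"]
beta = np.array([-0.03, -0.05, -0.8, 0.2, -0.1, -0.4, -0.15])  # placeholder
intercept = 20.0                                               # placeholder

def stunting_risk(x):
    """Predicted probability of stunting at 3 years of age."""
    logit = intercept + x @ beta
    return 1.0 / (1.0 + np.exp(-logit))

infant = np.array([168.0, 155.0, 0.35, 1, 39.0, 7.4, 66.0])  # hypothetical
risk = stunting_risk(infant)
print(round(float(risk), 4))
```

The appeal of this design is deployability: all seven inputs are routine measures available at a 6-month visit, so the score needs no laboratory data.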


2021 ◽  
Vol 7 ◽  
Author(s):  
Kai Zhang ◽  
Shufang Zhang ◽  
Wei Cui ◽  
Yucai Hong ◽  
Gensheng Zhang ◽  
...  

Background: Many severity scores are widely used for clinical outcome prediction in critically ill patients in the intensive care unit (ICU); however, none has been developed specifically for patients identified by the sepsis-3 criteria. This study aimed to develop and validate a risk stratification score for mortality prediction in sepsis-3 patients. Methods: In this retrospective cohort study, we employed the Medical Information Mart for Intensive Care III (MIMIC-III) database for model development and the eICU database for external validation. We identified septic patients by the sepsis-3 criteria on day 1 of ICU entry. The Least Absolute Shrinkage and Selection Operator (LASSO) technique was used to select predictive variables, and we developed a sepsis mortality prediction model with an associated risk stratification score. We then compared model discrimination and calibration with those of other traditional severity scores. Results: For model development, we enrolled a total of 5,443 patients fulfilling the sepsis-3 criteria; 30-day mortality was 16.7%. Among 5,658 septic patients in the validation set, there were 1,135 deaths (mortality 20.1%). The score had good discrimination in the development and validation sets (area under the curve: 0.789 and 0.765). In the validation set, the calibration slope was 0.862 and the Brier score was 0.140. In the development dataset, the score divided patients into groups with mortality risks of low (3.2%), moderate (12.4%), high (30.7%), and very high (68.1%); the corresponding mortality rates in the validation dataset were 2.8%, 10.5%, 21.1%, and 51.2%. As shown by decision curve analysis, the score always had a positive net benefit. Conclusion: The score, termed the Sepsis Mortality Risk Score (SMRS), showed moderate discrimination and calibration, allowing stratification of patients according to mortality risk, although it still requires further modification and external validation.
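The stratification step described above, binning a point score into four mortality-risk groups, is a simple lookup against cut points. The cut points below are illustrative assumptions; the abstract reports the group mortality rates but not the thresholds themselves.

```python
import numpy as np

# Bin an SMRS-style point score into four risk groups.
# Cut points are hypothetical, not the published thresholds.
cut_points = np.array([4, 8, 12])
labels = ["low", "moderate", "high", "very high"]

def risk_group(score):
    """Map a score to its risk band via the cut points."""
    return labels[int(np.digitize(score, cut_points))]

scores = [2, 6, 10, 15]
print([risk_group(s) for s in scores])  # one band per patient
```

Reporting observed mortality per band in both the development and validation sets, as the abstract does, is what lets readers judge whether the bands generalize.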


2021 ◽  
Author(s):  
Edward Korot ◽  
Nikolas Pontikos ◽  
Xiaoxuan Liu ◽  
Siegfried K Wagner ◽  
Livia Faes ◽  
...  

Abstract Deep learning may transform health care, but model development has largely depended on the availability of advanced technical expertise. Herein we present the development of a deep learning model by clinicians without coding, which predicts reported sex from retinal fundus photographs. A model was trained on 84,743 retinal fundus photos from the UK Biobank dataset. External validation was performed on 252 fundus photos from a tertiary ophthalmic referral center. For internal validation, the area under the receiver operating characteristic curve (AUROC) of the code-free deep learning (CFDL) model was 0.93. Sensitivity, specificity, positive predictive value (PPV), and accuracy (ACC) were 88.8%, 83.6%, 87.3%, and 86.5% for internal validation, and 83.9%, 72.2%, 78.2%, and 78.6% for external validation, respectively. Clinicians are currently unaware of distinct retinal feature variations between males and females, highlighting the importance of model explainability for this task. The model performed significantly worse when foveal pathology was present in the external validation dataset (ACC 69.4% vs. 85.4% in healthy eyes; OR 0.36, 95% CI 0.19–0.70; p = 0.0022), suggesting that the fovea is a salient region for model performance. Automated machine learning (AutoML) may enable clinician-driven automated discovery of novel insights and disease biomarkers.


2021 ◽  
Author(s):  
Steven J. Staffa ◽  
David Zurakowski

Summary Clinical prediction models in anesthesia and surgery research have many clinical applications, including preoperative risk stratification with implications for decision-making, resource utilization, and costs. It is imperative that predictive algorithms and multivariable models be validated in a suitable and comprehensive way to establish the robustness of the model in terms of accuracy, predictive ability, reliability, and generalizability. The purpose of this article is to educate anesthesia researchers at an introductory level on the key statistical concepts involved in developing and validating multivariable prediction models for a binary outcome. Methods covered include assessments of discrimination and calibration through internal and external validation. An anesthesia research publication is examined to illustrate the process and presentation of multivariable prediction model development and validation for a binary outcome. Properly assessing the statistical and clinical validity of a multivariable prediction model is essential for ensuring the generalizability and reproducibility of the published tool.
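Of the two validation measures named above, discrimination is the easier to compute from first principles: the AUROC equals the Mann-Whitney probability that a randomly chosen event receives a higher predicted risk than a randomly chosen non-event. A minimal sketch with toy data:

```python
import numpy as np

def auroc(y, p):
    """AUROC as the probability that an event outranks a non-event."""
    y, p = np.asarray(y), np.asarray(p)
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()  # pairwise wins
    ties = (pos[:, None] == neg[None, :]).mean()    # ties count one half
    return greater + 0.5 * ties

y = [0, 0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80, 0.70]
print(auroc(y, p))  # perfect separation here gives 1.0
```

Calibration is the complementary check: a model can rank patients perfectly (AUROC 1.0) while its probabilities are systematically too high or too low, which is why both assessments belong in any validation report.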


Author(s):  
Khalid Bouhedjar ◽  
Abdelmalek Khorief Nacereddine ◽  
Hamida Ghorab ◽  
Abdelhafid Djerourou

The simplified molecular-input line-entry system (SMILES) is particularly suitable for high-speed machine processing. Using the Monte Carlo method implemented in CORAL software, quantitative structure–property relationship (QSPR) models of critical temperature were established on a dataset of 165 diverse organic compounds, employing hybrid optimal descriptors defined by graph and SMILES notation. External validation is one of the most important parts of evaluating model performance; however, previous models on the same dataset either had poor predictive power on the external test set or omitted that check. In the present work, the predictive ability of the model was tested using external validation. The statistical quality of the three splits is similarly good: the r2 values for the best model are 0.98 for the training set, 0.95 for the calibration set, and 0.94 for the validation set.


Author(s):  
Isabelle Kaiser ◽  
Annette B. Pfahlberg ◽  
Wolfgang Uter ◽  
Markus V. Heppt ◽  
Marit B. Veierød ◽  
...  

The rising incidence of cutaneous melanoma over the past few decades has prompted substantial efforts to develop risk prediction models identifying people at high risk of developing melanoma, to facilitate targeted screening programs. We review these models with regard to study characteristics, differences in risk factor selection and assessment, evaluation, and validation methods. Our systematic literature search revealed 40 studies comprising 46 different risk prediction models eligible for the review. Altogether, 35 different risk factors appeared in the models, with nevi being the most common (n = 35, 78%); little consistency in other risk factors was observed. Results of an internal validation were reported for less than half of the studies (n = 18, 45%), and only 6 performed external validation. In terms of model performance, 29 studies assessed the discriminative ability of their models; other performance measures, e.g., regarding calibration or clinical usefulness, were rarely reported. Due to the substantial heterogeneity in risk factor selection and assessment as well as methodologic aspects of model development, direct comparisons between models are hardly possible. Uniform methodologic standards for the development and validation of melanoma risk prediction models, and reporting standards for the accompanying publications, are therefore necessary and should be made obligatory.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Christine K. Lee ◽  
Muntaha Samad ◽  
Ira Hofer ◽  
Maxime Cannesson ◽  
Pierre Baldi

Abstract While deep neural networks (DNNs) and other machine learning models often have higher accuracy than simpler models like logistic regression (LR), they are often considered to be “black box” models, and this lack of interpretability and transparency is considered a challenge for clinical adoption. In healthcare, intelligible models not only help clinicians to understand the problem and create more targeted action plans, but also help to gain the clinicians’ trust. One method of overcoming the limited interpretability of more complex models is to use Generalized Additive Models (GAMs). Standard GAMs simply model the target response as a sum of univariate models. Inspired by GAMs, the same idea can be applied to neural networks through an architecture referred to as Generalized Additive Models with Neural Networks (GAM-NNs). In this manuscript, we present the development and validation of a model applying the concept of GAM-NNs to allow for interpretability by visualizing the learned feature patterns related to risk of in-hospital mortality for patients undergoing surgery under general anesthesia. The data consists of 59,985 patients with a feature set of 46 features extracted at the end of surgery, to which we added previously not included features: total anesthesia case time (1 feature); the time in minutes spent with mean arterial pressure (MAP) below 40, 45, 50, 55, 60, and 65 mmHg during surgery (6 features); and Healthcare Cost and Utilization Project (HCUP) Code Descriptions of the primary current procedural terminology (CPT) codes (33 features), for a total of 86 features. All data were randomly split into 80% for training (n = 47,988) and 20% for testing (n = 11,997) prior to model development. Model performance was compared to a standard LR model using the same features as the GAM-NN. The data consisted of 59,985 surgical records, and the occurrence of in-hospital mortality was 0.81% in the training set and 0.72% in the testing set. 
The GAM-NN model with HCUP features had the highest area under the curve (AUC), 0.921 (0.895–0.95). Overall, both GAM-NN models had higher AUCs than the LR models but lower average precision. The LR model without HCUP features had the highest average precision, 0.217 (0.136–0.31). To assess the interpretability of the GAM-NNs, we visualized their learned contributions and compared them against those of the LR models with HCUP features. Overall, we were able to demonstrate that our proposed generalized additive neural network (GAM-NN) architecture is able to (1) leverage a neural network’s ability to learn nonlinear patterns in the data, which is more clinically intuitive, (2) be interpreted easily, making it more clinically useful, and (3) maintain model performance as compared to previously published DNNs.
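The GAM-NN idea above, modeling the mortality logit as a sum of per-feature univariate networks, can be sketched compactly. Sizes and weights here are illustrative, not the study's trained model; the per-feature shape functions are what get visualized for interpretability.

```python
import numpy as np

# One tiny univariate subnetwork (1 -> n_hidden -> 1) per feature;
# the logit is the sum of the per-feature contributions, as in a GAM.
rng = np.random.default_rng(4)
n_features, n_hidden = 5, 8
W1 = rng.normal(0, 0.5, (n_features, n_hidden))  # per-feature input weights
W2 = rng.normal(0, 0.5, (n_features, n_hidden))  # per-feature output weights

def feature_contribution(j, xj):
    """Learned shape function f_j(x_j); plottable for interpretability."""
    return np.tanh(xj * W1[j]) @ W2[j]

def gam_nn_mortality(x):
    """Additive logit over all features, squashed to a probability."""
    logit = sum(feature_contribution(j, xj) for j, xj in enumerate(x))
    return 1.0 / (1.0 + np.exp(-logit))

x = rng.normal(size=n_features)  # one synthetic surgical record
out = gam_nn_mortality(x)
print(round(float(out), 4))
```

Because each feature's effect passes through its own subnetwork, plotting `feature_contribution(j, ·)` over a grid of values recovers a nonlinear risk curve per feature, the interpretability payoff the abstract describes, while the additive structure keeps features from interacting opaquely.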

