Development and validation of an interpretable neural network for prediction of postoperative in-hospital mortality

2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Christine K. Lee ◽  
Muntaha Samad ◽  
Ira Hofer ◽  
Maxime Cannesson ◽  
Pierre Baldi

Abstract While deep neural networks (DNNs) and other machine learning models often achieve higher accuracy than simpler models like logistic regression (LR), they are often considered "black box" models, and this lack of interpretability and transparency is a challenge for clinical adoption. In healthcare, intelligible models not only help clinicians understand the problem and create more targeted action plans but also help to gain the clinicians' trust. One method of overcoming the limited interpretability of more complex models is to use Generalized Additive Models (GAMs). Standard GAMs simply model the target response as a sum of univariate models. Inspired by GAMs, the same idea can be applied to neural networks through an architecture referred to as Generalized Additive Models with Neural Networks (GAM-NNs). In this manuscript, we present the development and validation of a model applying the GAM-NN concept to allow for interpretability by visualizing the learned feature patterns related to risk of in-hospital mortality for patients undergoing surgery under general anesthesia. The data consist of 59,985 patients with a feature set of 46 features extracted at the end of surgery, to which we added features not previously included: total anesthesia case time (1 feature); the time in minutes spent with mean arterial pressure (MAP) below 40, 45, 50, 55, 60, and 65 mmHg during surgery (6 features); and Healthcare Cost and Utilization Project (HCUP) code descriptions of the primary Current Procedural Terminology (CPT) codes (33 features), for a total of 86 features. All data were randomly split into 80% for training (n = 47,988) and 20% for testing (n = 11,997) prior to model development. Model performance was compared to a standard LR model using the same features as the GAM-NN. The occurrence of in-hospital mortality was 0.81% in the training set and 0.72% in the testing set. The GAM-NN model with HCUP features had the highest area under the curve (AUC), 0.921 (0.895–0.95). Overall, both GAM-NN models had higher AUCs than the LR models but lower average precisions; the LR model without HCUP features had the highest average precision, 0.217 (0.136–0.31). To assess the interpretability of the GAM-NNs, we then visualized their learned contributions and compared them against those of the LR models with HCUP features. Overall, we demonstrate that our proposed generalized additive neural network (GAM-NN) architecture is able to (1) leverage a neural network's ability to learn nonlinear patterns in the data, which is more clinically intuitive, (2) be interpreted easily, making it more clinically useful, and (3) maintain model performance compared to previously published DNNs.
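As a rough illustration of the GAM-NN architecture described above, the sketch below gives each input feature its own small subnetwork and sums the scalar contributions before a logistic output; plotting those per-feature contributions is what makes the model interpretable. Layer sizes, the strictly univariate split, and the name GAMNN are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GAMNN(nn.Module):
    """Sum of univariate subnetworks followed by a logistic output."""
    def __init__(self, n_features: int, hidden: int = 8):
        super().__init__()
        # One small subnetwork per feature (assumed topology)
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for _ in range(n_features)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features); column i feeds subnetwork i, and the
        # per-feature outputs are the visualizable "learned contributions"
        contributions = torch.cat(
            [net(x[:, i:i + 1]) for i, net in enumerate(self.subnets)], dim=1
        )
        return torch.sigmoid(contributions.sum(dim=1))  # mortality risk
```

Plotting each subnetwork's output against its input recovers per-feature risk curves of the kind the abstract compares with LR contributions.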

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Narayan Sharma ◽  
René Schwendimann ◽  
Olga Endrich ◽  
Dietmar Ausserhofer ◽  
Michael Simon

Abstract Background Understanding how comorbidity measures contribute to patient mortality is essential both to describe patient health status and to adjust for risks and potential confounding. The Charlson and Elixhauser comorbidity indices are well established for risk adjustment and mortality prediction. Still, a different set of comorbidity weights might improve the prediction of in-hospital mortality. The present study therefore aimed to derive a set of new Swiss Elixhauser comorbidity weightings and to validate and compare them against the Charlson and the Elixhauser-based van Walraven weights in an adult inpatient population-based cohort of general hospitals. Methods A retrospective analysis was conducted with routine data of 102 Swiss general hospitals (2012–2017) covering 6.09 million inpatient cases. To derive the Swiss weightings for the Elixhauser comorbidity index, we randomly halved the inpatient data and validated the results of part 1 alongside the established weighting systems in part 2, predicting in-hospital mortality. Charlson and van Walraven weights were applied to the Charlson and Elixhauser comorbidity indices, respectively. Derivation and validation of the weightings were conducted with generalized additive models adjusted for age, gender and hospital type. Results Overall, the Elixhauser indices with Swiss weights (c-statistic 0.867, 95% CI, 0.865–0.868) and van Walraven's weights (0.863, 95% CI, 0.862–0.864) had a substantial advantage over Charlson's weights (0.850, 95% CI, 0.849–0.851) in both the derivation and validation groups. The net reclassification improvement of the new Swiss weights was 1.6% over the Elixhauser-van Walraven weights and 4.9% over the Charlson weights. Conclusions All weightings confirmed previous results with the national dataset. The new Swiss weightings slightly improved the prediction of in-hospital mortality in Swiss hospitals. The newly derived weights support patient population-based analyses of in-hospital mortality and suggest the value of country- or cohort-specific weightings.
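The paper derives its weights with generalized additive models; as a simplified stand-in, the sketch below shows the classic recipe for turning regression coefficients into integer point weights (van Walraven followed a similar convention, dividing log-odds by 0.3 and rounding). The column names (died, age, female) are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def derive_weights(df: pd.DataFrame, comorbidity_cols: list) -> pd.Series:
    """Regress in-hospital death (0/1) on comorbidity indicators plus
    adjusters, then convert log-odds to integer point weights."""
    X = sm.add_constant(df[comorbidity_cols + ["age", "female"]])
    fit = sm.Logit(df["died"], X).fit(disp=0)
    # Conventional scaling: divide coefficients by 0.3, round to integers
    return np.round(fit.params[comorbidity_cols] / 0.3).astype(int)
```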


2021 ◽  
Author(s):  
Ana Gabriela Reyna Flores ◽  
Quentin Fisher ◽  
Piroska Lorinczi

Abstract Tight gas sandstone reservoirs vary widely in terms of rock type, depositional environment, mineralogy and petrophysical properties. For this reason, estimating their permeability is a challenge when core is not available, because permeability cannot be measured directly from wire-line logs. The aim of this work is to create an automatic tool for rock microstructure classification as a first step toward future permeability prediction. Permeability can be estimated from porosity measured using wire-line data, such as that derived from density-neutron tools. However, without additional information this is highly inaccurate, because porosity-permeability relationships are controlled by the microstructure of samples and permeability can vary by over five orders of magnitude. Experts can broadly estimate porosity-permeability relationships by analysing the microstructure of rocks using Scanning Electron Microscopy (SEM) or optical microscopy. Such estimates are, however, subjective and require many years of experience. A machine learning model for automated rock microstructure determination in tight gas sandstones has been built using Convolutional Neural Networks (CNNs) and trained on backscattered images from cuttings. Current results were obtained by training the model on around 24,000 Back Scattering Electron Microscopy (BSEM) images from 25 different rock samples. On the current dataset the model achieves an average categorical accuracy of 97% on both the validation and test sets, with losses of 0.09 and 0.089 for validation and testing, respectively. Such high accuracy and low loss indicate strong overall model performance. Other metrics and debugging techniques, such as Gradient-weighted Class Activation Mapping (Grad-CAM), Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC), were used for model evaluation, with positive results. Nevertheless, performance can be improved by obtaining images from additional, already available samples so that the model generalizes better. Current results indicate that CNNs are a powerful tool and that their application to thin-section images is a viable answer to image analysis and classification problems. The use of this classifier removes the subjectivity of estimating porosity-permeability relationships from microstructure and can be used by non-experts. The current results also open the possibility of data-driven permeability prediction based on rock microstructure and porosity from well logs.
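The abstract does not specify the network topology, so the Keras sketch below is only a generic CNN classifier of the kind described, compiled with the categorical accuracy metric the authors report; input size, depth, and the number of microstructure classes are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(n_classes: int, input_shape=(128, 128, 1)) -> tf.keras.Model:
    """Small CNN for grayscale BSEM image patches (illustrative topology)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])

model = build_classifier(n_classes=5)  # number of classes is assumed
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["categorical_accuracy"])
```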


2019 ◽  
Vol 98 (10) ◽  
pp. 1088-1095 ◽  
Author(s):  
J. Krois ◽  
C. Graetz ◽  
B. Holtfreter ◽  
P. Brinkmann ◽  
T. Kocher ◽  
...  

Prediction models learn patterns from available data (training) and are then validated on new data (testing). Prediction modeling is increasingly common in dental research. We aimed to evaluate how different model development and validation steps affect the predictive performance of tooth loss prediction models of patients with periodontitis. Two independent cohorts (627 patients, 11,651 teeth) were followed over a mean ± SD 18.2 ± 5.6 y (Kiel cohort) and 6.6 ± 2.9 y (Greifswald cohort). Tooth loss and 10 patient- and tooth-level predictors were recorded. The impact of different model development and validation steps was evaluated: 1) model complexity (logistic regression, recursive partitioning, random forest, extreme gradient boosting), 2) sample size (full data set or 10%, 25%, or 75% of cases dropped at random), 3) prediction periods (maximum 10, 15, or 20 y or uncensored), and 4) validation schemes (internal or external by centers/time). Tooth loss was generally a rare event (880 teeth were lost). All models showed limited sensitivity but high specificity. Patients’ age and tooth loss at baseline as well as probing pocket depths showed high variable importance. More complex models (random forest, extreme gradient boosting) had no consistent advantages over simpler ones (logistic regression, recursive partitioning). Internal validation (in sample) overestimated the predictive power (area under the curve up to 0.90), while external validation (out of sample) found lower areas under the curve (range 0.62 to 0.82). Reducing the sample size decreased the predictive power, particularly for more complex models. Censoring the prediction period had only limited impact. When the model was trained in one period and tested in another, model outcomes were similar to the base case, indicating temporal validation as a valid option. No model showed higher accuracy than the no-information rate. In conclusion, none of the developed models would be useful in a clinical setting, despite high accuracy. During modeling, rigorous development and external validation should be applied and reported accordingly.
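The gap the study reports between internal and external validation can be illustrated with a minimal sketch: fit on one cohort, then score both in sample and on the other center. The cohort variable names are hypothetical, and logistic regression stands in for any of the four model families compared.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def compare_validation(X_kiel, y_kiel, X_greifswald, y_greifswald):
    """Internal (in-sample) vs. external (other-center) AUC."""
    model = LogisticRegression(max_iter=1000).fit(X_kiel, y_kiel)
    auc_internal = roc_auc_score(y_kiel, model.predict_proba(X_kiel)[:, 1])
    auc_external = roc_auc_score(
        y_greifswald, model.predict_proba(X_greifswald)[:, 1]
    )
    return auc_internal, auc_external  # internal is typically optimistic
```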


2011 ◽  
Vol 187 ◽  
pp. 411-415
Author(s):  
Lu Yue Xia ◽  
Hai Tian Pan ◽  
Meng Fei Zhou ◽  
Yi Jun Cai ◽  
Xiao Fang Sun

Melt index is the most important parameter in determining polypropylene grade. Owing to the lack of proper on-line instruments, its measurement interval and delay are both very long, which makes quality control quite difficult. A modeling approach based on stacked neural networks is proposed to estimate the polypropylene melt index. The generalization capability of a single neural network model can be significantly improved by using a stacked neural networks model. Proper determination of the stacking weights is essential for good stacked model performance, so a method is proposed for determining appropriate weights for combining the individual networks by minimizing the sum of absolute prediction errors. Application to real industrial data demonstrates that the polypropylene melt index can be successfully estimated using stacked neural networks, with significant improvements in model accuracy compared to using a single neural network model.
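The stacking step described here can be sketched as a small constrained optimization: given each individual network's predictions on a validation set, choose combination weights that minimize the sum of absolute prediction errors. The non-negativity and sum-to-one constraints are common stacking assumptions rather than details stated in the abstract.

```python
import numpy as np
from scipy.optimize import minimize

def stacking_weights(P: np.ndarray, y: np.ndarray) -> np.ndarray:
    """P: (n_samples, n_networks) validation predictions; y: targets.
    Returns combination weights minimizing the sum of absolute errors."""
    n = P.shape[1]
    res = minimize(
        lambda w: np.sum(np.abs(P @ w - y)),  # L1 stacking criterion
        np.full(n, 1.0 / n),                  # start from equal weights
        bounds=[(0.0, 1.0)] * n,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    return res.x

# Stacked prediction on new data: P_new @ stacking_weights(P_val, y_val)
```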


2018 ◽  
Vol 129 (4) ◽  
pp. 649-662 ◽  
Author(s):  
Christine K. Lee ◽  
Ira Hofer ◽  
Eilon Gabel ◽  
Pierre Baldi ◽  
Maxime Cannesson

Abstract Background The authors tested the hypothesis that deep neural networks trained on intraoperative features can predict postoperative in-hospital mortality. Methods The data used to train and validate the algorithm consisted of 59,985 patients with 87 features extracted at the end of surgery. Feed-forward networks with a logistic output were trained using stochastic gradient descent with momentum. The deep neural networks were trained on 80% of the data, with 20% reserved for testing. The authors assessed the improvement of the deep neural network from adding the American Society of Anesthesiologists (ASA) Physical Status classification, and the robustness of the deep neural network to a reduced feature set. The networks were then compared to ASA Physical Status, logistic regression, and other published clinical scores, including the Surgical Apgar, Preoperative Score to Predict Postoperative Mortality, Risk Quantification Index, and the Risk Stratification Index. Results In-hospital mortality in the training and test sets was 0.81% and 0.73%, respectively. The deep neural network with a reduced feature set and ASA Physical Status classification had the highest area under the receiver operating characteristics curve, 0.91 (95% CI, 0.88 to 0.93). The highest logistic regression area under the curve was found with a reduced feature set and ASA Physical Status (0.90, 95% CI, 0.87 to 0.93). The Risk Stratification Index had the highest area under the receiver operating characteristics curve, at 0.97 (95% CI, 0.94 to 0.99). Conclusions Deep neural networks can predict in-hospital mortality based on automatically extractable intraoperative data, but are not (yet) superior to existing methods.
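A minimal sketch of the training setup as described (feed-forward network, logistic output, stochastic gradient descent with momentum) might look as follows in PyTorch; the hidden-layer sizes, learning rate, and momentum value are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(87, 64), nn.ReLU(),   # 87 end-of-surgery features
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 1),               # logit of in-hospital mortality
)
criterion = nn.BCEWithLogitsLoss()  # logistic output
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(x_batch: torch.Tensor, y_batch: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = criterion(model(x_batch).squeeze(1), y_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```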


2015 ◽  
Vol 12 (10) ◽  
pp. 11083-11127 ◽  
Author(s):  
J. E. Shortridge ◽  
S. D. Guikema ◽  
B. F. Zaitchik

Abstract. In the past decade, certain methods for empirical rainfall–runoff modeling have seen extensive development and been proposed as a useful complement to physical hydrologic models, particularly in basins where data to support process-based models is limited. However, the majority of research has focused on a small number of methods, such as artificial neural networks, despite the development of multiple other approaches for non-parametric regression in recent years. Furthermore, this work has generally evaluated model performance based on predictive accuracy alone, while not considering broader objectives such as model interpretability and uncertainty that are important if such methods are to be used for planning and management decisions. In this paper, we use multiple regression and machine-learning approaches to simulate monthly streamflow in five highly-seasonal rivers in the highlands of Ethiopia and compare their performance in terms of predictive accuracy, error structure and bias, model interpretability, and uncertainty when faced with extreme climate conditions. While the relative predictive performance of models differed across basins, data-driven approaches were able to achieve reduced errors when compared to physical models developed for the region. Methods such as random forests and generalized additive models may have advantages in terms of visualization and interpretation of model structure, which can be useful in providing insights into physical watershed function. However, the uncertainty associated with model predictions under climate change should be carefully evaluated, since certain models (especially generalized additive models and multivariate adaptive regression splines) became highly variable when faced with high temperatures.
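The extrapolation concern raised at the end of the abstract is easy to demonstrate: tree ensembles flatten outside the training hull, while spline-based models can extrapolate wildly. The toy data below is purely synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic monthly predictors: rainfall (mm) and temperature (deg C)
X_hist = rng.uniform([0.0, 10.0], [300.0, 30.0], size=(240, 2))
y_hist = 0.5 * X_hist[:, 0] - 2.0 * X_hist[:, 1] + rng.normal(0, 5, 240)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_hist, y_hist)

# Probe a temperature beyond the training range: the forest's prediction
# saturates at the edge of the data rather than extrapolating the trend
print(rf.predict([[150.0, 38.0]]))
```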


Author(s):  
Suhail Ahamed ◽  
Gabriele Weiler ◽  
Karl Boden ◽  
Kai Januschowski ◽  
Matthias Stennes ◽  
...  

The automation of medical documentation is a highly desirable process, especially as it could avert significant temporal and monetary expenses in healthcare. With the help of complex modelling and high computational capability, Automatic Speech Recognition (ASR) and deep learning have made several promising attempts to this end. However, a factor that significantly determines the efficiency of these systems is the volume of speech that is processed in each medical examination. In the course of this study, we found that over half of the speech recorded during follow-up examinations of patients treated with intra-vitreal injections was not relevant for medical documentation. In this paper, we evaluate the application of Convolutional and Long Short-Term Memory (LSTM) neural networks for the development of a speech classification module aimed at identifying speech relevant for medical report generation. In this regard, various topology parameters are tested and the effect of different speaker attributes on model performance is analyzed. The results indicate that Convolutional Neural Networks (CNNs) are more successful than LSTM networks, achieving a validation accuracy of 92.41%. Furthermore, on evaluation of the robustness of the model to gender, accent and unknown speakers, the neural network generalized satisfactorily.
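A binary relevant-vs-irrelevant classifier of the kind described could be sketched as below; operating on log-mel spectrogram patches is an assumption, as the abstract does not state the input representation or topology.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_speech_cnn(input_shape=(64, 128, 1)) -> tf.keras.Model:
    """CNN scoring a spectrogram patch as relevant for documentation."""
    return models.Sequential([
        layers.Input(shape=input_shape),   # (mel bands, frames, 1), assumed
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])

model = build_speech_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```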


Author(s):  
Oskar Allerbo ◽  
Rebecka Jörnsten

Abstract Non-parametric, additive models are able to capture complex data dependencies in a flexible, yet interpretable way. However, choosing the format of the additive components often requires non-trivial data exploration. Here, as an alternative, we propose PrAda-net, a one-hidden-layer neural network, trained with proximal gradient descent and adaptive lasso. PrAda-net automatically adjusts the size and architecture of the neural network to reflect the complexity and structure of the data. The compact network obtained by PrAda-net can be translated to additive model components, making it suitable for non-parametric statistical modelling with automatic model selection. We demonstrate PrAda-net on simulated data, where we compare the test error performance, variable importance and variable subset identification properties of PrAda-net to other lasso-based regularization approaches for neural networks. We also apply PrAda-net to the massive U.K. black smoke data set, to demonstrate how PrAda-net can be used to model complex and heterogeneous data with spatial and temporal components. In contrast to classical, statistical non-parametric approaches, PrAda-net requires no preliminary modeling to select the functional forms of the additive components, yet still results in an interpretable model representation.
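The core training loop of a proximal-gradient, lasso-penalized one-hidden-layer network can be sketched as follows; this simplified version applies plain (non-adaptive) soft-thresholding to both weight matrices, whereas PrAda-net's adaptive lasso reweights the penalty per parameter, so treat it as a schematic only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 50), nn.Tanh(), nn.Linear(50, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

def soft_threshold(w: torch.Tensor, thresh: float) -> torch.Tensor:
    # Proximal operator of the L1 penalty
    return torch.sign(w) * torch.clamp(w.abs() - thresh, min=0.0)

def proximal_train_step(x, y, lam=1e-3) -> float:
    opt.zero_grad()
    loss = F.mse_loss(model(x), y)
    loss.backward()
    opt.step()                    # gradient step on the smooth loss
    with torch.no_grad():         # then shrink weights toward zero; units
        for layer in (model[0], model[2]):  # pruned to zero reveal structure
            layer.weight.copy_(soft_threshold(layer.weight, lam))
    return loss.item()
```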


2020 ◽  
Author(s):  
Narayan Sharma ◽  
René Schwendimann ◽  
Olga Endrich ◽  
Dietmar Ausserhofer ◽  
Michael Simon

Abstract Background When chronic conditions are associated with outcomes such as mortality, comorbidity measures are essential both to describe patient health status and to adjust for potential confounding. The Charlson and Elixhauser comorbidity indices are well established for risk adjustment and mortality prediction. Still, optimal comorbidity weightings remain undetermined. The present study aimed to derive a set of new population-based Elixhauser comorbidity weightings, then to validate and compare their mortality predictivity against those of the Charlson and the Elixhauser-based van Walraven weightings in a population-based cohort. Methods A retrospective analysis was conducted with routine Swiss general hospital (102 hospitals) data (2012–2017) for 6.09 million inpatient cases. To derive the population-based weightings for the Elixhauser comorbidity index, we randomly halved the inpatient data and validated the results of part 1 alongside the established weighting systems used in part 2. Charlson and van Walraven weightings were applied to the Charlson and Elixhauser comorbidity indices, respectively. Generalized additive models adjusted for age, gender and hospital type were used for derivation and validation. Results Overall, the population-based weights' c-statistic (0.867, 95% CI: 0.865–0.868) was consistently higher than Elixhauser-van Walraven's (0.863, 95% CI: 0.862–0.864) and Charlson's (0.850, 95% CI: 0.849–0.851) in the derivation and validation groups, and the net reclassification improvement of the new weights offered improved predictive performance of 0.4% over the Elixhauser-van Walraven and 6.1% over the Charlson weightings. Conclusions All weightings were validated with the national dataset, and the new population-based weightings model improved the prediction of in-hospital mortality. The newly derived weights support patient population-based analyses of health outcomes.
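The net reclassification improvement quoted in the results can be computed, in its continuous form, from two models' predicted risks; the sketch below follows the standard Pencina-style definition and is not taken from the paper.

```python
import numpy as np

def continuous_nri(p_new: np.ndarray, p_old: np.ndarray, y: np.ndarray) -> float:
    """P(up|event) - P(down|event) + P(down|non-event) - P(up|non-event),
    where 'up' means the new model assigns a higher risk."""
    events, nonevents = y == 1, y == 0
    nri_events = np.mean(np.sign(p_new[events] - p_old[events]))
    nri_nonevents = np.mean(np.sign(p_old[nonevents] - p_new[nonevents]))
    return nri_events + nri_nonevents
```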

