Predictive Models of Intensive Care Unit Mortality - Severity of Illness Scores or Artificial Intelligence Instruments? - Literature Review and Meta-analysis (Preprint)
BACKGROUND The Severity of Illness Scores (SIS) - Acute Physiology and Chronic Health Evaluation (APACHE), Simplified Acute Physiology Score (SAPS), and Sequential Organ Failure Assessment (SOFA) - are the risk stratification and mortality prediction tools currently used in Intensive Care Units (ICUs) across the globe; they rely on scores that assess disease severity on admission. Developers of Artificial Intelligence (AI) or Machine Learning (ML) models that predict ICU mortality use SIS performance as a reference point when reporting the performance of these computational constructs.

OBJECTIVE Using systematic review and meta-analysis, we evaluated studies that compare ML-based mortality prediction models with SIS-based models. The review is intended to inform clinicians about the prognostic value of ML-based ICU mortality prediction models compared with SIS models and their validity in supporting clinical decision-making.

METHODS We performed a systematic search of the PubMed, Scopus, Embase, and IEEE databases. Studies that report the performance of newly developed ML models predictive of ICU mortality and compare it with the performance of SIS models on the same datasets were eligible for inclusion. ML and SIS models with a reported Area Under the Receiver Operating Characteristic (AUROC) curve were included in the meta-analysis to identify the group with superior performance. Data were extracted with guidance from the CHARMS (Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies) checklist [1] and were appraised for risk of bias and applicability using PROBAST (Prediction model Risk Of Bias ASsessment Tool) [2].

RESULTS After screening the literature, we identified and included 20 papers containing 47 ML models based on seven types of algorithms that were compared with three types of SIS models.
The AUROC for predicting ICU mortality ranged between 0.828 and 0.875 for ML-based models and between 0.707 and 0.760 for SIS-based models. We noted substantial heterogeneity among the reported models and considerable variation among the AUROC estimates for both ML and SIS model types. Because of the high degree of heterogeneity, we performed a limited random-effects meta-analysis of externally validated subgroups of ML models and the SIS subgroups used for comparison.

CONCLUSIONS ML-based models can accurately predict ICU mortality as an alternative to traditional scoring models. However, the high degree of heterogeneity observed within and between studies limits the assessment of pooled results. The differences in the development strategies, validation, and statistical and computational methods that these models rely on impede a head-to-head comparison, and we cannot declare the superiority of one model type over the other. Consequently, we make no recommendation regarding the performance of ML-based ICU mortality prediction models in clinical practice. To bridge the knowledge gap from design to practice, ML model developers must provide explainer models and make these knowledge objects reproducible, interoperable, and transparent [3].

CLINICALTRIAL The review was registered with and approved by the international prospective register of systematic reviews, PROSPERO (reference number CRD42021203871).
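To illustrate the kind of pooling a random-effects meta-analysis of AUROC values involves, the sketch below implements the widely used DerSimonian-Laird estimator. This is not the authors' code, and the abstract does not state which random-effects estimator was used; the AUROCs and within-study variances here are hypothetical placeholders, not values from the included studies.

```python
import math

def dersimonian_laird(effects, variances):
    """Pool per-study effect estimates (e.g. AUROCs) with the
    DerSimonian-Laird random-effects estimator.

    effects    -- list of study-level estimates
    variances  -- list of their within-study variances
    Returns (pooled estimate, standard error, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance (fixed) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures observed between-study dispersion
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)               # between-study variance estimate
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical AUROCs for three externally validated ML models
aucs = [0.80, 0.86, 0.90]
variances = [0.0004, 0.0004, 0.0004]
pooled, se, tau2 = dersimonian_laird(aucs, variances)
print(f"pooled AUROC = {pooled:.3f}, 95% CI half-width = {1.96 * se:.3f}, tau^2 = {tau2:.5f}")
```

When tau^2 is large relative to the within-study variances, the random-effects weights flatten toward equality and the confidence interval widens, which is why a high degree of heterogeneity, as reported above, limits the interpretability of a pooled estimate.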