Abstract
Background
Early warning scores (EWS) have been developed as clinical prognostication tools to identify acutely deteriorating patients. With recent advances in machine learning, there has been a proliferation of studies describing the development and validation of novel EWS. Systematic reviews of published studies evaluating the performance of both well-established and novel EWS have reached conflicting conclusions. A possible reason for this is the lack of consistency in the validation methods used. In this review, we aim to examine the methodologies and performance metrics used in studies describing EWS validation.

Methods
A systematic review of all eligible studies in the MEDLINE database from inception to 22 February 2019 was performed. Studies were eligible if they validated at least one EWS and reported associations between EWS scores and mortality, intensive care unit (ICU) transfer, or cardiac arrest (CA) in adults in the inpatient setting. Two reviewers independently performed full-text review and data abstraction using a standardized data worksheet based on the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) checklist. Meta-analysis was not performed owing to heterogeneity among the studies.

Results
The key differences in validation methodology identified were (1) validation population characteristics, (2) outcomes of interest, (3) case definition, intended time of use, and aggregation methods, and (4) handling of missing values in the validation dataset. In terms of case definition, of the 34 eligible studies, 22 used the patient episode case definition, 10 used the observation set case definition, and 2 performed validation using both. Of those that used the patient episode case definition, 11 studies validated the EWS using a score from a single point in time, most often the first recorded observation. More than 10 different performance metrics were reported across the studies.

Conclusions
Methodologies and performance metrics used in studies validating EWS were inconsistent, making it difficult to interpret and compare EWS performance. Standardizing EWS validation methodology and reporting could address this issue.
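To make the case-definition distinction concrete, the following is a minimal sketch, not drawn from any of the reviewed studies: the data frame columns, the toy scores and outcomes, and the choice of AUROC as the metric are all illustrative assumptions. It shows how evaluating one score per observation set versus one aggregated score per patient episode changes the unit of analysis, and hence the measured performance.

```python
# Minimal sketch (hypothetical data) of the two EWS case definitions.
# Assumes a table of vital-sign observation sets, each carrying a
# computed EWS value, a patient-episode ID, and the episode's
# outcome label (e.g. death, ICU transfer, or cardiac arrest).
import pandas as pd
from sklearn.metrics import roc_auc_score

obs = pd.DataFrame({
    "episode_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "ews":        [2, 5, 7, 1, 2, 3, 6, 8],  # score per observation set
    "outcome":    [1, 1, 1, 0, 0, 1, 1, 1],  # episode-level outcome label
})

# Observation-set case definition: every observation set is a case,
# so episodes with many observations contribute many cases.
auc_observation_set = roc_auc_score(obs["outcome"], obs["ews"])

# Patient-episode case definition: one score per episode. Here the
# single-point-in-time choice is the first recorded observation;
# other studies aggregate differently (e.g. the maximum score).
first = obs.groupby("episode_id").first()
auc_episode_first = roc_auc_score(first["outcome"], first["ews"])

print(auc_observation_set, auc_episode_first)
```

Because the two definitions score different sets of cases, the same EWS applied to the same patients can yield different discrimination estimates, which is one reason reported performance is hard to compare across studies.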