Submissions from the SPRINT Data Analysis Challenge on clinical risk prediction: a cross-sectional evaluation

BMJ Open ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. e025936
Author(s):  
Cynthia A Jackevicius ◽  
JaeJin An ◽  
Dennis T Ko ◽  
Joseph S Ross ◽  
Suveen Angraal ◽  
...  

Objectives: To collate and systematically characterise the methods, results and clinical performance of the clinical risk prediction submissions to the Systolic Blood Pressure Intervention Trial (SPRINT) Data Analysis Challenge. Design: Cross-sectional evaluation. Data sources: SPRINT Challenge online submission website. Study selection: Submissions to the SPRINT Challenge for clinical prediction tools or clinical risk scores. Data extraction: In duplicate by three independent reviewers. Results: Of 143 submissions, 29 met our inclusion criteria. Of these, 23/29 (79%) reported prediction models for an efficacy outcome (20/23 [87%] of which used the SPRINT primary composite outcome), 14/29 (48%) used a safety outcome, and 4/29 (14%) examined a combined safety/efficacy outcome. Age and cardiovascular disease history were the most common variables, retained in 80% (12/15) of the efficacy and 60% (6/10) of the safety models. However, no two submissions intending to predict the same outcome included an identical list of variables. Model performance measures, most commonly the C-statistic, were reported in 57% (13/23) of efficacy and 64% (9/14) of safety model submissions. Only 2/29 (7%) models reported external validation. Nine of 29 (31%) submissions developed and provided evaluable risk prediction tools. When tested with two hypothetical vignettes, 67% (6/9) of the tools provided the expected recommendation for a low-risk patient, and 44% (4/9) did so for a high-risk patient. Only 2/29 (7%) of the clinical risk prediction submissions have been published to date. Conclusions: Despite use of the same data source, the 29 SPRINT Challenge submissions for clinical risk prediction produced a diversity of approaches, methods and results. Of the nine evaluable risk prediction tools, clinical performance was suboptimal. By collating an overview of the range of approaches taken, researchers may further optimise the development of risk prediction tools in SPRINT-eligible populations, and our findings may inform the conduct of future similar open science projects.
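
For readers less familiar with the C-statistic reported by many of these submissions: it is the probability that a randomly selected patient who experienced the outcome was assigned a higher predicted risk than one who did not. A minimal sketch of that pairwise-concordance computation, using made-up predicted risks and outcomes rather than any Challenge data:

```python
# Concordance (C-statistic): probability that a case is ranked above a non-case.
# Illustrative only; the risks and outcomes below are invented, not SPRINT data.
from itertools import product

def c_statistic(risks, outcomes):
    """Pairwise concordance between predicted risks and binary outcomes."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    pairs = list(product(cases, controls))
    concordant = sum(1 for c, n in pairs if c > n)
    ties = sum(1 for c, n in pairs if c == n)
    return (concordant + 0.5 * ties) / len(pairs)

predicted_risk = [0.05, 0.22, 0.11, 0.40, 0.08, 0.31]   # hypothetical predictions
observed_event = [0,    1,    0,    1,    0,    0]      # hypothetical outcomes
print(round(c_statistic(predicted_risk, observed_event), 2))  # 0.88 on this toy data
```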

BMJ Open ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. e038088
Author(s):  
Jacky Tu ◽  
Peter Gowdie ◽  
Julian Cassar ◽  
Simon Craig

Background: Septic arthritis is an uncommon but potentially significant diagnosis to consider when a child presents to the emergency department (ED) with non-traumatic limp. Our objective was to determine the diagnostic accuracy of clinical findings (history and examination) and investigation results (pathology tests and imaging) for the diagnosis of septic arthritis among children presenting with acute non-traumatic limp to the ED. Methods: Systematic review of the literature published between 1966 and June 2019 in the MEDLINE and EMBASE databases. Studies were included if they evaluated children presenting with lower limb complaints and reported the diagnostic performance of items from history, physical examination, laboratory testing or radiological examination. Data were independently extracted by two authors, and quality assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. Results: 18 studies were identified, including 2672 children (560 with a final diagnosis of septic arthritis). There was substantial heterogeneity in inclusion criteria, study setting, definitions of specific variables and the gold standard used to confirm septic arthritis. Clinical and investigation findings were reported using varying definitions and cut-offs, and applied to differing study populations. Spectrum bias and poor-to-moderate study design quality limit their applicability to the ED setting. Single studies suggest that the presence of joint tenderness (n=189; positive likelihood ratio 11.4 (95% CI 5.9 to 22.0); negative likelihood ratio 0.2 (95% CI 0.0 to 1.2)) and joint effusion on ultrasound (n=127; positive likelihood ratio 8.4 (95% CI 4.1 to 17.1); negative likelihood ratio 0.2 (95% CI 0.1 to 0.3)) appear to be useful. Two promising clinical risk prediction tools were identified; however, their performance was notably lower when tested in external validation studies. Discussion: Differentiating children with septic arthritis from those with non-emergent causes of non-traumatic limp remains a key diagnostic challenge for emergency physicians. There is a need for prospectively derived and validated ED-based clinical risk prediction tools.
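
The likelihood ratios quoted above follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. A minimal sketch with a hypothetical 2x2 table (the counts are invented, not taken from the included studies):

```python
# Likelihood ratios from a hypothetical 2x2 table (not the published study counts).
# Rows: examination finding present/absent; columns: septic arthritis confirmed yes/no.
tp, fp = 18, 15     # finding present
fn, tn = 4, 152     # finding absent

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

lr_positive = sensitivity / (1 - specificity)   # how much a positive finding raises the odds
lr_negative = (1 - sensitivity) / specificity   # how much a negative finding lowers the odds

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print(f"LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}")
```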


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Michelle Louise Gatt ◽  
Maria Cassar ◽  
Sandra C. Buttigieg

Purpose: The purpose of this paper is to identify and analyse the readmission risk prediction tools reported in the literature and their benefits for healthcare organisations and management. Design/methodology/approach: Readmission risk prediction is a growing topic of interest, with the aim of identifying patients, particularly those suffering from chronic diseases such as congestive heart failure, chronic obstructive pulmonary disease and diabetes, who are at risk of readmission. Several models have been developed with different levels of predictive ability. A structured and extensive literature search of several databases was conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) strategy, which yielded a total of 48,984 records. Findings: Forty-three articles were selected for full-text review after the screening process and according to the eligibility criteria. Thirty-four unique readmission risk prediction models were identified, with predictive ability ranging from poor to good (c-statistic 0.5-0.86). Readmission rates ranged between 3.1% and 74.1% depending on the risk category. This review shows that readmission risk prediction is a complex process that is still relatively new as a concept and poorly understood. It confirms that readmission prediction models hold meaningful accuracy in identifying patients at higher risk of such an event within specific contexts. Research limitations/implications: Since most prediction models were developed for specific populations, conditions or hospital settings, the generalisability and transferability of the predictions across wider or other contexts may be difficult to achieve. Therefore, the value of prediction models remains limited to hospital management. Future research is indicated in this regard. Originality/value: This review is the first to cover readmission risk prediction tools published in the literature since 2011, thereby providing an assessment of the relevance of this crucial KPI to health organisations and managers.


Author(s):  
Chenxi Huang ◽  
Shu-Xia Li ◽  
César Caraballo ◽  
Frederick A. Masoudi ◽  
John S. Rumsfeld ◽  
...  

Background: New methods such as machine learning techniques are increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models and thereby support their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations, and such gains may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussion covers metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
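
To make the argument concrete, the sketch below assembles a small panel of complementary metrics (discrimination via the AUC, overall accuracy via the Brier score, and calibration via the calibration slope) for two hypothetical models. The synthetic data and the scikit-learn/statsmodels calls are assumptions for illustration, not the article's own code or its acute kidney injury models:

```python
# A side-by-side metric panel for two hypothetical models on synthetic data.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(0)
n = 5000
true_logit = rng.normal(-2.2, 1.2, n)                # synthetic "true" risk on the logit scale
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))   # observed binary outcome (e.g. AKI)

# Model A loses discrimination through added noise; Model B keeps the ranking
# but is badly miscalibrated (its predictions are far too extreme).
p_a = 1 / (1 + np.exp(-(true_logit + rng.normal(0, 0.8, n))))
p_b = 1 / (1 + np.exp(-(2.5 * true_logit + 1.0)))

def calibration_slope(y, p):
    """Slope of the outcome regressed on the logit of predicted risk (1.0 is ideal)."""
    logit_p = np.log(p / (1 - p))
    return sm.Logit(y, sm.add_constant(logit_p)).fit(disp=0).params[1]

for name, p in (("Model A", p_a), ("Model B", p_b)):
    print(name,
          f"AUC={roc_auc_score(y, p):.3f}",
          f"Brier={brier_score_loss(y, p):.3f}",
          f"calibration slope={calibration_slope(y, p):.2f}")
```

On this toy data the two models rank differently depending on the metric, which is exactly the point the article makes about reporting a fuller panel rather than a single summary statistic.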


2016 ◽  
Vol 24 (1) ◽  
pp. 198-208 ◽  
Author(s):  
Benjamin A Goldstein ◽  
Ann Marie Navar ◽  
Michael J Pencina ◽  
John P A Ioannidis

Objective: Electronic health records (EHRs) are an increasingly common data source for clinical risk prediction, presenting both unique analytic opportunities and challenges. We sought to evaluate the current state of EHR-based risk prediction modeling through a systematic review of clinical prediction studies using EHR data. Methods: We searched PubMed for articles published from 2009 to 2014 that reported on the use of an EHR to develop a risk prediction model. Articles were extracted by two reviewers, and we abstracted information on study design, use of EHR data, model building, and performance from each publication and its supplementary documentation. Results: We identified 107 articles from 15 different countries. Studies were generally very large (median sample size = 26 100) and utilized a diverse array of predictors. Most used validation techniques (n = 94 of 107) and reported model coefficients for reproducibility (n = 83). However, studies did not fully leverage the breadth of EHR data, as they uncommonly used longitudinal information (n = 37) and employed relatively few predictor variables (median = 27 variables). Less than half of the studies were multicenter (n = 50), and only 26 performed validation across sites. Many studies did not fully address biases of EHR data such as missing data or loss to follow-up. Average c-statistics for different outcomes were: mortality (0.84), clinical prediction (0.83), hospitalization (0.71), and service utilization (0.71). Conclusions: EHR data present both opportunities and challenges for clinical risk prediction. There is room for improvement in designing such studies.


Author(s):  
Theodoros Evgeniou ◽  
Mathilde Fekom ◽  
Anton Ovchinnikov ◽  
Raphael Porcher ◽  
Camille Pouchol ◽  
...  

Background: In early May 2020, following social distancing measures due to COVID-19, governments were considering relaxing lockdowns. We combined individual clinical risk predictions with epidemic modelling to examine simulations of risk-based differential isolation and exit policies. Methods: We extended a standard susceptible-exposed-infected-removed (SEIR) model to account for personalised predictions of severity, defined by the risk of an individual needing intensive care if infected, and simulated differential isolation policies using COVID-19 data and estimates for France as of early May 2020. We also performed sensitivity analyses. The framework may be used with other epidemic models, with other risk predictions, and for other epidemic outbreaks. Findings: Simulations indicated that, all else being equal, an exit policy informed by clinical risk predictions starting on May 11, as planned by the French government, could allow restrictions to be relaxed immediately for an extra 10% (6 700 000 people) or more of the lowest-risk population, and consequently allow restrictions on the remaining population to be relaxed significantly faster, while staying within current ICU capacity. Similar exit policies without risk predictions would exceed ICU capacity by a multiple. Sensitivity analyses showed that when the assumed percentage of severe patients in the population decreased, the prediction model's discrimination improved, or ICU capacity increased, policies based on risk models had a greater impact on the results of the epidemic simulations. At the same time, sensitivity analyses also showed that differential isolation policies require higher-risk individuals to comply with the recommended restrictions. In general, our simulations demonstrated that risk prediction models could improve policy effectiveness, keeping everything else constant. Interpretation: Clinical risk prediction models can inform new personalised isolation and exit policies, which may lead to both safer and faster outcomes than can be achieved without such prediction models.
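
As a rough illustration of the mechanism, the toy simulation below runs a single-group SEIR model in which a fixed fraction of the population is released from confinement, and compares peak ICU demand when the released group is selected by a clinical risk model (low severity rate among those infected) versus selected indiscriminately. The single-group simplification and all parameter values are assumptions for illustration, not the paper's calibrated French model:

```python
# Toy SEIR comparison of risk-based vs non-selective release (all numbers are illustrative).
def peak_icu_demand(release_fraction, icu_rate, days=400, n=67_000_000,
                    r0_free=2.5, r0_confined=0.7, incubation=5.1, infectious=10.0):
    """Single-group SEIR in which a fraction of the population mixes freely; very simplified."""
    beta = (release_fraction * r0_free + (1 - release_fraction) * r0_confined) / infectious
    sigma, gamma = 1.0 / incubation, 1.0 / infectious
    s, e, i, r = n - 1000.0, 0.0, 1000.0, 0.0
    peak_infected = 0.0
    for _ in range(days):                     # daily Euler steps
        new_e, new_i, new_r = beta * s * i / n, sigma * e, gamma * i
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        peak_infected = max(peak_infected, i)
    return peak_infected * icu_rate           # crude proxy for ICU beds needed at the peak

# Same fraction released, but a clinical risk model picks low-severity individuals, so a much
# smaller share of those infected around the peak would need intensive care (rates are made up).
print(f"risk-based release of 60%:    {peak_icu_demand(0.6, icu_rate=0.001):,.0f} ICU beds at peak (toy)")
print(f"non-selective release of 60%: {peak_icu_demand(0.6, icu_rate=0.005):,.0f} ICU beds at peak (toy)")
```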


2021 ◽  
Author(s):  
Harvineet Singh ◽  
Vishwali Mhasawade ◽  
Rumi Chunara

Importance: Modern predictive models require large amounts of data for training and evaluation, which can result in models that are specific to certain locations, the populations in them, and local clinical practices. Yet best practices and guidelines for clinical risk prediction models have not yet considered such challenges to generalizability. Objectives: To investigate changes in measures of predictive discrimination, calibration, and algorithmic fairness when transferring models for predicting in-hospital mortality across ICUs serving different populations, and to study the reasons for the lack of generalizability in these measures. Design, Setting, and Participants: In this multi-center cross-sectional study, electronic health records covering 70,126 hospitalizations at 179 hospitals across the US were analyzed. Data were collected between 2014 and 2015. Main Outcomes and Measures: The main outcome is in-hospital mortality. The generalization gap, defined as the difference in a model performance metric across hospitals, is computed for discrimination and calibration metrics, namely the area under the receiver operating characteristic curve (AUC) and the calibration slope. To assess model performance by race, we report differences in false negative rates across groups. Data were also analyzed using the causal discovery algorithm Fast Causal Inference (FCI), which infers paths of causal influence while identifying potential influences associated with unmeasured variables. Results: In-hospital mortality rates differed across hospitals over a range of 3.9%-9.3% (1st-3rd quartile). When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st to 3rd quartile; median 0.801); calibration slope from 0.725 to 0.983 (1st to 3rd quartile; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (1st to 3rd quartile; median 0.092). When transferring models across geographies, AUC ranged from 0.795 to 0.813 (1st to 3rd quartile; median 0.804); calibration slope from 0.904 to 1.018 (1st to 3rd quartile; median 0.968); and disparity in false negative rates from 0.018 to 0.074 (1st to 3rd quartile; median 0.040). The distributions of all variable types (demographics, vitals, and labs) differed significantly across hospitals and regions. The distribution of the race variable and of some clinical variables (vitals, labs and surgery) shifted by hospital and region, and race also mediated differences in the relationship between clinical variables and mortality across hospitals and regions. Conclusions and Relevance: Group-specific metrics should be assessed during generalizability checks to identify potential harms to those groups. To develop methods that improve and guarantee the performance of prediction models in new environments for groups and individuals, a better understanding of health processes, and of the data-generating processes and their provenance by sub-group, is needed to identify and mitigate sources of variation.
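
A sketch of the kind of generalization-gap computation described above: a model is trained at one hospital and then evaluated, both internally and at a second hospital with a shifted case mix, on AUC, calibration slope, and the disparity in false negative rates between two groups. The synthetic cohorts and the scikit-learn/statsmodels pipeline are assumptions for illustration, not the study's data or code:

```python
# Generalization gap across hospitals: AUC, calibration slope, and false negative rate
# disparity by group, on synthetic data (not the study's cohort or code).
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_hospital(n, shift, coef_scale=1.0):
    """Synthetic cohort; `shift` and `coef_scale` mimic site differences in case mix and outcome model."""
    x = rng.normal(shift, 1.0, (n, 3))
    group = rng.binomial(1, 0.3, n)                       # stand-in for a sensitive attribute
    logit = -3.0 + coef_scale * (x @ np.array([0.8, 0.5, 0.3])) + 0.4 * group
    y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    return x, group, y

def evaluate(model, x, group, y, threshold=0.10):
    p = model.predict_proba(x)[:, 1]
    slope = sm.Logit(y, sm.add_constant(np.log(p / (1 - p)))).fit(disp=0).params[1]

    def fnr(g):                                           # false negative rate within a group
        positives = (group == g) & (y == 1)
        return ((p < threshold) & positives).sum() / max(1, positives.sum())

    return roc_auc_score(y, p), slope, abs(fnr(1) - fnr(0))

x_tr, g_tr, y_tr = make_hospital(8000, shift=0.0)                  # development hospital (training)
x_in, g_in, y_in = make_hospital(8000, shift=0.0)                  # same hospital, held-out patients
x_ex, g_ex, y_ex = make_hospital(8000, shift=0.5, coef_scale=0.7)  # external hospital, different case mix

model = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
for name, data in (("internal", (x_in, g_in, y_in)), ("external", (x_ex, g_ex, y_ex))):
    auc, slope, fnr_gap = evaluate(model, *data)
    print(f"{name} AUC={auc:.3f}  calibration slope={slope:.2f}  FNR disparity={fnr_gap:.3f}")
```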


2021 ◽  
Author(s):  
Yixuan He ◽  
Chirag M Lakhani ◽  
Danielle Rasooly ◽  
Arjun K Manrai ◽  
Ioanna Tzoulaki ◽  
...  

OBJECTIVE: To establish a polyexposure score for type 2 diabetes (T2D) incorporating 12 non-genetic exposures, and to examine whether a polyexposure and/or a polygenic risk score improves diabetes prediction beyond traditional clinical risk factors. RESEARCH DESIGN AND METHODS: We identified 356,621 unrelated individuals of white British ancestry from the UK Biobank with no prior diagnosis of T2D and normal HbA1c levels. Using self-reported and hospital admission information, we deployed a machine learning procedure to select the most predictive and robust factors out of 111 non-genetically ascertained exposure and lifestyle variables for a polyexposure risk score (PXS) for incident T2D. We computed the clinical risk score (CRS) and polygenic risk score (PGS) by taking a weighted sum of eight established clinical risk factors and of over six million SNPs, respectively. RESULTS: In the study population, 7,513 individuals developed incident T2D. The C-statistics for the PGS, PXS, and CRS models were 0.709, 0.762, and 0.839, respectively. Hazard ratios associated with risk score values in the top 10% versus the remaining population were 2.00, 5.90, and 9.97 for the PGS, PXS, and CRS, respectively. Adding the PGS and PXS to the CRS improves T2D classification accuracy, with a continuous net reclassification index of 15.2% and 30.1% for cases, respectively, and 7.3% and 16.9% for controls, respectively. CONCLUSIONS: For T2D, the PXS provides modest incremental predictive value over established clinical risk factors. The concept of the PXS merits further consideration in T2D risk stratification and is likely to be useful in other chronic disease risk prediction models.
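
The continuous (category-free) net reclassification index reported above compares, separately for cases and controls, how often the expanded model moves the predicted risk in the right direction relative to the base model. A minimal sketch on synthetic data; the scores, weights and outcome model are invented for illustration and are not the UK Biobank analysis:

```python
# Continuous (category-free) NRI for adding a new score to a base model (synthetic data).
import numpy as np

rng = np.random.default_rng(7)
n = 20000
crs = rng.normal(0, 1, n)                      # stand-in for a clinical risk score
pxs = 0.5 * crs + rng.normal(0, 1, n)          # stand-in for a correlated polyexposure score
y = rng.binomial(1, 1 / (1 + np.exp(-(-3.0 + 1.0 * crs + 0.6 * pxs))))

def predicted_risk(scores, weights, intercept=-3.0):
    return 1 / (1 + np.exp(-(intercept + scores @ weights)))

p_base = predicted_risk(crs[:, None], np.array([1.2]))          # CRS only (weights are made up)
p_new = predicted_risk(np.c_[crs, pxs], np.array([1.0, 0.6]))   # CRS + PXS

def continuous_nri_components(p_old, p_new, y):
    """Event and non-event components of the category-free NRI."""
    up, down = p_new > p_old, p_new < p_old
    event_nri = up[y == 1].mean() - down[y == 1].mean()
    nonevent_nri = down[y == 0].mean() - up[y == 0].mean()
    return event_nri, nonevent_nri

e_nri, ne_nri = continuous_nri_components(p_base, p_new, y)
print(f"event NRI {e_nri:+.1%}, non-event NRI {ne_nri:+.1%}")
```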


2021 ◽  
Author(s):  
Hong Sun ◽  
Kristof Depraetere ◽  
Laurent Meesseman ◽  
Patricia Cabanillas Silva ◽  
Ralph Szymanowsky ◽  
...  

BACKGROUND: Machine learning (ML) algorithms are currently used in a wide array of clinical domains to produce models that can predict clinical risk events. Most models are developed and evaluated with retrospective data, very few are evaluated in a clinical workflow, and even fewer report performance in different hospitals. We provide detailed evaluations of clinical risk prediction models in live clinical workflows for three different use cases in three different hospitals. OBJECTIVE: The main objective of this study is to evaluate clinical risk prediction models in live clinical workflows and to compare their performance there with their performance on retrospective data. We also aimed to generalize the results by applying our investigation to three different use cases in three different hospitals. METHODS: We trained clinical risk prediction models for three use cases (delirium, sepsis, and acute kidney injury (AKI)) in three different hospitals with retrospective data. The models were deployed in these three hospitals and used in daily clinical practice, and the predictions they made were logged and correlated with the diagnoses at discharge. We compared their performance with evaluations on retrospective data and conducted cross-hospital evaluations. RESULTS: The performance of the prediction models in live clinical workflows is similar to their performance on retrospective data: the average area under the receiver operating characteristic curve (AUROC) decreases slightly, by 0.8 percentage points (from 89.4% to 88.6%). The cross-hospital evaluations show severely reduced performance: the average AUROC decreases by 8 percentage points (from 94.2% to 86.3%), which indicates the importance of calibrating models with data from the deployment hospitals. CONCLUSIONS: Calibrating the prediction model with data from the deployment hospital leads to good performance in live settings. The performance degradation in the cross-hospital evaluation shows the limitations of developing one generic model for different hospitals. Designing a generic model development process that generates a specialized prediction model for each hospital helps ensure model performance across hospitals.
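
The calibration with deployment-hospital data that the authors highlight can be illustrated with simple logistic recalibration: refitting the intercept and slope of the transferred model's logit scores on a local sample. The sketch below shows this generic technique on synthetic data; it is an assumption for illustration, not the authors' model development process:

```python
# Logistic recalibration of a transferred model on data from the deployment hospital
# (synthetic data; a generic technique, not the authors' pipeline).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(3)

def hospital(n, intercept):
    """Synthetic cohort; a different intercept mimics a different baseline event rate."""
    x = rng.normal(0, 1, (n, 4))
    y = rng.binomial(1, 1 / (1 + np.exp(-(intercept + x @ np.array([1.0, 0.6, 0.4, 0.2])))))
    return x, y

x_dev, y_dev = hospital(10000, intercept=-2.0)   # development hospital
x_new, y_new = hospital(10000, intercept=-3.2)   # deployment hospital, lower event rate

model = LogisticRegression(max_iter=1000).fit(x_dev, y_dev)
scores_new = model.decision_function(x_new)      # transferred model's raw (logit) scores

# Refit intercept and slope of the transferred scores on a small local sample.
calib_idx = rng.choice(len(y_new), 1000, replace=False)
recal = LogisticRegression(max_iter=1000).fit(
    scores_new[calib_idx].reshape(-1, 1), y_new[calib_idx])

p_raw = model.predict_proba(x_new)[:, 1]
p_recal = recal.predict_proba(scores_new.reshape(-1, 1))[:, 1]

# AUROC is unchanged by this monotone recalibration; the Brier score (calibration) improves.
for name, p in (("transferred as-is", p_raw), ("locally recalibrated", p_recal)):
    print(f"{name:20s} AUROC={roc_auc_score(y_new, p):.3f}  Brier={brier_score_loss(y_new, p):.4f}")
```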


The Lancet ◽  
2017 ◽  
Vol 390 ◽  
pp. S40
Author(s):  
Benjamin J Gray ◽  
Jeffrey W Stephens ◽  
Michael Thomas ◽  
Sally P Williams ◽  
Christine A Davies ◽  
...  
