Feasibility and Evaluation of a Large-Scale External Validation Approach for Patient-Level Prediction in an International Data Network: Validation of models predicting stroke in female patients newly diagnosed with atrial fibrillation

2020
Author(s): Jenna Marie Reps, Ross D Williams, Seng Chan You, Thomas Falconer, Evan Minty, ...

Abstract

Background: To demonstrate how the Observational Health Data Sciences and Informatics (OHDSI) collaborative network and standardization can be utilized to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets.

Methods: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was then run in which the five models were externally validated across nine observational healthcare datasets spanning three countries and five independent sites.

Results: The five existing models were successfully integrated into the OHDSI framework for patient-level prediction, obtaining mean c-statistics ranging from 0.57 to 0.63 across the six databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis in females. This is comparable with existing validation studies. Once the models were replicated, the validation network study was run across the nine datasets within 60 days. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation.

Conclusion: This study demonstrates the ability to scale up external validation of patient-level prediction models through a collaboration of researchers and a data standardization that enables models to be readily shared across data sites. External validation is necessary to understand the transportability and reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by even one independent researcher. In this paper we show it is possible to both scale up and speed up external validation by demonstrating validation across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
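As an aside for readers unfamiliar with the metric: the c-statistic of a risk score is simply the ROC AUC of the score against the observed outcome. The sketch below illustrates this for CHADS2, whose published weights are one point each for congestive heart failure, hypertension, age ≥ 75 and diabetes, and two points for prior stroke/TIA. It is an illustrative Python sketch on a synthetic cohort, not the study's R package; the column names and simulated outcome are assumptions.

```python
# Illustrative sketch: score a synthetic cohort with CHADS2 and estimate the
# c-statistic. Not the study's code; the actual work used the OHDSI
# PatientLevelPrediction R framework on real observational databases.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical female AF cohort with binary risk factors at index.
cohort = pd.DataFrame({
    "chf":          rng.integers(0, 2, n),   # congestive heart failure
    "hypertension": rng.integers(0, 2, n),
    "age_ge_75":    rng.integers(0, 2, n),
    "diabetes":     rng.integers(0, 2, n),
    "prior_stroke": rng.integers(0, 2, n),   # prior stroke / TIA
})

# CHADS2: one point each for CHF, hypertension, age >= 75, diabetes;
# two points for prior stroke/TIA.
cohort["chads2"] = (
    cohort["chf"] + cohort["hypertension"] + cohort["age_ge_75"]
    + cohort["diabetes"] + 2 * cohort["prior_stroke"]
)

# Synthetic 1-year stroke outcome, loosely increasing with the score.
p = 0.02 + 0.03 * cohort["chads2"]
cohort["stroke_1y"] = rng.binomial(1, p)

# The c-statistic of a risk score is the ROC AUC of score vs. outcome.
print("c-statistic:", roc_auc_score(cohort["stroke_1y"], cohort["chads2"]))
```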


Author(s): Jet M. J. Vonk, Jacoba P. Greving, Vilmundur Gudnason, Lenore J. Launer, Mirjam I. Geerlings

Abstract

We aimed to evaluate the external performance of prediction models for all-cause dementia or Alzheimer's disease (AD) in the general population, which can aid selection of high-risk individuals for clinical trials and prevention. We identified 17 of 36 eligible published prognostic models for external validation in the population-based AGES-Reykjavik Study. Predictive performance was assessed with c-statistics and calibration plots. All five models with a c-statistic > 0.75 (0.76-0.81) contained cognitive testing as a predictor, while the models with lower c-statistics (0.67-0.75) did not. Calibration ranged from good to poor across the models, including systematic risk overestimation or overestimation particularly for the highest-risk group. Models that overestimate risk may be acceptable for exclusion purposes, but lack the ability to accurately identify individuals at higher dementia risk. Both updating existing models and developing new models aimed at identifying high-risk individuals, as well as further external validation studies of dementia prediction models, are warranted.
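A calibration plot of the kind used here bins subjects by predicted risk and compares the mean predicted risk per bin with the observed event rate. Below is a minimal scikit-learn sketch on synthetic predictions; it is not the AGES-Reykjavik analysis, and the overestimation factor is an assumption chosen to mimic the reported pattern.

```python
# Minimal sketch of a calibration plot: predicted vs. observed risk in deciles.
# Synthetic predictions stand in for a dementia model's output; illustrative
# only, not the study's analysis code.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
n = 10_000

# A model that systematically overestimates: true risk is half the prediction.
y_prob = rng.uniform(0.0, 0.6, n)          # predicted dementia risk
y_true = rng.binomial(1, y_prob * 0.5)     # observed outcomes

# Mean observed event rate per bin of predicted risk.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)

plt.plot(prob_pred, prob_true, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="perfect calibration")
plt.xlabel("Predicted risk")
plt.ylabel("Observed risk")
plt.legend()
plt.show()
# Points below the diagonal indicate risk overestimation, the pattern the
# authors report for several of the validated models.
```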


2021, Vol 21 (1)
Author(s): Jill Hardin, Jenna M. Reps

Abstract

Background: The goal of our study is to examine the impact of the lookback length used when engineering features for predictive models developed on observational healthcare data. A longer lookback gives more insight about patients but increases the issue of left-censoring.

Methods: We used five US observational databases to develop patient-level prediction models. A target cohort of subjects with hypertensive drug exposures and outcome cohorts of subjects with acute outcomes (stroke and gastrointestinal bleeding) and chronic outcomes (diabetes and chronic kidney disease) were developed. Candidate predictors existing on or prior to the target index date were derived within the following lookback periods: 14, 30, 90, 180, 365, 730 days, and all time prior to index. We predicted the risk of outcomes occurring from 1 to 365 days after index. Ten lasso logistic models were generated for each lookback period to create a distribution of area under the curve (AUC) metrics for evaluating the discriminative performance of the models. Calibration intercept and slope were also calculated, and the impact on external validation performance was investigated across the five databases.

Results: The maximum difference in AUC between models developed using different lookback periods within a database was < 0.04 for diabetes (in MDCR, AUC of 0.593 with 14-day lookback vs. 0.631 with all-time lookback) and 0.012 for renal impairment (in MDCR, AUC of 0.675 with 30-day lookback vs. 0.687 with 365-day lookback). For the acute outcomes, the maximum difference in AUC across lookbacks within a database was 0.015 for stroke (in MDCD, AUC of 0.767 with 14-day lookback vs. 0.782 with 365-day lookback) and < 0.03 for gastrointestinal bleeding (in CCAE, AUC of 0.631 with 14-day lookback vs. 0.660 with 730-day lookback).

Conclusions: In general, the choice of covariate lookback had only a small impact on discrimination and calibration, with a short lookback (< 180 days) occasionally decreasing discrimination. Based on these results, when training a logistic regression model for prediction, a 365-day covariate lookback appears to be a good tradeoff between performance and interpretability.
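To make the lookback idea concrete, the sketch below derives binary candidate predictors from a toy longitudinal table: an event contributes a feature only if it falls within the chosen lookback window before the index date. This is illustrative Python with assumed table and column names, not the OHDSI feature-construction code used in the study.

```python
# Sketch of lookback-dependent feature engineering: a condition becomes a
# binary candidate predictor only if recorded within `lookback_days` before
# (or on) the index date. Table and column names are assumptions.
import pandas as pd

# Toy longitudinal records: one row per recorded condition.
events = pd.DataFrame({
    "person_id":  [1, 1, 2, 3],
    "concept":    ["diabetes", "stroke", "diabetes", "ckd"],
    "event_date": pd.to_datetime(
        ["2015-01-10", "2019-11-02", "2012-06-01", "2019-12-20"]),
})
# One row per cohort member with their index date.
cohort = pd.DataFrame({
    "person_id":  [1, 2, 3],
    "index_date": pd.to_datetime(["2020-01-01"] * 3),
})

def lookback_features(cohort, events, lookback_days):
    """Binary features from events in (index - lookback, index]."""
    merged = events.merge(cohort, on="person_id")
    start = merged["index_date"] - pd.Timedelta(days=lookback_days)
    in_window = (merged["event_date"] > start) & \
                (merged["event_date"] <= merged["index_date"])
    feats = (merged[in_window]
             .assign(value=1)
             .pivot_table(index="person_id", columns="concept",
                          values="value", fill_value=0, aggfunc="max")
             .reset_index())
    return (cohort[["person_id"]]
            .merge(feats, on="person_id", how="left")
            .fillna(0))

# A 365-day lookback keeps person 1's 2019 stroke and person 3's CKD record
# but drops the older diabetes codes from 2012 and 2015; a long lookback
# recovers them at the cost of more left-censoring in short-history databases.
print(lookback_features(cohort, events, 365))
print(lookback_features(cohort, events, 10_000))
```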


2021
Author(s): Sara Khalid, Cynthia Yang, Clair Blacketer, Talita Duarte-Salles, Sergio Fernández-Bertolín, ...

Background and Objective: In response to the ongoing COVID-19 pandemic, several prediction models have been rapidly developed with the aim of providing evidence-based guidance. However, no COVID-19 prediction model in the existing literature has been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code).

Methods: We show step by step how to implement the pipeline for the question 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a US claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the US.

Results: Our open-source tools enabled us to go efficiently end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized for COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. The L1-regularized logistic regression models were well calibrated.

Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction can enable the rapid development of reliable prediction models. The OHDSI tools and pipeline are open source and available to researchers around the world.
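The sketch below mirrors this model comparison at toy scale: the same six learner families are fitted on one sample and scored on a held-out one with the ROC AUC. It uses scikit-learn on synthetic data and is not the OHDSI patient-level prediction pipeline; all hyperparameters shown are arbitrary assumptions.

```python
# Illustrative comparison of the six model types named in the paper on a
# synthetic imbalanced binary task, scored on a held-out "external" sample.
# Not the study's code or data.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20_000, n_features=50,
                           n_informative=10, weights=[0.9], random_state=1)
# "Internal" development data vs. a held-out stand-in for external data.
X_dev, X_ext, y_dev, y_ext = train_test_split(X, y, test_size=0.5,
                                              random_state=1)

models = {
    "L1 logistic regression": LogisticRegression(penalty="l1",
                                                 solver="liblinear", C=0.1),
    "random forest":          RandomForestClassifier(n_estimators=200),
    "gradient boosting":      GradientBoostingClassifier(),
    "AdaBoost":               AdaBoostClassifier(),
    "decision tree":          DecisionTreeClassifier(max_depth=5),
    "MLP":                    MLPClassifier(max_iter=500),
}

for name, model in models.items():
    model.fit(X_dev, y_dev)
    auc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"{name:25s} held-out AUC = {auc:.3f}")
```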


Gut, 2018, Vol 68 (4), pp. 672-683
Author(s): Todd Smith, David C Muller, Karel G M Moons, Amanda J Cross, Mattias Johansson, ...

Objective: To systematically identify and validate published colorectal cancer risk prediction models that do not require invasive testing, in two large population-based prospective cohorts.

Design: Models were identified through an update of a published systematic review and validated in the European Prospective Investigation into Cancer and Nutrition (EPIC) and the UK Biobank. The performance of the models in predicting the occurrence of colorectal cancer within 5 or 10 years after study enrolment was assessed by discrimination (C-statistic) and calibration (plots of observed vs predicted probability).

Results: The systematic review and its update identified 16 models from 8 publications (8 colorectal, 5 colon and 3 rectal). The number of participants included in each model validation ranged from 41,587 to 396,515, and the number of cases ranged from 115 to 1,781. Eligible and ineligible participants across the models were largely comparable. Calibration of the models, where assessable, was very good and was further improved by recalibration. The C-statistics of the models were largely similar between validation cohorts, with the highest values being 0.70 (95% CI 0.68 to 0.72) in the UK Biobank and 0.71 (95% CI 0.67 to 0.74) in EPIC.

Conclusion: Several of these non-invasive models exhibited good calibration and discrimination in both external validation populations and are therefore potentially suitable candidates for facilitating risk stratification in population-based colorectal screening programmes. Future work should evaluate this potential through modelling and impact studies, and ascertain whether further enhancement of their performance can be obtained.
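Recalibration of an existing model typically means re-estimating an intercept (and optionally a slope) for its linear predictor in the validation cohort, which corrects systematic over- or under-estimation while leaving discrimination unchanged. A minimal sketch on synthetic data follows; it is not the EPIC/UK Biobank analysis code, and the miscalibration factor is an assumption.

```python
# Sketch of logistic recalibration: refit an intercept and slope for an
# existing model's linear predictor in the validation cohort. Synthetic data.
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit

rng = np.random.default_rng(3)
n = 20_000

# An existing model that overestimates risk in the new population.
p_model = rng.uniform(0.01, 0.30, n)       # published model's predictions
y = rng.binomial(1, p_model * 0.6)         # observed events are rarer

lp = logit(p_model)                        # the model's linear predictor
fit = sm.Logit(y, sm.add_constant(lp)).fit(disp=False)
intercept, slope = fit.params
print(f"calibration intercept = {intercept:.2f}, slope = {slope:.2f}")

# Recalibrated predictions: same ranking of patients (same discrimination),
# but risks now match the observed event rate.
p_recal = expit(intercept + slope * lp)
print(f"mean predicted {p_model.mean():.3f} -> {p_recal.mean():.3f}, "
      f"observed {y.mean():.3f}")
```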


2020
Author(s): Chava L Ramspek, Kitty J Jager, Friedo W Dekker, Carmine Zoccali, Merel van Diepen

Abstract

Prognostic models that aim to improve the prediction of clinical events, individualized treatment and decision-making are increasingly being developed and published. However, relatively few models are externally validated, and validation by independent researchers is rare. External validation is necessary to determine a prediction model's reproducibility and generalizability to new and different patients. Various methodological considerations are important when assessing or designing an external validation study. In this article, an overview is provided of these considerations, starting with what external validation is, what types of external validation can be distinguished and why such studies are a crucial step towards the clinical implementation of accurate prediction models. Statistical analyses and interpretation of external validation results are reviewed in an intuitive manner, and considerations for selecting an appropriate existing prediction model and external validation population are discussed. This study enables clinicians and researchers to gain a deeper understanding of how to interpret model validation results and how to translate these results to their own patient population.
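For concreteness, the sketch below computes the quantities most external validation studies report: the c-statistic for discrimination, the observed/expected ratio for calibration-in-the-large, and the Brier score for overall accuracy. Synthetic predictions stand in for an existing model applied to a new cohort; this is an illustrative sketch, not code from the article.

```python
# Core external validation metrics on synthetic data: discrimination
# (c-statistic), calibration-in-the-large (observed/expected ratio), and
# overall accuracy (Brier score).
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(7)
n = 5_000
p_pred = rng.uniform(0.02, 0.40, n)   # existing model's predicted risks
y = rng.binomial(1, p_pred * 0.8)     # outcomes in the validation cohort

print("c-statistic:", round(roc_auc_score(y, p_pred), 3))
# O/E < 1 means the model overestimates risk on average in this population.
print("O/E ratio  :", round(y.mean() / p_pred.mean(), 3))
print("Brier score:", round(brier_score_loss(y, p_pred), 3))
```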


2021, Vol 11
Author(s): Le Kuai, Ying Zhang, Ying Luo, Wei Li, Xiao-dong Li, ...

Objective: A proportional hazards model was applied to develop a large-scale prognostic model and nomogram incorporating clinicopathological characteristics, histological type, tumor differentiation grade, and tumor deposit count, to provide clinicians and patients diagnosed with colon cancer liver metastases (CLM) a more comprehensive and practical outcome measure.

Methods: Following the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines, this study identified 14,697 patients diagnosed with CLM from 1975 to 2017 in the Surveillance, Epidemiology, and End Results (SEER) 21 registry database. Patients were randomly divided into a modeling group (n=9,800) and an internal validation group (n=4,897). An independent external validation cohort (n=60) was obtained. Univariable and multivariable Cox analyses were performed to identify prognostic predictors of overall survival (OS). Subsequently, the nomogram was constructed, and verification was undertaken with receiver operating characteristic curves (area under the curve, AUC) and calibration curves.

Results: Histological type, tumor differentiation grade, and tumor deposit count were independent prognostic predictors for CLM. The nomogram consisted of age, sex, primary site, T category, N category, metastasis to bone, brain or lung, surgery, and chemotherapy. The model achieved excellent predictive power on both internal (mean AUC=0.811) and external validation (mean AUC=0.727), significantly higher than the American Joint Committee on Cancer (AJCC) TNM staging system.

Conclusion: This study proposes a prognostic nomogram for predicting 1- and 2-year survival based on histopathological and population-based data of CLM patients, developed following TRIPOD guidelines. Compared with the TNM stage, our nomogram has better consistency and calibration for predicting the OS of CLM patients.
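As a concrete stand-in for the modelling approach described, the sketch below fits a Cox proportional hazards model with the lifelines library and reports its concordance and survival probabilities at fixed horizons. It uses lifelines' bundled Rossi recidivism dataset purely for illustration; nothing here reflects the SEER CLM cohort or the published nomogram.

```python
# Sketch of Cox proportional hazards modelling with lifelines: fit on a
# bundled example dataset, report concordance, and predict survival at
# fixed horizons (the nomogram's 1- and 2-year analogue).
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                      # duration 'week', event 'arrest'
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

cph.print_summary()                    # hazard ratios per covariate
print("concordance index:", cph.concordance_index_)

# Survival probabilities for the first five subjects at two horizons,
# analogous to reading 1- and 2-year survival off a nomogram.
surv = cph.predict_survival_function(df.iloc[:5], times=[26, 52])
print(surv)
```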

