Design matters in patient-level prediction: evaluation of a cohort vs. case-control design when developing predictive models in observational healthcare datasets

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Jenna M. Reps ◽  
Patrick B. Ryan ◽  
Peter R. Rijnbeek ◽  
Martijn J. Schuemie

Abstract Background: The design used to create labelled data for training prediction models from observational healthcare databases (e.g., case-control and cohort) may affect the clinical usefulness of the resulting models. We aim to investigate hypothetical design issues and determine how the design impacts prediction model performance. Aim: To empirically investigate differences between models developed using a case-control design and a cohort design. Methods: Using a US claims database, we replicated two published prediction models (dementia and type 2 diabetes) which were developed using a case-control design, and trained models for the same prediction questions using cohort designs. We validated each model on data mimicking the point in time the models would be applied in clinical practice. We calculated the models’ discrimination and calibration-in-the-large performances. Results: The dementia models obtained areas under the receiver operating characteristic curve of 0.560 and 0.897 for the case-control and cohort designs respectively. The type 2 diabetes models obtained areas under the receiver operating characteristic curve of 0.733 and 0.727 for the case-control and cohort designs respectively. The dementia and diabetes case-control models were both poorly calibrated, whereas the dementia cohort model achieved good calibration. We show that careful construction of a case-control design can lead to discriminative performance comparable to that of a cohort design, but case-control designs over-represent the outcome class, leading to miscalibration. Conclusions: Any case-control design can be converted to a cohort design. We recommend that researchers with observational data use the less subjective and generally better calibrated cohort design when extracting labelled data. However, if a carefully constructed case-control design is used, then the model must be prospectively validated using a cohort design for fair evaluation and be recalibrated.
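The miscalibration described above can be illustrated with a small sketch. It is not the authors' procedure: it assumes a logistic model whose training sample over-represents the outcome, and applies the standard intercept (prior) correction on the log-odds scale. The function names and the example rates (`sample_rate`, `population_rate`) are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def correct_for_oversampling(p_model, sample_rate, population_rate):
    """Shift a predicted risk from the training outcome rate to the
    outcome rate in the target population (assumption: only the
    intercept is distorted by the case-control sampling)."""
    offset = logit(sample_rate) - logit(population_rate)
    return inv_logit(logit(p_model) - offset)

def calibration_in_the_large(predicted, observed):
    """Return (mean predicted risk, observed outcome rate); a
    well-calibrated model has the two values close together."""
    return sum(predicted) / len(predicted), sum(observed) / len(observed)

# Hypothetical example: a model trained on a 1:1 case-control sample
# (outcome rate 0.5) is applied where the true outcome rate is 0.05.
raw_risk = 0.5
corrected_risk = correct_for_oversampling(raw_risk, 0.5, 0.05)
```

Under these assumptions a raw prediction of 0.5 maps back to 0.05, the population outcome rate. Such a correction changes calibration only; the ranking of patients, and therefore the AUC, is unaffected.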

2021 ◽  
Author(s):  
Jenna Reps ◽  
Patrick B. Ryan ◽  
Peter R. Rijnbeek ◽  
Martijn J. Schuemie

Abstract Background: The study design used to develop prediction models in observational healthcare databases (e.g., case-control and cohort) may affect the clinical usefulness of the resulting models. We aim to quantify how the choice of design impacts prediction model performance. Aim: To empirically investigate differences between models developed using a case-control design and a cohort design. Methods: Using a US claims database, we replicated two published prediction models (dementia and type 2 diabetes) which were developed using a case-control design, and also trained models for the same prediction questions using cohort designs. We validated each model on data mimicking the point in time the models would be applied in clinical practice. We calculated the models’ discrimination and calibration-in-the-large performances. Results: The dementia models obtained areas under the receiver operating characteristic curve of 0.560 and 0.897 for the case-control and cohort designs respectively. The type 2 diabetes models obtained areas under the receiver operating characteristic curve of 0.733 and 0.727 for the case-control and cohort designs respectively. The dementia and diabetes case-control models were both poorly calibrated, whereas the dementia cohort model achieved good calibration. We show that careful construction of a case-control design can lead to discriminative performance comparable to that of a cohort design, but case-control designs generally oversample the outcome, leading to miscalibration. Conclusion: Any case-control design can be converted to a cohort design. We recommend that researchers with observational data use the less subjective and generally better calibrated cohort design. However, if a carefully constructed case-control design is used, then the model must be prospectively validated using a cohort design for fair evaluation and be recalibrated.


Biostatistics ◽  
2016 ◽  
Vol 17 (3) ◽  
pp. 499-522 ◽  
Author(s):  
Ying Huang

Abstract Two-phase sampling design, where biomarkers are subsampled from a phase-one cohort sample representative of the target population, has become the gold standard in biomarker evaluation. Many two-phase case–control studies involve biased sampling of cases and/or controls in the second phase. For example, controls are often frequency-matched to cases with respect to other covariates. Ignoring biased sampling of cases and/or controls can lead to biased inference regarding biomarkers' classification accuracy. Considering the problems of estimating and comparing the area under the receiver operating characteristics curve (AUC) for a binary disease outcome, the impact of biased sampling of cases and/or controls on inference and the strategy to efficiently account for the sampling scheme have not been well studied. In this project, we investigate the inverse-probability-weighted method to adjust for biased sampling in estimating and comparing AUC. Asymptotic properties of the estimator and its inference procedure are developed for both Bernoulli sampling and finite-population stratified sampling. In simulation studies, the weighted estimators provide valid inference for estimation and hypothesis testing, while the standard empirical estimators can generate invalid inference. We demonstrate the use of the analytical variance formula for optimizing sampling schemes in biomarker study design and the application of the proposed AUC estimators to examples in HIV vaccine research and prostate cancer research.
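As a rough illustration of the inverse-probability-weighting idea, the sketch below computes a weighted empirical AUC in which each sampled case and control is weighted by the inverse of its phase-two sampling probability. It is a generic weighted concordance estimate, not the paper's estimator or its variance formula; the function name and the toy scores and weights are hypothetical.

```python
def weighted_auc(case_scores, case_weights, control_scores, control_weights):
    """Weighted empirical AUC: the weighted share of case-control pairs
    in which the case has the higher score (ties count one half).
    Weights are assumed to be inverse sampling probabilities."""
    numerator = 0.0
    for s_case, w_case in zip(case_scores, case_weights):
        for s_ctrl, w_ctrl in zip(control_scores, control_weights):
            if s_case > s_ctrl:
                concordance = 1.0
            elif s_case == s_ctrl:
                concordance = 0.5
            else:
                concordance = 0.0
            numerator += w_case * w_ctrl * concordance
    return numerator / (sum(case_weights) * sum(control_weights))

# Hypothetical toy data: the second control comes from a stratum that
# was sampled with probability 1/3, so it receives weight 3.
cases = [0.9, 0.4]
controls = [0.5, 0.1]
unweighted = weighted_auc(cases, [1.0, 1.0], controls, [1.0, 1.0])
weighted = weighted_auc(cases, [1.0, 1.0], controls, [1.0, 3.0])
```

In this toy example the unweighted estimate is 0.75 and the weighted estimate is 0.875, showing how ignoring the sampling scheme can shift the estimated AUC.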


2016 ◽  
Vol 22 ◽  
pp. 183
Author(s):  
Shahjada Selim ◽  
Shahabul Chowdhury ◽  
Mohammad Saifuddin ◽  
Marufa Mustary ◽  
...  

Diagnostica ◽  
2019 ◽  
Vol 65 (3) ◽  
pp. 179-190 ◽  
Author(s):  
Vincent Mustapha ◽  
Renate Rau

Abstract. Cut-off values allow an economical, binary evaluation of sum scores. For strain questionnaires that assess person-related characteristics, cut-off values are often available and are indispensable in clinical diagnostics. Cut-off values are likewise desirable for the assessment of job characteristics. So far, however, they have been lacking for job characteristics such as work intensity and decision latitude. Between 2006 and 2016, 801 objective job analyses were therefore conducted in various industries; these allow decision latitude and work intensity each to be classified as well or poorly designed according to DIN EN ISO 6385 (2016). Based on this classification, receiver operating characteristic analysis was used to determine cut-off values for the subjective, condition-related questionnaire on the experience of work intensity and decision latitude (Fragebogen zum Erleben von Arbeitsintensität und Tätigkeitsspielraum, FIT; Richter et al., 2000). Sum scores ≤ 22 for decision latitude and sum scores ≥ 15 for work intensity indicate poor design of the respective job characteristic. A further sample of 1,076 workers showed that workers with poorly designed decision latitude are more vitally exhausted and less engaged, and that workers with poorly designed work intensity show a greater inability to recover as well as greater vital exhaustion.
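One common way to derive a cut-off from a receiver operating characteristic analysis is to maximise Youden's J (sensitivity + specificity − 1). The abstract does not state which criterion was used, so the sketch below is an assumption for illustration only; the function name and the toy scores are hypothetical and do not reproduce the study data.

```python
def youden_cutoff(scores, is_poor):
    """Return (cutoff, J) for the rule 'score <= cutoff flags poor design',
    choosing the cutoff that maximises Youden's J.
    Assumption: low sum scores indicate poor design."""
    n_pos = sum(1 for y in is_poor if y)
    n_neg = len(is_poor) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, is_poor) if y and s <= cutoff)
        false_pos = sum(1 for s, y in zip(scores, is_poor) if not y and s <= cutoff)
        # J = sensitivity + specificity - 1 = sensitivity - false-positive rate
        j = true_pos / n_pos - false_pos / n_neg
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best

# Hypothetical sum scores with an expert rating of each workplace.
toy_scores = [10, 12, 14, 15, 16, 18, 20, 13]
toy_is_poor = [True, True, True, False, False, False, False, True]
cutoff, j_value = youden_cutoff(toy_scores, toy_is_poor)
```

For a scale on which high scores indicate poor design, the comparison would be reversed (`score >= cutoff`).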


2018 ◽  
Vol 13 (3) ◽  
pp. 215-221 ◽  
Author(s):  
Francesco Ursini ◽  
Salvatore D'Angelo ◽  
Emilio Russo ◽  
Giorgio Ammerata ◽  
Ludovico Abenavoli ◽  
...  

BMJ Open ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. e044486 ◽  
Author(s):  
Per Svensson ◽  
Robin Hofmann ◽  
Henrike Häbel ◽  
Tomas Jernberg ◽  
Per Nordberg

Aims: The risks associated with diabetes, obesity and hypertension for severe COVID-19 may be confounded and differ by sociodemographic background. We assessed the risks associated with cardiometabolic factors for severe COVID-19 when accounting for socioeconomic factors and in subgroups by age, sex and region of birth. Methods and results: In this nationwide case–control study, 1,086 patients admitted to intensive care with COVID-19 requiring mechanical ventilation (cases), and 10,860 population-based controls matched for age, sex and district of residency were included from mandatory national registries. ORs with 95% CIs for associations between severe COVID-19 and exposures with adjustment for confounders were estimated using logistic regression. The median age was 62 years (IQR 52–70), and 3003 (24.9%) were women. Type 2 diabetes (OR, 2.3 (95% CI 1.9 to 2.7)), hypertension (OR, 1.7 (95% CI 1.5 to 2.0)), obesity (OR, 3.1 (95% CI 2.4 to 4.0)) and chronic kidney disease (OR, 2.5 (95% CI 1.7 to 3.7)) were all associated with severe COVID-19. In the younger subgroup (below 57 years), ORs were significantly higher for all cardiometabolic risk factors. The risk associated with type 2 diabetes was higher in women (p=0.001) and in patients with a region of birth outside the European Union (EU) (p=0.004). Conclusion: Diabetes, obesity and hypertension were all independently associated with severe COVID-19, with stronger associations in the younger population. Type 2 diabetes implied a greater risk among women and in non-EU immigrants. These findings, originating from high-quality Swedish registries, may be important to direct preventive measures such as vaccination to susceptible patient groups. Trial registration number: ClinicalTrials.gov (NCT04426084).


Author(s):  
Onofre Pineda ◽  
Victoria Stepenka ◽  
Alejandra Rivas-Motenegro ◽  
Nelson Villasmil-Hernandez ◽  
Roberto Añez ◽  
...  
