Addressing selection bias in the UK Biobank neurological imaging cohort

Author(s):  
Valerie C Bradley ◽  
Thomas E Nichols

The UK Biobank is a national prospective study of half a million participants who were aged 40 to 69 when recruited between 2006 and 2010, established to facilitate research on diseases of aging. The imaging cohort is a subset of UK Biobank participants who have agreed to undergo extensive additional imaging assessments. However, Fry et al. (2017) find evidence of "healthy volunteer bias" in the UK Biobank: participants are less likely to smoke, be obese, or consume alcohol daily than the target population of UK adults. Here we examine selection bias in the UK Biobank imaging cohort. We address two common misconceptions: first, that study size can compensate for bias in data collection, and second, that selection bias does not affect estimates of associations, which are the primary interest of the UK Biobank. We introduce inverse probability weighting (IPW), an approach commonly used in survey research, as a way to address selection bias in volunteer health studies like the UK Biobank. We discuss six such methods (five existing and one novel), assess their relative performance in simulation studies, and apply them to the UK Biobank imaging cohort. We find that our novel method, which uses BART to predict the probability of selection and combines it with raking, performs well relative to existing methods and helps alleviate selection bias in the UK Biobank imaging cohort.
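The general idea of inverse probability weighting followed by raking can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the selection probabilities are hard-coded hypothetical values standing in for the output of a selection model (the abstract uses BART for that step), and the variable names, toy sample, and population totals are all assumptions made up for the example.

```python
def inverse_probability_weights(selection_probs):
    """Base weights: a unit selected with probability p stands in for 1/p units."""
    return [1.0 / p for p in selection_probs]


def rake(weights, categories, targets, n_iter=1000, tol=1e-10):
    """Rake weights (iterative proportional fitting) to known population margins.

    categories: dict mapping variable name -> list of levels, one per unit
    targets:    dict mapping variable name -> {level: population total}
    """
    w = list(weights)
    for _ in range(n_iter):
        max_change = 0.0
        for var, levels in categories.items():
            # Current weighted total for each level of this variable.
            totals = {}
            for wi, lev in zip(w, levels):
                totals[lev] = totals.get(lev, 0.0) + wi
            # Rescale every unit so the weighted totals match the target.
            for i, lev in enumerate(levels):
                factor = targets[var][lev] / totals[lev]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


def weighted_totals(weights, levels):
    """Sum of weights within each level of one variable."""
    out = {}
    for wi, lev in zip(weights, levels):
        out[lev] = out.get(lev, 0.0) + wi
    return out


# Hypothetical toy sample of six volunteers (all values are made up).
categories = {
    "sex": ["F", "F", "F", "F", "M", "M"],
    "smoker": ["N", "N", "N", "Y", "N", "Y"],
}
# Hypothetical estimated selection probabilities from some selection model.
selection_probs = [0.10, 0.10, 0.10, 0.05, 0.05, 0.02]
# Hypothetical known population totals for each margin.
targets = {
    "sex": {"F": 50.0, "M": 50.0},
    "smoker": {"N": 80.0, "Y": 20.0},
}

base = inverse_probability_weights(selection_probs)
raked = rake(base, categories, targets)
sex_totals = weighted_totals(raked, categories["sex"])
smoker_totals = weighted_totals(raked, categories["smoker"])
```

After raking, the weighted sample reproduces the assumed population margins for each variable, while the relative weights within a cell still reflect the estimated selection probabilities.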

2021 ◽  
Vol 11 ◽  
Author(s):  
C. M. Schooling ◽  
P. M. Lopez ◽  
Z. Yang ◽  
J. V. Zhao ◽  
Shiu Lun Au Yeung ◽  
...  

Background: Mendelian randomization (MR) provides unconfounded estimates. MR is open to selection bias when the underlying sample is selected on surviving to recruitment on the genetically instrumented exposure and competing risk of the outcome. Few methods to address this bias exist. Methods: We show that this selection bias can sometimes be addressed by adjusting for common causes of survival and outcome. We use multivariable MR to obtain a corrected MR estimate for statins on stroke. Statins affect survival, and stroke typically occurs later in life than ischemic heart disease (IHD), making estimates for stroke open to bias from competing risk. Results: In univariable MR in the UK Biobank, genetically instrumented statins did not protect against stroke [odds ratio (OR) 1.33, 95% confidence interval (CI) 0.80–2.20] but did in multivariable MR (OR 0.81, 95% CI 0.68–0.98) adjusted for major causes of survival and stroke [blood pressure, body mass index (BMI), and smoking initiation] with a multivariable Q-statistic indicating absence of selection bias. However, the MR estimate for statins on stroke using MEGASTROKE remained positive and the Q-statistic indicated pleiotropy. Conclusion: MR studies of harmful exposures on late-onset diseases with shared etiology need to be conceptualized within a mechanistic understanding so as to identify any potential bias due to survival to recruitment on both genetically instrumented exposure and competing risk of the outcome, which may then be investigated using multivariable MR or estimated analytically and results interpreted accordingly.


2019 ◽  
Author(s):  
C Mary Schooling ◽  
Priscilla M Lopez ◽  
Zhao Yang ◽  
J V Zhao ◽  
SL Au Yeung ◽  
...  

Abstract. Background: Mendelian randomization (MR) provides unconfounded estimates. MR is open to selection bias, particularly when the underlying sample is selected on surviving the genetically instrumented exposure and other conditions that share etiology with the outcome (competing risk before recruitment). Few methods to address this bias exist. Methods: We use directed acyclic graphs to show that this selection bias can be addressed by adjusting for common causes of survival and outcome. We use multivariable MR to obtain a corrected MR estimate, specifically the effect of statin use on ischemic stroke, because statins affect survival and stroke typically occurs later in life than ischemic heart disease, so it is open to competing risk. Results: In univariable MR, the genetically instrumented effect of statin use on ischemic stroke was in a harmful direction in MEGASTROKE and the UK Biobank (odds ratio (OR) 1.33, 95% confidence interval (CI) 0.80 to 2.20). In multivariable MR adjusted for major causes of survival and ischemic stroke (blood pressure, body mass index and smoking initiation), the effect of statin use on stroke in the UK Biobank was as expected (OR 0.81, 95% CI 0.68 to 0.98), with a Q-statistic indicating absence of genetic pleiotropy or selection bias, but not in MEGASTROKE. Conclusion: MR studies concerning late-onset chronic conditions with shared etiology, based on samples recruited in later life, need to be conceptualized within a mechanistic understanding, so as to identify any potential bias due to competing risk before recruitment and to inform the analysis and interpretation.


The Lancet ◽  
2012 ◽  
Vol 380 (9837) ◽  
pp. 110 ◽  
Author(s):  
James M Swanson

Author(s):  
Gareth Griffith ◽  
Tim T Morris ◽  
Matt Tudball ◽  
Annie Herbert ◽  
Giulia Mancano ◽  
...  

Standfirst: Observational data on COVID-19 including hypothesised risk factors for infection and progression are accruing rapidly. Here, we highlight the challenge of interpreting observational evidence from non-random samples of the population, which may be affected by collider bias. We illustrate these issues using data from the UK Biobank in which individuals tested for COVID-19 are highly selected for a wide range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. We discuss the sampling mechanisms that leave aetiological studies of COVID-19 infection and progression particularly susceptible to collider bias. We also describe several tools and strategies that could help mitigate the effects of collider bias in extant studies of COVID-19 and make available a web app for performing sensitivity analyses. While bias due to non-random sampling should be explored in existing studies, the optimal way to mitigate the problem is to use appropriate sampling strategies at the study design stage.
Key messages: Collider bias can occur in studies that non-randomly sample people from the population of interest. This bias can distort associations between variables or induce spurious associations. It may be possible to estimate the underlying selection model or run sensitivity analyses to examine the credibility of the threat of collider bias, but it is difficult to prove that bias has been reduced or eliminated. Tested samples in the UK Biobank cohort are highly selected for a range of traits. Sampling strategies that are resilient to collider bias issues should be used at the design stage of data collection where possible. Where this is not possible, linkage or collection of data on the target population can help in sensitivity and validation analyses.
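How selection on a collider can induce a spurious association is easy to see in a small simulation. The sketch below is purely illustrative and does not reproduce any analysis from the abstract: the two traits, the selection rule, the threshold of 1.0, and the sample size are all hypothetical choices. Two independent traits both raise the chance of being selected (for example, of being tested), and the correlation is then compared between the full population and the selected subset.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000

# Two traits that are independent in the (simulated) population.
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical selection rule: both traits, plus noise, push a person
# over the threshold for being included in the sample.
selected = [
    e + o + rng.gauss(0, 1) > 1.0 for e, o in zip(exposure, outcome)
]

sel_exposure = [e for e, s in zip(exposure, selected) if s]
sel_outcome = [o for o, s in zip(outcome, selected) if s]

r_full = pearson(exposure, outcome)          # close to zero by construction
r_selected = pearson(sel_exposure, sel_outcome)  # negative: induced by selection

print(f"full population: r = {r_full:.3f}")
print(f"selected sample: r = {r_selected:.3f}")
```

In the selected subset the two traits appear negatively associated even though they are independent in the simulated population, which is the pattern the key messages warn about.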


2019 ◽  
Author(s):  
Elizabeth Curtis ◽  
Justin Liu ◽  
Kate Ward ◽  
Karen Jameson ◽  
Zahra Raisi-Estabragh ◽  
...  

2020 ◽  
Author(s):  
John E. McGeary ◽  
Chelsie Benca-Bachman ◽  
Victoria Risner ◽  
Christopher G Beevers ◽  
Brandon Gibb ◽  
...  

Twin studies indicate that 30-40% of the disease liability for depression can be attributed to genetic differences. Here, we assess the explanatory ability of polygenic scores (PGS) based on broad- (PGSBD) and clinical- (PGSMDD) depression summary statistics from the UK Biobank using independent cohorts of adults (N=210; 100% European Ancestry) and children (N=728; 70% European Ancestry) who have been extensively phenotyped for depression and related neurocognitive phenotypes. PGS associations with depression severity and diagnosis were generally modest, and larger in adults than children. Polygenic prediction of depression-related phenotypes was mixed and varied by PGS. Higher PGSBD, in adults, was associated with a higher likelihood of having suicidal ideation, increased brooding and anhedonia, and lower levels of cognitive reappraisal; PGSMDD was positively associated with brooding and negatively related to cognitive reappraisal. Overall, PGS based on both broad and clinical depression phenotypes have modest utility in adult and child samples of depression.

