Evaluating and implementing block jackknife resampling Mendelian randomization to mitigate bias induced by overlapping samples

Author(s):  
Si Fang ◽  
Gibran Hemani ◽  
Tom G Richardson ◽  
Tom R Gaunt ◽  
George Davey Smith

Abstract Participant overlap has been thought to induce overfitting bias into Mendelian randomization (MR) and polygenic risk score (PRS) studies. This hinders the potential research into many unique traits and disease outcomes from large-scale biobanks. Here, we evaluated a block jackknife resampling framework for genome-wide association studies (GWAS) and PRS construction to mitigate the influence of overfitting bias on MR analyses compared to alternative approaches and implemented this study design in a causal inference setting using data from the UK Biobank. We simulated PRS and MR under three scenarios: (1) using weighted SNP estimates from an external GWAS, (2) using weighted SNP estimates from an overlapping GWAS sample and (3) using a block jackknife resampling framework. Based on a conventional P-value threshold to derive genetic instruments for MR studies (P<5×10−8), our block-jackknifing PRS did not suffer from overfitting bias (mean R2=0.034) compared to the externally weighted PRS (mean R2=0.040). In contrast, genetic instruments derived from overlapping samples explained a higher proportion of variance (mean R2=0.048) compared to the externally derived score. The detrimental impact of overfitting bias became considerably larger when using a more liberal P-value threshold to construct PRS (e.g., P<0.05, mean R2=0.103), whereas estimates using the jackknife score remained robust to overfitting (mean R2=0.084). In an applied setting, we examined (A) the effects of body mass index on circulating biomarkers and (B) the effect of childhood body size on levels of testosterone in adulthood using the methods described above. In the first applied analysis, overlapping sample PRS and block jackknife resampled PRS led to comparable effect sizes, whereas narrower confidence intervals were identified when using the overlapping sample instrument. In the second example, through sex-stratified multivariable and bi-directional MR, we demonstrate that childhood body size indirectly leads to lower testosterone levels in adulthood in males, an effect mediated through adult body size.

Author summary Using genetic variants as instrumental variables for risk factors, Mendelian randomization (MR) provides an approach to explore the genetically predicted effects of modifiable risk factors on disease which is robust to confounding and reverse causation. Genetic instrumental variables are conventionally selected from results of genome-wide association studies on an independent dataset whose sample does not overlap with the dataset being analysed using MR, as such overlap can lead to overfitting bias. However, overlap can often be challenging to avoid entirely, as such association studies are increasingly being performed by meta-analysing several biobanks to achieve the maximum power to detect variants with smaller effect sizes. Moreover, when investigating exposures and outcomes which only a single biobank has measured in sufficiently large samples, avoiding participant overlap requires splitting the study population into subgroups, which can limit statistical power. Block jackknife resampling MR provides a solution to conduct causal inference under these circumstances with the maximum statistical power while avoiding bias due to overlapping participants. In this study, we evaluated this study design with simulated datasets in comparison to MR using genetic variants discovered from an external dataset or one with overlapping samples. We applied this approach using UK Biobank to investigate the role of body mass index on circulating biomarkers, as well as the causal relationship between childhood adiposity and testosterone levels in adulthood.
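The block jackknife idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`simulate`, `block_jackknife_prs`), the toy simulation, the interleaved block assignment and the absence of any P-value thresholding are all assumptions made for illustration. The point it shows is that each participant's score is built from SNP weights estimated without that participant's block.

```python
import random
import statistics


def simulate(n, m, seed=0):
    """Toy data (assumption): m SNPs coded 0/1/2, first 5 SNPs have a true effect."""
    rng = random.Random(seed)
    effects = [0.3 if j < 5 else 0.0 for j in range(m)]
    geno = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(m)]
            for _ in range(n)]
    pheno = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0, 1)
             for row in geno]
    return geno, pheno


def slope(x, y):
    """Per-SNP 'GWAS' weight: OLS slope of y on x (0.0 if x has no variance)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def pearson(x, y):
    """Pearson correlation, used here to compute the variance explained (R^2)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def block_jackknife_prs(geno, pheno, k):
    """Score each block with weights estimated in the other k - 1 blocks."""
    n, m = len(geno), len(geno[0])
    blocks = [list(range(i, n, k)) for i in range(k)]  # interleaved blocks
    prs = [0.0] * n
    for held_out in blocks:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        y_train = [pheno[i] for i in train]
        weights = [slope([geno[i][j] for i in train], y_train)
                   for j in range(m)]
        for i in held_out:
            prs[i] = sum(w * g for w, g in zip(weights, geno[i]))
    return prs
```

Calling `block_jackknife_prs(geno, pheno, 5)` on the output of `simulate(300, 20)` returns one score per participant; `pearson(prs, pheno) ** 2` then gives an R2 that can be compared with a score whose weights were estimated in the full, overlapping sample.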

Author(s):  
Daniel B. Rosoff ◽  
Toni-Kim Clarke ◽  
Mark J. Adams ◽  
Andrew M. McIntosh ◽  
George Davey Smith ◽  
...  

Abstract Observational studies suggest that lower educational attainment (EA) may be associated with risky alcohol use behaviors; however, these findings may be biased by confounding and reverse causality. We performed two-sample Mendelian randomization (MR) using summary statistics from recent genome-wide association studies (GWAS) with >780,000 participants to assess the causal effects of EA on alcohol use behaviors and alcohol dependence (AD). Fifty-three independent genome-wide significant SNPs previously associated with EA were tested for association with alcohol use behaviors. We show that while genetic instruments associated with increased EA are not associated with total amount of weekly drinks, they are associated with reduced frequency of binge drinking ≥6 drinks (ßIVW = −0.198, 95% CI, −0.297 to –0.099, PIVW = 9.14 × 10−5), reduced total drinks consumed per drinking day (ßIVW = −0.207, 95% CI, −0.293 to –0.120, PIVW = 2.87 × 10−6), as well as lower weekly distilled spirits intake (ßIVW = −0.148, 95% CI, −0.188 to –0.107, PIVW = 6.24 × 10−13). Conversely, genetic instruments for increased EA were associated with increased alcohol intake frequency (ßIVW = 0.331, 95% CI, 0.267–0.396, PIVW = 4.62 × 10−24), and increased weekly white wine (ßIVW = 0.199, 95% CI, 0.159–0.238, PIVW = 7.96 × 10−23) and red wine intake (ßIVW = 0.204, 95% CI, 0.161–0.248, PIVW = 6.67 × 10−20). Genetic instruments associated with increased EA reduced AD risk: an additional 3.61 years schooling reduced the risk by ~50% (ORIVW = 0.508, 95% CI, 0.315–0.819, PIVW = 5.52 × 10−3). Consistency of results across complementary MR methods accommodating different assumptions about genetic pleiotropy strengthened causal inference. Our findings suggest EA may have important effects on alcohol consumption patterns and may provide potential mechanisms explaining reported associations between EA and adverse health outcomes.
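The IVW estimates reported above come from combining per-SNP associations. As a minimal sketch, assuming the common fixed-effect form (a weighted regression through the origin of SNP-outcome effects on SNP-exposure effects, with weights equal to the inverse variance of the outcome effects), the calculation looks like the following. The function names and the input numbers in the usage note are hypothetical, not values from the study.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in se_outcome]
    num = sum(w * bx * by
              for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


def confidence_interval(estimate, se, z=1.96):
    """Normal-approximation interval; z = 1.96 corresponds to roughly 95%."""
    return estimate - z * se, estimate + z * se
```

For example, `ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])` returns an estimate close to 0.5, because every SNP's outcome effect is exactly half its exposure effect. For a binary outcome such as alcohol dependence the estimate is on the log odds scale, so an odds ratio is obtained with `math.exp`.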


2020 ◽  
Author(s):  
Jingshu Wang ◽  
Qingyuan Zhao ◽  
Jack Bowden ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
...  

Over a decade of genome-wide association studies have led to the finding that significant genetic associations tend to spread across the genome for complex traits. The extreme polygenicity where "all genes affect every complex trait" complicates Mendelian Randomization studies, where natural genetic variations are used as instruments to infer the causal effect of heritable risk factors. We reexamine the assumptions of existing Mendelian Randomization methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. We propose a comprehensive framework GRAPPLE (Genome-wide mR Analysis under Pervasive PLEiotropy) to analyze the causal effect of target risk factors with heterogeneous genetic instruments and identify possible pleiotropic patterns from data. By using summary statistics from genome-wide association studies, GRAPPLE can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, adjust for confounding risk factors, and determine the causal direction. With GRAPPLE, we analyze the effect of blood lipids, body mass index, and systolic blood pressure on 25 disease outcomes, gaining new information on their causal relationships and the potential pleiotropic pathways.


2015 ◽  
Author(s):  
Guo-Bo Chen ◽  
Sang Hong Lee ◽  
Matthew R Robinson ◽  
Maciej Trzaskowski ◽  
Zhi-Xiang Zhu ◽  
...  

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts we propose a method that uses randomly generated genetic predictors, does not require the sharing of individual-level genotype data and does not breach individual privacy.
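To make the Fst idea in method I concrete, here is a minimal sketch that computes a textbook two-population Fst from reported allele frequencies. The estimator used in the paper may differ. This version is an assumption for illustration: it weights both cohorts equally, ignores sample size corrections, and combines SNPs as a ratio of sums. The function name `fst_two_cohorts` is hypothetical.

```python
def fst_two_cohorts(freqs_a, freqs_b):
    """Fst = (H_T - H_S) / H_T, pooled over SNPs as a ratio of sums.

    freqs_a, freqs_b: reported frequencies of the same allele in two cohorts.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2.0
        # Expected heterozygosity in the pooled population.
        h_total = 2.0 * p_bar * (1.0 - p_bar)
        # Mean expected heterozygosity within the two cohorts.
        h_within = (2.0 * pa * (1.0 - pa) + 2.0 * pb * (1.0 - pb)) / 2.0
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0
```

Two cohorts with identical allele frequencies give a value of 0, and cohorts fixed for opposite alleles give 1, so larger pairwise values flag cohorts whose reported frequencies do not match their stated ancestry.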


2017 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Konrad Rawlik ◽  
Albert Tenesa

ABSTRACT Genome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size; however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive, which requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well as downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS, http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without incurring high computational costs.


2020 ◽  
Author(s):  
Lanlan Chen ◽  
Aowen Tian ◽  
Zhipeng Liu ◽  
Miaoran Zhang ◽  
Xingchen Pan ◽  
...  

ABSTRACT Background: It remains controversial whether daytime napping is beneficial for human health. Objective: To examine the causal relationship between daytime napping and the risk for various human diseases. Design: Phenotype-wide Mendelian randomization study. Setting: Non-UK Biobank cohorts reported in published genome-wide association studies (GWAS) provided the outcome phenotypes in the discovery stage. The UK Biobank cohort provided the outcome phenotypes in the validation stage. Participants: The UK Biobank GWAS included 361,194 European-ancestry residents in the UK. Non-UKBB GWAS included various numbers of participants. Exposure: Self-reported daytime napping frequency. Main outcome measure: A wide spectrum of human health outcomes including obesity, major depressive disorder, and high cholesterol. Methods: We examined the causal relationship between daytime napping frequency in the UK Biobank as exposure and a panel of 1,146 health outcomes reported in genome-wide association studies (GWAS), using a two-sample Mendelian randomization analysis. The significant findings were further validated in the UK Biobank health outcomes of 4,203 human traits and diseases. The causal effects were estimated using a fixed-effect inverse variance weighted model. The MR-Egger intercept test was applied to detect horizontal pleiotropy, along with Cochran’s Q test to assess heterogeneity among the causal effects of IVs. Findings: There were significant causal relationships between daytime napping frequency and a wide spectrum of human health outcomes. In particular, we validated that frequent daytime napping increased the risks of major depressive disorder, obesity and abnormal lipid profile. Interpretation: The current study showed that frequent daytime napping mainly had adverse impacts on physical and mental health. Caution should be taken for health recommendations on daytime napping. Further studies are necessary to precisely define the best daytime napping strategies.
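The two sensitivity checks named in the Methods above can be sketched in a few lines. This is an illustration under stated assumptions, not the study's code: Cochran's Q is computed from per-SNP ratio estimates with first-order weights, MR-Egger is a weighted regression with an intercept after orienting each SNP so its exposure effect is positive, and the function names are hypothetical. Standard errors and P-values for the intercept are omitted for brevity.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope); a non-zero intercept suggests directional pleiotropy."""
    # Orient every SNP so that its exposure effect is positive.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exposure]
    x = [s * bx for s, bx in zip(signs, beta_exposure)]
    y = [s * by for s, by in zip(signs, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]
    return weighted_line_fit(x, y, w)


def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (Q, degrees of freedom) for heterogeneity among per-SNP ratio estimates."""
    w = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ivw = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    q = sum(wi * (r - ivw) ** 2 for wi, r in zip(w, ratios))
    return q, len(ratios) - 1
```

Under the null of no heterogeneity, Q is compared with a chi-squared distribution on the returned degrees of freedom. When all SNPs imply the same ratio, Q is zero, and when the outcome effects lie on a line through the origin, the Egger intercept is zero.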


Author(s):  
Venexia M Walker ◽  
Sean Harrison ◽  
Alice R Carter ◽  
Dipender Gill ◽  
Ioanna Tzoulaki ◽  
...  

Introduction: Genome-wide association studies (GWASs) often adjust for covariates, correct for medication use, or select on medication users. If these summary statistics are used in two-sample Mendelian randomization analyses, estimates may be biased. We used simulations to investigate how GWAS adjustment, correction and selection affect these estimates and performed an analysis in UK Biobank to provide an empirical example. Methods: We simulated six GWASs: no adjustment for a covariate, correction for medication use, or selection on medication users; adjustment only; selection only; correction only; both adjustment and selection; and both adjustment and correction. We then ran two-sample Mendelian randomization analyses using these GWASs to evaluate bias. We also performed equivalent GWASs using empirical data from 318,147 participants in UK Biobank with systolic blood pressure as the exposure and body mass index as the covariate and ran two-sample Mendelian randomization with coronary heart disease as the outcome. Results: The simulation showed that estimates from GWASs with selection can produce biased two-sample Mendelian randomization estimates. Yet, we observed relatively little difference between empirical estimates of the effect of systolic blood pressure on coronary artery disease across the six scenarios. Conclusions: Given the potential for bias from using GWASs with selection on Mendelian randomization estimates demonstrated in our simulation, and the reduced sample size of these GWASs, this approach should be deprioritized. However, based on our empirical results, using adjusted, corrected or selected GWASs is unlikely to make a large difference to two-sample Mendelian randomization estimates in practice.


2021 ◽  
Author(s):  
Cancan Li ◽  
Mingyun Niu ◽  
Zheng Guo ◽  
Pengcheng Liu ◽  
Yulu Zheng ◽  
...  

Abstract Background Tea consumption is considered a protective factor for obesity. This study aimed to verify the causal association between tea consumption and obesity through a two-sample Mendelian randomization (MR) analysis in general population-based datasets. Methods The genetic instruments, single nucleotide polymorphisms (SNPs) associated with tea consumption habits, were obtained from genome-wide association studies (GWAS): UK Biobank, Nurses’ Health Study, Health Professionals Follow-up Study and Women’s Genome Health Study. The effect of the genetic instruments on obesity was analyzed using the UK Biobank dataset (among ~ 500,000 participants). The causal relationship between tea consumption and obesity risk was analyzed by five methods of MR analysis: inverse variance weighted (IVW) method, MR-Egger regression method, weighted median estimator (WME), weighted mode and simple mode. Results Ninety-one SNPs were identified as genetic instruments in our study. A significant result was observed in the IVW analysis (odds ratio [OR] = 0.998, 95% confidence interval [CI] = 0.996 to 1.000, P = 0.049), which is the commonly used approach of two-sample MR analysis. Conclusion Our findings evidenced a mild causal relationship between tea consumption and the decreased risk for obesity. Further studies are needed to clarify the effects of tea consumption on obesity-related health problems in detail.


2021 ◽  
Vol 6 ◽  
pp. 103
Author(s):  
Venexia Walker ◽  
Sean Harrison ◽  
Alice Carter ◽  
Dipender Gill ◽  
Ioanna Tzoulaki ◽  
...  

Introduction: Genome-wide association studies (GWASs) often adjust for covariates, correct for medication use, or select on medication users. If these summary statistics are used in two-sample Mendelian randomization analyses, estimates may be biased. We used simulations to investigate how GWAS adjustment, correction and selection affect these estimates and performed an analysis in UK Biobank to provide an empirical example. Methods: We simulated six GWASs: no adjustment for a covariate, correction for medication use, or selection on medication users; adjustment only; selection only; correction only; both adjustment and selection; and both adjustment and correction. We then ran two-sample Mendelian randomization analyses using these GWASs to evaluate bias. We also performed equivalent GWASs using empirical data from 306,560 participants in UK Biobank with systolic blood pressure as the exposure and body mass index as the covariate and ran two-sample Mendelian randomization with coronary heart disease as the outcome. Results: The simulation showed that estimates from GWASs with selection can produce biased two-sample Mendelian randomization estimates. Yet, we observed relatively little difference between empirical estimates of the effect of systolic blood pressure on coronary artery disease across the six scenarios. Conclusions: Given the potential for bias from using GWASs with selection on Mendelian randomization estimates demonstrated in our simulation, careful consideration before using this approach is warranted. However, based on our empirical results, using adjusted, corrected or selected GWASs is unlikely to make a large difference to two-sample Mendelian randomization estimates in practice.


2021 ◽  
Vol 10 ◽  
pp. 204800402110236
Author(s):  
Julia Ramírez ◽  
Stefan van Duijvenboden ◽  
William J Young ◽  
Michele Orini ◽  
Aled R Jones ◽  
...  

The electrocardiogram (ECG) is a commonly used clinical tool that reflects cardiac excitability and disease. Many parameters can be measured and, with improved methodology, can now be quantified in an automated fashion, accurately and at scale. Furthermore, these measurements can be heritable, and thus genome-wide association studies can inform the underpinning biological mechanisms. In this review we describe how we have used the resources in UK Biobank to undertake such work. In particular, we focus on a substudy uniquely describing the response to exercise performed at scale with accompanying genetic information.


2021 ◽  
Vol 22 (11) ◽  
pp. 6083
Author(s):  
Aintzane Rueda-Martínez ◽  
Aiara Garitazelaia ◽  
Ariadna Cilleros-Portet ◽  
Sergi Marí ◽  
Rebeca Arauzo ◽  
...  

Endometriosis is a common gynecological disorder that has been associated with endometrial, breast and epithelial ovarian cancers in epidemiological studies. Since complex diseases are a result of multiple environmental and genetic factors, we hypothesized that the biological mechanism underlying their comorbidity might be explained, at least in part, by shared genetics. To assess their potential genetic relationship, we performed a two-sample Mendelian randomization (2SMR) analysis on results from public genome-wide association studies (GWAS). This analysis confirmed previously reported genetic pleiotropy between endometriosis and endometrial cancer. We present robust evidence supporting a causal genetic association between endometriosis and ovarian cancer, particularly with the clear cell and endometrioid subtypes. Our study also identified genetic variants that could explain those associations, opening the door to further functional experiments. Overall, this work demonstrates the value of genomic analyses to support epidemiological data, and to identify targets of relevance in multiple disorders.

