Exploiting collider bias to apply two-sample summary data Mendelian randomization methods to one-sample individual level data

Over the last decade the availability of SNP-trait associations from genome-wide association studies has led to an array of methods for performing Mendelian randomization studies using only summary statistics. A common feature of these methods, besides their intuitive simplicity, is the ability to combine data from several sources, incorporate multiple variants and account for biases due to weak instruments and pleiotropy. With the advent of large and accessible fully-genotyped cohorts such as UK Biobank, there is now increasing interest in understanding how best to apply these well developed summary data methods to individual level data, and to explore the use of more sophisticated causal methods allowing for non-linearity and effect modification. In this paper we describe a general procedure for optimally applying any two sample summary data method using one sample data. Our procedure first performs a meta-analysis of summary data estimates that are intentionally contaminated by collider bias between the genetic instruments and unmeasured confounders, due to conditioning on the observed exposure. These estimates are then used to correct the standard observational association between an exposure and outcome. Simulations are conducted to demonstrate the method’s performance against naive applications of two sample summary data MR. We apply the approach to the UK Biobank cohort to investigate the causal role of sleep disturbance on HbA1c levels, an important determinant of diabetes. Our approach can be viewed as a generalization of Dudbridge et al. (Nat. Comm. 10: 1561), who developed a technique to adjust for index event bias when uncovering genetic predictors of disease progression based on case-only data. Our work serves to clarify that in any one sample MR analysis, it can be advantageous to estimate causal relationships by artificially inducing and then correcting for collider bias.

Exploiting collider bias to apply two-sample summary data Mendelian randomization methods to one-sample individual level data

10.1101/2020.10.20.20216358 ◽ 2020

Author(s): Ciarrah Barry ◽ Junxi Liu ◽ Rebecca Richmond ◽ Martin K Rutter ◽ Deborah A Lawlor ◽ ...

Keyword(s): Mendelian Randomization ◽ Association Studies ◽ General Procedure ◽ Meta Analysis ◽ Genome Wide Association Studies ◽ UK Biobank ◽ Individual Level ◽ Level Data ◽ Summary Data ◽ Collider Bias

Abstract: Over the last decade the availability of SNP-trait associations from genome-wide association studies has led to an array of methods for performing Mendelian randomization studies using only summary statistics. A common feature of these methods, besides their intuitive simplicity, is the ability to combine data from several sources, incorporate multiple variants and account for biases due to weak instruments and pleiotropy. With the advent of large and accessible fully-genotyped cohorts such as UK Biobank, there is now increasing interest in understanding how best to apply these well-developed summary data methods to individual-level data, and to explore the use of more sophisticated causal methods allowing for non-linearity and effect modification. In this paper we describe a general procedure for optimally applying any two-sample summary data method using one-sample data. Our procedure first performs a meta-analysis of summary data estimates that are intentionally contaminated by collider bias between the genetic instruments and unmeasured confounders, due to conditioning on the observed exposure. A weighted sum of these estimates is then used to correct the standard observational association between an exposure and outcome. Simulations are conducted to demonstrate the method’s performance against naive applications of two-sample summary data MR. We apply the approach to the UK Biobank cohort to investigate the causal role of sleep disturbance on HbA1c levels, an important determinant of diabetes. Our approach is closely related to the work of Dudbridge et al. (Nat. Comm. 10: 1561), who developed a technique to adjust for index event bias when uncovering genetic predictors of disease progression based on case-only data. Our paper serves to clarify that in any one-sample MR analysis, it can be advantageous to estimate causal relationships by artificially inducing and then correcting for collider bias.
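The induce-then-correct idea in this abstract can be illustrated with a small simulation. The following is a minimal sketch under a simple linear model with no pleiotropy, not the authors' implementation: the function names, the simulated data and the unweighted through-the-origin slope (standing in for whichever summary data method one would actually use) are all assumptions made for illustration. Regressing the outcome on the exposure and the SNPs together yields a confounded exposure coefficient and collider-biased SNP coefficients; the slope of those SNP coefficients on the SNP-exposure associations estimates minus the confounding bias, so adding it back recovers the causal effect.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients of y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def collider_corrected_estimate(G, x, y):
    """Hypothetical helper: one-sample estimate via induced collider bias."""
    ones = np.ones((len(x), 1))
    # SNP-exposure associations (drop the intercept).
    gamma_hat = ols(np.hstack([ones, G]), x)[1:]
    # Outcome on exposure AND SNPs: conditioning on x makes the SNP
    # coefficients collider-biased, equal to -(bias) * gamma in this model.
    fit = ols(np.hstack([ones, x[:, None], G]), y)
    beta_confounded, alpha_hat = fit[1], fit[2:]
    # Unweighted slope through the origin of alpha_hat on gamma_hat;
    # it estimates minus the confounding bias.
    slope = (gamma_hat @ alpha_hat) / (gamma_hat @ gamma_hat)
    return beta_confounded + slope, beta_confounded, slope

def simulate(n=20000, n_snps=20, beta=0.3, seed=1):
    """Simulated data (an assumption of this sketch, not real cohort data)."""
    rng = np.random.default_rng(seed)
    G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
    gamma = rng.uniform(0.1, 0.3, size=n_snps)
    u = rng.normal(size=n)                      # unmeasured confounder
    x = G @ gamma + u + rng.normal(size=n)      # exposure
    y = beta * x + u + rng.normal(size=n)       # outcome, true effect = beta
    return G, x, y

G, x, y = simulate()
beta_hat, beta_confounded, slope = collider_corrected_estimate(G, x, y)
print(f"confounded exposure coefficient: {beta_confounded:.3f}")
print(f"estimated minus-bias slope:      {slope:.3f}")
print(f"corrected causal estimate:       {beta_hat:.3f}")
```

In this simulated setting the confounded coefficient sits well above the true effect of 0.3, while the corrected estimate should land close to it. Real applications would also need standard errors and a robust choice of summary data method, which this sketch omits.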

Leveraging both individual-level genetic data and GWAS summary statistics increases polygenic prediction

10.1101/2020.11.27.401141 ◽ 2020

Author(s): Clara Albiñana ◽ Jakob Grove ◽ John J. McGrath ◽ Esben Agerbo ◽ Naomi R. Wray ◽ ...

Keyword(s): Association Studies ◽ Meta Analysis ◽ Training Sample ◽ Risk Scores ◽ Large Individual ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ UK Biobank ◽ Individual Level ◽ Level Data

Abstract: The accuracy of polygenic risk scores (PRSs) in predicting complex diseases increases with the training sample size. PRSs are generally derived from summary statistics from large meta-analyses of multiple genome-wide association studies (GWAS). However, it is now common for researchers to have access to large individual-level data as well, such as the UK Biobank data. To the best of our knowledge, it has not yet been explored how best to combine both types of data (summary statistics and individual-level data) to optimize polygenic prediction. The most widely used approach to combining data is the meta-analysis of GWAS summary statistics (Meta-GWAS), but we show that it does not always provide the most accurate PRS. Through simulations and using twelve real case-control and quantitative traits from both iPSYCH and UK Biobank along with external GWAS summary statistics, we compare Meta-GWAS with two alternative data-combining approaches, stacked clumping and thresholding (SCT) and Meta-PRS. We find that, when large individual-level data are available, the linear combination of PRSs (Meta-PRS) is both a simple alternative to Meta-GWAS and often more accurate.
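The "linear combination of PRSs" described above can be sketched as an ordinary least-squares fit of the phenotype on several scores. This is a minimal illustration only: the helper names `fit_meta_prs` and `apply_meta_prs`, the simulated scores and the train/test split are assumptions of the sketch, not the authors' pipeline, and in practice the weights would need to be fitted in a sample independent of the ones used to derive each score.

```python
import numpy as np

def fit_meta_prs(prs_matrix, phenotype):
    """Fit intercept and one weight per PRS column by least squares."""
    design = np.column_stack([np.ones(len(phenotype)), prs_matrix])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef

def apply_meta_prs(coef, prs_matrix):
    """Combine PRS columns using the fitted intercept and weights."""
    return coef[0] + prs_matrix @ coef[1:]

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Simulated data (hypothetical): two noisy scores for the same genetic value.
rng = np.random.default_rng(0)
n_train, n_test = 5000, 5000
n = n_train + n_test
g = rng.normal(size=n)                      # true genetic value
prs_external = g + rng.normal(size=n)       # score from external summary statistics
prs_internal = g + rng.normal(size=n)       # score from individual-level data
y = g + rng.normal(size=n)                  # quantitative phenotype

prs = np.column_stack([prs_external, prs_internal])
train, test = slice(0, n_train), slice(n_train, n)

coef = fit_meta_prs(prs[train], y[train])
meta = apply_meta_prs(coef, prs[test])

r_ext = corr(prs_external[test], y[test])
r_int = corr(prs_internal[test], y[test])
r_meta = corr(meta, y[test])
print(f"external: {r_ext:.3f}  internal: {r_int:.3f}  combined: {r_meta:.3f}")
```

In this toy setting the combined score should correlate more strongly with the held-out phenotype than either score alone, because the two scores carry independent noise.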

Estimating genetic correlation jointly using individual-level and summary-level GWAS data

10.1101/2021.08.18.456908 ◽ 2021

Author(s): Yiliang Zhang ◽ Youshu Cheng ◽ Yixuan Ye ◽ Wei Jiang ◽ Qiongshi Lu ◽ ...

Keyword(s): Genetic Correlation ◽ Association Studies ◽ Real Data ◽ Efficient Estimation ◽ Risk Scores ◽ Genome Wide Association Studies ◽ Individual Level ◽ Correlation Estimation ◽ Level Data ◽ Summary Data

Abstract: With the increasing accessibility of individual-level data from genome-wide association studies, it is now common for researchers to have individual-level data for some traits in one specific population. For some traits, only publicly released summary-level data can be accessed, owing to privacy and safety concerns. Current methods for estimating genetic correlation can only be applied when the input data for the two traits of interest are either both individual-level or both summary-level. When researchers have access to individual-level data for one trait and summary-level data for the other, they have to transform the individual-level data to summary-level data first and then apply summary data-based methods to estimate the genetic correlation. This procedure is computationally and statistically inefficient and introduces information loss. We introduce GENJI (Genetic correlation EstimatioN Jointly using Individual-level and summary data), a method that can estimate within-population or transethnic genetic correlation based on individual-level data for one trait and summary-level data for another trait. Through extensive simulations and analyses of real data on within-population and transethnic genetic correlation estimation, we show that GENJI produces more reliable and efficient estimation than summary data-based methods. In addition, when individual-level data are available for both traits, GENJI can achieve performance comparable to that of individual-level data-based methods. Downstream applications of genetic correlation can benefit from more accurate estimates. In particular, we show that more accurate genetic correlation estimation improves the predictability of cross-population polygenic risk scores.

Improved polygenic prediction by Bayesian multiple regression on summary statistics

Nature Communications ◽ 10.1038/s41467-019-12653-0 ◽ 2019 ◽ Vol 10 (1) ◽ Cited By ~ 34

Author(s): Luke R. Lloyd-Jones ◽ Jian Zeng ◽ Julia Sidorenko ◽ Loïc Yengo ◽ Gerhard Moser ◽ ...

Keyword(s): Multiple Regression ◽ Association Studies ◽ Meta Analysis ◽ Multiple Regression Model ◽ Data Sets ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ Individual Level ◽ Level Data ◽ The UK

Abstract: Accurate prediction of an individual’s phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p-value thresholding.

Estimating genetic correlation jointly using individual-level and summary-level GWAS data

10.21203/rs.3.rs-830770/v1 ◽ 2021

Author(s): Hongyu Zhao ◽ Yiliang Zhang ◽ Youshu Cheng ◽ Yixuan Ye ◽ Wei Jiang ◽ ...

Keyword(s): Genetic Correlation ◽ Association Studies ◽ Real Data ◽ Efficient Estimation ◽ Risk Scores ◽ Genome Wide Association Studies ◽ Individual Level ◽ Correlation Estimation ◽ Level Data ◽ Summary Data

Abstract: With the increasing accessibility of individual-level data from genome-wide association studies, it is now common for researchers to have individual-level data for some traits in one specific population. For some traits, only publicly released summary-level data can be accessed, owing to privacy and safety concerns. Current methods for estimating genetic correlation can only be applied when the input data for the two traits of interest are either both individual-level or both summary-level. When researchers have access to individual-level data for one trait and summary-level data for the other, they have to transform the individual-level data to summary-level data first and then apply summary data-based methods to estimate the genetic correlation. This procedure is computationally and statistically inefficient and introduces information loss. We introduce GENJI (Genetic correlation EstimatioN Jointly using Individual-level and summary data), a method that can estimate within-population or transethnic genetic correlation based on individual-level data for one trait and summary-level data for another trait. Through extensive simulations and analyses of real data on within-population and transethnic genetic correlation estimation, we show that GENJI produces more reliable and efficient estimation than summary data-based methods. In addition, when individual-level data are available for both traits, GENJI can achieve performance comparable to that of individual-level data-based methods. Downstream applications of genetic correlation can benefit from more accurate estimates. In particular, we show that more accurate genetic correlation estimation improves the predictability of cross-population polygenic risk scores.

Life Course Adiposity and Alzheimer’s Disease: A Mendelian Randomization Study

Journal of Alzheimer's Disease ◽ 10.3233/jad-210345 ◽ 2021 ◽ pp. 1-10

Author(s): Xian Li ◽ Yan Tian ◽ Yu-Xiang Yang ◽ Ya-Hui Ma ◽ Xue-Ning Shen ◽ ...

Keyword(s): Alzheimer's Disease ◽ Life Course ◽ Mendelian Randomization ◽ Association Studies ◽ Meta Analysis ◽ Genome Wide Association Studies ◽ Fat Percentage ◽ Regression Methods ◽ Weighted Median

Background: Several studies have shown that life course adiposity is associated with Alzheimer’s disease (AD). However, the underlying causality remains unclear. Objective: We aimed to examine the causal relationship between life course adiposity and AD using Mendelian randomization (MR) analysis. Methods: Instrumental variants were obtained from large genome-wide association studies (GWAS) for life course adiposity, including birth weight (BW), childhood body mass index (BMI), adult BMI, waist circumference (WC), waist-to-hip ratio (WHR), and body fat percentage (BFP). A meta-analysis of GWAS for AD including 71,880 cases and 383,378 controls was used in this study. MR analyses were performed using inverse variance weighted (IVW), weighted median, and MR-Egger regression methods. We calculated odds ratios (ORs) for AD per genetically predicted one standard deviation (1-SD) increase in each trait. Results: A genetically predicted 1-SD increase in adult BMI was significantly associated with higher risk of AD (IVW: OR = 1.03, 95% confidence interval [CI] = 1.01–1.05, p = 2.7 × 10⁻³) after Bonferroni correction. The weighted median method indicated a significant association between BW and AD (OR = 0.94, 95% CI = 0.90–0.98, p = 1.8 × 10⁻³). We also found suggestive associations of AD with WC (IVW: OR = 1.03, 95% CI = 1.00–1.07, p = 0.048) and WHR (weighted median: OR = 1.04, 95% CI = 1.00–1.07, p = 0.029). No association of AD with childhood BMI or BFP was detected. Conclusion: Our study demonstrated that lower BW and higher adult BMI had causal effects on increased AD risk.
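Two of the estimators named in the Methods, IVW and MR-Egger regression, can be written in a few lines when only per-variant summary statistics are available. The sketch below uses first-order inverse-variance weights based on the SNP-outcome standard errors and omits standard errors and the weighted median for brevity; the function names and the numeric arrays are hypothetical illustrations, not values from this study.

```python
import numpy as np

def ivw(bx, by, se_by):
    """Inverse variance weighted estimate: weighted regression of the
    SNP-outcome effects on the SNP-exposure effects through the origin."""
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def mr_egger(bx, by, se_by):
    """MR-Egger: the same weighted regression with an intercept, after
    orienting every variant so its exposure effect is positive.
    Returns (intercept, slope)."""
    sign = np.where(bx < 0, -1.0, 1.0)
    bx, by = bx * sign, by * sign
    sw = 1.0 / se_by                       # square root of the weights
    design = np.column_stack([np.ones_like(bx), bx]) * sw[:, None]
    (intercept, slope), *_ = np.linalg.lstsq(design, by * sw, rcond=None)
    return intercept, slope

# Hypothetical summary statistics for five variants (illustration only).
bx = np.array([0.12, 0.08, -0.15, 0.20, 0.10])        # SNP-exposure effects
by = np.array([0.030, 0.018, -0.041, 0.049, 0.027])   # SNP-outcome effects
se_by = np.array([0.010, 0.012, 0.011, 0.015, 0.009])

print("IVW estimate:", round(ivw(bx, by, se_by), 3))
intercept, slope = mr_egger(bx, by, se_by)
print("MR-Egger intercept and slope:", round(intercept, 4), round(slope, 3))
```

An MR-Egger intercept far from zero is usually read as a sign of directional pleiotropy, which is why the method is often reported alongside IVW as a sensitivity analysis.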

Causal effect of renal function on venous thromboembolism: a two-sample Mendelian randomization investigation

Journal of Thrombosis and Thrombolysis ◽ 10.1007/s11239-021-02494-4 ◽ 2021

Author(s): Shuai Yuan ◽ Maria Bruzelius ◽ Susanna C. Larsson

Keyword(s): Renal Function ◽ Venous Thromboembolism ◽ Mendelian Randomization ◽ Causal Effect ◽ Meta Analysis ◽ Genome Wide Association Studies ◽ UK Biobank ◽ Genome Wide ◽ Increased Risk ◽ MR Study

Abstract: Whether renal function is causally associated with venous thromboembolism (VTE) is not yet fully elucidated. We conducted a two-sample Mendelian randomization (MR) study to determine the causal effect of renal function, measured as estimated glomerular filtration rate (eGFR), on VTE. Single-nucleotide polymorphisms associated with eGFR were selected as instrumental variables at the genome-wide significance level (p < 5 × 10⁻⁸) from a meta-analysis of 122 genome-wide association studies including up to 1,046,070 individuals. Summary-level data for VTE were obtained from the FinnGen consortium (6913 VTE cases and 169,986 non-cases) and the UK Biobank study (4620 VTE cases and 356,574 non-cases). MR estimates were calculated using the random-effects inverse-variance weighted method and combined using fixed-effects meta-analysis. Genetically predicted decreased eGFR was significantly associated with an increased risk of VTE in both FinnGen and UK Biobank. For a one-unit decrease in log-transformed eGFR, the odds ratios of VTE were 2.93 (95% confidence interval (CI) 1.25, 6.84) and 4.46 (95% CI 1.59, 12.5) when using data from FinnGen and UK Biobank, respectively. The combined odds ratio was 3.47 (95% CI 1.80, 6.68). Results were consistent in all sensitivity analyses and no horizontal pleiotropy was detected. This MR study supported a causal role of impaired renal function in VTE.
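The fixed-effects combination of the two cohort-specific odds ratios can be checked by hand. The sketch below is an illustrative reconstruction, assuming the standard inverse-variance approach on the log odds scale and recovering each standard error from the width of the reported 95% CI; the helper names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Convert an OR and its 95% CI into a log OR and its standard error."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)
    return math.log(odds_ratio), se

def fixed_effect_pool(estimates):
    """Inverse-variance weighted average of (estimate, se) pairs."""
    weights = [1.0 / se**2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

studies = [
    log_or_and_se(2.93, 1.25, 6.84),   # FinnGen, as reported in the abstract
    log_or_and_se(4.46, 1.59, 12.5),   # UK Biobank, as reported in the abstract
]
pooled, pooled_se = fixed_effect_pool(studies)
combined_or = math.exp(pooled)
ci = (math.exp(pooled - Z95 * pooled_se), math.exp(pooled + Z95 * pooled_se))
print(f"combined OR = {combined_or:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

The result should agree with the reported combined odds ratio of 3.47 (95% CI 1.80, 6.68) up to rounding of the published inputs.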

Cigarette smoking and personality: interrogating causality using Mendelian randomisation

Psychological Medicine ◽ 10.1017/s0033291718003069 ◽ 2018 ◽ Vol 49 (13) ◽ pp. 2197-2205 ◽ Cited By ~ 1

Author(s): Hannah M. Sallis ◽ George Davey Smith ◽ Marcus R. Munafò

Keyword(s): Personality Traits ◽ Association Studies ◽ Smoking Initiation ◽ Mendelian Randomisation ◽ Genome Wide Association Studies ◽ Individual Level ◽ Causal Pathways ◽ Genome Wide ◽ Level Data ◽ Causal Nature

Abstract: Background: Despite the well-documented association between smoking and personality traits such as neuroticism and extraversion, little is known about the potential causal nature of these findings. If it were possible to unpick the association between personality and smoking, it may be possible to develop tailored smoking interventions that could lead to both improved uptake and efficacy. Methods: Recent genome-wide association studies (GWAS) have identified variants robustly associated with both smoking phenotypes and personality traits. Here we use publicly available GWAS summary statistics in addition to individual-level data from UK Biobank to investigate the link between smoking and personality. We first estimate genetic overlap between traits using LD score regression and then use bidirectional Mendelian randomisation methods to unpick the nature of this relationship. Results: We found clear evidence of a modest genetic correlation between smoking behaviours and both neuroticism and extraversion. We found some evidence that personality traits are causally linked to certain smoking phenotypes: among current smokers each additional neuroticism risk allele was associated with smoking an additional 0.07 cigarettes per day (95% CI 0.02–0.12, p = 0.009), and each additional extraversion effect allele was associated with an elevated odds of smoking initiation (OR 1.015, 95% CI 1.01–1.02, p = 9.6 × 10⁻⁷). Conclusion: We found some evidence for specific causal pathways from personality to smoking phenotypes, and weaker evidence of an association from smoking initiation to personality. These findings could be used to inform future smoking interventions or to tailor existing schemes.

Software application profile: mrrobust—a tool for performing two-sample summary Mendelian randomization analyses

International Journal of Epidemiology ◽ 10.1093/ije/dyy195 ◽ 2018 ◽ Vol 48 (3) ◽ pp. 684-690 ◽ Cited By ~ 34

Author(s): Wes Spiller ◽ Neil M Davies ◽ Tom M Palmer

Keyword(s): Mendelian Randomization ◽ Association Studies ◽ Estimation Methods ◽ Genome Wide Association Studies ◽ Software Application ◽ Genome Wide ◽ Inverse Variance ◽ Randomization Analysis ◽ Summary Data ◽ Weighted Estimation

Abstract: Motivation: In recent years, Mendelian randomization analysis using summary data from genome-wide association studies has become a popular approach for investigating causal relationships in epidemiology. The mrrobust Stata package implements several of the recently developed methods. Implementation: mrrobust is freely available as a Stata package. General features: The package includes inverse variance weighted estimation, as well as a range of median, modal and MR-Egger estimation methods. Using mrrobust, plots can be constructed visualizing each estimate either individually or simultaneously. The package also provides statistics such as I²_GX, which are useful in assessing attenuation bias in causal estimates. Availability: The software is freely available from GitHub [https://raw.github.com/remlapmot/mrrobust/master/].
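The I²_GX statistic mentioned above is commonly defined as Higgins' I² applied to the SNP-exposure associations: the share of their observed variation that is not attributable to estimation error. The sketch below follows that definition in its unweighted form as an assumption; it is written in Python for illustration and is not the mrrobust implementation (which is a Stata package), and the function name and input values are hypothetical.

```python
import numpy as np

def i2_gx(bx, se_bx):
    """I-squared of the SNP-exposure estimates: (Q - (L - 1)) / Q, floored at 0,
    where Q is Cochran's Q around the inverse-variance weighted mean."""
    bx = np.asarray(bx, dtype=float)
    se_bx = np.asarray(se_bx, dtype=float)
    w = 1.0 / se_bx**2
    weighted_mean = np.sum(w * bx) / np.sum(w)
    q = np.sum(w * (bx - weighted_mean) ** 2)
    if q <= 0:
        return 0.0          # no observed heterogeneity at all
    return max(0.0, (q - (len(bx) - 1)) / q)

# Hypothetical SNP-exposure effects and standard errors (illustration only).
bx = [0.10, 0.20, 0.30]
se_bx = [0.01, 0.01, 0.01]
print(f"I2_GX = {i2_gx(bx, se_bx):.3f}")
```

Values near 1 suggest that measurement error in the SNP-exposure estimates is small relative to their spread, so attenuation of the MR-Egger slope towards zero should be limited; values well below 1 suggest the opposite.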

Population stratification in GWAS meta-analysis should be standardized to the best available reference datasets

10.1101/2020.09.03.281568 ◽ 2020

Author(s): Aliya Sarmanova ◽ Tim Morris ◽ Daniel John Lawson

Keyword(s): Population Stratification ◽ Association Studies ◽ Meta Analysis ◽ Principal Component ◽ Underlying Structure ◽ Genome Wide Association Studies ◽ UK Biobank ◽ External Reference ◽ Major Disadvantage ◽ The UK

Abstract: Population stratification has recently been demonstrated to bias genetic studies even in relatively homogeneous populations such as within the British Isles. A key component of correcting for stratification in genome-wide association studies (GWAS) is accurately identifying and controlling for the underlying structure present in the sample. Meta-analysis across cohorts is increasingly important for achieving very large sample sizes, but comes with the major disadvantage that each individual cohort corrects for different population stratification. Here we demonstrate that correcting for structure against an external reference adds significant value to meta-analysis. We treat the UK Biobank as a collection of smaller studies, each of which is geographically localised. We provide software to standardize an external dataset against a reference, provide the UK Biobank principal component loadings for this purpose, and demonstrate the value of this with an analysis of the geographically sampled ALSPAC cohort.
