Inferring Causal Direction Between Two Traits in the Presence of Horizontal Pleiotropy with GWAS Summary Data

2020 ◽  
Author(s):  
Haoran Xue ◽  
Wei Pan

Abstract: Orienting the causal relationship between pairs of traits is a fundamental task in scientific research with significant implications in practice, such as in prioritizing molecular targets and modifiable risk factors for developing therapeutic and interventional strategies for complex diseases. A recent method, called Steiger’s method, using a single SNP as an instrument variable (IV) in the framework of Mendelian randomization (MR), has since been widely applied. We report the following new contributions. First, we propose a single SNP-based alternative, overcoming a severe limitation of Steiger’s method in simply assuming, instead of inferring, the existence of a causal relationship. We also clarify a condition necessary for the validity of the methods in the presence of hidden confounding. Second, to improve statistical power, we propose combining the results from multiple, and possibly correlated, SNPs as multiple instruments. Third, we develop three goodness-of-fit tests to check modeling assumptions, including those required for valid IVs. Fourth, by relaxing one of the three IV assumptions in MR, we propose methods, including one Egger regression-like approach and its multivariable version (analogous to multivariable MR), to account for horizontal pleiotropy of the SNPs/IVs, which is often unavoidable in practice. All our methods can simultaneously infer both the existence and (if so) the direction of a causal relationship, largely expanding their applicability over that of Steiger’s method. Although we focus on uni-directional causal relationships, we also briefly discuss an extension to bi-directional relationships. Through extensive simulations and an application to infer the causal directions between low density lipoprotein (LDL) cholesterol, or high density lipoprotein (HDL) cholesterol, and coronary artery disease (CAD), we demonstrate the superior performance and advantage of our proposed methods over Steiger’s method and bi-directional MR. In particular, after accounting for horizontal pleiotropy, our method confirmed the well-known causal direction from LDL to CAD, while other methods, including bi-directional MR, failed.
Author Summary: In spite of its importance, due to technical challenges, orienting causal relationships between pairs of traits has been largely under-studied. Mendelian randomization (MR) Steiger’s method has become increasingly used in the last two years. Here we point out several limitations of MR Steiger’s method and propose alternative approaches. First, MR Steiger’s method uses only a single SNP as the instrument variable (IV), for which we propose a correlation ratio-based method, called Causal Direction-Ratio, or simply CD-Ratio. An advantage of CD-Ratio is its inference of both the existence and (if so) the direction of a causal relationship, in contrast to MR Steiger’s prior assumption of the existence and its poor performance if the assumption is violated. Furthermore, CD-Ratio can be extended to combine the results from multiple, possibly correlated, SNPs with improved statistical power. Second, we propose two methods, called CD-Egger and CD-GLS, for multiple and possibly correlated SNPs while allowing horizontal pleiotropy. Third, we propose three goodness-of-fit tests to check modeling assumptions for the three proposed methods. Finally, we introduce multivariable CD-Egger, analogous to multivariable MR, as a more robust approach, and an extension of CD-Ratio to cases with possibly bi-directional causal relationships. Our numerical studies demonstrated superior performance of our proposed methods over MR Steiger and bi-directional MR. Our proposed methods, along with freely available software, are expected to be useful in practice for causal inference.

PLoS Genetics ◽  
2020 ◽  
Vol 16 (11) ◽  
pp. e1009105
Author(s):  
Haoran Xue ◽  
Wei Pan

Orienting the causal relationship between pairs of traits is a fundamental task in scientific research with significant implications in practice, such as in prioritizing molecular targets and modifiable risk factors for developing therapeutic and interventional strategies for complex diseases. A recent method, called Steiger’s method, using a single SNP as an instrument variable (IV) in the framework of Mendelian randomization (MR), has since been widely applied. We report the following new contributions. First, we propose a single SNP-based alternative, overcoming a severe limitation of Steiger’s method in simply assuming, instead of inferring, the existence of a causal relationship. We also clarify a condition necessary for the validity of the methods in the presence of hidden confounding. Second, to improve statistical power, we propose combining the results from multiple, and possibly correlated, SNPs as multiple instruments. Third, we develop three goodness-of-fit tests to check modeling assumptions, including those required for valid IVs. Fourth, by relaxing one of the three IV assumptions in MR, we propose several methods, including an Egger regression-like approach and its multivariable version (analogous to multivariable MR), to account for horizontal pleiotropy of the SNPs/IVs, which is often unavoidable in practice. All our methods can simultaneously infer both the existence and (if so) the direction of a causal relationship, largely expanding their applicability over that of Steiger’s method. Although we focus on uni-directional causal relationships, we also briefly discuss an extension to bi-directional relationships. Through extensive simulations and an application to infer the causal directions between low density lipoprotein (LDL) cholesterol, or high density lipoprotein (HDL) cholesterol, and coronary artery disease (CAD), we demonstrate the superior performance and advantage of our proposed methods over Steiger’s method and bi-directional MR. 
In particular, after accounting for horizontal pleiotropy, our method confirmed the well-known causal direction from LDL to CAD, while other methods, including bi-directional MR, might fail.
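
The correlation comparison that underlies single-SNP direction inference can be illustrated with a small simulation. The sketch below is not the authors' CD-Ratio implementation. It assumes a toy model (G → X → Y, no hidden confounding, no pleiotropy) with hypothetical effect sizes, and the helper names `pearson` and `simulate_direction` are invented for illustration. Under that assumed model the SNP should correlate more strongly with the upstream trait than with the downstream one.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def simulate_direction(n=5000, seed=1):
    """Simulate G -> X -> Y under an assumed toy model (no confounding, no pleiotropy)."""
    rng = random.Random(seed)
    # Genotype coded 0/1/2 with a hypothetical allele frequency of 0.3.
    g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    x = [0.5 * gi + rng.gauss(0, 1) for gi in g]  # exposure depends on the SNP
    y = [0.4 * xi + rng.gauss(0, 1) for xi in x]  # outcome depends only on the exposure
    return pearson(g, x), pearson(g, y)

r_gx, r_gy = simulate_direction()
ratio = r_gy / r_gx  # correlation ratio; magnitude below 1 is expected when X is upstream
direction = "X -> Y" if abs(r_gx) > abs(r_gy) else "Y -> X"
print(f"r_GX={r_gx:.3f}  r_GY={r_gy:.3f}  ratio={ratio:.3f}  inferred: {direction}")
```

A bare comparison of this kind always returns one of the two directions, even when no causal effect exists, which is the limitation of Steiger's method that the abstract highlights.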


2022 ◽  
Vol 12 ◽  
Author(s):  
Chenglin Duan ◽  
Jingjing Shi ◽  
Guozhen Yuan ◽  
Xintian Shou ◽  
Ting Chen ◽  
...  

Background: Traditional observational studies have demonstrated an association between heart failure (HF) and Alzheimer’s disease (AD). The strengths of observational studies lie in their speed of implementation, cost, and applicability to rare diseases. However, observational studies have several limitations, such as uncontrollable confounders. Therefore, we employed Mendelian randomization (MR) of genetic variants, which can avoid these limitations, to evaluate the causal relationships between AD and HF. Materials and Methods: A two-sample bidirectional MR analysis was employed. All datasets were obtained from the UK’s Medical Research Council Integrative Epidemiology Unit genome-wide association study database, and we conducted a series of control steps to select the most suitable single-nucleotide polymorphisms for MR analysis, for which five primary methods were used. We swapped the roles of exposure and outcome to explore the causal direction between HF and AD. Sensitivity analysis was used to conduct several tests to avoid heterogeneity and pleiotropic bias in the MR results. Results: Our MR studies did not support a meaningful causal effect of AD on HF (MR-Egger, p = 0.634 > 0.05; weighted median (WM), p = 0.337 > 0.05; inverse variance weighted (IVW), p = 0.471 > 0.05; simple mode, p = 0.454 > 0.05; weighted mode, p = 0.401 > 0.05). At the same time, we did not find a significant causal effect of HF on AD with four of the methods (MR-Egger, p = 0.195 > 0.05; IVW, p = 0.0879 > 0.05; simple mode, p = 0.170 > 0.05; weighted mode, p = 0.110 > 0.05), but the WM method indicated a significant effect of HF on AD (p = 0.025 < 0.05). Because the statistical power of IVW and MR-Egger is greater than that of WM, we think that there is no causal effect of HF on AD. Sensitivity analyses did not detect horizontal pleiotropy in the MR analysis. Conclusion: Our results did not provide significant evidence indicating any causal relationships between HF and AD in the European population. Therefore, more large-scale datasets or datasets related to similar factors are needed for further MR analysis.
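
For readers unfamiliar with the inverse variance weighted (IVW) estimator named above, the following is a minimal sketch of the standard fixed-effect form computed from per-SNP summary statistics. The numbers are hypothetical and are not taken from the study, the function name `ivw_estimate` is an assumption made for illustration, and the p-value uses a plain normal approximation.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    beta_x: SNP-exposure effects, beta_y: SNP-outcome effects,
    se_y: standard errors of the SNP-outcome effects.
    """
    weights = [1.0 / s ** 2 for s in se_y]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_x, beta_y))
    den = sum(w * bx ** 2 for w, bx in zip(weights, beta_x))
    return num / den, math.sqrt(1.0 / den)

# Hypothetical per-SNP summary statistics (illustration only).
beta_x = [0.10, 0.08, 0.12, 0.05]
beta_y = [0.021, 0.015, 0.026, 0.008]
se_y = [0.010, 0.012, 0.009, 0.015]

est, se = ivw_estimate(beta_x, beta_y, se_y)
z = est / se
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, normal approximation
print(f"IVW estimate={est:.3f}  SE={se:.3f}  p={p:.3g}")
```

Swapping the exposure and outcome columns and repeating the calculation gives the reverse-direction estimate used in a bidirectional analysis.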


2019 ◽  
Author(s):  
Adriaan van der Graaf ◽  
Annique Claringbould ◽  
Antoine Rimbert ◽  
Harm-Jan Westra ◽  
Yang Li ◽  
...  

Abstract: Robust inference of causal relationships between gene expression and complex traits using Mendelian Randomization (MR) approaches is confounded by pleiotropy and linkage disequilibrium (LD) between gene expression quantitative loci (eQTLs). Here we propose a new MR method, MR-link, that accounts for unobserved pleiotropy and LD by leveraging information from individual-level data. In simulations, MR-link shows false positive rates close to expectation (median 0.05) and high power (up to 0.89), outperforming all other MR methods we tested, even when only one eQTL variant is present. Application of MR-link to low-density lipoprotein cholesterol (LDL-C) measurements in 12,449 individuals and eQTL summary statistics from whole blood and liver identified 19 genes causally linked to LDL-C. These include the previously functionally validated SORT1 gene, and the PVRL2 gene, located in the APOE locus, for which a causal role in liver was yet unknown. Our results showcase the strength of MR-link for transcriptome-wide causal inferences.


2019 ◽  
Vol 48 (5) ◽  
pp. 1478-1492 ◽  
Author(s):  
Qingyuan Zhao ◽  
Yang Chen ◽  
Jingshu Wang ◽  
Dylan S Small

Background: Summary-data Mendelian randomization (MR) has become a popular research design to estimate the causal effect of risk exposures. With the sample size of GWAS continuing to increase, it is now possible to use genetic instruments that are only weakly associated with the exposure. Development: We propose a three-sample genome-wide design where typically 1000 independent genetic instruments across the whole genome are used. We develop an empirical partially Bayes statistical analysis approach where instruments are weighted according to their strength; thus weak instruments bring less variation to the estimator. The estimator is highly efficient with many weak genetic instruments and is robust to balanced and/or sparse pleiotropy. Application: We apply our method to estimate the causal effect of body mass index (BMI) and major blood lipids on cardiovascular disease outcomes, and obtain substantially shorter confidence intervals (CIs). In particular, the estimated causal odds ratio of BMI on ischaemic stroke is 1.19 (95% CI: 1.07–1.32, P-value <0.001); the estimated causal odds ratio of high-density lipoprotein cholesterol (HDL-C) on coronary artery disease (CAD) is 0.78 (95% CI: 0.73–0.84, P-value <0.001). However, the estimated effect of HDL-C attenuates and becomes statistically non-significant when we only use strong instruments. Conclusions: A genome-wide design can greatly improve the statistical power of MR studies. Robust statistical methods may alleviate but not solve the problem of horizontal pleiotropy. Our empirical results suggest that the relationship between HDL-C and CAD is heterogeneous, and it may be too soon to completely dismiss the HDL hypothesis.


2012 ◽  
Vol 60 (6) ◽  
pp. 381 ◽  
Author(s):  
Evan Watkins ◽  
Julian Di Stefano

Hypotheses relating to the annual frequency distribution of mammalian births are commonly tested using a goodness-of-fit procedure. Several interacting factors influence the statistical power of these tests, but no power studies have been conducted using scenarios derived from biological hypotheses. Corresponding to theories relating reproductive output to seasonal resource fluctuation, we simulated data reflecting a winter reduction in birth frequency to test the effect of four factors (sample size, maximum effect size, the temporal pattern of response and the number of categories used for analysis) on the power of three goodness-of-fit procedures: the G and chi-square tests and Watson’s U² test. Analyses resulting in high power all had a large maximum effect size (60%) and were associated with a sample size of 200 on most occasions. The G-test was the most powerful when data were analysed using two temporal categories (winter and other) while Watson’s U² test achieved the highest power when 12 monthly categories were used. Overall, the power of most modelled scenarios was low. Consequently, we recommend using power analysis as a research planning tool, and have provided a spreadsheet enabling a priori power calculations for the three tests considered.
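
A simulation-based power calculation of the kind described can be sketched as follows. This is an illustrative Monte Carlo estimate for the chi-square test with two temporal categories only, not the authors' spreadsheet or simulation design. It assumes that winter covers 3 of 12 months, that the reduction applies uniformly across winter, and the function names are hypothetical.

```python
import random

CHI2_CRIT_DF1 = 3.841  # 5% critical value of the chi-square distribution, 1 df

def chi2_two_categories(winter, n, p_winter_null=0.25):
    """Pearson chi-square statistic for winter vs other months under a uniform null."""
    exp_w = n * p_winter_null
    exp_o = n - exp_w
    other = n - winter
    return (winter - exp_w) ** 2 / exp_w + (other - exp_o) ** 2 / exp_o

def estimate_power(n, reduction, reps=2000, seed=0):
    """Monte Carlo rejection rate when winter births are cut by `reduction` (0 to 1)."""
    rng = random.Random(seed)
    w = 0.25 * (1.0 - reduction)  # relative weight of the assumed 3 winter months
    p_winter = w / (w + 0.75)     # renormalise so the two probabilities sum to 1
    rejections = 0
    for _ in range(reps):
        winter = sum(rng.random() < p_winter for _ in range(n))
        if chi2_two_categories(winter, n) > CHI2_CRIT_DF1:
            rejections += 1
    return rejections / reps

power_large = estimate_power(n=200, reduction=0.6)
type1_rate = estimate_power(n=200, reduction=0.0)
print(f"power (60% reduction, n=200) = {power_large:.2f}")
print(f"rejection rate under the null = {type1_rate:.3f}")
```

Lowering `n` or `reduction` in this sketch shows how quickly power falls away from the large-effect, large-sample case.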


2018 ◽  
Author(s):  
Hon-Cheong So ◽  
Carlos Kwan-long Chau ◽  
Yu-ying Cheng ◽  
Pak C. Sham

Background: The etiology of depression remains poorly understood. Changes in blood lipid levels were reported to be associated with depression and suicide; however, study findings were mixed. Methods: We performed a two-sample Mendelian randomization (MR) analysis to investigate the causal relationship between blood lipids and depression phenotypes, based on large-scale GWAS summary statistics (N=188,577/480,359 for lipid/depression traits respectively). Five depression-related phenotypes were included, namely major depressive disorder (MDD; from PGC), depressive symptoms (DS; from SSGAC), longest duration and number of episodes of low mood, and history of deliberate self-harm (DSH)/suicide (from UK Biobank). MR was conducted with inverse-variance weighted (MR-IVW), Egger and Generalized Summary-data-based MR (GSMR) methods. Results: There was consistent evidence that triglyceride (TG) is causally associated with DS (MR-IVW beta for one-SD increase in TG=0.0346, 95% CI=0.0114-0.0578), supported by MR-IVW and GSMR and multiple r² clumping thresholds. We also observed relatively consistent associations of TG with DSH/suicide (MR-Egger OR= 2.514, CI: 1.579-4.003). There was moderate evidence for positive associations of TG with MDD and the number of episodes of low mood. For HDL-c, we observed moderate evidence for causal associations with DS and MDD. LDL-c and TC did not show robust causal relationships with depression phenotypes, except for weak evidence that LDL-c is inversely related to DSH/suicide. We did not detect significant associations when depression phenotypes were treated as exposures. Conclusions: This study provides evidence for a causal relationship of TG and, to a lesser extent, altered cholesterol levels with depression phenotypes. Further studies on its mechanistic basis and the effects of lipid-lowering therapies are warranted.


2019 ◽  
Vol 105 (3) ◽  
pp. 908-919
Author(s):  
Chia-Ni Hsiung ◽  
Yi-Cheng Chang ◽  
Chien-Wei Lin ◽  
Chia-Wei Chang ◽  
Wen-Cheng Chou ◽  
...  

Context: The association between circulating triglyceride (TG) and glycated hemoglobin A1c (HbA1c), a biomarker for type 2 diabetes, has been widely addressed, but the causal direction of the relationship is still ambiguous. Objective: To confirm the causal relationship between TG and HbA1c by using bidirectional and 2-step Mendelian randomization (MR) approaches. Methods: We carried out a bidirectional MR approach using the summarized results from the public database to examine any potential causal effects between serum TG and HbA1c in 16 000 individuals of the Taiwan Biobank cohort. We used the MR estimate and the MR inverse variance–weighted method to assess the relationship between TG and HbA1c. To further determine whether DNA methylation at specific sequences mediates the causal pathway between TG and HbA1c, we used the 2-step MR approach. Results: We identified that a single-unit increase in TG measured via log transformation of mg/dL data was associated with a significant increase of 10 units of HbA1c (95% CI = 1.05−18.95, P = 0.029). In contrast, the genetic determinants of HbA1c do not contribute to the amount of circulating TG (beta = 1.75, 95% CI = –11.50 to 14.90). Sensitivity analyses, including the weighted-median approach and MR-Egger regression, were performed to confirm the absence of pleiotropic effects among these instrumental variables. Furthermore, we found that the genetic variant rs1823200 is associated with both methylation of the CpG site adjacent to the CADPS gene and HbA1c level. Conclusion: Our study suggests that higher circulating TG can affect genomic methylation status, ultimately causing elevated levels of circulating HbA1c.
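
The reported interval and p-value for the TG effect can be cross-checked with a short calculation. The sketch below assumes a symmetric 95% confidence interval and a normal approximation, neither of which the abstract states explicitly. The helper name `p_from_ci` is hypothetical.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Standard error, z-score and two-sided normal p-value implied by a 95% CI."""
    se = (upper - lower) / (2.0 * z_crit)
    z = estimate / se
    return se, z, math.erfc(abs(z) / math.sqrt(2.0))

# Values quoted in the abstract: estimate 10, 95% CI 1.05 to 18.95.
se, z, p = p_from_ci(10.0, 1.05, 18.95)
print(f"SE={se:.2f}  z={z:.2f}  p={p:.3f}")
```

Under these assumptions the implied p-value is about 0.029, which agrees with the value given in the Results.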


2019 ◽  
Author(s):  
Kelsey E. Johnson ◽  
Katherine M. Siewert ◽  
Derek Klarin ◽  
Scott M. Damrauer ◽  
Kyong-Mi Chang ◽  
...  

Objective: To assess a potential causal relationship between genetic variants associated with plasma lipid traits (high-density lipoprotein cholesterol, HDL; low-density lipoprotein cholesterol, LDL; triglycerides, TG) with risk for breast cancer. Design: Mendelian randomization (MR) study. Setting and Participants: Data from genome-wide association studies in up to 215,551 subjects from the Million Veterans Project were used to construct genetic instruments for plasma lipid traits. The effect of these instruments on breast cancer risk was evaluated using genetic data from the BCAC consortium based on 122,977 breast cancer cases and 105,974 controls. Exposures: Genetically modified plasma levels of LDL, HDL, or TG. Main Outcomes and Measures: Odds ratio (OR) for breast cancer risk per standard-deviation increase in HDL, LDL, or TG. Results: We observed that a 1-SD genetically determined increase in HDL levels is associated with an increased risk for all breast cancers (HDL: OR=1.08, 95% CI=1.04-1.13, P=7.4×10⁻⁵). Multivariable MR analysis, which adjusted for the effects of LDL, TG, body mass index, and age at menarche, corroborated this observation for HDL (OR=1.06, 95% CI=1.03-1.10, P=4.9×10⁻⁴) and also a relationship between LDL and breast cancer risk (OR=1.03, 95% CI=1.01-1.07, P=0.02). We did not observe a difference in these relationships when stratified by breast tumor estrogen receptor status. We repeated this analysis using genetic variants independent of the leading association at core HDL pathway genes and found that these variants were also associated with risk for breast cancers (OR=1.11, 95% CI=1.06–1.16, P=1.5×10⁻⁶), including gene-specific associations at ABCA1, APOE-APOC1-APOC4-APOC2 and CETP. In addition, we find evidence that genetic variation at the ABO locus affects both lipid levels and breast cancer. Conclusions: Genetically elevated plasma HDL levels appear to increase breast cancer risk. Future studies are required to understand the mechanism underlying this putative causal relationship, with the goal to develop potential therapeutic strategies aimed at altering the HDL-mediated effect on breast cancer risk.


2018 ◽  
Author(s):  
Ping Zeng ◽  
Xiang Zhou

Abstract: Amyotrophic lateral sclerosis (ALS) is a late-onset fatal neurodegenerative disorder that is predicted to increase across the globe by ~70% in the following decades. Understanding the disease causal mechanism underlying ALS and identifying modifiable risk factors for ALS hold the key for the development of effective preventative and treatment strategies. Here, we investigate the causal effects of four blood lipid traits that include high density lipoprotein (HDL), low density lipoprotein (LDL), total cholesterol (TC), and triglycerides (TG) on the risk of ALS. By leveraging instrument variables from multiple large-scale genome-wide association studies in both European and East Asian populations, we carry out one of the largest and most comprehensive Mendelian randomization analyses performed to date on the causal relationship between lipids and ALS. Among the four lipids, we found that only LDL is causally associated with ALS and that higher LDL level increases the risk of ALS in both the European and East Asian populations. Specifically, the odds ratio of ALS per one standard deviation (i.e. 39.0 mg/dL) increase of LDL is estimated to be 1.14 (95% CI 1.05 - 1.24, p = 1.38E-3) in the European population and 1.06 (95% CI 1.00 - 1.12, p = 0.044) in the East Asian population. The identified causal relationship between LDL and ALS is robust with respect to the choice of statistical methods and is validated through extensive sensitivity analyses that guard against various model assumption violations. Our study provides important evidence supporting the causal role of higher LDL on increasing the risk of ALS, paving ways for the development of preventative strategies for reducing the disease burden of ALS across multiple nations.
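
An odds ratio reported per standard deviation can be re-expressed on another scale, which sometimes makes the effect size easier to interpret. The sketch below assumes the effect is linear on the log-odds scale, which is an assumption of this example rather than a claim from the abstract. The function name `rescale_odds_ratio` is hypothetical.

```python
import math

def rescale_odds_ratio(or_per_sd, sd_units, new_units):
    """Convert an odds ratio per `sd_units` into an odds ratio per `new_units`,
    assuming a log-linear (constant log-odds per unit) relationship."""
    log_or_per_unit = math.log(or_per_sd) / sd_units
    return math.exp(log_or_per_unit * new_units)

# European estimate from the abstract: OR 1.14 per 1 SD, where 1 SD = 39.0 mg/dL.
or_per_10 = rescale_odds_ratio(1.14, 39.0, 10.0)
print(f"OR per 10 mg/dL = {or_per_10:.3f}")
```

Under that assumption, the European estimate corresponds to an odds ratio of roughly 1.03 per 10 mg/dL increase in LDL.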

