Distinguishing genetic correlation from causation across 52 diseases and complex traits

2017 ◽  
Author(s):  
Luke J. O’Connor ◽  
Alkes L. Price

Abstract: Mendelian randomization (MR) is widely used to identify causal relationships among heritable traits, but it can be confounded by genetic correlations reflecting shared etiology. We propose a model in which a latent causal variable mediates the genetic correlation between two traits. Under the latent causal variable (LCV) model, trait 1 is fully genetically causal for trait 2 if it is perfectly genetically correlated with the latent causal variable, implying that the entire genetic component of trait 1 is causal for trait 2; it is partially genetically causal for trait 2 if it has a high genetic correlation with the latent variable, implying that part of the genetic component of trait 1 is causal for trait 2. To quantify the degree of partial genetic causality, we define the genetic causality proportion (gcp). We fit this model using mixed fourth moments $E(\alpha_1^2 \alpha_1 \alpha_2)$ and $E(\alpha_2^2 \alpha_1 \alpha_2)$ of marginal effect sizes for each trait, exploiting the fact that if trait 1 is causal for trait 2 then SNPs affecting trait 1 (large $\alpha_1^2$) will have correlated effects on trait 2 (large $\alpha_1 \alpha_2$), but not vice versa. We performed simulations under a wide range of genetic architectures and determined that LCV, unlike state-of-the-art MR methods, produced well-calibrated false positive rates and reliable gcp estimates in the presence of genetic correlations and asymmetric genetic architectures; we also determined that LCV is well-powered to detect a causal effect. We applied LCV to GWAS summary statistics for 52 traits (average N=331k), identifying partially or fully genetically causal effects (1% FDR) for 59 pairs of traits, including 30 pairs of traits with high gcp estimates (gĉp > 0.6). Results consistent with the published literature included genetically causal effects on myocardial infarction (MI) for LDL, triglycerides and BMI. Novel findings included a genetically causal effect of LDL on bone mineral density, consistent with clinical trials of statins in osteoporosis. These results demonstrate that it is possible to distinguish between genetic correlation and causation using genetic data.
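The asymmetry in mixed fourth moments described in the abstract above can be illustrated with a toy simulation. This is a minimal sketch, not the LCV estimator: it ignores LD and sampling noise, assumes trait 1 is perfectly correlated with the latent variable, and draws latent effects from a sparse (point-normal) distribution. The function names, the parameter `q2` (correlation of trait 2 with the latent variable), and all numeric settings are hypothetical choices made for illustration.

```python
import random

def simulate_effects(n_snps, q2, sparsity, seed):
    """Simulate per-SNP effects under a toy latent-variable model.

    Assumptions (illustrative only): trait 1 effects equal the latent
    effects pi; trait 2 effects are q2 * pi plus independent Gaussian
    effects, so both traits have unit effect-size variance.
    """
    rng = random.Random(seed)
    latent_sd = (1.0 / sparsity) ** 0.5    # keeps Var(pi) = 1
    resid_sd = (1.0 - q2 ** 2) ** 0.5      # keeps Var(alpha2) = 1
    alpha1, alpha2 = [], []
    for _ in range(n_snps):
        # Sparse latent effect: nonzero with probability `sparsity`
        pi = rng.gauss(0.0, latent_sd) if rng.random() < sparsity else 0.0
        gamma = rng.gauss(0.0, resid_sd)   # trait-2-specific effect
        alpha1.append(pi)
        alpha2.append(q2 * pi + gamma)
    return alpha1, alpha2

def mixed_fourth_moment(a, b):
    """Sample estimate of E(a^2 * a * b)."""
    return sum(x * x * x * y for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    a1, a2 = simulate_effects(n_snps=100_000, q2=0.5, sparsity=0.1, seed=1)
    print("E(a1^2 a1 a2) ~", mixed_fourth_moment(a1, a2))
    print("E(a2^2 a1 a2) ~", mixed_fourth_moment(a2, a1))
```

Under these assumptions the first moment exceeds the second by q2(1 - q2^2)(κ - 3) in expectation, where κ is the fourth moment of the latent effects, so the asymmetry vanishes when the latent effects are Gaussian (κ = 3).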


Author(s):  
Liza Darrous ◽  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract: Mendelian Randomisation (MR), an increasingly popular method that estimates the causal effects of risk factors on complex human traits, has seen several extensions that relax its basic assumptions. However, most of these extensions suffer from two major limitations: their under-exploitation of genome-wide markers, and sensitivity to the presence of a heritable confounder of the exposure-outcome relationship. To overcome these limitations, we propose a Latent Heritable Confounder MR (LHC-MR) method applicable to association summary statistics, which estimates bi-directional causal effects, direct heritability, and confounder effects while accounting for sample overlap. We demonstrate that LHC-MR outperforms several existing MR methods in a wide range of simulation settings and apply it to summary statistics of 13 complex traits. Besides several concordant results, LHC-MR unravelled new mechanisms (how being diagnosed with certain diseases might lead to improved lifestyle) and revealed potential false positive findings of standard MR methods (the apparent causal effect of body mass index on educational attainment may be driven by a strong ignored confounder). A phenome-wide search to identify LHC-implied heritable confounders showed remarkable agreement between the LHC-estimated causal effects of the latent confounder and those for the potentially identified ones. Finally, LHC-MR naturally decomposes genetic correlation into causal effect-driven and confounder-driven contributions, demonstrating that the genetic correlation between systolic blood pressure and diabetes is predominantly confounder-driven.



2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Liza Darrous ◽  
Ninon Mounier ◽  
Zoltán Kutalik

Abstract: Mendelian Randomisation (MR) is an increasingly popular approach that estimates the causal effect of risk factors on complex human traits. While it has seen several extensions that relax its basic assumptions, most suffer from two major limitations: their under-exploitation of genome-wide markers, and sensitivity to the presence of a heritable confounder of the exposure-outcome relationship. To overcome these limitations, we propose a Latent Heritable Confounder MR (LHC-MR) method applicable to association summary statistics, which estimates bi-directional causal effects, direct heritabilities, and confounder effects while accounting for sample overlap. We demonstrate that LHC-MR outperforms several existing MR methods in a wide range of simulation settings and apply it to summary statistics of 13 complex traits. Besides several results concordant with other MR methods, LHC-MR unravels new mechanisms (how disease diagnosis might lead to improved lifestyle) and reveals new causal effects (e.g. HDL cholesterol being protective against high systolic blood pressure), hidden from standard MR methods due to a heritable confounder of opposite effect direction.



2019 ◽  
Author(s):  
Huwenbo Shi ◽  
Steven Gazal ◽  
Masahiro Kanai ◽  
Evan M. Koch ◽  
Armin P. Schoech ◽  
...  

Abstract: Many diseases and complex traits exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We developed a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and applied S-LDXR to genome-wide association summary statistics for 31 diseases and complex traits in East Asians (EAS) and Europeans (EUR) (average $N_{\mathrm{EAS}}$ = 90K, $N_{\mathrm{EUR}}$ = 267K) with an average trans-ethnic genetic correlation of 0.85 (s.e. 0.01). We determined that squared trans-ethnic genetic correlation was 0.82× (s.e. 0.01) smaller than the genome-wide average at SNPs in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes were more population-specific in functionally important regions, including conserved and regulatory regions. In analyses of regions surrounding specifically expressed genes, causal effect sizes were most population-specific for skin and immune genes and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.



2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Huwenbo Shi ◽  
Steven Gazal ◽  
Masahiro Kanai ◽  
Evan M. Koch ◽  
Armin P. Schoech ◽  
...  

Abstract: Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.



2017 ◽  
Author(s):  
Jorien L. Treur ◽  
Mark Gibson ◽  
Amy E Taylor ◽  
Peter J Rogers ◽  
Marcus R Munafò

Abstract: Study Objectives: Higher caffeine consumption has been linked to poorer sleep and insomnia complaints. We investigated whether these observational associations are the result of genetic risk factors influencing both caffeine consumption and poorer sleep, and/or whether they reflect (possibly bidirectional) causal effects. Methods: Summary-level data were available from genome-wide association studies (GWAS) on caffeine consumption (n = 91,462), sleep duration, and chronotype (i.e., being a ‘morning’ versus an ‘evening’ person) (both n = 128,266), and insomnia complaints (n = 113,006). Linkage disequilibrium (LD) score regression was used to calculate genetic correlations, reflecting the extent to which genetic variants influencing caffeine consumption and sleep behaviours overlap. Causal effects were tested with bidirectional, two-sample Mendelian randomization (MR), an instrumental variable approach that utilizes genetic variants robustly associated with an exposure variable as an instrument to test causal effects. Estimates from individual genetic variants were combined using inverse-variance weighted meta-analysis, weighted median regression and MR Egger regression methods. Results: There was no clear evidence for genetic correlation between caffeine consumption and sleep duration ($r_g$ = 0.000, p = 0.998), chronotype ($r_g$ = 0.086, p = 0.192) or insomnia ($r_g$ = -0.034, p = 0.700). Two-sample Mendelian randomization analyses did not support causal effects from caffeine consumption to sleep behaviours, or the other way around. Conclusions: We found no evidence in support of genetic correlation or causal effects between caffeine consumption and sleep. While caffeine may have acute effects on sleep when taken shortly before habitual bedtime, our findings suggest that a more sustained pattern of high caffeine consumption is likely associated with poorer sleep through shared environmental factors.
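The inverse-variance weighted combination of per-variant estimates mentioned in the Methods above can be sketched as follows. This is a generic fixed-effect IVW estimator under the usual first-order approximation (uncertainty in the variant-exposure effects is ignored), not the authors' analysis code; the function name and the example numbers are hypothetical.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Equivalent to a weighted regression through the origin of
    variant-outcome effects on variant-exposure effects, with
    weights 1 / se_outcome^2.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error

if __name__ == "__main__":
    # Hypothetical summary statistics for three variants (not from the study)
    bx = [0.10, 0.20, 0.30]      # variant-exposure effects
    by = [0.05, 0.10, 0.15]      # variant-outcome effects
    se_y = [0.01, 0.02, 0.01]    # standard errors of variant-outcome effects
    est, se = ivw_estimate(bx, by, se_y)
    print(f"IVW estimate = {est:.3f} (s.e. {se:.3f})")
```

Weighted median and MR Egger regression, also named in the abstract, relax the assumption that every variant is a valid instrument and are not covered by this sketch.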



2007 ◽  
Vol 37 (1) ◽  
pp. 393-434 ◽  
Author(s):  
Jennie E. Brand ◽  
Yu Xie

We develop an approach to identifying and estimating causal effects in longitudinal settings with time-varying treatments and time-varying outcomes. The classic potential outcome approach to causal inference generally involves two time periods: units of analysis are exposed to one of two possible values of the causal variable, treatment or control, at a given point in time, and values for an outcome are assessed some time subsequent to exposure. In this paper, we develop a potential outcome approach for longitudinal situations in which both exposure to treatment and the effects of treatment are time-varying. In this longitudinal setting, the research interest centers not on only two potential outcomes, but on a whole matrix of potential outcomes, requiring a complicated conceptualization of many potential counterfactuals. Motivated by sociological applications, we develop a simplification scheme—a weighted composite causal effect that allows identification and estimation of effects with a number of possible solutions. Our approach is illustrated via an analysis of the effects of disability on subsequent employment status using panel data from the Wisconsin Longitudinal Study.



Author(s):  
Yiliang Zhang ◽  
Youshu Cheng ◽  
Wei Jiang ◽  
Yixuan Ye ◽  
Qiongshi Lu ◽  
...  

Abstract: Genetic correlation is the correlation of additive genetic effects on two phenotypes. It is an informative metric to quantify the overall genetic similarity between complex traits, which provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlations based on data collected from genome-wide association studies (GWAS). Due to the easy access of GWAS summary statistics and computational efficiency, methods only requiring GWAS summary statistics as input have become more popular than methods utilizing individual-level genotype data. Here, we present a benchmark study for different summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. Then we applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications compared to other methods due to the imprecision of LD obtained from reference panels. Our findings offer guidance on how to appropriately choose the method for genetic correlation estimation in post-GWAS analysis and interpretation.
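The definition in the first sentence of the abstract above can be written out explicitly. The notation here is an assumption introduced for illustration: $g_1$ and $g_2$ denote the additive genetic components of the two phenotypes, $\rho_g$ their genetic covariance, and $h_1^2$, $h_2^2$ the heritabilities, with both phenotypes standardized to unit variance.

```latex
r_g
  = \frac{\operatorname{Cov}(g_1, g_2)}
         {\sqrt{\operatorname{Var}(g_1)\,\operatorname{Var}(g_2)}}
  = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```

Summary-statistics-based methods differ mainly in how they estimate $\rho_g$ and the heritabilities from marginal association statistics in the presence of LD and sample overlap.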



2015 ◽  
Author(s):  
Brendan Bulik-Sullivan ◽  
Hilary K Finucane ◽  
Verneri Anttila ◽  
Alexander Gusev ◽  
Felix R Day ◽  
...  

Identifying genetic correlations between complex traits and diseases can provide useful etiological insights and help prioritize likely causal relationships. The major challenges preventing estimation of genetic correlation from genome-wide association study (GWAS) data with current methods are the lack of availability of individual genotype data and widespread sample overlap among meta-analyses. We circumvent these difficulties by introducing a technique for estimating genetic correlation that requires only GWAS summary statistics and is not biased by sample overlap. We use our method to estimate 300 genetic correlations among 25 traits, totaling more than 1.5 million unique phenotype measurements. Our results include genetic correlations between anorexia nervosa and both schizophrenia and body mass index, and associations between educational attainment and several diseases. These results highlight the power of a polygenic modeling framework, since there currently are no genome-wide significant SNPs for anorexia nervosa and only three for educational attainment.



2021 ◽  
Author(s):  
Gui-Juan Feng ◽  
Qian Xu ◽  
Jing-Jing Ni ◽  
Shan-Shan Yang ◽  
Bai-Xue Han ◽  
...  

Abstract: Age at menarche (AAM) is a marker of puberty in females. It is a heritable trait associated with various adult diseases. However, the genetic mechanism that determines AAM and links it to disease risk is poorly understood. Aiming to uncover the genetic basis of AAM, we conducted a joint association study in up to 438,089 participants from 3 genome-wide association studies of European and East Asian ancestries. Twenty-one novel genomic loci were identified at the genome-wide significance level. In addition, we observed significant genetic correlations between AAM and 67 complex traits; the strongest was between AAM and body mass index ($r_g = -0.19$, $P = 6.11 \times 10^{-31}$). Latent causal variable analyses demonstrate that there is a genetically causal effect of AAM on several traits, including high blood pressure (GCP = 0.47, P = 0.02), forced vital capacity (GCP = 0.63, P = 0.02), age at first live birth (GCP = 0.51, P = 0.03), impedance of right arm (GCP = 0.41, $P < 1 \times 10^{-7}$) and right leg fat percentage (GCP = -0.10, P = 0.02). Enrichment analysis identified 5 enriched tissues and 51 enriched gene sets. Four of the five enriched tissues were related to the nervous system: the hypothalamus middle, hypothalamo-hypophyseal system, neurosecretory systems and hypothalamus. The fifth tissue was the retina, a sensory organ. The most significant gene set was ‘decreased circulating luteinizing hormone level’ ($P = 2.45 \times 10^{-6}$). Our findings may provide useful insights into the mechanisms determining AAM and the genetic interplay between AAM and some traits of women.



Author(s):  
Yiliang Zhang ◽  
Qiongshi Lu ◽  
Yixuan Ye ◽  
Kunling Huang ◽  
Wei Liu ◽  
...  

Abstract: Local genetic correlation quantifies the genetic similarity of complex traits in specific genomic regions, which could shed unique light on etiologic sharing and provide additional mechanistic insights into the genetic basis of complex traits compared to global genetic correlation. However, accurate estimation of local genetic correlation remains challenging, in part due to extensive linkage disequilibrium in local genomic regions and pervasive sample overlap across studies. We introduce SUPERGNOVA, a unified framework to estimate both global and local genetic correlations using summary statistics from genome-wide association studies. Through extensive simulations and analyses of 30 complex traits, we demonstrate that SUPERGNOVA substantially outperforms existing methods and identifies 150 trait pairs with significant local genetic correlations. In particular, we show that the positive, consistently-identified, yet paradoxical genetic correlation between autism spectrum disorder and cognitive performance could be explained by two etiologically-distinct genetic signatures with bidirectional local genetic correlations. We believe that statistically-rigorous local genetic correlation analysis could accelerate progress in complex trait genetics research.


