Complex-Traits Genetics Virtual Lab: A community-driven web platform for post-GWAS analyses

2019
Author(s): Gabriel Cuellar-Partida, Mischa Lundberg, Pik Fang Kho, Shannon D’Urso, Luis F. Gutierrez-Mondragon, ...

Abstract. Background: Genome-wide association studies (GWAS) are an important method for mapping genetic variation underlying complex traits and diseases. Tools to visualize, annotate and analyse results from these studies can be used to generate hypotheses about the molecular mechanisms underlying the associations. Findings: The Complex-Traits Genetics Virtual Lab (CTG-VL) integrates over a thousand publicly available GWAS summary statistics, a suite of analysis tools, visualization functions and diverse data sets for genomic annotation. CTG-VL also makes available results from gene-, pathway- and tissue-based analyses of over 1,500 complex traits, allowing pleiotropy to be assessed not only at the genetic-variant level but also at the gene, pathway and tissue levels. In this manuscript, we showcase the platform by analysing GWAS summary statistics of mood swings derived from the UK Biobank. Using analysis tools in CTG-VL, we highlight the hippocampus as a potential tissue involved in mood swings and find that pathways including neuron apoptotic process may underlie the genetic associations. Further, we report a negative genetic correlation with educational attainment (rG = −0.41 ± 0.018) and a potential causal effect of BMI on mood swings (OR = 1.01, 95% CI = 1.00–1.02). Using CTG-VL’s database, we show that pathways and tissues associated with mood swings are also associated with neurological traits, including reaction time and neuroticism, as well as with traits such as age at menopause and age at first live birth. Conclusions: CTG-VL is a platform with the most complete set of tools for carrying out post-GWAS analyses. CTG-VL is freely available as an online web application at https://genoma.io.
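The BMI-on-mood-swings effect above is reported as an odds ratio. As a minimal sketch of how such a figure is commonly derived from summary statistics, the Python below computes a fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate and converts it from the log-odds scale to an odds ratio with a 95% Wald interval. It assumes independent instruments and outcome effects on the log-odds scale; the function names and toy numbers are hypothetical, and this is not claimed to be the exact procedure CTG-VL runs.

```python
import math
from statistics import NormalDist


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW causal estimate from per-SNP summary statistics.

    beta_exp: SNP effects on the exposure (e.g. BMI)
    beta_out: SNP effects on the outcome, on the log-odds scale
    se_out:   standard errors of beta_out
    First-order weights: uncertainty in beta_exp is ignored.
    """
    # Each SNP's Wald ratio beta_out/beta_exp receives weight beta_exp^2 / se_out^2
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, 1 / math.sqrt(den)


def odds_ratio_ci(beta, se, level=0.95):
    """Map a log-odds estimate to (OR, lower, upper) using a Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical toy numbers, not data from the study
    b, s = ivw_estimate([0.10, 0.20, 0.05], [0.0012, 0.0019, 0.0006], [0.01, 0.02, 0.01])
    print("OR = %.3f (95%% CI %.3f-%.3f)" % odds_ratio_ci(b, s))
```

Ignoring the uncertainty in the exposure effects keeps the estimator simple; it is reasonable when instruments are strong, which is one reason genome-wide significant SNPs are usually chosen as instruments.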

Author(s): Xiaofeng Zhu, Xiaoyin Li, Rong Xu, Tao Wang

Abstract. Motivation: The overall association evidence of a genetic variant with multiple traits can be evaluated by cross-phenotype association analysis using summary statistics from genome-wide association studies. Further dissecting the association pathways from a variant to multiple traits is important for understanding the biological causal relationships among complex traits. Results: Here, we introduce a flexible and computationally efficient Iterative Mendelian Randomization and Pleiotropy (IMRP) approach that simultaneously searches for horizontal pleiotropic variants and estimates the causal effect. Extensive simulations and real data applications suggest that IMRP performs similarly to or better than existing Mendelian Randomization methods for both causal effect estimation and pleiotropic variant detection. The pleiotropy test developed here is further extended to detect colocalization of multiple variants at a locus. IMRP will greatly facilitate our understanding of causal relationships underlying complex traits, in particular when a large number of genetic instrumental variables are used to evaluate multiple traits. Availability and implementation: The software IMRP is available at https://github.com/XiaofengZhuCase/IMRP. The simulation code can be downloaded from http://hal.case.edu/∼xxz10/zhu-web/ under the link “MR Simulations software”. Supplementary information: Supplementary data are available at Bioinformatics online.
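The general idea of alternating between estimating a causal effect and testing each variant for horizontal pleiotropy can be illustrated with a simplified loop. The Python below is a hypothetical sketch, not the IMRP package or its test statistic: it re-estimates a first-order IVW effect on the currently retained variants, scores every variant by its standardized residual (by_j − β·bx_j)/sqrt(sy_j² + β²·sx_j²), and drops variants beyond a Bonferroni-corrected threshold until the retained set stops changing. The threshold and the residual variance formula are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def _ivw(bx, by, sy, idx):
    # First-order IVW estimate restricted to the variants listed in idx
    num = sum(bx[j] * by[j] / sy[j] ** 2 for j in idx)
    den = sum(bx[j] ** 2 / sy[j] ** 2 for j in idx)
    return num / den


def iterative_mr(bx, by, sx, sy, alpha=0.05, max_iter=50):
    """Hypothetical sketch: alternate causal estimation and per-variant pleiotropy tests.

    bx, sx: SNP-exposure effects and standard errors
    by, sy: SNP-outcome effects and standard errors
    Returns (beta, flagged), where flagged lists indices judged pleiotropic.
    """
    n = len(bx)
    crit = NormalDist().inv_cdf(1 - alpha / (2 * n))  # Bonferroni, two-sided
    keep = list(range(n))
    for _ in range(max_iter):
        beta = _ivw(bx, by, sy, keep)
        # Standardized residual of every variant under the current causal estimate
        stats = [(by[j] - beta * bx[j]) / math.sqrt(sy[j] ** 2 + beta ** 2 * sx[j] ** 2)
                 for j in range(n)]
        new_keep = [j for j in range(n) if abs(stats[j]) <= crit]
        if not new_keep or new_keep == keep:
            break  # converged, or nothing would remain to estimate from
        keep = new_keep
    beta = _ivw(bx, by, sy, keep)  # final estimate on the retained set
    flagged = [j for j in range(n) if j not in keep]
    return beta, flagged
```

Every variant, including previously dropped ones, is rescored at each pass, so a variant removed because of an early biased estimate can re-enter once the estimate improves.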


2019
Author(s): Zhongshang Yuan, Huanhuan Zhu, Ping Zeng, Sheng Yang, Shiquan Sun, ...

Abstract. Integrating association results from both genome-wide association studies (GWASs) and expression quantitative trait locus (eQTL) mapping studies has the potential to shed light on the molecular mechanisms underlying disease etiology. Several statistical methods have recently been developed to integrate GWASs with eQTL studies in the form of transcriptome-wide association studies (TWASs). These existing methods can all be viewed as a form of two-sample Mendelian randomization (MR) analysis, which has been widely applied in various GWASs for inferring causal relationships among complex traits. Unfortunately, most existing TWAS and MR methods make the unrealistic modeling assumption that instrumental variables do not exhibit horizontal pleiotropic effects. However, horizontal pleiotropic effects have recently been found to be widespread across complex traits and, as we will show here, are also widespread across gene expression traits. Therefore, not allowing for horizontal pleiotropic effects can be overly restrictive and, as we will also show, can lead to substantial inflation of test statistics and, consequently, false discoveries in TWAS applications. Here, we present a probabilistic MR method, which we refer to as PMR-Egger, for testing and controlling for horizontal pleiotropic effects in TWAS applications. PMR-Egger relies on an MR likelihood framework that unifies many existing TWAS and MR methods, accommodates multiple correlated instruments, tests the causal effect of a gene on a trait in the presence of horizontal pleiotropy, and, with a newly developed parameter-expansion version of the expectation-maximization algorithm, is scalable to hundreds of thousands of individuals.
With extensive simulations, we show that PMR-Egger provides calibrated type I error control for causal effect testing in the presence of horizontal pleiotropic effects, is reasonably robust to various types of mis-specification of horizontal pleiotropic effects, is more powerful than existing MR approaches and, as a by-product, can directly test for horizontal pleiotropy. We illustrate the benefits of PMR-Egger in applications to 39 diseases and complex traits obtained from three GWASs, including the UK Biobank. In these applications, we show how PMR-Egger can lead to new biological discoveries through integrative analysis.
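PMR-Egger embeds the Egger-regression idea in a likelihood framework with correlated instruments. For orientation only, the Python below sketches classical summary-level MR-Egger with independent instruments: a weighted regression of outcome effects on exposure effects with a free intercept, after orienting each variant so that its exposure effect is positive. A non-zero intercept indicates directional horizontal pleiotropy, and the slope is the pleiotropy-adjusted causal estimate. This is not the PMR-Egger model or its parameter-expansion EM algorithm; the function name is hypothetical.

```python
import math


def mr_egger(bx, by, sy):
    """Summary-level MR-Egger: weighted regression of by on bx with an intercept.

    Returns (intercept, se_intercept, slope, se_slope).
    """
    # Orient variants so every exposure effect is non-negative (flip both signs)
    x = [abs(b) for b in bx]
    y = [o if b >= 0 else -o for b, o in zip(bx, by)]
    w = [1 / s ** 2 for s in sy]  # inverse-variance weights
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual scale, floored at 1 so SEs are never smaller than under a fixed-effect model
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    phi = max(1.0, rss / (len(x) - 2))
    se_slope = math.sqrt(phi / sxx)
    se_int = math.sqrt(phi * (1 / sw + xbar ** 2 / sxx))
    return intercept, se_int, slope, se_slope
```

Dividing the intercept by its standard error gives the usual z-test for directional pleiotropy. Orientation matters because the intercept is only interpretable once the sign of each exposure effect is fixed.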


2018
Author(s): Doug Speed, David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. We therefore present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121,000). First, we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next, we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
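For reference, LDSC models the expected association statistic of SNP j as E[χ²_j] = N·h²·ℓ_j/M + N·a + 1, where ℓ_j is the SNP's LD score, M the number of SNPs, N the sample size and a a confounding term. The slope therefore encodes the assumption that every SNP contributes h²/M, which is the assumption SumHer relaxes. The Python below is a minimal, unweighted least-squares sketch of that regression (the real LDSC software uses regression weights and block-jackknife standard errors); it does not implement SumHer's alternative heritability models, and the function name is hypothetical.

```python
def ldsc_h2(chi2, ld_scores, n, m):
    """Unweighted LDSC-style regression of chi-square statistics on LD scores.

    Model: E[chi2_j] = n * h2 * ell_j / m + n * a + 1
    Returns (h2, intercept); the intercept estimates n * a + 1.
    """
    k = len(chi2)
    lbar = sum(ld_scores) / k
    cbar = sum(chi2) / k
    sxx = sum((ell - lbar) ** 2 for ell in ld_scores)
    sxy = sum((ell - lbar) * (c - cbar) for ell, c in zip(ld_scores, chi2))
    slope = sxy / sxx               # estimates n * h2 / m
    intercept = cbar - slope * lbar
    return slope * m / n, intercept
```

An intercept above 1 is what LDSC attributes to confounding; if the equal-contribution model is wrong, part of the true polygenic signal can be absorbed into that intercept, which is the over-correction the abstract describes.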


2020
Author(s): Jingshu Wang, Qingyuan Zhao, Jack Bowden, Gilbran Hemani, George Davey Smith, ...

Over a decade of genome-wide association studies has led to the finding that significant genetic associations for complex traits tend to be spread across the genome. The extreme polygenicity, in which "all genes affect every complex trait", complicates Mendelian Randomization studies, in which natural genetic variants are used as instruments to infer the causal effect of heritable risk factors. We re-examine the assumptions of existing Mendelian Randomization methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. We propose a comprehensive framework, GRAPPLE (Genome-wide mR Analysis under Pervasive PLEiotropy), to analyze the causal effect of target risk factors with heterogeneous genetic instruments and to identify possible pleiotropic patterns from data. Using summary statistics from genome-wide association studies, GRAPPLE can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, adjust for confounding risk factors, and determine the causal direction. With GRAPPLE, we analyze the effect of blood lipids, body mass index, and systolic blood pressure on 25 disease outcomes, gaining new information on their causal relationships and the potential pleiotropic pathways.
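Using weak as well as strong instruments requires accounting for noise in the SNP-exposure effects. One established way to do this (the profile likelihood used by MR-RAPS) estimates the causal effect β by minimizing Σ_j (gy_j − β·gx_j)² / (sy_j² + β²·sx_j²), where the β²·sx_j² term propagates exposure-side uncertainty. The Python below is a stripped-down sketch of that objective only, minimized by golden-section search; GRAPPLE's own estimator adds machinery not shown here (for example, multiple risk factors and the detection of pleiotropic pathways described above). The function names, the search bracket and the unimodality assumption are hypothetical choices for illustration.

```python
import math


def profile_loss(beta, gx, gy, sx, sy):
    """Profile loss for causal effect beta, given SNP-exposure (gx) and SNP-outcome (gy) effects."""
    # The beta^2 * sx^2 term accounts for noise in the exposure effects (weak instruments)
    return sum((y - beta * x) ** 2 / (vy ** 2 + beta ** 2 * vx ** 2)
               for x, y, vx, vy in zip(gx, gy, sx, sy))


def profile_mr(gx, gy, sx, sy, lo=-2.0, hi=2.0, tol=1e-10):
    """Minimize profile_loss over [lo, hi] by golden-section search.

    Assumes the loss is unimodal on [lo, hi]; choose the bracket accordingly.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if profile_loss(c, gx, gy, sx, sy) < profile_loss(d, gx, gy, sx, sy):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2
```

Unlike the first-order IVW weights, the denominator here grows with β, so a weak instrument with a noisy gx_j is automatically down-weighted rather than pulling the estimate towards zero.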


Author(s): Yiliang Zhang, Youshu Cheng, Wei Jiang, Yixuan Ye, Qiongshi Lu, ...

Abstract. Genetic correlation is the correlation of additive genetic effects on two phenotypes. It is an informative metric for quantifying the overall genetic similarity between complex traits and provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlations from data collected in genome-wide association studies (GWAS). Because GWAS summary statistics are easy to access and computationally efficient to work with, methods requiring only GWAS summary statistics as input have become more popular than methods using individual-level genotype data. Here, we present a benchmark study of different summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. We then applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications than other methods, owing to the imprecision of LD obtained from reference panels. Our findings offer guidance on how to choose an appropriate method for genetic correlation estimation in post-GWAS analysis and interpretation.
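Both challenges named above appear explicitly in cross-trait LD score regression, one widely used summary-statistics estimator. Under that model's assumptions, the expected product of the two traits' z-scores at SNP j is linear in its LD score ℓ_j: the slope carries the genetic covariance ρ_g, so errors in reference-panel LD scores feed directly into the estimate, whereas sample overlap (N_s shared individuals with phenotypic correlation ρ) shifts only the intercept. Here N_1 and N_2 are the two GWAS sample sizes, M the number of SNPs, and h_1², h_2² the SNP heritabilities used to scale ρ_g into a correlation.

```latex
\mathbb{E}\left[z_{1j}\, z_{2j}\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^2\, h_2^2}}
```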


2015
Author(s): Guo-Bo Chen, Sang Hong Lee, Matthew R Robinson, Maciej Trzaskowski, Zhi-Xiang Zhu, ...

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically, the effect sizes of SNP alleles are very small, and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. The trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality-control problems, including unknown sample overlap, and these can lead to both false-positive and false-negative findings. In this study, we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose a pair of methods for examining the concordance between demographic information and summary statistics. In method I, we use the population genetics FST statistic to verify the genetic origin of each cohort and its geographic location, and we demonstrate, using GWAMA data from the GIANT Consortium, that the geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, which is able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts, we propose a method based on randomly generated genetic predictors that does not require sharing individual-level genotype data and does not breach individual privacy.
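Two generic ingredients behind such cohort-level checks can be sketched as follows: a pairwise FST computed from reported allele frequencies, and the correlation of per-SNP z-scores (effect/SE) between two cohorts. For SNPs with no true association, the z-score correlation is expected to be close to N_s·ρ/sqrt(N_A·N_B) (shared samples N_s, phenotypic correlation ρ), so a clearly non-zero value over many null SNPs hints at overlap. This Python is an illustration under stated assumptions (equal cohort weighting, no sample-size correction for FST, SNPs pre-selected as null and roughly independent) and is not the statistic proposed in the paper; the function names are hypothetical.

```python
import math


def pairwise_fst(freq_a, freq_b):
    """Ratio-of-averages Wright's FST between two cohorts from reported allele frequencies.

    No sample-size correction is applied (an illustrative simplification).
    """
    num = den = 0.0
    for pa, pb in zip(freq_a, freq_b):
        pbar = (pa + pb) / 2
        num += ((pa - pbar) ** 2 + (pb - pbar) ** 2) / 2  # between-cohort variance of frequencies
        den += pbar * (1 - pbar)                          # p(1 - p) of the pooled frequency
    return num / den


def z_correlation(beta_a, se_a, beta_b, se_b):
    """Pearson correlation of per-SNP z-scores (beta/se) between two cohorts."""
    za = [b / s for b, s in zip(beta_a, se_a)]
    zb = [b / s for b, s in zip(beta_b, se_b)]
    ma, mb = sum(za) / len(za), sum(zb) / len(zb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(za, zb))
    var_a = sum((x - ma) ** 2 for x in za)
    var_b = sum((y - mb) ** 2 for y in zb)
    return cov / math.sqrt(var_a * var_b)
```

Summing numerators and denominators across SNPs before dividing (ratio of averages) keeps low-frequency SNPs, whose per-SNP FST is noisy, from dominating the estimate.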


2021
Author(s): Gui-Juan Feng, Qian Xu, Jing-Jing Ni, Shan-Shan Yang, Bai-Xue Han, ...

Abstract. Age at menarche (AAM) is a marker of puberty in females. It is a heritable trait associated with various adult diseases. However, the genetic mechanism that determines AAM and links it to disease risk is poorly understood. Aiming to uncover the genetic basis of AAM, we conducted a joint association study in up to 438,089 participants from 3 genome-wide association studies of European and East Asian ancestries. Twenty-one novel genomic loci were identified at the genome-wide significance level. In addition, we observed significant genetic correlations between AAM and 67 complex traits, the strongest being with body mass index (rg = −0.19, P = 6.11×10^-31). Latent causal variable analyses indicate genetically causal effects of AAM on high blood pressure (GCP = 0.47, P = 0.02), forced vital capacity (GCP = 0.63, P = 0.02), age at first live birth (GCP = 0.51, P = 0.03), impedance of right arm (GCP = 0.41, P < 1×10^-7) and right leg fat percentage (GCP = −0.10, P = 0.02), among other traits. Enrichment analysis identified 5 enriched tissues and 51 enriched gene sets. Four of the five enriched tissues were related to the nervous system: hypothalamus middle, hypothalamo-hypophyseal system, neurosecretory systems and hypothalamus. The fifth was the retina, a sensory-organ tissue. The most significant gene set was ‘decreased circulating luteinizing hormone level’ (P = 2.45×10^-6). Our findings may provide useful insights into the mechanisms determining AAM and the genetic interplay between AAM and some traits in women.


2016
Author(s): Daniela Zanetti, Michael E. Weale

Abstract. Through genome-wide association studies (GWASs), researchers have identified hundreds of genetic variants associated with particular complex traits. Previous studies have compared the pattern of association signals across different populations in real data and have detected differences in the strength, and sometimes even the direction, of GWAS signals. These differences could be due to a combination of (1) lack of power (insufficient sample sizes); (2) minor allele frequency (MAF) differences (again affecting power); (3) linkage disequilibrium (LD) differences (affecting power to ‘tag’ the causal variant); and (4) true differences in causal variant effect sizes (defined by relative risks). In the present work, we sought to assess whether the first three of these reasons are sufficient on their own to explain the observed incidence of trans-ethnic differences in replications of GWAS signals, or whether the fourth reason is also required. We simulated case-control data of European, Asian and African ancestry, drawing on observed MAF and LD patterns in the 1000 Genomes reference dataset and assuming that the true causal relative risks were the same in all three populations. We found that a combination of Euro-centric SNP selection and between-population differences in LD, accentuated by the lower SNP density typical of older GWAS panels, was sufficient to explain the rate of trans-ethnic differences previously reported, without the need to assume between-population differences in true causal SNP effect sizes. This suggests a cross-population consistency that has implications for our understanding of the interplay between genetics and environment in the aetiology of complex human diseases.
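Reasons (1) to (3) can be made concrete with a standard power approximation for a 1-df allelic case-control test. The Python below assumes a multiplicative per-allele odds ratio at the causal variant, takes the control allele frequency to be the MAF, and scales the non-centrality parameter by the tag-to-causal r², a common approximation for indirect association. Holding the true odds ratio fixed, lowering r², MAF or sample size reduces power, which is the mechanism the abstract invokes; the function name and default significance threshold are hypothetical choices, and this is not the simulation pipeline used in the study.

```python
from statistics import NormalDist


def allelic_test_power(maf, odds_ratio, n_cases, n_controls, r2=1.0, alpha=5e-8):
    """Approximate power of a 1-df allelic test at a tag SNP in LD (r2) with the causal SNP."""
    p0 = maf  # risk-allele frequency in controls, taken to be the population MAF
    # Risk-allele frequency in cases under a multiplicative per-allele odds ratio
    p1 = p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))
    pbar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    # Variance of the allele-frequency difference (2N alleles per group)
    var = pbar * (1 - pbar) * (1 / (2 * n_cases) + 1 / (2 * n_controls))
    # Non-centrality at the causal SNP, attenuated by r2 at the tag SNP
    ncp = r2 * (p1 - p0) ** 2 / var
    z = NormalDist().inv_cdf(1 - alpha / 2)
    root = ncp ** 0.5
    # Two-sided rejection probability for a normal test statistic with mean sqrt(ncp)
    return NormalDist().cdf(root - z) + NormalDist().cdf(-root - z)
```

Because r² multiplies the non-centrality parameter directly, a causal variant that is well tagged in one population but poorly tagged in another can replicate in the first and not the second even with identical relative risks.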


2019
Author(s): Jia Zhao, Jingsi Ming, Xianghong Hu, Gang Chen, Jin Liu, ...

Abstract. Motivation: The results from genome-wide association studies (GWAS) on thousands of phenotypes provide an unprecedented opportunity to infer the causal effect of one phenotype (exposure) on another (outcome). Mendelian randomization (MR), an instrumental variable (IV) method, has been introduced for causal inference using GWAS data. However, owing to the polygenic architecture of complex traits/diseases and the ubiquity of pleiotropy, MR faces many unique challenges compared with conventional IV methods. Results: We propose Bayesian weighted Mendelian randomization (BWMR) for causal inference to address these challenges. In the BWMR model, the uncertainty of weak effects owing to polygenicity is taken into account, and violation of the IV assumptions due to pleiotropy is addressed through outlier detection by Bayesian weighting. To make causal inference based on BWMR computationally stable and efficient, we developed a variational expectation-maximization (VEM) algorithm. Moreover, we derived an exact closed-form formula to correct the posterior covariance, which is often underestimated in variational inference. Through comprehensive simulation studies, we evaluated the performance of BWMR and demonstrated its advantage over competing methods. We then applied BWMR to make causal inferences between 130 metabolites and 93 complex human traits, uncovering novel causal relationships between exposure and outcome traits. Availability and implementation: The BWMR software is available at https://github.com/jiazhao97/BWMR. Supplementary information: Supplementary data are available at Bioinformatics online.


2020, Vol 32 (1), pp. 47-56
Author(s): Thomas W. Mühleisen, Andreas J. Forstner, Per Hoffmann, Sven Cichon

Abstract. Brain imaging genomics is an emerging discipline in which genomic and brain imaging data are integrated in order to elucidate the molecular mechanisms that underlie brain phenotypes and diseases, including neuropsychiatric disorders. As with all genetic analyses of complex traits and diseases, brain imaging genomics has evolved from small, individual candidate gene investigations towards large, collaborative genome-wide association studies. Recent investigations, mostly population-based, have studied well-powered cohorts comprising tens of thousands of individuals and identified multiple robust associations of single-nucleotide polymorphisms and copy number variants with structural and functional brain phenotypes. Such systematic genomic screens of millions of genetic variants have generated initial insights into the genetic architecture of brain phenotypes and demonstrated that their etiology is polygenic, involving multiple common variants with small effect sizes and rare variants with larger effect sizes. Ongoing international collaborative initiatives are now working to obtain a more complete picture of the underlying biology. As in other complex phenotypes, novel approaches – such as gene–gene interaction, gene–environment interaction, and epigenetic analyses – are being implemented in order to investigate their contribution to the observed phenotypic variability. An important consideration for future research will be the translation of brain imaging genomics findings into clinical practice.

