Novel Aggregative trans-eQTL Association Analysis of Known Genetic Variants Detect Trait-specific Target Gene-sets

Author(s):  
Diptavo Dutta ◽  
Yuan He ◽  
Ashis Saha ◽  
Marios Arvanitis ◽  
Alexis Battle ◽  
...  

Abstract Large-scale genetic association studies have identified many trait-associated variants, and understanding the role of these variants in the downstream regulation of gene expression can uncover important mediating biological mechanisms. In this study, we propose Aggregative tRans assoCiation to detect pHenotype specIfic gEne-sets (ARCHIE), a method to establish links between sets of known genetic variants associated with a trait and sets of co-regulated gene expressions through trans associations. ARCHIE employs sparse canonical correlation analysis based on summary statistics from trans-eQTL mapping and on genotype and expression correlation matrices constructed from external data sources. We propose a resampling-based procedure to test for significant trait-specific trans-association patterns against the background of highly polygenic regulation of gene expression. By applying ARCHIE to available trans-eQTL summary statistics reported by the eQTLGen consortium, we identify 71 gene networks that have significant evidence of trans-association with groups of known genetic variants across 29 complex traits. A majority (50.7%) of the genes do not have any strong trans-associations and could not have been detected by standard trans-eQTL mapping. We provide further evidence for a causal basis of the target genes through a series of follow-up analyses. These results show that ARCHIE is a powerful tool for identifying sets of genes whose trans regulation may be related to specific complex traits.

2021 ◽  
Author(s):  
Diptavo Dutta ◽  
Yuan He ◽  
Ashis Saha ◽  
Marios Arvanitis ◽  
Alexis Battle ◽  
...  

Abstract Large-scale genetic association studies have identified many trait-associated variants, and understanding the role of these variants in the downstream regulation of gene expression can uncover important mediating biological mechanisms. In this study, we propose Aggregative tRans assoCiation to detect pHenotype specIfic gEne-sets (ARCHIE), a method to establish links between sets of known genetic variants associated with a trait and sets of co-regulated gene expressions through trans associations. ARCHIE employs sparse canonical correlation analysis based on summary statistics from trans-eQTL mapping and on genotype and expression correlation matrices constructed from external data sources. A resampling-based procedure is then used to test for significant trait-specific trans-association patterns against the background of highly polygenic regulation of gene expression. Simulation studies show that, compared to standard trans-eQTL analysis, ARCHIE is better suited to identify “core”-like genes through which the effects of many other genes may be mediated and which can explain disease-specific patterns of genetic associations. By applying ARCHIE to available trans-eQTL summary statistics reported by the eQTLGen consortium, we identify 71 gene networks that have significant evidence of trans-association with groups of known genetic variants across 29 complex traits. Around half (50.7%) of the selected genes do not have any strong trans-associations and could not have been detected by standard trans-eQTL mapping. We provide further evidence for a causal basis of the target genes through a series of follow-up analyses. These results show that ARCHIE is a powerful tool for identifying sets of genes whose trans regulation may be related to specific complex traits. The method has potential for broader application in identifying networks of various types of molecular traits that mediate genetic associations with complex traits.
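To make the sparse canonical correlation step more concrete, the following is a minimal sketch of a rank-one sparse decomposition of a variant-by-gene cross-correlation matrix, using alternating soft-thresholded power iterations. It is not the ARCHIE implementation: it assumes, for simplicity, that the genotype and expression correlation matrices are identity matrices, and the function names, the penalty values and the toy matrix are all hypothetical.

```python
import numpy as np


def soft_threshold(x, lam):
    """Shrink each entry toward zero by lam; entries below lam become zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def unit_norm(x):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned as-is)."""
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x


def sparse_cca_rank1(cross_corr, lam_u, lam_v, n_iter=100):
    """Sparse loading vectors (u over variants, v over genes) for one component.

    Assumption for this sketch: variants are mutually uncorrelated and so are
    genes, so only the cross-correlation matrix enters the updates.
    """
    # Start from the leading singular vectors of the cross-correlation matrix.
    u_full, _, vt_full = np.linalg.svd(cross_corr, full_matrices=False)
    u, v = u_full[:, 0], vt_full[0]
    for _ in range(n_iter):
        u = unit_norm(soft_threshold(cross_corr @ v, lam_u))
        v = unit_norm(soft_threshold(cross_corr.T @ u, lam_v))
    return u, v


# Toy input: 6 variants x 5 genes, weak background plus one strong block
# linking variants 0-2 to genes 0-1 (values are illustrative, not real data).
toy = np.full((6, 5), 0.01)
toy[:3, :2] = 0.5

u, v = sparse_cca_rank1(toy, lam_u=0.1, lam_v=0.1)
selected_variants = np.flatnonzero(u)
selected_genes = np.flatnonzero(v)
print("selected variants:", selected_variants)
print("selected genes:", selected_genes)
```

In this toy setting the non-zero entries of u and v pick out the block of variants and genes that share a trans-association signal, which is the kind of variant-set to gene-set link the abstract describes.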


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yuhua Zhang ◽  
Corbin Quick ◽  
Ketian Yu ◽  
Alvaro Barbeira ◽  
...  

Abstract We propose a new computational framework, probabilistic transcriptome-wide association study (PTWAS), to investigate causal relationships between gene expressions and complex traits. PTWAS applies the established principles from instrumental variables analysis and takes advantage of probabilistic eQTL annotations to delineate and tackle the unique challenges arising in TWAS. PTWAS not only confers higher power than the existing methods but also provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific gene-to-trait effects. We illustrate the power of PTWAS by analyzing the eQTL data across 49 tissues from GTEx (v8) and GWAS summary statistics from 114 complex traits.


2019 ◽  
Author(s):  
Yi Yang ◽  
Xingjie Shi ◽  
Yuling Jiao ◽  
Jian Huang ◽  
Min Chen ◽  
...  

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2009-2016 ◽  
Author(s):  
Yi Yang ◽  
Xingjie Shi ◽  
Yuling Jiao ◽  
Jian Huang ◽  
Min Chen ◽  
...  

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.
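The two-model structure described above (genotype to expression, then predicted expression to phenotype) can be illustrated with a widely used summary-statistic shortcut. The sketch below is not the CoMM-S2 likelihood and does not use the CoMM package: it assumes standardized genotypes, treats the first-stage expression weights as fixed, and combines per-SNP GWAS z-scores into one gene-level z-score, z = wᵀz / sqrt(wᵀRw), where R is an LD (genotype correlation) matrix from a reference panel. The function name and all numbers are hypothetical.

```python
import numpy as np


def gene_trait_z(weights, gwas_z, ld):
    """Gene-level z-score from first-stage weights and GWAS summary z-scores.

    weights : effects of each SNP on expression (first model, assumed known)
    gwas_z  : per-SNP z-scores for the phenotype (GWAS summary statistics)
    ld      : SNP-by-SNP correlation matrix from a reference panel
    """
    weights = np.asarray(weights, dtype=float)
    gwas_z = np.asarray(gwas_z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    variance = weights @ ld @ weights  # variance of w'z under the null
    if variance <= 0:
        raise ValueError("weights give zero predicted-expression variance")
    return float(weights @ gwas_z / np.sqrt(variance))


weights = [0.5, 0.5]
gwas_z = [2.0, 2.0]

# Two independent SNPs: their evidence adds up.
z_independent = gene_trait_z(weights, gwas_z, np.eye(2))

# Two SNPs in perfect LD: they carry the same information only once.
z_perfect_ld = gene_trait_z(weights, gwas_z, np.ones((2, 2)))

print(z_independent, z_perfect_ld)
```

The contrast between the two calls shows why an LD reference is needed once individual-level genotypes are unavailable: the same z-scores imply different gene-level evidence depending on how correlated the SNPs are.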


2020 ◽  
Author(s):  
Anyi Yang ◽  
Jingqi Chen ◽  
Xing-Ming Zhao

Abstract Motivation: Annotating genetic variants from summary statistics of genome-wide association studies (GWAS) is crucial for predicting risk genes of various disorders. The multi-marker analysis of genomic annotation (MAGMA) is one of the most popular tools for this purpose, where MAGMA aggregates signals of single nucleotide polymorphisms (SNPs) to their nearby genes. However, SNPs may also affect genes at a distance, and such effects are missed by MAGMA. Although different upgrades of MAGMA have been proposed to extend gene-wise variant annotations with more information (e.g. Hi-C or eQTL), the regulatory relationships among genes and the tissue-specificity of signals have not been taken into account. Results: We propose a new approach, namely network-enhanced MAGMA (nMAGMA), for gene-wise annotation of variants from GWAS summary statistics. Compared with MAGMA and H-MAGMA, nMAGMA significantly extends the lists of genes that can be annotated to SNPs by integrating local signals, long-range regulation signals, and tissue-specific gene networks. When applied to schizophrenia, nMAGMA is able to detect more risk genes (217% more than MAGMA and 57% more than H-MAGMA) that are reasonably involved in schizophrenia compared to MAGMA and H-MAGMA. Some disease-related functions (e.g. the ATPase pathway in Cortex tissues) are also uncovered in nMAGMA but not in MAGMA or H-MAGMA. Moreover, nMAGMA provides tissue-specific risk signals, which are useful for understanding disorders with multi-tissue origins.


2018 ◽  
Author(s):  
Eleonora Porcu ◽  
Sina Rüeger ◽  
Kaido Lepik ◽  
Federico A. Santoni ◽  
Alexandre Reymond ◽  
...  

Abstract Genome-wide association studies (GWAS) identified thousands of variants associated with complex traits, but their biological interpretation often remains unclear. Most of these variants overlap with expression QTLs (eQTLs), indicating their potential involvement in the regulation of gene expression. Here, we propose an advanced transcriptome-wide summary statistics-based Mendelian Randomization approach (called TWMR) that uses multiple SNPs jointly as instruments and multiple gene expression traits as exposures, simultaneously. When applied to 43 human phenotypes it uncovered 2,369 genes whose blood expression is putatively associated with at least one phenotype resulting in 3,913 gene-trait associations; of note, 36% of them had no genome-wide significant SNP nearby in previous GWAS analysis. Using independent association summary statistics (UKBiobank), we confirmed that the majority of these loci were missed by conventional GWAS due to power issues. Noteworthy among these novel links is educational attainment-associated BSCL2, known to carry mutations leading to a Mendelian form of encephalopathy. We similarly unraveled novel pleiotropic causal effects suggestive of mechanistic connections, e.g. the shared genetic effects of GSDMB in rheumatoid arthritis, ulcerative colitis and Crohn’s disease. Our advanced Mendelian Randomization unlocks hidden value from published GWAS through higher power in detecting associations. It better accounts for pleiotropy and unravels new biological mechanisms underlying complex and clinical traits.
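The idea of using several SNPs as instruments and several gene expression traits as exposures at once can be sketched as a generalized least squares problem on summary statistics. The code below is an illustrative sketch, not the TWMR software: it assumes standardized effect sizes, a known LD matrix C among the instruments, and estimates the causal effects as α = (EᵀC⁻¹E)⁻¹EᵀC⁻¹G, where E holds SNP effects on each gene's expression and G holds SNP effects on the trait. The function name and the toy numbers are hypothetical.

```python
import numpy as np


def multivariable_mr(expr_effects, trait_effects, ld):
    """Joint causal-effect estimates for several exposures from summary statistics.

    expr_effects  : (n_snps, n_genes) SNP effects on each gene's expression
    trait_effects : (n_snps,) SNP effects on the outcome trait
    ld            : (n_snps, n_snps) correlation matrix among the SNPs
    """
    E = np.asarray(expr_effects, dtype=float)
    G = np.asarray(trait_effects, dtype=float)
    C = np.asarray(ld, dtype=float)
    # Solve linear systems instead of forming explicit matrix inverses.
    c_inv_e = np.linalg.solve(C, E)
    c_inv_g = np.linalg.solve(C, G)
    return np.linalg.solve(E.T @ c_inv_e, E.T @ c_inv_g)


# Toy example: 4 SNPs acting on 2 genes, with neighbouring SNPs in mild LD.
E = np.array([[0.4, 0.0],
              [0.3, 0.1],
              [0.0, 0.5],
              [0.1, 0.2]])
C = np.eye(4) + 0.2 * (np.eye(4, k=1) + np.eye(4, k=-1))
true_alpha = np.array([0.3, -0.1])
G = E @ true_alpha  # noise-free trait effects implied by the two genes

alpha_hat = multivariable_mr(E, G, C)
print(alpha_hat)
```

Because the toy trait effects are generated without noise, the estimate reproduces the simulated effects; with real summary statistics the estimate would carry sampling error, and a standard error would also be needed.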


Author(s):  
Anyi Yang ◽  
Jingqi Chen ◽  
Xing-Ming Zhao

Abstract Motivation: Annotating genetic variants from summary statistics of genome-wide association studies (GWAS) is crucial for predicting risk genes of various disorders. The multimarker analysis of genomic annotation (MAGMA) is one of the most popular tools for this purpose, where MAGMA aggregates signals of single nucleotide polymorphisms (SNPs) to their nearby genes. In biology, SNPs may also affect genes that are far away in the genome, thus missed by MAGMA. Although different upgrades of MAGMA have been proposed to extend gene-wise variant annotations with more information (e.g. Hi-C or eQTL), the regulatory relationships among genes and the tissue specificity of signals have not been taken into account. Results: We propose a new approach, namely network-enhanced MAGMA (nMAGMA), for gene-wise annotation of variants from GWAS summary statistics. Compared with MAGMA and H-MAGMA, nMAGMA significantly extends the lists of genes that can be annotated to SNPs by integrating local signals, long-range regulation signals (i.e. interactions between distal DNA elements), and tissue-specific gene networks. When applied to schizophrenia (SCZ), nMAGMA is able to detect more risk genes (217% more than MAGMA and 57% more than H-MAGMA) that are involved in SCZ compared with MAGMA and H-MAGMA, and more of nMAGMA results can be validated with known SCZ risk genes. Some disease-related functions (e.g. the ATPase pathway in Cortex) are also uncovered in nMAGMA but not in MAGMA or H-MAGMA. Moreover, nMAGMA provides tissue-specific risk signals, which are useful for understanding disorders with multitissue origins.


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.


2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
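For readers unfamiliar with the equal-contribution model criticized here, the basic LDSC relationship is that the expected association statistic of SNP j is E[χ²ⱼ] = c + (N h² / M) ℓⱼ, where ℓⱼ is the SNP's LD score, N the sample size, M the number of SNPs, h² the SNP heritability, and the intercept c exceeds 1 when there is confounding. The sketch below fits this line by ordinary least squares; it is neither the LDSC nor the SumHer software, it omits the regression weights and block jackknife used in practice, and the function name and toy numbers are hypothetical.

```python
import numpy as np


def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Estimate SNP heritability and intercept under the equal-contribution model.

    Fits chi2_j = intercept + slope * ld_score_j by unweighted least squares
    and converts the slope to heritability via h2 = slope * M / N.
    """
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    coef = np.linalg.lstsq(design, chi2, rcond=None)[0]
    intercept, slope = coef
    h2 = slope * n_snps / n_samples
    return float(h2), float(intercept)


# Noise-free toy data generated from the model itself (not real GWAS results).
n_samples, n_snps = 50_000, 1_000_000
ld_scores = np.array([10.0, 50.0, 100.0, 200.0])
chi2 = 1.05 + (n_samples * 0.4 / n_snps) * ld_scores

h2_hat, intercept_hat = ldsc_fit(chi2, ld_scores, n_samples, n_snps)
print(h2_hat, intercept_hat)
```

The single slope shared by all SNPs is where the equal-heritability assumption enters; a model in which expected heritability varies across SNPs would replace the plain LD score with a reweighted one.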

