Analyzing and Reconciling Colocalization and Transcriptome-wide Association Studies from the Perspective of Inferential Reproducibility

Transcriptome-wide association studies and colocalization analysis are popular computational approaches for integrating genetic association data from molecular and complex traits. They show the unique ability to go beyond variant-level genetic association evidence and implicate critical functional units, e.g., genes, in disease etiology. However, in practice, when the two approaches are applied to the same molecular and complex trait data, the inference results can be markedly different. This paper systematically investigates the inferential reproducibility between the two approaches through theoretical derivation, numerical experiments, and analyses of 4 complex trait GWAS and GTEx eQTL data. We identify two classes of inconsistent inference results. We find that the first class of inconsistent results may suggest an interesting biological phenomenon, i.e., horizontal pleiotropy; thus, the two approaches are truly complementary. The inconsistency in the second class can be understood and effectively reconciled. To this end, we propose a novel approach for locus-level colocalization analysis. We demonstrate that the joint TWAS and locus-level colocalization analysis improves specificity and sensitivity for implicating biologically relevant genes.

Investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis

10.1101/808295 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yuhua Zhang ◽

Corbin Quick ◽

Ketian Yu ◽

Alvaro Barbeira ◽

Francesca Luca ◽

...

Keyword(s):

Gene Expression ◽

Complex Traits ◽

Large Scale ◽

Molecular Mechanisms ◽

Association Studies ◽

Complex Trait ◽

Causal Effects ◽

Biological Mechanisms ◽

Integrative Framework ◽

Eqtl Data

Abstract: Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causality implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analysis that leverages the full potential of the latest GTEx multi-tissue eQTL data.
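The instrumental-variables idea mentioned in this abstract can be illustrated with the textbook single-instrument Wald ratio. This is a generic sketch and not the PTWAS procedure itself: the function name `wald_ratio`, its arguments, and the first-order standard error (which ignores uncertainty in the eQTL effect) are assumptions made for illustration.

```python
import math


def wald_ratio(beta_gwas, se_gwas, beta_eqtl):
    """Single-instrument Wald ratio (hypothetical helper, not from PTWAS).

    beta_gwas: estimated effect of the variant on the complex trait
    se_gwas:   standard error of beta_gwas
    beta_eqtl: estimated effect of the same variant on gene expression
    """
    if beta_eqtl == 0:
        raise ValueError("the instrument must have a non-zero eQTL effect")
    # Effect of expression on the trait implied by the two associations.
    estimate = beta_gwas / beta_eqtl
    # First-order approximation: treats beta_eqtl as known without error.
    std_error = se_gwas / abs(beta_eqtl)
    z_score = estimate / std_error
    return estimate, std_error, z_score


if __name__ == "__main__":
    est, se, z = wald_ratio(beta_gwas=0.2, se_gwas=0.02, beta_eqtl=0.5)
    print(f"estimate={est:.3f} se={se:.3f} z={z:.2f}")
```

The ratio is only interpretable as a causal effect if the variant affects the trait solely through the gene's expression, which is the kind of assumption the abstract says PTWAS is designed to evaluate.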

A Novel Approach for the Simultaneous Analysis of Common and Rare Variants in Complex Traits

Bioinformatics and Biology Insights ◽

10.4137/bbi.s8852 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S8852 ◽

Cited By ~ 4

Author(s):

Ao Yuan ◽

Guanjie Chen ◽

Yanxun Zhou ◽

Amy Bentley ◽

Charles Rotimi

Keyword(s):

Complex Traits ◽

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Simultaneous Analysis ◽

Genome Wide Association Studies ◽

Common Variants ◽

Disease Etiology ◽

Novel Approach ◽

Common Genetic Variants

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite the GWAS success stories, the percent trait variance explained by GWAS signals, the so-called “missing heritability”, has been, at best, modest. Also, the predictive power of common variants identified by GWAS has not been encouraging. Given these observations, along with the fact that the effects of rare variants are often, by design, unaccounted for by GWAS and the availability of sequence data, there is a growing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association between rare and common variants in disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and is efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.

Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis

10.1101/447367 ◽

2018 ◽

Cited By ~ 99

Author(s):

Urmo Võsa ◽

Annique Claringbould ◽

Harm-Jan Westra ◽

Marc Jan Bonder ◽

Patrick Deelen ◽

...

Keyword(s):

Complex Traits ◽

Genome Wide Association Study ◽

Association Studies ◽

Complex Trait ◽

Genome Wide Association ◽

Eqtl Analysis ◽

Genome Wide Association Studies ◽

Disease Etiology ◽

Genome Wide ◽

Polygenic Scores

Summary: While many disease-associated variants have been identified through genome-wide association studies, their downstream molecular consequences remain unclear. To identify these effects, we performed cis- and trans-expression quantitative trait locus (eQTL) analysis in blood from 31,684 individuals through the eQTLGen Consortium. We observed that cis-eQTLs can be detected for 88% of the studied genes, but that they have a different genetic architecture compared to disease-associated variants, limiting our ability to use cis-eQTLs to pinpoint causal genes within susceptibility loci. In contrast, trans-eQTLs (detected for 37% of 10,317 studied trait-associated variants) were more informative. Multiple unlinked variants, associated to the same complex trait, often converged on trans-genes that are known to play central roles in disease etiology. We observed the same when ascertaining the effect of polygenic scores calculated for 1,263 genome-wide association study (GWAS) traits. Expression levels of 13% of the studied genes correlated with polygenic scores, and many resulting genes are known to drive these traits.

Genetic control of RNA splicing and its distinctive role in complex trait variation

10.21203/rs.3.rs-155233/v1 ◽

2021 ◽

Author(s):

Jian Yang ◽

Ting Qi ◽

Yang Wu ◽

Futao Zhang ◽

Jian Zeng

Keyword(s):

Complex Traits ◽

Rna Splicing ◽

Association Studies ◽

Genetic Regulation ◽

Complex Trait ◽

Regulation Of Transcription ◽

Genome Wide Association Studies ◽

Trait Variation ◽

Eqtl Data ◽

Distinctive Role

Abstract: Most genetic variants identified from genome-wide association studies (GWAS) in humans are noncoding, indicating their role in gene regulation. Prior studies have shown considerable links of GWAS signals to expression quantitative trait loci (eQTLs), but the links to other genetic regulatory mechanisms such as splicing QTLs (sQTLs) are underexplored. Here, we introduce a transcript-based sQTL method (named THISTLE) with improved power for sQTL detection. Applying THISTLE along with LeafCutter, an event-based sQTL method, to brain transcriptomic data (n=1,073), we identified 7,491 genes with sQTLs with P<5×10^(-8) (the largest brain cis-sQTL collection to date), ~68% of which were distinct from eQTLs. Integrating the sQTL data into GWAS for ten brain-related complex traits (including diseases), we identified 107 genes associated with the traits through the sQTLs, ~68% of which could not be discovered using eQTL data. Our study demonstrates the distinctive role of most sQTLs in genetic regulation of transcription and complex trait variation.

Quantitative Human Paleogenetics: What can Ancient DNA Tell us About Complex Trait Evolution?

Frontiers in Genetics ◽

10.3389/fgene.2021.703541 ◽

2021 ◽

Vol 12 ◽

Author(s):

Evan K. Irving-Pease ◽

Rasa Muktupavela ◽

Michael Dannemann ◽

Fernando Racimo

Keyword(s):

Ancient Dna ◽

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Complex Trait ◽

New Wave ◽

Trait Evolution ◽

The Past ◽

Different Populations ◽

Association Data

Genetic association data from national biobanks and large-scale association studies have provided new prospects for understanding the genetic evolution of complex traits and diseases in humans. In turn, genomes from ancient human archaeological remains are now easier than ever to obtain, and provide a direct window into changes in frequencies of trait-associated alleles in the past. This has generated a new wave of studies aiming to analyse the genetic component of traits in historic and prehistoric times using ancient DNA, and to determine whether any such traits were subject to natural selection. In humans, however, issues about the portability and robustness of complex trait inference across different populations are particularly concerning when predictions are extended to individuals that died thousands of years ago, and for which little, if any, phenotypic validation is possible. In this review, we discuss the advantages of incorporating ancient genomes into studies of trait-associated variants, the need for models that can better accommodate ancient genomes into quantitative genetic frameworks, and the existing limits to inferences about complex trait evolution, particularly with respect to past populations.

GPA-MDS: A Visualization Approach to Investigate Genetic Architecture among Phenotypes Using GWAS Results

International Journal of Genomics ◽

10.1155/2016/6589843 ◽

2016 ◽

Vol 2016 ◽

pp. 1-6 ◽

Cited By ~ 2

Author(s):

Wei Wei ◽

Paula S. Ramos ◽

Kelly J. Hunt ◽

Bethany J. Wolf ◽

Gary Hardiman ◽

...

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Association Studies ◽

Genetic Relationships ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Risk Variants ◽

Novel Approach ◽

Medical Benefits ◽

Rigorous Framework

Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. Recently, there has been accumulating evidence suggesting that different complex traits share a common risk basis, namely, pleiotropy. Previously, a statistical method, namely, GPA (Genetic analysis incorporating Pleiotropy and Annotation), was developed to improve identification of risk variants and to investigate pleiotropic structure through a joint analysis of multiple GWAS datasets. While GPA provides a statistically rigorous framework to evaluate pleiotropy between phenotypes, it is still not trivial to investigate genetic relationships among a large number of phenotypes using the GPA framework. In order to address this challenge, in this paper, we propose a novel approach, GPA-MDS, to visualize genetic relationships among phenotypes using the GPA algorithm and multidimensional scaling (MDS). This tool will help researchers to investigate common etiology among diseases, which can potentially lead to development of common treatments across diseases. We evaluate the proposed GPA-MDS framework using a simulation study and apply it to jointly analyze GWAS datasets examining 18 unique phenotypes, which helps reveal the shared genetic architecture of these phenotypes.
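The multidimensional scaling step named in this abstract can be sketched with classical (Torgerson) MDS. This is a minimal illustration that assumes NumPy is available; the function name `classical_mds` and the toy distance matrix are hypothetical, and how GPA-MDS derives phenotype distances from GPA output is not specified in this listing.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Embed items in n_components dimensions from a pairwise distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


if __name__ == "__main__":
    # Toy distances between three hypothetical phenotypes (a 3-4-5 triangle).
    toy = np.array([[0.0, 3.0, 4.0],
                    [3.0, 0.0, 5.0],
                    [4.0, 5.0, 0.0]])
    print(classical_mds(toy))
```

When the input distances are Euclidean, the embedding reproduces them up to rotation and reflection, so nearby points in the plot correspond to phenotypes with small input distance.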

Informing disease modelling with brain-relevant functional genomic annotations

Brain ◽

10.1093/brain/awz295 ◽

2019 ◽

Vol 142 (12) ◽

pp. 3694-3712 ◽

Cited By ~ 4

Author(s):

Regina H Reynolds ◽

John Hardy ◽

Mina Ryten ◽

Sarah A Gagliano Taliun

Keyword(s):

Genetic Association ◽

Association Studies ◽

Therapeutic Targets ◽

Genome Wide Association ◽

Disease Modelling ◽

Genome Wide Association Studies ◽

Functional Genomic ◽

Genome Wide ◽

Neuropsychiatric Diseases ◽

Association Data

How can we best translate the success of genome-wide association studies for neurological and neuropsychiatric diseases into therapeutic targets? Reynolds et al. critically assess existing brain-relevant functional genomic annotations and the tools available for integrating such annotations with summary-level genetic association data.

Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples

Genetics ◽

10.1534/genetics.118.301865 ◽

2019 ◽

Vol 212 (3) ◽

pp. 919-929

Author(s):

Daniel A. Skelly ◽

Narayanan Raghupathy ◽

Raymond F. Robledo ◽

Joel H. Graber ◽

Elissa J. Chesler

Keyword(s):

Gene Expression ◽

Canonical Correlation ◽

Complex Traits ◽

Behavioral Genetics ◽

Association Studies ◽

Complex Trait ◽

Integrated Analysis ◽

Data Set ◽

Trait Analysis ◽

Molecular Features

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
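The canonical correlation analysis step that this abstract relies on can be sketched as follows. This is a generic illustration assuming NumPy, not the authors' pipeline; the function name `canonical_correlations` and the simulated trait matrices are hypothetical, and the sketch assumes more samples than traits and full column rank in both matrices.

```python
import numpy as np


def canonical_correlations(x, y):
    """Return canonical correlations between two trait matrices.

    x, y: arrays of shape (n_samples, n_traits), measured on the same samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center each trait so correlations are not driven by the means.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two trait sets.
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    # Singular values of the cross-product are the canonical correlations.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(200, 1))               # hypothetical latent factor
    reference = shared + rng.normal(size=(200, 3))   # simulated reference traits
    interest = shared + rng.normal(size=(200, 2))    # simulated traits of interest
    print(canonical_correlations(reference, interest))
```

The largest value returned plays the role of the "highest canonical correlation" quoted in the abstract; in the simulated example it reflects the shared latent factor.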

Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improve the power of transcriptome-wide association studies

10.1101/2020.07.03.186247 ◽

2020 ◽

Author(s):

Helian Feng ◽

Nicholas Mancuso ◽

Alexander Gusev ◽

Arunabha Majumdar ◽

Megan Major ◽

...

Keyword(s):

Gene Expression ◽

Correlation Analysis ◽

Canonical Correlation Analysis ◽

Canonical Correlation ◽

Complex Traits ◽

Association Studies ◽

Tissue Expression ◽

Expression Levels ◽

Sparse Canonical Correlation Analysis ◽

Eqtl Data

Abstract: Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome, and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue, versus single-tissue combined with Generalized Berk-Jones (GBJ) method, single-tissue combined with S-MultiXcan or summarizing cross-tissue expression patterns using Principal Component Analysis (PCA) approaches was 5%, 8%, and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 gene-trait associations that single-trait TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.

Author summary: Transcriptome-wide association studies (TWAS) can improve the statistical power of genetic association studies by leveraging the relationship between genetically predicted transcript expression levels and an outcome. We propose a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. We generate cross-tissue expression features using sparse canonical correlation analysis and then combine evidence for expression-outcome association across cross- and single-tissue features using the aggregate Cauchy association test. We show that this approach has substantially higher power than traditional single-tissue TWAS methods. Application of these methods to publicly available summary statistics for ten complex traits also identifies associations missed by single-tissue methods.
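The aggregate Cauchy association test named above combines p-values through a weighted sum of Cauchy-transformed values. The sketch below shows that combination rule in isolation; the function name `acat`, the default equal weights, and the example p-values are assumptions for illustration, and it does not reproduce the sCCA feature construction or the authors' pipeline.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    p_values must lie strictly between 0 and 1; weights default to equal.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p >= 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)  # assumption: equal weights
    total = sum(weights)
    # Each p-value maps to a standard Cauchy variate; small p gives a large value.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    # Convert the weighted average back to a p-value via the Cauchy tail.
    return 0.5 - math.atan(stat / total) / math.pi


if __name__ == "__main__":
    # Hypothetical p-values from one cross-tissue and two single-tissue tests.
    print(acat([0.001, 0.5, 0.8]))
```

Because the Cauchy tail is heavy, a single very small p-value dominates the combined statistic, which is why this rule is often used to merge correlated tests without estimating their correlation.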

Inferring relevant tissues and cell types for complex traits in genome-wide association studies

10.1101/2021.06.09.447805 ◽

2021 ◽

Author(s):

Rujin Wang ◽

Danyu Lin ◽

Yuchao Jiang

Keyword(s):

Single Cell ◽

Complex Traits ◽

Association Studies ◽

Cell Types ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Cell Type ◽

Disease Etiology ◽

Genome Wide ◽

Cell Type Specific

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
