Reference trait analysis reveals correlations between gene expression and quantitative traits in disjoint samples

ABSTRACT

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTLs. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript-trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint sub-samples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest in order to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the dataset and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait datasets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of the reference trait method for identifying relations between complex traits and their molecular substrates.

AUTHOR SUMMARY

Systems genetics exploits natural genetic variation and high-throughput measurements of molecular intermediates to dissect genetic contributions to complex traits. An important goal of this strategy is to correlate molecular features, such as transcript or protein abundance, with complex traits. For practical, technical, or financial reasons, it may be impossible to measure complex traits and molecular intermediates on the same individuals. Instead, in some cases these two sets of traits may be measured on independent cohorts. We outline a method, reference trait analysis, for identifying molecular correlates of complex traits in this scenario. We show that our method powerfully identifies complex trait correlates across a wide range of parameters that are biologically plausible and experimentally practical. Furthermore, we show that reference trait analysis can identify transcripts correlated to a complex trait more accurately than approaches such as TWAS that use genetic variation to predict gene expression. Reference trait analysis will contribute to furthering our understanding of variation in complex traits by identifying molecular correlates of complex traits that are measured in different individuals.
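The workflow described in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the method: canonical correlation analysis is fitted in the cohort that has both reference traits and traits of interest, and the leading canonical weights are then applied to the reference traits of the expression cohort. The function names, the simulated data, and the QR/SVD formulation of CCA are assumptions made for illustration and are not taken from the authors' software.

```python
import numpy as np


def first_canonical_pair(X, Y):
    """Leading canonical weights for X and Y plus their canonical correlation."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # QR whitening: the singular values of Qx^T Qy are the canonical correlations.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    a = np.linalg.solve(Rx, U[:, 0])   # weights on the reference traits
    b = np.linalg.solve(Ry, Vt[0])     # weights on the traits of interest
    return a, b, float(s[0])


def transcript_correlations(scores, expr):
    """Pearson correlation of one score vector with every column of expr."""
    z = (scores - scores.mean()) / scores.std()
    E = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    return (E * z[:, None]).mean(axis=0)


def simulate_demo(seed=0, n=150, n_genes=50):
    """Hypothetical simulation: one latent factor drives all three data types."""
    rng = np.random.default_rng(seed)
    latent_a = rng.normal(size=n)   # cohort A: reference traits + traits of interest
    latent_b = rng.normal(size=n)   # cohort B: reference traits + expression
    ref_load = np.array([1.0, 0.8, 0.6, 0.4])
    tgt_load = np.array([1.0, 0.7, 0.5])
    ref_a = np.outer(latent_a, ref_load) + 0.5 * rng.normal(size=(n, 4))
    tgt_a = np.outer(latent_a, tgt_load) + 0.5 * rng.normal(size=(n, 3))
    ref_b = np.outer(latent_b, ref_load) + 0.5 * rng.normal(size=(n, 4))
    expr_b = rng.normal(size=(n, n_genes))
    expr_b[:, 0] = latent_b + 0.5 * rng.normal(size=n)   # the one true correlate

    # Step 1: CCA between reference traits and traits of interest in cohort A.
    a, _, rho = first_canonical_pair(ref_a, tgt_a)
    # Step 2: project cohort B's reference traits onto the same canonical axis.
    scores_b = (ref_b - ref_a.mean(axis=0)) @ a
    # Step 3: correlate the projected scores with each transcript in cohort B.
    r = transcript_correlations(scores_b, expr_b)
    return rho, r
```

Calling `simulate_demo()` returns the leading canonical correlation and one correlation per simulated transcript. The sign of a canonical variate is arbitrary, so transcripts are best ranked by absolute correlation.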


Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples

Genetics ◽ 10.1534/genetics.118.301865 ◽ 2019 ◽ Vol 212 (3) ◽ pp. 919-929

Author(s): Daniel A. Skelly ◽ Narayanan Raghupathy ◽ Raymond F. Robledo ◽ Joel H. Graber ◽ Elissa J. Chesler

Keyword(s): Gene Expression ◽ Canonical Correlation ◽ Complex Traits ◽ Behavioral Genetics ◽ Association Studies ◽ Complex Trait ◽ Integrated Analysis ◽ Data Set ◽ Trait Analysis ◽ Molecular Features

Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript–trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative “reference” traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.


Leveraging expression from multiple tissues using sparse canonical correlation analysis and aggregate tests improve the power of transcriptome-wide association studies

10.1101/2020.07.03.186247 ◽ 2020

Author(s): Helian Feng ◽ Nicholas Mancuso ◽ Alexander Gusev ◽ Arunabha Majumdar ◽ Megan Major ◽ ...

Keyword(s): Gene Expression ◽ Correlation Analysis ◽ Canonical Correlation Analysis ◽ Canonical Correlation ◽ Complex Traits ◽ Association Studies ◽ Tissue Expression ◽ Expression Levels ◽ Sparse Canonical Correlation Analysis ◽ Eqtl Data

Abstract

Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome, and the GWAS sample size is 20,000, the average power difference between the ACAT combined test of sCCA features and single-tissue, versus single-tissue combined with Generalized Berk-Jones (GBJ) method, single-tissue combined with S-MultiXcan or summarizing cross-tissue expression patterns using Principal Component Analysis (PCA) approaches was 5%, 8%, and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 gene-trait associations that single-trait TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.

Author summary

Transcriptome-wide association studies (TWAS) can improve the statistical power of genetic association studies by leveraging the relationship between genetically predicted transcript expression levels and an outcome. We propose a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. We generate cross-tissue expression features using sparse canonical correlation analysis and then combine evidence for expression-outcome association across cross- and single-tissue features using the aggregate Cauchy association test. We show that this approach has substantially higher power than traditional single-tissue TWAS methods. Application of these methods to publicly available summary statistics for ten complex traits also identifies associations missed by single-tissue methods.
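The aggregate Cauchy association test named above combines p-values from several features (here, cross-tissue and single-tissue tests) into one. The sketch below shows the standard Cauchy combination formula in its plain form. The function name and the equal default weights are assumptions for illustration, and the numerical safeguards that production implementations use for extremely small p-values are omitted.

```python
import math


def acat(pvalues, weights=None):
    """Combine p-values with the Cauchy combination (ACAT) statistic."""
    if weights is None:
        weights = [1.0 / len(pvalues)] * len(pvalues)   # assumption: equal weights
    if any(not 0.0 < p < 1.0 for p in pvalues):
        raise ValueError("p-values must lie strictly between 0 and 1")
    # Each p-value is mapped to a standard Cauchy variate under the null.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # A weighted sum of standard Cauchy variates is Cauchy with scale sum(weights).
    return 0.5 - math.atan(t / sum(weights)) / math.pi
```

Because the Cauchy tail is heavy, the combined p-value is driven mainly by the smallest inputs: combining 0.01 with 0.5 under equal weights gives roughly 0.02, close to twice the smaller value.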


Investigating tissue-relevant causal molecular mechanisms of complex traits using probabilistic TWAS analysis

10.1101/808295 ◽ 2019 ◽ Cited By ~ 2

Author(s): Yuhua Zhang ◽ Corbin Quick ◽ Ketian Yu ◽ Alvaro Barbeira ◽ Francesca Luca ◽ ...

Keyword(s): Gene Expression ◽ Complex Traits ◽ Large Scale ◽ Molecular Mechanisms ◽ Association Studies ◽ Complex Trait ◽ Causal Effects ◽ Biological Mechanisms ◽ Integrative Framework ◽ Eqtl Data

Abstract

Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causality implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type-specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analysis that leverages the full potential of the latest GTEx multi-tissue eQTL data.


TIGAR: An Improved Bayesian Tool for Transcriptomic Data Imputation Enhances Gene Mapping of Complex Traits

10.1101/507525 ◽ 2018 ◽ Cited By ~ 3

Author(s): Sini Nagpal ◽ Xiaoran Meng ◽ Michael P. Epstein ◽ Lam C. Tsoi ◽ Matthew Patrick ◽ ...

Keyword(s): Gene Expression ◽ Complex Traits ◽ Bayesian Model ◽ Genetic Architecture ◽ Bayesian Method ◽ Association Studies ◽ Gwas Data ◽ Nonparametric Bayesian ◽ Transcriptomic Data ◽ Special Cases

Abstract

Transcriptome-wide association studies (TWAS), which test for association between the study trait and gene expression levels imputed from cis-acting expression quantitative trait loci (cis-eQTL) genotypes, have successfully enhanced the discovery of genetic risk loci for complex traits. By using gene expression imputation models fitted on reference datasets that have both genetic and transcriptomic data, TWAS facilitates gene-based tests with GWAS data while accounting for the reference transcriptomic data. Existing TWAS tools such as PrediXcan and FUSION use parametric imputation models that have limitations for modeling the complex genetic architecture of transcriptomic data. Therefore, we propose an improved Bayesian method that assumes a data-driven nonparametric prior to impute gene expression. Our method is general and flexible and includes both the parametric imputation models used by PrediXcan and FUSION as special cases. Our simulation studies showed that the nonparametric Bayesian model improved both imputation R² for transcriptomic data and the TWAS power over PrediXcan. In real applications, our nonparametric Bayesian method fitted transcriptomic imputation models for twice as many genes, with 1.7 times the average regression R², compared with PrediXcan, thus improving the power of follow-up TWAS. Hence, the nonparametric Bayesian model is preferred for modeling the complex genetic architecture of transcriptomes and is expected to enhance transcriptome-integrated genetic association studies. We implement our Bayesian approach in a convenient software tool “TIGAR” (Transcriptome-Integrated Genetic Association Resource), which imputes transcriptomic data and performs subsequent TWAS using individual-level or summary-level GWAS data.
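The two-stage logic that this abstract describes (fit an imputation model in a reference panel, then test imputed expression against the trait in a GWAS cohort) can be sketched as follows. Ridge regression is used here purely as a simple stand-in; it is not the nonparametric Bayesian model of TIGAR, nor the models used by PrediXcan or FUSION. All function names, the penalty value, and the simulated data are hypothetical.

```python
import numpy as np


def fit_ridge_weights(genotypes, expression, penalty=1.0):
    """Stage 1: learn cis-SNP weights for one gene in the reference panel."""
    G = genotypes - genotypes.mean(axis=0)
    y = expression - expression.mean()
    p = G.shape[1]
    # Closed-form ridge solution (stand-in for the real imputation model).
    return np.linalg.solve(G.T @ G + penalty * np.eye(p), G.T @ y)


def twas_t_statistic(genotypes, trait, weights):
    """Stage 2: test imputed expression against the trait in the GWAS cohort."""
    predicted = (genotypes - genotypes.mean(axis=0)) @ weights
    r = np.corrcoef(predicted, trait)[0, 1]
    n = len(trait)
    # t statistic of a simple linear regression of trait on predicted expression.
    return r * np.sqrt((n - 2) / (1.0 - r ** 2))


def simulate_twas(seed=1):
    """Hypothetical data: 3 of 20 SNPs regulate a gene that affects the trait."""
    rng = np.random.default_rng(seed)
    true_w = np.zeros(20)
    true_w[:3] = [0.5, -0.4, 0.3]
    g_ref = rng.binomial(2, 0.3, size=(300, 20)).astype(float)
    e_ref = g_ref @ true_w + 0.5 * rng.normal(size=300)
    g_gwas = rng.binomial(2, 0.3, size=(2000, 20)).astype(float)
    e_gwas = g_gwas @ true_w + 0.5 * rng.normal(size=2000)   # never observed
    trait = 0.5 * e_gwas + rng.normal(size=2000)
    w_hat = fit_ridge_weights(g_ref, e_ref)
    return twas_t_statistic(g_gwas, trait, w_hat)
```

Swapping `fit_ridge_weights` for a different estimator changes only stage 1, which is where the tools compared in the abstract differ.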


Leveraging DNA methylation quantitative trait loci to characterize the relationship between methylomic variation, gene expression and complex traits

10.1101/297176 ◽ 2018

Author(s): Eilis Hannon ◽ Tyler J Gorrie-Stone ◽ Melissa C Smart ◽ Joe Burrage ◽ Amanda Hughes ◽ ...

Keyword(s): Gene Expression ◽ Dna Methylation ◽ Genetic Variation ◽ Quantitative Trait Loci ◽ Quantitative Trait ◽ Complex Traits ◽ Common Genetic Variation ◽ Online Data ◽ The Relationship ◽ Methylation Quantitative Trait Loci

ABSTRACT

Characterizing the complex relationship between genetic, epigenetic and transcriptomic variation has the potential to increase understanding about the mechanisms underpinning health and disease phenotypes. In this study, we describe the most comprehensive analysis of common genetic variation on DNA methylation (DNAm) to date, using the Illumina EPIC array to profile samples from the UK Household Longitudinal Study. We identified 12,689,548 significant DNA methylation quantitative trait loci (mQTL) associations (P < 6.52 × 10⁻¹⁴) occurring between 2,907,234 genetic variants and 93,268 DNAm sites, including a large number not identified using previous DNAm-profiling methods. We demonstrate the utility of these data for interpreting the functional consequences of common genetic variation associated with > 60 human traits, using Summary data–based Mendelian Randomization (SMR) to identify 1,662 pleiotropic associations between 36 complex traits and 1,246 DNAm sites. We also use SMR to characterize the relationship between DNAm and gene expression, identifying 6,798 pleiotropic associations between 5,420 DNAm sites and the transcription of 1,702 genes. Our mQTL database and SMR results are available via a searchable online database (http://www.epigenomicslab.com/online-data-resources/) as a resource to the research community.
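For readers unfamiliar with SMR, the core test can be written in a few lines. The sketch below assumes the commonly cited form of the statistic, in which the z-scores of the top variant for the molecular trait (z_zx) and for the complex trait (z_zy) are combined into an approximate one-degree-of-freedom chi-square. The function and argument names are hypothetical, and the companion heterogeneity (HEIDI) test is not included.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Approximate SMR test from summary statistics of a single instrument.

    b_zx, se_zx: variant effect on the molecular trait (e.g. DNAm) and its SE.
    b_zy, se_zy: variant effect on the complex trait and its SE.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    # Ratio estimate of the effect of the molecular trait on the complex trait.
    b_xy = b_zy / b_zx
    # Test statistic, approximately chi-square with 1 degree of freedom.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a 1-df chi-square, written with the complementary error function.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value
```

With `smr_test(0.5, 0.05, 0.1, 0.02)` the z-scores are 10 and 5, so the statistic is 100 × 25 / 125 = 20 and the ratio estimate is 0.2.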


A transcriptome-wide Mendelian randomization study to uncover tissue-dependent regulatory mechanisms across the human phenome

10.1101/563379 ◽ 2019 ◽ Cited By ~ 2

Author(s): Tom G Richardson ◽ Gibran Hemani ◽ Tom R Gaunt ◽ Caroline L Relton ◽ George Davey Smith

Keyword(s): Gene Expression ◽ Genetic Variants ◽ Complex Traits ◽ Mendelian Randomization ◽ Drug Repositioning ◽ Association Studies ◽ Thyroid Tissue ◽ Genome Wide Association Studies ◽ Tissue Specific ◽ Genome Wide

Abstract

Background: Developing insight into tissue-specific transcriptional mechanisms can help improve our understanding of how genetic variants exert their effects on complex traits and disease. By applying the principles of Mendelian randomization, we have undertaken a systematic analysis to evaluate transcriptome-wide associations between gene expression across 48 different tissue types and 395 complex traits.

Results: Overall, we identified 100,025 gene-trait associations based on conventional genome-wide corrections (P < 5 × 10⁻⁸) that also provided evidence of genetic colocalization. These results indicated that genetic variants which influence gene expression levels in multiple tissues are more likely to influence multiple complex traits. We identified many examples of tissue-specific effects, such as genetically-predicted TPO, NR3C2 and SPATA13 expression only associating with thyroid disease in thyroid tissue. Additionally, FBN2 expression was associated with both cardiovascular and lung function traits, but only when analysed in heart and lung tissue respectively. We also demonstrate that conducting phenome-wide evaluations of our results can help flag adverse on-target side effects for therapeutic intervention, as well as propose drug repositioning opportunities. Moreover, we find that exploring the tissue-dependency of associations identified by genome-wide association studies (GWAS) can help elucidate the causal genes and tissues responsible for effects, as well as uncover putative novel associations.

Conclusions: The atlas of tissue-dependent associations we have constructed should prove extremely valuable to future studies investigating the genetic determinants of complex disease. The follow-up analyses we have performed in this study are merely a guide for future research. Conducting similar evaluations can be undertaken systematically at http://mrcieu.mrsoftware.org/Tissue_MR_atlas/.


The Y Chromosome: A Complex Locus for Genetic Analyses of Complex Human Traits

Genes ◽ 10.3390/genes11111273 ◽ 2020 ◽ Vol 11 (11) ◽ pp. 1273

Author(s): Katherine Parker ◽ A. Mesut Erzurumluoglu ◽ Santiago Rodriguez

Keyword(s): Population Genetics ◽ Genetic Variation ◽ Y Chromosome ◽ Complex Traits ◽ Association Studies ◽ Epidemiological Studies ◽ Complex Locus ◽ Complex Human Traits ◽ Human Complex ◽ Human Y Chromosome

The Human Y chromosome (ChrY) has been demonstrated to be a powerful tool for phylogenetics, population genetics, genetic genealogy and forensics. However, the importance of ChrY genetic variation in relation to human complex traits is less clear. In this review, we summarise existing evidence about the inherent complexities of ChrY variation and their use in association studies of human complex traits. We present and discuss the specific particularities of ChrY genetic variation, including Y chromosomal haplogroups, that need to be considered in the design and interpretation of genetic epidemiological studies involving ChrY.


Finding the molecular basis of complex genetic variation in humans and mice

Philosophical Transactions of the Royal Society B Biological Sciences ◽ 10.1098/rstb.2005.1798 ◽ 2006 ◽ Vol 361 (1467) ◽ pp. 393-401 ◽ Cited By ~ 13

Author(s): Richard Mott

Keyword(s): Genetic Variation ◽ Association Mapping ◽ Molecular Basis ◽ State Of The Art ◽ Complex Trait ◽ The State ◽ Rodent Model ◽ Model Systems ◽ Trait Analysis

I survey the state of the art in complex trait analysis, including the use of new experimental and computational technologies and resources becoming available, and the challenges facing us. I also discuss how the prospects of rodent model systems compare with association mapping in humans.


Prioritization of SNPs for genome-wide association studies using an interaction model of genetic variation, gene expression, and trait variation

Molecules and Cells ◽ 10.1007/s10059-012-2264-7 ◽ 2012 ◽ Vol 33 (4) ◽ pp. 351-361 ◽ Cited By ~ 1

Author(s): Hyojung Paik ◽ Junho Kim ◽ Sunjae Lee ◽ Hyoung-Sam Heo ◽ Cheol-Goo Hur ◽ ...

Keyword(s): Gene Expression ◽ Genetic Variation ◽ Association Studies ◽ Interaction Model ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Trait Variation ◽ Genome Wide


Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways and complex traits

10.1101/471748 ◽ 2018 ◽ Cited By ~ 1

Author(s): Heather E. Wheeler ◽ Sally Ploch ◽ Alvaro N. Barbeira ◽ Rodrigo Bonazzola ◽ Angela Andaleon ◽ ...

Keyword(s): Gene Expression ◽ Genetic Variation ◽ Complex Traits ◽ Target Gene ◽ Target Genes ◽ Regulation Of Gene Expression ◽ Nucleic Acid Binding ◽ Expression Of Genes ◽ Gene Associations ◽ Made In

Abstract

Regulation of gene expression is an important mechanism through which genetic variation can affect complex traits. A substantial portion of gene expression variation can be explained by both local (cis) and distal (trans) genetic variation. Much progress has been made in uncovering cis-acting expression quantitative trait loci (cis-eQTL), but trans-eQTL have been more difficult to identify and replicate. Here we take advantage of our ability to predict the cis component of gene expression coupled with gene mapping methods such as PrediXcan to identify high confidence candidate trans-acting genes and their targets. That is, we correlate the cis component of gene expression with observed expression of genes in different chromosomes. Leveraging the shared cis-acting regulation across tissues, we combine the evidence of association across all available GTEx tissues and find 2356 trans-acting/target gene pairs with high mappability scores. Reassuringly, trans-acting genes are enriched in transcription and nucleic acid binding pathways and target genes are enriched in known transcription factor binding sites. Interestingly, trans-acting genes are more significantly associated with selected complex traits and diseases than target or background genes, consistent with percolating trans effects. Our scripts and summary statistics are publicly available for future studies of trans-acting gene regulation.
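The central step described here, correlating the predicted cis component of each gene with the observed expression of genes on other chromosomes, can be sketched in a few lines. This is an illustration under simplifying assumptions (a single tissue, plain Pearson correlation, no multiple-testing or mappability filtering). The function names and simulated data are hypothetical and do not come from the authors' scripts.

```python
import numpy as np


def trans_correlations(predicted_cis, observed, chrom):
    """Correlate predicted cis expression (rows of the result: candidate
    trans-acting genes) with observed expression (columns: target genes),
    blanking out pairs that sit on the same chromosome."""
    P = (predicted_cis - predicted_cis.mean(axis=0)) / predicted_cis.std(axis=0)
    O = (observed - observed.mean(axis=0)) / observed.std(axis=0)
    R = P.T @ O / P.shape[0]          # gene-by-gene Pearson correlations
    chrom = np.asarray(chrom)
    R[chrom[:, None] == chrom[None, :]] = np.nan   # keep only trans pairs
    return R


def simulate_trans(seed=2, n=500):
    """Hypothetical data: gene 0 (chr 1) regulates gene 3 (chr 2) in trans."""
    rng = np.random.default_rng(seed)
    chrom = [1, 1, 2, 2, 3, 3]
    predicted = rng.normal(size=(n, 6))
    observed = rng.normal(size=(n, 6))
    observed[:, 3] += 0.8 * predicted[:, 0]
    R = trans_correlations(predicted, observed, chrom)
    top = np.unravel_index(np.nanargmax(np.abs(R)), R.shape)
    return R, top
```

In `simulate_trans()`, the strongest remaining pair is the planted one (predicted gene 0, observed gene 3), while same-chromosome pairs are returned as NaN.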
