PhenomeXcan: Mapping the genome to the phenome through the transcriptome

2020 ◽ Vol 6 (37) ◽ pp. eaba2083 ◽ Milton Pividori ◽ Padma S. Rajagopal ◽ Alvaro Barbeira ◽ Yanyu Liang ◽ Owen Melia

Large-scale genomic and transcriptomic initiatives offer unprecedented insight into complex traits, but clinical translation remains limited by variant-level associations without biological context and a lack of analytic resources. Our resource, PhenomeXcan, synthesizes 8.87 million variants from genome-wide association study summary statistics on 4091 traits with transcriptomic data from 49 tissues in Genotype-Tissue Expression v8 into a gene-based, queryable platform including 22,515 genes. We developed a novel Bayesian colocalization method, fast enrichment estimation aided colocalization analysis (fastENLOC), to prioritize likely causal gene-trait associations. We successfully replicate associations from the phenome-wide association studies (PheWAS) catalog, Online Mendelian Inheritance in Man, and an evidence-based curated gene list. Using PhenomeXcan results, we provide examples of novel and underreported genome-to-phenome associations, complex gene-trait clusters, shared causal genes between common and rare diseases via further integration of PhenomeXcan with ClinVar, and potential therapeutic targets. PhenomeXcan provides broad, user-friendly access to complex data for translational researchers.

2019 ◽ Milton Pividori ◽ Padma S. Rajagopal ◽ Alvaro Barbeira ◽ Yanyu Liang ◽ Owen Melia

Abstract: Large-scale genomic and transcriptomic initiatives offer unprecedented ability to study the biology of complex traits and identify target genes for precision prevention or therapy. Translation to clinical contexts, however, has been slow and challenging due to lack of biological context for identified variant-level associations. Moreover, many translational researchers lack the computational or analytic infrastructures required to fully use these resources. We integrate genome-wide association study (GWAS) summary statistics from multiple publicly available sources and data from Genotype-Tissue Expression (GTEx) v8 using PrediXcan and provide a user-friendly platform for translational researchers based on state-of-the-art algorithms. We develop a novel Bayesian colocalization method, fastENLOC, to prioritize the most likely causal gene-trait associations. Our resource, PhenomeXcan, synthesizes 8.87 million variants from GWAS on 4,091 traits with transcriptome regulation data from 49 tissues in GTEx v8 into an innovative, gene-based resource including 22,255 genes. Across the entire genome/phenome space, we find 65,603 significant associations (Bonferroni-corrected p-value of 5.5 × 10⁻¹⁰), of which 19,579 (29.8 percent) were colocalized (locus regional colocalization probability > 0.1). We successfully replicate associations from PheWAS Catalog (AUC = 0.61) and OMIM (AUC = 0.64). We provide examples of (a) finding novel and underreported genome-to-phenome associations, (b) exploring complex gene-trait clusters within PhenomeXcan, (c) studying phenome-to-phenome relationships between common and rare diseases via further integration of PhenomeXcan with ClinVar, and (d) evaluating potential therapeutic targets. PhenomeXcan broadens access to complex genomic and transcriptomic data and empowers translational researchers.

One-Sentence Summary: PhenomeXcan is a gene-based resource of gene-trait associations with biological context that supports translational research.
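The significance threshold and colocalized share quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes a family-wise error rate of 0.05 and one test per gene-trait pair (22,255 genes × 4,091 traits), neither of which the abstract states explicitly, and the function name `bonferroni_threshold` is hypothetical.

```python
# Figures quoted in the abstract.
N_GENES = 22_255
N_TRAITS = 4_091
N_SIGNIFICANT = 65_603
N_COLOCALIZED = 19_579

# Assumption (not stated in the abstract): family-wise error rate of 0.05.
ALPHA = 0.05


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: one association test per gene-trait pair.
n_tests = N_GENES * N_TRAITS
threshold = bonferroni_threshold(ALPHA, n_tests)
colocalized_fraction = N_COLOCALIZED / N_SIGNIFICANT

print(f"{n_tests:,} tests, per-test threshold {threshold:.2e}")
print(f"colocalized share of significant associations: {colocalized_fraction:.1%}")
```

Under these assumptions the threshold rounds to the 5.5 × 10⁻¹⁰ reported above, and the colocalized share comes out at 29.8 percent, which suggests (but does not prove) that the reported threshold was derived this way.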

2021 ◽ Basel M Al-Barghouthi ◽ Will T Rosenow ◽ Kang-Ping Du ◽ Jinho Heo ◽ Robert Maynard

Genome-wide association studies (GWASs) for bone mineral density (BMD) have identified over 1,100 associations to date. However, identifying causal genes implicated by such studies has been challenging. Recent advances in the development of transcriptome reference datasets and computational approaches such as transcriptome-wide association studies (TWASs) and expression quantitative trait loci (eQTL) colocalization have proven to be informative in identifying putatively causal genes underlying GWAS associations. Here, we used TWAS/eQTL colocalization in conjunction with transcriptomic data from the Genotype-Tissue Expression (GTEx) project to identify potentially causal genes for the largest BMD GWAS performed to date. Using this approach, we identified 512 genes as significant (Bonferroni <= 0.05) using both TWAS and eQTL colocalization. This set of genes was enriched for regulators of BMD and members of bone-relevant biological processes. To investigate the significance of our findings, we selected PPP6R3, the gene with the strongest support from our analysis that was not previously implicated in the regulation of BMD, for further investigation. We observed that Ppp6r3 deletion in mice decreased BMD. In this work, we provide an updated resource of putatively causal BMD genes and demonstrate that PPP6R3 is a putatively causal BMD GWAS gene. These data increase our understanding of the genetics of BMD and provide further evidence for the utility of combined TWAS/colocalization approaches in untangling the genetics of complex traits.

2021 ◽ Vol 22 (1) ◽ Alvaro N. Barbeira ◽ Rodrigo Bonazzola ◽ Eric R. Gamazon ◽ Yanyu Liang

Abstract: The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.

2018 ◽ Doug Speed ◽ David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121,000). First, we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next, we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.

2021 ◽ Vol 12 ◽ Martina Rauner ◽ Ines Foessl ◽ Melissa M. Formosa ◽ Erika Kague ◽ Vid Prijatelj

The availability of large human datasets for genome-wide association studies (GWAS) and the advancement of sequencing technologies have boosted the identification of genetic variants in complex and rare diseases in the skeletal field. Yet, interpreting results from human association studies remains a challenge. To bridge the gap between genetic association and causality, a systematic functional investigation is necessary. Multiple unknowns exist for putative causal genes, including cellular localization of the molecular function. Intermediate traits (“endophenotypes”), e.g. molecular quantitative trait loci (molQTLs), are needed to identify mechanisms of underlying associations. Furthermore, index variants often reside in non-coding regions of the genome and are therefore challenging to interpret. Knowledge of non-coding variance (e.g. ncRNAs), repetitive sequences, and regulatory interactions between enhancers and their target genes is central for understanding causal genes in skeletal conditions. Animal models with deep skeletal phenotyping and cell culture models have already facilitated fine mapping of some association signals, elucidated gene mechanisms, and revealed disease-relevant biology. However, to accelerate research towards bridging the current gap between association and causality in skeletal diseases, alternative in vivo platforms need to be used and developed in parallel with the current -omics and traditional in vivo resources. Therefore, we argue that as a field we need to establish resource-sharing standards to collectively address complex research questions. These standards will promote data integration from various -omics technologies and functional dissection of human complex traits. In this mission statement, we review the currently available resources and as a group propose a consensus to facilitate resource sharing using existing and future resources. Such coordination efforts will maximize the acquisition of knowledge from different approaches and thus reduce redundancy and duplication of resources. These measures will help to understand the pathogenesis of osteoporosis and other skeletal diseases towards defining new and more efficient therapeutic targets.

Elle M Weeks ◽ Jacob C Ulirsch ◽ Nathan Y Cheng ◽ Brian L Trippe ◽ Rebecca S Fine

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.

2020 ◽ Helian Feng ◽ Nicholas Mancuso ◽ Alexander Gusev ◽ Arunabha Majumdar ◽ Megan Major

Abstract: Transcriptome-wide association studies (TWAS) test the association between traits and genetically predicted gene expression levels. The power of a TWAS depends in part on the strength of the correlation between a genetic predictor of gene expression and the causally relevant gene expression values. Consequently, TWAS power can be low when expression quantitative trait locus (eQTL) data used to train the genetic predictors have small sample sizes, or when data from causally relevant tissues are not available. Here, we propose to address these issues by integrating multiple tissues in the TWAS using sparse canonical correlation analysis (sCCA). We show that sCCA-TWAS combined with single-tissue TWAS using an aggregate Cauchy association test (ACAT) outperforms traditional single-tissue TWAS. In empirically motivated simulations, the sCCA+ACAT approach yielded the highest power to detect a gene associated with phenotype, even when expression in the causal tissue was not directly measured, while controlling the Type I error when there is no association between gene expression and phenotype. For example, when gene expression explains 2% of the variability in outcome and the GWAS sample size is 20,000, the average power differences between the ACAT combined test of sCCA and single-tissue features and three alternatives (single-tissue features combined with the Generalized Berk-Jones (GBJ) method, single-tissue features combined with S-MultiXcan, and cross-tissue expression patterns summarized using principal component analysis (PCA)) were 5%, 8%, and 38%, respectively. The gain in power is likely due to sCCA cross-tissue features being more likely to be detectably heritable. When applied to publicly available summary statistics from 10 complex traits, the sCCA+ACAT test was able to increase the number of testable genes and identify on average an additional 400 gene-trait associations that single-tissue TWAS missed. Our results suggest that aggregating eQTL data across multiple tissues using sCCA can improve the sensitivity of TWAS while controlling for the false positive rate.

Author summary: Transcriptome-wide association studies (TWAS) can improve the statistical power of genetic association studies by leveraging the relationship between genetically predicted transcript expression levels and an outcome. We propose a new TWAS pipeline that integrates data on the genetic regulation of expression levels across multiple tissues. We generate cross-tissue expression features using sparse canonical correlation analysis and then combine evidence for expression-outcome association across cross- and single-tissue features using the aggregate Cauchy association test. We show that this approach has substantially higher power than traditional single-tissue TWAS methods. Application of these methods to publicly available summary statistics for ten complex traits also identifies associations missed by single-tissue methods.
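This abstract names the aggregate Cauchy association test (ACAT) without giving its form. As a rough sketch, assuming the commonly described Cauchy combination rule (each p-value is mapped to a Cauchy variate via tan((0.5 − p)π), the variates are averaged with weights, and the average is mapped back to a p-value), combining per-feature p-values might look like the following. The function name `acat_combine`, the equal default weights, and the example p-values are illustrative assumptions, not the authors' pipeline.

```python
import math
from typing import Optional, Sequence


def acat_combine(p_values: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Combine p-values with a weighted Cauchy combination statistic.

    Sketch only: it omits the tail approximation that practical
    implementations use for extremely small p-values.
    """
    if len(p_values) == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    total_weight = sum(weights)
    # Weighted average of the Cauchy-transformed p-values.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Upper-tail probability of a standard Cauchy at the statistic.
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical p-values for one gene: two cross-tissue features and
# two single-tissue features.
example_p = [0.001, 0.20, 0.65, 0.03]
combined = acat_combine(example_p)
print(f"combined p-value: {combined:.4g}")
```

A single p-value passes through unchanged, and one strong signal among several weak ones still yields a small combined p-value, which is the property that makes this kind of combination attractive when only a few features carry signal.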

2019 ◽ Yuhua Zhang ◽ Corbin Quick ◽ Ketian Yu ◽ Alvaro Barbeira ◽ Francesca Luca

Abstract: Transcriptome-wide association studies (TWAS), an integrative framework using expression quantitative trait loci (eQTLs) to construct proxies for gene expression, have emerged as a promising method to investigate the biological mechanisms underlying associations between genotypes and complex traits. However, challenges remain in interpreting TWAS results, especially regarding their causality implications. In this paper, we describe a new computational framework, probabilistic TWAS (PTWAS), to detect associations and investigate causal relationships between gene expression and complex traits. We use established concepts and principles from instrumental variables (IV) analysis to delineate and address the unique challenges that arise in TWAS. PTWAS utilizes probabilistic eQTL annotations derived from multi-variant Bayesian fine-mapping analysis, conferring higher power to detect TWAS associations than existing methods. Additionally, PTWAS provides novel functionalities to evaluate the causal assumptions and estimate tissue- or cell-type specific causal effects of gene expression on complex traits. These features make PTWAS uniquely suited for in-depth investigations of the biological mechanisms that contribute to complex trait variation. Using eQTL data across 49 tissues from GTEx v8, we apply PTWAS to analyze 114 complex traits using GWAS summary statistics from several large-scale projects, including the UK Biobank. Our analysis reveals an abundance of genes with strong evidence of eQTL-mediated causal effects on complex traits and highlights the heterogeneity and tissue-relevance of these effects across complex traits. We distribute software and eQTL annotations to enable users to perform rigorous TWAS analysis by leveraging the full potential of the latest GTEx multi-tissue eQTL data.

2021 ◽ Alex N. Nguyen Ba ◽ Katherine R. Lawrence ◽ Artur Rego-Costa ◽ Shreyas Gopalakrishnan ◽ Daniel Temko

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.

Significance statement: Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast, and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology, and have implications for large-scale GWAS in human populations.
