Exploiting the GTEx resources to decipher the mechanisms at GWAS loci

Abstract: The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.



10.1101/814350 ◽

2019 ◽

Cited By ~ 17

Author(s):

Alvaro N Barbeira ◽

Rodrigo Bonazzola ◽

Eric R Gamazon ◽

Yanyu Liang ◽

YoSon Park ◽

...

Keyword(s):

Complex Traits ◽

Target Genes ◽

Genome Wide Association Study ◽

Data Driven ◽

Functional Interpretation ◽

Transcriptome Regulation ◽

Genome Wide ◽

Causal Genes ◽

Dose Dependent ◽

Single Approach


Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity

10.1101/2021.08.02.21261488 ◽

2021 ◽

Author(s):

Steven Gazal ◽

Omer Weissbrod ◽

Farhad Hormozdiari ◽

Kushal Dey ◽

Joseph Nasser ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Target Genes ◽

Disease Risk ◽

Association Studies ◽

Common Disease ◽

Disease Genes ◽

Genome Wide Association Studies ◽

Functional Interpretation ◽

Genome Wide

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.
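The omnigenicity summary above (the share of gene-linked heritability carried by the top-ranked 1% of genes) can be illustrated with a minimal Python sketch. The function name `top_share`, the gene labels, and the toy values are hypothetical and are not taken from the paper; the sketch shows only the ranking arithmetic, not the cS2G estimation itself.

```python
def top_share(linked_h2, top_fraction=0.01):
    """Fraction of total gene-linked heritability carried by the top-ranked genes.

    linked_h2: dict mapping gene name -> heritability linked to that gene
               (hypothetical input; the paper derives these values with cS2G).
    top_fraction: fraction of genes counted as "top" (0.01 = top 1%).
    """
    if not linked_h2:
        raise ValueError("linked_h2 must contain at least one gene")
    ranked = sorted(linked_h2.values(), reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("total linked heritability must be positive")
    # Always count at least one gene, even for very small gene sets.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Toy data: 100 genes, one of which carries as much as the other 99 combined.
toy = {"GENE_0": 99}
toy.update({f"GENE_{i}": 1 for i in range(1, 100)})
share = top_share(toy, top_fraction=0.01)  # top 1% of 100 genes = 1 gene
```

In this toy set the single top gene carries half of the total, which mirrors the shape of the reported result (top 200 of roughly 20,000 ranked genes explaining about half of the linked heritability) without reproducing any of the paper's data.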


Trait Association and Prediction Through Integrative K-mer Analysis

10.1101/2021.11.17.468725 ◽

2021 ◽

Author(s):

Cheng He ◽

Jacob D Washburn ◽

Yangfan Hao ◽

Zhiwu Zhang ◽

Jinliang Yang ◽

...

Keyword(s):

Complex Traits ◽

Genome Wide Association Study ◽

Leaf Angle ◽

Phenotypic Traits ◽

Nucleotide Polymorphisms ◽

Kernel Oil ◽

Kernel Color ◽

Maize Populations ◽

Genome Wide ◽

Causal Genes

Genome-wide association study (GWAS) with single nucleotide polymorphisms (SNPs) has been widely used to explore genetic controls of phenotypic traits. Here we employed a GWAS approach using k-mers, short substrings from sequencing reads. Using maize cob and kernel color traits, we demonstrated that k-mer GWAS can effectively identify associated k-mers. Co-expression analysis of kernel color k-mers and pathway genes directly found k-mers from causal genes. Analyzing complex traits of kernel oil and leaf angle resulted in k-mers from both known and candidate genes. Evolution analysis revealed most k-mers positively correlated with kernel oil were strongly selected against in maize populations, while most k-mers for upright leaf angle were positively selected. In addition, phenotypic prediction of kernel oil, leaf angle, and flowering time using k-mer data showed at least a similarly high prediction accuracy to the standard SNP-based method. Collectively, our results demonstrated the bridging role of k-mers for data integration and functional gene discovery.
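To make "k-mers, short substrings from sequencing reads" concrete, here is a minimal Python sketch that counts k-mers per read set and builds a presence/absence table across samples. The function names, the sample labels, and the choice of a presence/absence encoding are illustrative assumptions; the abstract does not specify the authors' pipeline, and this sketch does not merge reverse complements as production k-mer tools commonly do.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across a list of reads.

    K-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def presence_matrix(samples, k):
    """Build a presence/absence table of k-mers across samples.

    samples: dict mapping sample name -> list of reads (hypothetical input).
    Returns (sorted k-mer list, dict sample -> list of 0/1 flags).
    """
    per_sample = {name: set(count_kmers(reads, k)) for name, reads in samples.items()}
    kmers = sorted(set().union(*per_sample.values())) if per_sample else []
    matrix = {name: [int(km in seen) for km in kmers] for name, seen in per_sample.items()}
    return kmers, matrix


# Toy example with two hypothetical samples.
kmers, matrix = presence_matrix({"s1": ["ACGT"], "s2": ["CGTA"]}, k=3)
```

Each column of such a table can then be tested for association with a phenotype in place of a SNP genotype, which is the general idea behind a k-mer GWAS.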


A powerful and efficient two-stage method for detecting gene-to-gene interactions in GWAS

Biostatistics ◽

10.1093/biostatistics/kxw060 ◽

2017 ◽

Vol 18 (3) ◽

pp. 477-494 ◽

Cited By ~ 5

Author(s):

Jakub Pecanka ◽

Marianne A. Jonker ◽

Zoltan Bochdanovits ◽

Aad W. Van Der Vaart ◽

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Statistical Power ◽

Genome Wide Association Study ◽

Score Test ◽

Interaction Model ◽

Type I ◽

Two Stage ◽

Genome Wide ◽

Strong Control

Summary: For over a decade, functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
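The first-stage screen described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' software: it uses a plain Pearson chi-square test of independence on the 3×3 two-locus genotype table in cases, genotypes are assumed to be coded 0/1/2, and the names `independence_pvalue`, `screen_pairs`, and the threshold `alpha1` are hypothetical. The second stage (the logistic regression score test on the surviving pairs) is not shown.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df in {0, 1, 2, 4}."""
    if df == 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    if df == 4:
        return math.exp(-x / 2.0) * (1.0 + x / 2.0)
    raise ValueError("unsupported degrees of freedom")


def independence_pvalue(g1, g2):
    """Pearson chi-square test of independence between two genotype vectors (0/1/2)."""
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(g1)
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(g1, g2):
        table[a][b] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(3)) for j in range(3)]
    # Drop genotype classes that never occur so expected counts stay positive.
    rows = [i for i in range(3) if row[i] > 0]
    cols = [j for j in range(3) if col[j] > 0]
    stat = 0.0
    for i in rows:
        for j in cols:
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2_sf(stat, df)


def screen_pairs(case_genotypes, alpha1):
    """Stage 1: keep locus pairs whose genotypes look dependent among cases.

    case_genotypes: dict mapping locus name -> list of genotypes in cases.
    alpha1: first-stage significance threshold (an assumed tuning parameter).
    """
    return [
        (a, b)
        for a, b in combinations(sorted(case_genotypes), 2)
        if independence_pvalue(case_genotypes[a], case_genotypes[b]) < alpha1
    ]


# Toy data: snpA and snpB are identical in cases, snpC is balanced against both.
g = [0, 0, 0, 1, 1, 1, 2, 2, 2] * 2
toy_cases = {"snpA": g, "snpB": list(g), "snpC": [0, 1, 2] * 6}
candidates = screen_pairs(toy_cases, alpha1=1e-3)
```

Only the pairs returned by `screen_pairs` would go on to the more expensive interaction test, which is how the two-stage design reduces both the computation and the multiple testing burden.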


Perspective of the GEMSTONE Consortium on Current and Future Approaches to Functional Validation for Skeletal Genetic Disease Using Cellular, Molecular and Animal-Modeling Techniques

Frontiers in Endocrinology ◽

10.3389/fendo.2021.731217 ◽

2021 ◽

Vol 12 ◽

Author(s):

Martina Rauner ◽

Ines Foessl ◽

Melissa M. Formosa ◽

Erika Kague ◽

Vid Prijatelj ◽

...

Keyword(s):

Resource Sharing ◽

Complex Traits ◽

Cellular Localization ◽

Target Genes ◽

Mission Statement ◽

Association Studies ◽

Repetitive Sequences ◽

Genome Wide Association Studies ◽

Causal Genes

The availability of large human datasets for genome-wide association studies (GWAS) and the advancement of sequencing technologies have boosted the identification of genetic variants in complex and rare diseases in the skeletal field. Yet, interpreting results from human association studies remains a challenge. To bridge the gap between genetic association and causality, a systematic functional investigation is necessary. Multiple unknowns exist for putative causal genes, including cellular localization of the molecular function. Intermediate traits (“endophenotypes”), e.g. molecular quantitative trait loci (molQTLs), are needed to identify mechanisms of underlying associations. Furthermore, index variants often reside in non-coding regions of the genome and are therefore challenging to interpret. Knowledge of non-coding variance (e.g. ncRNAs), repetitive sequences, and regulatory interactions between enhancers and their target genes is central for understanding causal genes in skeletal conditions. Animal models with deep skeletal phenotyping and cell culture models have already facilitated fine mapping of some association signals, elucidated gene mechanisms, and revealed disease-relevant biology. However, to accelerate research towards bridging the current gap between association and causality in skeletal diseases, alternative in vivo platforms need to be used and developed in parallel with the current -omics and traditional in vivo resources. Therefore, we argue that as a field we need to establish resource-sharing standards to collectively address complex research questions. These standards will promote data integration from various -omics technologies and functional dissection of human complex traits. In this mission statement, we review the currently available resources and as a group propose a consensus to facilitate resource sharing using existing and future resources.
Such coordination efforts will maximize the acquisition of knowledge from different approaches and thus reduce redundancy and duplication of resources. These measures will help to understand the pathogenesis of osteoporosis and other skeletal diseases towards defining new and more efficient therapeutic targets.


Genome-wide association study identifies 48 common genetic variants associated with handedness

10.1101/831321 ◽

2019 ◽

Author(s):

Gabriel Cuellar Partida ◽

Joyce Y Tung ◽

Nicholas Eriksson ◽

Eva Albrecht ◽

Fazil Aliev ◽

...

Keyword(s):

Association Study ◽

Genetic Variants ◽

Complex Traits ◽

Genome Wide Association Study ◽

Genetic Correlations ◽

Genome Wide Association ◽

Left Handedness ◽

Left Handed ◽

Genome Wide ◽

Common Genetic Variants

Abstract: Handedness, a consistent asymmetry in skill or use of the hands, has been studied extensively because of its relationship with language and the over-representation of left-handers in some neurodevelopmental disorders. Using data from the UK Biobank, 23andMe and 32 studies from the International Handedness Consortium, we conducted the world’s largest genome-wide association study of handedness (1,534,836 right-handed, 194,198 (11.0%) left-handed and 37,637 (2.1%) ambidextrous individuals). We found 41 genetic loci associated with left-handedness and seven associated with ambidexterity at genome-wide levels of significance (P < 5×10⁻⁸). Tissue enrichment analysis implicated the central nervous system and brain tissues including the hippocampus and cerebrum in the etiology of left-handedness. Pathways including regulation of microtubules, neurogenesis, axonogenesis and hippocampus morphology were also highlighted. We found suggestive positive genetic correlations between being left-handed and some neuropsychiatric traits including schizophrenia and bipolar disorder. SNP heritability analyses indicated that additive genetic effects of genotyped variants explained 5.9% (95% CI = 5.8% – 6.0%) of the underlying liability of being left-handed, while the narrow sense heritability was estimated at 12% (95% CI = 7.2% – 17.7%). Further, we show that genetic correlation between left-handedness and ambidexterity is low (rg = 0.26; 95% CI = 0.08 – 0.43), implying that these traits are largely influenced by different genetic mechanisms. In conclusion, our findings suggest that handedness, like many other complex traits, is highly polygenic, and that the genetic variants that predispose to left-handedness may underlie part of the association with some psychiatric disorders that has been observed in multiple observational studies.
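The sample proportions and locus counts quoted above can be checked with a few lines of Python. The variable names are illustrative. The last lines show the common convention of reading the genome-wide threshold of 5×10⁻⁸ as a Bonferroni correction of 0.05 over roughly one million independent tests; that reading is a general convention and an assumption here, not something stated in the abstract.

```python
import math

# Sample sizes as reported in the abstract.
right_handed = 1_534_836
left_handed = 194_198
ambidextrous = 37_637

total = right_handed + left_handed + ambidextrous
left_pct = 100 * left_handed / total    # reported as 11.0%
ambi_pct = 100 * ambidextrous / total   # reported as 2.1%

# Loci reported for each trait; the title counts them together.
loci_left, loci_ambi = 41, 7
loci_total = loci_left + loci_ambi

# Conventional genome-wide significance threshold (assumed rationale:
# Bonferroni correction of 0.05 for about one million independent tests).
threshold = 0.05 / 1_000_000
matches_convention = math.isclose(threshold, 5e-8)
```

Rounded to one decimal place, the computed percentages agree with the 11.0% and 2.1% given in the abstract, and the 41 and 7 loci add up to the 48 variants named in the title.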


Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases

10.1101/2020.09.08.20190561 ◽

2020 ◽

Cited By ~ 1

Author(s):

Elle M Weeks ◽

Jacob C Ulirsch ◽

Nathan Y Cheng ◽

Brian L Trippe ◽

Rebecca S Fine ◽

...

Keyword(s):

Complex Traits ◽

Association Studies ◽

Gene Prioritization ◽

Protein Interaction Data ◽

Large Set ◽

Genome Wide Association Studies ◽

Protein Protein Interaction ◽

Genome Wide ◽

Causal Genes ◽

Red Blood Cell Count

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.


A compendium of uniformly processed human gene expression and splicing quantitative trait loci

Nature Genetics ◽

10.1038/s41588-021-00924-w ◽

2021 ◽

Vol 53 (9) ◽

pp. 1290-1299

Author(s):

Nurlan Kerimov ◽

James D. Hayhurst ◽

Kateryna Peikova ◽

Jonathan R. Manning ◽

Peter Walter ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Target Genes ◽

Genome Wide Association Study ◽

Cell Types ◽

Summary Statistics ◽

Genome Wide ◽

Cell Type Specific ◽

Trait Locus ◽

Complex Human Traits

Abstract: Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.


Bayes Factor-Based Regulatory Gene Network Analysis of Genome-Wide Association Study of Economic Traits in a Purebred Swine Population

Genes ◽

10.3390/genes10040293 ◽

2019 ◽

Vol 10 (4) ◽

pp. 293 ◽

Cited By ~ 3

Author(s):

Lee ◽

Kang ◽

Kim

Keyword(s):

Association Study ◽

Gene Network ◽

Bayes Factor ◽

Target Genes ◽

Genome Wide Association Study ◽

Function Analysis ◽

Regulatory Gene ◽

Genome Wide Association ◽

Regulatory Sequence ◽

Genome Wide

Early stage prediction of economic trait performance is important and directly linked to profitability of farm pig production. Genome-wide association study (GWAS) has been applied to find causative genomic regions of traits. This study established a regulatory gene network using GWAS for critical economic pig characteristics, centered on easily measurable body fat thickness in live animals. We genotyped 2,681 pigs using Illumina Porcine SNP60, followed by GWAS to calculate Bayes factors for 47,697 single nucleotide polymorphisms (SNPs) of seven traits. Using this information, SNPs were annotated with specific genes near genome locations to establish the association weight matrix. The entire network consisted of 226 nodes and 6,921 significant edges. For in silico validation of their interactions, we conducted regulatory sequence analysis of predicted target genes of transcription factors (TFs). Three key regulatory TFs were identified to guarantee maximum coverage: AT-rich interaction domain 3B (ARID3B), glial cell missing homolog 1 (GCM1), and GLI family zinc finger 2 (GLI2). We identified numerous genes targeted by ARID3B, associated with cellular processes. GCM1 and GLI2 were involved in developmental processes, and their shared target genes regulated multicellular organismal process. This system biology-based function analysis might contribute to enhancing understanding of economic pig traits.


Optimizing the Power to Identify the Genetic Basis of Complex Traits with Evolve and Resequence Studies

Molecular Biology and Evolution ◽

10.1093/molbev/msz183 ◽

2019 ◽

Vol 36 (12) ◽

pp. 2890-2905 ◽

Cited By ~ 2

Author(s):

Christos Vlachos ◽

Robert Kofler

Keyword(s):

Complex Traits ◽

Quantitative Traits ◽

Genetic Basis ◽

Genome Wide Association Study ◽

Selection Regime ◽

Genome Wide ◽

A Genome ◽

Higher Power ◽

Powerful Approach ◽

Next Generation Sequencing (NGS)

Abstract: Evolve and resequence (E&R) studies are frequently used to dissect the genetic basis of quantitative traits. By subjecting a population to truncating selection for several generations and estimating the allele frequency differences between selected and nonselected populations using next-generation sequencing (NGS), the loci contributing to the selected trait may be identified. The role of different parameters, such as the population size or the number of replicate populations, has been examined in previous works. However, the influence of the selection regime, that is, the strength of truncating selection during the experiment, remains little explored. Using whole genome, individual based forward simulations of E&R studies, we found that the power to identify the causative alleles may be maximized by gradually increasing the strength of truncating selection during the experiment. Notably, such an optimal selection regime comes at no or little additional cost in terms of sequencing effort and experimental time. Interestingly, we also found that a selection regime which optimizes the power to identify the causative loci is not necessarily identical to a regime that maximizes the phenotypic response. Finally, our simulations suggest that an E&R study with an optimized selection regime may have a higher power to identify the genetic basis of quantitative traits than a genome-wide association study, highlighting that E&R is a powerful approach for finding the loci underlying complex traits.
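As an illustration of what a selection regime with gradually increasing strength could look like, the following Python sketch implements one round of truncating selection and a schedule of kept fractions that shrinks over generations. The function names and the linear form of the schedule are assumptions made for illustration; the abstract states only that strength is increased gradually, and the authors' forward simulations are not reproduced here.

```python
def truncation_select(phenotypes, keep_fraction):
    """Return the indices of the individuals that survive truncating selection.

    The top `keep_fraction` of individuals by phenotype are kept; a smaller
    fraction means stronger selection.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(round(len(phenotypes) * keep_fraction)))
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i], reverse=True)
    return sorted(order[:n_keep])


def increasing_strength_schedule(start_keep, end_keep, generations):
    """Kept fraction per generation, moving linearly from start_keep to end_keep.

    With start_keep > end_keep, selection becomes stronger every generation.
    The linear shape is an assumption, not the regime reported in the paper.
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    if generations == 1:
        return [start_keep]
    step = (end_keep - start_keep) / (generations - 1)
    return [start_keep + step * g for g in range(generations)]


# Toy example: four individuals, keep the top half by phenotype.
survivors = truncation_select([1.0, 5.0, 3.0, 4.0], keep_fraction=0.5)
# Four generations, from keeping 80% down to keeping 20%.
schedule = increasing_strength_schedule(0.8, 0.2, generations=4)
```

In a full simulation, each generation would apply `truncation_select` with that generation's kept fraction, let the survivors reproduce, and track allele frequencies for later comparison with an unselected population.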
