Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.
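
One way to picture the combination of polygenic and locus-specific evidence is to keep a gene only when it is the top-ranked gene in its locus under both kinds of method. The sketch below is a minimal illustration under that assumption and is not the authors' published procedure; the locus names, gene labels, scores, and the functions top_gene_per_locus and concordant_genes are all hypothetical.

```python
def top_gene_per_locus(scores):
    """Map each locus to its highest-scoring gene.

    scores: {locus: {gene: score}}; loci with no genes are skipped.
    """
    return {
        locus: max(genes, key=genes.get)
        for locus, genes in scores.items()
        if genes
    }


def concordant_genes(polygenic_scores, locus_scores):
    """Keep loci where both methods nominate the same top gene."""
    polygenic_top = top_gene_per_locus(polygenic_scores)
    locus_top = top_gene_per_locus(locus_scores)
    return {
        locus: gene
        for locus, gene in polygenic_top.items()
        if locus_top.get(locus) == gene
    }


# Hypothetical scores: higher means stronger support for the gene.
polygenic_scores = {
    "locus_1": {"GENE_A": 2.4, "GENE_B": 0.3},
    "locus_2": {"GENE_C": 1.1, "GENE_D": 1.9},
}
locus_scores = {
    "locus_1": {"GENE_A": 0.8, "GENE_B": 0.1},
    "locus_2": {"GENE_C": 0.7, "GENE_D": 0.2},
}

prioritized = concordant_genes(polygenic_scores, locus_scores)
print(prioritized)
```

Requiring agreement trades recall for precision: locus_2 is dropped here because the two hypothetical methods disagree about its top gene.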

Genome-Wide Association Studies of CKD and Related Traits

Clinical Journal of the American Society of Nephrology, 2020, Vol 15 (11), pp. 1643-1656, doi:10.2215/cjn.00020120

Author(s): Adrienne Tin, Anna Köttgen

Keyword(s): Kidney Function, Complex Traits, Kidney Diseases, Association Studies, Genome Wide Association, Model Organisms, Genome Wide Association Studies, Genetic Loci, Genome Wide, Causal Genes

The past few years have seen major advances in genome-wide association studies (GWAS) of CKD and kidney function–related traits in several areas: increases in sample size from >100,000 to >1 million, enabling the discovery of >250 associated genetic loci that are highly reproducible; the inclusion of participants not only of European but also of non-European ancestries; and the use of advanced computational methods to integrate additional genomic and other unbiased, high-dimensional data to characterize the underlying genetic architecture and prioritize potentially causal genes and variants. Together with other large-scale biobank and genetic association studies of complex traits, these GWAS of kidney function–related traits have also provided novel insight into the relationship of kidney function to other diseases with respect to their genetic associations, genetic correlation, and directional relationships. A number of studies also included functional experiments using model organisms or cell lines to validate prioritized potentially causal genes and/or variants. In this review article, we will summarize these recent GWAS of CKD and kidney function–related traits, explain approaches for downstream characterization of associated genetic loci and the value of such computational follow-up analyses, and discuss related challenges along with potential solutions to ultimately enable improved treatment and prevention of kidney diseases through genetics.

Transcriptome-wide Association Study and eQTL colocalization identify potentially causal genes responsible for bone mineral density GWAS associations

2021, doi:10.1101/2021.10.12.464046

Author(s): Basel M Al-Barghouthi, Will T Rosenow, Kang-Ping Du, Jinho Heo, Robert Maynard, ...

Keyword(s): Bone Mineral Density, Bone Mineral, Complex Traits, Association Studies, Tissue Expression, Genome Wide Association Studies, Biological Processes, Mineral Density, Genome Wide, Causal Genes

Genome-wide association studies (GWASs) for bone mineral density (BMD) have identified over 1,100 associations to date. However, identifying causal genes implicated by such studies has been challenging. Recent advances in the development of transcriptome reference datasets and computational approaches such as transcriptome-wide association studies (TWASs) and expression quantitative trait loci (eQTL) colocalization have proven to be informative in identifying putatively causal genes underlying GWAS associations. Here, we used TWAS/eQTL colocalization in conjunction with transcriptomic data from the Genotype-Tissue Expression (GTEx) project to identify potentially causal genes for the largest BMD GWAS performed to date. Using this approach, we identified 512 genes as significant (Bonferroni <= 0.05) using both TWAS and eQTL colocalization. This set of genes was enriched for regulators of BMD and members of bone-relevant biological processes. To investigate the significance of our findings, we selected PPP6R3, the gene with the strongest support from our analysis that was not previously implicated in the regulation of BMD, for further investigation. We observed that Ppp6r3 deletion in mice decreased BMD. In this work, we provide an updated resource of putatively causal BMD genes and demonstrate that PPP6R3 is a putatively causal BMD GWAS gene. These data increase our understanding of the genetics of BMD and provide further evidence for the utility of combined TWAS/colocalization approaches in untangling the genetics of complex traits.
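
The idea of requiring support from both TWAS and colocalization can be sketched as a simple two-filter intersection. The example below is illustrative only: the gene labels and values are made up, the colocalization cutoff min_coloc is an assumption (the abstract does not state one), and the function names are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping at 1.0."""
    n_tests = len(p_values)
    return {gene: min(1.0, p * n_tests) for gene, p in p_values.items()}


def genes_supported_by_both(twas_p, coloc_prob, alpha=0.05, min_coloc=0.5):
    """Genes passing a Bonferroni-adjusted TWAS threshold AND a
    colocalization probability cutoff (min_coloc is an assumed value)."""
    adjusted = bonferroni_adjust(twas_p)
    return sorted(
        gene
        for gene, p in adjusted.items()
        if p <= alpha and coloc_prob.get(gene, 0.0) >= min_coloc
    )


# Hypothetical inputs: raw TWAS p-values and colocalization probabilities.
twas_p = {"GENE_A": 1e-8, "GENE_B": 0.02, "GENE_C": 1e-6, "GENE_D": 0.5}
coloc_prob = {"GENE_A": 0.9, "GENE_B": 0.8, "GENE_C": 0.1}

selected = genes_supported_by_both(twas_p, coloc_prob)
print(selected)
```

In this toy input, GENE_B fails the adjusted TWAS threshold (0.02 × 4 = 0.08) and GENE_C fails the colocalization cutoff, so only GENE_A survives both filters.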

A practical view of fine-mapping and gene prioritization in the post-genome-wide association era

Open Biology, 2020, Vol 10 (1), pp. 190221, doi:10.1098/rsob.190221, Cited By ~ 8

Author(s): R. V. Broekema, O. B. Bakker, I. H. Jonkers

Keyword(s): Fine Mapping, Complex Traits, Association Studies, Population Based, Gene Prioritization, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Underlying Mechanisms, The Impact

Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.

Faculty Opinions recommendation of Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2018, doi:10.3410/f.733803377.793550136

Author(s): Mohan Liu

Keyword(s): Effect Size, Complex Traits, Association Studies, Genome Wide Association, Genome Wide Association Studies, Size Distributions, Complex Effect, Genome Wide, Level Statistics

The open targets post-GWAS analysis pipeline

Bioinformatics, 2020, Vol 36 (9), pp. 2936-2937, doi:10.1093/bioinformatics/btaa020, Cited By ~ 4

Author(s): Gareth Peat, William Jones, Michael Nuhn, José Carlos Marugán, William Newell, ...

Keyword(s): Drug Targets, Gene Expression Regulation, Association Studies, Genome Wide Association Studies, Protein Coding, Data Resource, Coding Regions, Genome Wide, Causal Genes, Interactive Data

Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data.

Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource.

Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics, 2021, doi:10.1093/hmg/ddab060

Author(s): Robin N Beaumont, Isabelle K Mayne, Rachel M Freathy, Caroline F Wright

Keyword(s): Birth Weight, Statistical Power, Developmental Disorders, Association Studies, Later Life, Genome Wide Association Studies, Nucleotide Polymorphisms, Genome Wide, Common Genetic Variants, Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the gene closest to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, and it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.
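
A simulation-based proximity test of this kind can be sketched as follows. This is a deliberately simplified null, assumed for illustration: it resamples which genes carry the DD label on a single toy chromosome, which may differ from the simulations the authors actually ran. All positions and the function names are hypothetical.

```python
import random


def nearest_distance(position, gene_positions):
    """Distance from one SNP to the closest gene in the set."""
    return min(abs(position - gene) for gene in gene_positions)


def proximity_enrichment(snps, dd_genes, all_genes, n_sim=2000, seed=0):
    """Compare the mean SNP-to-nearest-DD-gene distance with a null built
    by drawing random gene sets of the same size from all genes."""
    rng = random.Random(seed)

    def mean_distance(genes):
        return sum(nearest_distance(s, genes) for s in snps) / len(snps)

    observed = mean_distance(dd_genes)
    as_close_or_closer = 0
    for _ in range(n_sim):
        sampled = rng.sample(all_genes, len(dd_genes))
        if mean_distance(sampled) <= observed:
            as_close_or_closer += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (as_close_or_closer + 1) / (n_sim + 1)
    return observed, p_value


# Toy chromosome: 50 genes spaced 100 kb apart, every tenth one a DD gene,
# and one GWAS SNP placed 1 kb from each DD gene.
all_genes = [i * 100_000 for i in range(50)]
dd_genes = all_genes[::10]
snps = [gene + 1_000 for gene in dd_genes]

observed, p_value = proximity_enrichment(snps, dd_genes, all_genes)
print(observed, p_value)
```

A small empirical p-value indicates that the SNPs sit closer to the labelled genes than to randomly chosen gene sets; a real analysis would also need to account for gene density and chromosome structure.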

Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of The Royal Society B Biological Sciences, 2016, Vol 283 (1835), pp. 20160569, doi:10.1098/rspb.2016.0569, Cited By ~ 52

Author(s): M. E. Goddard, K. E. Kemper, I. M. MacLeod, A. J. Chamberlain, B. J. Hayes

Keyword(s): Complex Traits, Genetic Architecture, Quantitative Traits, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Crop Breeding, Single Nucleotide, Genome Wide, Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
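
To make "fitting all SNP effects simultaneously under a prior" concrete, the sketch below uses the simplest case: a normal prior with a common variance, for which the posterior mean is a ridge regression estimate. This prior is swapped in for illustration and is not necessarily the one the authors advocate; the genotypes, phenotypes, the shrinkage value, and the function names are hypothetical. The solver is plain coordinate descent so that the example needs no external libraries.

```python
def center(column):
    """Subtract the mean so no intercept term is needed."""
    mean = sum(column) / len(column)
    return [value - mean for value in column]


def fit_snp_effects(genotypes, phenotype, shrinkage=1.0, sweeps=200):
    """Jointly estimate SNP effects by ridge regression.

    genotypes: one list of allele counts per SNP (columns).
    shrinkage: ridge penalty; under a normal prior it corresponds to the
               ratio of residual variance to per-SNP effect variance.
    """
    columns = [center(col) for col in genotypes]
    residual = center(phenotype)
    effects = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            # Correlate SNP j with the residual after adding back its own
            # current contribution, then shrink the estimate.
            numerator = sum(
                xi * (ri + xi * effects[j]) for xi, ri in zip(x, residual)
            )
            updated = numerator / (sum(xi * xi for xi in x) + shrinkage)
            change = updated - effects[j]
            residual = [ri - xi * change for xi, ri in zip(x, residual)]
            effects[j] = updated
    return effects


# Six individuals, three SNPs; the first two SNPs are correlated (in LD).
genotypes = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 1, 1, 2],
    [2, 0, 1, 1, 0, 2],
]
phenotype = [1.0, 2.1, 3.2, 0.9, 2.0, 3.1]

effects = fit_snp_effects(genotypes, phenotype)
print(effects)
```

Because the first two SNPs are correlated, the joint fit shares the signal between them instead of crediting each with the full marginal association, which is the practical difference from testing one SNP at a time.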

Exploring the predictive power of polygenic scores derived from genome-wide association studies: a study of 10 complex traits

Bioinformatics, 2017, pp. btw745, doi:10.1093/bioinformatics/btw745, Cited By ~ 8

Author(s): Hon-Cheong So, Pak C. Sham

Keyword(s): Complex Traits, Predictive Power, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Polygenic Scores

Comprehensive evaluation of mapping complex traits in wheat using genome-wide association studies

Molecular Breeding, 2021, Vol 42 (1), doi:10.1007/s11032-021-01272-7

Author(s): Dinesh K. Saini, Yuvraj Chopra, Jagmohan Singh, Karansher S. Sandhu, Anand Kumar, ...

Keyword(s): Complex Traits, Comprehensive Evaluation, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Mapping Complex Traits

Abstract 1443: Systems-Based Analysis of HDL Metabolism in a Mouse Intercross between Strains CAST and C57BL6/J (CxB)

Circulation, 2008, Vol 118 (suppl_18), doi:10.1161/circ.118.suppl_18.s_326-c

Author(s): Margarete Mehrabian, Charles Farber, Peter Langfelder, Anatole Ghazalpour, Zhiqiang Zhou, ...

Keyword(s): Complex Traits, Association Studies, Plasma Lipid, Heart Association, Hdl Cholesterol, Chow Diet, Genome Wide Association Studies, Genome Wide, Genomic Study, Hdl Metabolism

A recent meta-analysis of three large genome-wide association studies for HDL cholesterol levels revealed several highly significant associations, but altogether these explained less than 10% of the population variance of HDL. Since HDL levels are highly heritable, with heritability estimated at 50–70% in many studies, there are clearly many additional genes, and probably complex genetic and environmental interactions, involved in HDL metabolism. Thus, if “personalized medicine” is to become a reality, these complex factors must be addressed. Combined genetic-genomic approaches have rejuvenated the analysis of complex traits using mouse models, and here we report an integrative genomic study of HDL in a large mouse cross. We previously reported the identification of loci associated with HDL cholesterol concentrations using a CXB F2 intercross. We have now generated a much larger CXB cross, consisting of 438 mice, and have integrated genome-wide gene expression analysis of liver and adipose with quantitative trait locus (QTL) mapping and causality modeling. These studies were carried out on mice fed a low-fat chow diet and then switched to a high-fat “Western” diet. QTL analysis on the clinical traits using R/QTL (http://cran.r-project.org/) revealed a complex inheritance pattern with significant LOD scores at 9 loci, on chromosomes 1, 2, 4, 5, 8, 9, 10, 16 and 18. Of these loci, 6 (chr: 1, 4, 5, 10, 16, 18) were seen to be involved in genetic-dietary regulation of HDL cholesterol. Expression QTLs (eQTL) were determined using Agilent microarrays for 23,624 transcripts. Genes expressed within a 1-LOD support interval or correlated with HDL (p<2.7E-11) in both adipose and liver were identified. Using Network Edge Orienting (NEO) methods, causal relationships between the identified genes, related QTL peak markers and HDL levels were assessed. The genes were then ranked based on the NEO scores. In liver, the highest-ranked genes were associated with mitochondrial, ER and Golgi trafficking. In adipose, on the other hand, pathways associated with cell signaling, transcription regulation and protein ubiquitination were predicted to be causal for HDL levels. In conclusion, our results reveal a large number of novel pathways and candidate genes for plasma lipid metabolism. This research has received full or partial funding support from the American Heart Association, AHA Western States Affiliate (California, Nevada & Utah).
