Open Targets Genetics: An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci

Abstract: Genome-wide association studies (GWAS) have identified many variants robustly associated with complex traits, but identifying the gene(s) mediating such associations is a major challenge. Here we present an open resource that provides systematic fine-mapping and protein-coding gene prioritization across 133,441 published human GWAS loci. We integrate diverse data sources, including genetics (from GWAS Catalog and UK Biobank) as well as transcriptomic, proteomic and epigenomic data across many tissues and cell types. We also provide systematic disease-disease and disease-molecular trait colocalization results across 92 cell types and tissues and identify 729 loci fine-mapped to a single coding causal variant and colocalized with a single gene. Using 445 gold-standard curated GWAS loci, we trained a machine learning model on the fine-mapped genetics and functional genomics data to distinguish causal genes from background genes at the same loci; the model outperforms a naive distance-based model. Genes prioritized by our model are enriched for known approved drug targets (OR = 8.1, 95% CI: [5.7, 11.5]). These results will be regularly updated and are publicly available through a web portal, Open Targets Genetics (OTG, http://genetics.opentargets.org), enabling users to easily prioritize genes at disease-associated loci and assess their potential as drug targets.
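The enrichment statistic quoted in this abstract is an odds ratio with a 95% confidence interval. As a minimal sketch of how such a figure is commonly computed from a 2x2 table (the Wald interval on the log odds ratio), assuming hypothetical counts that are not taken from the paper and a function name, `odds_ratio_ci`, invented for illustration:

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: prioritized genes that are approved drug targets
    b: prioritized genes that are not drug targets
    c: non-prioritized genes that are approved drug targets
    d: non-prioritized genes that are not drug targets
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (NOT the paper's data).
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

The abstract does not state which interval method the authors used, so the Wald interval here is only one plausible choice; Fisher's exact test is a common alternative for sparse tables.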

The Open Targets post-GWAS analysis pipeline

Bioinformatics; DOI: 10.1093/bioinformatics/btaa020; 2020; Vol 36 (9); pp. 2936-2937; Cited By ~ 4

Author(s): Gareth Peat, William Jones, Michael Nuhn, José Carlos Marugán, William Newell, ...

Keyword(s): Drug Targets, Gene Expression Regulation, Association Studies, Genome Wide Association Studies, Protein Coding, Data Resource, Coding Regions, Genome Wide, Causal Genes, Interactive Data

Abstract. Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Genetic associations of protein-coding variants in human disease

DOI: 10.1101/2021.10.14.21265023; 2021

Author(s): Benjamin B Sun, Mitja I Kurki, Christopher N Foley, Asma Mechakra, Chia-Yen Chen, ...

Keyword(s): Genetic Variants, Human Disease, Drug Targets, Association Studies, Single Gene, Clinical Stage, Genome Wide Association Studies, Genetic Associations, Protein Coding, Disease Associations

Genome-wide association studies (GWAS) have identified thousands of genetic variants linked to the risk of human disease. However, GWAS have thus far remained largely underpowered to identify associations in the rare and low frequency allelic spectrum and have lacked the resolution to trace causal mechanisms to underlying genes. Here, we combined whole exome sequencing in 392,814 UK Biobank participants with imputed genotypes from 260,405 FinnGen participants (653,219 total individuals) to conduct association meta-analyses for 744 disease endpoints across the protein-coding allelic frequency spectrum, bridging the gap between common and rare variant studies. We identified 975 associations, with more than one-third of our findings not reported previously. We demonstrate population-level relevance for mutations previously ascribed to causing single-gene disorders, map GWAS associations to likely causal genes, explain disease mechanisms, and systematically relate disease associations to levels of 117 biomarkers and clinical-stage drug targets. Combining sequencing and genotyping in two population biobanks allowed us to benefit from increased power to detect and explain disease associations, validate findings through replication and propose medical actionability for rare genetic variants. Our study provides a compendium of protein-coding variant associations for future insights into disease biology and drug discovery.
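The abstract describes association meta-analyses across two biobanks but does not say which estimator was used. As a hedged illustration, the sketch below shows fixed-effect inverse-variance weighting, one common way to combine per-cohort effect estimates. The effect sizes and standard errors are hypothetical, and `fixed_effect_meta` is a name invented for this example; only the cohort sizes come from the abstract.

```python
import math


def fixed_effect_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-cohort estimates.

    Returns the pooled effect, its standard error, and the z-score.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Cohort sizes as stated in the abstract.
n_ukb, n_finngen = 392_814, 260_405
n_total = n_ukb + n_finngen  # matches the stated 653,219 total individuals

# Hypothetical per-cohort log odds ratios and standard errors for one variant.
beta, se, z = fixed_effect_meta(betas=[0.10, 0.14], ses=[0.02, 0.04])
print(f"pooled beta = {beta:.3f}, se = {se:.4f}, z = {z:.2f}, N = {n_total}")
```

The pooled standard error is smaller than either cohort's own, which is the gain in power that combining biobanks is meant to deliver.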

Perspective of the GEMSTONE Consortium on Current and Future Approaches to Functional Validation for Skeletal Genetic Disease Using Cellular, Molecular and Animal-Modeling Techniques

Frontiers in Endocrinology; DOI: 10.3389/fendo.2021.731217; 2021; Vol 12

Author(s): Martina Rauner, Ines Foessl, Melissa M. Formosa, Erika Kague, Vid Prijatelj, ...

Keyword(s): Resource Sharing, Complex Traits, Cellular Localization, Target Genes, Mission Statement, Association Studies, Repetitive Sequences, Genome Wide Association Studies, Causal Genes

The availability of large human datasets for genome-wide association studies (GWAS) and the advancement of sequencing technologies have boosted the identification of genetic variants in complex and rare diseases in the skeletal field. Yet, interpreting results from human association studies remains a challenge. To bridge the gap between genetic association and causality, a systematic functional investigation is necessary. Multiple unknowns exist for putative causal genes, including cellular localization of the molecular function. Intermediate traits (“endophenotypes”), e.g. molecular quantitative trait loci (molQTLs), are needed to identify mechanisms of underlying associations. Furthermore, index variants often reside in non-coding regions of the genome and are therefore challenging to interpret. Knowledge of non-coding variance (e.g. ncRNAs), repetitive sequences, and regulatory interactions between enhancers and their target genes is central for understanding causal genes in skeletal conditions. Animal models with deep skeletal phenotyping and cell culture models have already facilitated fine mapping of some association signals, elucidated gene mechanisms, and revealed disease-relevant biology. However, to accelerate research towards bridging the current gap between association and causality in skeletal diseases, alternative in vivo platforms need to be used and developed in parallel with the current -omics and traditional in vivo resources. Therefore, we argue that as a field we need to establish resource-sharing standards to collectively address complex research questions. These standards will promote data integration from various -omics technologies and functional dissection of human complex traits. In this mission statement, we review the currently available resources and as a group propose a consensus to facilitate resource sharing using existing and future resources. Such coordination efforts will maximize the acquisition of knowledge from different approaches and thus reduce redundancy and duplication of resources. These measures will help to understand the pathogenesis of osteoporosis and other skeletal diseases towards defining new and more efficient therapeutic targets.

Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases

DOI: 10.1101/2020.09.08.20190561; 2020; Cited By ~ 1

Author(s): Elle M Weeks, Jacob C Ulirsch, Nathan Y Cheng, Brian L Trippe, Rebecca S Fine, ...

Keyword(s): Complex Traits, Association Studies, Gene Prioritization, Protein Interaction Data, Large Set, Genome Wide Association Studies, Protein Protein Interaction, Genome Wide, Causal Genes, Red Blood Cell Count

Genome-wide association studies (GWAS) are a valuable tool for understanding the biology of complex traits, but the associations found rarely point directly to causal genes. Here, we introduce a new method to identify the causal genes by integrating GWAS summary statistics with gene expression, biological pathway, and predicted protein-protein interaction data. We further propose an approach that effectively leverages both polygenic and locus-specific genetic signals by combining results across multiple gene prioritization methods, increasing confidence in prioritized genes. Using a large set of gold standard genes to evaluate our approach, we prioritize 8,402 unique gene-trait pairs with greater than 75% estimated precision across 113 complex traits and diseases, including known genes such as SORT1 for LDL cholesterol, SMIM1 for red blood cell count, and DRD2 for schizophrenia, as well as novel genes such as TTC39B for cholelithiasis. Our results demonstrate that a polygenic approach is a powerful tool for gene prioritization and, in combination with locus-specific signal, improves upon existing methods.

A health disparities study of MicroRNA-146a expression in prostate cancer samples derived from African American and European American patients

Journal of Solid Tumors; DOI: 10.5430/jst.v10n2p1; 2020; Vol 10 (2); pp. 1

Author(s): Monet Stevenson, Narendra Narendra Banerjee, Narendra Banerjee, Kuldeep Rawat, Lin Chen, ...

Keyword(s): Prostate Cancer, African American, Association Studies, Single Gene, Micro Rna, Genome Wide Association Studies, Racial Groups, Protein Coding, Rna Molecules, Prostate Carcinogenesis

Considering the prevalence of prostate cancer worldwide, tools, technologies, and biomarkers are needed that help in early detection of the disease and discriminate between different races and ethnic groups. Single-gene analyses and genome-wide association studies have identified few biomarkers; however, the drivers of prostate cancer remain unknown in the majority of patients. In those cases where a genetic association has been identified, the genes confer only a modest risk of this cancer, making them less relevant for risk counseling and disease management. There is a need for additional biomarkers for diagnosis and prognosis of prostate cancer. MicroRNAs are a class of non-protein-coding RNA molecules that are frequently dysregulated in different cancers, including prostate cancer, and show promise as diagnostic biomarkers and targets for therapy. Here we describe the role of microRNA-146a (miR-146a), which may serve as a diagnostic and prognostic marker for prostate cancer, as indicated by the data presented in this report. In addition, a pilot study indicated differential expression of miR-146a in prostate cancer cell lines and tissues from different racial groups. Reduced expression of miR-146a was observed in African American tumor tissues compared to those from European Whites. This report provides novel insight into prostate carcinogenesis.

Inferring relevant tissues and cell types for complex traits in genome-wide association studies

DOI: 10.1101/2021.06.09.447805; 2021

Author(s): Rujin Wang, Danyu Lin, Yuchao Jiang

Keyword(s): Single Cell, Complex Traits, Association Studies, Cell Types, Genome Wide Association, Genome Wide Association Studies, Cell Type, Disease Etiology, Genome Wide, Cell Type Specific

More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific omics measurements from single-cell sequencing. We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant tissues or cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We extend our framework to single-cell transcriptomic data and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and single-cell datasets and further validated using PubMed search and existing bulk case-control testing results.
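The abstract mentions generalized least squares (GLS) as the tool for relating gene-level statistics to cell-type measurements while accounting for correlation between genes. The sketch below shows only the generic GLS estimator, beta = (X' S^-1 X)^-1 X' S^-1 y, in plain Python; it is not the EPIC implementation. The design matrix, the response values, the correlation matrix and the helper names `solve` and `gls` are all assumptions made for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix; copies the rows so the inputs are not modified.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gls(X, y, sigma):
    """GLS coefficients from the normal equations (X' S^-1 X) beta = X' S^-1 y."""
    n, p = len(X), len(X[0])
    sinv_y = solve(sigma, y)  # S^-1 y
    # S^-1 X, computed one column of X at a time.
    sinv_cols = [solve(sigma, [X[i][j] for i in range(n)]) for j in range(p)]
    lhs = [[sum(X[i][a] * sinv_cols[b][i] for i in range(n)) for b in range(p)]
           for a in range(p)]
    rhs = [sum(X[i][a] * sinv_y[i] for i in range(n)) for a in range(p)]
    return solve(lhs, rhs)


# Hypothetical data: four genes, an intercept plus one cell-type expression
# covariate, and a gene-gene correlation matrix with two correlated pairs.
X = [[1.0, 0.1], [1.0, 0.5], [1.0, 0.9], [1.0, 0.3]]
y = [1.2, 2.0, 2.8, 1.6]  # constructed as 1 + 2 * covariate
sigma = [[1.0, 0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.3],
         [0.0, 0.0, 0.3, 1.0]]
beta = gls(X, y, sigma)
print(beta)
```

Because the toy response is an exact linear function of the covariate, the estimator recovers the generating intercept and slope whatever the correlation matrix; with noisy data the weighting by the inverse correlation matrix is what distinguishes GLS from ordinary least squares.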

True causal effect size heterogeneity is not required to explain trans-ethnic differences in GWAS signals

DOI: 10.1101/085092; 2016; Cited By ~ 3

Author(s): Daniela Zanetti, Michael E. Weale

Keyword(s): Ethnic Differences, Effect Size, Complex Traits, Causal Effect, Association Studies, African Ancestry, Causal Variant, Genome Wide Association Studies, Population Differences, Relative Risks

Abstract: Through genome-wide association studies (GWASs), researchers have identified hundreds of genetic variants associated with particular complex traits. Previous studies have compared the pattern of association signals across different populations in real data, and these have detected differences in the strength and sometimes even the direction of GWAS signals. These differences could be due to a combination of (1) lack of power (insufficient sample sizes); (2) minor allele frequency (MAF) differences (again affecting power); (3) linkage disequilibrium (LD) differences (affecting power to ‘tag’ the causal variant); and (4) true differences in causal variant effect sizes (defined by relative risks). In the present work, we sought to assess whether the first three of these reasons are sufficient on their own to explain the observed incidence of trans-ethnic differences in replications of GWAS signals, or whether the fourth reason is also required. We simulated case-control data of European, Asian and African ancestry, drawing on observed MAF and LD patterns seen in the 1000-Genomes reference dataset and assuming the true causal relative risks were the same in all three populations. We found that a combination of Euro-centric SNP selection and between-population differences in LD, accentuated by the lower SNP density typical of older GWAS panels, was sufficient to explain the rate of trans-ethnic differences previously reported, without the need to assume between-population differences in true causal SNP effect size. This suggests a cross-population consistency that has implications for our understanding of the interplay between genetics and environment in the aetiology of complex human diseases.
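Reason (3) in this abstract, LD differences affecting the power to tag the causal variant, can be made concrete with a standard approximation: the expected z-score at a tag SNP is the causal variant's expected z-score scaled by |r|, so the chi-square non-centrality scales by r². The sketch below is not the authors' simulation. It is a small power calculation under that approximation, with a hypothetical causal z-score and hypothetical r² values per population; the function names are invented for illustration.

```python
from statistics import NormalDist


def power_two_sided(mean_z, alpha=5e-8):
    """Power of a two-sided z-test whose statistic is N(mean_z, 1)."""
    nd = NormalDist()
    crit = nd.inv_cdf(1 - alpha / 2)
    return (1 - nd.cdf(crit - mean_z)) + nd.cdf(-crit - mean_z)


def tag_power(z_causal, r2, alpha=5e-8):
    """Power at a tag SNP whose squared correlation with the causal SNP is r2."""
    return power_two_sided(z_causal * r2 ** 0.5, alpha)


# Hypothetical values: the same causal effect in every population, but the
# genotyped tag SNP captures it less well where LD is weaker.
z_causal = 6.0
powers = {r2: tag_power(z_causal, r2) for r2 in (1.0, 0.8, 0.4)}
for r2, p in powers.items():
    print(f"r2 = {r2:.1f}: power = {p:.3f}")
```

With the causal effect held fixed, power at the tag SNP drops as r² falls, which is the mechanism by which a signal can fail to replicate across populations without any difference in true effect size.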

Genome-Wide Association Studies of CKD and Related Traits

Clinical Journal of the American Society of Nephrology; DOI: 10.2215/cjn.00020120; 2020; Vol 15 (11); pp. 1643-1656

Author(s): Adrienne Tin, Anna Köttgen

Keyword(s): Kidney Function, Complex Traits, Kidney Diseases, Association Studies, Genome Wide Association, Model Organisms, Genome Wide Association Studies, Genetic Loci, Genome Wide, Causal Genes

The past few years have seen major advances in genome-wide association studies (GWAS) of CKD and kidney function–related traits in several areas: increases in sample size from >100,000 to >1 million, enabling the discovery of >250 associated genetic loci that are highly reproducible; the inclusion of participants not only of European but also of non-European ancestries; and the use of advanced computational methods to integrate additional genomic and other unbiased, high-dimensional data to characterize the underlying genetic architecture and prioritize potentially causal genes and variants. Together with other large-scale biobank and genetic association studies of complex traits, these GWAS of kidney function–related traits have also provided novel insight into the relationship of kidney function to other diseases with respect to their genetic associations, genetic correlation, and directional relationships. A number of studies also included functional experiments using model organisms or cell lines to validate prioritized potentially causal genes and/or variants. In this review article, we will summarize these recent GWAS of CKD and kidney function–related traits, explain approaches for downstream characterization of associated genetic loci and the value of such computational follow-up analyses, and discuss related challenges along with potential solutions to ultimately enable improved treatment and prevention of kidney diseases through genetics.

Effect sizes of causal variants for gene expression and complex traits differ between populations

DOI: 10.1101/2021.12.06.471235; 2021

Author(s): Roshni A. Patel, Shaila A. Musharoff, Jeffrey P. Spence, Harold Pimentel, Catherine Tcheandjieu, ...

Keyword(s): Gene Expression, Complex Traits, Association Studies, Causal Variant, Effect Sizes, European Ancestry, Genome Wide Association Studies, Polygenic Scores, Causal Variants, Variant Effect

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.

Transcriptome-wide Association Study and eQTL colocalization identify potentially causal genes responsible for bone mineral density GWAS associations

DOI: 10.1101/2021.10.12.464046; 2021

Author(s): Basel M Al-Barghouthi, Will T Rosenow, Kang-Ping Du, Jinho Heo, Robert Maynard, ...

Keyword(s): Bone Mineral Density, Bone Mineral, Complex Traits, Association Studies, Tissue Expression, Genome Wide Association Studies, Biological Processes, Mineral Density, Genome Wide, Causal Genes

Genome-wide association studies (GWASs) for bone mineral density (BMD) have identified over 1,100 associations to date. However, identifying causal genes implicated by such studies has been challenging. Recent advances in the development of transcriptome reference datasets and computational approaches such as transcriptome-wide association studies (TWASs) and expression quantitative trait loci (eQTL) colocalization have proven to be informative in identifying putatively causal genes underlying GWAS associations. Here, we used TWAS/eQTL colocalization in conjunction with transcriptomic data from the Genotype-Tissue Expression (GTEx) project to identify potentially causal genes for the largest BMD GWAS performed to date. Using this approach, we identified 512 genes as significant (Bonferroni-adjusted p <= 0.05) using both TWAS and eQTL colocalization. This set of genes was enriched for regulators of BMD and members of bone-relevant biological processes. To investigate the significance of our findings, we selected PPP6R3, the gene with the strongest support from our analysis that was not previously implicated in the regulation of BMD, for further investigation. We observed that Ppp6r3 deletion in mice decreased BMD. In this work, we provide an updated resource of putatively causal BMD genes and demonstrate that PPP6R3 is a putatively causal BMD GWAS gene. These data increase our understanding of the genetics of BMD and provide further evidence for the utility of combined TWAS/colocalization approaches in untangling the genetics of complex traits.
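The significance threshold in this abstract is a Bonferroni correction. As a minimal sketch of that correction only, with hypothetical gene names and p-values and a helper name, `bonferroni`, invented for illustration (the paper's actual test statistics and its additional colocalization criterion are not reproduced here):

```python
def bonferroni(pvalues, alpha=0.05):
    """Return (adjusted p-value, is_significant) for each input p-value.

    Each p-value is multiplied by the number of tests and capped at 1.
    """
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]
    return [(p_adj, p_adj <= alpha) for p_adj in adjusted]


# Hypothetical gene-level p-values (placeholders, not results from the paper).
pvals = {"GENE_A": 1e-6, "GENE_B": 0.004, "GENE_C": 0.03, "GENE_D": 0.5}
results = dict(zip(pvals, bonferroni(list(pvals.values()))))
significant = [gene for gene, (_, passed) in results.items() if passed]
print(significant)
```

In a transcriptome-wide setting the number of tests is the number of gene-tissue pairs evaluated, so the per-test threshold becomes far stricter than in this four-gene toy example.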
