Colocalization of GWAS and eQTL Signals Detects Target Genes

2016
Author(s):
Farhad Hormozdiari
Martijn van de Bunt
Ayellet V. Segrè
Xiao Li
Jong Wha J Joo
...

Abstract The vast majority of genome-wide association studies (GWAS) risk loci fall in non-coding regions of the genome. One possible hypothesis is that these GWAS risk loci alter the individual’s disease risk through their effect on gene expression in different tissues. In order to understand the mechanisms driving a GWAS risk locus, it is helpful to determine which gene is affected in specific tissue types. For example, the relevant gene and tissue may play a role in the disease mechanism if the same variant responsible for a GWAS locus also affects gene expression. Identifying whether or not the same variant is causal in both GWAS and eQTL studies is challenging due to the uncertainty induced by linkage disequilibrium (LD) and the fact that some loci harbor multiple causal variants. However, current methods that address this problem assume that each locus contains a single causal variant. In this paper, we present a new method, eCAVIAR, that is capable of accounting for LD while computing the quantity we refer to as the colocalization posterior probability (CLPP). The CLPP is the probability that the same variant is responsible for both the GWAS and eQTL signal. eCAVIAR has several key advantages. First, our method can account for more than one causal variant at any locus. Second, it can leverage summary statistics without accessing the individual genotype data. We use both simulated and real datasets to demonstrate the utility of our method. Utilizing publicly available eQTL data on 45 different tissues, we demonstrate that computing CLPP can prioritize likely relevant tissues and target genes for a set of glucose- and insulin-related trait loci. eCAVIAR is available at http://genetics.cs.ucla.edu/caviar/
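To make the CLPP concrete, here is a minimal sketch in Python. It assumes that per-variant posterior causal probabilities have already been obtained separately for the GWAS and the eQTL study, and that the two studies are independent, so the probability that a variant is causal in both is the product of the two. The function name `clpp` and the input values are hypothetical, and the sketch does not reproduce the LD-aware fine-mapping step that yields the posteriors in the first place.

```python
def clpp(post_gwas, post_eqtl):
    """Per-variant colocalization posterior probability (illustrative).

    post_gwas[j] and post_eqtl[j] are assumed to be the posterior
    probabilities that variant j is causal in the GWAS and in the eQTL
    study. Under the independence assumption stated above, the
    probability that variant j is causal in both is their product.
    """
    if len(post_gwas) != len(post_eqtl):
        raise ValueError("both studies must cover the same variants")
    for p in list(post_gwas) + list(post_eqtl):
        if not 0.0 <= p <= 1.0:
            raise ValueError("posteriors must lie in [0, 1]")
    return [g * e for g, e in zip(post_gwas, post_eqtl)]


# Hypothetical posteriors for three variants at one locus.
gwas = [0.70, 0.25, 0.05]
eqtl = [0.60, 0.10, 0.30]
scores = clpp(gwas, eqtl)
best = max(range(len(scores)), key=lambda j: scores[j])
print(best, scores)
```

In this toy input the first variant has the highest score because it is well supported in both studies. A variant that is strong in only one study receives a low CLPP.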

2019
Author(s):
Jing Yang
Amanda McGovern
Paul Martin
Kate Duffus
Xiangyu Ge
...

Abstract Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T-cells over 24 hours, to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of T-cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease associated loci, including FOXO1, which we confirm using CRISPR/Cas9.


2020
Vol 117 (26)
pp. 15028-15035
Author(s):
Ronald Yurko
Max G’Sell
Kathryn Roeder
Bernie Devlin

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
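For orientation, the kind of "simple multiple testing correction" that covariate-aware methods aim to improve on can be sketched with the Benjamini-Hochberg step-up procedure. This is not AdaPT and uses no covariates; it is a baseline chosen here for illustration, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Standard Benjamini-Hochberg step-up rule: sort the P values, find the
    largest rank k with p_(k) <= alpha * k / m, and reject the k smallest.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical P values for four SNPs.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], alpha=0.05))
```

Every hypothesis is treated identically here. The abstract's point is that auxiliary information, such as eQTL effect sizes, can instead be used to prioritize some tests over others while keeping error-rate guarantees.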


2017
Vol 242 (13)
pp. 1325-1334
Author(s):
Yizhou Zhu
Cagdas Tazearslan
Yousin Suh

Genome-wide association studies have shown that the vast majority of disease-associated variants reside in the non-coding regions of the genome, suggesting that gene regulatory changes contribute to disease risk. Identifying truly causal non-coding variants and their affected target genes remains challenging but is a critical step in translating genetic associations into molecular mechanisms and ultimately clinical applications. Here we review genomic/epigenomic resources and in silico tools that can be used to identify causal non-coding variants and experimental strategies to validate their functionalities.

Impact statement: Most signals from genome-wide association studies (GWASs) map to the non-coding genome, and functional interpretation of these associations has remained challenging. We reviewed recent progress in methodologies for studying the non-coding genome and argued that no single approach allows one to effectively identify the causal regulatory variants from GWAS results. By illustrating the advantages and limitations of each method, our review potentially provides a guideline for taking a combinatorial approach to accurately predict, prioritize, and eventually experimentally validate the causal variants.


2015
Author(s):
Oriol Canela-Xandri
Konrad Rawlik
John A. Woolliams
Albert Tenesa

Genome-wide association studies (GWAS) promised to translate their findings into clinically beneficial improvements of patient management by tailoring disease management to the individual through the prediction of disease risk. However, the ability to translate genetic findings from GWAS into predictive tools that are of clinical utility and which may inform clinical practice has, so far, been encouraging but limited. Here we propose to use a more powerful statistical approach that enables the prediction of multiple medically relevant phenotypes without the costs associated with developing a genetic test for each of them. As a proof of principle, we used a common panel of 319,038 SNPs to train the prediction models in 114,264 unrelated White-British individuals for height and four obesity-related traits (body mass index, basal metabolic rate, body fat percentage, and waist-to-hip ratio). We obtained prediction accuracies that ranged between 46% and 75% of the maximum achievable given their explained heritable component. This represents an improvement of up to 75% over the phenotypic variance explained by the predictors developed through large collaborations, which used more than twice as many training samples. Across-population predictions in White non-British individuals were similar to those of White-British individuals, whilst those in Asian and Black individuals were informative but less accurate. The genotyping of circa 500,000 UK Biobank participants will yield predictions ranging between 66% and 83% of the maximum. We anticipate that our models and a common panel of genetic markers, which can be used across multiple traits and diseases, will be the starting point to tailor disease management to the individual. Ultimately, we will be able to capitalise on whole-genome sequence and environmental risk factors to realise the full potential of genomic medicine.


2021
Author(s):
Steven Gazal
Omer Weissbrod
Farhad Hormozdiari
Kushal Dey
Joseph Nasser
...

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies. 


2018
Author(s):
Yeda Wu
Enda M. Byrne
Zhili Zheng
Kathryn E. Kemper
Loic Yengo
...

Abstract It is common that one medication is prescribed for several indications, and conversely that several medications are prescribed for the same indication, suggesting a complex biological network for disease risk and its relationship with pharmacological function. Genome-wide association studies (GWASs) of medication use may contribute to understanding disease etiology, generating new leads relevant for drug discovery, and quantifying prospects for precision medicine. We conducted GWAS to profile self-reported medication use from 23 categories in approximately 320,000 individuals from the UK Biobank. A total of 505 independent genetic loci that met stringent criteria for statistical significance were identified. We investigated the implications of these GWAS findings in relation to biological mechanism, drug target identification and genetic risk stratification of disease. Amongst the medication-associated genes were 16 known therapeutic-effect target genes for medications from 9 categories.


2020
Author(s):
Xi Peng
Joel S. Bader
Dimitrios Avramopoulos

Abstract Variants identified by genome-wide association studies (GWAS) are often expression quantitative trait loci (eQTLs), suggesting they are proxies or are themselves regulatory. Across many datasets, analyses show that variants often affect multiple genes. Lacking data on many tissue types, developmental time points and homogeneous cell types, the extent of this one-to-many relationship is underestimated. This raises the question of whether a disease eQTL target gene explains the genetic association or is a bystander, and puts into question the direction of the effect of expression on risk, since the many variant-regulated genes may have opposing effects, imperfectly balancing each other. We used two brain gene expression datasets (CommonMind and BrainSeq) for mediation analysis of schizophrenia-associated variants. We confirm that eQTL target genes often mediate risk but the direction in which expression affects risk is often different from that in which the risk allele changes expression. Of 38 mediator genes significant in both datasets, 33 showed consistent mediation direction (Chi2 test P = 6 × 10⁻⁶). One might expect that the expression would correlate with the risk allele in the same direction it correlates with disease. For 15 of these 33 (45%), however, the expression change associated with the risk allele was protective, suggesting the likely presence of other target genes with overriding effects. Our results identify specific risk-mediating genes and suggest caution in interpreting the biological consequences of targeted modifications of gene expression, as not all eQTL targets may be relevant to disease, while those that are may have directions different from those expected.
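The reported consistency count can be checked with a short calculation. The sketch below assumes the test is a one-degree-of-freedom chi-square goodness-of-fit test of 33 consistent versus 5 inconsistent genes against an equal 50:50 split; the abstract does not state the exact test configuration, so this is an assumption. The function name is hypothetical.

```python
import math


def chi2_two_category(count_a, count_b):
    """Chi-square goodness-of-fit against an equal split (1 degree of freedom).

    Returns the test statistic and the upper-tail P value. For one degree
    of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    """
    expected = (count_a + count_b) / 2
    stat = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# 33 of 38 mediator genes showed a consistent direction across datasets.
stat, p_value = chi2_two_category(33, 5)
print(f"chi2 = {stat:.2f}, P = {p_value:.1e}")
```

Under this assumed null, the statistic is about 20.6 and the P value is on the order of 10⁻⁶, which agrees with the figure quoted in the abstract.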


2021
Author(s):
Roshni A. Patel
Shaila A. Musharoff
Jeffrey P. Spence
Harold Pimentel
Catherine Tcheandjieu
...

Despite the growing number of genome-wide association studies (GWAS) for complex traits, it remains unclear whether effect sizes of causal genetic variants differ between populations. In principle, effect sizes of causal variants could differ between populations due to gene-by-gene or gene-by-environment interactions. However, comparing causal variant effect sizes is challenging: it is difficult to know which variants are causal, and comparisons of variant effect sizes are confounded by differences in linkage disequilibrium (LD) structure between ancestries. Here, we develop a method to assess causal variant effect size differences that overcomes these limitations. Specifically, we leverage the fact that segments of European ancestry shared between European-American and admixed African-American individuals have similar LD structure, allowing for unbiased comparisons of variant effect sizes in European ancestry segments. We apply our method to two types of traits: gene expression and low-density lipoprotein cholesterol (LDL-C). We find that causal variant effect sizes for gene expression are significantly different between European-Americans and African-Americans; for LDL-C, we observe a similar point estimate although this is not significant, likely due to lower statistical power. Cross-population differences in variant effect sizes highlight the role of genetic interactions in trait architecture and will contribute to the poor portability of polygenic scores across populations, reinforcing the importance of conducting GWAS on individuals of diverse ancestries and environments.


2019
Author(s):
Ronald Yurko
Max G’Sell
Kathryn Roeder
Bernie Devlin

Abstract To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive p-value thresholding (Lei & Fithian 2018, AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association p-values play the role of the primary data for AdaPT; SNPs are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex (Werling et al. 2019). We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.


2020
Vol 48 (W1)
pp. W193-W199
Author(s):
Nina Baumgarten
Dennis Hecker
Sivarajan Karunanithi
Florian Schmidt
Markus List
...

Abstract A current challenge in genomics is to interpret non-coding regions and their role in transcriptional regulation of possibly distant target genes. Genome-wide association studies show that a large part of genomic variants are found in those non-coding regions, but their mechanisms of gene regulation are often unknown. An additional challenge is to reliably identify the target genes of the regulatory regions, which is an essential step in understanding their impact on gene expression. Here we present the EpiRegio web server, a resource of regulatory elements (REMs). REMs are genomic regions that exhibit variations in their chromatin accessibility profile associated with changes in expression of their target genes. EpiRegio incorporates both epigenomic and gene expression data for various human primary cell types and tissues, providing an integrated view of REMs in the genome. Our web server allows the analysis of genes and their associated REMs, including the REM’s activity and its estimated cell type-specific contribution to its target gene’s expression. Further, it is possible to explore genomic regions for their regulatory potential and to investigate overlapping REMs, thereby dissecting regions of large epigenomic complexity. EpiRegio allows programmatic access through a REST API and is freely available at https://epiregio.de/.

