Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk

2017 ◽  
Author(s):  
Yakir A Reshef ◽  
Hilary K Finucane ◽  
David R Kelley ◽  
Alexander Gusev ◽  
Dylan Kotliar ◽  
...  

Abstract Biological interpretation of GWAS data frequently involves analyzing unsigned genomic annotations comprising SNPs involved in a biological process and assessing enrichment for disease signal. However, it is often possible to generate signed annotations quantifying whether each SNP allele promotes or hinders a biological process, e.g., binding of a transcription factor (TF). Directional effects of such annotations on disease risk enable stronger statements about causal mechanisms of disease than enrichments of corresponding unsigned annotations. Here we introduce a new method, signed LD profile regression, for detecting such directional effects using GWAS summary statistics, and we apply the method using 382 signed annotations reflecting predicted TF binding. We show via theory and simulations that our method is well-powered and is well-calibrated even when TF binding sites co-localize with other enriched regulatory elements, which can confound unsigned enrichment methods. We further validate our method by showing that it recovers known transcriptional regulators when applied to molecular QTL in blood. We then apply our method to eQTL in 48 GTEx tissues, identifying 651 distinct TF-tissue expression associations at per-tissue FDR < 5%, including 30 associations with robust evidence of tissue specificity. Finally, we apply our method to 46 diseases and complex traits (average N = 289,617) and identify 77 annotation-trait associations at per-trait FDR < 5% representing 12 independent TF-trait associations, and we conduct gene-set enrichment analyses to characterize the underlying transcriptional programs. Our results implicate new causal disease genes (including causal genes at known GWAS loci), and in some cases suggest a detailed mechanism for a causal gene’s effect on disease. Our method provides a new way to leverage functional data to draw inferences about disease etiology.

2019 ◽  
Vol 28 (17) ◽  
pp. 2976-2986 ◽  
Author(s):  
Irfahan Kassam ◽  
Yang Wu ◽  
Jian Yang ◽  
Peter M Visscher ◽  
Allan F McRae

Abstract Despite extensive sex differences in human complex traits and disease, the male and female genomes differ only in the sex chromosomes. This implies that most sex-differentiated traits are the result of differences in the expression of genes that are common to both sexes. While sex differences in gene expression have been observed in a range of different tissues, the biological mechanisms for tissue-specific sex differences (TSSDs) in gene expression are not well understood. A total of 30 640 autosomal and 1021 X-linked transcripts were tested for heterogeneity in sex difference effect sizes in n = 617 individuals across 40 tissue types in Genotype–Tissue Expression (GTEx). This identified 65 autosomal and 66 X-linked TSSD transcripts (corresponding to unique genes) at a stringent significance threshold. Results for X-linked TSSD transcripts showed mainly concordant direction of sex differences across tissues and replicate previous findings. Autosomal TSSD transcripts had mainly discordant direction of sex differences across tissues. The top cis-expression quantitative trait loci (eQTLs) across tissues for autosomal TSSD transcripts are located a similar distance away from the nearest androgen and estrogen binding motifs and the nearest enhancer, as compared to cis-eQTLs for transcripts with stable sex differences in gene expression across tissue types. Enhancer regions that overlap top cis-eQTLs for TSSD transcripts, however, were found to be more dispersed across tissues. These observations suggest that androgen and estrogen regulatory elements in a cis region may play a common role in sex differences in gene expression, but TSSD in gene expression may additionally be due to causal variants located in tissue-specific enhancer regions.


2021 ◽  
Author(s):  
Steven Gazal ◽  
Omer Weissbrod ◽  
Farhad Hormozdiari ◽  
Kushal Dey ◽  
Joseph Nasser ◽  
...  

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies. 


2020 ◽  
Author(s):  
Guojun Hou ◽  
Isaac T.W. Harley ◽  
Xiaoming Lu ◽  
Tian Zhou ◽  
Ning Xu ◽  
...  

Abstract The human genome contains millions of putative regulatory elements, which regulate gene expression. We are just beginning to understand the functional consequences of genetic variation within these regulatory elements. Since the bulk of common genetic variation impacting polygenic disease phenotypes localizes to these non-coding regions of the genome, understanding the consequences will improve our understanding of the mechanisms mediating genetic risk in human disease. Here, we define the systemic lupus erythematosus (SLE) risk variant rs2431697 as likely causal for SLE and show that it is located in a functional regulatory element that modulates miR-146a expression. We use epigenomic analysis and genome-editing to show that the rs2431697-containing region is a distal enhancer that specifically regulates miR-146a expression in a cell-type dependent manner. 3D chromatin structure analysis demonstrates physical interaction between the rs2431697-containing region and the miR-146a promoter. Further, our data show that NF-kB binds the disease protective allele in a sequence-specific manner, leading to increased expression of this immunoregulatory microRNA. Our work provides a strategy for using disease-associated variants to define the functional regulatory elements of non-coding RNA molecules such as miR-146a and provides mechanistic links between autoimmune disease risk genetic variation and disease etiology.


2020 ◽  
Vol 6 (37) ◽  
pp. eaba2083 ◽  
Author(s):  
Milton Pividori ◽  
Padma S. Rajagopal ◽  
Alvaro Barbeira ◽  
Yanyu Liang ◽  
Owen Melia ◽  
...  

Large-scale genomic and transcriptomic initiatives offer unprecedented insight into complex traits, but clinical translation remains limited by variant-level associations without biological context and lack of analytic resources. Our resource, PhenomeXcan, synthesizes 8.87 million variants from genome-wide association study summary statistics on 4091 traits with transcriptomic data from 49 tissues in Genotype-Tissue Expression v8 into a gene-based, queryable platform including 22,515 genes. We developed a novel Bayesian colocalization method, fast enrichment estimation aided colocalization analysis (fastENLOC), to prioritize likely causal gene-trait associations. We successfully replicate associations from the phenome-wide association studies (PheWAS) catalog, Online Mendelian Inheritance in Man, and an evidence-based curated gene list. Using PhenomeXcan results, we provide examples of novel and underreported genome-to-phenome associations, complex gene-trait clusters, shared causal genes between common and rare diseases via further integration of PhenomeXcan with ClinVar, and potential therapeutic targets. PhenomeXcan (phenomexcan.org) provides broad, user-friendly access to complex data for translational researchers.


2021 ◽  
Author(s):  
Naoto Kubota ◽  
Mikita Suyama

Abstract Genome-wide association studies (GWAS) have been performed to identify thousands of variants in the human genome as disease risk markers, but functional variants that actually affect gene regulation and their genomic features remain largely unknown. Here we performed a comprehensive survey of functional variants in the regulatory elements of the human genome. We integrated hematopoietic transcription factor (TF) footprint datasets generated by the ENCODE project with multiple quantitative trait locus (QTL) datasets (eQTL, caQTL, bQTL, and hQTL) and investigated the associations of functional variants and immune system disease risk. We identified candidate regulatory variants highly linked with GWAS lead variants and found that they were strongly enriched in active enhancers in hematopoietic cells, emphasizing the clinical relevance of enhancers in disease risk. Moreover, we found some strong relationships between traits and hematopoietic cell types or TFs. We highlighted some credible regulatory variants and found that a variant, rs2291668, which potentially functions in the molecular pathogenesis of multiple sclerosis, is located within a TF footprint present in a protein-coding exon of the TNFSF14 gene, indicating that protein-coding exons as well as noncoding regions can possess clinically relevant regulatory elements. Collectively, our results shed light on the molecular pathogenesis of immune system diseases. The methods described in this study can readily be applied to the study of the risk factors of other diseases.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Guojun Hou ◽  
Isaac T. W. Harley ◽  
Xiaoming Lu ◽  
Tian Zhou ◽  
Ning Xu ◽  
...  

Abstract Since most variants that impact polygenic disease phenotypes localize to non-coding genomic regions, understanding the consequences of regulatory element variants will advance understanding of human disease mechanisms. Here, we report the systemic lupus erythematosus (SLE) risk variant rs2431697 as likely causal for SLE through disruption of a regulatory element, modulating miR-146a expression. Using epigenomic analysis, genome-editing and 3D chromatin structure analysis, we show that rs2431697 tags a cell-type dependent distal enhancer specific for miR-146a that physically interacts with the miR-146a promoter. NF-kB binds the disease protective allele in a sequence-specific manner, increasing expression of this immunoregulatory microRNA. Finally, CRISPR activation-based modulation of this enhancer in the PBMCs of SLE patients attenuates type I interferon pathway activation by increasing miR-146a expression. Our work provides a strategy to define non-coding RNA functional regulatory elements using disease-associated variants and provides mechanistic links between autoimmune disease risk genetic variation and disease etiology.


2021 ◽  
Author(s):  
Basel M Al-Barghouthi ◽  
Will T Rosenow ◽  
Kang-Ping Du ◽  
Jinho Heo ◽  
Robert Maynard ◽  
...  

Genome-wide association studies (GWASs) for bone mineral density (BMD) have identified over 1,100 associations to date. However, identifying causal genes implicated by such studies has been challenging. Recent advances in the development of transcriptome reference datasets and computational approaches such as transcriptome-wide association studies (TWASs) and expression quantitative trait loci (eQTL) colocalization have proven to be informative in identifying putatively causal genes underlying GWAS associations. Here, we used TWAS/eQTL colocalization in conjunction with transcriptomic data from the Genotype-Tissue Expression (GTEx) project to identify potentially causal genes for the largest BMD GWAS performed to date. Using this approach, we identified 512 genes as significant (Bonferroni <= 0.05) using both TWAS and eQTL colocalization. This set of genes was enriched for regulators of BMD and members of bone relevant biological processes. To investigate the significance of our findings, we selected PPP6R3, the gene with the strongest support from our analysis which was not previously implicated in the regulation of BMD, for further investigation. We observed that Ppp6r3 deletion in mice decreased BMD. In this work, we provide an updated resource of putatively causal BMD genes and demonstrate that PPP6R3 is a putatively causal BMD GWAS gene. These data increase our understanding of the genetics of BMD and provide further evidence for the utility of combined TWAS/colocalization approaches in untangling the genetics of complex traits.


2019 ◽  
Author(s):  
Alexi Nott ◽  
Inge R. Holtman ◽  
Nicole G. Coufal ◽  
Johannes C.M. Schlachetzki ◽  
Miao Yu ◽  
...  

Abstract Unique cell type-specific patterns of activated enhancers can be leveraged to interpret non-coding genetic variation associated with complex traits and diseases such as neurological and psychiatric disorders. Here, we have defined active promoters and enhancers for major cell types of the human brain. Whereas psychiatric disorders were primarily associated with regulatory regions in neurons, idiopathic Alzheimer’s disease (AD) variants were largely confined to microglia enhancers. Interactome maps connecting GWAS variants in cell type-specific enhancers to gene promoters revealed an extended microglia gene network in AD. Deletion of a microglia-specific enhancer harboring AD-risk variants ablated BIN1 expression in microglia but not in neurons or astrocytes. These findings revise and expand the genes likely to be influenced by non-coding variants in AD and suggest the probable brain cell types in which they function.
One Sentence Summary: Identification of cell type-specific regulatory elements in the human brain enables interpretation of non-coding GWAS risk variants.


2019 ◽  
Author(s):  
Milton Pividori ◽  
Padma S. Rajagopal ◽  
Alvaro Barbeira ◽  
Yanyu Liang ◽  
Owen Melia ◽  
...  

Abstract Large-scale genomic and transcriptomic initiatives offer unprecedented ability to study the biology of complex traits and identify target genes for precision prevention or therapy. Translation to clinical contexts, however, has been slow and challenging due to lack of biological context for identified variant-level associations. Moreover, many translational researchers lack the computational or analytic infrastructures required to fully use these resources. We integrate genome-wide association study (GWAS) summary statistics from multiple publicly available sources and data from Genotype-Tissue Expression (GTEx) v8 using PrediXcan and provide a user-friendly platform for translational researchers based on state-of-the-art algorithms. We develop a novel Bayesian colocalization method, fastENLOC, to prioritize the most likely causal gene-trait associations. Our resource, PhenomeXcan, synthesizes 8.87 million variants from GWAS on 4,091 traits with transcriptome regulation data from 49 tissues in GTEx v8 into an innovative, gene-based resource including 22,255 genes. Across the entire genome/phenome space, we find 65,603 significant associations (Bonferroni-corrected p-value of 5.5 × 10^-10), where 19,579 (29.8 percent) were colocalized (locus regional colocalization probability > 0.1). We successfully replicate associations from PheWAS Catalog (AUC = 0.61) and OMIM (AUC = 0.64). We provide examples of (a) finding novel and underreported genome-to-phenome associations, (b) exploring complex gene-trait clusters within PhenomeXcan, (c) studying phenome-to-phenome relationships between common and rare diseases via further integration of PhenomeXcan with ClinVar, and (d) evaluating potential therapeutic targets.
PhenomeXcan (phenomexcan.org) broadens access to complex genomic and transcriptomic data and empowers translational researchers.
One-Sentence Summary: PhenomeXcan is a gene-based resource of gene-trait associations with biological context that supports translational research.


2021 ◽  
Author(s):  
Katherine M Siewert-Rocks ◽  
Samuel S Kim ◽  
Douglas Yao ◽  
Huwenbo Shi ◽  
Alkes L. Price

Identifying gene sets that are associated to disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWAS) can be used to detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, causal disease genes at TWAS-associated loci generally remain unknown due to gene co-regulation, which leads to correlations across genes in predicted expression. We developed a new method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by the predicted expression of causal disease genes in the gene set. GCSC regresses TWAS chi-square statistics on gene co-regulation scores reflecting correlations in predicted gene expression; GCSC determines that a gene set is enriched for disease heritability if genes with high co-regulation to the gene set have higher TWAS chi-square statistics than genes with low co-regulation to the gene set, beyond what is expected based on co-regulation to all genes. We verified via simulations that GCSC is well-calibrated, and well-powered to identify gene sets that are enriched for disease heritability explained by predicted expression. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits (average N=344K), analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched gene sets, recapitulating known biology. For Alzheimer's disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify gene sets associated to disease.

