Functional variants in hematopoietic transcription factor footprints and their roles in the risk of immune system diseases

2021 ◽  
Author(s):  
Naoto Kubota ◽  
Mikita Suyama

Genome-wide association studies (GWAS) have identified thousands of variants in the human genome as disease risk markers, but the functional variants that actually affect gene regulation, and their genomic features, remain largely unknown. Here we performed a comprehensive survey of functional variants in the regulatory elements of the human genome. We integrated hematopoietic transcription factor (TF) footprint datasets generated by the ENCODE project with multiple quantitative trait locus (QTL) datasets (eQTL, caQTL, bQTL, and hQTL) and investigated the associations between functional variants and immune system disease risk. We identified candidate regulatory variants in strong linkage disequilibrium with GWAS lead variants. These variants were strongly enriched in active enhancers in hematopoietic cells, which emphasizes the clinical relevance of enhancers in disease risk. Moreover, we found strong relationships between certain traits and hematopoietic cell types or TFs. We highlighted several credible regulatory variants. One of them, rs2291668, potentially functions in the molecular pathogenesis of multiple sclerosis and lies within a TF footprint in a protein-coding exon of the TNFSF14 gene. This indicates that protein-coding exons, as well as noncoding regions, can harbor clinically relevant regulatory elements. Collectively, our results shed light on the molecular pathogenesis of immune system diseases. The methods described in this study can readily be applied to the study of risk factors for other diseases.
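Selecting candidate variants that are "linked" to a GWAS lead variant usually means applying a linkage disequilibrium (LD) threshold. The abstract does not say how LD was computed or which cutoff was used. The minimal sketch below makes three assumptions:

- It uses unphased genotype dosages (0, 1, 2) from a reference panel.
- It approximates r² as the squared Pearson correlation of those dosages.
- It applies an illustrative threshold of 0.8.

The function names are hypothetical.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation of two variants' genotype dosages (0, 1, 2).

    This is a genotype-based approximation of haplotype r^2 (an assumption of this sketch).
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length dosage vectors with n >= 2")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic variant: correlation undefined, treat as unlinked
    return cov * cov / (var_a * var_b)


def linked_candidates(lead_dosages, candidates, threshold=0.8):
    """IDs of candidate variants whose r^2 with the lead variant meets the (assumed) threshold."""
    return [vid for vid, dos in candidates.items() if ld_r2(lead_dosages, dos) >= threshold]
```

Because r² discards the sign of the correlation, a candidate whose dosages exactly mirror the lead variant (for example, 2 wherever the lead has 0) is counted as fully linked, just like an identical one.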

2019 ◽  
Vol 41 (3) ◽  
pp. 46-48
Author(s):  
Jon M. Laurent ◽  
Sudarshan Pinglay ◽  
Leslie Mitchell ◽  
Ran Brosh

Less than 2% of our genome is protein-coding DNA. The vast expanses of non-coding DNA make up the genome's “dark matter”, where introns, repetitive and regulatory elements reside. Variation between individuals in non-coding regulatory DNA is emerging as a major factor in the genetics of numerous diseases and traits, yet very little is known about how such variations contribute to disease risk. Studying the genetics of regulatory variation is technically challenging as regulatory elements can affect genes located tens of thousands of base pairs away, and often, multiple distal regulatory variations, each with a very small effect, combine in an unknown way to significantly modulate the expression of genes. At the Center for Synthetic Regulatory Genomics (SyRGe) we directly tackle these problems in order to systematically elucidate the mechanisms of regulatory variation underlying human disease.


2012 ◽  
Vol 108 (09) ◽  
pp. 419-426 ◽  
Author(s):  
Richard J. Fish ◽  
Marguerite Neerman-Arbez

The Aα, Bβ and γ polypeptide chains of fibrinogen are encoded by a three-gene cluster on human chromosome 4. The fibrinogen genes (FGB-FGA-FGG) are expressed almost exclusively in hepatocytes. There, their output is coordinated to ensure a sufficient mRNA pool for each chain and to maintain an abundant plasma fibrinogen level. Fibrinogen gene expression is controlled by proximal promoters that contain binding sites for hepatocyte transcription factors, including proteins that influence fibrinogen transcription in response to acute-phase inflammatory stimuli. The fibrinogen gene cluster also contains cis-regulatory elements: enhancer sequences with liver activity, identified by sequence conservation and functional genomics. The transcriptional control of this gene cluster is fascinating biology, but the medical impetus to understand fibrinogen gene regulation comes from the association between cardiovascular disease risk and high levels of circulating fibrinogen. In the general population this level varies from about 1.5 to 3.5 g/L. This variation between individuals is influenced by genotype, which suggests that genetic variants residing in fibrinogen regulatory loci contribute to fibrinogen levels. A complete picture of how the fibrinogen genes are regulated will therefore point towards novel sources of regulatory variants. In this review we discuss:

- regulation of the fibrinogen genes by proximal promoters and enhancers;
- the influence of acute-phase stimulation;
- post-transcriptional regulation by miRNAs;
- functional regulatory variants identified in genetic studies.

Finally, we discuss the fibrinogen locus in light of recent advances in understanding chromosomal architecture and suggest future directions for research into the mechanisms that control fibrinogen expression.


Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 436-436 ◽  
Author(s):  
Christopher J. Ott ◽  
Alexander J. Federation ◽  
Siddha Kasar ◽  
Josephine L. Klitgaard ◽  
Stacey M. Fernandes ◽  
...  

Genome sequencing of chronic lymphocytic leukemia (CLL) has revealed mutations that disrupt protein-coding elements of the genome (Puente et al, 2011; Wang et al, 2011; Landau et al, 2013). Recently, comprehensive whole-genome sequencing efforts have begun to reveal genetic aberrations outside protein-coding exons, many of which may perturb gene regulatory sites (Puente et al, 2015). These include enhancer elements that make physical contact with gene promoters to regulate gene expression in a cell-type-specific manner.

Mutations certainly promote CLL leukemogenesis. Epigenomic alterations may also play an important role in disease progression and maintenance by inducing the gene expression aberrations long observed in CLL. Epigenomic alterations include changes in chromatin structure that facilitate altered transcription and the recruitment of chromatin factors to regulatory elements. Comprehensive genome-wide DNA methylation studies have been performed on human cancers, including CLL, and on their normal cell counterparts, but other comprehensive studies of cancer epigenomes have been lacking.

We have completed an analysis of chromatin structure in a cohort of primary CLL samples compared with normal CD19+ B lymphocytes (n = 18 CLL samples, n = 5 normal B lymphocyte samples). We used chromatin accessibility assays (ATAC-seq) and genome-wide enhancer mapping (H3K27ac ChIP-seq) to comprehensively define the transcriptionally active chromatin landscape of CLL. We discovered more than 15,000 novel regulatory elements relative to previously annotated ones. Moreover, sites within the loci of several hundred genes showed large regions of gained chromatin accessibility and H3K27 acetylation, revealing aberrant enhancer activity.
These gained enhancer elements correspond to increased gene expression and are found at loci such as LEF1, PLCG1, CTLA4, and ITGB1. We have also systematically identified the super-enhancers of CLL: large, complex regulatory regions with unique tissue-specific regulatory capabilities. Many of these super-enhancers are also found in normal B lymphocytes. However, the super-enhancers at the ITGB1 and LEF1 loci are CLL-specific and may facilitate leukemia-specific expression.

We found that CLL-specific enhancers are significantly associated with annotated CLL risk variants. We also identified enhancer-associated SNPs within CLL risk loci that are predicted to disrupt transcription factor binding sites. These include SNPs at the IRF8 and LEF1 loci that lead to the creation and destruction of SMAD4 and RXRA binding sites, respectively.

Additionally, we analyzed whole-genome sequencing data from a subset of our sample cohort. Mutational hotspots in the CXCR4 and BACH2 promoters occur within open, acetylated regions. Moreover, we discovered recurrent, previously unannotated mutations in enhancers at the ETS1 and ST6GAL1 loci.

Using a transcription factor network modeling approach based on these global chromatin structure characteristics, we identified networks that are highly active in CLL. Transcription factors such as NFATc1, E2F5, and NR3C2 are among the most interconnected transcription factors of the CLL genome, and their connectivity is significantly higher in CLL cells than in normal B cells. In contrast, network profiling of CLL cells predicts loss of connectivity of MXI1, a negative regulator of the MYC oncogene. By treating cells with pharmacological inhibitors of NFAT signaling, including cyclosporin and FK506, we were able to reduce NFAT-mediated network connectivity, resulting in a selective loss of NFAT-bound enhancers.
This leads to CLL cell death in vitro in both cell lines and primary CLL patient samples. Our results reveal the unique chromatin structure landscape of CLL for the first time. They also identify the CLL-specific enhancer elements that confer the transcriptional dysregulation long observed in this disease. These chromatin structure analyses and enhancer landscapes allowed us to construct the intrinsic transcription factor network of CLL and to identify a particular dependency on NFAT signaling for cell survival.

Disclosures: No relevant conflicts of interest to declare.
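The abstract reports SNPs "predicted to disrupt transcription factor binding sites" but does not name the prediction method. One common approach is to score the reference and alternative alleles against a TF position weight matrix (PWM) and compare the best scores.

The minimal sketch below assumes a log-odds PWM stored as one dict per motif position. `TOY_PWM` is an invented 4-bp matrix for illustration only; it is not a real SMAD4 or RXRA motif. Only windows that overlap the SNP, on either strand, are scored.

```python
# Invented 4-bp log-odds matrix (one dict per motif position); NOT a real TF motif.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.2},
]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def best_score(seq, pwm):
    """Highest PWM score over every full-length window on both strands of seq."""
    width = len(pwm)
    best = float("-inf")
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):  # forward, then reverse complement
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            best = max(best, sum(col[base] for col, base in zip(pwm, window)))
    return best


def allelic_delta(left, ref, alt, right, pwm):
    """Alt-minus-ref best motif score for a SNP between two flanking sequences.

    Flanks are trimmed to len(pwm) - 1 bases so every scored window covers the SNP.
    Negative values suggest the alt allele weakens a site; positive, that it creates one.
    """
    k = len(pwm) - 1
    left = left[-k:] if k else ""
    right = right[:k]
    return best_score(left + alt + right, pwm) - best_score(left + ref + right, pwm)
```

A strongly negative delta corresponds to "destruction" of a predicted site and a strongly positive one to "creation". Any cutoff for calling either is a further assumption not given in the abstract.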


2020 ◽  
Vol 23 (2) ◽  
pp. 113-120
Author(s):  
A. Athanassiadou

Determining the DNA sequence of the human genome, which revealed extensive genetic variation, has revolutionized the way we view health and disease. So has mapping the genes and the various regulatory elements of genome function within genomic DNA. The genetic complexity of the genome is manifested on different levels. The first level is the expression of protein-coding genes, regulated by their individual promoters in linear proximity. The next level involves long-distance action by distal enhancers, which interact with promoters through DNA looping. This three-dimensional (3D) regulation is further elaborated by chromosome folding into the so-called transcription factories, for fully physiological expression. Chromosome folding, mediated by specific genetic elements (insulators), adds to the genetic complexity by facilitating movements of chromatin in specific genomic regions, the so-called topologically associated domains (TADs), in support of transcription and other cellular functions. Further genetic complexity emerged with the finding that over 75% of the genome is transcribed. Besides the coding genes, it produces a plethora of RNA transcripts, the non-coding RNAs, which have important regulatory roles in gene expression. The great variation in genome sequence and in the regulatory elements of genome architecture is exploited in genome-wide association studies of disease, within the framework of Precision Medicine and of Genomic Medicine in general.


2021 ◽  
Author(s):  
Yonatan A. Cooper ◽  
Jessica E. Davis ◽  
Sriram Kosuri ◽  
Giovanni Coppola ◽  
Daniel H. Geschwind

Predicting the functionality of noncoding variation is one of the major challenges in modern genetics. We employed massively parallel reporter assays to screen 5,706 variants from genome-wide association studies of Alzheimer's disease (AD) and progressive supranuclear palsy (PSP). We identified 320 functional regulatory polymorphisms (SigVars) spanning 27 of the 34 unique loci tested, including multiple independent signals across the complex 17q21.31 region. We identify novel risk genes, including PLEKHM1 in PSP and APOC1 in AD. We also perform gene editing to validate four distinct causal loci, confirming complement 4 (C4A) as a novel genetic risk factor for AD. Moreover, functional variants preferentially disrupt transcription factor binding sites that converge on enhancers with differential cell-type-specific activity in PSP and AD, implicating a neuronal SP1-driven regulatory network in PSP pathogenesis. These analyses support a novel mechanism underlying noncoding genetic risk, whereby common genetic variants drive disease risk through their aggregate activity on specific transcriptional programs.
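In a massively parallel reporter assay, each allele is typically represented by many barcoded constructs. Regulatory activity is estimated from reporter RNA counts relative to plasmid DNA counts. The abstract does not describe the statistics used to call SigVars, so the minimal sketch below shows only one simple effect-size estimate. It assumes three things:

- counts are already normalized for sequencing depth;
- barcodes are pooled per allele;
- a pseudocount of 1 is added (an arbitrary choice).

A real analysis would also need a significance test across barcodes or replicates.

```python
from math import log2


def allele_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2(RNA/DNA) activity for one allele, pooling (depth-normalized) counts over barcodes."""
    return log2((sum(rna_counts) + pseudocount) / (sum(dna_counts) + pseudocount))


def allelic_skew(ref_rna, ref_dna, alt_rna, alt_dna, pseudocount=1.0):
    """Alt-minus-ref log2 activity; positive means the alt allele drives more reporter expression."""
    return (allele_activity(alt_rna, alt_dna, pseudocount)
            - allele_activity(ref_rna, ref_dna, pseudocount))
```

Pooling counts before taking the ratio is the simplest option. Per-barcode ratios would instead allow a test of whether the skew is consistent across constructs.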


2019 ◽  
Author(s):  
Gloriia Novikova ◽  
Manav Kapoor ◽  
Julia TCW ◽  
Edsel M. Abud ◽  
Anastasia G. Efthymiou ◽  
...  

Genome-wide association studies (GWAS) have identified more than thirty loci associated with Alzheimer’s disease (AD), but the causal variants, regulatory elements, genes and pathways remain largely unknown, impeding a mechanistic understanding of AD pathogenesis. Previously, we showed that AD risk alleles are enriched in myeloid-specific epigenomic annotations. Here, we show that they are specifically enriched in active enhancers of monocytes, macrophages and microglia. We integrated AD GWAS signals with myeloid epigenomic and transcriptomic datasets using novel analytical approaches to link myeloid enhancer activity to target gene expression regulation and AD risk modification. We nominate candidate AD risk enhancers and identify their target causal genes (including AP4E1, AP4M1, APBB3, BIN1, CD2AP, MS4A4A, MS4A6A, PILRA, RABEP1, SPI1, SPPL2A, TP53INP1, ZKSCAN1, and ZYX) in sixteen loci. Fine-mapping of these enhancers nominates candidate functional variants that likely modify disease susceptibility by regulating causal gene expression in myeloid cells. In the MS4A locus we identified a single candidate functional variant and validated it experimentally in human induced pluripotent stem cell (hiPSC)-derived microglia. Combined, these results strongly implicate dysfunction of the myeloid endolysosomal system in the etiology of AD.
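Claims such as "AD risk alleles are enriched in active enhancers" are often tested with a 2×2 contingency table of variants inside versus outside an annotation. The abstract does not specify the test used. As one illustration, the minimal sketch below implements a one-sided Fisher's exact test from the hypergeometric distribution, using only the Python standard library. How the cells are tabulated (risk versus background variants) is an assumption of this sketch, and the function name is hypothetical.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided Fisher's exact test for over-representation in an annotation.

    a: risk variants inside the annotation    b: risk variants outside it
    c: background variants inside it          d: background variants outside it
    Returns (odds_ratio, p_value), where p = P(X >= a) under the hypergeometric null.
    """
    n = a + b + c + d
    risk_total = a + b        # row total: all risk variants
    in_annot = a + c          # column total: all variants inside the annotation
    tail = sum(comb(in_annot, k) * comb(n - in_annot, risk_total - k)
               for k in range(a, min(risk_total, in_annot) + 1))
    p_value = tail / comb(n, risk_total)
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, p_value
```

Exact integer arithmetic via `math.comb` avoids overflow for moderately sized tables. Genome-wide analyses more often rely on LD-aware regression methods, because variants are not independent draws.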


2020 ◽  
Author(s):  
Robert Lesurf ◽  
Abdelrahman Said ◽  
Oyediran Akinrinade ◽  
Jeroen Breckpot ◽  
Kathleen Delfosse ◽  
...  

Cardiomyopathy (CMP) is a heritable genetic disorder. Protein-coding variants account for 20-30% of cases. The contribution of variants in non-coding DNA elements that regulate gene expression has not been explored. We performed whole-genome sequencing (WGS) of 228 unrelated CMP families. Besides pathogenic protein-coding variants in known CMP genes, 5% of cases harbored rare loss-of-function variants in novel cardiac genes, with NRAP and FHOD3 being strong candidates. WGS also revealed a high burden of high-risk variants in promoters and enhancers of CMP genes in an additional 20% of cases (odds ratio 2.14, 95% CI 1.60-2.86, p = 5.26 × 10⁻⁷ vs 1326 controls). Genes involved in α-dystroglycan glycosylation (FKTN, DTNA) and desmosomal signaling (DSC2, DSG2) were specifically enriched for regulatory variants (false discovery rate < 0.03). These findings were independently replicated in the Genomics England CMP cohort (n = 1266). The effect of non-coding variants on transcription was validated in patient myocardium and by reporter assays in human cardiomyocytes. The effect of novel gene variants was validated in zebrafish knockouts. Our results show that functionally active variants in novel genes and in regulatory elements of CMP genes contribute strongly to the genomic etiology of childhood-onset CMP.
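An odds ratio with a 95% confidence interval, like the one reported for regulatory-variant burden in cases versus controls, can be computed from a 2×2 table of carriers and non-carriers. The abstract gives neither the underlying counts nor the method. The minimal sketch below therefore uses the Woolf (log-odds) interval as one standard choice. The counts in the usage note are invented, not taken from the study.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers, z=1.959964):
    """Odds ratio with a Woolf confidence interval; z = 1.959964 gives ~95% coverage."""
    cells = (case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers)
    if min(cells) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    se_log_or = sqrt(sum(1.0 / x for x in cells))  # standard error of log(OR)
    return (odds_ratio,
            exp(log(odds_ratio) - z * se_log_or),
            exp(log(odds_ratio) + z * se_log_or))
```

For example, invented counts of 20 carriers and 80 non-carriers among cases, with 10 and 90 among controls, give an odds ratio of (20 × 90)/(80 × 10) = 2.25. With counts this small the interval is wide, and its lower bound falls just below 1. A burden analysis that adjusts for covariates would instead estimate the odds ratio by logistic regression.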


2019 ◽  
Author(s):  
Xiao Zhang ◽  
Kenneth C. Ehrlich ◽  
Fangtang Yu ◽  
Xiaojun Hu ◽  
Hong-Wen Deng ◽  
...  

A major challenge in translating findings from genome-wide association studies (GWAS) into biological mechanisms is pinpointing functional variants, because only a very small percentage of variants associated with a given trait actually affect that trait. We used an extensive epigenetic, transcriptomic, and genetic analysis of the TBX15/WARS2 neighborhood to prioritize this region's best candidate causal variants for the genetic risk of two conditions: osteoporosis, measured as estimated bone mineral density (eBMD), and obesity, measured as waist-hip ratio or waist circumference adjusted for body mass index. TBX15 encodes a transcription factor that is important in bone development and adipose biology. Manual curation of 692 GWAS-derived variants yielded eight strong candidate causal SNPs that modulate TBX15 transcription in subcutaneous adipose tissue (SAT) or osteoblasts, which express this gene highly and specifically. None of these SNPs were prioritized by Bayesian fine-mapping. The eight regulatory causal SNPs lie in enhancer or promoter chromatin seen preferentially in SAT or osteoblasts, either in TBX15 intron 1 or upstream of the gene. They overlap strongly predicted, allele-specific transcription factor binding sites. Our analysis suggests that these SNPs act independently of two missense SNPs in TBX15. Remarkably, five of the regulatory SNPs were associated with both eBMD and obesity and had the same trait-increasing allele for both. We found that WARS2 obesity-related SNPs can be ascribed to high linkage disequilibrium with TBX15 intron-1 SNPs. Our findings from GWAS index, proxy, and imputed SNPs suggest that a few SNPs, including three in a 0.7-kb cluster, act as causal regulatory variants that fine-tune TBX15 expression and thereby affect both obesity and osteoporosis risk.
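The abstract notes that none of the eight candidate SNPs were prioritized by Bayesian fine-mapping, but it does not name the method. A simple illustration of the idea is Wakefield's approximate Bayes factor (ABF) under a single-causal-variant assumption. The steps are:

1. Each variant's GWAS effect estimate and standard error give a Bayes factor.
2. Normalizing these across the region gives posterior inclusion probabilities (PIPs).
3. A credible set collects the top variants until a coverage target is reached.

The prior effect variance of 0.04 and the 95% coverage used below are common defaults, assumed here. Function names are hypothetical.

```python
from math import exp, log


def wakefield_log_abf(beta, se, prior_var=0.04):
    """Log approximate Bayes factor (association vs. null) after Wakefield (2009).

    prior_var is the assumed prior variance W of the true effect (0.04, i.e. sd 0.2).
    """
    v = se * se
    r = prior_var / (prior_var + v)
    z = beta / se
    return 0.5 * (log(1.0 - r) + r * z * z)


def posterior_inclusion(summary_stats, prior_var=0.04):
    """PIPs assuming exactly one causal variant and a uniform prior over variants.

    summary_stats maps variant ID -> (beta, se).
    """
    log_abfs = {vid: wakefield_log_abf(b, s, prior_var) for vid, (b, s) in summary_stats.items()}
    max_log = max(log_abfs.values())  # subtract the max before exp() for numerical stability
    weights = {vid: exp(la - max_log) for vid, la in log_abfs.items()}
    total = sum(weights.values())
    return {vid: w / total for vid, w in weights.items()}


def credible_set(pips, coverage=0.95):
    """Smallest set of variants, taken in decreasing PIP order, whose PIPs reach `coverage`."""
    chosen, running = [], 0.0
    for vid, p in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(vid)
        running += p
        if running >= coverage:
            break
    return chosen
```

PIPs here depend only on association statistics, so variants in tight LD with near-identical z-scores split the probability almost evenly. This is one possible reason why functional annotation can prioritize SNPs that purely statistical fine-mapping does not.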


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Gloriia Novikova ◽  
Manav Kapoor ◽  
Julia TCW ◽  
Edsel M. Abud ◽  
Anastasia G. Efthymiou ◽  
...  

Genome-wide association studies (GWAS) have identified more than 40 loci associated with Alzheimer’s disease (AD), but the causal variants, regulatory elements, genes and pathways remain largely unknown, impeding a mechanistic understanding of AD pathogenesis. Previously, we showed that AD risk alleles are enriched in myeloid-specific epigenomic annotations. Here, we show that they are specifically enriched in active enhancers of monocytes, macrophages and microglia. We integrated AD GWAS with myeloid epigenomic and transcriptomic datasets using analytical approaches to link myeloid enhancer activity to target gene expression regulation and AD risk modification. We identify AD risk enhancers and nominate candidate causal genes among their likely targets (including AP4E1, AP4M1, APBB3, BIN1, MS4A4A, MS4A6A, PILRA, RABEP1, SPI1, TP53INP1, and ZYX) in twenty loci. Fine-mapping of these enhancers nominates candidate functional variants that likely modify AD risk by regulating gene expression in myeloid cells. In the MS4A locus we identified a single candidate functional variant and validated it in human induced pluripotent stem cell (hiPSC)-derived microglia and brain. Taken together, this study integrates AD GWAS with multiple myeloid genomic datasets to investigate the mechanisms of AD risk alleles and nominates candidate functional variants, regulatory elements and genes that likely modulate disease susceptibility.


2017 ◽  
Author(s):  
Yakir A Reshef ◽  
Hilary K Finucane ◽  
David R Kelley ◽  
Alexander Gusev ◽  
Dylan Kotliar ◽  
...  

Biological interpretation of GWAS data frequently involves analyzing unsigned genomic annotations comprising SNPs involved in a biological process and assessing enrichment for disease signal. However, it is often possible to generate signed annotations quantifying whether each SNP allele promotes or hinders a biological process, e.g., binding of a transcription factor (TF). Directional effects of such annotations on disease risk enable stronger statements about causal mechanisms of disease than enrichments of corresponding unsigned annotations. Here we introduce a new method, signed LD profile regression, for detecting such directional effects using GWAS summary statistics, and we apply the method using 382 signed annotations reflecting predicted TF binding. We show via theory and simulations that our method is well-powered and is well-calibrated even when TF binding sites co-localize with other enriched regulatory elements, which can confound unsigned enrichment methods. We further validate our method by showing that it recovers known transcriptional regulators when applied to molecular QTL in blood. We then apply our method to eQTL in 48 GTEx tissues, identifying 651 distinct TF-tissue expression associations at per-tissue FDR < 5%, including 30 associations with robust evidence of tissue specificity. Finally, we apply our method to 46 diseases and complex traits (average N = 289,617) and identify 77 annotation-trait associations at per-trait FDR < 5% representing 12 independent TF-trait associations, and we conduct gene-set enrichment analyses to characterize the underlying transcriptional programs. Our results implicate new causal disease genes (including causal genes at known GWAS loci), and in some cases suggest a detailed mechanism for a causal gene’s effect on disease. Our method provides a new way to leverage functional data to draw inferences about disease etiology.
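Associations in this abstract are reported at a per-tissue or per-trait false discovery rate (FDR) below 5%. The abstract does not say which FDR procedure was used. The Benjamini-Hochberg step-up procedure is a standard option, sketched minimally below. Applying it "per tissue" would mean calling it separately on each tissue's set of p-values. The function name is hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure at FDR alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank  # step-up: keep the LARGEST rank that meets its threshold
    return sorted(order[:cutoff_rank])
```

Because the procedure is step-up, a p-value that fails its own threshold is still rejected if a larger p-value further down the ranking passes. For instance, with p-values 0.04, 0.045, 0.01 and 0.049 at alpha = 0.05, all four hypotheses are rejected.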

