IMPACT: Genomic annotation of cell-state-specific regulatory elements inferred from the epigenome of bound transcription factors

2018 ◽  
Author(s):  
Tiffany Amariuta ◽  
Yang Luo ◽  
Steven Gazal ◽  
Emma E. Davenport ◽  
Bryce van de Geijn ◽  
...  

Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures at sites where specific transcription factors (TFs) are bound. To link these two identifying features, we introduce IMPACT, a genome annotation strategy which identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT predicts TF motif binding with high accuracy (average AUC 0.92, s.e. 0.03; across 8 TFs), a significant improvement (all p<6.9e-15) over intersecting motifs with open chromatin (average AUC 0.66, s.e. 0.11). Second, an IMPACT annotation trained on RNA polymerase II is more enriched for peripheral blood cis-eQTL variation (N=3,754) than sequence-based annotations, such as promoters and regions around the TSS (permutation p<1e-3, 25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N=38,242) and East Asian (N=22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% (s.e. 19.4%) of RA h2 (p<1.6e-5) and that the top 9.8% of Treg IMPACT regulatory elements, consisting of all SNPs with a non-zero annotation value, capture 97.3% (s.e. 18.2%) of RA h2 (p<7.6e-7), the most comprehensive explanation for RA h2 to date. In comparison, the average RA h2 captured by the CD4+ T histone marks we compared is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Finally, integration with RA fine-mapping data (N=27,345) revealed a significant enrichment (2.87, p<8.6e-3) of putatively causal variants across 20 RA associated loci in the top 1% of CD4+ Treg IMPACT regulatory regions. Overall, we find that IMPACT generalizes well to other cell types in identifying complex trait associated regulatory elements.
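The abstract reports binding-prediction accuracy as AUC. As a minimal sketch of what that number measures, the snippet below computes AUC through the rank-sum identity: the fraction of (bound, unbound) motif pairs in which the bound motif receives the higher score. The function name `auc` and the toy scores are hypothetical illustrations, not data or code from the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    scores: predicted binding scores; labels: 1 = bound motif, 0 = unbound motif.
    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three bound and three unbound motif sites.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auc = auc(toy_scores, toy_labels)  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.92 and 0.66 should be read.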

Author(s):  
Alexandra Maslova ◽  
Ricardo N. Ramirez ◽  
Ke Ma ◽  
Hugo Schmutz ◽  
Chendi Wang ◽  
...  

Summary: The mammalian genome contains several million cis-regulatory elements, whose differential activity, marked by open chromatin, determines organogenesis and differentiation. This activity is itself embedded in the DNA sequence, decoded by sequence-specific transcription factors. Leveraging a granular ATAC-seq atlas of chromatin activity across 81 immune cell types, we show that a convolutional neural network (“AI-TAC”) can learn to infer cell-type-specific chromatin activity solely from the DNA sequence. AI-TAC does so by rediscovering, with astonishing precision, binding motifs for known regulators, and some unknown ones, mapping them with high concordance to positions validated by ChIP-seq data. AI-TAC also uncovers combinatorial influences, establishing a hierarchy of transcription factors (TFs) and their interactions involved in immunocyte specification, with intriguingly different strategies between lineages. Mouse-trained AI-TAC can parse human DNA, revealing a strikingly similar ranking of influential TFs. Thus, Deep Learning can reveal the regulatory syntax that drives the full differentiative complexity of the immune system.
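A convolutional network can recover binding motifs because a first-layer filter applied to one-hot-encoded DNA behaves like a motif scanner. The sketch below illustrates that general idea only; it is not the AI-TAC architecture, and the helper names (`one_hot`, `scan`) and the hand-written "GATA" filter are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def scan(seq, filt):
    """Slide a motif filter along the sequence (1-D convolution without padding).

    filt: one row per motif position, each with 4 weights in A, C, G, T order.
    Returns one activation per window start.
    """
    x = one_hot(seq)
    width = len(filt)
    activations = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for k in range(width):
            total += sum(x[start + k][j] * filt[k][j] for j in range(4))
        activations.append(total)
    return activations


# Hypothetical filter rewarding the motif "GATA" and penalising mismatches.
gata_filter = [
    [-1.0, -1.0, 1.0, -1.0],  # position 1 prefers G
    [1.0, -1.0, -1.0, -1.0],  # position 2 prefers A
    [-1.0, -1.0, -1.0, 1.0],  # position 3 prefers T
    [1.0, -1.0, -1.0, -1.0],  # position 4 prefers A
]
activations = scan("CCGATACC", gata_filter)
best_start = max(range(len(activations)), key=activations.__getitem__)
```

In a trained network the filter weights are learned rather than written by hand, and inspecting the windows that activate a filter most strongly is one way such filters are interpreted as motifs.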


2021 ◽  
Vol 90 (1) ◽  
pp. 193-219
Author(s):  
Emmanuel Compe ◽  
Jean-Marc Egly

In eukaryotes, transcription of protein-coding genes requires the assembly at core promoters of a large preinitiation machinery containing RNA polymerase II (RNAPII) and general transcription factors (GTFs). Transcription is potentiated by regulatory elements called enhancers, which are recognized by specific DNA-binding transcription factors that recruit cofactors and convey, following chromatin remodeling, the activating cues to the preinitiation complex. This review summarizes nearly five decades of work on transcription initiation by describing the sequential recruitment of diverse molecular players including the GTFs, the Mediator complex, and DNA repair factors that support RNAPII to enable RNA synthesis. The elucidation of the transcription initiation mechanism has greatly benefited from the study of altered transcription components associated with human diseases that could be considered transcription syndromes.


2021 ◽  
Author(s):  
Sneha Gopalan ◽  
Yuqing Wang ◽  
Nicholas W. Harper ◽  
Manuel Garber ◽  
Thomas G Fazzio

Methods derived from CUT&RUN and CUT&Tag enable genome-wide mapping of the localization of proteins on chromatin from as few as one cell. These and other mapping approaches focus on one protein at a time, preventing direct measurements of co-localization of different chromatin proteins in the same cells and requiring prioritization of targets where samples are limiting. Here we describe multi-CUT&Tag, an adaptation of CUT&Tag that overcomes these hurdles by using antibody-specific barcodes to simultaneously map multiple proteins in the same cells. Highly specific multi-CUT&Tag maps of histone marks and RNA Polymerase II uncovered sites of co-localization in the same cells, active and repressed genes, and candidate cis-regulatory elements. Single-cell multi-CUT&Tag profiling facilitated identification of distinct cell types from a mixed population and characterization of cell type-specific chromatin architecture. In sum, multi-CUT&Tag increases the information content per cell of epigenomic maps, facilitating direct analysis of the interplay of different proteins on chromatin.


2020 ◽  
Vol 29 (11) ◽  
pp. 1922-1932
Author(s):  
Priyanka Nandakumar ◽  
Dongwon Lee ◽  
Thomas J Hoffmann ◽  
Georg B Ehret ◽  
Dan Arking ◽  
...  

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.
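To make the group-wise test concrete, here is a minimal sketch of a SKAT-style score statistic with per-variant weights. It assumes an intercept-only null model and the convention in which each variant's score is multiplied by its weight before squaring; weight conventions differ between descriptions of SKAT, and using absolute deltaSVM scores as weights is an assumption of this sketch rather than a detail stated in the abstract. The function name `skat_q` and the toy data are hypothetical.

```python
def skat_q(genotypes, phenotypes, weights):
    """Weighted variance-component score statistic Q = sum_j (w_j * S_j)^2.

    genotypes: one row per individual, one column per variant (0/1/2 allele counts).
    phenotypes: one quantitative trait value per individual.
    weights: one non-negative weight per variant (assumed here: |deltaSVM|).
    S_j = sum_i G_ij * (y_i - mean(y)) is the per-variant score under an
    intercept-only null model (no covariates; a simplifying assumption).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    q = 0.0
    for j, w in enumerate(weights):
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += (w * score_j) ** 2
    return q


# Toy data (hypothetical): 4 individuals, 2 variants in one gene's putative CREs.
toy_genotypes = [[0, 1], [1, 0], [2, 1], [0, 0]]
toy_phenotypes = [1.0, 2.0, 3.0, 2.0]
toy_weights = [0.5, 2.0]
toy_q = skat_q(toy_genotypes, toy_phenotypes, toy_weights)
```

The sketch stops at the statistic itself. Turning Q into a p-value requires its null distribution, a mixture of chi-square distributions, which dedicated SKAT software computes.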


Science ◽  
2020 ◽  
Vol 367 (6484) ◽  
pp. eaay6690 ◽  
Author(s):  
Katrina L. Grasby ◽  
Neda Jahanshad ◽  
Jodie N. Painter ◽  
Lucía Colodro-Conde ◽  
Janita Bralten ◽  
...  

The cerebral cortex underlies our complex cognitive capabilities, yet little is known about the specific genetic loci that influence human cortical structure. To identify genetic variants that affect cortical structure, we conducted a genome-wide association meta-analysis of brain magnetic resonance imaging data from 51,665 individuals. We analyzed the surface area and average thickness of the whole cortex and 34 regions with known functional specializations. We identified 199 significant loci and found significant enrichment for loci influencing total surface area within regulatory elements that are active during prenatal cortical development, supporting the radial unit hypothesis. Loci that affect regional surface area cluster near genes in Wnt signaling pathways, which influence progenitor expansion and areal identity. Variation in cortical structure is genetically correlated with cognitive function, Parkinson’s disease, insomnia, depression, neuroticism, and attention deficit hyperactivity disorder.


Thorax ◽  
2011 ◽  
Vol 67 (5) ◽  
pp. 385-391 ◽  
Author(s):  
Jared M Bischof ◽  
Christopher J Ott ◽  
Shih-Hsing Leir ◽  
Nehal Gosalia ◽  
Lingyun Song ◽  
...  

2019 ◽  
Author(s):  
Priyanka Nandakumar ◽  
Dongwon Lee ◽  
Thomas J. Hoffmann ◽  
Georg B. Ehret ◽  
Dan Arking ◽  
...  

Abstract: Hundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we subsequently focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, both to fine-map these loci by pinpointing genes and variants of functional interest within them and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. In order to identify genes of interest, we aggregate these variants in these putative cis-regulatory elements within 50Kb of the start or end of genes considered as “expressed” in these tissues or cell types using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits and expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes, SCN5A and NOS1AP, involved in the cardiac trait QT interval, in the Atherosclerosis Risk in Communities Study (ARIC), as a positive control, and observed an expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary: Sequence changes in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, as genes may be expressed in different tissues and at different times and degrees, using this information is expected to more accurately identify genes of interest. Variants within genes are important, but so are variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we aim to use this information about expression and variants potentially involved in gene expression regulation to better pinpoint genes and variants in regulatory elements of interest for blood pressure regulation. We do so by taking advantage of publicly available data, using methods that combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure, and we identify several genes to enable experimental follow-up.


2019 ◽  
Author(s):  
Pawel F. Przytycki ◽  
Katherine S. Pollard

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell-type specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve enhancers to specific cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their enhancers.
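The abstract describes a network model that links cell-type labels and cells but does not spell out the diffusion used. As a hedged illustration of how influence can spread from a label through such a network, the sketch below runs a generic random walk with restart on a toy graph. It is not CellWalker's implementation; the function name, the restart probability, and the node names are all hypothetical.

```python
def random_walk_with_restart(adj, seed, restart=0.5, iters=200):
    """Stationary visiting probabilities of a walk that keeps restarting at `seed`.

    adj: symmetric weighted adjacency as a dict of dicts, adj[u][v] = edge weight.
    At each step the walker returns to `seed` with probability `restart`,
    otherwise it moves to a neighbour with probability proportional to edge weight.
    """
    nodes = sorted(adj)
    p = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(iters):
        nxt = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Isolated node: send its mass back to the seed so probabilities sum to 1.
                nxt[seed] += (1.0 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        p = nxt
    return p


# Toy network (hypothetical): two label nodes and two cells in a chain.
toy_graph = {
    "label_neuron": {"cell_1": 1.0},
    "cell_1": {"label_neuron": 1.0, "cell_2": 0.5},
    "cell_2": {"cell_1": 0.5, "label_glia": 1.0},
    "label_glia": {"cell_2": 1.0},
}
influence = random_walk_with_restart(toy_graph, seed="label_neuron")
```

In this toy graph `cell_1` receives more influence from the neuron label than `cell_2`, so comparing the influence a cell receives from each label gives one way to assign noisy cells to cell types.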


Author(s):  
Hanqing Liu ◽  
Jingtian Zhou ◽  
Wei Tian ◽  
Chongyuan Luo ◽  
Anna Bartlett ◽  
...  

Summary: Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repetitive usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, an artificial neural network model was constructed that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 531-531
Author(s):  
Erik L. Bao ◽  
Jacob C. Ulirsch ◽  
Caleb A. Lareau ◽  
Leif S. Ludwig ◽  
Michael H. Guo ◽  
...  

Abstract Hematopoiesis is a well-characterized paradigm of cellular differentiation that is highly regulated to ensure balanced proportions of mature blood cells. However, many aspects of this process remain poorly understood in humans. For example, there is extensive variation in commonly measured blood cell traits, which can manifest as diseases at extreme ends of the spectrum, yet the vast majority of genetic loci responsible for driving these differences are currently unknown. Here, we integrate fine-mapped population genetics with high-resolution chromatin landscapes to gain novel insights into regulatory mechanisms critical for human blood cell production and disease. First, we conducted a genome-wide association study in 115,000 individuals from the UK Biobank, measuring the effects of genetic variation on 16 blood traits spanning 7 hematopoietic lineages (erythroid, platelet, lymphocyte, monocyte, neutrophil, eosinophil, basophil). Within each region of association (n = 2,056), we performed Bayesian fine-mapping on all common variants to resolve the most likely causal hits. Going further, we were interested in whether genetic variants predominantly act in terminal cell states or less differentiated progenitors. To this end, we overlapped fine-mapped variants with chromatin accessibility profiles (ATAC-seq) of 18 primary hematopoietic populations sorted from healthy donors. Across all lineages, 21% of regulatory variants were restricted to accessible chromatin (AC) peaks in terminal progenitors. Interestingly, 59% of variants fell in AC regions of one or more upstream progenitor states, suggesting that a significant amount of variation in blood traits stems from regulatory signaling in earlier stages of hematopoiesis. Motivated by this finding, we hypothesized that different branches of hematopoiesis (e.g., monocyte and red blood cell count) could be co-regulated by pleiotropic variants acting in common progenitor populations. 
Therefore, we investigated variants associated with 2 or more of the 7 blood cell types for which phenotypes were available. Remarkably, across 172 such variants, there was an average of 60% more open chromatin in progenitors than terminal cell types (mean 4.01 vs. 2.44 counts per million; p = 0.025). Examining the directional effects of these variants on distinct lineages, we discovered that 91% of pleiotropic variants exhibited a tune mechanism by changing the levels of different blood cells in the same direction. One such example was rs17758695 located in intron 1 of BCL2, an anti-apoptotic protein known to regulate cell death similarly across multiple hematopoietic cell types. In contrast, the remaining 9% of pleiotropic variants favored one lineage at the expense of others (switch mechanism), including novel variants near key myeloid-determining transcription factors CEBPA and MYC (rs78744187 and rs562240450). Together, these results suggest that pleiotropic variants 1) preferentially act in common progenitor rather than terminal cell types, and 2) predominantly tune multiple traits in the same direction, but may favor one at the expense of others when influencing lineage commitment. Finally, given the enrichment of fine-mapped variants in common progenitor states, we set out to determine whether classically defined hematopoietic populations could be divided into lineage-biased subpopulations based on differential genetic regulation of blood traits. To do so, we measured the enrichment of fine-mapped variants in the chromatin landscapes of 2,034 single cells isolated from 8 hematopoietic progenitor populations. 
Strikingly, we discovered significant heterogeneity within the common myeloid progenitor (CMP) population, in which one subset of cells exhibited greater open-chromatin enrichment for myeloid trait variants and relevant transcription factor (TF) binding (CEBPA, IRF8), whereas the other subset showed enrichment for erythroid trait variants and TFs (GATA1, KLF1). By integrating genetic fine-mapping with chromatin data, we identified hundreds of causal variants regulating 16 blood traits, characterized novel mechanisms of pleiotropic effects, and discovered cell states enriched for blood trait regulation. These findings provide new insights into the importance of genetic regulation in progenitor cell states and will contribute to knowledge of how these processes go awry in diseases of blood cell production.

Disclosures: No relevant conflicts of interest to declare.
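The abstract mentions Bayesian fine-mapping without naming the method. One common and simple variant, sketched here under stated assumptions, converts each variant's effect estimate and standard error into a Wakefield approximate Bayes factor and normalises these within a region to obtain posterior inclusion probabilities (PIPs). The sketch assumes exactly one causal variant per region, equal prior probability for every variant, and a hypothetical prior effect-size standard deviation of 0.2; none of these choices is taken from the abstract.

```python
import math


def log_abf(beta, se, prior_sd):
    """Log of Wakefield's approximate Bayes factor in favour of association."""
    v = se ** 2            # sampling variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect (assumed value)
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_inclusion_probs(betas, ses, prior_sd=0.2):
    """PIPs under a single-causal-variant assumption with a flat prior over variants."""
    logs = [log_abf(b, s, prior_sd) for b, s in zip(betas, ses)]
    top = max(logs)                            # subtract the max for numerical stability
    rel = [math.exp(x - top) for x in logs]
    total = sum(rel)
    return [x / total for x in rel]


# Toy region (hypothetical): three variants with z-scores of 5, 1 and 4.5.
toy_betas = [0.10, 0.02, 0.09]
toy_ses = [0.02, 0.02, 0.02]
toy_pips = posterior_inclusion_probs(toy_betas, toy_ses)
```

Variants with high PIPs can then be intersected with accessible-chromatin peaks per cell type, which is the kind of overlap analysis the abstract describes.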

