Universal annotation of the human genome through integration of over a thousand epigenomic datasets

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Ha Vu ◽  
Jason Ernst

Abstract Background: Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative “stacked modeling” approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. Results: Using a version of ChromHMM enhanced for large-scale applications, we apply the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, with the learned model denoted as the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we use in characterizing each state. Compared to per-cell-type annotations, the full-stack annotations directly differentiate constitutive from cell-type-specific activity and are more predictive of locations of external genomic annotations. Conclusions: The full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing per-cell-type annotations for studying the non-coding human genome.
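
The difference between per-cell-type and stacked modeling is largely a difference in how the input is laid out. The following is a minimal sketch of that layout only, not of ChromHMM itself: the function name `stack_tracks`, the toy cell types, and the binarized 0/1 calls are all hypothetical and chosen for illustration. Each genomic bin gets one feature per (cell type, mark) dataset, so a single model would see every dataset at once.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: binarized presence calls (0/1) per genomic bin,
# keyed by cell type and then by chromatin mark. All names are illustrative.
Binarized = Dict[str, Dict[str, List[int]]]


def stack_tracks(data: Binarized) -> Tuple[List[Tuple[str, str]], List[List[int]]]:
    """Return (feature_labels, rows): one row per bin, one column per
    (cell type, mark) dataset, in sorted order."""
    labels = [(cell, mark) for cell in sorted(data) for mark in sorted(data[cell])]
    lengths = {len(data[cell][mark]) for cell, mark in labels}
    if len(lengths) != 1:
        raise ValueError("all tracks must cover the same number of bins")
    n_bins = lengths.pop()
    rows = [[data[cell][mark][i] for cell, mark in labels] for i in range(n_bins)]
    return labels, rows


toy = {
    "cell_A": {"H3K4me3": [1, 0, 0], "H3K27me3": [0, 0, 1]},
    "cell_B": {"H3K4me3": [1, 1, 0], "H3K27me3": [0, 0, 1]},
}
labels, rows = stack_tracks(toy)
for i, row in enumerate(rows):
    print(i, dict(zip(labels, row)))
```

In this toy input, bin 0 carries H3K4me3 in both cell types while bin 1 carries it only in `cell_B`. A model trained on the stacked rows can in principle assign those two bins to different states, which is the kind of constitutive versus cell-type-specific distinction the abstract describes.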

2020 ◽  
Author(s):  
Ha Vu ◽  
Jason Ernst

Abstract Genome-wide maps of chromatin marks such as histone modifications and open chromatin sites provide valuable information for annotating the non-coding genome, including identifying regulatory elements. Computational approaches such as ChromHMM have been applied to discover and annotate chromatin states defined by combinatorial and spatial patterns of chromatin marks within the same cell type. An alternative ‘stacked modeling’ approach was previously suggested, where chromatin states are defined jointly from datasets of multiple cell types to produce a single universal genome annotation based on all datasets. Despite its potential benefits for applications that are not specific to one cell type, such an approach was previously applied only for small-scale specialized purposes. Large-scale applications of stacked modeling have previously posed scalability challenges. In this paper, using a version of ChromHMM enhanced for large-scale applications, we applied the stacked modeling approach to produce a universal chromatin state annotation of the human genome using over 1000 datasets from more than 100 cell types, denoted the full-stack model. The full-stack model states show distinct enrichments for external genomic annotations, which we used in characterizing each state. Compared to cell-type-specific annotations, the full-stack annotation directly differentiates constitutive from cell-type-specific activity and is more predictive of locations of external genomic annotations. Overall, the full-stack ChromHMM model provides a universal chromatin state annotation of the genome and a unified global view of over 1000 datasets. We expect this to be a useful resource that complements existing cell-type-specific annotations for studying the non-coding human genome.


Genes ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 961
Author(s):  
Kanwal Tariq ◽  
Ann-Kristin Östlund Farrants

Ribosomal transcription constitutes the major energy-consuming process in cells and is regulated in response to proliferation, differentiation and metabolic conditions by several signalling pathways. These act on the transcription machinery but also on chromatin factors and ncRNA. The many ribosomal gene repeats are organised in a number of different chromatin states: active, poised, pseudosilent and repressed gene repeats. Some of these chromatin states are unique to the 47S rRNA gene repeat and do not occur at other locations in the genome, such as the active state organised with the HMG protein UBF, whereas other chromatin states are nucleosomal, harbouring both active and inactive histone marks. The number of repeats in a certain state varies with developmental stage and cell type; embryonic cells have more rRNA gene repeats organised in an open chromatin state, which is replaced by heterochromatin during differentiation, establishing different states depending on cell type. The 47S rRNA gene transcription is regulated in different ways depending on stimulus and chromatin state of individual gene repeats. This review will discuss the present knowledge about factors involved, such as chromatin remodelling factors NuRD, NoRC, CSB, B-WICH, histone modifying enzymes and histone chaperones, in altering gene expression and switching chromatin states in proliferation, differentiation, metabolic changes and stress responses.


2020 ◽  
Author(s):  
Yan Kai ◽  
Stephanos Tsoucas ◽  
Shengbao Suo ◽  
Guo-Cheng Yuan

Abstract Genome-wide profiling of chromatin states has been widely used to characterize the biological function of non-coding genomic sequences in a cell-type-specific manner. However, systematic, comprehensive annotation of chromatin states from experimental data is challenging and requires not just extensive biological knowledge but also sophisticated computational modeling. Previously we developed a hierarchical hidden Markov model, named diHMM, to systematically annotate chromatin states at multiple scales based on the combination of histone mark and chromatin regulator binding profiles. Here, we have improved the method by optimizing computational efficiency and using an ensemble-clustering approach to achieve a unified annotation by integrating information from cell-type-specific models. We then applied this improved method to generate a unified multi-scale chromatin state map in 127 human cell types, based on public data generated by the Epigenome Roadmap and ENCODE consortia. We found that cell types with similar origin are typically associated with similar chromatin states, but cultured cell lines have structures distinct from those of primary cells. The contribution of enhancer elements to gene regulation is mediated by the broader context of domain-state organization. Distinct domain-state patterns are associated with various 3D chromatin structures. As such, we have demonstrated the utility of the multi-scale chromatin state map in characterizing the biological function of the human genome.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Henriette Miko ◽  
Yunjiang Qiu ◽  
Bjoern Gaertner ◽  
Maike Sander ◽  
Uwe Ohler

Abstract Background: Co-localized combinations of histone modifications (“chromatin states”) have been shown to correlate with promoter and enhancer activity. Changes in chromatin states over multiple time points (“chromatin state trajectories”) have previously been analyzed at promoters and enhancers separately. With the advent of time series Hi-C data it is now possible to connect promoters and enhancers and to analyze chromatin state trajectories at promoter-enhancer pairs. Results: We present TimelessFlex, a framework for investigating chromatin state trajectories at promoters and enhancers and at promoter-enhancer pairs based on Hi-C information. TimelessFlex extends our previous approach Timeless, a Bayesian network for clustering multiple histone modification data sets at promoter and enhancer feature regions. We utilize time series ATAC-seq data measuring open chromatin to define promoters and enhancer candidates. We developed an expectation-maximization algorithm to assign promoters and enhancers to each other based on Hi-C interactions and jointly cluster their feature regions into paired chromatin state trajectories. We find jointly clustered promoter-enhancer pairs showing the same activation patterns on both sides but with a stronger trend at the enhancer side. While the promoter side remains accessible across the time series, the enhancer side becomes dynamically more open towards the gene activation time point. Promoter cluster patterns show strong correlations with gene expression signals, whereas Hi-C signals get only slightly stronger towards activation. The code of the framework is available at https://github.com/henriettemiko/TimelessFlex. Conclusions: TimelessFlex clusters time series histone modifications at promoter-enhancer pairs based on Hi-C and it can identify distinct chromatin states at promoter and enhancer feature regions and their changes over time.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Arjan van der Velde ◽  
Kaili Fan ◽  
Junko Tsuji ◽  
Jill E. Moore ◽  
Michael J. Purcaro ◽  
...  

Abstract The morphologically and functionally distinct cell types of a multicellular organism are maintained by their unique epigenomes and gene expression programs. Phase III of the ENCODE Project profiled 66 mouse epigenomes across twelve tissues at daily intervals from embryonic day 11.5 to birth. Applying the ChromHMM algorithm to these epigenomes, we annotated eighteen chromatin states with characteristics of promoters, enhancers, transcribed regions, repressed regions, and quiescent regions. Our integrative analyses delineate the tissue specificity and developmental trajectory of the loci in these chromatin states. Approximately 0.3% of each epigenome is assigned to a bivalent chromatin state, which harbors both active marks and the repressive mark H3K27me3. Highly evolutionarily conserved, these loci are enriched in silencers bound by polycomb repressive complex proteins, and the transcription start sites of their silenced target genes. This collection of chromatin state assignments provides a useful resource for studying mammalian development.


2020 ◽  
Vol 29 (11) ◽  
pp. 1922-1932
Author(s):  
Priyanka Nandakumar ◽  
Dongwon Lee ◽  
Thomas J Hoffmann ◽  
Georg B Ehret ◽  
Dan Arking ◽  
...  

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.
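
The aggregation step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the names `Variant`, `variants_near_gene`, and `weighted_score_statistic` are hypothetical, "within 50 Kb of the start or end" is read literally as distance to either gene boundary, the absolute deltaSVM score is assumed to serve directly as the weight, and the statistic is a SKAT-style weighted sum of squared per-variant scores without the p-value calculation that the actual test requires.

```python
from typing import List, NamedTuple, Sequence


class Variant(NamedTuple):
    pos: int          # coordinate on the gene's chromosome
    delta_svm: float  # hypothetical deltaSVM score for this variant


def variants_near_gene(variants: Sequence[Variant], gene_start: int,
                       gene_end: int, flank: int = 50_000) -> List[Variant]:
    """Keep variants within `flank` bp of the gene start or the gene end
    (assumed literal reading of the 50 kb rule)."""
    return [v for v in variants
            if min(abs(v.pos - gene_start), abs(v.pos - gene_end)) <= flank]


def weighted_score_statistic(genotypes: Sequence[Sequence[int]],
                             phenotype: Sequence[float],
                             weights: Sequence[float]) -> float:
    """SKAT-style statistic: sum over variants of (w_j * S_j)^2, where S_j is
    the genotype-weighted sum of mean-centred phenotypes.
    genotypes[i][j] is the allele count of variant j in individual i."""
    mean_y = sum(phenotype) / len(phenotype)
    resid = [y - mean_y for y in phenotype]
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(genotypes[i][j] * resid[i] for i in range(len(resid)))
        q += (w * s_j) ** 2
    return q


candidates = [Variant(40_000, 0.3), Variant(125_000, -1.2),
              Variant(160_000, 2.0), Variant(300_000, 0.8)]
kept = variants_near_gene(candidates, gene_start=100_000, gene_end=150_000)
weights = [abs(v.delta_svm) for v in kept]  # assumption: |deltaSVM| as weight
q_toy = weighted_score_statistic([[0], [1], [2]], [1.0, 2.0, 3.0], [2.0])
```

Using the absolute score treats variants predicted to increase and to decrease regulatory activity alike; whether that matches the published analysis is not stated in the abstract.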


2019 ◽  
Author(s):  
Priyanka Nandakumar ◽  
Dongwon Lee ◽  
Thomas J. Hoffmann ◽  
Georg B. Ehret ◽  
Dan Arking ◽  
...  

Abstract Hundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we subsequently focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, to both fine-map these loci by pinpointing genes and variants of functional interest within them, and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. In order to identify genes of interest, we aggregate these variants in these putative cis-regulatory elements within 50Kb of the start or end of genes considered as “expressed” in these tissues or cell types using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits as well as expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes, SCN5A and NOS1AP, involved in the cardiac trait QT interval, in the Atherosclerosis Risk in Communities Study (ARIC), as a positive control, and observed an expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary Sequence changes in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, as genes may be expressed in different tissues and at different times and degrees, using this information is expected to more accurately identify genes of interest. Variants within the genes themselves are important, but so are variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we aim to use this information about expression and variants potentially involved in gene expression regulation to better pinpoint genes and variants in regulatory elements of interest for blood pressure regulation. We do so by taking advantage of such data that are publicly available, and use methods to combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure, and identify several genes as candidates for experimental follow-up.


Author(s):  
Hanqing Liu ◽  
Jingtian Zhou ◽  
Wei Tian ◽  
Chongyuan Luo ◽  
Anna Bartlett ◽  
...  

Summary Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repetitive usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, an artificial neural network model was constructed that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain.


2019 ◽  
Author(s):  
Qi Song ◽  
Jiyoung Lee ◽  
Shamima Akter ◽  
Ruth Grene ◽  
Song Li

Abstract Recent advances in genomic technologies have generated large-scale protein-DNA interaction data and open chromatin regions for multiple plant species. To predict condition-specific gene regulatory networks using these data, we developed the Condition Specific Regulatory network inference engine (ConSReg), which combines heterogeneous genomic data using a sparse linear model followed by feature selection and stability selection to select key regulatory genes. Using Arabidopsis as a model system, we constructed maps of gene regulation under more than 50 experimental conditions including abiotic stresses, cell-type-specific expression, and stress responses in individual cell types. Our results show that ConSReg accurately predicted gene expression (average auROC of 0.84) across multiple testing datasets. We found that (1) including open chromatin information from ATAC-seq data significantly improves the performance of ConSReg across all tested datasets; (2) choice of negative training samples and length of promoter regions are two key factors that affect model performance. We applied ConSReg to Arabidopsis single cell RNA-seq data of two root cell types (endodermis and cortex) and identified five regulators in these cell types. Four out of the five regulators have additional experimental evidence to support their roles in regulating gene expression in Arabidopsis roots. By comparing regulatory maps in abiotic stress responses and cell-type-specific experiments, we revealed that transcription factors that regulate tissue-level abiotic stress responses tend to also regulate stress responses in individual cell types in plants.
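
For readers unfamiliar with the auROC figure quoted above, the metric can be computed from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The sketch below is a generic illustration with made-up labels and scores. The abstract does not say how ConSReg's auROC was computed, and the function name `auroc` is hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive (1)
    and negative (0) examples; ties contribute 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(toy_auroc)
```

The quadratic loop is adequate for small examples; a rank-based formulation would be preferable for large test sets.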


Author(s):  
Ricky S. Adkins ◽  
Andrew I. Aldridge ◽  
Shona Allen ◽  
Seth A. Ament ◽  
...  

ABSTRACT We report the generation of a multimodal cell census and atlas of the mammalian primary motor cortex (MOp or M1) as the initial product of the BRAIN Initiative Cell Census Network (BICCN). This was achieved by coordinated large-scale analyses of single-cell transcriptomes, chromatin accessibility, DNA methylomes, spatially resolved single-cell transcriptomes, morphological and electrophysiological properties, and cellular resolution input-output mapping, integrated through cross-modal computational analysis. Together, our results advance the collective knowledge and understanding of brain cell type organization: First, our study reveals a unified molecular genetic landscape of cortical cell types that congruently integrates their transcriptome, open chromatin and DNA methylation maps. Second, cross-species analysis achieves a unified taxonomy of transcriptomic types and their hierarchical organization that are conserved from mouse to marmoset and human. Third, cross-modal analysis provides compelling evidence for the epigenomic, transcriptomic, and gene regulatory basis of neuronal phenotypes such as their physiological and anatomical properties, demonstrating the biological validity and genomic underpinning of neuron types and subtypes. Fourth, in situ single-cell transcriptomics provides a spatially-resolved cell type atlas of the motor cortex. Fifth, integrated transcriptomic, epigenomic and anatomical analyses reveal the correspondence between neural circuits and transcriptomic cell types. We further present an extensive genetic toolset for targeting and fate mapping glutamatergic projection neuron types toward linking their developmental trajectory to their circuit function. Together, our results establish a unified and mechanistic framework of neuronal cell type organization that integrates multi-layered molecular genetic and spatial information with multi-faceted phenotypic properties.

