Accurate imputation of histone modifications using transcription

Author(s):  
Zhong Wang ◽  
Alexandra G. Chivu ◽  
Lauren A. Choate ◽  
Edward J. Rice ◽  
Donald C. Miller ◽  
...  

Abstract We trained a sensitive machine learning tool to infer the distribution of histone marks using maps of nascent transcription. Transcription captured the variation in active histone marks and complex chromatin states, like bivalent promoters, down to single-nucleosome resolution and at an accuracy that rivaled the correspondence between independent ChIP-seq experiments. The relationship between active histone marks and transcription was conserved in all cell types examined, allowing individual labs to annotate active functional elements in mammals with similar richness as major consortia. Using imputation as an interpretative tool uncovered cell-type specific differences in how the PRC2-dependent repressive mark, H3K27me3, corresponds to transcription, and revealed that transcription initiation requires both chromatin accessibility and an active chromatin environment, demonstrating that initiation is less promiscuous than previously thought.

2021 ◽  
Author(s):  
Risa Karakida Kawaguchi ◽  
Ziqi Tang ◽  
Stephan Fischer ◽  
Rohit Tripathy ◽  
Peter K. Koo ◽  
...  

Background: Single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) measures genome-wide chromatin accessibility for the discovery of cell-type specific regulatory networks. ScATAC-seq combined with single-cell RNA sequencing (scRNA-seq) offers important avenues for ongoing research, such as novel cell-type specific activation of enhancer and transcription factor binding sites as well as chromatin changes specific to cell states. On the other hand, scATAC-seq data is known to be challenging to interpret due to its high number of zeros as well as the heterogeneity derived from different protocols. Because of the stochastic lack of marker gene activities, cell type identification by scATAC-seq remains difficult even at a cluster level. Results: In this study, we exploit reference knowledge obtained from external scATAC-seq or scRNA-seq datasets to define existing cell types and uncover the genomic regions which drive cell-type specific gene regulation. To investigate the robustness of existing cell-typing methods, we collected 7 scATAC-seq datasets targeting mouse brain for a meta-analytic comparison of neuronal cell-type annotation, including a reference atlas generated by the BRAIN Initiative Cell Census Network (BICCN). By comparing the area under the receiver operating characteristic curves (AUROCs) for the three major cell types (inhibitory, excitatory, and non-neuronal cells), cell-typing performance by single markers is found to be highly variable even for known marker genes due to study-specific biases. However, the signal aggregation of a large and redundant marker gene set, optimized via multiple scRNA-seq data, achieves the highest cell-typing performance among 5 existing marker gene sets, from the individual cell to cluster level. That gene set also shows a high consistency with the cluster-specific genes from inhibitory subtypes in two well-annotated datasets, suggesting applicability to rare cell types.
Next, we demonstrate a comprehensive assessment of scATAC-seq cell typing using exhaustive combinations of the marker gene sets with supervised learning methods including machine learning classifiers and joint clustering methods. Our results show that the combinations using robust marker gene sets systematically ranked at the top, not only with model-based prediction using a large reference dataset but also with a simple summation of expression strengths across markers. To demonstrate the utility of this robust cell typing approach, we trained a deep neural network to predict chromatin accessibility in each subtype using only DNA sequence. Through model interpretation methods, we identify key motifs enriched about robust gene sets for each neuronal subtype. Conclusions: Through the meta-analytic evaluation of scATAC-seq cell-typing methods, we develop a novel method set to exploit the BICCN reference atlas. Our study strongly supports the value of robust marker gene selection as a feature selection tool and cross-dataset comparison between scATAC-seq datasets to improve alignment of scATAC-seq to known biology. With this novel, high-quality epigenetic data, genomic analysis of regulatory regions can reveal sequence motifs that drive cell type-specific regulatory programs.
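
The "simple summation of expression strengths across markers" and its evaluation by AUROC can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' pipeline: the gene names, the toy activity values, and the helper names `marker_score` and `auroc` are hypothetical. The sketch sums per-cell gene-activity values over a marker set, then scores the result with a rank-based AUROC.

```python
from typing import Dict, Sequence


def marker_score(activity: Dict[str, float], markers: Sequence[str]) -> float:
    """Sum gene-activity values over a marker set; absent genes count as 0."""
    return sum(activity.get(gene, 0.0) for gene in markers)


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Rank-based AUROC: fraction of (positive, negative) pairs in which
    the positive cell scores higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative cell")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy gene-activity profiles (hypothetical values, illustrative gene names).
cells = [
    {"Gad1": 3.0, "Gad2": 2.0, "Slc17a7": 0.0},  # labelled inhibitory
    {"Gad1": 0.0, "Gad2": 1.0, "Slc17a7": 4.0},  # labelled excitatory
    {"Gad1": 2.0, "Gad2": 0.0, "Slc17a7": 1.0},  # labelled inhibitory
    {"Gad1": 0.0, "Gad2": 0.0, "Slc17a7": 2.0},  # labelled excitatory
]
is_inhibitory = [True, False, True, False]

# A single marker is affected by a dropout in the third cell.
single_scores = [marker_score(c, ["Gad2"]) for c in cells]
# Summing over a (tiny) marker set recovers the separation.
set_scores = [marker_score(c, ["Gad1", "Gad2"]) for c in cells]

print("single marker AUROC:", auroc(single_scores, is_inhibitory))
print("marker set AUROC:", auroc(set_scores, is_inhibitory))
```

In this toy case the single marker is penalised by one zero count, while the summed score separates the two classes, which is the intuition behind aggregating a redundant marker set.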


Science ◽  
2020 ◽  
Vol 370 (6518) ◽  
pp. eaba7612 ◽  
Author(s):  
Silvia Domcke ◽  
Andrew J. Hill ◽  
Riza M. Daza ◽  
Junyue Cao ◽  
Diana R. O’Day ◽  
...  

The chromatin landscape underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of chromatin accessibility and gene expression in fetal tissues. For chromatin accessibility, we devised a three-level combinatorial indexing assay and applied it to 53 samples representing 15 organs, profiling ~800,000 single cells. We leveraged cell types defined by gene expression to annotate these data and cataloged hundreds of thousands of candidate regulatory elements that exhibit cell type–specific chromatin accessibility. We investigated the properties of lineage-specific transcription factors (such as POU2F1 in neurons), organ-specific specializations of broadly distributed cell types (such as blood and endothelial), and cell type–specific enrichments of complex trait heritability. These data represent a rich resource for the exploration of in vivo human gene regulation in diverse tissues and cell types.


2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Lila Rieber ◽  
Shaun Mahony

Abstract Background Comparisons of Hi–C data sets between cell types and conditions have revealed differences in topologically associated domains (TADs) and A/B compartmentalization, which are correlated with differences in gene regulation. However, previous comparisons have focused on known forms of 3D organization while potentially neglecting other functionally relevant differences. We aimed to create a method to quantify all locus-specific differences between two Hi–C data sets. Results We developed MultiMDS to jointly infer and align 3D chromosomal structures from two Hi–C data sets, thereby enabling a new way to comprehensively quantify relocalization of genomic loci between cell types. We demonstrate this approach by comparing Hi–C data across a variety of cell types. We consistently find relocalization of loci with minimal difference in A/B compartment score. For example, we identify compartment-independent relocalizations between GM12878 and K562 cells that involve loci displaying enhancer-associated histone marks in one cell type and polycomb-associated histone marks in the other. Conclusions MultiMDS is the first tool to identify all loci that relocalize between two Hi–C data sets. Our method can identify 3D localization differences that are correlated with cell-type-specific regulatory activities and which cannot be identified using other methods.
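
As a rough illustration of what "quantifying relocalization of genomic loci" could mean once two structures are available, the sketch below assumes the two 3D structures have already been inferred and aligned (the step the abstract attributes to MultiMDS) and simply measures the per-locus Euclidean distance between them. This is not the MultiMDS algorithm or its API; the function name `relocalization` and the coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def relocalization(a: Sequence[Point], b: Sequence[Point]) -> List[float]:
    """Per-locus Euclidean distance between two aligned 3D structures.

    Assumes a[i] and b[i] describe the same genomic locus and that both
    structures share one coordinate frame (alignment is done elsewhere).
    """
    if len(a) != len(b):
        raise ValueError("structures must cover the same loci")
    return [math.dist(p, q) for p, q in zip(a, b)]


# Toy structures with three loci each (hypothetical coordinates).
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 4.0)]

distances = relocalization(structure_a, structure_b)
# Report the locus that moved the most between the two conditions.
top_locus = max(range(len(distances)), key=distances.__getitem__)
print(distances, top_locus)
```

Only the third locus moves in this toy example, so it is the one reported; real data would require the joint inference and alignment step before such a comparison is meaningful.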


2016 ◽  
Author(s):  
Nicholas E. Banovich ◽  
Yang I. Li ◽  
Anil Raj ◽  
Michelle C. Ward ◽  
Peyton Greenside ◽  
...  

Abstract Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation across different cell types and as models for studies of complex disease. We established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression, chromatin accessibility and DNA methylation. Regulatory variation between individuals is lower in iPSCs than in the differentiated cell types, consistent with the intuition that developmental processes are generally canalized. While most cell type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell type-specific QTLs are in shared open chromatin. Finally, we developed a deep neural network to predict open chromatin regions from DNA sequence alone and were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell type-specific chromatin accessibility.
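
Sequence-based models of this kind typically take one-hot encoded DNA as input, and the effect of a SNP can be probed by encoding both haplotypes and comparing the model's predictions. The sketch below shows only the encoding step and is an assumption about a common input format, not the authors' implementation; `one_hot`, `differing_positions`, and the two sequences are hypothetical.

```python
from typing import List, Tuple

# Channel order A, C, G, T; any other symbol (e.g. N) maps to all zeros.
_CODE = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot(seq: str) -> List[Tuple[int, int, int, int]]:
    """Encode a DNA string as a list of 4-channel one-hot rows."""
    return [_CODE.get(base.upper(), (0, 0, 0, 0)) for base in seq]


def differing_positions(ref: str, alt: str) -> List[int]:
    """0-based positions at which two equal-length haplotypes differ."""
    if len(ref) != len(alt):
        raise ValueError("haplotypes must have equal length")
    return [i for i, (r, a) in enumerate(zip(ref, alt)) if r.upper() != a.upper()]


# Two hypothetical haplotypes that differ by a single SNP.
hap_ref = "ACGTACGT"
hap_alt = "ACGTGCGT"

x_ref = one_hot(hap_ref)
x_alt = one_hot(hap_alt)
print(differing_positions(hap_ref, hap_alt))
```

A trained model would then be evaluated on `x_ref` and `x_alt`, and the difference between the two predictions taken as the estimated effect of the variant.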


2020 ◽  
Vol 52 (9) ◽  
pp. 1428-1442 ◽  
Author(s):  
Jeongwoo Lee ◽  
Do Young Hyeon ◽  
Daehee Hwang

Abstract Advances in single-cell isolation and barcoding technologies offer unprecedented opportunities to profile DNA, mRNA, and proteins at a single-cell resolution. Recently, bulk multiomics analyses, such as multidimensional genomic and proteogenomic analyses, have proven beneficial for obtaining a comprehensive understanding of cellular events. This benefit has facilitated the development of single-cell multiomics analysis, which enables cell type-specific gene regulation to be examined. The cardinal features of single-cell multiomics analysis include (1) technologies for single-cell isolation, barcoding, and sequencing to measure multiple types of molecules from individual cells and (2) the integrative analysis of molecules to characterize cell types and their functions regarding pathophysiological processes based on molecular signatures. Here, we summarize the technologies for single-cell multiomics analyses (mRNA-genome, mRNA-DNA methylation, mRNA-chromatin accessibility, and mRNA-protein) as well as the methods for the integrative analysis of single-cell multiomics data.


2019 ◽  
Author(s):  
Qiao Liu ◽  
Wing Hung Wong ◽  
Rui Jiang

Abstract Regulatory elements (REs) in the human genome are major sites of non-coding transcription which lack adequate interpretation. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it remains a big challenge to systematically and accurately characterize REs in the context of a specific cell type. To address this problem, we proposed DeepCAGE, a deep learning framework that incorporates the transcriptome profile of human transcription factors (TFs) for accurately predicting the activities of cell type-specific REs. Our approach automatically learns the regulatory code of the input DNA sequence together with cell type-specific TF expression. In a series of systematic comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions, but also the regression of DNase-seq signals. A typical usage scenario for our method is to predict the activities of REs in novel cell types, especially where chromatin accessibility data is not available. To sum up, our study provides insight into complex regulatory mechanisms by integrating the transcriptome profile of human TFs.


Author(s):  
Ryan S. Ziffra ◽  
Chang N. Kim ◽  
Amy Wilfert ◽  
Tychele N. Turner ◽  
Maximilian Haeussler ◽  
...  

Abstract Dynamic changes in chromatin accessibility coincide with important aspects of neuronal differentiation, such as fate specification and arealization, and confer cell type-specific associations to neurodevelopmental disorders. However, studies of the epigenomic landscape of the developing human brain have yet to be performed at single-cell resolution. Here, we profiled chromatin accessibility of >75,000 cells from eight distinct areas of developing human forebrain using single cell ATAC-seq (scATACseq). We identified thousands of loci that undergo extensive cell type-specific changes in accessibility during corticogenesis. Chromatin state profiling also reveals novel distinctions between neural progenitor cells from different cortical areas not seen in transcriptomic profiles and suggests a role for retinoic acid signaling in cortical arealization. Comparison of the cell type-specific chromatin landscape of cerebral organoids to primary developing cortex found that organoids establish broad cell type-specific enhancer accessibility patterns similar to the developing cortex, but lack many putative regulatory elements identified in homologous primary cell types. Together, our results reveal the important contribution of chromatin state to the emerging patterns of cell type diversity and cell fate specification and provide a blueprint for evaluating the fidelity and robustness of cerebral organoids as a model for cortical development.


2019 ◽  
Author(s):  
Peiyao A. Zhao ◽  
Takayo Sasaki ◽  
David M. Gilbert

Abstract DNA replication in mammalian cells occurs in a defined temporal order during S phase, known as the replication timing (RT) programme. RT is developmentally regulated and correlated with chromatin conformation and local transcriptional potential. Here we present RT profiles of unprecedented temporal resolution in two human embryonic stem cell lines, human colon carcinoma line HCT116 as well as F1 subspecies hybrid mouse embryonic stem cells and their neural progenitor derivatives. Strong enrichment of nascent DNA in fine temporal windows reveals a remarkable degree of cell to cell conservation in replication timing and patterns of replication genome-wide. We identify 5 patterns of replication in all cell types, consistent with varying degrees of initiation efficiency. Zones of replication initiation were found throughout S phase and resolved to ~50kb precision. Temporal transition regions were resolved into segments of uni-directional replication punctuated with small zones of inefficient initiation. Small and large valleys of convergent replication were consistent with either termination or broadly distributed initiation, respectively. RT correlated with chromatin compartment across all cell types but correlations of initiation time to chromatin domain boundaries and histone marks were cell type specific. Haplotype phasing revealed previously unappreciated regions of allele-specific and allele-independent asynchronous replication. Allele-independent asynchrony was associated with large transcribed genes that resemble common fragile sites. Altogether, these data reveal a remarkably deterministic temporal choreography of DNA replication in mammalian cells.

Highly homogeneous replication landscape between cells in a population.
Initiation zones resolved within constant timing and timing transition regions.
Active histone marks enriched within early initiation zones while enrichment of repressive marks is cell type specific.
Transcribed long genes replicate asynchronously.


2018 ◽  
Author(s):  
Xi Chen ◽  
Ricardo J Miragaia ◽  
Kedar Nath Natarajan ◽  
Sarah A Teichmann

Abstract The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single cell level (scATAC-seq) due to technical challenges. Here we developed a simple and robust plate-based scATAC-seq method, combining upfront bulk Tn5 tagging with single-nuclei sorting. We demonstrated that our method worked robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3,000 splenocytes, we identify distinct immune cell types and reveal cell type-specific regulatory regions and related transcription factors.


2019 ◽  
Author(s):  
Lila Rieber ◽  
Shaun Mahony

Abstract Cell-type-specific chromosome conformation is correlated with differential gene regulation. Broad compartmentalization into two compartments (A & B) is proposed to be the main driver of cell-specific chromosome organization. However, it is unclear what fraction of chromosome conformation changes between cell types and conditions is independent of changes in compartmentalization and whether any such compartment-independent reorganization is functionally important. We developed MultiMDS to jointly infer and align 3D chromosomal structures, thereby enabling a quantitative comparison of locus-specific changes across Hi-C datasets. We compared Hi-C datasets from yeast, which lack compartmentalization, grown with and without galactose. These comparisons confirmed known relocalizations as well as identifying additional examples. We also compared mammalian datasets across a variety of cell lines. We found a consistent enrichment for changes along the A/B compartment (nuclear interior/nuclear periphery) axis, even when comparing the same cell type from different individuals. Despite the prevalence of compartment changes, we consistently find compartment-independent relocalizations of loci that are within the A compartment in both compared cell types. Some such intra-compartment relocalizations involve loci that display enhancer-associated histone marks in one cell type and polycomb-associated histone marks in the other. MultiMDS thus enables a new way to compare chromosome conformations across two Hi-C datasets. Availability: https://github.com/seqcode/multimds

