Modeling chromatin state from sequence across angiosperms using recurrent convolutional neural networks

2021 ◽  
Author(s):  
Travis Wrightsman ◽  
Alexandre P. Marand ◽  
Peter A. Crisp ◽  
Nathan M. Springer ◽  
Edward S. Buckler

Accessible chromatin regions are critical components of gene regulation, but modeling them directly from sequence remains challenging, especially within plants, whose mechanisms of chromatin remodeling are less understood than those of animals. We trained an existing deep learning architecture, DanQ, on leaf ATAC-seq data from 12 angiosperm species to predict the chromatin accessibility of sequence windows within and across species. We also trained DanQ on DNA methylation data from 10 angiosperms, because unmethylated regions have been shown to overlap significantly with accessible chromatin regions in some plants. The across-species models have comparable or even superior performance to a model trained within species, suggesting strong conservation of chromatin mechanisms across angiosperms. Testing a maize held-out model on a multi-tissue scATAC panel revealed that our models are best at predicting constitutively accessible chromatin regions, with diminishing performance as cell-type specificity increases. Using a combination of interpretation methods, we ranked JASPAR motifs by their importance to each model and saw that the TCP and AP2/ERF transcription factor families consistently ranked highly. We embedded the top three JASPAR motifs for each model at all possible positions on both strands in our sequence window and observed position- and strand-specific patterns in their importance to the model. With our cross-species "a2z" model it is now feasible to predict the chromatin accessibility and methylation landscape of any angiosperm genome.
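The motif-embedding step described in this abstract can be illustrated with a minimal sketch. It only enumerates the sequences that would be fed to a model: one per start position and strand. The all-A background, the toy motif "GGTCCC", and the helper names `reverse_complement`, `embed_motif` and `embedded_windows` are illustrative assumptions, not taken from the paper or from JASPAR, and the model-scoring step is omitted.

```python
# Hypothetical helpers: enumerate every placement of a motif in a window,
# on both strands. This is not the authors' code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def embed_motif(background, motif, position):
    """Overwrite the background with the motif starting at `position`."""
    return background[:position] + motif + background[position + len(motif):]


def embedded_windows(background, motif):
    """Yield (position, strand, sequence) for every valid placement."""
    rc = reverse_complement(motif)
    last_start = len(background) - len(motif)
    for position in range(last_start + 1):
        yield position, "+", embed_motif(background, motif, position)
        yield position, "-", embed_motif(background, rc, position)


# Toy inputs (assumptions): a 10 bp background and a 6 bp motif.
windows = list(embedded_windows("A" * 10, "GGTCCC"))
for position, strand, sequence in windows:
    print(position, strand, sequence)
```

Scoring each generated sequence with a trained model and plotting the score against position, separately for each strand, would then expose the kind of position- and strand-specific patterns the abstract mentions.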

2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq, the peaks of scATAC-seq are much sparser due to the lower copy numbers (diploid in humans) and the inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC does not rely on literature-based markers and is robust across datasets from different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.


2019 ◽  
Vol 35 (19) ◽  
pp. 3818-3820 ◽  
Author(s):  
Eugene Urrutia ◽  
Li Chen ◽  
Haibo Zhou ◽  
Yuchao Jiang

Abstract Summary: Single-cell assay of transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for the study of gene regulation with single-cell resolution. The data from scATAC-seq are unique: sparse, binary and highly variable even within the same cell type. As such, methods developed for bulk ATAC-seq or for single-cell RNA-seq data are not appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific association of previously reported schizophrenia GWAS loci. Availability and implementation: Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin. Supplementary information: Supplementary data are available at Bioinformatics online.
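As a rough illustration of the idea of weighting regions before principal component analysis, the sketch below scales each region (column) of a toy binary cells-by-regions matrix by a weight, centers the columns, and finds the first principal component by power iteration. This is only one simple form of weighted PCA. The function name `weighted_first_pc`, the toy matrix and the weights are assumptions for illustration. Destin's actual weighting scheme and its likelihood-based tuning are not reproduced here.

```python
import math
import random


def weighted_first_pc(matrix, weights, n_iter=200, seed=0):
    """First principal component of a cells-by-regions matrix after
    scaling each region (column) by its weight. Illustrative only."""
    n_cells = len(matrix)
    n_regions = len(weights)
    # Scale every column by its region weight.
    scaled = [[row[j] * weights[j] for j in range(n_regions)] for row in matrix]
    # Center every column so the component reflects variance, not the mean.
    means = [sum(row[j] for row in scaled) / n_cells for j in range(n_regions)]
    centered = [[row[j] - means[j] for j in range(n_regions)] for row in scaled]

    def project(vec):
        # Scores of each cell along `vec` (the product X v).
        return [sum(r[j] * vec[j] for j in range(n_regions)) for r in centered]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_regions)]
    for _ in range(n_iter):
        scores = project(v)
        # Multiply by X^T to complete one power-iteration step on X^T X.
        w = [sum(centered[i][j] * scores[i] for i in range(n_cells))
             for j in range(n_regions)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance left after weighting; stop early
            break
        v = [x / norm for x in w]
    return v, project(v)


# Toy data (assumption): 4 cells, 3 regions, binary accessibility.
cells = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
weights = [1.0, 2.0, 0.0]  # the third region is weighted out entirely
pc, scores = weighted_first_pc(cells, weights)
print(pc, scores)
```

In this toy case the zero-weighted region gets a zero loading, and the two groups of cells fall on opposite sides of the first component, which is the separation a clustering step would then pick up.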


2020 ◽  
Author(s):  
Pierre-Olivier Estève ◽  
Udayakumar S. Vishnu ◽  
Hang Gyeong Chin ◽  
Sriharsa Pradhan

Abstract Chromatin accessibility is a predictor of gene expression, cell division and cell type specificity. NicE-viewSeq (Nicking Enzyme assisted viewing and Sequencing) allows accessible chromatin visualization and sequencing with less interference overall from mitochondrial DNA and duplicated sequences than ATAC-see. Using NicE-viewSeq, we interrogated the accessibility of chromatin in a cell cycle-specific (G1, S and G2/M) manner in mammalian cells. Despite DNA replication and subsequent condensation of chromatin to chromosomes, chromatin accessibility remained generally preserved with only subtle alterations. Genome-wide alteration of chromatin accessibility within TSS and enhancer elements gradually decreased as cells progressed from G1 to G2/M, with distinct differential accessibility near consensus transcription factor sites. Inhibition of histone deacetylases promoted accessible chromatin within gene bodies, correlating with apoptotic gene expression. In addition, reduced chromatin accessibility for the MYC oncogene pathway correlated with downregulation of pertinent genes. Surprisingly, repetitive RNA loci expression remained unaltered following histone acetylation-mediated increased accessibility. Therefore, we suggest that subtle changes in chromatin accessibility are a prerequisite during the cell cycle and histone deacetylase inhibitor-mediated therapeutics.


2019 ◽  
Author(s):  
Casey A. Thornton ◽  
Ryan M. Mulqueen ◽  
Andrew Nishida ◽  
Kristof A. Torkenczy ◽  
Eve G. Lowenstein ◽  
...  

Abstract High-throughput single-cell epigenomic assays can resolve the heterogeneity of cell types and states in complex tissues; however, spatial orientation within the network of interconnected cells is lost. Here, we present a novel method for highly scalable, spatially resolved, single-cell profiling of chromatin states. We use high-density multiregional sampling to perform single-cell combinatorial indexing on Microbiopsies Assigned to Positions for the Assay for Transposase Accessible Chromatin (sciMAP-ATAC) to produce single-cell data of an equivalent quality to non-spatially resolved single-cell ATAC-seq, where each cell is localized to a three-dimensional position within the tissue. A typical experiment comprises between 96 and 384 spatially mapped tissue positions, each producing tens to over 100 individual single-cell ATAC-seq profiles, with a typical resolution of 214 cubic microns and the ability to tune the resolution and cell throughput to suit each target application. We apply sciMAP-ATAC to the adult mouse primary somatosensory cortex, where we profile cortical lamination and demonstrate the ability to analyze data from a single tissue position or compare a single cell type in adjacent positions. We also profile the human primary visual cortex, where we produce spatial trajectories through the cortex. Finally, we characterize the spatially progressive nature of cerebral ischemic infarct in the mouse brain using a model of transient middle cerebral artery occlusion. We leverage the spatial information to identify novel and known transcription factor activities that vary by proximity to the ischemic infarction core with cell type specificity.


2017 ◽  
Author(s):  
Antonina Hafner ◽  
Lyubov Kublo ◽  
Galit Lahav ◽  
Jacob Stewart-Ornstein

Abstract The tumor suppressor p53 is a major regulator of the DNA damage response and has been suggested to selectively bind and activate cell type-specific gene expression programs; however, recent studies and meta-analyses of genomic data propose largely uniform, condition-independent p53 binding. To systematically assess the cell type specificity of p53, we measured its association with DNA in 12 p53 wild-type cell lines, from a range of epithelial lineages, in response to ionizing radiation. We found that the majority of bound sites were occupied across all cell lines; however, we also identified a subset of binding sites that were specific to one or a few cell lines. Unlike the shared p53-bound genome, which was not dependent on chromatin accessibility, the association of p53 with these atypical binding sites was well explained by chromatin accessibility and could be modulated by forcing cell state changes such as the epithelial-to-mesenchymal transition. These results position p53 as having both universal and cell type-specific regulatory programs that have different regulators and dependence on chromatin state.


2019 ◽  
Vol 11 (10) ◽  
pp. 3035-3053 ◽  
Author(s):  
Lee E Edsall ◽  
Alejandro Berrio ◽  
William H Majoros ◽  
Devjanee Swain-Lenz ◽  
Shauna Morrow ◽  
...  

Abstract Changes in transcriptional regulation are thought to be a major contributor to the evolution of phenotypic traits, but the contribution of changes in chromatin accessibility to the evolution of gene expression remains almost entirely unknown. To address this important gap in knowledge, we developed a new method to identify DNase I Hypersensitive (DHS) sites with differential chromatin accessibility between species using a joint modeling approach. Our method overcomes several limitations inherent to conventional threshold-based pairwise comparisons that become increasingly apparent as the number of species analyzed rises. Our approach employs a single quantitative test which is more sensitive than existing pairwise methods. To illustrate, we applied our joint approach to DHS sites in fibroblast cells from five primates (human, chimpanzee, gorilla, orangutan, and rhesus macaque). We identified 89,744 DHS sites, of which 41% are identified as differential between species using the joint model compared with 33% using the conventional pairwise approach. The joint model provides a principled approach to distinguishing single from multiple chromatin accessibility changes among species. We found that nondifferential DHS sites are enriched for nucleotide conservation. Differential DHS sites with decreased chromatin accessibility relative to rhesus macaque occur more commonly near transcription start sites (TSS), while those with increased chromatin accessibility occur more commonly distal to TSS. Further, differential DHS sites near TSS are less cell type-specific than more distal regulatory elements. Taken together, these results point to distinct classes of DHS sites, each with distinct characteristics of selection, genomic location, and cell type specificity.


2019 ◽  
Author(s):  
Qiao Liu ◽  
Wing Hung Wong ◽  
Rui Jiang

Abstract Regulatory elements (REs) in the human genome are major sites of non-coding transcription that lack adequate interpretation. Although computational approaches have been complementing high-throughput biological experiments towards the annotation of the human genome, it remains a big challenge to systematically and accurately characterize REs in the context of a specific cell type. To address this problem, we proposed DeepCAGE, a deep learning framework that incorporates the transcriptome profile of human transcription factors (TFs) for accurately predicting the activities of cell type-specific REs. Our approach automatically learns the regulatory code of the input DNA sequence together with cell type-specific TF expression. In a series of systematic comparisons with existing methods, we show the superior performance of our model in not only the classification of accessible regions, but also the regression of DNase-seq signals. A typical usage scenario for our method is to predict the activities of REs in novel cell types, especially where chromatin accessibility data are not available. To sum up, our study provides insight into complex regulatory mechanisms by integrating the transcriptome profile of human TFs.


2018 ◽  
Author(s):  
Eugene Urrutia ◽  
Li Chen ◽  
Haibo Zhou ◽  
Yuchao Jiang

Abstract Summary: Single-cell assay of transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for the study of gene regulation with single-cell resolution. The data from scATAC-seq are unique: sparse, binary, and highly variable even within the same cell type. As such, methods developed for bulk ATAC-seq or for single-cell RNA-seq data are not appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic data sets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all data sets and platforms. For demonstration, we further applied Destin to 2,088 adult mouse forebrain cells and identified cell type-specific association of previously reported schizophrenia GWAS loci. Availability: Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin.


2018 ◽  
Author(s):  
John R. Sinnamon ◽  
Kristof A. Torkenczy ◽  
Michael W. Linhoff ◽  
Sarah Vitak ◽  
Hannah A. Pliner ◽  
...  

Abstract Here we present a comprehensive map of the accessible chromatin landscape of the mouse hippocampus at single-cell resolution. Substantial advances of this work include the optimization of single-cell combinatorial indexing assay for transposase accessible chromatin (sci-ATAC-seq), a software suite, scitools, for the rapid processing and visualization of single-cell combinatorial indexing datasets, and a valuable resource of hippocampal regulatory networks at single-cell resolution. We utilized sci-ATAC-seq to produce 2,346 high-quality single-cell chromatin accessibility maps with a mean unique read count per cell of 29,201 from both fresh and frozen hippocampi, observing little difference in accessibility patterns between the preparations. Using this dataset, we identified eight distinct major clusters of cells representing both neuronal and non-neuronal cell types and characterized the driving regulatory factors and differentially accessible loci that define each cluster. We then applied a recently described co-accessibility framework, Cicero, which identified 146,818 links between promoters and putative distal regulatory DNA. Identified co-accessibility networks showed cell-type specificity, shedding light on key dynamic loci that reconfigure to specify hippocampal cell lineages. Lastly, we carried out an additional sci-ATAC-seq preparation from cultured hippocampal neurons (899 high-quality cells, 43,532 mean unique reads) that revealed substantial alterations in their epigenetic landscape compared to nuclei from hippocampal tissue. This dataset and accompanying analysis tools provide a new resource that can guide subsequent studies of the hippocampus.


2018 ◽  
Author(s):  
Xi Chen ◽  
Ricardo J Miragaia ◽  
Kedar Nath Natarajan ◽  
Sarah A Teichmann

Abstract The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single-cell level (scATAC-seq) due to technical challenges. Here we developed a simple and robust plate-based scATAC-seq method, combining upfront bulk Tn5 tagging with single-nuclei sorting. We demonstrated that our method worked robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3,000 splenocytes, we identified distinct immune cell types and revealed cell type-specific regulatory regions and related transcription factors.

