Discovering differential genome sequence activity with interpretable and efficient deep learning

2021 ◽  
Vol 17 (8) ◽  
pp. e1009282
Author(s):  
Jennifer Hammelman ◽  
David K. Gifford

Discovering sequence features that differentially direct cells to alternate fates is key to understanding both cellular development and the consequences of disease-related mutations. We introduce Expected Pattern Effect and Differential Expected Pattern Effect, two black-box methods that can interpret genome regulatory sequences for cell type-specific or condition-specific patterns. We show that these methods identify relevant transcription factor motifs and spacings that are predictive of cell state-specific chromatin accessibility. Finally, we integrate these methods into a framework that is readily accessible to non-experts and is available for download as a binary or for installation via PyPI or bioconda at https://cgs.csail.mit.edu/deepaccess-package/.

2021 ◽  
Author(s):  
Jennifer Hammelman ◽  
David K Gifford

Discovering sequence features that differentially direct cells to alternate fates is key to understanding both cellular development and the consequences of disease-related mutations. Here we present a new method that efficiently learns sequence features that can predict cell state-specific chromatin accessibility in a framework that is readily accessible to non-experts.


2019 ◽  
Author(s):  
Qiao Liu ◽  
Wing Hung Wong ◽  
Rui Jiang

Regulatory elements (REs) in the human genome are major sites of non-coding transcription that lack adequate interpretation. Although computational approaches have been complementing high-throughput biological experiments in the annotation of the human genome, it remains a major challenge to systematically and accurately characterize REs in the context of a specific cell type. To address this problem, we propose DeepCAGE, a deep learning framework that incorporates the transcriptome profile of human transcription factors (TFs) to accurately predict the activities of cell type-specific REs. Our approach automatically learns the regulatory code of an input DNA sequence together with cell type-specific TF expression. In a series of systematic comparisons with existing methods, we show the superior performance of our model not only in the classification of accessible regions but also in the regression of DNase-seq signals. A typical usage scenario for our method is predicting the activities of REs in novel cell types, especially where chromatin accessibility data are not available. In summary, our study provides insight into complex regulatory mechanisms by integrating the transcriptome profile of human TFs.


2017 ◽  
Author(s):  
Paja Sijacic ◽  
Marko Bajic ◽  
Elizabeth C. McKinney ◽  
Richard B. Meagher ◽  
Roger B. Deal

Background: Cell differentiation is driven by changes in transcription factor (TF) activity and subsequent alterations in transcription. To study this process, differences in TF binding between cell types can be deduced by methods that probe chromatin accessibility. We used cell type-specific nuclei purification followed by the Assay for Transposase Accessible Chromatin (ATAC-seq) to delineate differences in chromatin accessibility and TF regulatory networks between stem cells of the shoot apical meristem (SAM) and differentiated leaf mesophyll cells of Arabidopsis thaliana.
Results: Chromatin accessibility profiles of SAM stem cells and leaf mesophyll cells were highly similar at a qualitative level, yet thousands of regions of quantitatively different chromatin accessibility were also identified. We found that chromatin regions preferentially accessible in mesophyll cells tended to also be substantially accessible in the stem cells as compared to the genome-wide average, whereas the converse was not true. Analysis of genomic regions preferentially accessible in each cell type identified hundreds of overrepresented TF binding motifs, highlighting a set of TFs that are likely important for each cell type. Among these, we found evidence for extensive co-regulation of target genes by multiple TFs that are preferentially expressed in one cell type or the other. For example, a set of zinc-finger TFs appears to control a suite of growth- and development-related genes specifically in stem cells, while another TF set co-regulates genes involved in light responses and photosynthesis specifically in mesophyll cells. Interestingly, the TFs within both of these sets also show evidence of extensively co-regulating each other.
Conclusions: Quantitative analysis of chromatin accessibility differences between stem cells and differentiated mesophyll cells allowed us to identify TF regulatory networks and downstream target genes that are likely to be functionally important in each cell type. Our finding that mesophyll cell-enriched accessible sites tend to already be substantially accessible in stem cells, but not vice versa, suggests that widespread regulatory element accessibility may be important for the developmental plasticity of stem cells. This work also demonstrates the utility of cell type-specific chromatin accessibility profiling in quickly developing testable models of regulatory control differences between cell types.


2021 ◽  
Vol 22 (9) ◽  
pp. 4959
Author(s):  
Lilas Courtot ◽  
Elodie Bournique ◽  
Chrystelle Maric ◽  
Laure Guitton-Sert ◽  
Miguel Madrid-Mencía ◽  
...  

DNA replication timing (RT), reflecting the temporal order of origin activation, is known as a robust and conserved cell-type-specific process. Upon low replication stress, the slowing of replication forks induces well-documented RT delays associated with genetic instability, but it can also generate RT advances that are still uncharacterized. In order to characterize these advanced initiation events, we monitored the whole-genome RT of six independent human cell lines treated with low doses of aphidicolin. We report that RT advances are cell-type-specific and involve large heterochromatin domains. Importantly, we found that some major late-to-early RT advances can be inherited by the unstressed next cellular generation, which is a unique process that correlates with enhanced chromatin accessibility, as well as a modified replication origin landscape and gene expression in daughter cells. Collectively, this work highlights how low replication stress may impact cellular identity through RT advance events at a subset of chromosomal domains.


1992 ◽  
Vol 12 (2) ◽  
pp. 552-562
Author(s):  
L Pani ◽  
X B Quian ◽  
D Clevidence ◽  
R H Costa

The transcription factor hepatocyte nuclear factor 3 (HNF-3) is involved in the coordinate expression of several liver genes. HNF-3 DNA binding activity is composed of three different liver proteins which recognize the same DNA site. The HNF-3 proteins (designated alpha, beta, and gamma) possess homology in the DNA binding domain and in several additional regions. To understand the cell-type-specific expression of HNF-3 beta, we have defined the regulatory sequences that elicit hepatoma-specific expression. Promoter activity requires -134 bp of HNF-3 beta proximal sequences and binds four nuclear proteins, including two ubiquitous factors. One of these promoter sites interacts with a novel cell-specific factor, LF-H3 beta, whose binding activity correlates with the HNF-3 beta tissue expression pattern. Furthermore, there is a binding site for the HNF-3 protein within its own promoter, suggesting that an autoactivation mechanism is involved in the establishment of HNF-3 beta expression. We propose that both the LF-H3 beta and HNF-3 sites play an important role in the cell-type-specific expression of the HNF-3 beta transcription factor.


2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers, including gkm-SVM and DanQ, at distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified from their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
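To make the idea of sequential k-mer fold-change features concrete, the following is a minimal Python sketch and not the SeqEnhDL implementation. The function names, the pseudocount, the use of relative frequencies, and the default value of 1.0 for unseen k-mers are all assumptions introduced for illustration; the abstract does not specify how the fold changes are computed.

```python
from collections import Counter


def kmer_frequencies(seqs, k):
    """Relative frequency of each k-mer observed across a list of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def fold_change_table(foreground, background, k, pseudo=1e-6):
    """Fold change of k-mer frequency in foreground vs. background sequences.

    The pseudocount (an assumption) avoids division by zero for k-mers
    that are absent from the background set.
    """
    fg = kmer_frequencies(foreground, k)
    bg = kmer_frequencies(background, k)
    return {
        kmer: (fg.get(kmer, 0.0) + pseudo) / (bg.get(kmer, 0.0) + pseudo)
        for kmer in set(fg) | set(bg)
    }


def sequential_features(seq, table, k):
    """Fold change of each k-mer along the sequence, in positional order.

    K-mers missing from the table get 1.0 (no enrichment), an assumption.
    """
    seq = seq.upper()
    return [table.get(seq[i:i + k], 1.0) for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    enhancers = ["ACGTGACGTGAC", "TTGACGTGACAA"]
    background = ["ATATATATATAT", "CCGGCCGGCCGG"]
    table = fold_change_table(enhancers, background, k=5)
    print(sequential_features("GACGTGACGT", table, k=5))
```

A sequence of length n yields n - k + 1 values for each k, so feature vectors for k = 5, 7, 9 and 11 could be computed separately and then concatenated or fed to a model as separate channels; which of these SeqEnhDL does is not stated in the abstract.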

