scholarly journals accuEnhancer: Accurate enhancer prediction by integration of multiple cell type data with deep learning

2020 ◽  
Author(s):  
Yi-An Tung ◽  
Wen-Tse Yang ◽  
Tsung-Ting Hsieh ◽  
Yu-Chuan Chang ◽  
June-Tai Wu ◽  
...  

AbstractEnhancers are one class of the regulatory elements that have been shown to act as key components to assist promoters in modulating the gene expression in living cells. At present, the number of enhancers as well as their activities in different cell types are still largely unclear. Previous studies have shown that enhancer activities are associated with various functional data, such as histone modifications, sequence motifs, and chromatin accessibilities. In this study, we utilized DNase data to build a deep learning model for predicting the H3K27ac peaks as the active enhancers in a target cell type. We propose joint training of multiple cell types to boost the model performance in predicting the enhancer activities of an unstudied cell type. The results demonstrated that by incorporating more datasets across different cell types, the complex regulatory patterns could be captured by deep learning models and the prediction accuracy can be largely improved. The analyses conducted in this study demonstrated that the cell type-specific enhancer activity can be predicted by joint learning of multiple cell type data using only DNase data and the primitive sequences as the input features. This reveals the importance of cross-cell type learning, and the constructed model can be applied to investigate potential active enhancers of a novel cell type which does not have the H3K27ac modification data yet.AvailabilityThe accuEnhancer package can be freely accessed at: https://github.com/callsobing/accuEnhancer

2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

AbstractWe propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.


2021 ◽  
Vol 17 (12) ◽  
pp. e1009670
Author(s):  
Asa Thibodeau ◽  
Shubham Khetan ◽  
Alper Eroglu ◽  
Ryan Tewhey ◽  
Michael L. Stitzel ◽  
...  

Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cellswithout the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.


2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract ObjectiveComputational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging.ResultsWe propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.


2018 ◽  
Author(s):  
Sahra Uygun ◽  
Christina B. Azodi ◽  
Shin-Han Shiu

AbstractMulticellular organisms have diverse cell types with distinct roles in development and responses to the environment. At the transcriptional level, the differences in environmental response between cell types are due to differences in regulatory programs. In plants, although cell-type environmental responses have been examined, details on how these responses are regulated remain spotty. Here, we identify a set of putative cis-regulatory elements (pCREs) enriched in the promoters of genes responsive to high salinity stress in six Arabidopsis thaliana root cell types. Using machine learning with pCREs as predictors, we establish cis-regulatory codes, i.e. models predicting whether a gene is responsive to high salinity for each cell type. These pCRE-based models outperform models utilizing in vitro binding data of 758 A. thaliana transcription factors. Surprisingly, organ pCREs identified based on whole root high salinity response can predict cell-type responses as well as pCREs derived from cell-type data -because organ and cell-type pCREs predict complementary subsets of high salinity response genes. Our findings not only advance our understanding of the regulatory mechanisms of plant spatial transcriptional response through cis-regulatory codes, but also suggest broad applicability of the approach to any species, particularly those with little or no trans regulatory data.


BMC Genomics ◽  
2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yueh-Hua Tu ◽  
Hsueh-Fen Juan ◽  
Hsuan-Cheng Huang

Abstract Background A new class of regulatory elements called super-enhancers, comprised of multiple neighboring enhancers, have recently been reported to be the key transcriptional drivers of cellular, developmental, and disease states. Results Here, we defined super-enhancer RNAs as highly expressed enhancer RNAs that are transcribed from a cluster of localized genomic regions. Using the cap analysis of gene expression sequencing data from FANTOM5, we systematically explored the enhancer and messenger RNA landscapes in hundreds of different cell types in response to various environments. Applying non-negative matrix factorization (NMF) to super-enhancer RNA profiles, we found that different cell types were well classified. In addition, through the NMF of individual time-course profiles from a single cell-type, super-enhancer RNAs were clustered into several states with progressive patterns. We further investigated the enriched biological functions of the proximal genes involved in each pattern, and found that they were associated with the corresponding developmental process. Conclusions The proposed super-enhancer RNAs can act as a good alternative, without the complicated measurement of histone modifications, for identifying important regulatory elements of cell type specification and identifying dynamic cell states.


2017 ◽  
Author(s):  
Yueh-Hua Tu ◽  
Hsueh-Fen Juan ◽  
Hsuan-Cheng Huang

AbstractA new class of regulatory elements called super-enhancers, comprised of multiple neighboring enhancers, have recently been reported to be the key transcriptional drivers of cellular, developmental, and disease states. Here, we propose to define super-enhancer RNA as highly expressed enhancer RNAs that are transcribed from a cluster of localized genomic regions. Using the cap analysis of gene expression sequencing data from FANTOM5, we systematically explored the enhancer and messenger RNA landscapes in hundreds of different cell types in response to various environments. Applying non-negative matrix factorization (NMF) to super-enhancer RNA profiles, we found that different cell types were well classified. In addition, through the NMF of individual time-course profiles from a single cell-type, super-enhancer RNAs were clustered into several states with progressive patterns. We further investigated the enriched biological functions of the proximal genes involved in each pattern, and found that they were associated with the corresponding developmental process. The proposed super-enhancer RNAs can act as a good alternative, without the complicated measurement of histone modifications, for identifying important regulatory elements of cell type specification and identifying dynamic cell states.


2019 ◽  
Author(s):  
Sinisa Hrvatin ◽  
Christopher P. Tzeng ◽  
M. Aurel Nagy ◽  
Hume Stroud ◽  
Charalampia Koutsioumpa ◽  
...  

AbstractEnhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.One sentence summaryHighly paralleled functional evaluation of enhancer activity in single cells generates new cell-type-specific tools with broad medical and scientific applications.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract Objective To address the challenge of computational identification of cell type-specific regulatory elements on a genome-wide scale. Results We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, positional k-mer (k = 5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences across each nucleotide position were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified based on their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.


Sign in / Sign up

Export Citation Format

Share Document