Epitome: Predicting epigenetic events in novel cell types with multi-cell deep ensemble learning

2021 ◽  
Author(s):  
Alyssa Kramer Morrow ◽  
John Weston Hughes ◽  
Jahnavi Singh ◽  
Anthony Douglas Joseph ◽  
Nir Yosef

The accumulation of data from large epigenomics consortia provides us with the opportunity to extrapolate existing knowledge to new cell types and conditions. We propose Epitome, a deep neural network that learns similarities of chromatin accessibility between well-characterized reference cell types and a query cellular context, and copies over signals of transcription factor binding and histone modification from reference cell types when their chromatin profiles are similar to the query. Epitome achieves state-of-the-art accuracy when predicting transcription factor binding sites in novel cellular contexts and can further improve predictions as more epigenetic signals are collected from both reference cell types and the query cellular context of interest.
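The intuition in this abstract (borrow binding signal from reference cell types whose accessibility resembles the query) can be illustrated with a small hand-written heuristic. This is only a sketch under stated assumptions and is not the Epitome network: the function name `similarity_weighted_signal`, the binary accessibility bins, and the agreement-fraction similarity are all hypothetical choices made for illustration.

```python
import numpy as np


def similarity_weighted_signal(query_access, ref_access, ref_binding):
    """Toy heuristic (hypothetical, not the Epitome model).

    query_access: (n_bins,) 0/1 accessibility of the query around one locus
    ref_access:   (n_refs, n_bins) 0/1 accessibility of each reference cell type
    ref_binding:  (n_refs,) 0/1 binding label at the locus in each reference
    Returns a score in [0, 1]: the similarity-weighted mean of reference labels.
    """
    query_access = np.asarray(query_access, dtype=float)
    ref_access = np.asarray(ref_access, dtype=float)
    ref_binding = np.asarray(ref_binding, dtype=float)

    # Similarity = fraction of bins where a reference agrees with the query.
    agreement = (ref_access == query_access).mean(axis=1)

    total = agreement.sum()
    if total == 0:
        # No reference resembles the query at all; nothing to copy over.
        return 0.0

    # References that look like the query contribute more of their label.
    return float((agreement * ref_binding).sum() / total)


if __name__ == "__main__":
    query = [1, 1, 0, 0]
    refs = [[1, 1, 0, 0],   # identical to the query, TF bound
            [0, 0, 1, 1]]   # opposite of the query, TF not bound
    print(similarity_weighted_signal(query, refs, [1, 0]))
```

A learned model replaces the fixed agreement fraction with weights fitted to data; the sketch only shows why similar chromatin profiles make a reference cell type more informative.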

2018 ◽  
Author(s):  
Mehran Karimzadeh ◽  
Michael M. Hoffman

Abstract Motivation Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
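The Matthews correlation coefficient quoted above as the performance threshold can be computed from a binary confusion matrix. The following is a minimal sketch of the standard formula, not code from the Virtual ChIP-seq software; the function name `mcc` and the toy labels are assumptions made for illustration.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for 0/1 labels.

    Returns a value in [-1, 1]; 0.0 is returned when any confusion-matrix
    margin is empty, a common convention for the undefined case.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    bound = [1, 1, 0, 0]       # toy ground truth: is the region bound?
    predicted = [1, 0, 0, 0]   # toy predictions
    print(round(mcc(bound, predicted), 3))
```

Because binding sites are rare relative to the genome, MCC is often preferred over plain accuracy: it uses all four confusion-matrix cells and stays near zero for a predictor that labels everything unbound.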


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Lianggang Huang ◽  
Xuejie Li ◽  
Liangbo Dong ◽  
Bin Wang ◽  
Li Pan

Abstract Background The identification of open chromatin regions and transcription factor binding sites (TFBs) is an important step in understanding the regulation of gene expression in diverse species. ATAC-seq is a technique used for this purpose, providing high-resolution measurements of chromatin accessibility revealed through integration of Tn5 transposase. However, the existence of cell walls in filamentous fungi and the associated difficulty in purifying nuclei have precluded the routine application of this technique, leading to a lack of experimentally determined and computationally inferred data on the identity of genome-wide cis-regulatory elements (CREs) and TFBs. In this study, we constructed an ATAC-seq platform suitable for filamentous fungi and generated ATAC-seq libraries of Aspergillus niger and Aspergillus oryzae grown under a variety of conditions. Results We applied the ATAC-seq assay for filamentous fungi to delineate the syntenic orthologous regions and the differentially changed regions of chromatin accessibility among different Aspergillus species, under different culture conditions, and among specific TF-deleted strains. The syntenic orthologues of accessible regions were responsible for conserved functions across Aspergillus species, while regions that changed between culture conditions and in TF mutants drove differential gene expression programs. Importantly, we suggest a criterion to determine TFBs through the analysis of unbalanced cleavage of distinct TF-bound DNA strands by Tn5 transposase. Based on this criterion, we constructed data libraries of the in vivo genomic footprint of A. niger under distinct conditions, and generated a database of novel transcription factor binding motifs through comparison of footprints in TF-deleted strains. Furthermore, we validated the novel TFBs in vivo through an artificial synthetic minimal promoter system.
Conclusions We characterized the chromatin accessibility regions of filamentous fungi species, and identified a complete TFBs map by ATAC-seq, which provides valuable data for future analyses of transcriptional regulation in filamentous fungi.


2019 ◽  
Author(s):  
Ningxin Ouyang ◽  
Alan P. Boyle

Abstract Transcription is tightly regulated by cis-regulatory DNA elements where transcription factors can bind. Thus, identification of transcription factor binding sites is key to understanding gene expression and whole regulatory networks within a cell. The standard approaches for transcription factor binding site (TFBS) prediction, such as position weight matrices (PWMs) and chromatin immunoprecipitation followed by sequencing (ChIP-seq), are widely used but have drawbacks, such as high false positive rates and limited antibody availability, respectively. Several computational footprinting algorithms have been developed to detect TFBSs by investigating chromatin accessibility patterns, but these also have limitations. To improve on these methods, we have developed a footprinting method to predict Transcription factor footpRints in Active Chromatin Elements (TRACE). TRACE incorporates DNase-seq data and PWMs within a multivariate Hidden Markov Model (HMM) to detect footprint-like regions with matching motifs. TRACE is an unsupervised method that accurately annotates binding sites for specific TFs automatically, with no requirement for pre-generated candidate binding sites or ChIP-seq training data. Compared to published footprinting algorithms, TRACE has the best overall performance, with the distinct advantage of targeting multiple motifs in a single model.
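The PWM component mentioned in this abstract can be illustrated with a plain log-odds scan of a sequence. This is a generic sketch of PWM scoring only; it does not reproduce TRACE or its HMM, and the helper names (`pwm_from_counts`, `scan`, `best_hit`), the pseudocount, and the uniform background are assumptions made for illustration. Input sequences are assumed to contain only uppercase A, C, G and T.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds matrix.

    counts: list of dicts, one per motif position, e.g. {"T": 10}.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, score) pair with the highest log-odds score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy motif that strongly prefers T, then G, then A.
    toy_pwm = pwm_from_counts([{"T": 10}, {"G": 10}, {"A": 10}])
    print(best_hit("CCTGACC", toy_pwm))
```

Scanning with a score threshold alone reports every motif-like window, which is the source of the high false positive rate noted above; footprinting methods add accessibility data to filter those matches.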


2017 ◽  
Vol 45 (8) ◽  
pp. 4315-4329 ◽  
Author(s):  
Xi Chen ◽  
Bowen Yu ◽  
Nicholas Carriero ◽  
Claudio Silva ◽  
Richard Bonneau

2019 ◽  
Author(s):  
Lianggang Huang ◽  
Xuejie Li ◽  
Liangbo Dong ◽  
Bin Wang ◽  
Li Pan

Abstract Identifying cis-regulatory elements (CREs) and TF binding motifs is an important step in understanding the regulatory functions of TF binding and gene expression. The lack of experimentally determined and computationally inferred data means that the genome-wide CREs and TF binding sites (TFBs) in filamentous fungi remain unknown. ATAC-seq is a technique that provides a high-resolution measurement of chromatin accessibility to Tn5 transposase integration. In filamentous fungi, the existence of cell walls and the difficulty in purifying nuclei have prevented the routine application of this technique. Herein, we modified the ATAC-seq protocol for filamentous fungi to identify and map open chromatin and TF-binding sites on a genome scale. We applied the ATAC-seq assay among different Aspergillus species, under different culture conditions, and among TF-deficient strains to delineate open chromatin regions and TFBs across each genome. The syntenic orthologous regions and the differentially changed regions of chromatin accessibility were responsible for functionally conserved regulatory elements and differential gene expression in the Aspergillus genome, respectively. Importantly, 17 and 15 novel transcription factor binding motifs that were enriched in the genomic footprints identified from ATAC-seq data of A. niger, were verified in vivo by our artificial synthetic minimal promoter system, respectively. Furthermore, we first confirmed the strand-specific patterns of Tn5 transposase around the binding sites of known TFs by comparing ATAC-seq data of TF-deficient strains with the data from a wild-type strain.

