Virtual ChIP-seq: predicting transcription factor binding by learning from the transcriptome

2018
Author(s):
Mehran Karimzadeh
Michael M. Hoffman

Abstract Motivation Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
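The abstract above reports performance as a Matthews correlation coefficient (MCC) threshold. As a minimal sketch of how that metric is computed for binary bound/unbound calls, the following uses the standard MCC formula on toy labels. The function name and the example labels are illustrative assumptions and are not taken from the Virtual ChIP-seq software.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 = unbound, 1 = bound)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (hypothetical labels for eight genomic bins).
truth = [1, 1, 0, 0, 0, 0, 1, 0]
calls = [1, 0, 0, 0, 1, 0, 1, 0]
print(round(matthews_corrcoef(truth, calls), 3))  # 0.467
```

MCC ranges from -1 to 1 and stays informative when bound sites are far rarer than unbound ones, which is the usual situation for genome-wide binding prediction.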

2016
Vol 2016
pp. 1-27
Author(s):
Kristopher J. L. Irizarry
Randall L. Bryden

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.
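The pairwise protein sequence identity mentioned above can be illustrated with a short sketch. It assumes the two sequences are already aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap. That is one common convention and may differ from the one the authors used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment (hypothetical fragments, not real python or mouse proteins).
identity = percent_identity("MKT-LLV", "MKSALLI")
print(round(identity, 1))  # 66.7
```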


2021
Vol 49 (17)
pp. 9809-9820
Author(s):
Wakana Koda
Satoshi Senmatsu
Takuya Abe
Charles S Hoffman
Kouji Hirota

Abstract Transcriptional regulation, a pivotal biological process by which cells adapt to environmental fluctuations, is achieved by the binding of transcription factors to target sequences in a sequence-specific manner. However, how transcription factors recognize the correct target from amongst the numerous candidates in a genome has not been fully elucidated. We here show that, in the fission-yeast fbp1 gene, when transcription factors bind to target sequences in close proximity, their binding is reciprocally stabilized, thereby integrating distinct signal transduction pathways. The fbp1 gene is massively induced upon glucose starvation by the activation of two transcription factors, Atf1 and Rst2, mediated via distinct signal transduction pathways. Atf1 and Rst2 bind to the upstream-activating sequence 1 region, carrying two binding sites located 45 bp apart. Their binding is reciprocally stabilized due to the close proximity of the two target sites, which destabilizes the independent binding of Atf1 or Rst2. Tup11/12 (Tup-family co-repressors) suppress independent binding. These data demonstrate a previously unappreciated mechanism by which two transcription-factor binding sites, in close proximity, integrate two independent-signal pathways, thereby behaving as a hub for signal integration.


2021
Author(s):
Chen Chen
Jie Hou
Xiaowen Shi
Hua Yang
James A. Birchler
...

Abstract Background Due to the complexity of the biological systems, the prediction of the potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of genome and are commonly used features in binding sites prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach in binding site inference from massively parallel sequencing data. The successful applications of attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factors binding site prediction based on the combination of two components: single attention module and pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
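For readers unfamiliar with the attention mechanism the abstract refers to, here is a minimal sketch of generic scaled dot-product attention in plain Python. It is not DeepGRN's single or pairwise attention module, whose details the abstract does not give. The function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns the attention weights (one per key) and the weighted
    sum of the value vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Toy example: three sequence positions, each with a 2-d key and a 1-d value.
weights, context = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[2.0], [4.0], [6.0]],
)
print(weights, context)
```

The weights sum to 1 across positions, which is what makes it possible to inspect them against input features such as DNase-Seq coverage.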


2021
Vol 22 (1)
Author(s):
Chen Chen
Jie Hou
Xiaowen Shi
Hua Yang
James A. Birchler
...

Abstract Background Due to the complexity of the biological systems, the prediction of the potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of genome and are commonly used features in binding sites prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach in binding site inference from massively parallel sequencing data. The successful applications of attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factors binding site prediction based on the combination of two components: single attention module and pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.


2020
Author(s):
Chen Chen
Jie Hou
Xiaowen Shi
Hua Yang
James A. Birchler
...

Abstract Background Due to the complexity of the biological systems, the prediction of the potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of genome and are commonly used features in binding sites prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach in binding site inference from massively parallel sequencing data. The successful applications of attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factors binding site prediction based on the combination of two components: single attention module and pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.


2009
Vol 2009
pp. 1-8
Author(s):
K. Shameer
S. Ambika
Susan Mary Varghese
N. Karaba
M. Udayakumar
...

Elucidating the key players in the molecular mechanisms that mediate complex stress responses in plant systems is an important step toward developing improved varieties of stress-tolerant crops. Understanding the effects of different types of biotic and abiotic stress is a rapidly emerging domain of plant research aimed at developing better, stress-tolerant plants. Information about transcription factors, transcription factor binding sites, and the functional annotation of proteins encoded by genes expressed during the abiotic stress response (for example, drought, cold, salinity, excess light, abscisic acid, and oxidative stress) will provide a better understanding of this phenomenon. STIFDB is a database of abiotic stress-responsive genes and their predicted abiotic transcription factor binding sites in Arabidopsis thaliana. We integrated 2269 genes upregulated in different stress-related microarray experiments, surveyed their 1000 bp and 100 bp upstream regions and 5′UTR regions using the STIF algorithm, and identified putative abiotic stress-responsive transcription factor binding sites, which are compiled in the STIFDB database. STIFDB provides extensive information about various stress-responsive genes and stress-inducible transcription factors of Arabidopsis thaliana. STIFDB will be a useful resource for researchers seeking to understand the abiotic stress regulome and transcriptome of this important model plant system.
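To make the idea of surveying upstream regions for putative binding sites concrete, here is a minimal sketch that scans a sequence for exact matches to a consensus motif on both strands. It is not the STIF algorithm, which the abstract does not describe. The function names, the toy upstream sequence, and the motif are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_hits(upstream, motif):
    """Return (position, strand) for each exact motif match.

    Positions are 0-based offsets into `upstream`; "-" marks a match
    to the reverse complement of the motif.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(upstream) - len(motif) + 1):
        window = upstream[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


# Toy upstream region and motif (both hypothetical).
print(find_motif_hits("TTACGTGGCCCACGTAA", "ACGTG"))  # [(2, '+'), (10, '-')]
```

Exact matching is the simplest possible scanner. Practical tools usually score windows against a position weight matrix or a probabilistic model so that degenerate sites are also found.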


2018
Author(s):
Ali Shariati
Antonia Dominguez
Marius Wernig
Lei S. Qi
Jan M. Skotheim

Abstract The control of gene expression by transcription factor binding sites frequently determines phenotype. However, it has been difficult to assay the function of single transcription factor binding sites within larger transcription networks. Here, we developed such a method by using deactivated Cas9 to disrupt binding to specific sites on the genome. Since CRISPR guide RNAs are longer than transcription factor binding sites, flanking sequence can be used to target specific sites. Targeting deactivated Cas9 to a specific Oct4 binding site in the Nanog promoter blocked Oct4 binding, reduced Nanog expression, and slowed division. Multiple guide RNAs allow simultaneous inhibition of multiple binding sites, and conditionally destabilized dCas9 allows rapid reversibility. The method is a novel high-throughput approach to systematically interrogate cis-regulatory function within complex regulatory networks.

