Systematic analysis of low-affinity transcription factor binding site clusters in vitro and in vivo establishes their functional relevance

2021 ◽  
Author(s):  
Amir Shahein ◽  
Maria López-Malo ◽  
Ivan Istomin ◽  
Evan J. Olson ◽  
Shiyu Cheng ◽  
...  

Transcription factor binding to a single binding site and its functional consequence in a promoter context are beginning to be relatively well understood. However, binding to clusters of sites has yet to be characterized in depth, and the functional relevance of binding site clusters remains uncertain. We employed a high-throughput biochemical method to characterize transcription factor binding to clusters varying across a range of affinities and configurations. We found that transcription factors can bind concurrently to overlapping sites, challenging the notion of binding exclusivity. Furthermore, compared to an individual high-affinity binding site, small clusters with binding sites an order of magnitude lower in affinity give rise to higher mean occupancies at physiologically relevant transcription factor concentrations in vitro. To assess whether the observed in vitro occupancies translate to transcriptional activation in vivo, we tested low-affinity binding site clusters by inserting them into a synthetic minimal CYC1 and the native PHO5 S. cerevisiae promoter. In the minCYC1 promoter, clusters of low-affinity binding sites can generate transcriptional output comparable to a promoter containing three consensus binding sites. In the PHO5 promoter, replacing the native Pho4 binding sites with clusters of low-affinity binding sites recovered activation of these promoters as well. This systematic characterization demonstrates that clusters of low-affinity binding sites achieve substantial occupancies, and that this occupancy can drive expression in eukaryotic promoters.
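The occupancy comparison in this abstract can be illustrated with a minimal sketch. It assumes simple 1:1 equilibrium binding and fully independent sites, which is a simplification and not the model used by the authors; the concentration, the dissociation constants, and the function names are hypothetical values chosen only for illustration.

```python
def site_occupancy(conc, kd):
    """Probability that a single site is bound, assuming 1:1 equilibrium binding."""
    return conc / (conc + kd)


def mean_cluster_occupancy(conc, kds):
    """Expected number of bound factors, assuming the sites bind independently."""
    return sum(site_occupancy(conc, kd) for kd in kds)


conc = 100.0                 # hypothetical transcription factor concentration (nM)
strong_site = [10.0]         # one high-affinity site (hypothetical Kd, nM)
weak_cluster = [100.0] * 4   # four sites, each tenfold lower in affinity

print(mean_cluster_occupancy(conc, strong_site))   # about 0.91
print(mean_cluster_occupancy(conc, weak_cluster))  # 2.0
```

Under these assumptions a single site can never exceed a mean occupancy of 1, whereas the summed occupancy of several weak sites can, which is one way to read the reported in vitro result.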

2014 ◽  
Vol 35 (4) ◽  
pp. 688-698 ◽  
Author(s):  
Robert M. Yarrington ◽  
Jared S. Rudd ◽  
David J. Stillman

Promoters often contain multiple binding sites for a single factor. The yeast HO gene contains nine highly conserved binding sites for the SCB (Swi4/6-dependent cell cycle box) binding factor (SBF) complex (composed of Swi4 and Swi6) in the 700-bp upstream regulatory sequence 2 (URS2) promoter region. Here, we show that the distal and proximal SBF sites in URS2 function differently. Chromatin immunoprecipitation (ChIP) experiments show that SBF binds preferentially to the left side of URS2 (URS2-L), despite equivalent binding to the left-half and right-half SBF sites in vitro. SBF binding at URS2-L sites depends on prior chromatin remodeling events at the upstream URS1 region. These signals from URS1 influence chromatin changes at URS2 but only at sites within a defined distance. SBF bound at URS2-L, however, is unable to activate transcription but instead facilitates SBF binding to sites in the right half (URS2-R), which are required for transcriptional activation. Factor binding at HO, therefore, follows a temporal cascade, with SBF bound at URS2-L serving to relay a signal from URS1 to the SBF sites in URS2-R that ultimately activate gene expression. Taken together, we describe a novel property of a transcription factor that can have two distinct roles in gene activation, depending on its location within a promoter.


2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29330
Author(s):  
Stephen A. Ramsey

A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5’ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis-Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences.
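As a rough illustration of sampling a site count with Metropolis-Hastings, the sketch below draws counts from a Poisson target using a symmetric plus-or-minus-one proposal. This is a stand-in only: it does not reproduce the entropy-based move generator, the empirical prior, or the finite-width model described in the abstract, and the target rate `mu`, the function names, and the seed are all hypothetical.

```python
import math
import random


def log_poisson(k, mu):
    """Unnormalized log-probability of count k under a Poisson(mu) target."""
    if k < 0:
        return float("-inf")  # negative counts have zero probability
    return k * math.log(mu) - math.lgamma(k + 1)


def sample_counts(mu, n_steps, seed=0):
    """Random-walk Metropolis-Hastings over non-negative integer counts."""
    rng = random.Random(seed)
    k = 0
    samples = []
    for _ in range(n_steps):
        proposal = k + (1 if rng.random() < 0.5 else -1)  # symmetric +/-1 move
        log_ratio = log_poisson(proposal, mu) - log_poisson(k, mu)
        # Accept with probability min(1, target(proposal) / target(k)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            k = proposal
        samples.append(k)
    return samples


samples = sample_counts(mu=3.0, n_steps=20000)
print(sum(samples) / len(samples))  # should be close to mu for a long chain
```

Because the proposal is symmetric, the acceptance rule reduces to the ratio of target probabilities; a non-symmetric move generator such as the one the abstract mentions would also need the proposal ratio in that step.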


2008 ◽  
Vol 2008 ◽  
pp. 1-9 ◽  
Author(s):  
J. Sunil Rao ◽  
Suresh Karanam ◽  
Colleen D. McCabe ◽  
Carlos S. Moreno

Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of those pairs, 9107 genes contained conserved TFBS in the 3 kb proximal promoter and first intron. To attempt to predict in vivo occupancy of transcription factor binding sites, we developed a novel marginal effect isolator algorithm that builds upon Bayesian methods for multigroup TFBS filtering and predicted the in vivo occupancy of two transcription factors with an overall accuracy of 84%. Conclusion. Our analyses show that integration of chromatin immunoprecipitation data with conserved TFBS analysis can be used to generate accurate predictions of functional TFBS. They also show that TFBS cooccurrence can be used to predict transcription factor binding to promoters in vivo.


2021 ◽  
Author(s):  
Chen Chen ◽  
Jie Hou ◽  
Xiaowen Shi ◽  
Hua Yang ◽  
James A. Birchler ◽  
...  

Abstract Background Due to the complexity of biological systems, the prediction of potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of the genome and are commonly used features in binding site prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and voices. Until now, no study has applied this approach to binding site inference from massively parallel sequencing data. The successful applications of the attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction based on the combination of two components: a single attention module and a pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potential informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
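For readers unfamiliar with attention, the following is a minimal sketch of generic scaled dot-product attention in plain Python. It is not DeepGRN's single or pairwise attention module, whose details are not given in the abstract; the function names and the toy inputs are hypothetical. The returned weights are the kind of quantity that can be inspected to see which input positions a model relies on.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights)."""
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy example: one query position attending over three sequence positions.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0], [2.0], [3.0]],
)
print(weights[0])  # one weight per input position, summing to 1
```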


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Lianggang Huang ◽  
Xuejie Li ◽  
Liangbo Dong ◽  
Bin Wang ◽  
Li Pan

Abstract Background The identification of open chromatin regions and transcription factor binding sites (TFBs) is an important step in understanding the regulation of gene expression in diverse species. ATAC-seq is a technique used for this purpose by providing high-resolution measurements of chromatin accessibility revealed through integration of Tn5 transposase. However, the existence of cell walls in filamentous fungi and associated difficulty in purifying nuclei have precluded the routine application of this technique, leading to a lack of experimentally determined and computationally inferred data on the identity of genome-wide cis-regulatory elements (CREs) and TFBs. In this study, we constructed an ATAC-seq platform suitable for filamentous fungi and generated ATAC-seq libraries of Aspergillus niger and Aspergillus oryzae grown under a variety of conditions. Results We applied the ATAC-seq assay for filamentous fungi to delineate the syntenic orthologue and differentially changed chromatin accessibility regions among different Aspergillus species, during different culture conditions, and among specific TF-deleted strains. The syntenic orthologues of accessible regions were responsible for the conservative functions across Aspergillus species, while regions differentially changed between culture conditions and TF mutants drove differential gene expression programs. Importantly, we suggest criteria to determine TFBs through the analysis of unbalanced cleavage of distinct TF-bound DNA strands by Tn5 transposase. Based on this criterion, we constructed data libraries of the in vivo genomic footprint of A. niger under distinct conditions, and generated a database of novel transcription factor binding motifs through comparison of footprints in TF-deleted strains. Furthermore, we validated the novel TFBs in vivo through an artificial synthetic minimal promoter system.
Conclusions We characterized the chromatin accessibility regions of filamentous fungi species, and identified a complete TFBs map by ATAC-seq, which provides valuable data for future analyses of transcriptional regulation in filamentous fungi.


2015 ◽  
Vol 197 (15) ◽  
pp. 2454-2457 ◽  
Author(s):  
Ivan Erill

Experimentally verified transcription factor-binding sites represent an information-rich and highly applicable data type that aptly summarizes the results of time-consuming experiments and inference processes. Currently, there is no centralized repository for this type of data, which is routinely embedded in articles and extremely hard to mine. CollecTF provides the first standardized resource for submission and deposition of these data into the NCBI RefSeq database, maximizing its accessibility and prompting the community to adopt direct submission policies.

