Computational Approaches for Transcription Factor Binding Prediction

2014 ◽  
Vol 94 (11) ◽  
pp. 1-5
Author(s):  
Smitha CS ◽  
Saritha R
2021 ◽  
pp. 203-221
Author(s):  
Erick I. Navarro-Delgado ◽  
Marisol Salgado-Albarrán ◽  
Karla Torres-Arciga ◽  
Nicolas Alcaraz ◽  
Ernesto Soto-Reyes ◽  
...  

2018 ◽  
Vol 35 (9) ◽  
pp. 1608-1609 ◽  
Author(s):  
Florian Schmidt ◽  
Fabian Kern ◽  
Peter Ebert ◽  
Nina Baumgarten ◽  
Marcel H Schulz

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Janik Sielemann ◽  
Donat Wulf ◽  
Romy Schmidt ◽  
Andrea Bräutigam

Abstract: Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more potential binding sites than are actually bound, and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs depict only sequence and neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches ought to be able to better predict transcription factor binding. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence.
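The observation that motif matches outnumber bound sites can be made concrete with a minimal motif scan. The sketch below is not the authors' method: it only counts exact matches of a consensus motif on both strands and compares them with a set of bound positions. The function names, the toy sequence, and the set of "bound" positions are all hypothetical, chosen purely for illustration.

```python
# Minimal sketch: exact-match motif scan on both strands.
# All names and the toy data below are illustrative assumptions,
# not taken from the paper.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_sites(sequence, motif):
    """Return sorted start positions where the motif matches either strand."""
    sequence = sequence.upper()
    motif = motif.upper()
    # A match to the reverse complement is a match on the opposite strand.
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] in targets
    ]


def bound_fraction(sites, bound_positions):
    """Fraction of motif sites that are also in the set of bound positions."""
    if not sites:
        return 0.0
    return sum(1 for site in sites if site in bound_positions) / len(sites)


if __name__ == "__main__":
    toy_sequence = "AACACGTGTTTTCACGTGAAGGCACGTGCC"  # made-up sequence
    sites = motif_sites(toy_sequence, "CACGTG")
    hypothetical_bound = {12}  # made-up "experimentally bound" position
    print(sites, bound_fraction(sites, hypothetical_bound))
```

In this toy case every motif occurrence is a candidate site, yet only a subset counts as bound, which is the gap that additional features such as DNA shape are meant to narrow.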


2020 ◽  
Vol 26 (42) ◽  
pp. 7641-7654 ◽  
Author(s):  
Tao Ma ◽  
Zhenqing Ye ◽  
Liguo Wang

Background: Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. Objective: This review aims to provide an overview of these three techniques, including their experimental procedures, computational approaches, and popular analytic tools. Conclusion: ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to a single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq, and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each inherently carries its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore requires careful consideration. Moreover, most programs only have a command-line interface, so their installation and usage require basic computational expertise in Unix/Linux.


2020 ◽  
Author(s):  
Janik Sielemann ◽  
Donat Wulf ◽  
Romy Schmidt ◽  
Andrea Bräutigam

A genome encodes two types of information, the “what can be made” and the “when and where”. The “what” are mostly proteins, which perform the majority of functions within living organisms, and the “when and where” is the regulatory information that encodes when and where proteins are made. Currently, it is possible to efficiently predict the majority of the protein content of a genome but nearly impossible to predict the transcriptional regulation. This regulation is based upon the interaction between transcription factors and genomic sequences at the site of binding motifs [1,2,3]. Information contained within the motif is necessary to predict transcription factor binding; however, it is not sufficient [4]. Peaks detected in amplified DNA affinity purification sequencing (ampDAP-seq) and the motifs derived from them only partially overlap in the genome [3], indicating that the sequence holds information beyond the binding motif. Here we show that a random forest machine learning approach which incorporates the 3D shape of DNA improved the area under the precision-recall curve for binding prediction for all 216 tested Arabidopsis thaliana transcription factors. The method resolved differential binding of transcription factor family members which share the same binding motif. The models correctly predicted the binding behavior of novel, not-in-genome motif sequences. Understanding transcription factor binding as a combination of motif sequence and motif shape brings us closer to predicting gene expression from promoter sequence.
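The abstract reports results as the area under the precision-recall curve. The abstract does not state how that area was computed, so the following is only a minimal sketch of one common step-wise estimate, average precision, written in plain Python. The function name and the toy labels and scores are assumptions for illustration, and tied scores are not treated specially.

```python
# Minimal sketch: average precision as a step-wise estimate of the
# area under the precision-recall curve. Function name and toy data
# are illustrative assumptions; ties in scores are not handled specially.


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example.

    labels: iterable of 0/1 (1 = bound site), scores: predicted scores.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidates from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            # Precision among the top `rank` candidates.
            precision_sum += hits / rank
    return precision_sum / total_positives


if __name__ == "__main__":
    toy_labels = [1, 0, 1, 0]          # made-up ground truth
    toy_scores = [0.9, 0.8, 0.7, 0.1]  # made-up model scores
    print(average_precision(toy_labels, toy_scores))
```

Precision-recall summaries suit this setting because unbound motif matches greatly outnumber bound ones, and precision directly penalizes a model that ranks many unbound candidates highly.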

