Bioinformatics of Transcription Factor Binding Prediction

2021, pp. 203-221
Authors: Erick I. Navarro-Delgado, Marisol Salgado-Albarrán, Karla Torres-Arciga, Nicolas Alcaraz, Ernesto Soto-Reyes, ...

2018, Vol 35 (9), pp. 1608-1609
Authors: Florian Schmidt, Fabian Kern, Peter Ebert, Nina Baumgarten, Marcel H Schulz

2021, Vol 12 (1)
Authors: Janik Sielemann, Donat Wulf, Romy Schmidt, Andrea Bräutigam

Abstract
Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive on their own, because a genome contains many more motif matches than sites that are actually bound, and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs depict only sequence and neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches should be able to predict transcription factor binding better. Here we show that a random forest machine learning approach that incorporates the 3D shape of DNA enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members that share the same binding motif. We observed that DNA shape features were weighted individually for each transcription factor, even when factors shared the same binding sequence.
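The claim that sequence-only motifs are not predictive can be made concrete with a position weight matrix (PWM) scan, the usual way such motifs are applied. The sketch below is illustrative only and assumes a uniform 0.25 background: `TOY_COUNTS` is a hypothetical four-position count matrix (consensus ACGT), not a real Arabidopsis motif, and `log_odds_matrix` and `scan` are helper names introduced here, not part of the authors' method. Every window scoring above a chosen cutoff counts as a predicted site, whether or not it is actually bound.

```python
import math

BASES = "ACGT"

# Hypothetical toy counts (consensus ACGT), for illustration only;
# not a real Arabidopsis transcription factor motif.
TOY_COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]


def log_odds_matrix(counts, background=0.25, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        # Pseudocounts keep unseen bases from producing log2(0).
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every forward-strand window of width len(pwm); return (start, score) pairs."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][b] for i, b in enumerate(window))
        hits.append((start, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_matrix(TOY_COUNTS)
    threshold = 0.0  # arbitrary cutoff chosen for the example
    for start, score in scan("TTACGTAAACGTCC", pwm):
        if score > threshold:
            print(start, round(score, 2))
```

Both ACGT occurrences in the example sequence receive the same top score, so a sequence-only scan cannot tell them apart; on a whole genome, such a scan reports many more matches than bound sites, which is the gap that adding shape features is meant to narrow.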


2020
Authors: Janik Sielemann, Donat Wulf, Romy Schmidt, Andrea Bräutigam

A genome encodes two types of information: the “what can be made” and the “when and where”. The “what” consists mostly of proteins, which perform the majority of functions within living organisms, and the “when and where” is the regulatory information that encodes when and where proteins are made. Currently, it is possible to predict the majority of the protein content of a genome efficiently, but it is nearly impossible to predict transcriptional regulation. This regulation is based on the interaction between transcription factors and genomic sequences at binding motifs [1,2,3]. Information contained within the motif is necessary to predict transcription factor binding; however, it is not sufficient [4]. Peaks detected by amplified DNA affinity purification sequencing (ampDAP-seq) and the motifs derived from them only partially overlap in the genome [3], indicating that the sequence holds information beyond the binding motif. Here we show that a random forest machine learning approach that incorporates 3D DNA shape improved the area under the precision-recall curve for binding prediction for all 216 tested Arabidopsis thaliana transcription factors. The method resolved differential binding of transcription factor family members that share the same binding motif. The models correctly predicted the binding behavior of novel motif sequences that do not occur in the genome. Understanding transcription factor binding as a combination of motif sequence and motif shape brings us closer to predicting gene expression from promoter sequence.
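The area under the precision-recall curve is commonly estimated as average precision (AP). The following is a minimal pure-Python sketch that assumes this estimator; the abstract does not say which AUPRC implementation the authors used, and the function name `average_precision` is introduced here for illustration. Candidate regions are ranked by predicted score, and at each distinct score threshold the recall gained is multiplied by the precision at that threshold: AP = sum_n (R_n - R_{n-1}) * P_n. Tied scores are treated as a single threshold.

```python
def average_precision(scores, labels):
    """Step-wise AUPRC estimate: AP = sum_n (R_n - R_{n-1}) * P_n.

    scores: predicted binding scores (higher means more likely bound).
    labels: truthy for bound regions, falsy for unbound ones.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")

    # Rank regions from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ap = 0.0
    tp = fp = 0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Consume all regions tied at this score as one threshold.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]` give 0.5 * 1 + 0.5 * 2/3, roughly 0.833. Because a random ranking yields an AP close to the fraction of positives, this metric remains informative when bound regions are a small minority of candidate sites.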

