Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana

2020 ◽  
Author(s):  
Janik Sielemann ◽  
Donat Wulf ◽  
Romy Schmidt ◽  
Andrea Bräutigam

A genome encodes two types of information: the “what can be made” and the “when and where”. The “what” consists mostly of proteins, which perform the majority of functions within living organisms, and the “when and where” is the regulatory information that determines when and where proteins are made. Currently, it is possible to efficiently predict the majority of the protein content of a genome but nearly impossible to predict its transcriptional regulation. This regulation is based upon the interaction between transcription factors and genomic sequences at the site of binding motifs [1,2,3]. Information contained within the motif is necessary to predict transcription factor binding; however, it is not sufficient [4]. Peaks detected in amplified DNA affinity purification sequencing (ampDAP-seq) and the motifs derived from them only partially overlap in the genome [3], indicating that the sequence holds information beyond the binding motif. Here we show that a random forest machine learning approach which incorporates the 3D shape of DNA improved the area under the precision-recall curve for binding prediction for all 216 tested Arabidopsis thaliana transcription factors. The method resolved differential binding of transcription factor family members which share the same binding motif. The models correctly predicted the binding behavior of novel, not-in-genome motif sequences. Understanding transcription factor binding as a combination of motif sequence and motif shape brings us closer to predicting gene expression from promoter sequence.
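The abstract above scores binding predictions by the area under the precision-recall curve. As a rough illustration only, the sketch below estimates that area with the average-precision sum in plain Python. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the study; tied scores receive no special handling here.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Estimate the area under the precision-recall curve.

    labels: 1 for sites that are bound, 0 for sites that are not (assumed encoding).
    scores: predicted binding scores, where higher means more likely bound.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidate sites from the highest to the lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1 / n_pos.
            area += (true_pos / rank) / n_pos
    return area


# Hypothetical toy data: four candidate sites, two of them truly bound.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
toy_area = average_precision(toy_labels, toy_scores)
```

Comparing this value for a sequence-only model and a sequence-plus-shape model on the same held-out sites is one way to express the kind of improvement the abstract reports.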

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Janik Sielemann ◽  
Donat Wulf ◽  
Romy Schmidt ◽  
Andrea Bräutigam

Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more binding sites than are actually bound and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs depict only sequence and neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches ought to be able to better predict transcription factor binding. Here we show that a random forest machine learning approach, which incorporates the 3D shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence.


2009 ◽  
Vol 2009 ◽  
pp. 1-8 ◽  
Author(s):  
K. Shameer ◽  
S. Ambika ◽  
Susan Mary Varghese ◽  
N. Karaba ◽  
M. Udayakumar ◽  
...  

Elucidating the key players in the molecular mechanisms that mediate complex stress responses in plant systems is an important step toward developing improved varieties of stress-tolerant crops. Understanding the effects of different types of biotic and abiotic stress is a rapidly emerging domain of plant research aimed at developing better, stress-tolerant plants. Information about transcription factors, transcription factor binding sites, and the functional annotation of proteins encoded by genes expressed during the abiotic stress response (for example: drought, cold, salinity, excess light, abscisic acid, and oxidative stress) will provide a better understanding of this phenomenon. STIFDB is a database of abiotic stress-responsive genes and their predicted abiotic transcription factor binding sites in Arabidopsis thaliana. We integrated 2269 genes upregulated in different stress-related microarray experiments and surveyed their 1000 bp and 100 bp upstream regions and 5′UTR regions using the STIF algorithm, identifying putative abiotic stress-responsive transcription factor binding sites, which are compiled in the STIFDB database. STIFDB provides extensive information about various stress-responsive genes and stress-inducible transcription factors of Arabidopsis thaliana. STIFDB will be a useful resource for researchers to understand the abiotic stress regulome and transcriptome of this important model plant system.


2018 ◽  
Author(s):  
Mehran Karimzadeh ◽  
Michael M. Hoffman

Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
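The Results above use a Matthews correlation coefficient threshold of 0.3. For readers unfamiliar with the metric, the following minimal sketch computes it from a binary confusion matrix. The function name `mcc` and the example counts are hypothetical and do not come from Virtual ChIP-seq; returning 0.0 for a zero denominator is a common convention that is assumed here.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fp: predicted-bound regions that are truly bound / not bound.
    tn/fn: predicted-unbound regions that are truly unbound / bound.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one transcription factor in one cell type.
example = mcc(tp=50, fp=10, tn=900, fn=40)
passes_threshold = example > 0.3
```

Because the coefficient uses all four cells of the confusion matrix, it stays informative when bound regions are far rarer than unbound ones, which is the usual situation for genome-wide binding prediction.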


2016 ◽  
Vol 2016 ◽  
pp. 1-27 ◽  
Author(s):  
Kristopher J. L. Irizarry ◽  
Randall L. Bryden

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.
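To make the idea of binding-site clusters within ~200-nucleotide intervals concrete, here is a small sketch that groups predicted site positions from one upstream region into windows of at most 200 nucleotides. The grouping rule, the `min_sites` parameter, the function name, and the example positions are all assumptions made for illustration. The abstract does not specify how clusters were defined, and this sketch does not assess evolutionary conservation.

```python
from typing import List, Sequence


def cluster_sites(positions: Sequence[int], window: int = 200,
                  min_sites: int = 2) -> List[List[int]]:
    """Group site positions so each group spans at most `window` nucleotides.

    Groups with fewer than `min_sites` members are discarded.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos in sorted(positions):
        # Start a new group once this site is too far from the group's first site.
        if current and pos - current[0] > window:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


# Hypothetical site positions within a 1500-nucleotide upstream region.
sites = [10, 50, 180, 600, 650, 1400]
clusters = cluster_sites(sites)
```

With these made-up positions, the first three sites fall into one group, the next two into a second group, and the isolated site at 1400 is dropped.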


Cells ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 1435
Author(s):  
Yu-Chin Lien ◽  
Paul Zhiping Wang ◽  
Xueqing Maggie Lu ◽  
Rebecca A. Simmons

Intrauterine growth retardation (IUGR), which induces epigenetic modifications and permanent changes in gene expression, has been associated with the development of type 2 diabetes. Using a rat model of IUGR, we performed ChIP-Seq to identify and map genome-wide histone modifications and gene dysregulation in islets from 2- and 10-week-old rats. IUGR induced significant changes in the enrichment of H3K4me3, H3K27me3, and H3K27Ac marks in both 2-week and 10-week islets, which were correlated with expression changes of multiple genes critical for islet function in IUGR islets. ChIP-Seq analysis showed that IUGR-induced histone mark changes were enriched at critical transcription factor binding motifs, such as C/EBPs, Ets1, Bcl6, Thrb, Ebf1, Sox9, and Mitf. These transcription factors were also identified as top upstream regulators in our previously published transcriptome study. In addition, our ChIP-Seq data revealed more than 1000 potential bivalent genes, identified by enrichment of both H3K4me3 and H3K27me3. The poised state of many potential bivalent genes was altered by IUGR, particularly Acod1, Fgf21, Serpina11, Cdh16, Lrrc27, and Lrrc66, key islet genes. Collectively, our findings suggest alterations of histone modification in key transcription factors and genes that may contribute to long-term gene dysregulation and an abnormal islet phenotype in IUGR rats.
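The abstract identifies potential bivalent genes by enrichment of both H3K4me3 and H3K27me3. At the level of gene lists, that criterion reduces to a set intersection, sketched below. The function name and the placeholder gene identifiers are hypothetical; how enrichment was called per gene is not described in the abstract and is not modeled here.

```python
from typing import Iterable, List


def bivalent_candidates(h3k4me3_genes: Iterable[str],
                        h3k27me3_genes: Iterable[str]) -> List[str]:
    """Return genes carrying both marks, sorted for stable output."""
    return sorted(set(h3k4me3_genes) & set(h3k27me3_genes))


# Placeholder identifiers, not real results from the study.
active_mark = ["geneA", "geneB", "geneC"]
repressive_mark = ["geneB", "geneC", "geneD"]
candidates = bivalent_candidates(active_mark, repressive_mark)
```

Running the same intersection separately on control and IUGR gene lists, then comparing the two results, is one simple way to see which candidates gain or lose the poised state.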


2021 ◽  
Vol 49 (17) ◽  
pp. 9809-9820
Author(s):  
Wakana Koda ◽  
Satoshi Senmatsu ◽  
Takuya Abe ◽  
Charles S Hoffman ◽  
Kouji Hirota

Transcriptional regulation, a pivotal biological process by which cells adapt to environmental fluctuations, is achieved by the binding of transcription factors to target sequences in a sequence-specific manner. However, how transcription factors recognize the correct target from amongst the numerous candidates in a genome has not been fully elucidated. Here we show that, in the fission-yeast fbp1 gene, when transcription factors bind to target sequences in close proximity, their binding is reciprocally stabilized, thereby integrating distinct signal transduction pathways. The fbp1 gene is massively induced upon glucose starvation by the activation of two transcription factors, Atf1 and Rst2, mediated via distinct signal transduction pathways. Atf1 and Rst2 bind to the upstream-activating sequence 1 region, carrying two binding sites located 45 bp apart. Their binding is reciprocally stabilized due to the close proximity of the two target sites, which destabilizes the independent binding of Atf1 or Rst2. Tup11/12 (Tup-family co-repressors) suppress independent binding. These data demonstrate a previously unappreciated mechanism by which two transcription-factor binding sites, in close proximity, integrate two independent signal pathways, thereby behaving as a hub for signal integration.

