Combining comparative genomics with de novo motif discovery to identify human transcription factor DNA-binding motifs

2006, Vol 7 (S4)
Author(s): Linyong Mao, W Jim Zheng

2019
Author(s): Nina Baumgarten, Florian Schmidt, Marcel H Schulz

Abstract
Motivation: A central aim of molecular biology is to identify the mechanisms of transcriptional regulation. Transcription factors (TFs), which are DNA-binding proteins, are heavily involved in these processes, so it is crucial to know where TFs interact with DNA and which DNA-binding motifs they recognize. For that reason, computational tools exist that link DNA-binding motifs to TFs, either without sequence information or based on TF-associated sequences, e.g. those identified in a ChIP-seq experiment.
Method: In this paper we present MASSIF, a novel method that improves the performance of existing tools that link motifs to TFs using TF-associated sequences. MASSIF is based on the idea that a DNA-binding motif correctly linked to a TF should be assigned to a DNA-binding domain (DBD) similar to that of the mapped TF. Because DNA-binding motifs are generally not linked to DBDs, the DBD of a TF and a motif cannot be compared directly. Instead, we created a DBD collection consisting of TFs with a known DBD and an associated motif. This collection enables us to evaluate how likely it is that a linked motif and a TF of interest are associated with the same DBD. We call this similarity measure the domain score and represent it as a p-value. We developed two ways to improve the performance of existing tools that link motifs to TFs based on TF-associated sequences: (1) using meta-analysis to combine the p-values from one or several of these tools with the p-value of the domain score, and (2) filtering out unlikely motifs based on the domain score.
Results: We demonstrate the functionality of MASSIF on several human ChIP-seq data sets, using either motifs from the HOCOMOCO database or de novo identified motifs as input. In addition, we show that both variants of our method significantly improve the performance of tools that link motifs to TFs based on TF-associated sequences, independent of the considered DBD type.
Availability: MASSIF is freely available online at https://github.com/SchulzLab/MASSIF
Supplementary information: Supplementary data are available at Bioinformatics online.
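The two variants described above (combining p-values by meta-analysis, and filtering by the domain score) can be illustrated with a small sketch. The abstract does not say which meta-analysis method MASSIF uses, so the code assumes Fisher's method. It also assumes that a smaller domain-score p-value means stronger evidence that the motif and the TF share a DBD. The names `fisher_combine` and `rank_motifs`, the input layout, and the threshold `alpha` are hypothetical and are not MASSIF's actual API.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-squared variable with an even number of degrees of freedom.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Fisher's method (assumed here): -2 * sum(ln p_i) ~ chi-squared with 2k df under H0."""
    if not p_values:
        raise ValueError("need at least one p-value")
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


def rank_motifs(candidates, alpha=None):
    """Hypothetical ranking step.

    candidates maps motif_id -> (list of tool p-values, domain-score p-value).
    If alpha is given, motifs whose domain-score p-value exceeds it are filtered out
    first (variant 2); the remaining motifs are ranked by the combined p-value of the
    tool p-values and the domain-score p-value (variant 1).
    """
    ranked = []
    for motif_id, (tool_ps, domain_p) in candidates.items():
        if alpha is not None and domain_p > alpha:
            continue
        ranked.append((motif_id, fisher_combine(list(tool_ps) + [domain_p])))
    ranked.sort(key=lambda item: item[1])
    return ranked
```

Fisher's method assumes the combined p-values are independent; if several motif-linking tools produce correlated p-values for the same motif, the combined value will tend to be too small, which is one reason a filtering variant that uses the domain score alone can be attractive.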


2013, Vol 42 (5), p. e35
Author(s): Jun Ding, Haiyan Hu, Xiaoman Li

Abstract
The identification of transcription factor binding motifs is important for the study of gene transcriptional regulation. Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) provides an unprecedented opportunity to discover binding motifs. Computational methods have been developed to identify motifs from ChIP-seq data, but they face several problems. For example, existing methods often do not scale to the large number of sequences obtained from ChIP-seq peak regions. Some methods rely heavily on well-annotated motifs, even though the number of known motifs is limited. To simplify the problem, de novo motif discovery methods often neglect motifs that are underrepresented in ChIP-seq peak regions. To address these issues, we developed a novel approach, SIOMICS, for de novo motif discovery from ChIP-seq data. When tested on 13 ChIP-seq data sets, SIOMICS identified motifs of many known and new cofactors. When tested on 13 simulated random data sets, SIOMICS discovered no motif in any data set. Compared with two recently developed motif discovery methods, SIOMICS shows advantages in speed, in the number of known cofactor motifs predicted in experimental data sets, and in the number of false motifs predicted in random data sets. The SIOMICS software is freely available at http://eecs.ucf.edu/~xiaoman/SIOMICS/SIOMICS.html.


1994, Vol 14 (5), pp. 2871-2882
Author(s): C H Hu, B McStay, S W Jeong, R H Reeder

Xenopus UBF (xUBF) is a transcription factor for RNA polymerase I which contains multiple DNA-binding motifs. These include a short basic region adjacent to a dimer motif plus five high-mobility-group (HMG) boxes. All of these DNA-binding motifs exhibit low sequence specificity, whether assayed singly or together. In contrast, the HMG boxes recognize DNA structure that is formed when two double helices are crossed over each other. HMG box 1, in particular, requires association of two double helices before it will bind and, either by itself or in the context of the intact protein, will loop DNA and organize it into higher-order structures. We discuss how this mode of binding affects the function of xUBF as a transcription factor.


2013, Vol 11 (01), p. 1340006
Author(s): Jan Grau, Jens Keilwagen, André Gohr, Ivan A. Paponov, Stefan Posch, ...

DNA-binding proteins are a main component of gene regulation, as they activate or repress gene expression by binding to specific sites in target regions of genomic DNA. However, de novo discovery of these binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology that has not yet been solved satisfactorily. Here, we present a detailed description and analysis of the de novo motif discovery tool Dispom, which was developed to find binding sites of DNA-binding proteins that are differentially abundant in a set of target regions compared to a set of control regions. Two additional features of Dispom are its ability to model the positional preferences of binding sites and to adjust the motif length during learning. Dispom yields higher prediction accuracy than existing de novo motif discovery tools, suggesting that combining the search for differentially abundant motifs with the inference of their positional distributions and the adjustment of motif lengths is beneficial for de novo motif discovery. Applying Dispom to promoters of auxin-responsive genes and of ABI3 target genes from Arabidopsis thaliana, we identify relevant binding motifs with pronounced positional distributions. These results suggest that learning motifs, their positional distributions, and their lengths by a discriminative learning principle may aid motif discovery from ChIP-chip and gene expression data. We make Dispom freely available as part of Jstacs, an open-source Java library tailored to statistical sequence analysis. To facilitate extensions of Dispom, we describe its implementation using Jstacs in this manuscript. In addition, we provide a stand-alone application of Dispom at http://www.jstacs.de/index.php/Dispom for instant use.
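Dispom itself learns motifs with a discriminative learning principle, and the sketch below does not reproduce it. It is a much simpler stand-in that illustrates two ideas named in the abstract. First, it tests whether a fixed motif is differentially abundant in target versus control regions, using a one-sided Fisher's exact (hypergeometric) test on the number of sequences that contain an exact match. Second, it collects hit offsets from which a positional distribution could be estimated. Exact string matching, the function names, and the toy sequences in the test are assumptions made for illustration only.

```python
import math


def hit_positions(seq: str, motif: str) -> list[int]:
    """Return every start offset of an exact match of motif in seq (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def hypergeom_sf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population=total, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(total - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / math.comb(total, draws)


def enrichment_p(motif: str, targets: list[str], controls: list[str]) -> float:
    """One-sided Fisher's exact test: are motif-containing sequences over-represented in targets?"""
    hits_target = sum(1 for s in targets if hit_positions(s, motif))
    hits_control = sum(1 for s in controls if hit_positions(s, motif))
    total = len(targets) + len(controls)
    # Drawing len(targets) sequences from all sequences: how surprising are hits_target hits?
    return hypergeom_sf(hits_target, total, hits_target + hits_control, len(targets))
```

A real motif finder would replace exact matching with position weight matrix scores and would also scan the reverse-complement strand; the offsets returned by `hit_positions` are the raw material for the kind of positional distribution the abstract describes.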


2012, Vol 25 (2), pp. 231-240
Author(s): Gal Nissan, Shulamit Manulis-Sasson, Laura Chalupowicz, Doron Teper, Adva Yeheskel, ...

The type III effector HsvG of the gall-forming Pantoea agglomerans pv. gypsophilae is a DNA-binding protein that is imported to the host nucleus and involved in host specificity. The DNA-binding region of HsvG was delineated to 266 amino acids located within a secondary structure region near the N-terminus of the protein but did not display any homology to canonical DNA-binding motifs. A binding site selection procedure was used to isolate a target gene of HsvG, named HSVGT, in Gypsophila paniculata. HSVGT is a predicted acidic protein of the DnaJ family with 244 amino acids. It harbors characteristic conserved motifs of a eukaryotic transcription factor, including a bipartite nuclear localization signal, zinc finger, and leucine zipper DNA-binding motifs. Quantitative real-time polymerase chain reaction analysis demonstrated that HSVGT transcription is specifically induced in planta within 2 h after inoculation with the wild-type P. agglomerans pv. gypsophilae compared with the hsvG mutant. Induction of HSVGT reached a peak of sixfold at 4 h after inoculation and progressively declined thereafter. Gel-shift assay demonstrated that HsvG binds to the HSVGT promoter, indicating that HSVGT is a direct target of HsvG. Our results support the hypothesis that HsvG functions as a transcription factor in gypsophila.


BMC Genomics, 2011, Vol 12 (1)
Author(s): Elena Y Harris, Nadia Ponts, Karine G Le Roch, Stefano Lonardi

2006, Vol 22 (14), pp. e384-e392
Author(s): L. Narlikar, R. Gordan, U. Ohler, A. J. Hartemink
