Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. We applied it to the latest ENCODE ChIP-seq data for human and mouse and found that using the irreproducible discovery rate (IDR) as a quality-control criterion caused many experiments to be discarded unnecessarily. By contrast, the number of motif occurrences in ChIP-seq peak regions is a highly effective criterion, and it remains reliable even when only one experimental replicate supports it. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs; 487 of these motifs have not been reported previously. Mapping the canonical motifs to the human genome reveals a high TFBS density within ±2 kb of transcription start sites (TSSs), peaking at −50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs lie in introns (41%) and intergenic regions (29%), whereas only 12% lie in promoters (−1 kb to +100 bp relative to the TSS). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 tethered-binding TF pairs (115 of them completely tethered), which indicates that TFs interact frequently and that tethered binding is more common than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
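The promoter definition (−1 kb to +100 bp) and the TSS-centred density profile (±2 kb, peak at −50 bp) in the abstract both rest on a strand-aware offset between a site and a TSS. The following is a minimal Python sketch, not the authors' pipeline. It assumes 0-based single-base site positions (e.g., motif centres) and one TSS per transcript. `Transcript`, `tss_offset`, `in_promoter` and `offset_histogram` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Promoter window as defined in the abstract: -1 kb to +100 bp relative to the TSS.
PROMOTER_UPSTREAM = 1000
PROMOTER_DOWNSTREAM = 100


@dataclass(frozen=True)
class Transcript:
    chrom: str
    tss: int     # 0-based TSS coordinate (assumption)
    strand: str  # "+" or "-"


def tss_offset(site_pos: int, tx: Transcript) -> int:
    """Signed distance of a site from the TSS in transcript orientation.

    Negative values are upstream, positive values downstream. On the minus
    strand, upstream lies at higher genomic coordinates, so the sign flips.
    """
    d = site_pos - tx.tss
    return d if tx.strand == "+" else -d


def in_promoter(site_pos: int, tx: Transcript) -> bool:
    """True if the site falls in the -1 kb .. +100 bp promoter window."""
    off = tss_offset(site_pos, tx)
    return -PROMOTER_UPSTREAM <= off <= PROMOTER_DOWNSTREAM


def offset_histogram(sites, transcripts, window=2000, bin_size=50):
    """Count (chrom, pos) sites per TSS-offset bin within +/- window bp.

    Bins are keyed by their lower edge. Brute force over all site/transcript
    pairs; a real genome-wide run would use an interval index instead.
    """
    counts = {}
    for chrom, pos in sites:
        for tx in transcripts:
            if tx.chrom != chrom:
                continue
            off = tss_offset(pos, tx)
            if -window <= off <= window:
                b = (off // bin_size) * bin_size
                counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))
```

For a plus-strand TSS at 10,000, a site at 9,950 has offset −50 and lies in the promoter, while a site at 10,200 (offset +200) does not. For a minus-strand TSS at 50,000, a site at 50,050 is likewise 50 bp upstream.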

Genome-Wide Identification of Transcription Start Sites, Promoters and Transcription Factor Binding Sites in E. coli

PLoS ONE, 2009, Vol 4 (10), pp. e7526. DOI: 10.1371/journal.pone.0007526. Cited by ~184.

Author(s): Alfredo Mendoza-Vargas, Leticia Olvera, Maricela Olvera, Ricardo Grande, Leticia Vega-Alvarado, ...

Keyword(s): Transcription Factor, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Transcription Start, Transcription Start Sites, E. coli, Factor Binding, Genome Wide

Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data

BMC Research Notes, 2018, Vol 11 (1). DOI: 10.1186/s13104-018-3856-x. Cited by ~10.

Author(s): Ilya E. Vorontsov, Alla D. Fedorova, Ivan S. Yevshin, Ruslan N. Sharipov, Fedor A. Kolpakov, ...

Keyword(s): Transcription Factor, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding, Genome Wide, Human And Mouse

Bioinformatics Identification of Modules of Transcription Factor Binding Sites in Alzheimer's Disease-Related Genes by In Silico Promoter Analysis and Microarrays

International Journal of Alzheimer's Disease, 2011, Vol 2011, pp. 1-13. DOI: 10.4061/2011/154325. Cited by ~4.

Author(s): Regina Augustin, Stefan F. Lichtenthaler, Michael Greeff, Jens Hansen, Wolfgang Wurst, ...

Keyword(s): Alzheimer's Disease, Transcription Factor, Binding Sites, In Silico, Transcription Factor Binding Sites, Transcription Factor Binding, Support Vector, Factor Binding, Human And Mouse

The molecular mechanisms and genetic risk factors underlying Alzheimer's disease (AD) pathogenesis are only partly understood. Several approaches, including proteomics, genetics, and functional genomics, are being used to identify new factors that may contribute to AD. Here, we used a bioinformatics approach and found that distinct AD-related genes share modules of transcription factor binding sites, suggesting transcriptional coregulation. To detect additional coregulated genes that may contribute to AD, we established a new bioinformatics workflow. It combines known multivariate methods, such as support vector machines and biclustering, with transcription factor binding site modules predicted in silico, and it draws on over 400 expression arrays from human and mouse. Two significant modules are composed of three transcription factor families (CTCF, SP1F, and EGRF/ZBPF), which are conserved between the human and mouse APP promoter sequences. This specific combination of in silico promoter analysis and multivariate analysis can identify regulatory mechanisms of genes involved in multifactorial diseases.
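The idea that several genes "share modules of transcription factor binding sites" can be illustrated by counting which combinations of TF families co-occur in the promoters of multiple genes. The following is a minimal sketch, not the workflow described above. It assumes each promoter has already been reduced to the set of TF families with predicted sites. The function name `shared_modules` and the example genes are hypothetical. Real module definitions typically also constrain the spacing and orientation of sites, and this sketch ignores both.

```python
from collections import Counter
from itertools import combinations


def shared_modules(promoter_sites, size=2, min_genes=2):
    """Return TF-family combinations of `size` found in at least `min_genes` promoters.

    promoter_sites maps a gene name to the set of TF families with predicted
    sites in its promoter (hypothetical, pre-computed input). Spacing and
    orientation of the sites are ignored.
    """
    counts = Counter()
    for families in promoter_sites.values():
        # Sorting gives each combination a single canonical tuple form.
        for combo in combinations(sorted(families), size):
            counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_genes}


# Hypothetical promoters; family names follow the abstract's nomenclature.
EXAMPLE = {
    "GENE_A": {"CTCF", "SP1F", "EGRF"},
    "GENE_B": {"CTCF", "SP1F", "ETSF"},
    "GENE_C": {"SP1F", "EGRF"},
}
```

On `EXAMPLE`, `shared_modules(EXAMPLE)` keeps only the pairs `("CTCF", "SP1F")` and `("EGRF", "SP1F")`, each present in two promoters. Raising `size` to 3 searches for three-family modules such as the ones reported in the abstract.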

Faculty Opinions recommendation of Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2004. DOI: 10.3410/f.1017662.206496.

Author(s): Carlos F Barbas

Keyword(s): Transcription Factor, Binding Sites, Noncoding RNAs, Transcription Factor Binding Sites, Transcription Factor Binding, Human Chromosomes, Factor Binding

Faculty Opinions recommendation of Position specific variation in the rate of evolution in transcription factor binding sites.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2004. DOI: 10.3410/f.1022543.258615.

Author(s): Emmanouil Dermitzakis

Keyword(s): Transcription Factor, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding, Rate Of Evolution, Specific Variation

Faculty Opinions recommendation of Divergence of transcription factor binding sites across related yeast species.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2007. DOI: 10.3410/f.1089279.548953.

Author(s): Emmanouil Dermitzakis

Keyword(s): Transcription Factor, Binding Sites, Yeast Species, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding

Faculty Opinions recommendation of Genome-wide inference of natural selection on human transcription factor binding sites.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2013. DOI: 10.3410/f.718018456.793479473.

Author(s): Peter Keightley

Keyword(s): Transcription Factor, Natural Selection, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding, Genome Wide, Human Transcription Factor

Gene Variants and Haplotypes Modifying Transcription Factor Binding Sites in the Human Cyclooxygenase 1 and 2 (PTGS1 and PTGS2) Genes

Current Drug Metabolism, 2014, Vol 15 (2), pp. 182-195. DOI: 10.2174/138920021502140327180336. Cited by ~13.

Author(s): Jose Agundez, David Gonzalez-Alvarez, Miguel Vega-Rodriguez, Emilia Botello, Elena Garcia-Martin

Keyword(s): Transcription Factor, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Gene Variants, Factor Binding, Cyclooxygenase 1

Structure-Based Ab Initio Prediction of Transcription Factor–Binding Sites

Methods in Molecular Biology - Computational Systems Biology, 2009, pp. 23-41. DOI: 10.1007/978-1-59745-243-4_2. Cited by ~8.

Author(s): L. Angela Liu, Joel S. Bader

Keyword(s): Transcription Factor, Ab Initio, Binding Sites, Transcription Factor Binding Sites, Transcription Factor Binding, Factor Binding, Ab Initio Prediction

Prediction of Transcription Factor Binding Sites of SP1 on Human Chromosome1

Applied Sciences, 2021, Vol 11 (11), pp. 5123. DOI: 10.3390/app11115123.

Author(s): Maiada M. Mahmoud, Nahla A. Belal, Aliaa Youssif

Keyword(s): Transcription Factor, Binding Sites, Messenger RNA, Area Under The Curve, Noisy Data, Transcription Factor Binding Sites, Classification Problem, Transcription Factor Binding, K Nearest Neighbors, Factor Binding

Transcription factors (TFs) are proteins that control the transcription of genes from DNA into messenger RNA (mRNA). TFs bind to specific DNA sequences called binding sites. Transcription factor binding sites have not yet been completely identified. Identifying them is a challenge that can be approached computationally by treating it as a classification problem in machine learning. In this paper, binding sites of SP1 on human chromosome 1 are predicted using different classification techniques, and a voting-based model is proposed. The highest area under the curve (AUC), 0.97, is achieved with K-Nearest Neighbors (KNN), compared with 0.95 for the proposed voting technique. However, the voting technique is more efficient with noisy data. This study highlights the applicability and significance of voting for binding-site prediction, as well as the superior performance of KNN on this type of data.
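To make the KNN, voting, and AUC ideas concrete, here is a minimal Python sketch using only the standard library. It is not the paper's model: the abstract does not state the features, encoding, or ensemble members. The following are hypothetical stand-ins:

- the toy training sequences;
- Hamming distance on raw equal-length sequences;
- a GC-box rule (SP1 binds GC-rich "GC box" motifs);
- a three-member hard-voting ensemble.

```python
from collections import Counter

# Hypothetical toy data: equal-length sequences labelled 1 (site) or 0 (background).
TRAIN = [
    ("GGGGCGGGG", 1),
    ("GGGGCGGGA", 1),
    ("AGGGCGGGG", 1),
    ("ATATATATA", 0),
    ("TTATAAATT", 0),
    ("ATTTTAAAT", 0),
]


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def nearest(train, seq, k):
    """The k training examples closest to seq by Hamming distance."""
    return sorted(train, key=lambda item: hamming(item[0], seq))[:k]


def knn_predict(train, seq, k=3):
    """KNN classification: majority label among the k nearest neighbours."""
    labels = Counter(label for _, label in nearest(train, seq, k))
    return labels.most_common(1)[0][0]


def knn_score(train, seq, k=3):
    """Fraction of the k nearest neighbours that are positive (a ranking score for AUC)."""
    return sum(label for _, label in nearest(train, seq, k)) / k


def gc_box_rule(seq):
    """Hypothetical rule: 1 if a GC-box core (GGGCGG) occurs on either strand."""
    return int("GGGCGG" in seq or "CCGCCC" in seq)


def majority_vote(predictors, seq):
    """Hard voting: each predictor returns 0 or 1 and the most common label wins."""
    return Counter(p(seq) for p in predictors).most_common(1)[0][0]


def auc(scores, labels):
    """Rank-based AUC: P(positive scores higher than negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Three binary voters, so a hard vote can never tie.
PREDICTORS = [
    lambda s: knn_predict(TRAIN, s, k=1),
    lambda s: knn_predict(TRAIN, s, k=3),
    gc_box_rule,
]
```

With this toy data, `majority_vote(PREDICTORS, "GGGGCGGGC")` returns 1 and `majority_vote(PREDICTORS, "ATATAAATA")` returns 0. Voting can dampen the effect of one member that is misled by noisy examples, which is one plausible reading of the robustness claim above. This sketch does not demonstrate that claim.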
