Integrating binding and expression data to predict transcription factors combined function

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Mahmoud Ahmed ◽  
Do Sik Min ◽  
Deok Ryong Kim

Abstract Background: Transcription factor binding to the regulatory region of a gene induces or represses its expression. Transcription factors share their binding sites with other factors, co-factors and/or DNA-binding proteins. These proteins form complexes which bind to the DNA as single units. The binding of two factors to a shared site does not always lead to a functional interaction. Results: We propose a method, target, to predict the combined functions of two factors using comparable binding and expression data. We based this method on binding and expression target analysis (BETA), which we re-implemented in R and extended for this purpose. target ranks the factor’s targets by importance and predicts the dominant type of interaction between two transcription factors. We applied the method to simulated and real datasets of transcription factor-binding sites and gene expression under perturbation of factors. We found that Yin Yang 1 transcription factor (YY1) and YY2 have antagonistic and independent regulatory targets in HeLa cells, but they may cooperate on a few shared targets. Conclusion: We developed an R package and a web application to integrate binding (ChIP-seq) and expression (microarrays or RNA-seq) data to determine the cooperative or competitive combined function of two transcription factors.

Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 446 ◽  
Author(s):  
Shijie Xin ◽  
Xiaohui Wang ◽  
Guojun Dai ◽  
Jingjing Zhang ◽  
Tingting An ◽  
...  

The proinflammatory cytokine, interleukin-6 (IL-6), plays a critical role in many chronic inflammatory diseases, particularly inflammatory bowel disease. To investigate the regulation of IL-6 gene expression at the molecular level, genomic DNA sequencing of Jinghai yellow chickens (Gallus gallus) was performed to detect single-nucleotide polymorphisms (SNPs) in the region −2200 base pairs (bp) upstream to 500 bp downstream of IL-6. Transcription factor binding sites and CpG islands in the IL-6 promoter region were predicted using bioinformatics software. Twenty-eight SNP sites were identified in IL-6. Four of these 28 SNPs, three [−357 (G > A), −447 (C > G), and −663 (A > G)] in the 5′ regulatory region and one in the 3′ non-coding region [3177 (C > T)] are not labelled in GenBank. Bioinformatics analysis revealed 11 SNPs within the promoter region that altered putative transcription factor binding sites. Furthermore, the C-939G mutation in the promoter region may change the number of CpG islands, and SNPs in the 5′ regulatory region may influence IL-6 gene expression by altering transcription factor binding or CpG methylation status. Genetic diversity analysis revealed that the newly discovered A-663G site significantly deviated from Hardy-Weinberg equilibrium. These results provide a basis for further exploration of the promoter function of the IL-6 gene and the relationships of these SNPs to intestinal inflammation resistance in chickens.
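The deviation from Hardy-Weinberg equilibrium mentioned above can be illustrated with a chi-square goodness-of-fit test. This is a minimal sketch only: the abstract does not say which test the authors used, and the genotype counts below are hypothetical, not data from the study.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations for a biallelic site (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Hypothetical counts of AA, AG and GG genotypes at a single SNP.
statistic = hwe_chi_square(40, 20, 40)
# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
deviates = statistic > 3.841
```

A statistic above the critical value suggests that the genotype frequencies at the site are not those expected under random mating; small samples would usually call for an exact test instead.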


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 344
Author(s):  
Mahmoud Ahmed ◽  
Deok Ryong Kim

Researchers use ChIP binding data to identify potential transcription factor binding sites. Similarly, they use gene expression data from sequencing or microarrays to quantify the effect of the factor overexpression or knockdown on its targets. Therefore, the integration of the binding and expression data can be used to improve the understanding of a transcription factor function. Here, we implemented the binding and expression target analysis (BETA) in an R/Bioconductor package. This algorithm ranks the targets based on the distances of their assigned peaks from the factor ChIP experiment and the signed statistics from gene expression profiling with factor perturbation. We further extend BETA to integrate two sets of data from two factors to predict their targets and their combined functions. In this article, we briefly describe the workings of the algorithm and provide a workflow with a real dataset for using it. The gene targets and the aggregate functions of transcription factors YY1 and YY2 in HeLa cells were identified. Using the same datasets, we identified the shared targets of the two factors, which were found to be, on average, more cooperatively regulated.
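The ranking idea described above, combining peak distances with signed expression statistics, can be sketched as follows. This is an illustration under stated assumptions and not the package's API: the function names are hypothetical, the exponential distance weighting and the 100 kb scale are assumed from the commonly cited BETA formulation, and the example genes and values are invented.

```python
import math

def regulatory_potential(distances_bp, d0=100_000):
    """Sum of peak contributions that decay with distance to the gene.
    The exponential form and d0 = 100 kb are assumptions of this sketch."""
    return sum(math.exp(-(0.5 + 4 * abs(d) / d0)) for d in distances_bp)

def rank_descending(values):
    """Return ranks where 1 marks the largest value (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks

def rank_targets(genes):
    """genes maps a gene name to (peak distances in bp, signed statistic).
    Returns (name, rank product, direction) sorted from most to least likely target."""
    names = list(genes)
    potentials = [regulatory_potential(genes[name][0]) for name in names]
    stats = [genes[name][1] for name in names]
    binding_rank = rank_descending(potentials)
    expression_rank = rank_descending([abs(s) for s in stats])
    n = len(names)
    result = []
    for i, name in enumerate(names):
        product = (binding_rank[i] / n) * (expression_rank[i] / n)
        direction = "up" if stats[i] > 0 else "down"
        result.append((name, product, direction))
    return sorted(result, key=lambda item: item[1])

# Hypothetical input: peak distances and a signed statistic per gene.
example = {
    "geneA": ([1_000, 5_000], 4.2),
    "geneB": ([80_000], -0.5),
    "geneC": ([20_000], -3.1),
}
ranked = rank_targets(example)
```

A smaller rank product marks a gene with both nearby peaks and a strong expression change. The sign of the statistic is kept separately so that induced and repressed targets can be told apart; how "up" and "down" map onto activation or repression depends on whether the factor was overexpressed or knocked down.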


2005 ◽  
Vol 03 (02) ◽  
pp. 281-301 ◽  
Author(s):  
PATRICK C. H. MA ◽  
KEITH C. C. CHAN ◽  
DAVID K. Y. CHIU

The combined interpretation of gene expression data and gene sequences is important for the investigation of the intricate relationships of gene expression at the transcription level. The expression data produced by microarray hybridization experiments can lead to the identification of clusters of co-expressed genes that are likely co-regulated by the same regulatory mechanisms. By analyzing the promoter regions of co-expressed genes, the common regulatory patterns characterized by transcription factor binding sites can be revealed. Many clustering algorithms have been used to uncover inherent clusters in gene expression data. In this paper, based on experiments using simulated and real data, we show that the performance of these algorithms could be further improved. For the clustering of expression data typically characterized by a lot of noise, we propose to use a two-phase clustering algorithm consisting of an initial clustering phase and a second re-clustering phase. The proposed algorithm has several desirable features: (i) it utilizes both local and global information by computing both a "local" pairwise distance between two gene expression profiles in Phase 1 and a "global" probabilistic measure of interestingness of cluster patterns in Phase 2, (ii) it distinguishes between relevant and irrelevant expression values when performing re-clustering, and (iii) it makes explicit the patterns discovered in each cluster for possible interpretations. Experimental results show that the proposed algorithm can be an effective algorithm for discovering clusters in the presence of very noisy data. The patterns that are discovered in each cluster are found to be meaningful and statistically significant, and cannot otherwise be easily discovered. Based on these discovered patterns, genes co-expressed under the same experimental conditions and range of expression levels have been identified and evaluated. 
When identifying regulatory patterns at the promoter regions of the co-expressed genes, we also discovered well-known transcription factor binding sites in them. These binding sites can provide explanations for the co-expressed patterns.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Guohua Wang ◽  
Fang Wang ◽  
Qian Huang ◽  
Yu Li ◽  
Yunlong Liu ◽  
...  

Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. The transcription factor binding sites are short DNA sequences (5–20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, by integrating the DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database, the transcription factor binding sites in gene regulatory regions are identified. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HelaS3-ifnα4h cells (interferon treatment on HeLaS3 cells for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Significantly, 6 out of 10 predicted functional factors, including IRF, IRF-2, IRF-9, IRF-1 and IRF-3, ICSBP, belong to the interferon regulatory factor family and upregulate the gene expression levels in response to the interferon treatment. Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. The predictions of our model are consistent across different criteria for selecting transcription factor binding sites. Our model demonstrated the potential to computationally identify the functional transcription factors in gene regulation.
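Scanning a sequence with a position weight matrix, the first step described above, can be sketched as below. This is a toy illustration: the count matrix, pseudocount, uniform background and threshold are all hypothetical, and neither TRANSFAC matrices nor the DNase I hypersensitivity filter are reproduced here.

```python
import math

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Slide the matrix along the sequence (A, C, G, T only) and
    report every window whose summed score reaches the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Hypothetical 3-bp motif "TGA" observed in 8 aligned sites.
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
hits = scan("CCTGACC", log_odds_matrix(toy_counts), threshold=3.0)
```

In a pipeline like the one described, such a scan would presumably be restricted to open chromatin regions, so that only matches inside DNase I hypersensitive sites count as candidate binding sites.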


2018 ◽  
Author(s):  
Mehran Karimzadeh ◽  
Michael M. Hoffman

Abstract Motivation: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific. Results: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3). Availability: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).


2016 ◽  
Vol 2016 ◽  
pp. 1-27 ◽  
Author(s):  
Kristopher J. L. Irizarry ◽  
Randall L. Bryden

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.


2019 ◽  
Author(s):  
Ning Qing Liu ◽  
Michela Maresca ◽  
Teun van den Brand ◽  
Luca Braccioli ◽  
Marijne M.G.A. Schijns ◽  
...  

SUMMARY: The cohesin complex plays essential roles in sister chromatid cohesion, chromosome organization and gene expression. The role of cohesin in gene regulation is incompletely understood. Here, we report that the cohesin release factor WAPL is crucial for maintaining a pool of dynamic cohesin bound to regions that are associated with lineage specific genes in mouse embryonic stem cells. These regulatory regions are enriched for active enhancer marks and transcription factor binding sites, but largely devoid of CTCF binding sites. Stabilization of cohesin, which leads to a loss of dynamic cohesin from these regions, does not affect transcription factor binding or active enhancer marks, but does result in changes in promoter-enhancer interactions and downregulation of genes. Acute cohesin depletion can phenocopy the effect of WAPL depletion, showing that cohesin plays a crucial role in maintaining expression of lineage specific genes. The binding of dynamic cohesin to chromatin is dependent on the pluripotency transcription factor OCT4, but not NANOG. Finally, dynamic cohesin binding sites are also found in differentiated cells, suggesting that they represent a general regulatory principle. We propose that cohesin dynamically binding to regulatory sites creates a favorable spatial environment in which promoters and enhancers can communicate to ensure proper gene expression. HIGHLIGHTS: The cohesin release factor WAPL is crucial for maintaining a pluripotency-specific phenotype. Dynamic cohesin is enriched at lineage specific loci and overlaps with binding sites of pluripotency transcription factors. Expression of lineage specific genes is maintained by dynamic cohesin binding through the formation of promoter-enhancer associated self-interaction domains. CTCF-independent cohesin binding to chromatin is controlled by the pioneer factor OCT4.


2021 ◽  
Vol 49 (17) ◽  
pp. 9809-9820
Author(s):  
Wakana Koda ◽  
Satoshi Senmatsu ◽  
Takuya Abe ◽  
Charles S Hoffman ◽  
Kouji Hirota

Abstract Transcriptional regulation, a pivotal biological process by which cells adapt to environmental fluctuations, is achieved by the binding of transcription factors to target sequences in a sequence-specific manner. However, how transcription factors recognize the correct target from amongst the numerous candidates in a genome has not been fully elucidated. We here show that, in the fission-yeast fbp1 gene, when transcription factors bind to target sequences in close proximity, their binding is reciprocally stabilized, thereby integrating distinct signal transduction pathways. The fbp1 gene is massively induced upon glucose starvation by the activation of two transcription factors, Atf1 and Rst2, mediated via distinct signal transduction pathways. Atf1 and Rst2 bind to the upstream-activating sequence 1 region, carrying two binding sites located 45 bp apart. Their binding is reciprocally stabilized due to the close proximity of the two target sites, which destabilizes the independent binding of Atf1 or Rst2. Tup11/12 (Tup-family co-repressors) suppress independent binding. These data demonstrate a previously unappreciated mechanism by which two transcription-factor binding sites, in close proximity, integrate two independent-signal pathways, thereby behaving as a hub for signal integration.

