Clustered, information-dense transcription factor binding sites identify genes with similar tissue-wide expression profiles

2018 ◽  
Author(s):  
Ruipeng Lu ◽  
Peter K. Rogan

ABSTRACT Background: The distribution and composition of cis-regulatory modules (e.g. transcription factor binding site (TFBS) clusters) in promoters substantially determine gene expression patterns and TF targets, whose expression levels are significantly regulated by TF binding. TF knockdown experiments have revealed correlations between TF binding profiles and gene expression levels. We present a general framework capable of predicting genes with similar tissue-wide expression patterns from activated or repressed TF targets using machine learning to combine TF binding and epigenetic features. Methods: Genes with correlated expression patterns across 53 tissues were identified according to their Bray-Curtis similarity. DNase I hypersensitive region (DHS)-accessible promoter intervals of direct TF target genes were scanned with previously derived information theory-based position weight matrices (iPWMs) of 82 TFs. Features from information density-based TFBS clusters were used to predict target genes with machine learning classifiers. The accuracy, specificity and sensitivity of the classifiers were determined for different feature sets. Mutations in TFBSs were also introduced to examine their impact on cluster densities and the regulatory states of predicted target genes. Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to this gene across 53 tissues. Prediction of other genes with similar expression profiles was significantly improved by eliminating inaccessible promoter intervals based on DHSs. A Random Forest classifier exhibited the best performance in detecting such coordinately regulated genes (accuracy was 0.972 for training, 0.976 for testing). Target gene prediction was confirmed using CRISPR knockdown data of TFs, which was more accurate than siRNA inactivation. Mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction. Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple, information-dense TFBS clusters in promoters appear to protect promoters from the effects of deleterious binding site mutations in a single TFBS that would effectively alter the expression state of these genes.
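The abstract above ranks genes by the Bray-Curtis similarity of their expression profiles across tissues. A minimal sketch of that measure follows, assuming non-negative expression values and the common definition of similarity as one minus the Bray-Curtis dissimilarity. The function names, the toy four-tissue profiles and the gene labels are hypothetical, and this is not the authors' pipeline.

```python
from typing import Dict, List, Sequence, Tuple


def bray_curtis_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - sum|x_i - y_i| / sum(x_i + y_i) for two expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must cover the same tissues")
    total = sum(x) + sum(y)
    if total == 0:
        # Assumption: two all-zero profiles are treated as identical.
        return 1.0
    diff = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - diff / total


def rank_by_similarity(
    reference: Sequence[float], profiles: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort genes from most to least similar to the reference profile."""
    scored = [(gene, bray_curtis_similarity(reference, p)) for gene, p in profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 4 tissues instead of the 53 used in the study.
    reference = [10.0, 20.0, 30.0, 40.0]
    candidates = {
        "geneA": [10.0, 20.0, 30.0, 40.0],  # same profile as the reference
        "geneB": [40.0, 30.0, 20.0, 10.0],  # reversed profile
    }
    for gene, score in rank_by_similarity(reference, candidates):
        print(gene, round(score, 3))
```

In practice the reference would be the expression profile of the gene of interest (NR3C1 in the abstract) and the candidates would be all other genes measured across the same tissues.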

F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1933 ◽  
Author(s):  
Ruipeng Lu ◽  
Peter K. Rogan

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets. Methods: Genes with correlated expression patterns across 53 tissues and TF targets were identified from Bray-Curtis similarity and TF knockdown experiments, respectively. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters predicted these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on cluster densities and the regulatory states of target genes. Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs, which was found to be more accurate than prediction after CRISPR/CAS9 inactivation. In silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction. Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
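The Methods above identify TFBSs by scanning accessible promoter intervals with information theory-based position weight matrices. The sketch below illustrates the general idea, assuming the simplified per-position weight 2 + log2 f(b, l) in bits with a pseudocount and no small-sample correction. The toy binding sites, the pseudocount, the 0-bit threshold and all function names are assumptions for illustration, not the matrices or software used by the authors.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"


def build_ipwm(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Build a weight matrix (bits) from aligned, equal-length binding sites.

    Assumes sequences contain only uppercase A, C, G, T.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Weight of base b at this position: 2 + log2(frequency), in bits.
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix


def site_information(matrix: List[Dict[str, float]], seq: str) -> float:
    """Sum the per-position weights for one candidate site."""
    return sum(column[base] for column, base in zip(matrix, seq))


def scan(matrix: List[Dict[str, float]], seq: str, min_bits: float = 0.0) -> List[Tuple[int, float]]:
    """Return (offset, bits) for every window scoring above min_bits."""
    width = len(matrix)
    hits = []
    for i in range(len(seq) - width + 1):
        bits = site_information(matrix, seq[i:i + width])
        if bits > min_bits:
            hits.append((i, bits))
    return hits


if __name__ == "__main__":
    toy_sites = ["ACGT", "ACGT", "ACGA", "ACGT"]  # hypothetical aligned sites
    ipwm = build_ipwm(toy_sites)
    for offset, bits in scan(ipwm, "GGACGTGG"):
        print(offset, round(bits, 2))
```

Windows scoring at or below the threshold are discarded; the retained sites and their strengths in bits are the kind of input from which cluster features could then be derived.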


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1933 ◽  
Author(s):  
Ruipeng Lu ◽  
Peter K. Rogan

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML). Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers, which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least one information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
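The classifiers described above are evaluated by accuracy, specificity and sensitivity. As a brief illustration of how these three measures relate to a confusion matrix, here is a sketch assuming binary labels where 1 marks a target gene and 0 a non-target. The labels and the function name are made up for the example and do not reproduce any result from the study.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity and specificity for binary labels (1/0)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios (e.g. no positive examples) are reported as NaN.
        return numerator / denominator if denominator else math.nan

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


if __name__ == "__main__":
    # Hypothetical labels: 3 true targets and 5 non-targets.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 1, 0]
    print(classification_metrics(truth, predicted))
```

Reporting sensitivity and specificity alongside accuracy matters when target genes are much rarer than non-targets, since accuracy alone can then look high for a classifier that rarely predicts a target.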


2017 ◽  
Author(s):  
Nisar Wani ◽  
Khalid Raza

Abstract: Gene expression patterns determine the manner whereby organisms regulate various cellular processes and therefore their organ functions. These patterns do not emerge on their own, but as a result of diverse regulatory factors such as DNA-binding proteins known as transcription factors (TFs), chromatin structure and various other environmental factors. TFs play a pivotal role in gene regulation by binding to different locations on the genome and influencing the expression of their target genes. Therefore, predicting target genes and their regulation becomes an important task for understanding mechanisms that control cellular processes governing both healthy and diseased cells. In this paper, we propose an integrated inference pipeline for predicting target genes and their regulatory effects for a specific TF using next-generation data analysis tools.


2019 ◽  
Vol 3 (Supplement_1) ◽  
Author(s):  
Zhi Chai ◽  
Yafei Lyu ◽  
Qiuyan Chen ◽  
Cheng-Hsin Wei ◽  
Lindsay Snyder ◽  
...  

Abstract Objectives: To characterize and compare the impact of vitamin A (VA) deficiency on gene expression patterns in the small intestine (SI) and the colon, and to discover novel target genes in VA-related biological pathways. Methods: Vitamin A-deficient (VAD) mice were generated by feeding a VAD diet to pregnant C57/BL6 dams and their post-weaning offspring. Total mRNA extracted from SI and colon was sequenced using the Illumina HiSeq 2500 platform. Differentially Expressed Gene (DEG), Gene Ontology (GO) enrichment, and Weighted Gene Co-expression Network Analysis (WGCNA) analyses were performed to characterize expression patterns and co-expression patterns. Results: The comparison between vitamin A-sufficient (VAS) and VAD groups detected 49 and 94 DEGs in SI and colon, respectively. According to GO information, DEGs in the SI demonstrated significant enrichment in categories relevant to retinoid metabolic process, molecule binding, and immune function. Immunity-related pathways, such as “humoral immune response” and “complement activation,” were positively associated with VA in SI. In contrast, in colon, “cell division” was the only enriched category and was negatively associated with VA. WGCNA identified modules significantly correlated with VA status in SI and in colon. One of those modules contained five known retinoic acid targets. Therefore, we have prioritized the other module members (e.g., Mbl2, Mmp9, Mmp13, Cxcl14 and Pkd1l2) to be investigated as candidate genes regulated by VA. Comparison of co-expression modules between SI and colon indicated distinct VA effects on these two organs. Conclusions: The results show that VA deficiency alters the gene expression profiles in SI and colon quite differently. Some immune-related genes (Mbl2, Mmp9, Mmp13, Cxcl14 and Pkd1l2) may be novel targets under the control of VA in SI. Funding Sources: NIH training grant and NIH research grant.


2019 ◽  
Author(s):  
Timothy O’Connor ◽  
Charles E. Grant ◽  
Mikael Bodén ◽  
Timothy L. Bailey

Abstract Motivation: Identifying the genes regulated by a given transcription factor (its “target genes”) is a key step in developing a comprehensive understanding of gene regulation. Previously we developed a method for predicting the target genes of a transcription factor (TF) based solely on the correlation between a histone modification at the TF’s binding site and the expression of the gene across a set of tissues. That approach is limited to organisms for which extensive histone and expression data is available, and does not explicitly incorporate the genomic distance between the TF and the gene. Results: We present the T-Gene algorithm, which overcomes these limitations. T-Gene can be used to predict which genes are most likely to be regulated by a TF, and which of the TF’s binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene’s promoter, achieving median positive predictive value (PPV) above 50%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median PPV above 40%) based on distance alone when extensive histone/expression data is not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. Availability: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://[email protected]


2020 ◽  
Vol 10 (12) ◽  
pp. 4473-4482
Author(s):  
Francheska López-Rivera ◽  
Olivia K. Foster Rhoades ◽  
Ben J. Vincent ◽  
Edward C. G. Pym ◽  
Meghan D. J. Bragdon ◽  
...  

Enhancers are DNA sequences composed of transcription factor binding sites that drive complex patterns of gene expression in space and time. Until recently, studying enhancers in their genomic context was technically challenging. Therefore, minimal enhancers, the shortest pieces of DNA that can drive an expression pattern that resembles a gene’s endogenous pattern, are often used to study features of enhancer function. However, evidence suggests that some enhancers require sequences outside the minimal enhancer to maintain function under environmental perturbations. We hypothesized that these additional sequences also prevent misexpression caused by a transcription factor binding site mutation within a minimal enhancer. Using the Drosophila melanogaster even-skipped stripe 2 enhancer as a case study, we tested the effect of a Giant binding site mutation (gt-2) on the expression patterns driven by minimal and extended enhancer reporter constructs. We found that, in contrast to the misexpression caused by the gt-2 binding site deletion in the minimal enhancer, the same gt-2 binding site deletion in the extended enhancer did not have an effect on expression. The buffering of expression levels, but not expression pattern, is partially explained by an additional Giant binding site outside the minimal enhancer. Deleting the gt-2 binding site in the endogenous locus had no significant effect on stripe 2 expression. Our results indicate that rules derived from mutating enhancer reporter constructs may not represent what occurs in the endogenous context.

