T-Gene: improved target gene prediction

Timothy O’Connor; Charles E Grant; Mikael Bodén; Timothy L Bailey

doi:10.1093/bioinformatics/btaa227

T-Gene: improved target gene prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa227 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3902-3904

Author(s):

Timothy O’Connor ◽

Charles E Grant ◽

Mikael Bodén ◽

Timothy L Bailey

Keyword(s):

Binding Sites ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Regulatory Element ◽

Statistical Significance ◽

Web Server ◽

Supplementary Information ◽

Expression Data ◽

Command Line Tool

Abstract. Motivation: Identifying the genes regulated by a given transcription factor (TF) (its ‘target genes’) is a key step in developing a comprehensive understanding of gene regulation. Previously, we developed a method (CisMapper) for predicting the target genes of a TF based solely on the correlation between a histone modification at the TF’s binding site and the expression of the gene across a set of tissues or cell lines. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene. Results: We present the T-Gene algorithm, which overcomes these limitations. It can be used to predict which genes are most likely to be regulated by a TF, and which of the TF’s binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene’s promoter, achieving median precision above 60%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median precision above 40%) based on distance alone when extensive histone/expression data are not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. Availability and implementation: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://meme-suite.org. Supplementary information: Supplementary data are available at Bioinformatics online.

T-Gene: Improved target gene prediction

10.1101/803221 ◽

2019 ◽

Cited By ~ 1

Author(s):

Timothy O’Connor ◽

Charles E. Grant ◽

Mikael Bodén ◽

Timothy L. Bailey

Keyword(s):

Transcription Factor ◽

Binding Sites ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Regulatory Element ◽

Statistical Significance ◽

Web Server ◽

Expression Data ◽

Command Line Tool

Abstract. Motivation: Identifying the genes regulated by a given transcription factor (its “target genes”) is a key step in developing a comprehensive understanding of gene regulation. Previously we developed a method for predicting the target genes of a transcription factor (TF) based solely on the correlation between a histone modification at the TF’s binding site and the expression of the gene across a set of tissues. That approach is limited to organisms for which extensive histone and expression data are available, and does not explicitly incorporate the genomic distance between the TF and the gene. Results: We present the T-Gene algorithm, which overcomes these limitations. T-Gene can be used to predict which genes are most likely to be regulated by a TF, and which of the TF’s binding sites are most likely involved in regulating particular genes. T-Gene calculates a novel score that combines distance and histone/expression correlation, and we show that this score accurately predicts when a regulatory element bound by a TF is in contact with a gene’s promoter, achieving median positive predictive value (PPV) above 50%. T-Gene is easy to use via its web server or as a command-line tool, and can also make accurate predictions (median PPV above 40%) based on distance alone when extensive histone/expression data are not available for the organism. T-Gene provides an estimate of the statistical significance of each of its predictions. Availability: The T-Gene web server, source code, histone/expression data and genome annotation files are provided at http://[email protected]
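The positive predictive value (precision) quoted in this abstract is the fraction of predicted links that are confirmed by a reference set. The following is a minimal sketch of that calculation only; the function name, the pair labels and the reference set are hypothetical, and it does not reproduce T-Gene's scoring or the authors' evaluation protocol.

```python
def positive_predictive_value(predicted_links, reference_links):
    """Return TP / (TP + FP) for a collection of predicted links.

    Both arguments are iterables of hashable items; here each item is a
    hypothetical (regulatory element, gene) pair.
    """
    predicted = set(predicted_links)
    if not predicted:
        raise ValueError("no predictions to score")
    true_positives = len(predicted & set(reference_links))
    return true_positives / len(predicted)


# Illustrative data only, not taken from the paper.
predicted = [("RE1", "GENE_A"), ("RE1", "GENE_B"),
             ("RE2", "GENE_C"), ("RE3", "GENE_A")]
reference = {("RE1", "GENE_A"), ("RE2", "GENE_C"), ("RE4", "GENE_D")}

ppv = positive_predictive_value(predicted, reference)
print(ppv)
```

In this toy case two of the four predicted pairs appear in the reference set.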

Hox repression of a target gene: extradenticle-independent, additive action through multiple monomer binding sites

Development ◽

10.1242/dev.129.13.3115 ◽

2002 ◽

Vol 129 (13) ◽

pp. 3115-3126 ◽

Cited By ~ 7

Author(s):

Ron Galant ◽

Christopher M. Walsh ◽

Sean B. Carroll

Keyword(s):

Binding Sites ◽

Target Gene ◽

Target Genes ◽

Hox Genes ◽

Regulatory Element ◽

Morphological Diversity ◽

Hox Proteins ◽

Cis Element ◽

Additive Action ◽

Hox Protein

Homeotic (Hox) genes regulate the identity of structures along the anterior-posterior axis of most animals. The low DNA-binding specificities of Hox proteins have raised the question of how these transcription factors selectively regulate target gene expression. The discovery that the Extradenticle (Exd)/Pbx and Homothorax (Hth)/Meis proteins act as cofactors for several Hox proteins has advanced the view that interactions with cofactors are critical to the target selectivity of Hox proteins. It is not clear, however, to what extent Hox proteins also regulate target genes in the absence of cofactors. In Drosophila melanogaster, the Hox protein Ultrabithorax (Ubx) promotes haltere development and suppresses wing development by selectively repressing many genes of the wing-patterning hierarchy, and this activity requires neither Exd nor Hth function. Here, we show that Ubx directly regulates a flight appendage-specific cis-regulatory element of the spalt (sal) gene. We find that multiple monomer Ubx-binding sites are required to completely repress this cis-element in the haltere, and that individual Ubx-binding sites are sufficient to mediate its partial repression. These results suggest that Hox proteins can directly regulate target genes in the absence of the cofactor Extradenticle. We propose that the regulation of some Hox target genes evolves via the accumulation of multiple Hox monomer binding sites. Furthermore, because the development and morphological diversity of the distal parts of most arthropod and vertebrate appendages involve Hox, but not Exd/Pbx or Hth/Meis proteins, this mode of target gene regulation appears to be important for distal appendage development and the evolution of appendage diversity.

MicroRNA-130a Downregulates HCV Replication through an atg5-Dependent Autophagy Pathway

Cells ◽

10.3390/cells8040338 ◽

2019 ◽

Vol 8 (4) ◽

pp. 338 ◽

Cited By ~ 5

Author(s):

Xiaoqiong Duan ◽

Xiao Liu ◽

Wenting Li ◽

Jacinta A. Holmes ◽

Annie J. Kruger ◽

...

Keyword(s):

Immune Responses ◽

Binding Sites ◽

Target Gene ◽

Target Genes ◽

Pyruvate Metabolism ◽

Host Immune Responses ◽

Qrt Pcr ◽

Antiviral Genes ◽

Autophagy Pathway ◽

Host Antiviral Response

We previously identified that miR-130a downregulates HCV replication through two independent pathways: restoration of host immune responses and regulation of pyruvate metabolism. In this study, we further sought to explore host antiviral target genes regulated by miR-130a. We performed an RT² Profiler™ PCR array to identify the host antiviral genes regulated by miR-130a. The putative binding sites between miR-130a and its downregulated genes were predicted by miRanda. miR-130a and predicted target genes were over-expressed or knocked down by siRNA or CRISPR/Cas9 gRNA. Selected gene mRNAs and their proteins, together with HCV replication in JFH1 HCV-infected Huh7.5.1 cells, were monitored by qRT-PCR and Western blot. We identified 32 genes that were significantly differentially expressed more than 1.5-fold following miR-130a overexpression, 28 of which were upregulated and 4 downregulated. We found that ATG5, a target gene for miR-130a, significantly upregulated HCV replication and downregulated interferon-stimulated gene expression. miR-130a downregulated ATG5 expression and its conjugation complex with ATG12. ATG5 and the ATG5-ATG12 complex affected the expression of interferon-stimulated genes (ISGs) such as MX1 and OAS3, and subsequently HCV replication. We concluded that miR-130a regulates host antiviral response and HCV replication through targeting ATG5 via the ATG5-dependent autophagy pathway.

DELPHI: accurate deep ensemble model for protein interaction sites prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa750 ◽

2020 ◽

Author(s):

Yiwei Li ◽

G Brian Golding ◽

Lucian Ilie

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Binding Sites ◽

Web Server ◽

Fine Tuning ◽

Supplementary Information ◽

Ensemble Model ◽

Position Information ◽

Interaction Sites ◽

Protein Interaction Sites

Abstract. Motivation: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein–protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods. Results: We propose DEep Learning Prediction of Highly probable protein Interaction sites (DELPHI), a new sequence-based deep learning suite for PPI-binding site prediction. DELPHI has an ensemble structure which combines a CNN and an RNN component with a fine-tuning technique. Three novel features, HSP, position information and ProtVec, are used in addition to nine existing ones. We comprehensively compare DELPHI to nine state-of-the-art programmes on five datasets, and DELPHI outperforms the competing methods in all metrics even though its training dataset shares the least similarities with the testing datasets. In the most important metrics, AUPRC and MCC, it surpasses the second best programmes by as much as 18.5% and 27.7%, respectively. We also demonstrate that the improvement is essentially due to using the ensemble model and, especially, the three new features. Using DELPHI, we show that there is a strong correlation between protein-binding residues (PBRs) and sites with strong evolutionary conservation. In addition, DELPHI’s predicted PBR sites closely match known data from Pfam. DELPHI is available as open-source standalone software and as a web server. Availability and implementation: The DELPHI web server can be found at delphi.csd.uwo.ca/, with all datasets and results in this study. The trained models, the DELPHI standalone source code, and the feature computation pipeline are freely available at github.com/lucian-ilie/DELPHI. Supplementary information: Supplementary data are available at Bioinformatics online.
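MCC, one of the two metrics highlighted in this abstract, is the Matthews correlation coefficient, computed from the four cells of a binary confusion matrix. A minimal sketch of the standard formula follows; the function name and the example counts are illustrative assumptions, and returning 0.0 when the denominator vanishes is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a convention, assumed here).
    """
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical per-residue counts for a binding-site predictor.
print(matthews_corrcoef(tp=5, fp=0, tn=5, fn=0))  # perfect prediction
print(matthews_corrcoef(tp=1, fp=1, tn=1, fn=1))  # no better than chance
```

MCC ranges from -1 to 1 and stays informative when binding residues are much rarer than non-binding ones, which is one reason it is often preferred over plain accuracy for this kind of task.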

MyDGR: a server for identification and characterization of diversity-generating retroelements

Nucleic Acids Research ◽

10.1093/nar/gkz329 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W289-W294 ◽

Cited By ~ 5

Author(s):

Fatemeh Sharifi ◽

Yuzhen Ye

Keyword(s):

Target Gene ◽

Target Genes ◽

Web Server ◽

Nucleotide Sequences ◽

Bacterial Genomes ◽

Fasta Format ◽

Reverse Transcriptases ◽

Identification And Characterization ◽

The Web

Abstract. MyDGR is a web server providing integrated prediction and visualization of Diversity-Generating Retroelements (DGR) systems in query nucleotide sequences. It is built upon an enhanced version of DGRscan, a tool we previously developed for identification of DGR systems. DGR systems are remarkable genetic elements that use error-prone reverse transcriptases to generate vast sequence variants in specific target genes, which have been shown to benefit their hosts (bacteria, archaea or phages). As the first web server for annotation of DGR systems, myDGR is freely available on the web at http://omics.informatics.indiana.edu/myDGR with all major browsers supported. MyDGR accepts query nucleotide sequences in FASTA format, and outputs all the important features of a predicted DGR system, including a reverse transcriptase, a template repeat and one (or more) variable repeats and their alignment featuring A-to-N (N can be C, T or G) substitutions, and VR-containing target gene(s). In addition to providing the results as text files for download, myDGR generates a visual summary of the results for users to explore the predicted DGR systems. Users can also directly access pre-calculated, putative DGR systems identified in currently available reference bacterial genomes and a few other collections of sequences (including human microbiomes).
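The A-to-N substitution pattern mentioned in this abstract can be illustrated by comparing an aligned template repeat with a variable repeat, position by position. The sketch below is an assumption-laden illustration only: the function name and the sequences are hypothetical, the inputs are assumed to be already aligned to equal length with `-` marking gaps, and it is not the myDGR or DGRscan implementation.

```python
def count_substitutions(template_repeat, variable_repeat):
    """Count mismatches between two aligned repeats.

    Returns (a_to_n, other): mismatches where the template base is A,
    and mismatches at any other template base. Gap columns are skipped.
    """
    if len(template_repeat) != len(variable_repeat):
        raise ValueError("aligned sequences must have equal length")
    a_to_n = 0
    other = 0
    for t, v in zip(template_repeat.upper(), variable_repeat.upper()):
        if t == v or "-" in (t, v):
            continue
        if t == "A":
            a_to_n += 1
        else:
            other += 1
    return a_to_n, other


# Hypothetical aligned sequences, for illustration only.
tr = "ACGTAACGA"
vr = "ACGTCGCGT"
print(count_substitutions(tr, vr))
```

A repeat pair in which mismatches fall almost exclusively at template adenines would fit the pattern the abstract describes; how many such mismatches a real tool requires is not stated here.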

An H3K4me3 reader, BAP18 as an adaptor of COMPASS-like core subunits co-activates ERα action and associates with the sensitivity of antiestrogen in breast cancer

Nucleic Acids Research ◽

10.1093/nar/gkaa787 ◽

2020 ◽

Vol 48 (19) ◽

pp. 10768-10784

Author(s):

Ge Sun ◽

Chunyu Wang ◽

Shengli Wang ◽

Hongmiao Sun ◽

Kai Zeng ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Progression ◽

Binding Sites ◽

Estrogen Receptor Alpha ◽

Target Genes ◽

Regulatory Element ◽

Therapy Resistance ◽

Ctcf Binding ◽

Promoter Regions ◽

Positive Breast Cancer

Abstract. The estrogen receptor alpha (ERα) signaling pathway is essential for ERα-positive breast cancer progression and endocrine therapy resistance. Bromodomain PHD Finger Transcription Factor (BPTF) associated protein of 18kDa (BAP18) has been recognized as a crucial H3K4me3 reader. However, the genome-wide occupancy of BAP18 and its biological function in breast cancer remain elusive. Here, we found that higher expression of BAP18 in ERα-positive breast cancer is positively correlated with poor prognosis. ChIP-seq analysis further demonstrated that the half estrogen response elements (EREs) and the CCCTC binding factor (CTCF) binding sites are significantly enriched in estrogen-induced BAP18 binding sites. We also provide evidence that BAP18, as a novel co-activator of ERα, is required for the recruitment of COMPASS-like core subunits to the cis-regulatory element of ERα target genes in breast cancer cells. BAP18 is recruited to the promoter regions of estrogen-induced genes, accompanied by the enrichment of the lysine 4-trimethylated histone H3 tail (H3K4me3) in the presence of E2. Furthermore, BAP18 promotes cell growth and is associated with the sensitivity to antiestrogen in ERα-positive breast cancer. Our data suggest that BAP18 facilitates the association between ERα and COMPASS-like core subunits, which might be an essential epigenetic therapeutic target for breast cancer.

Study on the Function of miRNA-155 Target Using Bioinformatics Methods

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.709.858 ◽

2013 ◽

Vol 709 ◽

pp. 858-861

Author(s):

De Ming Han ◽

Zi Jun Shen ◽

Li Hui Zhao

Keyword(s):

Protein Expression ◽

Mirna Target ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Diagnosis And Treatment ◽

Transcriptional Level ◽

Mirna Target Gene ◽

Non Coding Rnas

MicroRNAs are small non-coding RNAs that act at the post-transcriptional level, regulating protein expression by repressing translation or destabilizing target mRNAs. We searched for information about miR-155 in miRBase. Target genes of miR-155 were predicted with four miRNA target gene prediction tools. The results show that miR-155 is involved in proliferation, differentiation and apoptosis. These results can contribute to further study of the role of microRNAs in the diagnosis and treatment of cancer.

Raw sequence to target gene prediction: An integrated inference pipeline for ChIP-seq and RNA-seq datasets

10.1101/220152 ◽

2017 ◽

Author(s):

Nisar Wani ◽

Khalid Raza

Keyword(s):

Gene Expression ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Expression Patterns ◽

Dna Binding Proteins ◽

Gene Expression Patterns ◽

Rna Seq ◽

Cellular Processes ◽

Regulatory Effects

Abstract. Gene expression patterns determine the manner whereby organisms regulate various cellular processes and therefore their organ functions. These patterns do not emerge on their own, but as a result of diverse regulatory factors such as DNA-binding proteins known as transcription factors (TFs), chromatin structure and various other environmental factors. TFs play a pivotal role in gene regulation by binding to different locations on the genome and influencing the expression of their target genes. Therefore, predicting target genes and their regulation becomes an important task for understanding mechanisms that control cellular processes governing both healthy and diseased cells. In this paper, we propose an integrated inference pipeline for predicting target genes and their regulatory effects for a specific TF using next-generation data analysis tools.

Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations

F1000Research ◽

10.12688/f1000research.17363.1 ◽

2018 ◽

Vol 7 ◽

pp. 1933 ◽

Cited By ~ 1

Author(s):

Ruipeng Lu ◽

Peter K. Rogan

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Transcription Factor ◽

Binding Site ◽

In Silico ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Expression Patterns ◽

Tree Classifier

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets. Methods: Genes with correlated expression patterns across 53 tissues and TF targets were respectively identified from Bray-Curtis Similarity and TF knockdown experiments. Corresponding promoter sequences were reduced to DNase I-accessible intervals; TFBSs were then identified within these intervals using information theory-based position weight matrices for each TF (iPWMs) and clustered. Features from information-dense TFBS clusters predicted these genes with machine learning classifiers, which were evaluated for accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on cluster densities and the regulatory states of target genes. Results: We initially chose the glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, to test this approach. SLC25A32 and TANK were found to exhibit the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the largest area under the Receiver Operating Characteristic (ROC) curve in detecting such genes. Target gene prediction was confirmed using siRNA knockdown of TFs, which was found to be more accurate than those predicted after CRISPR/CAS9 inactivation. In silico mutation analyses of TFBSs also revealed that one or more information-dense TFBS clusters in promoters are required for accurate target gene prediction.
Conclusions: Machine learning based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
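The Bray-Curtis Similarity used in the Methods to compare expression patterns across tissues is one minus the Bray-Curtis dissimilarity, that is, 1 - sum|x_i - y_i| / sum(x_i + y_i) for non-negative profiles. The sketch below shows that formula on short made-up vectors; the function name and values are illustrative assumptions (the study used 53 tissues), and any normalization the authors applied beforehand is not reproduced.

```python
def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity between two non-negative profiles.

    Returns 1 - sum(|x_i - y_i|) / sum(x_i + y_i); 1.0 means identical
    profiles, 0.0 means no shared signal in any position.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("similarity is undefined for two all-zero profiles")
    difference = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - difference / total


# Hypothetical expression levels of two genes across four tissues.
gene_a = [2.0, 4.0, 0.0, 6.0]
gene_b = [4.0, 2.0, 0.0, 6.0]
print(bray_curtis_similarity(gene_a, gene_b))
```

Unlike a correlation coefficient, this measure is sensitive to absolute expression levels as well as to the shape of the profile.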

Transcription factor binding site clusters identify target genes with similar tissue-wide expression and buffer against mutations

F1000Research ◽

10.12688/f1000research.17363.2 ◽

2019 ◽

Vol 7 ◽

pp. 1933 ◽

Cited By ~ 8

Author(s):

Ruipeng Lu ◽

Peter K. Rogan

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Binding Site ◽

Target Gene ◽

Target Genes ◽

Gene Prediction ◽

Expression Patterns ◽

Specificity And Sensitivity ◽

Tree Classifier ◽

Glucocorticoid Receptor Gene

Background: The distribution and composition of cis-regulatory modules composed of transcription factor (TF) binding site (TFBS) clusters in promoters substantially determine gene expression patterns and TF targets. TF knockdown experiments have revealed that TF binding profiles and gene expression levels are correlated. We use TFBS features within accessible promoter intervals to predict genes with similar tissue-wide expression patterns and TF targets using Machine Learning (ML). Methods: Bray-Curtis Similarity was used to identify genes with correlated expression patterns across 53 tissues. TF targets from knockdown experiments were also analyzed by this approach to set up the ML framework. TFBSs were selected within DNase I-accessible intervals of corresponding promoter sequences using information theory-based position weight matrices (iPWMs) for each TF. Features from information-dense clusters of TFBSs were input to ML classifiers which predict these gene targets along with their accuracy, specificity and sensitivity. Mutations in TFBSs were analyzed in silico to examine their impact on TFBS clustering and predict changes in gene regulation. Results: The glucocorticoid receptor gene (NR3C1), whose regulation has been extensively studied, was selected to test this approach. SLC25A32 and TANK exhibited the most similar expression patterns to NR3C1. A Decision Tree classifier exhibited the best performance in detecting such genes, based on Area Under the Receiver Operating Characteristic curve (ROC). TF target gene prediction was confirmed using siRNA knockdown, which was more accurate than CRISPR/CAS9 inactivation. TFBS mutation analyses revealed that accurate target gene prediction required at least 1 information-dense TFBS cluster. Conclusions: ML based on TFBS information density, organization, and chromatin accessibility accurately identifies gene targets with comparable tissue-wide expression patterns. Multiple information-dense TFBS clusters in promoters appear to protect promoters from effects of deleterious binding site mutations in a single TFBS that would otherwise alter regulation of these genes.
