PCRMS: a database of predicted cis-regulatory modules and constituent transcription factor binding sites in genomes

2021 ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

More accurate and more complete predictions of cis-regulatory modules (CRMs) and their constituent transcription factor (TF) binding sites (TFBSs) in genomes can facilitate the functional characterization of regulatory sequences. Here, we developed a database, PCRMS (https://cci-bioinfo.uncc.edu), that stores highly accurate and unprecedentedly complete maps of predicted CRMs and TFBSs in the human and mouse genomes. The web interface allows the user to browse CRMs and TFBSs in an organism, find the closest CRMs to a gene, search CRMs around a gene, and find all TFBSs of a TF. PCRMS can be a useful resource for the research community to characterize regulatory genomes.
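The "closest CRMs to a gene" and "CRMs around a gene" queries can be pictured as interval lookups. The abstract does not say how PCRMS implements them, so the following is only a sketch under stated assumptions: CRMs on one chromosome are given as sorted, non-overlapping `(start, end)` tuples, a gene is represented by a single position (e.g. its TSS), and the function names `closest_crm` and `crms_in_window` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def distance(pos, interval):
    """Distance from a position to an interval (0 if the position is inside it)."""
    start, end = interval
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def closest_crm(crms, pos):
    """Hypothetical helper: nearest CRM to `pos`.

    Assumes `crms` is sorted by start and non-overlapping, so only the
    CRMs immediately left and right of `pos` need to be compared.
    """
    starts = [s for s, _ in crms]
    i = bisect_right(starts, pos)  # number of CRMs starting at or before pos
    candidates = [crms[j] for j in (i - 1, i) if 0 <= j < len(crms)]
    if not candidates:
        return None
    return min(candidates, key=lambda iv: distance(pos, iv))


def crms_in_window(crms, pos, flank):
    """Hypothetical helper: CRMs overlapping [pos - flank, pos + flank].

    With non-overlapping sorted intervals, ends are sorted too, so two
    binary searches bound the overlapping slice.
    """
    lo, hi = pos - flank, pos + flank
    ends = [e for _, e in crms]
    starts = [s for s, _ in crms]
    return crms[bisect_left(ends, lo):bisect_right(starts, hi)]
```

For example, with CRMs at `(100, 200)`, `(500, 650)` and `(1000, 1100)`, a gene at position 420 is closest to `(500, 650)`. Binary search keeps each query logarithmic in the number of CRMs, which matters at the scale of genome-wide maps; overlapping intervals would need an interval tree instead.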

2015 ◽  
Vol 9S4 ◽  
pp. BBI.S29330
Author(s):  
Stephen A. Ramsey

A Bayesian method is described for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern, conditioned on an observed nucleotide sequence and the sequence pattern. The method takes as input a position frequency matrix for a set of representative binding sites of a transcription factor, together with two sets of noncoding 5’ regulatory sequences for the gene sets to be compared. An empirical prior on the frequency λ of TFBSs (per base pair of gene-vicinal, noncoding DNA) is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The TFBS count β (conditioned on the observed sequence) is sampled using Metropolis-Hastings with an information entropy-based move generator. The derivation of the method is presented step by step, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on λ improves accuracy in estimating the number of TFBSs within a set of promoter sequences.
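To make the sampling step concrete, here is a minimal Metropolis-Hastings sketch over an integer count, in the spirit of sampling the TFBS count β. It is not the author's method: it swaps in a plain symmetric ±1 random-walk proposal instead of the information entropy-based move generator, and its demo target is a hypothetical Poisson prior with mean λ·L (λ per base pair, L the sequence length) with no likelihood or finite-width model.

```python
import math
import random


def log_poisson(k: int, mean: float) -> float:
    """Log-probability of count k under Poisson(mean)."""
    return k * math.log(mean) - mean - math.lgamma(k + 1)


def mh_sample_count(log_target, n_max, n_iter=20000, burn_in=2000, seed=0):
    """Random-walk Metropolis-Hastings over integer counts 0..n_max.

    log_target(k) is an unnormalized log-posterior for count k. The
    proposal k -> k +/- 1 is symmetric, so acceptance depends only on the
    target ratio; proposals outside 0..n_max are rejected (chain stays put).
    """
    rng = random.Random(seed)
    k = 0
    current = log_target(k)
    samples = []
    for step in range(n_iter):
        proposal = k + rng.choice((-1, 1))
        if 0 <= proposal <= n_max:
            candidate = log_target(proposal)
            # Accept with probability min(1, target(proposal) / target(k))
            if rng.random() < math.exp(min(0.0, candidate - current)):
                k, current = proposal, candidate
        if step >= burn_in:
            samples.append(k)
    return samples


# Hypothetical values: lambda = 0.01 sites/bp over L = 500 bp gives mean 5
lam, length = 0.01, 500
samples = mh_sample_count(lambda k: log_poisson(k, lam * length), n_max=100,
                          n_iter=60000, burn_in=5000, seed=1)
posterior_mean = sum(samples) / len(samples)
```

In a real analysis `log_target` would add the log-likelihood of the observed sequence given β to the log-prior; the sampler itself does not change.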


2007 ◽  
Vol 4 (2) ◽  
pp. 1-23
Author(s):  
Amitava Karmaker ◽  
Kihoon Yoon ◽  
Mark Doderer ◽  
Russell Kruzelock ◽  
Stephen Kwek

Revealing the complex interactions between trans- and cis-regulatory elements and identifying the corresponding binding sites are fundamental problems in understanding gene expression. Advances in ChIP-chip technology facilitate the identification of DNA sequences recognized by a specific transcription factor. However, protein-DNA binding is a necessary, but not sufficient, condition for transcriptional regulation; to further confirm a regulatory relationship, one must also show that the expression levels of the transcription factor and its putative target gene are correlated. Here, instead of a linear correlation coefficient, we used a non-linear function that seems to better capture possible regulatory relationships. By analyzing tissue-specific gene expression profiles of human and mouse, we delineate a list of transcription factor/gene pairs with highly correlated expression levels, which may have regulatory relationships. Using two closely related species (human and mouse), we perform comparative genome analysis to cross-validate the quality of our predictions. Our findings are confirmed by matching publicly available TFBS databases (such as TRANSFAC and ConSite) and by reviewing the biological literature. For example, according to our analysis, 80% and 85.71% of the target genes associated with the E2F5 and RELB transcription factors, respectively, have the corresponding known binding sites. We also substantiated our results for some oncogenes with the biomedical literature. Moreover, further analysis of these oncogenes showed that BCR and DEK may be regulated by some common transcription factors. Similar results were also reported for the BTG1, FCGR2B and LCK genes.


2021 ◽  
Vol 118 (20) ◽  
pp. e2026754118
Author(s):  
Chun-Ping Yu ◽  
Chen-Hao Kuo ◽  
Chase W. Nelson ◽  
Chi-An Chen ◽  
Zhi Thong Soh ◽  
...  

Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density ±2 kb around transcription start sites (TSSs) with a peak at −50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (−1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 (including 115 completely) tethered binding TF pairs, indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
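The quality-control criterion above rests on counting motif occurrences in ChIP-seq peak regions. The pipeline's own motif model and threshold are not given here, so this is a simplified sketch that assumes a consensus string in IUPAC notation and counts exact matches on both strands; the function names are hypothetical and the sketch only produces the counts a threshold would be applied to.

```python
import re

# IUPAC codes for simple consensus motifs (a subset; an assumption of this sketch)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def revcomp(seq: str) -> str:
    """IUPAC-aware reverse complement."""
    return seq.translate(COMPLEMENT)[::-1]


def _pattern(consensus: str) -> re.Pattern:
    # Zero-width lookahead so overlapping occurrences are all counted
    return re.compile("(?=" + "".join(IUPAC[c] for c in consensus) + ")")


def motif_counts(peaks: list[str], consensus: str) -> list[int]:
    """Occurrences of `consensus` on both strands of each peak sequence."""
    fwd = _pattern(consensus)
    rc = revcomp(consensus)
    # A palindromic motif matches both strands at the same place; count it once
    rev = None if rc == consensus else _pattern(rc)
    counts = []
    for peak in peaks:
        peak = peak.upper()
        n = len(fwd.findall(peak))
        if rev is not None:
            n += len(rev.findall(peak))
        counts.append(n)
    return counts


def fraction_of_peaks_with_motif(peaks: list[str], consensus: str) -> float:
    """Share of peaks containing at least one occurrence."""
    counts = motif_counts(peaks, consensus)
    return sum(c > 0 for c in counts) / len(counts) if counts else 0.0
```

For instance, `motif_counts(["AACACGTGTT", "GGGGGG", "CACGTGCACGTG"], "CACGTG")` gives `[1, 0, 2]`. A position weight matrix scan would be the more realistic choice; exact consensus matching keeps the sketch short.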


2021 ◽  
Vol 12 (3) ◽  
pp. 649-656
Author(s):  
Akshara Pande ◽  
Richa Gupta ◽  
Amit Gupta ◽  
Rishika Yadav ◽  
Navin Garg ◽  
...  

Background & Objective: Ayurveda, the “Mother of all healing”, has existed for over 5,000 years and hence is considered the oldest healing science. Ayurveda holds that the mind can heal and transform a person's whole being because the mind and body are connected. Herbs are at the heart of Ayurvedic belief. They are used to boost defense against diseases and viruses and to keep the brain, body, and soul in complete balance. Although ayurvedic medicines and herbs have natural components, they should still be used with certain precautions under the supervision of a medical practitioner. This study aims to manually curate information on ayurvedic medicinal herbs that have antiviral activity against harmful viruses. Methods: Detailed information was collected from the literature on (a) the types of viruses, (b) the category each virus belongs to, and (c) the herbal components responsible for activity against each virus. We developed a web interface using PHP and MySQL to retrieve the desired output. Results: The database consists of 104 viruses and 704 natural components. The web server is available at: http://ayurvir.com. Interpretation & Conclusion: We believe that the AyurVirDB database will be extremely beneficial for the research community. It not only aids investigations of Ayurvedic medicinal plants and their components but, on the emergence or re-emergence of a virus, may also help predict which ayurvedic plants/herbs could be used for treatment based on virus similarity or disease symptoms.


2020 ◽  
Author(s):  
Jiayue-Clara Jiang ◽  
Joseph Rothnagel ◽  
Kyle Upton

Transposons, a type of repetitive DNA element, can contribute cis-regulatory sequences and regulate the expression of human genes. L1PA2 is a hominoid-specific subfamily of LINE1 transposons, with approximately 4,940 copies in the human genome. Individual transposons have been demonstrated to contribute specific biological functions, such as cancer-specific alternate promoter activity for the MET oncogene, which is correlated with enhanced malignancy and poor prognosis in cancer. Given the sequence similarity between L1PA2 elements, we hypothesise that transposons within the L1PA2 subfamily likely share a common regulatory potential and may provide a mechanism for global genome regulation. Here we show that in breast cancer, the regulatory potential of L1PA2 is not limited to single transposons but is common within the subfamily. We demonstrate that the L1PA2 subfamily is an abundant reservoir of transcription factor binding sites, the majority of which cluster in the LINE1 5’UTR. In MCF7 breast cancer cells, over 27% of L1PA2 transposons harbour binding sites of functionally interacting, cancer-associated transcription factors. The ubiquitous and replicative nature of L1PA2 makes these elements an exemplary vector for dispersing co-localised transcription factor binding sites, facilitating the co-ordinated regulation of genes. In MCF7 cells, L1PA2 transposons also supply transcription start sites to up-regulated transcripts. These transcriptionally active L1PA2 transposons display a cancer-specific active epigenetic profile and likely play an oncogenic role in breast cancer aetiology. Overall, we show that the L1PA2 subfamily contributes abundant regulatory sequences in breast cancer cells and likely plays a global role in modulating the tumorigenic state in breast cancer.


2005 ◽  
Vol 19 (3) ◽  
pp. 595-606 ◽  
Author(s):  
Albin Sandelin ◽  
Wyeth W. Wasserman

The nuclear receptor (NR) class of transcription factors controls critical regulatory events in key developmental processes, homeostasis maintenance, and medically important diseases and conditions. Identifying the members of a regulon controlled by an NR could accelerate our understanding of development and disease. New bioinformatics methods for the analysis of regulatory sequences are required to address the complex properties of known regulatory elements targeted by the receptors, because standard methods for binding site prediction fail to reflect the diverse configurations of target sites. We have constructed a flexible hidden Markov model framework capable of predicting NR binding sites. The model allows for variable spacing and orientation of half-sites. In a genome-scale analysis enabled by the model, we show that NRs in Fugu rubripes have significant cross-regulatory potential. The model is implemented in a web interface, freely available to academic researchers at http://mordor.cgb.ki.se/NHR-scan.
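The point about variable spacing and orientation can be illustrated by enumerating half-site pairs directly. This is not the hidden Markov model described above, only a brute-force stand-in: it assumes the commonly cited NR half-site consensus AGGTCA, a mismatch budget shared by both half-sites, and hypothetical parameter names, and reports direct repeats (on either strand), inverted repeats and everted repeats.

```python
HALF_SITE = "AGGTCA"  # assumed NR half-site consensus for this sketch


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def scan_nr_sites(seq, half=HALF_SITE, max_spacer=8, max_mismatch=1):
    """Report (start, arrangement, spacer) for half-site pairs in `seq`.

    Arrangements: direct repeats on either strand (DR+, DR-), inverted
    repeats (IR) and everted repeats (ER), each with 0..max_spacer bases
    between the half-sites and at most `max_mismatch` mismatches in total.
    """
    seq = seq.upper()
    fwd, rev = half, revcomp(half)
    w = len(half)
    arrangements = {"DR+": (fwd, fwd), "DR-": (rev, rev),
                    "IR": (fwd, rev), "ER": (rev, fwd)}
    hits = []
    for start in range(len(seq) - 2 * w + 1):
        left = seq[start:start + w]
        for spacer in range(max_spacer + 1):
            end = start + 2 * w + spacer
            if end > len(seq):
                break
            right = seq[start + w + spacer:end]
            for name, (a, b) in arrangements.items():
                if mismatches(left, a) + mismatches(right, b) <= max_mismatch:
                    hits.append((start, name, spacer))
    return hits
```

With `max_mismatch=0`, `scan_nr_sites("TTAGGTCAAAAAGGTCATT")` reports a single DR3 site, `[(2, "DR+", 3)]`. An HMM replaces this enumeration with probabilistic half-site and spacer states, which is what allows it to score degenerate sites rather than apply a hard mismatch cutoff.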


2018 ◽  
Author(s):  
Stephanie L. Barnes ◽  
Nathan M. Belliveau ◽  
William T. Ireland ◽  
Justin B. Kinney ◽  
Rob Phillips

Despite the central importance of transcriptional regulation in systems biology, it has proven difficult to determine the regulatory mechanisms of individual genes, let alone entire gene networks. It is particularly difficult to analyze a promoter sequence and identify the locations, regulatory roles, and energetic properties of binding sites for transcription factors and RNA polymerase. In this work, we present a strategy for interpreting transcriptional regulatory sequences that uses in vivo methods (i.e., the massively parallel reporter assay Sort-Seq) to formulate quantitative models mapping a transcription factor binding site’s DNA sequence to transcription factor-DNA binding energy. We use these models to predict the binding energies of transcription factor binding sites to within 1 k_BT of their measured values. We further explore how such a sequence-energy mapping relates to the mechanisms of transcriptional regulation in various promoter contexts. Specifically, we show that our models can be used to design specific induction responses, analyze the effects of amino acid mutations on DNA sequence preference, and determine how regulatory context affects a transcription factor’s sequence specificity.
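A common form for a sequence-to-energy mapping is an additive energy matrix: each position of the binding site contributes an energy (in k_BT) that depends only on the base there, and the contributions are summed. The sketch below assumes that form; the 3-position matrix is made up for illustration and is not a fitted model from the study.

```python
import math

BASES = "ACGT"


def binding_energy(site: str, energy_matrix) -> float:
    """Sum per-position energy contributions (in k_BT) for a binding site.

    energy_matrix[i][j] is the energy of base BASES[j] at position i.
    """
    if len(site) != len(energy_matrix):
        raise ValueError("site length must match matrix length")
    return sum(row[BASES.index(base)] for row, base in zip(energy_matrix, site))


def relative_affinity(site: str, reference: str, energy_matrix) -> float:
    """Boltzmann factor exp(-(E_site - E_reference)), energies in k_BT."""
    delta = (binding_energy(site, energy_matrix)
             - binding_energy(reference, energy_matrix))
    return math.exp(-delta)


# Hypothetical matrix: consensus "ACG" costs 0 k_BT, each mismatch costs 1 k_BT
MATRIX = [[0, 1, 1, 1],
          [1, 0, 1, 1],
          [1, 1, 0, 1]]
```

Here `binding_energy("ACT", MATRIX)` is 1 k_BT, so `relative_affinity("ACT", "ACG", MATRIX)` is e^-1, about 0.37. Because energies add, a single-base change shifts the energy by one matrix entry, which is what makes such models convenient for designing or mutating sites.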


2020 ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Annotating all cis-regulatory modules (CRMs) and transcription factor (TF) binding sites (TFBSs) in genomes remains challenging. We tackled the task by integrating putative TFBS motifs found in 6,092 available datasets covering 77.47% of the human genome. This approach enabled us to partition the covered genome regions into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, like known enhancers, the predicted 1,404,973 CRMCs are under strong evolutionary constraint, suggesting that they might be cis-regulatory. In contrast, the non-CRMCs are largely selectively neutral, suggesting that they might not be cis-regulatory. Our method substantially outperforms three state-of-the-art methods (GeneHancer, EnhancerAtlas and ENCODE phase 3) in recalling VISTA enhancers and ClinVar variants, as well as by measures of evolutionary constraint. We estimated that the human genome might encode about 1.46 million CRMs and 67 million TFBSs, comprising about 55% and 22% of the genome, respectively, of which we predicted about 80% in each case. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
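The percentages quoted above can be combined in a quick consistency check. Assuming that the 56.84% CRMC share applies to the 77.47% of the genome covered by the datasets, and that the "80%" figure for CRMs refers to genome length rather than to the number of CRMs, the arithmetic works out as follows; both readings are interpretations, not statements from the abstract.

```python
# Figures quoted in the abstract
covered_fraction = 0.7747        # share of the genome covered by the 6,092 datasets
crmc_share_of_covered = 0.5684   # share of covered regions assigned to CRMCs
estimated_crm_fraction = 0.55    # estimated share of the genome that is CRM

# Share of the whole genome occupied by predicted CRMCs
crmc_fraction_of_genome = covered_fraction * crmc_share_of_covered

# Assumed reading: share of the estimated CRM length that was predicted
recall_by_length = crmc_fraction_of_genome / estimated_crm_fraction

print(f"CRMCs cover {crmc_fraction_of_genome:.1%} of the genome")
print(f"about {recall_by_length:.0%} of the estimated CRM length")
```

This prints 44.0% and 80%, which matches the abstract's figure under the length-based reading.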

