Identification of Drosophila Promoter Using Positional Differential Matrix and Support Vector Machine from Sequence Data

2008 ◽  
Vol 18 (2) ◽  
pp. 123-130
Author(s):  
Azizul Haque ◽  
Firoz Anwar ◽  
Taskeed Jabid ◽  
Syed Murtuza Baker ◽  
Haseena Khan ◽  
...  

The promoter region plays an important role in controlling gene expression in any living organism. It regulates gene transcription by providing space for RNA polymerase and transcription factors to bind and interact. Binding of the appropriate transcription initiation complex is determined by the specific promoter sequence, which carries gene-specific motifs. Promoter recognition is part of the complex process in which genes interact with each other over time, and which regulates the whole working process of a cell. Computational methods for identifying promoters are therefore a focal point for researchers. This paper presents an algorithm for identifying Drosophila melanogaster promoters using a differential positional frequency matrix between promoter and non-promoter sequences, which achieves a maximum tenfold cross-validation accuracy of 90.36%. The proposed method detects promoters with greater accuracy, and its higher sensitivity and specificity indicate that it is less prone to false negatives and false positives than some other existing methods.

Key words: Drosophila, Promoter, Sequence data

D.O.I. 10.3329/ptcb.v18i2.3394 Plant Tissue Cult. & Biotech. 18(2): 123-130, 2008 (December)
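To make the idea of a differential positional frequency matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes equal-length sequences over the alphabet ACGT, adds a pseudocount (an assumption, not mentioned in the abstract), and scores a sequence by summing the differential values at each position. The support vector machine step named in the title is omitted; the function names and toy sequences are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def position_frequency_matrix(seqs, pseudocount=1.0):
    """Return per-position base frequencies for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        matrix.append({b: (counts.get(b, 0) + pseudocount) / total
                       for b in ALPHABET})
    return matrix


def differential_matrix(promoters, non_promoters, pseudocount=1.0):
    """Promoter frequencies minus non-promoter frequencies, per position."""
    pos_m = position_frequency_matrix(promoters, pseudocount)
    neg_m = position_frequency_matrix(non_promoters, pseudocount)
    if len(pos_m) != len(neg_m):
        raise ValueError("both sets must share one sequence length")
    return [{b: p[b] - n[b] for b in ALPHABET} for p, n in zip(pos_m, neg_m)]


def score(seq, diff):
    """Sum the differential value of the observed base at every position."""
    return sum(col.get(base, 0.0) for base, col in zip(seq, diff))


# Hypothetical toy data, for illustration only.
toy_promoters = ["TATAAA", "TATATA", "TATAAG"]
toy_non_promoters = ["GCGCGC", "CCGGCC", "GGCCGG"]
toy_diff = differential_matrix(toy_promoters, toy_non_promoters)
```

A positive score means the sequence looks more like the promoter set than the non-promoter set at most positions. In a pipeline like the one the title describes, such per-position values would more plausibly serve as features for a classifier than as a final decision rule.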

2019 ◽  
Author(s):  
Ramesh Padmanabhan ◽  
Dennis Miller

Abstract RNA polymerases (RNAPs) differ from other polymerases in that they can bind promoter sequences and initiate de novo transcription. Promoter recognition requires the presence of specific DNA binding domains in the polymerase. The structure and mechanistic aspects of transcription by the bacteriophage T7 RNA polymerase (T7 RNAP) are well characterized. This single subunit RNAP belongs to the family of RNAPs which also includes the T3, SP6 and mitochondrial RNAPs. High specificity for its promoter, the requirement of no additional transcription factors, and high fidelity of initiation from a specific site in the promoter make it the polymerase of choice to study the mechanistic aspects of transcription. The structure and function of the catalytic domains of this family of polymerases are highly conserved, suggesting a common mechanism underlying transcription. Although the two groups of single subunit RNAPs, mitochondrial and bacteriophage, have remarkable structural conservation, they recognize quite dissimilar promoters. Specifically, the bacteriophage RNAPs recognize a 23 nucleotide promoter extending from −17 to +6 nucleotides relative to the site of transcription initiation, while the well characterized promoter recognized by the yeast mitochondrial RNAP is nine nucleotides in length extending from −8 to +1 relative to the site of transcription initiation. Promoters recognized by the bacteriophage RNAPs are also well characterized with distinct functional domains involved in promoter recognition and transcription initiation. Thorough mutational studies have been conducted by altering individual base-pairs within these domains. Here we describe experiments to determine whether the prototype bacteriophage RNAP is able to recognize and initiate at truncated promoters similar to mitochondrial promoters. Using an in vitro oligonucleotide transcriptional system, we have assayed transcription initiation activity by T7 RNAP. When a complete or almost complete (20 to 16 nucleotide) double stranded T7 RNAP promoter sequence is present, small RNAs are produced through template-independent and promoter-dependent stuttering corresponding to abortive initiation, and this effect was lost with a scrambled promoter sequence. When partial double stranded promoter sequences (10 to 12 nucleotides) are supplied, template dependent de novo initiation of RNA occurs at a site different from the canonical +1-initiation site. The site of transcription initiation is determined by a recessed 3′ end base paired to the template strand of DNA rather than relative to the partial promoter sequence. Understanding the mechanism underlying this observation helps us to understand the role of the elements in the T7 promoter, and provides insights into the promoter evolution of the single-subunit RNAPs.


2019 ◽  
Vol 47 (13) ◽  
pp. 7094-7104 ◽  
Author(s):  
Chengli Fang ◽  
Lingting Li ◽  
Liqiang Shen ◽  
Jing Shi ◽  
Sheng Wang ◽  
...  

Abstract Bacterial RNA polymerase (RNAP) forms distinct holoenzymes with extra-cytoplasmic function (ECF) σ factors to initiate specific gene expression programs. In this study, we report a cryo-EM structure at 4.0 Å of Escherichia coli transcription initiation complex comprising σE, the most-studied bacterial ECF σ factor (Ec σE-RPo), and a crystal structure at 3.1 Å of Mycobacterium tuberculosis transcription initiation complex with a chimeric σH/E (Mtb σH/E-RPo). The structure of Ec σE-RPo reveals key interactions essential for assembly of E. coli σE-RNAP holoenzyme and for promoter recognition and unwinding by E. coli σE. Moreover, both structures show that the non-conserved linkers (σ2/σ4 linker) of the two ECF σ factors are inserted into the active-center cleft and exit through the RNA-exit channel. We performed secondary-structure prediction of 27,670 ECF σ factors and found that their non-conserved linkers probably reach into and exit from the RNAP active-center cleft in a similar manner. Further biochemical results suggest that this σ2/σ4 linker plays an important role in RPo formation, abortive production and promoter escape during ECF σ factor-mediated transcription initiation.


2019 ◽  
Author(s):  
Lingting Li ◽  
Chengli Fang ◽  
Ningning Zhuang ◽  
Tiantian Wang ◽  
Yu Zhang

Abstract Bacterial RNA polymerase employs extra-cytoplasmic function (ECF) σ factors to regulate context-specific gene expression programs. Despite being the most abundant and divergent σ factor class, the structural basis of ECF σ factor-mediated transcription initiation remains unknown. Here, we determine a crystal structure of Mycobacterium tuberculosis (Mtb) RNAP holoenzyme comprising an RNAP core enzyme and the ECF σ factor σH (σH-RNAP) at 2.7 Å, and solve another crystal structure of a transcription initiation complex of Mtb σH-RNAP (σH-RPo) comprising promoter DNA and an RNA primer at 2.8 Å. The two structures together reveal the interactions between σH and RNAP that are essential for σH-RNAP holoenzyme assembly as well as the interactions between σH-RNAP and promoter DNA responsible for stringent promoter recognition and for promoter unwinding. Our study establishes that ECF σ factors and primary σ factors employ distinct mechanisms for promoter recognition and for promoter unwinding.


2017 ◽  
Author(s):  
Mahmoud M. Ibrahim ◽  
Aslihan Karabacak ◽  
Alexander Glahs ◽  
Ena Kolundzic ◽  
Antje Hirsekorn ◽  
...  

Abstract Divergent transcription from promoters and enhancers is pervasive in many species, but it remains unclear if it is a general and passive feature of all eukaryotic cis regulatory elements. To address this, we define promoters and enhancers in C. elegans, D. melanogaster and H. sapiens using ATAC-Seq and investigate the determinants of their transcription initiation directionalities by analyzing genome-wide nascent, cap-selected, polymerase run-on assays. All three species initiate divergent transcription from separate core promoter sequences. Sequence asymmetry downstream of forward and reverse initiation sites, known to be important for termination and stability in H. sapiens, is unique in each species. Chromatin states of divergent promoters are not entirely conserved, but in all three species, the levels of histone modifications on the +1 nucleosome are independent from those on the -1 nucleosome, arguing for independent initiation events. This is supported by an integrative model of H3K4me3 levels and core promoter sequence that is highly predictive of promoter directionality and of two types of promoters: those with balanced initiation directionality and those with skewed directionality. Lastly, D. melanogaster enhancers display variation in chromatin architecture depending on enhancer location, and D. melanogaster promoter regions with dual enhancer/promoter potential are enriched for divergent transcription. Our results point to a high degree of variation in regulatory element transcription initiation directionality within and between metazoans, and to non-passive regulatory mechanisms of transcription initiation directionality in those species.


2017 ◽  
Author(s):  
Irina O. Vvedenskaya ◽  
Jeremy G. Bird ◽  
Yuanchao Zhang ◽  
Yu Zhang ◽  
Xinfu Jiao ◽  
...  

Summary Nucleoside-containing metabolites such as NAD+ can be incorporated as “5′ caps” on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report “CapZyme-Seq,” a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-Seq with multiplexed transcriptomics, we determine efficiencies of NAD+ capping by Escherichia coli RNAP for ~16,000 promoter sequences. The results define preferred transcription start-site (TSS) positions for NAD+ capping and define a consensus promoter sequence for NAD+ capping: HRRASWW (TSS underlined). By applying CapZyme-Seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD+-capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs. Our findings define the promoter-sequence determinants for NCIN capping with NAD+ and provide a general method for analysis of NCIN capping in vitro and in vivo.
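The consensus HRRASWW is written in IUPAC nucleotide codes (H = A, C or T; R = A or G; S = C or G; W = A or T). As an illustration of how such a consensus can be searched for in a DNA sequence, here is a small sketch that is not part of CapZyme-Seq. The function names and the test sequence are hypothetical, and the sketch does not track which position of the consensus is the TSS.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex source string."""
    return "".join(IUPAC[code] for code in consensus.upper())


def find_matches(seq, consensus):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead makes finditer report overlapping occurrences as well.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical example sequence, for illustration only.
example_hits = find_matches("GGCAGAGATCC", "HRRASWW")
```

Here `example_hits` contains a single position, because only the window `CAGAGAT` satisfies every code in the consensus.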


Author(s):  
Drake Jensen ◽  
Eric A. Galburt

The fitness of an individual bacterial cell is highly dependent upon temporally tuning gene expression levels when subjected to different environmental cues. Kinetic regulation of transcription initiation is a key step in modulating the levels of transcribed genes to promote bacterial survival. The initiation phase encompasses the binding of RNA polymerase (RNAP) to promoter DNA and a series of coupled protein-DNA conformational changes prior to entry into processive elongation. The time required to complete the initiation phase can vary by orders of magnitude and is ultimately dictated by the DNA sequence of the promoter. In this review, we aim to provide the required background to understand how promoter sequence motifs may affect initiation kinetics during promoter recognition and binding, subsequent conformational changes which lead to DNA opening around the transcription start site, and promoter escape. By calculating the steady-state flux of RNA production as a function of these effects, we illustrate that the presence/absence of a consensus promoter motif cannot be used in isolation to make conclusions regarding promoter strength. Instead, the entire series of linked, sequence-dependent structural transitions must be considered holistically. Finally, we describe how individual transcription factors take advantage of the broad distribution of sequence-dependent basal kinetics to either increase or decrease RNA flux.
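The point that a consensus motif alone does not determine promoter strength can be illustrated with a toy kinetic scheme. The sketch below is an assumption made for illustration, not the model used in the review: a single promoter cycles through reversible RNAP binding, irreversible DNA opening, and irreversible promoter escape, and the steady-state flux is the inverse of the mean cycle time. All rate names and values are hypothetical.

```python
def steady_state_flux(k_bind, k_unbind, k_open, k_escape):
    """RNA flux (per unit time) for a toy three-step initiation cycle.

    Scheme (an assumption for illustration):
        free promoter  <->  closed complex  ->  open complex  ->  free + RNA
    k_bind is the pseudo-first-order binding rate (k_on times [RNAP]).
    The mean cycle time is the mean first-passage time through the cycle:
        1/k_bind + (1 + k_unbind/k_bind)/k_open + 1/k_escape
    """
    cycle_time = (1.0 / k_bind
                  + (1.0 + k_unbind / k_bind) / k_open
                  + 1.0 / k_escape)
    return 1.0 / cycle_time


# Hypothetical promoters: tight binding with slow escape versus
# weaker binding with fast escape.
tight_binder = steady_state_flux(k_bind=10.0, k_unbind=0.1,
                                 k_open=1.0, k_escape=0.05)
weak_binder = steady_state_flux(k_bind=1.0, k_unbind=1.0,
                                k_open=1.0, k_escape=1.0)
```

In this toy case the promoter that binds RNAP more weakly produces more RNA per unit time, because the tightly binding promoter is limited by slow escape. This mirrors the argument that the linked steps have to be considered together.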


Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Prabina Kumar Meher ◽  
Ansuman Mohapatra ◽  
Subhrajit Satpathy ◽  
Anuj Sharma ◽  
Isha Saini ◽  
...  

Abstract Background Circadian rhythms regulate several physiological and developmental processes of plants. Hence, the identification of genes with the underlying circadian rhythmic features is pivotal. Though computational methods have been developed for the identification of circadian genes, all these methods are based on gene expression datasets. In other words, we could not find any sequence-based model, and that motivated us to develop the present computational method to identify the proteins encoded by the circadian genes. Results Support vector machine (SVM) with seven kernels, i.e., linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace, was utilized for prediction by employing compositional, transitional and physico-chemical features. The highest accuracy of 62.48% was achieved with the Laplace kernel, following the fivefold cross-validation approach. The developed model further secured 62.96% accuracy with an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, i.e., Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, namely Oryza sativa and Sorghum bicolor, followed by the functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. Conclusions To the best of our knowledge, this is the first computational method to identify the circadian genes with the sequence data. Based on the proposed method, we have developed an R-package PredCRG (https://cran.r-project.org/web/packages/PredCRG/index.html) for the scientific community for proteome-wide identification of circadian genes. The present study supplements the existing computational methods as well as wet-lab experiments for the recognition of circadian genes.


2021 ◽  
Vol 3 (8) ◽  
Author(s):  
Gustavo Sganzerla Martinez ◽  
Scheila de Ávila e Silva ◽  
Aditya Kumar ◽  
Ernesto Pérez-Rueda

Abstract Gene transcription in bacteria starts when a promoter sequence is recognized by a transcription factor found in the RNAP enzyme; this process is assisted by the conservation of nucleotides as well as other factors governing these intergenic regions. Given this, encoding genetic information as physical aspects of the DNA, such as enthalpy, stability, and base-pair stacking, could indicate promoter activity and help differentiate promoter from non-promoter data. In this work, a total of 3131 promoter sequences associated with six different sigma factors in the bacterium E. coli were converted into numeric attributes; a strong set of control sequences, consisting of a shuffled version of the original sequences as well as coding regions, is also provided. The parameterized genetic information was then normalized and exhaustively analyzed through statistical tests. The results suggest that strong signals in the promoter sequences match the binding site of transcription factor proteins, indicating that promoter activity is well represented by its conversion into physical attributes. Moreover, the features tested in this report conveyed significant variances between promoter and control data, enabling these features to be employed in bacterial promoter classification. The results produced here may aid in bacterial promoter recognition by providing a robust set of biological inferences.


1999 ◽  
Vol 181 (4) ◽  
pp. 1269-1280 ◽  
Author(s):  
Kimberly A. Walker ◽  
Carey L. Atkins ◽  
Robert Osuna

ABSTRACT Escherichia coli Fis is a small DNA binding and bending protein that has been implicated in a variety of biological processes. A minimal promoter sequence consisting of 43 bp is sufficient to generate its characteristic growth phase-dependent expression pattern and is also subject to negative regulation by stringent control. However, information about the precise identification of nucleotides contributing to basal promoter activity and its regulation has been scant. In this work, 72 independent mutations were generated in the fis promoter (fis P) region from −108 to +78 using both random and site-directed PCR mutagenesis. β-Galactosidase activities from mutant promoters fused to the (trp-lac)W200 fusion on a plasmid were used to conclusively identify the sequences TTTCAT and TAATAT as the −35 and −10 regions, respectively, which are optimally separated by 17 bp. We found that four consecutive substitutions within the GC-rich sequence just upstream of +1 and mutations in the −35 region, but not in the −10 region, significantly reduced the response to stringent control. Analysis of the effects of mutations on growth phase-dependent regulation showed that replacing the predominant transcription initiation nucleotide +1C with a preferred nucleotide (A or G) profoundly altered expression such that high levels of fis P mRNA were detected during late logarithmic and early stationary phases. A less dramatic effect was seen with improvements in the −10 and −35 consensus sequences. These results suggest that the acute growth phase-dependent regulation pattern observed with this promoter requires an inefficient transcription initiation process that is achieved with promoter sequences deviating from the −10 and −35 consensus sequences and, more importantly, a dependence upon the availability of the least favored transcription initiation nucleotide, CTP.
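The arrangement reported above, a −35 hexamer and a −10 hexamer separated by a 17 bp spacer, can be expressed as a simple sequence scan. The sketch below is illustrative only and is not the authors' analysis: the function name is hypothetical, it requires exact matches to both hexamers, and the example sequence is constructed by hand rather than taken from the fis promoter.

```python
def find_box_pairs(seq, box35="TTTCAT", box10="TAATAT", spacer=17):
    """Return 0-based starts of box35 followed by box10 after `spacer` bp.

    Only exact matches to both hexamers are reported; real promoters
    usually tolerate mismatches, which this sketch ignores.
    """
    seq = seq.upper()
    offset = len(box35) + spacer          # distance between box starts
    last_start = len(seq) - offset - len(box10)
    hits = []
    for i in range(last_start + 1):
        if seq.startswith(box35, i) and seq.startswith(box10, i + offset):
            hits.append(i)
    return hits


# Hand-built example: 2 bp flank, -35 box, 17 bp spacer, -10 box, 2 bp flank.
example_seq = "GG" + "TTTCAT" + "A" * 17 + "TAATAT" + "CC"
example_pairs = find_box_pairs(example_seq)
```

Changing `spacer` to 16 or 18 on the same sequence returns no hits, which shows how strictly the scan ties the two boxes to a fixed separation.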


2019 ◽  
Vol 35 (16) ◽  
pp. 2730-2737 ◽  
Author(s):  
Ramzan Umarov ◽  
Hiroyuki Kuwahara ◽  
Yu Li ◽  
Xin Gao ◽  
Victor Solovyev

Abstract Motivation Computational identification of promoters is notoriously difficult as human genes often have unique promoter sequences that provide regulation of transcription and interaction with the transcription initiation complex. While there have been many attempts to develop computational promoter identification methods, we have no reliable tool to analyze long genomic sequences. Results In this work, we further develop our deep learning approach that was relatively successful in discriminating short promoter and non-promoter sequences. Instead of focusing on the classification accuracy, in this work we predict the exact positions of the transcription start site inside the genomic sequences, testing every possible location. We studied human promoters to find effective regions for discrimination and built corresponding deep learning models. These models use an adaptively constructed negative set, which iteratively improves the model’s discriminative ability. Our method significantly outperforms the previously developed promoter prediction programs by considerably reducing the number of false-positive predictions. We have achieved an error-per-1000-bp rate of 0.02 and have 0.31 errors per correct prediction, which is significantly better than the results of other human promoter predictors. Availability and implementation The developed method is available as a web server at http://www.cbrc.kaust.edu.sa/PromID/.

