promoter prediction

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Peña-Castillo

Abstract. Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.
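
The AUPRC metric named in this abstract can be illustrated with a short sketch. The abstract does not say how Promotech's AUPRC was computed, so the code below uses average precision, one common estimator of the area under the precision-recall curve. The function name and the toy inputs are assumptions made for illustration only.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: 1 for a true promoter, 0 for a non-promoter.
    scores: classifier scores, higher meaning more promoter-like.
    Tied scores are not treated specially in this sketch.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive example is required")

    # Rank all examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank
    return precision_sum / total_positives


if __name__ == "__main__":
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(toy_labels, toy_scores))
```

Precision-recall summaries of this kind are often preferred over accuracy when true promoters are rare relative to the non-promoter sequences being scanned.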


2021 ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Peña-Castillo

Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or a few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compared Promotech's performance with the performance of five other promoter prediction methods. Promotech outperformed these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.


2021 ◽  
Vol 7 ◽  
pp. e365
Author(s):  
Nikita Bhandari ◽  
Satyajeet Khare ◽  
Rahee Walambe ◽  
Ketan Kotecha

Gene promoters are the key DNA regulatory elements positioned around transcription start sites and are responsible for regulating the gene transcription process. Various alignment-based, signal-based and content-based approaches have been reported for the prediction of promoters. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes, viz. yeast (Saccharomyces cerevisiae), plant (Arabidopsis thaliana) and human (Homo sapiens). We compared the one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing the training time without affecting the sensitivity and specificity of classification. We employed two deep learning techniques, a CNN and a recurrent neural network with Long Short-Term Memory (LSTM), along with a random forest (RF) classifier, for promoter classification at k-mer sizes of 2, 4 and 8. We found the CNN to be superior both in separating promoter from non-promoter sequences (binary classification) and in species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
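
The pre-processing ideas in this abstract can be sketched as follows. The abstract does not spell out the exact FBT procedure, so the code shows one plausible reading: overlapping k-mers are ranked by how often they occur in the training sequences and mapped to integer tokens. All function names, the choice of overlapping k-mers, and the toy sequence are assumptions, not the authors' implementation.

```python
from collections import Counter
import random

NUCLEOTIDES = "ACGT"


def one_hot(sequence):
    """Encode each base as a 4-element indicator vector (4 values per base)."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in sequence]


def kmers(sequence, k):
    """Return all overlapping k-mers of a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sequences, k):
    """Map each k-mer to an integer token, most frequent k-mer first.

    Token 0 is reserved for k-mers that were not seen when building the vocabulary.
    """
    counts = Counter(kmer for seq in sequences for kmer in kmers(seq, k))
    # Sort by descending count, then alphabetically so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {kmer: token for token, (kmer, _) in enumerate(ordered, start=1)}


def tokenize(sequence, k, vocabulary):
    """Encode a sequence as one integer token per overlapping k-mer."""
    return [vocabulary.get(kmer, 0) for kmer in kmers(sequence, k)]


def shuffled_negative(sequence, rng):
    """Build a synthetic negative by shuffling bases, keeping base composition."""
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    promoter = "ACGTACGT"  # toy sequence, not real promoter data
    vocabulary = build_vocabulary([promoter], k=4)
    print(len(one_hot(promoter)) * 4)         # number of one-hot values
    print(tokenize(promoter, 4, vocabulary))  # one token per k-mer
    print(shuffled_negative(promoter, random.Random(0)))
```

Under this reading, a sequence of length L yields 4L one-hot values but only L - k + 1 integer tokens, which is consistent with the shorter input dimension the abstract reports.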


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Lucas Coppens ◽  
Rob Lavigne

Abstract. Background: In silico promoter prediction represents an important challenge in bioinformatics as it provides a first-line approach to identifying regulatory elements to support wet-lab experiments. Historically, available promoter prediction software has focused on sigma factor-associated promoters in the model organism E. coli. As a consequence, traditional promoter predictors yield suboptimal predictions when applied to other prokaryotic genera, such as Pseudomonas, a Gram-negative bacterium of crucial medical and biotechnological importance. Results: We developed SAPPHIRE, a promoter predictor for σ70 promoters in Pseudomonas. This promoter prediction relies on an artificial neural network that evaluates sequences on their similarity to the −35 and −10 boxes of σ70 promoters found experimentally in P. aeruginosa and P. putida. SAPPHIRE currently outperforms established predictive software when classifying Pseudomonas σ70 promoters and was built to allow further expansion in the future. Conclusions: SAPPHIRE is the first predictive tool for bacterial σ70 promoters in Pseudomonas. SAPPHIRE is free, publicly available and can be accessed online at www.biosapphire.com. Alternatively, users can download the tool as a Python 3 script for local application from this site.


mSystems ◽  
2020 ◽  
Vol 5 (4) ◽  
Author(s):  
Murilo Henrique Anzolini Cassiano ◽  
Rafael Silva-Rocha

The correct mapping of promoter elements is a crucial step in microbial genomics. Also, when combining new DNA elements into synthetic sequences, predicting the potential generation of new promoter sequences is critical. In recent years, many bioinformatics tools have been created to allow users to predict promoter elements in a sequence or genome of interest. Here, we assess the predictive power of some of the main prediction tools available using well-defined promoter data sets. Using Escherichia coli as a model organism, we demonstrated that while some tools are biased toward AT-rich sequences, others are very efficient in identifying real promoters with low false-negative rates. We hope the potential and limitations presented here will help the microbiology community to choose among the many available promoter prediction tools.


2020 ◽  
pp. 90-97
Author(s):  
Hugo André Klauck ◽  
Gabriel Dall’Alba ◽  
Scheila de Avila e Silva ◽  
Ana Paula Longaray Delamare

Many computational methods aim to improve the prediction and recognition of transcription elements in prokaryotes. Despite this, the natural features of those elements keep their prediction and recognition an open field of research. In this paper, we compared the open-access tools BacPP, BPROM, bTSSfinder, CNNPromoter_b, iPro70-PseZNC, NNPP2, PePPer, and PromPredict. First, we listed the overall functionalities of each tool and the resources available on their web pages. We then compared prediction results using 206 intergenic regions. When evaluating predictions on intergenic regions containing a single promoter each, NNPP2 and BacPP obtained >90% correct predictions, with NNPP2 achieving the closest match between predicted promoter locations and the locations indicated by RegulonDB. Overall, many discrepancies were observed among the results. They may be explained by differences in the methodologies that each tool applies for promoter prediction, without excluding the natural features of promoters as a contributing factor. In any case, the results highlight the need to continue efforts to improve promoter prediction, perhaps by combining multiple approaches. Through these efforts, some of the challenges of the postgenomic era may be tackled as well.


2020 ◽  
Author(s):  
Murilo Henrique Anzolini Cassiano ◽  
Rafael Silva-Rocha

Abstract. Background: The promoter region is a key element required for the production of RNA in bacteria. While new high-throughput technology allows massive mapping of promoter elements, we still mainly rely on bioinformatic tools to predict such elements in bacterial genomes. Additionally, although many different prediction tools have become popular for identifying bacterial promoters, there is no systematic comparison of such tools. Results: Here, we performed a systematic comparison between several widely used promoter prediction tools (BPROM, bTSSfinder, BacPP, CNNProm, IBBP, Virtual Footprint, iPro70-FMWin, 70ProPred, iPromoter-2L and MULTiPly) using well-defined sequence data sets and standardized metrics to determine how well those tools performed relative to each other. For this, we used datasets of experimentally validated promoters from Escherichia coli and a control dataset composed of randomly generated sequences with similar nucleotide distributions. We compared the performance of the tools using metrics such as specificity, sensitivity, accuracy and Matthews Correlation Coefficient (MCC). We show that the widely used BPROM presented the worst performance among the compared tools, while four tools (CNNProm, iPro70-FMWin, 70ProPred and iPromoter-2L) offered high predictive power. Of these, iPro70-FMWin exhibited the best results for most of the metrics used. Conclusions: We therefore explore here some of the potential and limitations of available tools and hope future work can build upon our effort to systematically characterize this useful class of bioinformatics tools.
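
The four metrics named in this abstract have standard definitions in terms of confusion-matrix counts, which can be sketched as below. The function name, the dictionary layout, and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts.

    tp: promoters predicted as promoters      fn: promoters that were missed
    tn: controls predicted as non-promoters   fp: controls predicted as promoters
    """
    def safe_div(numerator, denominator):
        # Return 0.0 instead of failing when a class or prediction is absent.
        return numerator / denominator if denominator else 0.0

    sensitivity = safe_div(tp, tp + fn)  # fraction of real promoters recovered
    specificity = safe_div(tn, tn + fp)  # fraction of controls correctly rejected
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=8, fp=2, tn=6, fn=4).items():
        print(f"{name}: {value:.3f}")
```

MCC uses all four counts, so it is less easily inflated than accuracy when a tool labels nearly every sequence as a promoter.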


2020 ◽  
Vol 24 (5) ◽  
pp. 300-309
Author(s):  
Rafael Vieira Coelho ◽  
Gabriel Dall'Alba ◽  
Scheila de Avila e Silva ◽  
Sergio Echeverrigaray ◽  
Ana Paula Longaray Delamare

2020 ◽  
Vol 36 (8) ◽  
pp. 2375-2384
Author(s):  
Akhilesh Mishra ◽  
Sahil Dhanda ◽  
Priyanka Siwach ◽  
Shruti Aggarwal ◽  
B Jayaram

Abstract. Motivation: Despite conservation in the general architecture of promoters and in the protein–DNA interaction interface of RNA polymerases among various prokaryotes, identification of promoter regions in whole genome sequences remains a daunting challenge. The available tools for promoter prediction do not seem to address the problem satisfactorily, apparently because the biochemical nature of promoter signals is yet to be understood fully. Using 28 structural and 3 energetic parameters, we found that prokaryotic promoter regions have a unique structural and energy state, quite distinct from that of coding regions, and that the information for this signature state is in-built in their sequences. We developed a novel promoter prediction tool from these 31 parameters using various statistical techniques. Results: Here, we introduce SEProm, a novel tool developed by studying and utilizing the in-built structural and energy information of DNA sequences, which is applicable to all prokaryotes including archaea. Compared with five of the most recent, diverse and best-performing available tools, SEProm performs much better, predicting promoters with an ‘F-value’ of 82.04 and ‘Precision’ of 81.08. The next best ‘F-value’ was obtained with PromPredict (72.14) followed by BProm (68.37). The next best ‘Precision’ was observed for Pepper (75.39) followed by PromPredict (72.01). SEProm maintained the lead even when the comparison was done on two test organisms (not involved in training for SEProm). Availability and implementation: The software is freely available with easy-to-follow instructions (www.scfbio-iitd.res.in/software/TSS_Predict.jsp). Supplementary information: Supplementary data are available at Bioinformatics online.
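
The abstract reports an ‘F-value’ and a ‘Precision’ without defining the former. Assuming the ‘F-value’ is the usual F1 score, the harmonic mean of precision and recall, the relationship between the three quantities can be sketched as below. That assumption and both function names are illustrative and are not confirmed by the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both on the same scale."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def recall_from_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Rearranging gives R = F1 * P / (2P - F1), which requires 2P > F1.
    """
    if 2 * precision <= f1:
        raise ValueError("no finite recall is consistent with these values")
    return f1 * precision / (2 * precision - f1)


if __name__ == "__main__":
    # Values quoted in the abstract, read as percentages under the F1 assumption.
    implied_recall = recall_from_f1(precision=81.08, f1=82.04)
    print(round(implied_recall, 2))
```

If the tool's ‘F-value’ is defined differently, for example with a weighting other than 1, the recall implied by this rearrangement does not apply.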

