Improving Neural Network Promoter Prediction by Exploiting the Lengths of Coding and Non-Coding Sequences

Abstract: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.

Toward Algorithms for Automation of Postgenomic Data Analyses: Bacillus subtilis Promoter Prediction with Artificial Neural Network

OMICS A Journal of Integrative Biology ◽ 10.1089/omi.2019.0041 ◽ 2020 ◽ Vol 24 (5) ◽ pp. 300-309

Author(s): Rafael Vieira Coelho ◽ Gabriel Dall'Alba ◽ Scheila de Avila e Silva ◽ Sergio Echeverrigaray ◽ Ana Paula Longaray Delamare

Keyword(s): Neural Network ◽ Artificial Neural Network ◽ Bacillus Subtilis ◽ Promoter Prediction ◽ Data Analyses ◽ Artificial Neural

MetaProm: a neural network based meta-predictor for alternative human promoter prediction

BMC Genomics ◽ 10.1186/1471-2164-8-374 ◽ 2007 ◽ Vol 8 (1) ◽ pp. 374 ◽ Cited By ~ 18

Author(s): Junwen Wang ◽ Lyle H Ungar ◽ Hung Tseng ◽ Sridhar Hannenhalli

Keyword(s): Neural Network ◽ Promoter Prediction ◽ Human Promoter

DNA numerical representation and neural network based human promoter prediction system

2011 Annual IEEE India Conference ◽ 10.1109/indcon.2011.6139326 ◽ 2011 ◽ Cited By ~ 2

Author(s): Swarna Bai Arniker ◽ Hon Keung Kwan ◽ Ngai-Fong Law ◽ Daniel Pak-Kong Lun

Keyword(s): Neural Network ◽ Numerical Representation ◽ Prediction System ◽ Promoter Prediction ◽ Human Promoter

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

10.21203/rs.3.rs-94396/v1 ◽ 2020

Author(s): Yupeng Wang ◽ Rosario Jaime-Lara ◽ Abhrarup Roy ◽ Ying Sun ◽ Xinyue Liu ◽ ...

Keyword(s): Neural Network ◽ Deep Learning ◽ Cell Types ◽ Regulatory Elements ◽ Learning Models ◽ Cell Type ◽ Coding Sequences ◽ Sequence Features ◽ A Genome ◽ Cell Type Specific

Abstract Objective: Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.

Comparison of machine learning and deep learning techniques in promoter prediction across diverse species

PeerJ Computer Science ◽ 10.7717/peerj-cs.365 ◽ 2021 ◽ Vol 7 ◽ pp. e365

Author(s): Nikita Bhandari ◽ Satyajeet Khare ◽ Rahee Walambe ◽ Ketan Kotecha

Keyword(s): Neural Network ◽ Machine Learning ◽ Deep Learning ◽ Short Term Memory ◽ Regulatory Elements ◽ Gene Promoters ◽ Promoter Prediction ◽ Promoter Sequences ◽ Learning Techniques

Gene promoters are the key DNA regulatory elements positioned around the transcription start sites and are responsible for regulating the gene transcription process. Various alignment-based, signal-based and content-based approaches are reported for the prediction of promoters. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared the one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing the training time without affecting the sensitivity and specificity of classification. We employed deep learning techniques, mainly CNN and a recurrent neural network with Long Short Term Memory (LSTM), and a random forest (RF) classifier for promoter classification at k-mer sizes of 2, 4 and 8. We found CNN to be superior in classification of promoters from non-promoter sequences (binary classification) as well as species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
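
The contrast between one-hot encoding and a k-mer tokenization can be illustrated with a small sketch. The abstract does not spell out how FBT works, so the code assumes one plausible reading: each k-mer is replaced by an integer id ranked by how often it occurs in the training sequences. The function names (`one_hot`, `build_vocab`, `tokenize`) and the choice of id 0 for unseen k-mers are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def one_hot(seq):
    """Encode each base as a 4-element indicator vector (A, C, G, T)."""
    table = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
             "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
    # Ambiguous bases such as N become an all-zero vector (assumption).
    return [table.get(base, [0, 0, 0, 0]) for base in seq]

def build_vocab(seqs, k):
    """Map each k-mer to an integer id; the most frequent k-mer gets id 1."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    # Ties are broken alphabetically so the mapping is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {kmer: idx for idx, (kmer, _) in enumerate(ranked, start=1)}

def tokenize(seq, vocab, k):
    """Replace every overlapping k-mer by its id; 0 marks an unseen k-mer."""
    return [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]

train = ["ACGTAC", "ACACAC"]
vocab = build_vocab(train, k=2)
tokens = tokenize("ACGT", vocab, k=2)   # one integer per k-mer
matrix = one_hot("ACGT")                # four values per base
```

Under these assumptions a sequence of length L becomes L - k + 1 integers rather than an L x 4 matrix, which is one way the input dimension could shrink.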

SAPPHIRE: a neural network based classifier for σ70 promoter prediction in Pseudomonas

BMC Bioinformatics ◽ 10.1186/s12859-020-03730-z ◽ 2020 ◽ Vol 21 (1)

Author(s): Lucas Coppens ◽ Rob Lavigne

Keyword(s): Neural Network ◽ Model Organism ◽ Regulatory Elements ◽ Promoter Prediction ◽ Predictive Tool ◽ E Coli ◽ Lab Experiments ◽ Important Challenge ◽ Wet Lab ◽ Predictive Software

Abstract Background: In silico promoter prediction represents an important challenge in bioinformatics as it provides a first-line approach to identifying regulatory elements to support wet-lab experiments. Historically, available promoter prediction software has focused on sigma factor-associated promoters in the model organism E. coli. As a consequence, traditional promoter predictors yield suboptimal predictions when applied to other prokaryotic genera, such as Pseudomonas, a Gram-negative bacterium of crucial medical and biotechnological importance. Results: We developed SAPPHIRE, a promoter predictor for σ70 promoters in Pseudomonas. This promoter prediction relies on an artificial neural network that evaluates sequences on their similarity to the −35 and −10 boxes of σ70 promoters found experimentally in P. aeruginosa and P. putida. SAPPHIRE currently outperforms established predictive software when classifying Pseudomonas σ70 promoters and was built to allow further expansion in the future. Conclusions: SAPPHIRE is the first predictive tool for bacterial σ70 promoters in Pseudomonas. SAPPHIRE is free, publicly available and can be accessed online at www.biosapphire.com. Alternatively, users can download the tool as a Python 3 script for local application from this site.
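
To make the idea of scoring a sequence by its similarity to the −35 and −10 boxes concrete, the sketch below scans a sequence for the best pair of hexamers. It is not SAPPHIRE's neural network: it swaps in plain consensus matching, uses the textbook E. coli σ70 hexamers TTGACA and TATAAT as stand-ins for the Pseudomonas-derived boxes, and assumes a spacer of 15 to 19 bp between them. The function names are hypothetical.

```python
def match_score(window, consensus):
    """Fraction of positions at which the window equals the consensus."""
    return sum(a == b for a, b in zip(window, consensus)) / len(consensus)

def best_promoter_hit(seq, box35="TTGACA", box10="TATAAT",
                      spacers=range(15, 20)):
    """Return (score, start of -35 box, start of -10 box) for the best hit.

    The score is the mean of the two box similarities. Returns None when
    the sequence is too short to hold both boxes and a spacer.
    """
    best = None
    n35, n10 = len(box35), len(box10)
    for i in range(len(seq) - n35 + 1):
        s35 = match_score(seq[i:i + n35], box35)
        for gap in spacers:              # assumed spacer lengths, in bp
            j = i + n35 + gap
            if j + n10 > len(seq):
                continue
            s10 = match_score(seq[j:j + n10], box10)
            score = (s35 + s10) / 2
            if best is None or score > best[0]:
                best = (score, i, j)
    return best

example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
hit = best_promoter_hit(example)
```

A learned classifier such as the one described in the abstract would replace the fixed consensus strings with weights fitted to experimentally determined promoters.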

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

BMC Research Notes ◽ 10.1186/s13104-021-05518-7 ◽ 2021 ◽ Vol 14 (1)

Author(s): Yupeng Wang ◽ Rosario B. Jaime-Lara ◽ Abhrarup Roy ◽ Ying Sun ◽ Xinyue Liu ◽ ...

Keyword(s): Neural Network ◽ Deep Learning ◽ Cell Types ◽ Regulatory Elements ◽ Learning Models ◽ Cell Type ◽ Coding Sequences ◽ Sequence Features ◽ A Genome ◽ Cell Type Specific

Abstract Objective: To address the challenge of computational identification of cell type-specific regulatory elements on a genome-wide scale. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, positional k-mer (k = 5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences across each nucleotide position were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified based on their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
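
The positional k-mer fold-change features described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the SeqEnhDL implementation: the pseudocount, the log2 transform, the value 0.0 for k-mers absent from both sets, and all function names are assumptions introduced here.

```python
import math
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def log2_fold_changes(fg_seqs, bg_seqs, k, pseudocount=1.0):
    """log2 ratio of smoothed k-mer frequency, foreground over background."""
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    # Additive smoothing over all 4**k possible k-mers (assumption).
    fg_total = sum(fg.values()) + pseudocount * 4 ** k
    bg_total = sum(bg.values()) + pseudocount * 4 ** k
    table = {}
    for kmer in set(fg) | set(bg):
        f = (fg[kmer] + pseudocount) / fg_total
        b = (bg[kmer] + pseudocount) / bg_total
        table[kmer] = math.log2(f / b)
    return table

def positional_features(seq, table, k):
    """One value per position: the fold change of the k-mer starting there."""
    # k-mers seen in neither set default to 0.0 (a simplification).
    return [table.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1)]

enhancers = ["ACGTACGT"]     # toy foreground set
background = ["AAAAAAAA"]    # toy non-coding background set
table = log2_fold_changes(enhancers, background, k=2)
features = positional_features("ACAA", table, k=2)
```

Repeating this for k = 5, 7, 9 and 11 would give several feature tracks per sequence, which could then be stacked as input channels for an MLP, CNN or RNN.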

NGS read classification using AI

PLoS ONE ◽ 10.1371/journal.pone.0261548 ◽ 2021 ◽ Vol 16 (12) ◽ pp. e0261548

Author(s): Benjamin Voigt ◽ Oliver Fischer ◽ Christian Krumnow ◽ Christian Herta ◽ Piotr Wojciech Dabrowski

Keyword(s): Neural Network ◽ Reference Database ◽ Metagenomic Sequencing ◽ Huge Amount ◽ Coding Sequences ◽ The Past ◽ Novel Approach ◽ Reference Databases ◽ Powerful Diagnostic Tool

Clinical metagenomics is a powerful diagnostic tool, as it offers an open view into all DNA in a patient’s sample. This allows the detection of pathogens that would slip through the cracks of classical specific assays. However, due to this unspecific nature of metagenomic sequencing, a huge amount of unspecific data is generated during the sequencing itself and the diagnosis only takes place at the data analysis stage where relevant sequences are filtered out. Typically, this is done by comparison to reference databases. While this approach has been optimized over the past years and works well to detect pathogens that are represented in the used databases, a common challenge in analysing a metagenomic patient sample arises when no pathogen sequences are found: How to determine whether truly no evidence of a pathogen is present in the data or whether the pathogen’s genome is simply absent from the database and the sequences in the dataset could thus not be classified? Here, we present a novel approach to this problem of detecting novel pathogens in metagenomic datasets by classifying the (segments of) proteins encoded by the sequences in the datasets. We train a neural network on the sequences of coding sequences, labeled by taxonomic domain, and use this neural network to predict the taxonomic classification of sequences that cannot be classified by comparison to a reference database, thus facilitating the detection of potential novel pathogens.

Promoter prediction using DNA numerical representation and neural network: Case study with three organisms

2011 Annual IEEE India Conference ◽ 10.1109/indcon.2011.6139397 ◽ 2011

Author(s): Swarna Bai Arniker ◽ Hon Keung Kwan ◽ Ngai-Fong Law ◽ Daniel Pak-Kong Lun

Keyword(s): Neural Network ◽ Numerical Representation ◽ Promoter Prediction
