Splice Site Prediction: Latest Research Papers

Spliceator: multi-species splice site prediction using convolutional neural networks

Abstract
Background: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited deep learning algorithms and reliable gene structures from model organisms. However, deep learning methods for non-model organisms are lacking.
Results: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. On independent benchmarks from human, fish, fly, worm, plant and protist organisms, Spliceator achieves consistently high accuracy (89–92%) compared with existing methods.
Conclusions: Spliceator is a new deep learning method trained on high-quality data that can predict splice sites in diverse organisms, from human to protists, with consistently high accuracy.
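CNN-based predictors such as Spliceator take a fixed-length DNA window as input. A common way to feed such a window to a CNN, assumed here rather than taken from the Spliceator paper, is to one-hot encode the bases and slide learned filters along the sequence. The sketch below shows a single filter followed by a ReLU in plain Python. The function names, the A/C/G/T column order and the hand-set `gt_filter` weights are illustrative assumptions; a real network learns many filters from training data.

```python
from typing import List

BASES = "ACGT"  # assumed column order of the one-hot encoding


def one_hot(seq: str) -> List[List[float]]:
    """Encode DNA as a len(seq) x 4 matrix; unknown bases such as N become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_relu(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Slide one k x 4 filter along the sequence without padding, then apply ReLU."""
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        total = bias
        for i in range(k):
            for j in range(len(BASES)):
                total += x[start + i][j] * kernel[i][j]
        out.append(max(0.0, total))
    return out


# Hypothetical hand-set filter that fires on a "GT" dinucleotide (a trained CNN learns its weights).
gt_filter = [[0.0, 0.0, 1.0, 0.0],  # first position rewards G
             [0.0, 0.0, 0.0, 1.0]]  # second position rewards T

print(conv1d_relu(one_hot("AAGTAA"), gt_filter, bias=-1.0))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```

The negative bias means the filter only activates when both positions match, so the single non-zero output marks the window that starts at the G of "GT".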


MutationTaster2021

Nucleic Acids Research, 2021. DOI: 10.1093/nar/gkab266

Author(s): Robin Steinhaus, Sebastian Proft, Markus Schuelke, David N Cooper, Jana Marie Schwarz, ...

Keyword(s): Prediction Model; Clinical Phenotype; The Novel; User Friendliness; Major Overhaul; Splice Site Prediction; Disease Mutations; Many Sources; Variant Effect Prediction; Mutation Search

Abstract
Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many data sources that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To make it easier to assess how detected known disease mutations relate to the patient's clinical phenotype, MutationTaster now provides information on the diseases they cause. The interfaces have also been substantially overhauled to improve user-friendliness, and many under-the-hood changes accelerate the processing of uploaded VCF files. We also offer an API for rapid automated queries of small numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's clinical phenotype. The new version is available at https://www.genecascade.org/MutationTaster2021/. The website is free and open to all users, with no login requirement.


Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA

Gene X, 2020, Vol 5, pp. 100035. DOI: 10.1016/j.gene.2020.100035. Cited by ~2

Author(s): Somayah Albaradei, Arturo Magana-Mora, Maha Thafar, Mahmut Uludag, Vladimir B. Bajic, ...

Keyword(s): Neural Networks; Convolutional Neural Networks; Splice Site; Genomic DNA; Deep Convolutional Neural Networks; Splice Site Prediction; Site Prediction


Splice site prediction across different organisms: A transfer learning approach

11th Hellenic Conference on Artificial Intelligence, 2020. DOI: 10.1145/3411408.3411458

Author(s): Despoina Kalfakakou, Anastasia Krithara, Georgios Paliouras

Keyword(s): Transfer Learning; Splice Site; Learning Approach; Splice Site Prediction; Site Prediction


EDeepSSP: Explainable deep neural networks for exact splice sites prediction

Journal of Bioinformatics and Computational Biology, 2020, Vol 18 (04), pp. 2050024. DOI: 10.1142/s0219720020500249

Author(s): Santhosh Amilpur, Raju Bhukya

Keyword(s): Neural Networks; Splice Site; Operating Characteristic; Characteristic Curve; Splice Sites; Human Donor; Splice Site Prediction; Site Prediction; Automatic Feature Extraction; Precision Recall Curve

Splice site prediction is crucial for understanding gene regulation and gene function, and for better genome annotation. Many computational methods exist for recognizing splice sites. Although most of them achieve competent performance, their interpretability remains challenging. Moreover, traditional machine learning methods rely on manually extracted features, which is tedious. To address these challenges, we propose EDeepSSP, a deep learning approach that uses a convolutional neural network (CNN) architecture for automatic feature extraction and effectively predicts splice sites. EDeepSSP opens up the otherwise opaque CNN by extracting significant motifs and explaining why these motifs are vital for predicting splice sites. Experiments were conducted on six benchmark acceptor and donor datasets from human, cress, and fly. The results show that EDeepSSP outperforms many state-of-the-art approaches. On the human donor datasets, EDeepSSP achieves the highest area under the receiver operating characteristic curve (AUC_ROC) and area under the precision-recall curve (AUC_PR), at 99.32% and 99.26%, respectively. We also analyze filter activity, feature activations, and the extracted motifs responsible for splice site prediction. Finally, we validate the learned motifs against known motifs from the JASPAR database.
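As a hedged illustration of what the AUC_ROC and AUC_PR figures above measure, the sketch below computes ROC AUC as the fraction of positive/negative pairs that are ranked correctly (the Mann-Whitney formulation) and approximates the area under the precision-recall curve by average precision. Other AUC_PR estimators, such as trapezoidal interpolation, give slightly different values, and the abstract does not say which one EDeepSSP used. The labels and scores in the example are made up.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores higher than random negative); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Tied scores are not grouped into one threshold here, unlike some library implementations.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each new recall level
    return total / n_pos


labels = [1, 0, 1, 0]          # made-up example: 1 = true splice site
scores = [0.9, 0.8, 0.7, 0.1]  # made-up model outputs
print(roc_auc(labels, scores))            # 0.75
print(average_precision(labels, scores))  # approximately 0.833 (5/6)
```

The pairwise loop is quadratic and only meant to make the definition explicit; for large benchmark sets a rank-based formula or a library routine is the practical choice.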


Predicting the effect of variants on splicing using Convolutional Neural Networks

PeerJ, 2020, Vol 8, pp. e9470. DOI: 10.7717/peerj.9470

Author(s): Thanyathorn Thanapattheerakul, Worrawat Engchuan, Jonathan H. Chan

Keyword(s): Splice Site; Predictive Models; RNA Splicing; Messenger RNA; Computational Models; Splice Variants; Splice Sites; Splice Site Prediction; Donor And Acceptor; RNA Splice Sites

Mutations that cause errors in messenger RNA (mRNA) splicing can lead to human diseases. Various computational models have been developed to recognize the sequence patterns of splice sites. Recent studies have shown that convolutional neural network (CNN) architectures outperform other existing models in predicting splice sites. However, little effort has gone into extending CNN models to predict the effect of genomic variants on mRNA splicing. This study proposes a framework that uses CNNs to assess the effect of splice variants, in order to identify potential disease-causing variants that disrupt the RNA splicing process. Five models, three CNN-based and two non-CNN machine learning models, were trained and compared on two existing splice site datasets: Genome Wide Human splice sites (GWH) and a dataset provided at the Deep Learning and Artificial Intelligence winter school 2018 (DLAI). The donor sites were also tested with the HSplice tool to evaluate the predictive models. To improve the effectiveness of the predictive models, the two datasets were combined. The CNN model with four convolutional layers showed the best splice site prediction performance, with an AUPRC of 93.4% and 88.8% for donor and acceptor sites, respectively. The effects of variants on splicing were estimated by applying the best model to variant data from the ClinVar database. Based on these estimates, the framework effectively differentiated pathogenic from benign variants (p = 5.9 × 10⁻⁷). These results suggest that the framework could be applied in future genetic studies to identify disease-causing loci involving the splicing mechanism. The datasets and Python scripts used in this study are available on GitHub at https://github.com/smiile8888/rna-splice-sites-recognition.
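A common way to turn a splice site classifier into a variant-effect estimator is to score the reference and alternate sequence windows and take the difference. The sketch below assumes that approach; it is not necessarily the exact procedure of the framework above. `toy_donor_score` is a hypothetical stand-in for a trained CNN that only checks for a central GT dinucleotide, and the 9-nt window is made up.

```python
from typing import Callable


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Return seq with a single-nucleotide variant ref>alt applied at 0-based position pos."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


def delta_score(seq: str, pos: int, ref: str, alt: str,
                score_fn: Callable[[str], float]) -> float:
    """Predicted splice-site score of the alternate window minus that of the reference window."""
    return score_fn(apply_snv(seq, pos, ref, alt)) - score_fn(seq)


def toy_donor_score(window: str) -> float:
    """Hypothetical stand-in for a trained CNN: 1.0 if the central dinucleotide is GT, else 0.0."""
    mid = len(window) // 2
    return 1.0 if window[mid:mid + 2] == "GT" else 0.0


# Hypothetical 9-nt window with a GT at positions 4-5; the variant changes T (position 5) to A.
print(delta_score("CAAGGTAAG", 5, "T", "A", toy_donor_score))  # -1.0
```

In this convention a negative delta suggests a weakened or lost site and a positive delta a strengthened or newly created one; with a real model the scores are probabilities rather than 0/1 flags.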


SpliceFinder: ab initio prediction of splice sites using convolutional neural network

BMC Bioinformatics, 2019, Vol 20 (S23). DOI: 10.1186/s12859-019-3306-3. Cited by ~5

Author(s): Ruohan Wang, Zishuai Wang, Jianping Wang, Shuaicheng Li

Keyword(s): Neural Network; Convolutional Neural Network; Ab Initio; Splice Site; False Positives; Genomic Sequences; Splice Sites; Operating Characteristics; Splice Site Prediction; Site Prediction

Abstract
Background: Identifying splice sites is a necessary step in analyzing the location and structure of genes. Two dinucleotides, GT and AG, occur at most splice sites (GT at the donor, AG at the acceptor), and many other patterns with important biological functions are also found at splice sites. However, these dinucleotides also occur frequently in sequences without splice sites, which makes prediction prone to false positives. Most existing tools select all sequences containing the two dinucleotides and then focus on distinguishing true splice sites from pseudo ones. This approach reduces false positives but misses non-canonical splice sites.
Results: We designed SpliceFinder, based on a convolutional neural network (CNN), to predict splice sites. To achieve ab initio prediction, we trained the network on human genomic data. An iterative approach is used to reconstruct the dataset, which tackles the class imbalance problem and forces the model to learn more features of splice sites. The proposed CNN achieves a classification accuracy of 90.25%, 10% higher than existing algorithms, and outperforms other existing methods in terms of area under the receiver operating characteristic curve (AUC), recall, precision, and F1 score. Furthermore, SpliceFinder can locate the exact position of splice sites in long genomic sequences using a sliding window. Compared with other state-of-the-art splice site prediction tools, SpliceFinder produces about half as many false positives while keeping recall above 0.8. SpliceFinder also captures non-canonical splice sites. In addition, it performs well on genomic sequences of Drosophila melanogaster, Mus musculus, Rattus, and Danio rerio without retraining.
Conclusion: Based on a CNN, we have proposed a new ab initio splice site prediction tool, SpliceFinder, which generates fewer false positives and can detect non-canonical splice sites. Additionally, SpliceFinder is transferable to other species without retraining. The source code and additional materials are available at https://gitlab.deepomics.org/wangruohan/SpliceFinder.
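The contrast drawn in this abstract between candidate-based tools and a sliding-window scan can be sketched in a few lines. `canonical_candidates` enumerates GT/AG positions, which is why a non-canonical site (for example a GC donor) never becomes a candidate, while `scan` scores every fixed-width window and reports window centres that reach a threshold. The window width, step, threshold and the toy scorer `centre_is_gt` are illustrative assumptions, not SpliceFinder's actual settings or model.

```python
from typing import Callable, Iterator, List, Tuple


def canonical_candidates(seq: str) -> Tuple[List[int], List[int]]:
    """0-based start positions of GT (candidate donor) and AG (candidate acceptor) dinucleotides."""
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


def sliding_windows(seq: str, width: int, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) for every full-width window along seq."""
    for start in range(0, len(seq) - width + 1, step):
        yield start, seq[start:start + width]


def scan(seq: str, width: int, score_fn: Callable[[str], float], threshold: float) -> List[int]:
    """Return the centre position of every window whose score reaches the threshold."""
    return [start + width // 2
            for start, window in sliding_windows(seq, width)
            if score_fn(window) >= threshold]


def centre_is_gt(window: str) -> float:
    """Hypothetical scorer standing in for a trained model: does the window centre start with GT?"""
    mid = len(window) // 2
    return 1.0 if window[mid:mid + 2] == "GT" else 0.0


seq = "AAGTAAAGCC"
print(canonical_candidates(seq))        # ([2], [1, 6])
print(scan(seq, 4, centre_is_gt, 0.5))  # [2]
```

Because `scan` never pre-filters on dinucleotides, a trained scorer plugged in here is free to flag non-canonical sites; the price is scoring every position, which is where false positives have to be controlled by the model itself.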


PSSP: Protein splice site prediction algorithm using Bayesian approach

Journal of Bioinformatics and Computational Biology, 2019, Vol 17 (06), pp. 1950034. DOI: 10.1142/s0219720019500343

Author(s): Abolfazl Bahrami, Ali Najafi, Mohammadreza Hashemi, Seyed Reza Miraie-Ashtiani

Keyword(s): Amino Acids; Amino Acid; Structural Features; Prediction Algorithm; Protein Splicing; ATPase Subunit; Protein Motifs; Splice Site Prediction; Protein Splice; The One

This study aimed to introduce an algorithm for identifying the intein motifs and blocks involved in protein splicing, and to explore methods for detecting protein motifs. Inteins are mobile protein splicing elements capable of self-splicing post-translationally. They exist in viruses and bacteriophages; notwithstanding this broad phylogenetic distribution, all inteins share common structural features. A method was developed to predict inteins in a raw sequence, using a ranking and scoring scheme based on amino acid [Formula: see text] value tables. This method aided in identifying and assessing the patterns that characterize intein sequences. New conserved intein properties are revealed, and known ones are described and localized. We computed the [Formula: see text] value of each amino acid at block A positions [Formula: see text] to [Formula: see text], block B positions [Formula: see text] to [Formula: see text], and block G positions [Formula: see text]7 to [Formula: see text] for the three categories. The consensus amino acids thus found are listed at the end of each row. We also give statistics for the distances between blocks A and B, B and F, and F and G, with averages of 66.1, 294, and 10.2 amino acids, respectively. The actual blocks A, B, and G of the single intein found in the vacuolar membrane ATPase subunit, a precursor protein, are ranked first. The results indicate that all block sequences found in nine proteins are ranked at the top of the list. The intein sequence is used to search databases for intein-like proteins. Understanding the functional, structural, and dynamical aspects of inteins is important for intein engineering and for improving intein databases.
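The PSSP abstract describes ranking candidate intein blocks with per-position amino acid value tables, but the exact value is elided ([Formula: see text]). As a sketch under that uncertainty, the code below assumes a log-odds score against a uniform background, with pseudocounts acting as a simple Dirichlet prior, built from aligned block examples; every window of a protein is then scored and ranked. This is a generic position-specific scoring technique swapped in for illustration, not the authors' method, and the block sequences and protein in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(block_examples: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position log-odds table from equal-length aligned block sequences.

    Assumptions: uniform background frequency and a symmetric pseudocount prior.
    """
    length = len(block_examples[0])
    if any(len(s) != length for s in block_examples):
        raise ValueError("block examples must be aligned to equal length")
    background = 1.0 / len(AMINO_ACIDS)
    table = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in block_examples:
            if seq[pos] in counts:
                counts[seq[pos]] += 1
        total = sum(counts.values())
        table.append({aa: math.log(c / total / background) for aa, c in counts.items()})
    return table


def rank_windows(protein: str, table: List[Dict[str, float]]) -> List[Tuple[float, int]]:
    """Score every window of the block's length; return (score, start) pairs, best first."""
    width = len(table)
    scored = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Residues outside the 20 standard amino acids contribute 0.
        score = sum(table[i].get(aa, 0.0) for i, aa in enumerate(window))
        scored.append((score, start))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


examples = ["CGLA", "CGIA", "CGLS"]  # hypothetical aligned block instances
table = build_log_odds(examples)
best_score, best_start = rank_windows("MKTCGLAEE", table)[0]
print(best_start)  # 3
```

Ranking all windows, rather than thresholding them, mirrors the abstract's description of true blocks being "ranked first"; separate tables would be built for each block (A, B, G and so on).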


SpliceCombo: A Hybrid Technique Efficiently Use for Principal Component Analysis of Splice Site Prediction

Ingénierie des systèmes d'information, 2019, Vol 24 (1), pp. 67-75. DOI: 10.18280/isi.240110

Author(s): Srabanti Maji, Soumen Kanrar

Keyword(s): Principal Component Analysis; Splice Site; Principal Component; Component Analysis; Hybrid Technique; Splice Site Prediction; Site Prediction


Human Splice-Site Prediction with Deep Neural Networks

Journal of Computational Biology, 2018, Vol 25 (8), pp. 954-961. DOI: 10.1089/cmb.2018.0041. Cited by ~7

Author(s): Tatsuhiko Naito

Keyword(s): Neural Networks; Splice Site; Deep Neural Networks; Splice Site Prediction; Site Prediction


splice site prediction
Recently Published Documents


Spliceator: multi-species splice site prediction using convolutional neural networks

MutationTaster2021

Splice2Deep: An ensemble of deep convolutional neural networks for improved splice site prediction in genomic DNA

Splice site prediction across different organisms: A transfer learning approach

EDeepSSP: Explainable deep neural networks for exact splice sites prediction

Predicting the effect of variants on splicing using Convolutional Neural Networks

SpliceFinder: ab initio prediction of splice sites using convolutional neural network

PSSP: Protein splice site prediction algorithm using Bayesian approach

SpliceCombo: A Hybrid Technique Efficiently Use for Principal Component Analysis of Splice Site Prediction

Human Splice-Site Prediction with Deep Neural Networks
