Direct inference of base-pairing probabilities with neural networks improves RNA secondary structure prediction with pseudoknots

Motivation: Existing approaches for predicting RNA secondary structures depend on how a secondary structure is decomposed into substructures, the so-called architecture, which defines their parameter space. However, the architecture has not been sufficiently investigated, especially for pseudoknotted secondary structures.

Results: In this paper, we propose a novel algorithm that directly infers base-pairing probabilities with neural networks, independently of the architecture of RNA secondary structures, and then applies maximum expected accuracy (MEA)-based decoding: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm achieves the best prediction accuracy compared with existing methods.

Availability: The source code is available at https://github.com/keio-bioinformatics/neuralfold/.
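
The Nussinov-style MEA decoding step can be sketched as follows, assuming a base-pairing probability matrix `p` is already available (in the paper it comes from the neural network; here it is supplied by hand). Each candidate pair (i, j) is weighted by (γ + 1)p_ij − 1, the gain used by γ-centroid estimators, and a Nussinov-type dynamic program selects the non-crossing set of pairs with the largest total gain. The function name `mea_nussinov`, this particular weight, and the minimum hairpin length of 3 are illustrative assumptions, not details taken from the neuralfold code.

```python
from typing import List, Tuple


def mea_nussinov(p: List[List[float]], gamma: float = 1.0,
                 min_loop: int = 3) -> List[Tuple[int, int]]:
    """Pseudoknot-free MEA decoding via a Nussinov-style DP (illustrative sketch).

    p[i][j] (i < j) is the probability that bases i and j pair.
    A pair is worth (gamma + 1) * p[i][j] - 1, so only pairs with
    p[i][j] > 1 / (gamma + 1) can increase the objective.
    """
    n = len(p)
    if n == 0:
        return []
    w = [[(gamma + 1.0) * p[i][j] - 1.0 for j in range(n)] for i in range(n)]

    def can_pair(i: int, j: int) -> bool:
        # Enforce a minimum hairpin loop and skip pairs with non-positive gain.
        return j - i > min_loop and w[i][j] > 0.0

    # dp[i][j] = best total gain achievable on the subsequence i..j
    dp = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if can_pair(i, j):
                best = max(best, dp[i + 1][j - 1] + w[i][j])  # i pairs with j
            for k in range(i + 1, j):  # split into two independent parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: re-derive which case produced each dp value.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(i, j) and dp[i][j] == dp[i + 1][j - 1] + w[i][j]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)
```

With `gamma = 1` only pairs with probability above 0.5 can be selected; raising `gamma` lowers the threshold 1/(γ + 1) and admits more, less certain pairs. Pseudoknotted decoding (IPknot-style) needs a different optimizer and is not covered by this sketch.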

RNA secondary structure prediction using deep learning with thermodynamic integration

10.1101/2020.08.10.244442
2020

Author(s): Kengo Sato, Manato Akiyama, Yasubumi Sakakibara

Keyword(s): Deep Learning, Secondary Structure, Structure Prediction, RNA Secondary Structure, Secondary Structure Prediction, Secondary Structures, Thermodynamic Integration, RNA Secondary Structure Prediction, RNA Secondary Structures, Non-coding RNAs

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although richly parameterized machine learning-based models have achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration to enable robust predictions. As in our previous work, the folding scores computed by a deep neural network are integrated with traditional thermodynamic parameters. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements in F-value over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures.

A Combination of Support Vector Machines and Bidirectional Recurrent Neural Networks for Protein Secondary Structure Prediction

AI*IA 2003: Advances in Artificial Intelligence - Lecture Notes in Computer Science
10.1007/978-3-540-39853-0_12
2003
pp. 142-153
Cited by ~4

Author(s): Alessio Ceroni, Paolo Frasconi, Andrea Passerini, Alessandro Vullo

Keyword(s): Neural Networks, Support Vector Machines, Secondary Structure, Recurrent Neural Networks, Structure Prediction, Secondary Structure Prediction, Protein Secondary Structure, Support Vector, Protein Secondary Structure Prediction, Vector Machines

A Non-Parametric Bayesian Approach for Predicting RNA Secondary Structures

Journal of Bioinformatics and Computational Biology
10.1142/s0219720010004926
2010
Vol 08 (04)
pp. 727-742
Cited by ~8

Author(s): Kengo Sato, Michiaki Hamada, Toutai Mituyama, Kiyoshi Asai, Yasubumi Sakakibara

Keyword(s): Secondary Structure, Bayesian Approach, Structure Prediction, RNA Secondary Structure, Secondary Structure Prediction, Secondary Structures, Generative Models, RNA Secondary Structures, Stochastic Context-Free Grammars, Non-parametric

Since many functional RNAs form stable secondary structures that are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here, non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed in advance. Instead, their distributions are inferred so that they adapt (in the Bayesian sense) to the training sequences provided. Our RNA secondary structure prediction results show that HDP-SCFGs are more accurate than MFE-based and other generative models.
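
The idea that the number of non-terminal symbols need not be fixed can be illustrated with the stick-breaking construction of a Dirichlet process, the building block of hierarchical Dirichlet processes. The sketch below draws truncated stick-breaking (GEM) weights, which can be read as prior probabilities over a potentially unbounded set of non-terminals. It is a generic illustration only, not the HDP-SCFG inference procedure of the paper; the function name, the truncation level, and the concentration parameter `alpha` are arbitrary choices made here.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int,
                           rng: random.Random) -> List[float]:
    """Draw truncated GEM(alpha) weights: beta_k = v_k * prod_{l<k} (1 - v_l)."""
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
w = stick_breaking_weights(alpha=2.0, truncation=20, rng=rng)
# Smaller alpha puts most mass on the first few symbols; larger alpha spreads
# it out, so alpha controls how many non-terminals get non-negligible prior mass.
print(len(w), round(sum(w), 3))
```

In a full model the posterior, not the prior alone, decides how many symbols are actually used; the truncation here only keeps the example finite.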

RNA secondary structure prediction using deep learning with thermodynamic integration

Nature Communications
10.1038/s41467-021-21194-4
2021
Vol 12 (1)

Author(s): Kengo Sato, Manato Akiyama, Yasubumi Sakakibara

Keyword(s): Free Energy, Deep Learning, Secondary Structure, Structure Prediction, RNA Secondary Structure, Nearest Neighbor, Secondary Structure Prediction, Secondary Structures, RNA Secondary Structures, Non-coding RNAs

Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization keeps the folding scores as close as possible to the calculated free energy. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.
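
To make thermodynamic regularization more concrete, here is a toy max-margin objective with such a penalty. It assumes that the integrated score is f(x, y) = f_W(x, y) + f_T(x, y), where f_W is the neural-network score and f_T = −ΔG is the negated Turner free energy, and that the regularizer is λ(f(x, y*) − f_T(x, y*))² on the reference structure y*. Loss-augmented inference is replaced by a maximum over an explicit list of candidate structures; the notation, function names, and numbers are hypothetical and do not reproduce MXfold2's implementation.

```python
from typing import Dict, Tuple

# candidate name -> (neural-network score f_W, free energy dG in kcal/mol,
#                    structural loss Delta(y, y_ref), e.g. a Hamming distance)
Candidates = Dict[str, Tuple[float, float, float]]


def integrated_score(nn_score: float, free_energy: float) -> float:
    """f = f_W + f_T with f_T = -dG (assumed: lower energy means higher score)."""
    return nn_score - free_energy


def regularized_max_margin_loss(cands: Candidates, ref: str, lam: float) -> float:
    scores = {k: integrated_score(nn, dg) for k, (nn, dg, _) in cands.items()}
    # Structured hinge loss: max_y [f(y) + Delta(y, y_ref)] - f(y_ref).
    # Delta(y_ref, y_ref) = 0, so this term is never negative.
    augmented = max(scores[k] + cands[k][2] for k in cands)
    hinge = augmented - scores[ref]
    # Thermodynamic regularization: keep f close to -dG on the reference.
    _, dg_ref, _ = cands[ref]
    reg = lam * (scores[ref] - (-dg_ref)) ** 2
    return hinge + reg


cands = {"ref": (1.0, -5.0, 0.0), "alt": (2.0, -4.0, 3.0)}
print(regularized_max_margin_loss(cands, "ref", lam=0.5))  # prints 3.5
```

Under these assumptions the penalty reduces to λ·f_W(x, y*)², so it pulls the learned correction toward zero and the model falls back on the thermodynamic score wherever the data give little support for deviating from it.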

Caveats to deep learning approaches to RNA secondary structure prediction

10.1101/2021.12.14.472648
2021

Author(s): Christoph Flamm, Julia Wielach, Michael T. Wolfinger, Stefan Badelt, Ronny Lorenz, ...

Keyword(s): Machine Learning, Deep Learning, Secondary Structure, Structure Prediction, Secondary Structure Prediction, Secondary Structures, Training Data, Sequence Length, Learning Approaches, RNA Secondary Structures

Machine learning (ML) techniques, in particular deep learning, have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. These methods even yield exact solutions under certain simplifying assumptions. Nevertheless, the accuracy of these classical methods is limited and has seen little improvement over the last decade. This makes RNA secondary structure prediction an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. In this contribution, we discuss limitations of current approaches, in particular those due to biases in the training data. Furthermore, we propose to study the capabilities and limitations of ML models by first applying them to synthetic data that not only can be generated in arbitrary amounts but are also guaranteed to be free of biases. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
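
The last point can be checked mechanically. In a secondary structure each nucleotide pairs with at most one partner, so a sequence of length n holds at most ⌊n/2⌋ base pairs, whereas independently thresholding all n(n − 1)/2 entries of a pair-probability matrix can yield a quadratic number of pairs. The helper names and the 0.5 threshold below are illustrative assumptions, not taken from the paper.

```python
from typing import List, Tuple


def thresholded_pairs(p: List[List[float]], thr: float = 0.5) -> List[Tuple[int, int]]:
    """Naively call every (i, j), i < j, whose probability exceeds thr."""
    n = len(p)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if p[i][j] > thr]


def is_valid_secondary_structure(pairs: List[Tuple[int, int]]) -> bool:
    """True if no nucleotide takes part in more than one base pair."""
    seen = set()
    for i, j in pairs:
        if i in seen or j in seen:
            return False
        seen.update((i, j))
    return True


n = 10
p = [[0.6] * n for _ in range(n)]  # a deliberately over-confident prediction
calls = thresholded_pairs(p)
print(len(calls), n * (n - 1) // 2, n // 2)  # prints: 45 45 5
print(is_valid_secondary_structure(calls))   # prints: False
```

Decoding with a structure-aware algorithm (for example a Nussinov-type dynamic program) instead of per-entry thresholding guarantees the linear bound by construction.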

Comparative Sequence Analysis and Patterns of Covariation in RNA Secondary Structures

Genetics
10.1093/genetics/154.2.909
2000
Vol 154 (2)
pp. 909-921
Cited by ~4

Author(s): John Parsch, John M Braverman, Wolfgang Stephan

Keyword(s): Structure Prediction, Secondary Structure Prediction, RNase P, Secondary Structures, Comparative Sequence Analysis, Small Subunit, Physical Parameters, Small Subunit rRNA, Base Pairing, Compensatory Evolution

A novel method of RNA secondary structure prediction based on a comparison of nucleotide sequences is described. This method correctly predicts nearly all evolutionarily conserved secondary structures of five different RNAs: tRNA, 5S rRNA, bacterial ribonuclease P (RNase P) RNA, eukaryotic small subunit rRNA, and the 3′ untranslated region (UTR) of the Drosophila bicoid (bcd) mRNA. Furthermore, covariations occurring in the helices of these conserved RNA structures are analyzed. Two physical parameters are found to be important determinants of the evolution of compensatory mutations: the length of a helix and the distance between base-pairing nucleotides. For the helices of bcd 3′ UTR mRNA and RNase P RNA, a positive correlation between the rate of compensatory evolution and helix length is found. The analysis of Drosophila bcd 3′ UTR mRNA further revealed that the rate of compensatory evolution decreases with the physical distance between base-pairing residues. This result is in qualitative agreement with Kimura's model of compensatory fitness interactions, which assumes that mutations occurring in RNA helices are individually deleterious but become neutral in appropriate combinations.
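
Compensatory changes of the kind analyzed above can be counted from an alignment once the helix pairs are known. The sketch below counts, for each base pair of a reference sequence, how many other aligned sequences change both partners while keeping a canonical (Watson–Crick or G·U) pair. It is a generic illustration of covariation counting under stated assumptions (gap-free alignment, first sequence as reference, hypothetical function name), not the comparative prediction method described in the abstract.

```python
from typing import Dict, List, Tuple

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def compensatory_counts(alignment: List[str],
                        pairs: List[Tuple[int, int]]) -> Dict[Tuple[int, int], int]:
    """Count double substitutions that preserve pairing, relative to alignment[0]."""
    ref = alignment[0]
    counts = {}
    for i, j in pairs:
        c = 0
        for seq in alignment[1:]:
            both_changed = seq[i] != ref[i] and seq[j] != ref[j]
            if both_changed and seq[i] + seq[j] in CANONICAL:
                c += 1
        counts[(i, j)] = c
    return counts


alignment = ["GGGAAACCC",
             "AGGAAACCU",   # G-C -> A-U at positions (0, 8)
             "GAGAAACUC"]   # G-C -> A-U at positions (1, 7)
print(compensatory_counts(alignment, [(0, 8), (1, 7), (2, 6)]))
# prints {(0, 8): 1, (1, 7): 1, (2, 6): 0}
```

Relating such counts to helix length or to the distance between paired residues, as the study does, would additionally require a phylogeny and a rate model, which this sketch leaves out.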

Protein Secondary Structure Prediction Using Support Vector Machines (SVMs)

2013 International Conference on Machine Intelligence and Research Advancement
10.1109/icmira.2013.124
2013

Author(s): Hitesh Shah

Keyword(s): Support Vector Machines, Secondary Structure, Structure Prediction, Secondary Structure Prediction, Protein Secondary Structure, Support Vector, Protein Secondary Structure Prediction, Vector Machines

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

10.1101/205047
2017
Cited by ~1

Author(s): Manato Akiyama, Kengo Sato, Yasubumi Sakakibara

Keyword(s): Machine Learning, Secondary Structure, Structure Prediction, RNA Secondary Structure, Prediction Accuracy, Secondary Structure Prediction, Training Data, Support Vector, RNA Secondary Structure Prediction, Fine-grained

Motivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest-neighbor model, which finds the thermodynamically most stable secondary structure, i.e., the one with the minimum free energy (MFE). For further improvement, alternative approaches based on machine learning techniques have been developed. The machine learning-based approach can employ a fine-grained model with much richer feature representations and the ability to fit the training data. Although a machine learning-based fine-grained model achieved extremely high prediction accuracy, a risk of overfitting has been reported for such models.

Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning-based weighted approach. Our fine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed feature contexts, which are trained by the structured support vector machine (SSVM) with ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and no heavy overfitting is observed.

Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.
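
SSVM training with ℓ1 regularization can be illustrated with one proximal subgradient step on a linear scoring model: find the loss-augmented best structure, move the weights toward the reference structure's features, then soft-threshold them so that unhelpful parameters are driven exactly to zero. Candidate structures are given as an explicit list of feature vectors, and the learning rate and penalty values are arbitrary. The abstract does not specify the optimizer, so this is a sketch of the general technique with hypothetical names, not MXfold's training code.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def ssvm_l1_step(w: Vector, feats: List[Vector], ref: int, losses: List[float],
                 lr: float, lam: float) -> Vector:
    """One proximal subgradient step for an l1-regularized structured SVM.

    feats[k] is the feature vector phi(x, y_k) of candidate structure k,
    losses[k] is Delta(y_k, y_ref) (0 for the reference itself).
    """
    # Loss-augmented inference over the candidate list.
    aug = [dot(w, phi) + loss for phi, loss in zip(feats, losses)]
    y_hat = max(range(len(feats)), key=lambda k: aug[k])
    # Subgradient of the structured hinge loss: phi(y_hat) - phi(y_ref).
    grad = [a - b for a, b in zip(feats[y_hat], feats[ref])]
    w = [wi - lr * gi for wi, gi in zip(w, grad)]
    # Proximal step for lam * ||w||_1: soft-thresholding.
    t = lr * lam
    return [max(abs(wi) - t, 0.0) * (1.0 if wi > 0 else -1.0) for wi in w]


w = ssvm_l1_step([0.0, 0.0], feats=[[1.0, 0.0], [0.0, 1.0]], ref=0,
                 losses=[0.0, 1.0], lr=0.1, lam=0.5)
print(w)
```

In the real setting the candidate list is replaced by loss-augmented dynamic programming over all structures, and the thermodynamic parameters enter the score alongside the learned weights.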
