RNA secondary structure prediction using deep learning with thermodynamic integration

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although richly parameterized machine learning models have achieved extremely high prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration. As in our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements in F-value over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures.

RNA secondary structure prediction using deep learning with thermodynamic integration

Nature Communications ◽

10.1038/s41467-021-21194-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Kengo Sato ◽

Manato Akiyama ◽

Yasubumi Sakakibara

Keyword(s):

Free Energy ◽

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Nearest Neighbor ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

RNA Secondary Structures ◽

Non-Coding RNAs

Abstract: Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.
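
The idea of keeping learned folding scores close to the calculated free energy can be illustrated with a small penalty term. The following is only a sketch under stated assumptions, not the MXfold2 implementation: the function name `thermodynamic_penalty`, the squared-difference form, the `weight` parameter, and the sign convention (a higher score is better, a lower free energy is better, so the score is compared against the negated free energy) are all hypothetical choices made for illustration.

```python
def thermodynamic_penalty(folding_score: float,
                          free_energy: float,
                          weight: float = 0.1) -> float:
    """Penalize disagreement between a learned score and a free energy.

    Assumptions (not taken from the abstract):
    - folding_score: higher means a more favorable structure.
    - free_energy: lower (more negative) means a more favorable structure,
      so the score is compared against -free_energy.
    - A weighted squared difference is used as the penalty.
    """
    gap = folding_score - (-free_energy)
    return weight * gap ** 2


def regularized_loss(base_loss: float,
                     folding_score: float,
                     free_energy: float,
                     weight: float = 0.1) -> float:
    """Add the penalty to whatever training loss is already in use."""
    return base_loss + thermodynamic_penalty(folding_score, free_energy, weight)
```

Under these assumptions the penalty vanishes when the learned score equals the negated free energy and grows quadratically as the two drift apart, which is one simple way to discourage a highly parameterized model from departing far from thermodynamic estimates.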

Deep Learning Method for RNA Secondary Structure Prediction with Pseudoknots Based on Large-Scale Data

Journal of Healthcare Engineering ◽

10.1155/2021/6699996 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Bowen Shen ◽

Hao Zhang ◽

Cong Li ◽

Tianheng Zhao ◽

Yuanning Liu

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Large Scale ◽

Secondary Structure Prediction ◽

Learning Methods ◽

RNA Secondary Structure Prediction ◽

Large Scale Data ◽

Scale Data

Traditional machine learning methods are widely used for RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods offer advantages over traditional machine learning methods. As the number of network layers increases, deep learning models often suffer from problems such as a growing number of parameters and overfitting. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary structures. We improved the neural network models in terms of both depth and width, which effectively improves computational efficiency while extracting more feature information. In our experiments, we process existing real RNA data, use the deep learning models to extract useful features from a large amount of RNA sequence and structure data, and then use the extracted features to predict the pairing probability of each base. The base-level predictions are then processed using the characteristics of RNA secondary structure and dynamic programming to obtain the structure with the largest sum of base-pairing probabilities, which is taken as the optimal RNA secondary structure. We evaluated the GoogLeNet and TCN models on 5S rRNA, tRNA, and tmRNA data and compared them with other standard prediction algorithms. On the 5S rRNA and tRNA data sets, the sensitivity and specificity of the GoogLeNet model are about 16% higher than the best results of the other algorithms; on the tmRNA data set, they are about 9% higher. Because the performance of deep learning algorithms depends on the size of the data set, the prediction accuracy of deep learning methods for RNA secondary structure should continue to improve as the scale of RNA data expands.
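
The dynamic programming step described above, choosing the structure with the largest sum of base-pairing probabilities, can be sketched with a Nussinov-style recursion. This is a minimal illustration and not the authors' code: the function name `max_probability_structure`, the `min_loop` constraint of three unpaired bases in a hairpin, and the restriction to nested (pseudoknot-free) structures are assumptions made here.

```python
from typing import List, Tuple


def max_probability_structure(
    prob: List[List[float]], min_loop: int = 3
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (best score, pairs) maximizing the sum of prob[i][j] over pairs.

    prob[i][j] (i < j) is the predicted probability that bases i and j pair.
    Only nested structures are considered, and paired bases must be
    separated by at least min_loop unpaired positions (an assumption).
    """
    n = len(prob)
    if n == 0:
        return 0.0, []
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                left = best[i][k - 1] if k > i else 0.0
                score = max(score, left + prob[k][j] + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back through the table to recover one optimal set of pairs.
    pairs: List[Tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = best[i][k - 1] if k > i else 0.0
            if best[i][j] == left + prob[k][j] + best[k + 1][j - 1]:
                pairs.append((k, j))
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return best[0][n - 1], sorted(pairs)
```

The recursion runs in cubic time in the sequence length. Because each base appears in at most one pair, the output is always a valid nested structure regardless of how the probabilities were produced.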

UFold: Fast and Accurate RNA Secondary Structure Prediction with Deep Learning

10.1101/2020.08.17.254896 ◽

2020 ◽

Author(s):

Yingxin Cao ◽

Laiyi Fu ◽

Jie Wu ◽

Qing Nie ◽

Xiaohui Xie

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

Correction Function ◽

Thermodynamic Models ◽

RNA Secondary Structure Prediction ◽

RNA Molecules ◽

Online Web

Abstract: For many RNA molecules, the secondary structure is essential for the correct function of the RNA. Predicting RNA secondary structure from nucleotide sequences is a long-standing problem in genomics, but the prediction performance has reached a plateau over time. Traditional RNA secondary structure prediction algorithms are primarily based on thermodynamic models through free energy minimization. Here we propose a deep learning-based method, called UFold, for RNA secondary structure prediction, trained directly on annotated data without any thermodynamic assumptions. UFold improves substantially upon previous models, with approximately 31% improvement over traditional thermodynamic models and 24.5% improvement over other learning-based methods. It achieves an F1 score of 0.96 on base pair prediction accuracy. An online web server running UFold is publicly available at http://ufold.ics.uci.edu.
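
For readers unfamiliar with the metric, an F1 score on base pairs can be computed from the predicted and reference pair sets as sketched below. This is a generic illustration; the function name `base_pair_f1` is hypothetical, and the exact evaluation protocol used for UFold (for example, whether slightly shifted pairs count as correct) is not stated in the abstract and is not assumed here.

```python
from typing import Iterable, Tuple


def base_pair_f1(predicted: Iterable[Tuple[int, int]],
                 reference: Iterable[Tuple[int, int]]) -> float:
    """F1 score over base pairs, using exact matches only (an assumption).

    Pairs are treated as unordered, so (i, j) and (j, i) are the same pair.
    """
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

With three predicted pairs of which two appear among three reference pairs, precision and recall are both 2/3, so the F1 score is also 2/3.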

A Non-Parametric Bayesian Approach for Predicting RNA Secondary Structures

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720010004926 ◽

2010 ◽

Vol 08 (04) ◽

pp. 727-742 ◽

Cited By ~ 8

Author(s):

Kengo Sato ◽

Michiaki Hamada ◽

Toutai Mituyama ◽

Kiyoshi Asai ◽

Yasubumi Sakakibara

Keyword(s):

Secondary Structure ◽

Bayesian Approach ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Generative Models ◽

RNA Secondary Structures ◽

Stochastic Context Free Grammars ◽

Non Parametric

Since many functional RNAs form stable secondary structures that are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here, non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead, their distributions are inferred so that they adapt (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.

NNfold: RNA Secondary Structure Prediction by Deep Learning with an Architecture Imposing Contextual Constraining

SSRN Electronic Journal ◽

10.2139/ssrn.3813288 ◽

2021 ◽

Author(s):

Christophe Van Neste ◽

Ramzan Umarov ◽

Yu Li ◽

Adil Salhi ◽

Hiroyuki Kuwahara ◽

...

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

RNA Secondary Structure Prediction

Caveats to deep learning approaches to RNA secondary structure prediction

10.1101/2021.12.14.472648 ◽

2021 ◽

Author(s):

Christoph Flamm ◽

Julia Wielach ◽

Michael T. Wolfinger ◽

Stefan Badelt ◽

Ronny Lorenz ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Training Data ◽

Sequence Length ◽

Learning Approaches ◽

RNA Secondary Structures

Machine learning (ML), and in particular deep learning techniques, have gained popularity for predicting structures from biopolymer sequences. An interesting case is the prediction of RNA secondary structures, where well-established biophysics-based methods exist. These methods even yield exact solutions under certain simplifying assumptions. Nevertheless, the accuracy of these classical methods is limited and has seen little improvement over the last decade. This makes it an attractive target for machine learning, and consequently several deep learning models have been proposed in recent years. In this contribution we discuss limitations of current approaches, in particular due to biases in the training data. Furthermore, we propose to study capabilities and limitations of ML models by first applying them to synthetic data that can not only be generated in arbitrary amounts, but are also guaranteed to be free of biases. We apply this idea by testing several ML models of varying complexity. Finally, we show that the best models are capable of capturing many, but not all, properties of RNA secondary structures. Most severely, the number of predicted base pairs scales quadratically with sequence length, even though a secondary structure can only accommodate a linear number of pairs.
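
The mismatch between quadratic and linear growth can be made concrete with a simple check. Since each base takes part in at most one pair, a secondary structure of length n contains at most n // 2 pairs, whereas thresholding an n × n matrix of pairing scores can report up to n(n − 1)/2 pairs. The sketch below is illustrative only; the function names and the 0.5 threshold are hypothetical and do not come from the paper.

```python
from typing import List


def max_pairs_in_structure(n: int) -> int:
    """Upper bound on pairs: every base pairs with at most one partner."""
    return n // 2


def thresholded_pair_count(prob: List[List[float]],
                           threshold: float = 0.5) -> int:
    """Count positions (i, j), i < j, whose score reaches the threshold."""
    n = len(prob)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if prob[i][j] >= threshold
    )


def exceeds_linear_bound(prob: List[List[float]],
                         threshold: float = 0.5) -> bool:
    """True if the thresholded prediction cannot be a valid structure."""
    return thresholded_pair_count(prob, threshold) > max_pairs_in_structure(len(prob))
```

A check of this kind only flags predictions that are impossible by counting alone; a prediction that passes it may still assign several partners to the same base.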
