P-DCFOLD OR HOW TO PREDICT ALL KINDS OF PSEUDOKNOTS IN RNA SECONDARY STRUCTURES

2005 ◽  
Vol 14 (05) ◽  
pp. 703-716 ◽  
Author(s):  
FARIZA TAHI ◽  
ENGELEN STEFAN ◽  
MIREILLE REGNIER

Pseudoknots play important roles in many RNAs. For computational reasons, however, pseudoknots are usually excluded from the definition of RNA secondary structures. Indeed, predicting pseudoknots greatly increases the time complexity of the algorithms, given that all existing algorithms for RNA secondary structure prediction already have complexities of at least O(n^3). Some algorithms have been developed for searching pseudoknots, but all of them have very high complexities and generally consider only particular kinds of pseudoknots. We present an algorithm called P-DCFold, based on the comparative approach, for the prediction of RNA secondary structures including all kinds of pseudoknots. The helices are searched recursively using the "Divide and Conquer" approach, from the "most significant" to the "least significant". A selected helix subdivides the sequence into two sub-sequences: the internal one and a concatenation of the two external ones. This approach is used to search for non-interleaved helices and limits the search space. To search for pseudoknots, the processing is reiterated, so that each helix of the pseudoknot is selected in a different step. P-DCFold has been applied to several RNA sequences. In less than two seconds, their respective secondary structures, including their pseudoknots, were recovered very efficiently.
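The subdivision step described in this abstract can be illustrated with a minimal sketch. It assumes a helix is given by hypothetical parameters `i` (first base of the 5' strand), `j` (last base of the 3' strand) and `length` (number of stacked pairs); how P-DCFold actually selects and represents helices by comparative analysis is not shown here.

```python
def split_by_helix(seq, i, j, length):
    """Split seq around a helix whose 5' strand is seq[i:i+length]
    and whose 3' strand is seq[j-length+1:j+1] (0-based indices).

    Returns (internal, external): the region enclosed by the helix and
    the concatenation of the two regions outside it.
    """
    # The two strands must not overlap and must lie inside the sequence.
    if not (0 <= i and i + length <= j - length + 1 and j < len(seq)):
        raise ValueError("helix strands overlap or fall outside the sequence")
    internal = seq[i + length : j - length + 1]
    external = seq[:i] + seq[j + 1 :]
    return internal, external


# Hypothetical example: GGGG pairs with CCCC around a UUUU loop.
internal, external = split_by_helix("AAGGGGUUUUCCCCAA", i=2, j=13, length=4)
print(internal, external)
```

Each returned sub-sequence could then be searched recursively for further helices, which is what keeps the helices found in one pass non-interleaved.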

2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based, richly parameterized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. As in our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10% improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.


2020 ◽  
Vol 36 (8) ◽  
pp. 2451-2457
Author(s):  
Louis Becquey ◽  
Eric Angel ◽  
Fariza Tahi

Motivation: RNA loops have been modelled and clustered from solved 3D structures into ordered collections of recurrent non-canonical interactions called ‘RNA modules’, available in databases. This work explores what information from such modules can be used to improve secondary structure prediction. We propose a bi-objective method for predicting RNA secondary structures by minimizing both an energy-based and a knowledge-based potential. The tool, called BiORSEO, outputs secondary structures corresponding to the optimal solutions from the Pareto set. Results: We compare several approaches to predict secondary structures using inserted RNA modules information: two module data sources, Rna3Dmotif and the RNA 3D Motif Atlas, and different ways to score the module insertions: module size, module complexity or module probability according to models like JAR3D and BayesPairing. We benchmark them against a large set of known secondary structures, including some state-of-the-art tools, and comment on the usefulness of the half physics-based, half data-based approach. Availability and implementation: The software is available for download on the EvryRNA website, as well as the datasets. Supplementary information: Supplementary data are available at Bioinformatics online.
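The notion of a Pareto set for two minimized objectives can be sketched as follows. This is only an illustration of non-dominated filtering over a small list of candidates; the tuple layout and the example values are hypothetical, and the sketch does not reproduce how BiORSEO itself enumerates or scores structures.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates.

    candidates: list of (label, energy, knowledge) tuples, where both
    numeric objectives are to be minimized (an assumption of this sketch).
    A candidate is dominated if another one is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[1] <= c[1] and o[2] <= c[2] and (o[1] < c[1] or o[2] < c[2])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical scores for four candidate structures.
candidates = [
    ("s1", -10.0, 3.0),
    ("s2", -8.0, 1.0),
    ("s3", -7.0, 2.0),   # dominated by s2
    ("s4", -10.0, 4.0),  # dominated by s1
]
front = pareto_front(candidates)
print([label for label, _, _ in front])
```

The quadratic scan is adequate for a handful of candidates; a larger candidate pool would call for a sort-based sweep instead.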


2014 ◽  
Vol 4 (3) ◽  
Author(s):  
Mária Šimalová ◽  
Gabriela Andrejková

In this paper, we describe and develop more effective solutions to two important problems in bioinformatics. The first is the multiple sequence alignment problem and the second is the RNA secondary structure prediction (folding) problem. Each of these problems can be solved with better results if the solution of the other is known, but usually we only have the sequences and know neither the alignment nor the secondary structure. Exact algorithms that solve both problems simultaneously are computationally demanding because of the great length of RNA sequences. We describe a method for speeding up Sankoff’s simultaneous alignment and folding algorithm that uses the Carrillo-Lipman approach to cut off those computations that can never lead to an optimal solution.


2006 ◽  
Vol 7 (1) ◽  
pp. 37-43 ◽  
Author(s):  
T. A. Hughes ◽  
J. N. McElwaine

Secondary structures within the 5′ untranslated regions of messenger RNAs can have profound effects on the efficiency of translation of their messages and thereby on gene expression. Consequently they can act as important regulatory motifs in both physiological and pathological settings. Current approaches to predicting the secondary structure of these RNA sequences find the structure with the global-minimum free energy. However, since RNA folds progressively from the 5′ end when synthesised or released from the translational machinery, this may not be the most probable structure. We discuss secondary structure prediction based on local-minimisation of free energy with thermodynamic fluctuations as nucleotides are added to the 3′ end and show that these can result in different secondary structures. We also discuss approaches for studying the extent of the translational inhibition specified by structures within the 5′ untranslated region.


2019 ◽  
Vol 17 (05) ◽  
pp. 1950031 ◽  
Author(s):  
Abdelhakim El Fatmi ◽  
M. Ali Bekri ◽  
Said Benhlima

The prediction of the optimal secondary structure for a given RNA sequence represents a challenging computational problem in bioinformatics. This challenge has become harder with the discovery of different pseudoknot classes, complex topologies that play diverse roles in biological processes. Many recent methods have been proposed to predict RNA secondary structure with some pseudoknot classes, but only a few of them have reached satisfying results in terms of both complexity and accuracy. Here we present RNAknot, a new method for predicting RNA secondary structure that contains the following components: stems, hairpin loops, multi-branched loops (multi-loops), bulge loops, and internal loops, in addition to two types of pseudoknots, the H-type pseudoknot and the kissing hairpin. RNAknot is based on a genetic algorithm and the Greedy Randomized Adaptive Search Procedure (GRASP), and it uses the free energy as the fitness function to evaluate the obtained structures. To validate the performance of the presented method, 131 tests were performed using two datasets of 26 and 105 RNA sequences, taken from the databases RNAstrand and Pseudobase, respectively. The obtained results are compared with those of RNA secondary structure prediction programs such as Vs_subopt, CyloFold, IPknot, Kinefold, RNAstructure, and Sfold. The results of this comparative study show that the prediction accuracy of our proposed approach is significantly improved compared to that of the other programs. For the first dataset, RNAknot has the highest average specificity (SP) (71.23%) and sensitivity (SN) (72.15%) among the compared programs. For the second dataset, the predictions obtained by RNAknot have the highest average SP (85.49%) and F-measure (79.97%) among the compared programs.
The program is available as a jar file in the link: www.bachmek.umi.ac.ma/wp-content/uploads/RNAknot.0.0.2.rar .
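The accuracy measures quoted above can be computed from sets of base pairs, as in the sketch below. It assumes the definitions common in this literature: sensitivity as TP / (TP + FN), "specificity" as the positive predictive value TP / (TP + FP), and the F-measure as their harmonic mean. The abstract does not state its exact formulas, so these definitions and the function name are assumptions.

```python
def pair_metrics(predicted, reference):
    """Compare predicted and reference base pairs.

    Both arguments are iterables of (i, j) index pairs. Returns
    (sensitivity, ppv, f_measure); ppv is what many RNA papers
    report as "specificity" (an assumption of this sketch).
    """
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # correctly predicted pairs
    fp = len(predicted - reference)   # predicted but not in reference
    fn = len(reference - predicted)   # reference pairs that were missed
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * sn * ppv / (sn + ppv) if sn + ppv else 0.0
    return sn, ppv, f


# Hypothetical example: two of four reference pairs are recovered.
reference = {(0, 9), (1, 8), (2, 7), (3, 6)}
predicted = {(0, 9), (1, 8)}
sn, ppv, f = pair_metrics(predicted, reference)
print(sn, ppv, round(f, 4))
```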


2010 ◽  
Vol 08 (04) ◽  
pp. 727-742 ◽  
Author(s):  
KENGO SATO ◽  
MICHIAKI HAMADA ◽  
TOUTAI MITUYAMA ◽  
KIYOSHI ASAI ◽  
YASUBUMI SAKAKIBARA

Since many functional RNAs form stable secondary structures which are related to their functions, RNA secondary structure prediction is a crucial problem in bioinformatics. We propose a novel model for generating RNA secondary structures based on a non-parametric Bayesian approach, called hierarchical Dirichlet processes for stochastic context-free grammars (HDP-SCFGs). Here non-parametric means that some meta-parameters, such as the number of non-terminal symbols and production rules, do not have to be fixed. Instead their distributions are inferred in order to be adapted (in the Bayesian sense) to the training sequences provided. The results of our RNA secondary structure predictions show that HDP-SCFGs are more accurate than the MFE-based and other generative models.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

Accurate predictions of RNA secondary structures can help uncover the roles of functional non-coding RNAs. Although machine learning-based models have achieved high performance in terms of prediction accuracy, overfitting is a common risk for such highly parameterized models. Here we show that overfitting can be minimized when RNA folding scores learnt using a deep neural network are integrated together with Turner’s nearest-neighbor free energy parameters. Training the model with thermodynamic regularization ensures that folding scores and the calculated free energy are as close as possible. In computational experiments designed for newly discovered non-coding RNAs, our algorithm (MXfold2) achieves the most robust and accurate predictions of RNA secondary structures without sacrificing computational efficiency compared to several other algorithms. The results suggest that integrating thermodynamic information could help improve the robustness of deep learning-based predictions of RNA secondary structure.
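One way to read "as close as possible" is a penalty on the gap between the learned folding score and the calculated free energy. The sketch below assumes a squared-difference penalty added to some structured training loss. The penalty form, the `weight` value, and all names are hypothetical illustrations and are not taken from the abstract.

```python
def regularized_loss(structured_loss, folding_score, free_energy, weight=0.125):
    """Training loss with a thermodynamic regularization term.

    structured_loss: loss of the prediction against the reference structure.
    folding_score:   score assigned to the structure by the learned model.
    free_energy:     free energy of the same structure from thermodynamic
                     parameters, on the same scale and sign convention
                     as folding_score (an assumption of this sketch).
    weight:          hypothetical regularization strength.
    """
    penalty = (folding_score - free_energy) ** 2
    return structured_loss + weight * penalty


# Hypothetical numbers: the score deviates from the free energy by 2 units.
print(regularized_loss(1.0, folding_score=-3.0, free_energy=-5.0, weight=0.5))
```

The penalty vanishes when the learned score reproduces the thermodynamic value, so a larger `weight` pulls the model more strongly toward the thermodynamic parameters.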


2018 ◽  
Author(s):  
Manato Akiyama ◽  
Yasubumi Sakakibara ◽  
Kengo Sato

Motivation: Existing approaches for predicting RNA secondary structures depend on how a secondary structure is decomposed into substructures, the so-called architecture, to define their parameter space. However, the architecture has not been sufficiently investigated, especially for pseudoknotted secondary structures. Results: In this paper, we propose a novel algorithm that directly infers base-pairing probabilities with neural networks and does not depend on the architecture of RNA secondary structures, followed by maximum expected accuracy (MEA) based decoding algorithms: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, called structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm achieves the best prediction accuracy compared with existing methods. Availability: The source code is available at https://github.com/keio-bioinformatics/neuralfold/.
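Nussinov-style decoding over a base-pairing probability matrix can be sketched as a dynamic program that selects a pseudoknot-free set of pairs maximizing a summed gain. The gain used here, probability minus a `threshold`, and the `min_loop` constraint are simplifying assumptions of this sketch; it is not the neuralfold implementation, and IPknot-style decoding for pseudoknots is not covered.

```python
def nussinov_decode(prob, threshold=0.5, min_loop=3):
    """Return a nested (pseudoknot-free) list of pairs (i, j), i < j,
    maximizing the sum of prob[i][j] - threshold over chosen pairs.

    prob: n x n matrix; only entries with i < j are read.
    min_loop: minimum number of unpaired bases enclosed by a pair.
    """
    n = len(prob)
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            gain = prob[i][j] - threshold
            if gain > 0:                                  # i pairs with j
                score = max(score, best[i + 1][j - 1] + gain)
            for k in range(i + 1, j):                     # bifurcation
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Traceback: re-evaluate the same expressions to find which case won.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        gain = prob[i][j] - threshold
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif gain > 0 and best[i][j] == best[i + 1][j - 1] + gain:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


# Hypothetical 8-base example with two stacked high-probability pairs.
n = 8
prob = [[0.0] * n for _ in range(n)]
prob[0][7], prob[1][6] = 0.9, 0.8
decoded = nussinov_decode(prob)
print(decoded)
```

The cubic running time comes from the bifurcation loop, which matches the O(n^3) cost usually associated with pseudoknot-free folding recursions.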

