Predicting pseudoknotted structures across two RNA sequences

2012 ◽  
Vol 28 (23) ◽  
pp. 3058-3065 ◽  
Author(s):  
Jana Sperschneider ◽  
Amitava Datta ◽  
Michael J. Wise

Abstract Motivation Laboratory RNA structure determination is demanding and costly; thus, computational structure prediction is an important task. Single sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model. If a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements with highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots because of the computational complexity. Results A comparative pseudoknot prediction method called DotKnot-PW is introduced, based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Supplementary information Supplementary data are available at Bioinformatics online.

2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Jaswinder Singh ◽  
Jack Hanson ◽  
Kuldip Paliwal ◽  
Yaoqi Zhou

Abstract The majority of the human genome is transcribed into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction, including the noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvements in predicting all base pairs, and noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
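As an illustration of what "non-nested" means in the abstract above, the following minimal sketch flags pairs of base pairs that cross each other. It is not part of SPOT-RNA; the function name `crossing_pairs` and the example pair lists are hypothetical, and the crossing condition used (i < k < j < l) is the standard textbook definition of a pseudoknotted pair of base pairs.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs that cross each other.

    Two base pairs (i, j) and (k, l), with i < j and k < l, cross when
    i < k < j < l. Crossing pairs cannot be drawn as nested arcs, which
    is what makes a structure pseudoknotted.
    """
    # Normalize each pair so the smaller index comes first, then sort by it.
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for a in range(len(normalized)):
        i, j = normalized[a]
        for b in range(a + 1, len(normalized)):
            k, l = normalized[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# Hypothetical examples: a plain hairpin stem and an H-type pseudoknot.
nested = [(0, 9), (1, 8), (2, 7)]
h_type = [(0, 6), (1, 5), (3, 10), (4, 9)]

print(crossing_pairs(nested))       # no crossings in a nested stem
print(len(crossing_pairs(h_type)))  # every pair of stem 1 crosses every pair of stem 2
```

The quadratic scan is chosen for readability; it is adequate for short sequences but is not how large-scale predictors represent structures.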


2020 ◽  
Vol 15 (2) ◽  
pp. 135-143
Author(s):  
Sha Shi ◽  
Xin-Li Zhang ◽  
Le Yang ◽  
Wei Du ◽  
Xian-Li Zhao ◽  
...  

Background: The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on DPAs, the performance of EAs is far from satisfactory. Objective: Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures. Methods: The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool of size N is encoded as a QGA population, which is represented by N quantum bits rather than classical bits. The populations are updated by so-called quantum crossover operations, quantum mutation operations and quantum rotation operations. Results: The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard to not only prediction accuracy and sensitivity but also complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity. Conclusion: This work sheds an interesting light on the applications of quantum computing to RNA structure prediction.


2012 ◽  
Vol 20 (04) ◽  
pp. 455-469
Author(s):  
RAJASEKHAR KAKUMANI ◽  
M. OMAIR AHMAD ◽  
VIJAY KUMAR DEVABHAKTUNI

Prediction of ribonucleic acid (RNA) secondary structure is an important task in bioinformatics. The RNA structure is known to influence its biological functionality. RNA secondary structure contains many substructures such as stems, loops and pseudoknots. The pseudoknot substructure occurs in several classes of RNAs and plays a vital role in many biological processes. Prediction of pseudoknots in RNA is challenging and still an open research problem. Several computational methods based on dynamic programming, genetic algorithms, statistical models, etc., have been proposed with varying success. In this paper, we employ a matched filtering approach to determine the RNA secondary structure containing pseudoknots. The central idea is to use a matched filter to identify the longest possible stem patterns in the base-pairing matrix of an RNA. The stem patterns obtained are then used to determine the locations of the other substructures, such as loops and pseudoknots, present in the RNA. Comparison of the prediction results for RNA sequences derived from PseudoBase illustrates the effectiveness and accuracy of our proposed approach compared to some of the existing popular RNA secondary structure prediction methods.
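The idea of searching a base-pairing matrix for the longest stem can be sketched as follows. This is a simplified stand-in, not the authors' matched filter: it scans anti-diagonal runs of allowed pairs directly. The pairing rule (Watson-Crick plus G-U wobble), the `min_loop` parameter and the function names are assumptions made for illustration.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq):
    """Build an n x n 0/1 matrix with 1 where bases i and j may pair."""
    n = len(seq)
    return [[1 if (seq[i], seq[j]) in ALLOWED else 0 for j in range(n)]
            for i in range(n)]


def longest_stem(seq, min_loop=3):
    """Return (length, i, j) for the longest run of stacked pairs.

    A stem is a run (i, j), (i+1, j-1), ... along an anti-diagonal of the
    pairing matrix. Each pair must enclose at least min_loop unpaired bases.
    (i, j) is the outermost pair; (0, None, None) means no stem was found.
    """
    n = len(seq)
    m = pairing_matrix(seq)
    run = [[0] * n for _ in range(n)]  # run[i][j]: stem length starting at (i, j)
    best = (0, None, None)
    # Process i from the 3' end so run[i + 1][j - 1] is already known.
    for i in range(n - 1, -1, -1):
        for j in range(i + min_loop + 1, n):
            if m[i][j]:
                run[i][j] = run[i + 1][j - 1] + 1
                if run[i][j] > best[0]:
                    best = (run[i][j], i, j)
    return best


print(longest_stem("GGGGAAAACCCC"))
```

Once the longest stem is fixed, the same scan could be repeated on the remaining unpaired regions to look for further stems; that iteration is left out here to keep the sketch short.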


Author(s):  
Louis Becquey ◽  
Eric Angel ◽  
Fariza Tahi

Abstract Motivation Applied research in machine learning progresses faster when a clean dataset is available and ready to use. Several datasets have been proposed and released over the years for specific tasks such as image classification, speech recognition and, more recently, protein structure prediction. However, for the fundamental problem of RNA structure prediction, information is spread across several databases depending on the level of interest: sequence, secondary structure, 3D structure or interactions with other macromolecules. To speed up advances in machine-learning-based approaches for RNA secondary and/or 3D structure prediction, a dataset integrating all this information is required, to avoid spending time on data gathering and cleaning. Results Here, we present a first attempt at a standardized and automatically generated dataset dedicated to RNA, combining RNA sequences, homology information (in the form of position-specific scoring matrices) and information derived from the annotation of available 3D structures (including secondary structure, canonical and non-canonical interactions and backbone torsion angles). The data are retrieved from the public databases PDB, Rfam and SILVA. The paper describes the procedure to build such a dataset and the RNA structure descriptors we provide. Some statistical descriptions of the resulting dataset are also provided. Availability and implementation The dataset is updated every month and available online (in flat-text file format) on the EvryRNA software platform (https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet). An efficient parallel pipeline to build the dataset is also provided for easy reproduction or modification. Supplementary information Supplementary data are available at Bioinformatics online.


RNA ◽  
2010 ◽  
Vol 16 (6) ◽  
pp. 1108-1117 ◽  
Author(s):  
S. Quarrier ◽  
J. S. Martin ◽  
L. Davis-Neulander ◽  
A. Beauregard ◽  
A. Laederach

2020 ◽  
Vol 36 (20) ◽  
pp. 5021-5026 ◽  
Author(s):  
Gang Xu ◽  
Qinghua Wang ◽  
Jianpeng Ma

Abstract Motivation Predictions of protein backbone torsion angles (ϕ and ψ) and secondary structure from sequence are crucial subproblems in protein structure prediction. With the development of deep learning approaches, their accuracies have been significantly improved. To capture long-range interactions, most studies integrate bidirectional recurrent neural networks into their models. In this study, we introduce and modify a recently proposed architecture named Transformer to capture the interactions between any two residues, in theory at arbitrary distance. Moreover, we take advantage of multitask learning to improve the generalization of the neural network by introducing related tasks into the training process. Similar to many previous studies, OPUS-TASS uses an ensemble of models and achieves better results. Results OPUS-TASS uses the same training and validation sets as SPOT-1D. We compare the performance of OPUS-TASS and SPOT-1D on TEST2016 (1213 proteins) and TEST2018 (250 proteins) proposed in the SPOT-1D paper, CASP12 (55 proteins), CASP13 (32 proteins) and CASP-FM (56 proteins) proposed in the SAINT paper, and a recently released PDB structure collection from CAMEO (93 proteins) named CAMEO93. On these six test sets, OPUS-TASS achieves consistent improvements in both backbone torsion angle prediction and secondary structure prediction. On CAMEO93, SPOT-1D achieves mean absolute errors of 16.89 and 23.02 for ϕ and ψ predictions, respectively, and the accuracies for 3- and 8-state secondary structure predictions are 87.72 and 77.15%, respectively. In comparison, OPUS-TASS achieves 16.56 and 22.56 for ϕ and ψ predictions, and 89.06 and 78.87% for 3- and 8-state secondary structure predictions, respectively. In particular, after using our torsion angle refinement method OPUS-Refine as the post-processing procedure for OPUS-TASS, the mean absolute errors for the final ϕ and ψ predictions are further decreased to 16.28 and 21.98, respectively.
Availability and implementation The training and the inference codes of OPUS-TASS and its data are available at https://github.com/thuxugang/opus_tass. Supplementary information Supplementary data are available at Bioinformatics online.
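The mean absolute errors quoted above are errors on angles, which are periodic. A minimal sketch of such a metric is shown below, assuming angles in degrees and assuming the error wraps around at 360 degrees (so that 170 and -170 differ by 20, not 340). The abstract does not state the exact convention used by OPUS-TASS, and the function name `angular_mae` is hypothetical.

```python
def angular_mae(predicted, actual):
    """Mean absolute error between two lists of angles given in degrees.

    Assumption: differences wrap around the circle, so each per-residue
    error lies in the range [0, 180].
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        diff = abs(p - a) % 360.0
        total += min(diff, 360.0 - diff)  # take the shorter way around
    return total / len(predicted)


# Hypothetical torsion angles for two residues.
print(angular_mae([170.0, -60.0], [-170.0, -50.0]))  # (20 + 10) / 2
```

Without the wraparound step, predictions near the ±180 boundary would be penalized as large errors even when they are geometrically close to the true angle.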

