RNANet: an automatically built dual-source dataset integrating homologous sequences and RNA structures

Author(s):  
Louis Becquey ◽  
Eric Angel ◽  
Fariza Tahi

Abstract Motivation Applied research in machine learning progresses faster when a clean dataset is available and ready to use. Several datasets have been proposed and released over the years for specific tasks such as image classification, speech recognition and, more recently, protein structure prediction. However, for the fundamental problem of RNA structure prediction, information is spread between several databases depending on the level we are interested in: sequence, secondary structure, 3D structure or interactions with other macromolecules. In order to speed up advances in machine-learning-based approaches for RNA secondary and/or 3D structure prediction, a dataset integrating all this information is required, to avoid spending time on data gathering and cleaning. Results Here, we propose the first attempt at a standardized and automatically generated dataset dedicated to RNA, combining RNA sequences, homology information (in the form of position-specific scoring matrices) and information derived by annotation of available 3D structures (including secondary structure, canonical and non-canonical interactions and backbone torsion angles). The data are retrieved from the public databases PDB, Rfam and SILVA. The paper describes the procedure to build such a dataset and the RNA structure descriptors we provide. Some statistical descriptions of the resulting dataset are also provided. Availability and implementation The dataset is updated every month and available online (in flat-text file format) on the EvryRNA software platform (https://evryrna.ibisc.univ-evry.fr/evryrna/rnanet). An efficient parallel pipeline to build the dataset is also provided for easy reproduction or modification. Supplementary information Supplementary data are available at Bioinformatics online.
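
The abstract above mentions homology information stored as position-specific scoring matrices. As a rough illustration of what such a matrix holds, here is a minimal sketch that derives log-odds scores from a toy alignment. It is not the RNANet pipeline: the function name `pssm`, the uniform background, the pseudocount and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def pssm(alignment, pseudocount=1.0):
    """Return one {nucleotide: log-odds score} dict per alignment column.

    Assumptions (not taken from the paper): uniform background frequencies,
    a fixed pseudocount, and gap characters simply ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    matrix = []
    for col in range(length):
        # Count only recognised nucleotides; gaps such as '-' are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        matrix.append({
            nt: math.log2(((counts[nt] + pseudocount) / total) / background)
            for nt in ALPHABET
        })
    return matrix


# Hypothetical four-sequence alignment, used only to exercise the function.
toy_alignment = ["GGAUC", "GGAUC", "GCAUC", "G-AUU"]
scores = pssm(toy_alignment)
```

A positive score marks a nucleotide that is over-represented in a column relative to the background, a negative score one that is under-represented.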

2012 ◽  
Vol 28 (23) ◽  
pp. 3058-3065 ◽  
Author(s):  
Jana Sperschneider ◽  
Amitava Datta ◽  
Michael J. Wise

Abstract Motivation Laboratory RNA structure determination is demanding and costly, and thus computational structure prediction is an important task. Single-sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model; if a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements that have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity. Results A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Marcin Magnus ◽  
Kalli Kappel ◽  
Rhiju Das ◽  
Janusz M. Bujnicki

Abstract Background The understanding of the importance of RNA has dramatically changed over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule’s sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction. EvoClustRNA is a multi-step modeling process, in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked as the first of all submissions for the L-glutamine riboswitch and as the second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. Conclusion This work, for the first time to our knowledge, demonstrates the importance of the selection of the target sequence from an alignment of an RNA family for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure “foldability” or “predictability” of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form. 
By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work is bringing us closer to the near-native computational RNA 3D models.


2020 ◽  
Vol 15 (2) ◽  
pp. 135-143
Author(s):  
Sha Shi ◽  
Xin-Li Zhang ◽  
Le Yang ◽  
Wei Du ◽  
Xian-Li Zhao ◽  
...  

Background: The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on DPAs, the performance of EAs remains far from satisfactory. Objective: Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures. Methods: The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool of size N is encoded as a population of QGA, which is represented by N quantum bits rather than classical bits. The populations are updated by so-called quantum crossover operations, quantum mutation operations and quantum rotation operations. Results: The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard to not only prediction accuracy and sensitivity but also complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on the state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity. Conclusion: This work sheds an interesting light on the applications of quantum computing to RNA structure prediction.
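
To make the qubit encoding and the rotation update described above more concrete, the following is a minimal sketch of the generic quantum-inspired genetic algorithm idea. It is not the authors' QGA. Each stem in the pool is assumed to be one qubit, stored as an angle so that the probability of observing "stem selected" is sin²(θ). The function names, the step size `delta` and the restriction of angles to [0, π/2] are assumptions made for illustration, and crossover and mutation are omitted.

```python
import math
import random


def initial_angles(n_stems):
    # Equal superposition: every stem is selected with probability 0.5.
    return [math.pi / 4] * n_stems


def probability_of_one(theta):
    # Amplitudes are (cos(theta), sin(theta)); P(bit = 1) = sin(theta)^2.
    return math.sin(theta) ** 2


def observe(angles, rng):
    # Collapse each qubit to a classical bit (1 = stem included).
    return [1 if rng.random() < probability_of_one(t) else 0 for t in angles]


def rotate_towards(angles, observed, best, delta=0.05 * math.pi):
    """Simplified rotation: nudge each qubit towards the best-known bit.

    Qubits whose observed bit already matches the best solution are left
    unchanged; angles are clamped to [0, pi/2] (an assumption of this sketch).
    """
    updated = []
    for theta, bit, target in zip(angles, observed, best):
        if bit != target:
            theta += delta if target == 1 else -delta
        updated.append(min(max(theta, 0.0), math.pi / 2))
    return updated


# Hypothetical pool of three stems and a hypothetical best solution.
angles = initial_angles(3)
sample = observe(angles, random.Random(0))
rotated = rotate_towards(angles, observed=[0, 1, 1], best=[1, 1, 0])
```

After the update, the first stem becomes more likely to be selected, the second is unchanged and the third becomes less likely, which is the bias towards good solutions that a rotation operation is meant to provide.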


2017 ◽  
Vol 1 (3) ◽  
pp. 275-285 ◽  
Author(s):  
Bernhard C. Thiel ◽  
Christoph Flamm ◽  
Ivo L. Hofacker

We summarize different levels of RNA structure prediction, from classical 2D structure to extended secondary structure and motif-based research toward 3D structure prediction of RNA. We outline the importance of classical secondary structure during all those levels of structure prediction.


Author(s):  
Grace Meng ◽  
Marva Tariq ◽  
Swati Jain ◽  
Shereef Elmetwaly ◽  
Tamar Schlick

Abstract Summary We launch a webserver for RNA structure prediction and design corresponding to tools developed using our RNA-As-Graphs (RAG) approach. RAG uses coarse-grained tree graphs to represent RNA secondary structure, allowing the application of graph theory to analyze and advance RNA structure discovery. Our webserver consists of three modules: (a) RAG Sampler: samples tree graph topologies from an RNA secondary structure to predict corresponding tertiary topologies, (b) RAG Builder: builds three-dimensional atomic models from candidate graphs generated by RAG Sampler, and (c) RAG Designer: designs sequences that fold onto novel RNA motifs (described by tree graph topologies). Results analyses are performed for further assessment/selection. The Results page provides links to download results and indicates possible errors encountered. RAG-Web offers a user-friendly interface to utilize our RAG software suite to predict and design RNA structures and sequences. Availability and implementation The webserver is freely available online at: http://www.biomath.nyu.edu/ragtop/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Marcin Magnus ◽  
Kalli Kappel ◽  
Rhiju Das ◽  
Janusz Bujnicki

Abstract Background The understanding of the importance of RNA has dramatically changed over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule's sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. Results Using the observation that RNA sequences from the same RNA family fold into a conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction methods. EvoClustRNA is a multi-step modeling process, in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked as the first of all submissions for the L-glutamine riboswitch and as the second for the ZMP riboswitch. Conclusion Through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence.


2019 ◽  
Vol 39 (2) ◽  
Author(s):  
Almudena Ponce-Salvatierra ◽  
Astha ◽  
Katarzyna Merdas ◽  
Chandran Nithin ◽  
Pritha Ghosh ◽  
...  

Abstract RNA molecules are master regulators of cells. They are involved in a variety of molecular processes: they transmit genetic information, sense cellular signals and communicate responses, and even catalyze chemical reactions. As in the case of proteins, RNA function is dictated by its structure and by its ability to adopt different conformations, which in turn is encoded in the sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore the majority of known RNAs remain structurally uncharacterized. To address this problem, predictive computational methods were developed based on the accumulated knowledge of RNA structures determined so far, the physical basis of the RNA folding, and taking into account evolutionary considerations, such as conservation of functionally important motifs. However, all theoretical methods suffer from various limitations, and they are generally unable to accurately predict structures for RNA sequences longer than 100-nt residues unless aided by additional experimental data. In this article, we review experimental methods that can generate data usable by computational methods, as well as computational approaches for RNA structure prediction that can utilize data from experimental analyses. We outline methods and data types that can be potentially useful for RNA 3D structure modeling but are not commonly used by the existing software, suggesting directions for future development.


2019 ◽  
Vol 35 (20) ◽  
pp. 4004-4010 ◽  
Author(s):  
Zafer Aydin ◽  
Nuh Azginoglu ◽  
Halil Ibrahim Bilgin ◽  
Mete Celik

Abstract Motivation Predicting secondary structure and solvent accessibility of proteins are among the essential steps that precede more elaborate 3D structure prediction tasks. Incorporating class label information contained in templates with known structures has the potential to improve the accuracy of prediction methods. Building a structural profile matrix is one such technique that provides a distribution for class labels at each amino acid position of the target. Results In this paper, a new structural profiling technique is proposed that is based on deriving PFAM families and is combined with an existing approach. Cross-validation experiments on two benchmark datasets and at various similarity intervals demonstrate that the proposed profiling strategy performs significantly better than Homolpro, a state-of-the-art method for incorporating template information, as assessed by statistical hypothesis tests. Availability and implementation The DSPRED method can be accessed by visiting the PSP server at http://psp.agu.edu.tr. Source code and binaries are freely available at https://github.com/yusufzaferaydin/dspred. Supplementary information Supplementary data are available at Bioinformatics online.


2012 ◽  
Vol 20 (04) ◽  
pp. 455-469
Author(s):  
RAJASEKHAR KAKUMANI ◽  
M. OMAIR AHMAD ◽  
VIJAY KUMAR DEVABHAKTUNI

Prediction of ribonucleic acid (RNA) secondary structure is an important task in bioinformatics. The RNA structure is known to influence its biological functionality. RNA secondary structure contains many substructures such as stems, loops and pseudoknots. The pseudoknot substructure occurs in several classes of RNAs and plays a vital role in many biological processes. Prediction of pseudoknots in RNA is challenging and still an open research problem. Several computational methods based on dynamic programming, genetic algorithms, statistical models, etc., have been proposed with varying success. In this paper, we employ a matched filtering approach to determine the RNA secondary structure containing pseudoknots. The central idea is to use a matched filter to identify the longest possible stem patterns in the base-pairing matrix of an RNA. The stem patterns obtained are then used to determine the locations of the other substructures, such as loops and pseudoknots, present in the RNA. Comparison of the prediction results, for RNA sequences derived from PseudoBase, illustrates the effectiveness and accuracy of our proposed approach as compared to some of the existing popular RNA secondary structure prediction methods.
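
The central idea above, finding the longest stem pattern in a base-pairing matrix, can be sketched as follows. This is a simplified stand-in that scans anti-diagonals of the matrix for the longest run of stacked complementary pairs. It does not reproduce the paper's matched filter. The allowed pairs (Watson-Crick plus G-U wobble), the minimum loop length of 3 and the function names are assumptions chosen for this example.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_matrix(seq):
    # m[i][j] is 1 when nucleotides i and j could form a base pair.
    n = len(seq)
    return [[1 if (seq[i], seq[j]) in PAIRS else 0 for j in range(n)]
            for i in range(n)]


def longest_stem(seq, min_loop=3):
    """Return (length, i, j) for the longest stem; (i, j) is its outermost pair.

    A stem is a run of stacked pairs (i, j), (i+1, j-1), ... along an
    anti-diagonal of the pairing matrix. Returns (0, -1, -1) if none exists.
    """
    n = len(seq)
    m = pairing_matrix(seq)
    # run[i][j] = number of stacked pairs starting at (i, j) and going inwards.
    run = [[0] * n for _ in range(n)]
    best = (0, -1, -1)
    for i in range(n - 1, -1, -1):
        # Leave at least min_loop unpaired nucleotides between i and j.
        for j in range(i + min_loop + 1, n):
            if m[i][j]:
                run[i][j] = 1 + run[i + 1][j - 1]
                if run[i][j] > best[0]:
                    best = (run[i][j], i, j)
    return best


# Hypothetical hairpin: four G-C pairs closing an AAAA loop.
stem = longest_stem("GGGGAAAACCCC")
```

For the toy hairpin, the longest stem has four stacked pairs with outermost pair (0, 11). A fuller method would go on to mask this stem and search the remaining positions for further stems, loops and pseudoknots.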


2018 ◽  
Author(s):  
Riccardo Delli Ponti ◽  
Alexandros Armaos ◽  
Stefanie Marti ◽  
Gian Gaetano Tartaglia

Abstract To compare the secondary structures of RNA molecules we developed the CROSSalign method. CROSSalign is based on the combination of the Computational Recognition Of Secondary Structure (CROSS) algorithm to predict the RNA secondary structure at single-nucleotide resolution using sequence information, and the Dynamic Time Warping (DTW) method to align profiles of different lengths. We applied CROSSalign to investigate the structural conservation of long non-coding RNAs such as XIST and HOTAIR as well as ssRNA viruses including HIV. In a pool of sequences with the same secondary structure CROSSalign accurately recognizes repeat A of XIST and domain D2 of HOTAIR and outperforms other methods based on covariance modelling. CROSSalign can be applied to perform pair-wise comparisons and is able to find homologues between thousands of matches identifying the exact regions of similarity between profiles of different lengths. The algorithm is freely available at the webpage http://service.tartaglialab.com//new_submission/CROSSalign.
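
Dynamic Time Warping, which the abstract above uses to align profiles of different lengths, can be illustrated with the textbook dynamic-programming recurrence. This is a minimal sketch of classic DTW with an absolute-difference cost. It is not the CROSSalign implementation, and the function name and toy profiles are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric profiles of any lengths."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = step + min(cost[i - 1][j],
                                    cost[i][j - 1],
                                    cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical per-nucleotide profiles of different lengths.
profile_a = [0.0, 1.0, 1.0, 0.0]
profile_b = [0.0, 1.0, 0.0]
distance = dtw_distance(profile_a, profile_b)
```

Because one position of the shorter profile may be matched to several positions of the longer one, the two toy profiles align at zero cost even though their lengths differ.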

