RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning

Abstract The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction, including noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
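
To make the term "non-nested (pseudoknot) base pairs" concrete, here is a minimal sketch that flags crossing base pairs in a list of index pairs. It is not part of SPOT-RNA; the function name `crossing_pairs` and the toy pair lists are assumptions chosen for illustration. Two pairs (i, j) and (k, l) are treated as non-nested when i < k < j < l.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every couple of base pairs that cross each other (are non-nested)."""
    # Store each pair as (smaller index, larger index) and sort by opening position.
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k. The pairs cross when k opens inside (i, j)
        # but closes outside it.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


# Hypothetical toy structures (0-based positions), not taken from any dataset.
nested = [(0, 9), (1, 8), (2, 7)]
pseudoknot = [(0, 6), (1, 5), (3, 10), (4, 9)]

print(crossing_pairs(nested))           # no crossings in a plain hairpin
print(len(crossing_pairs(pseudoknot)))  # crossings between the two stems
```

A fully nested structure yields an empty list, while an H-type arrangement like the second example produces one entry per crossing couple.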

Predicting pseudoknotted structures across two RNA sequences

Bioinformatics ◽

10.1093/bioinformatics/bts575 ◽

2012 ◽

Vol 28 (23) ◽

pp. 3058-3065 ◽

Cited By ~ 4

Author(s):

Jana Sperschneider ◽

Amitava Datta ◽

Michael J. Wise

Keyword(s):

Secondary Structure ◽

RNA Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Prediction Method ◽

Supplementary Information ◽

RNA Structures ◽

RNA Sequences ◽

Test Set ◽

Comparative Structure

Abstract Motivation: Laboratory RNA structure determination is demanding and costly; thus, computational structure prediction is an important task. Single-sequence methods for RNA secondary structure prediction are limited by the accuracy of the underlying folding model. If a structure is supported by a family of evolutionarily related sequences, one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements that have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots because of the computational complexity. Results: A comparative pseudoknot prediction method called DotKnot-PW is introduced, based on structural comparison of secondary structure elements and H-type pseudoknot candidates. DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support. Availability: DotKnot-PW and the RNA structure test set are available at the web site http://dotknot.csse.uwa.edu.au/pw. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Evaluation of the information content of RNA structure mapping data for secondary structure prediction

RNA ◽

10.1261/rna.1988510 ◽

2010 ◽

Vol 16 (6) ◽

pp. 1108-1117 ◽

Cited By ~ 39

Author(s):

S. Quarrier ◽

J. S. Martin ◽

L. Davis-Neulander ◽

A. Beauregard ◽

A. Laederach

Keyword(s):

Secondary Structure ◽

Information Content ◽

RNA Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Structure Mapping ◽

Mapping Data ◽

RNA Structure Mapping

Quantitative high-throughput tests of ubiquitous RNA secondary structure prediction algorithms via RNA/protein binding

10.1101/571588 ◽

2019 ◽

Cited By ~ 2

Author(s):

Winston R. Becker ◽

Inga Jarmoskaite ◽

Kalli Kappel ◽

Pavanapuresan P. Vaidyanathan ◽

Sarah K. Denny ◽

...

Keyword(s):

Secondary Structure ◽

Protein Binding ◽

RNA Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Nearest Neighbor ◽

Secondary Structure Prediction ◽

Structural Features ◽

Vast Number ◽

Prediction Algorithms

Abstract Nearest-neighbor (NN) rules provide a simple and powerful quantitative framework for RNA structure prediction that is strongly supported for canonical Watson-Crick duplexes from a plethora of thermodynamic measurements. Predictions of RNA secondary structure based on nearest-neighbor (NN) rules are routinely used to understand biological function and to engineer and control new functions in biotechnology. However, NN applications to RNA structural features such as internal and terminal loops rely on approximations and assumptions, with sparse experimental coverage of the vast number of possible sequence and structural features. To test to what extent NN rules accurately predict thermodynamic stabilities across RNAs with non-WC features, we tested their predictions using a quantitative high-throughput assay platform, RNA-MaP. Using a thermodynamic assay with coupled protein binding, we carried out equilibrium measurements for over 1000 RNAs with a range of predicted secondary structure stabilities. Our results revealed substantial scatter and systematic deviations between NN predictions and observed stabilities. Solution salt effects and incorrect or omitted loop parameters contribute to these observed deviations. Our results demonstrate the need to independently and quantitatively test NN computational algorithms to identify their capabilities and limitations. RNA-MaP and related approaches can be used to test computational predictions and can be adapted to obtain experimental data to improve RNA secondary structure and other prediction algorithms.

Significance statement RNA secondary structure prediction algorithms are routinely used to understand, predict and design functional RNA structures in biology and biotechnology. Given the vast number of RNA sequence and structural features, these predictions rely on a series of approximations, and independent tests are needed to quantitatively evaluate the accuracy of predicted RNA structural stabilities. Here we measure the stabilities of over 1000 RNA constructs by using a coupled protein binding assay. Our results reveal substantial deviations from the RNA stabilities predicted by popular algorithms, and identify factors contributing to the observed deviations. We demonstrate the importance of quantitative, experimental tests of computational RNA structure predictions and present an approach that can be used to routinely test and improve the prediction accuracy.

Prediction of RNA Secondary Structure Using Quantum-inspired Genetic Algorithms

Current Bioinformatics ◽

10.2174/1574893614666190916154103 ◽

2020 ◽

Vol 15 (2) ◽

pp. 135-143

Author(s):

Sha Shi ◽

Xin-Li Zhang ◽

Le Yang ◽

Wei Du ◽

Xian-Li Zhao ◽

...

Keyword(s):

Quantum Computing ◽

Secondary Structure ◽

RNA Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Prediction Accuracy ◽

Secondary Structure Prediction ◽

Minimum Free Energy ◽

RNA Sequences ◽

New Strategy

Background: The prediction of RNA secondary structure using optimization algorithms is key to understanding the real structure of an RNA. Evolutionary algorithms (EAs) are popular strategies for RNA secondary structure prediction. However, compared to most state-of-the-art software based on dynamic programming algorithms (DPAs), the performance of EAs is far from satisfactory. Objective: Therefore, a more powerful strategy is required to improve the performance of EAs when applied to the prediction of RNA secondary structures. Methods: The idea of quantum computing is introduced here, yielding a new strategy to find all possible legal paired bases under the constraint of minimum free energy. The state of a stem pool of size N is encoded as a population of the quantum-inspired genetic algorithm (QGA), which is represented by N quantum bits rather than classical bits. The updating of populations is accomplished by so-called quantum crossover, quantum mutation, and quantum rotation operations. Results: The numerical results show that the performance of traditional EAs is significantly improved by using QGA with regard not only to prediction accuracy and sensitivity but also to complexity. Moreover, for RNA sequences of short to medium length, QGA even improves on state-of-the-art software based on DPAs in terms of both prediction accuracy and sensitivity. Conclusion: This work sheds an interesting light on the applications of quantum computing to RNA structure prediction.
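
As a rough illustration of the encoding described in the Methods, the sketch below represents each of the N stems in a pool by one quantum bit, stored as an angle, and applies a generic rotation update of the kind used in quantum-inspired genetic algorithms. This is an assumption-laden toy, not the operators from the paper: the function names `observe` and `rotate`, the step size `delta`, and the angle parametrization are all hypothetical, and the crossover and mutation operations are omitted.

```python
import math
import random


def observe(angles, rng):
    """Collapse each qubit to a classical bit; bit k is 1 with probability sin^2(angle_k).

    A bit of 1 is read here as "stem k is included in the candidate structure".
    """
    return [1 if rng.random() < math.sin(a) ** 2 else 0 for a in angles]


def rotate(angles, observed, best, delta=0.05 * math.pi):
    """Rotate each qubit toward the corresponding bit of the best solution so far."""
    updated = []
    for angle, bit, best_bit in zip(angles, observed, best):
        if bit != best_bit:
            # Raise the probability of observing 1 when the best solution has 1,
            # and lower it when the best solution has 0.
            angle += delta if best_bit == 1 else -delta
        # Keep the angle in [0, pi/2] so sin^2 stays a monotone probability.
        updated.append(min(max(angle, 0.0), math.pi / 2))
    return updated


# Three stems, each starting in an equal superposition (probability 0.5).
angles = [math.pi / 4] * 3
sample = observe(angles, random.Random(0))
angles = rotate(angles, observed=[0, 1, 1], best=[1, 0, 1])
print(sample, angles)
```

In a full algorithm the `best` bit string would come from scoring observed stem selections by free energy; that scoring step is left out here.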

Finding recurrent RNA structural networks with fast maximal common subgraphs of edge-colored graphs

10.1101/2020.02.02.930453 ◽

2020 ◽

Author(s):

Antoine Soulé ◽

Vladimir Reinharz ◽

Roman Sarrazin-Gendron ◽

Alain Denise ◽

Jérôme Waldispühl

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Large Scale ◽

Tertiary Structure ◽

Future Research ◽

RNA Structures ◽

Base Pairs ◽

Colored Graphs ◽

Canonical Base ◽

Main Challenge

Abstract Motivations: RNA tertiary structure is crucial to its many non-coding molecular functions. RNA architecture is shaped by its secondary structure composed of stems, stacked canonical base pairs, enclosing loops. While stems are captured by free-energy models, loops composed of non-canonical base pairs are not. Nor are distant interactions linking together those secondary structure elements (SSEs). Databases of conserved 3D geometries (a.k.a. modules) not captured by energetic models are leveraged for structure prediction and design, but the computational complexity has limited their study to local elements, loops, and recently to those covering pairs of SSEs. Systematically capturing recurrent patterns on a large scale is a main challenge in the study of RNA structures. Results: In this paper, we present an efficient algorithm to compute maximal isomorphisms in edge-colored graphs. This framework is well suited to RNA structures and allows us to generalize previous approaches. In particular, we apply our techniques to find, for the first time, modules spanning more than 2 SSEs, while improving speed a hundredfold. We extract all recurrent base pair networks among all non-redundant RNA tertiary structures and identify a module connecting 36 different SSEs common to the 23S ribosome of E. coli and Thermus thermophilus. We organize this information as a hierarchy of modules sharing similarities in their structure, which can serve as a basis for future research on the emergence of structural patterns. Availability: http://csb.cs.mcgill.ca/carnaval2

Statistical Mechanical Prediction of Ligand Perturbation to RNA Secondary Structure and Application to the SAM-I Riboswitch

10.1101/461749 ◽

2018 ◽

Author(s):

Osama Alaidi ◽

Fareed Aboul-ela

Keyword(s):

Secondary Structure ◽

Ligand Binding ◽

RNA Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

RNA Folding ◽

Tertiary Structure ◽

Secondary Structure Prediction ◽

Key Factor ◽

Statistical Mechanical

Abstract The realization that non-protein-coding RNA (ncRNA) is implicated in an increasing number of cellular processes, many related to human disease, makes it imperative to understand and predict RNA folding. RNA secondary structure prediction is more tractable than tertiary structure or protein structure. Yet insights into RNA structure-function relationships are complicated by coupling between RNA folding and ligand binding. Here, we introduce a simple statistical mechanical formalism to calculate perturbations to equilibrium secondary structure conformational distributions for RNA, in the presence of bound cognate ligands. For the first time, this formalism incorporates a key factor in coupling ligand binding to RNA conformation: the differential affinity of the ligand for a range of RNA-folding intermediates. We apply the approach to the SAM-I riboswitch, for which binding data is available for analogs of intermediate secondary structure conformers. Calculations of equilibrium secondary structure distributions during the transcriptional “decision window” predict subtle shifts due to the ligand, rather than an on/off switch. The results suggest how ligand perturbation can release a kinetic block to the formation of a terminator hairpin in the full-length riboswitch. Such predictions identify aspects of folding that are most affected by ligand binding, and can readily be compared with experiment.
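
One way to picture how differential ligand affinity can shift an equilibrium conformational distribution is the textbook binding-polynomial reweighting sketched below. It is an assumption made for illustration, not necessarily the formalism of the paper: each conformer i gets the statistical weight exp(-G_i/RT) · (1 + [L]/Kd_i), and the function name `populations`, the free energies, and the dissociation constants are all hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def populations(free_energies, temperature=310.15, ligand_conc=0.0, kds=None):
    """Equilibrium populations of conformers, optionally reweighted by ligand binding.

    free_energies: folding free energies in kcal/mol, one per conformer.
    ligand_conc and kds: molar ligand concentration and per-conformer
    dissociation constants; a conformer with kd = inf does not bind.
    """
    rt = R_KCAL * temperature
    if kds is None:
        kds = [math.inf] * len(free_energies)
    weights = [
        math.exp(-g / rt) * (1.0 + ligand_conc / kd)
        for g, kd in zip(free_energies, kds)
    ]
    partition = sum(weights)
    return [w / partition for w in weights]


# Two hypothetical conformers: the second is less stable but binds ligand tightly.
energies = [-10.0, -9.5]        # kcal/mol, illustrative values only
kds = [1e-3, 1e-6]              # molar, illustrative values only

apo = populations(energies)
bound = populations(energies, ligand_conc=1e-5, kds=kds)
print(apo, bound)
```

With these toy numbers the ligand shifts population toward the tighter-binding conformer without eliminating the other one, which is the kind of graded shift, rather than an on/off switch, that the abstract describes.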

RNAProbe: a web server for normalization and analysis of RNA structure probing data

Nucleic Acids Research ◽

10.1093/nar/gkaa396 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W292-W299 ◽

Cited By ~ 2

Author(s):

Tomasz K Wirecki ◽

Katarzyna Merdas ◽

Agata Bernat ◽

Michał J Boniecki ◽

Janusz M Bujnicki ◽

...

Keyword(s):

Secondary Structure ◽

RNA Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Structural Characteristics ◽

Web Server ◽

RNA Molecules ◽

Chemical Probing ◽

Structure Probing ◽

Low Pass

Abstract RNA molecules play key roles in all living cells. Knowledge of the structural characteristics of RNA molecules allows for a better understanding of the mechanisms of their action. RNA chemical probing allows us to study the susceptibility of nucleotides to chemical modification, and the information obtained can be used to guide secondary structure prediction. These experimental results can be analyzed using various computational tools, which, however, require additional, tedious steps (e.g., further normalization of the reactivities and visualization of the results), for which there are no fully automated methods. Here, we introduce RNAProbe, a web server that facilitates normalization, analysis, and visualization of the low-pass SHAPE, DMS and CMCT probing results with the modification sites detected by capillary electrophoresis. RNAProbe automatically analyzes chemical probing output data and turns tedious manual work into a one-minute assignment. RNAProbe performs normalization based on a well-established protocol, utilizes recognized secondary structure prediction methods, and generates high-quality images with structure representations and reactivity heatmaps. It summarizes the results in the form of a spreadsheet, which can be used for comparative analyses between experiments. Results of predictions with normalized reactivities are also collected in text files, providing interoperability with bioinformatics workflows. RNAProbe is available at https://rnaprobe.genesilico.pl.

An efficient simulated annealing algorithm for the RNA secondary structure prediction with Pseudoknots

BMC Genomics ◽

10.1186/s12864-019-6300-2 ◽

2019 ◽

Vol 20 (S13) ◽

Cited By ~ 1

Author(s):

Zhang Kai ◽

Wang Yuting ◽

Lv Yulin ◽

Liu Jun ◽

He Juanjuan

Keyword(s):

Free Energy ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Simulated Annealing Algorithm ◽

Secondary Structure Prediction ◽

Stem Length ◽

Base Pairs ◽

RNA Secondary Structure Prediction ◽

Prediction Algorithms

Abstract Background: RNA pseudoknot structures play an important role in biological processes. However, existing RNA secondary structure prediction algorithms cannot predict pseudoknot structures efficiently. Although random matching can increase the number of base pairs, these non-consecutive base pairs do not contribute to reducing the free energy. Result: To improve the efficiency of the search procedure, our algorithm takes consecutive base pairs as the basic components. First, the algorithm calculates and archives all runs of consecutive base pairs in a triplet data structure if the number of consecutive base pairs is greater than a given minimum stem length. Second, the annealing schedule is adapted to select the optimal solution, which has minimum free energy. Finally, the proposed algorithm is evaluated on real instances from PseudoBase. Conclusion: The experimental results demonstrate competitive and often better performance compared with some chosen state-of-the-art RNA structure prediction algorithms.
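
The first step described above, archiving runs of consecutive base pairs as triplets, could look roughly like the following sketch. It is a guess at the data layout, not the authors' code: the triplet is assumed to be (i, j, length) with the t-th pair at (i + t, j - t), and the names `find_stems`, `min_stem`, and `min_loop` (a minimum number of unpaired bases inside a hairpin) are hypothetical. The annealing step itself is not shown.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_stem=2, min_loop=3):
    """Enumerate maximal runs of consecutive base pairs as (i, j, length) triplets.

    Pair number t of a stem is (i + t, j - t). Only runs with more than
    min_stem pairs are kept, following the wording of the abstract.
    """
    n = len(seq)

    def can_pair(a, b):
        return (seq[a], seq[b]) in CANONICAL

    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(i, j):
                continue
            # Skip pairs that merely continue a stem started further out,
            # so each maximal run is reported once.
            if i > 0 and j + 1 < n and can_pair(i - 1, j + 1):
                continue
            length = 0
            # Extend inward while the pair is valid and the enclosed loop
            # keeps at least min_loop unpaired bases.
            while (j - length) - (i + length) > min_loop and can_pair(i + length, j - length):
                length += 1
            if length > min_stem:
                stems.append((i, j, length))
    return stems


print(find_stems("GGGAAAACCC"))
```

A simulated annealing search would then pick a compatible subset of these triplets, including crossing ones for pseudoknots, rather than pairing individual bases at random.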

A Novel Framework Based on ACO and PSO for RNA Secondary Structure Prediction

Mathematical Problems in Engineering ◽

10.1155/2013/796304 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Gang Wang ◽

Wen-yi Zhang ◽

Qiao Ning ◽

Hui-ling Chen

Keyword(s):

Secondary Structure ◽

RNA Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

Genetic Diseases ◽

New Drugs ◽

Accuracy Rate ◽

RNA Secondary Structure Prediction ◽

Comparison Results

Prediction of RNA structure is a useful process for creating new drugs and understanding genetic diseases. In this paper, we proposed a particle swarm optimization (PSO) and ant colony optimization (ACO) based framework (PAF) for RNA secondary structure prediction. PAF consists of crucial stem searching (CSS) and global sequence building (GSB). In CSS, a modified ACO (MACO) is used to search for the crucial stems, and a set of stems is then generated. In GSB, we used a modified PSO (MPSO) to construct all the stems in one sequence. We evaluated the performance of PAF on ten sequences ranging in length from 122 to 1494. We also compared the performance of PAF with the results obtained from six existing well-known methods: SARNA-Predict, RnaPredict, ACRNA, PSOfold, IPSO, and mfold. The comparison results show that PAF could not only predict structures with a higher accuracy rate but also find crucial stems.

Machine learning a model for RNA structure prediction

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa090 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

Nicola Calonaci ◽

Alisha Jones ◽

Francesca Cuturello ◽

Michael Sattler ◽

Giovanni Bussi

Keyword(s):

Free Energy ◽

RNA Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Structural Information ◽

Minimum Free Energy ◽

Experimental Information ◽

RNA Structures ◽

Chemical Probing ◽

Validation Set

Abstract RNA function crucially depends on its structure. Thermodynamic models currently used for secondary structure prediction rely on computing the partition function of folding ensembles, and can thus estimate minimum free-energy structures and ensemble populations. These models sometimes fail to identify native structures unless complemented by auxiliary experimental data. Here, we build a set of models that combine thermodynamic parameters, chemical probing data (DMS and SHAPE) and co-evolutionary data (direct coupling analysis) through a network that outputs perturbations to the ensemble free energy. Perturbations are trained to increase the ensemble populations of a representative set of known native RNA structures. In the chemical probing nodes of the network, a convolutional window combines neighboring reactivities, highlighting their structural information content and the contribution of local conformational ensembles. Regularization is used to limit overfitting and improve transferability. The most transferable model is selected through a cross-validation strategy that estimates the performance of models on systems on which they are not trained. With the selected model we obtain increased ensemble populations for native structures and more accurate predictions in an independent validation set. The flexibility of the approach allows the model to be easily retrained and adapted to incorporate arbitrary experimental information.
