ASYMPTOTICS OF CANONICAL AND SATURATED RNA SECONDARY STRUCTURES

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up, by proving that the asymptotic number of canonical RNA secondary structures is $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ and that the expected number of base pairs in a canonical secondary structure is $0.31724 \cdot n$. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$, the asymptotic expected number of base pairs is $0.337361 \cdot n$, and the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is $0.340633 \cdot n$.
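
The Stein and Waterman count quoted above can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the usual convention that a hairpin loop contains at least one unpaired base, and the function names `count_secondary_structures` and `stein_waterman_estimate` are hypothetical. It counts all secondary structures on n bases with the standard recurrence (the last base is either unpaired or paired with some earlier base k) and compares the exact count with the asymptotic formula.

```python
def count_secondary_structures(n_max):
    """Return [S(0), ..., S(n_max)], where S(n) counts secondary structures
    on n bases, assuming every hairpin loop has at least 1 unpaired base."""
    s = [1] * (n_max + 1)  # S(0) = S(1) = S(2) = 1: only the empty structure
    for n in range(3, n_max + 1):
        total = s[n - 1]  # case 1: base n is unpaired
        for k in range(1, n - 1):  # case 2: base n pairs with base k, k = 1..n-2
            # bases 1..k-1 lie outside the pair, bases k+1..n-1 lie inside it
            total += s[k - 1] * s[n - k - 1]
        s[n] = total
    return s


def stein_waterman_estimate(n):
    """Asymptotic estimate 1.104366 * n^(-3/2) * 2.618034^n from the abstract."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    counts = count_secondary_structures(200)
    print(counts[:9])  # first values of the sequence
    for n in (25, 50, 100, 200):
        print(n, counts[n] / stein_waterman_estimate(n))
```

The printed ratio of exact count to estimate should drift toward 1 as n grows, which is what the asymptotic statement asserts. The same recurrence style, with extra side conditions, is the natural starting point for counting the canonical or saturated subclasses, although the paper's own derivations are not reproduced here.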

RNA secondary structure prediction using deep learning with thermodynamic integration

10.1101/2020.08.10.244442 ◽

2020 ◽

Author(s):

Kengo Sato ◽

Manato Akiyama ◽

Yasubumi Sakakibara

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Thermodynamic Integration ◽

Rna Secondary Structure Prediction ◽

Rna Secondary Structures ◽

Non Coding Rnas

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10 % improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.

Widespread selection for extremely high and low levels of secondary structure in coding sequences across all domains of life

Open Biology ◽

10.1098/rsob.190020 ◽

2019 ◽

Vol 9 (5) ◽

pp. 190020 ◽

Cited By ~ 3

Author(s):

Daniel Gebert ◽

Julia Jehn ◽

David Rosenkranz

Keyword(s):

Secondary Structure ◽

Gc Content ◽

Secondary Structures ◽

Coding Sequences ◽

Rna Secondary Structures ◽

Synonymous Mutations ◽

Codon Composition ◽

Domains Of Life ◽

Low Levels ◽

Selection For

Codon composition, GC content and local RNA secondary structures can have a profound effect on gene expression, and mutations affecting these parameters, even though they do not alter the protein sequence, are not neutral in terms of selection. Although evidence exists that, in some cases, selection favours more stable RNA secondary structures, we currently lack a concrete idea of how many genes are affected within a species, and whether this is a universal phenomenon in nature. We searched for signs of structural selection in a global manner, analysing a set of 1 million coding sequences from 73 species representing all domains of life, as well as viruses, by means of our newly developed software PACKEIS. We show that codon composition and amino acid identity are main determinants of RNA secondary structure. In addition, we show that the arrangement of synonymous codons within coding sequences is non-random, yielding extremely high, but also extremely low, RNA structuredness significantly more often than expected by chance. Taken together, we demonstrate that selection for high and low levels of secondary structure is a widespread phenomenon. Our results provide another line of evidence that synonymous mutations are less neutral than commonly thought, which is of importance for many evolutionary models.

RNAvista: a webserver to assess RNA secondary structures with non-canonical base pairs

Bioinformatics ◽

10.1093/bioinformatics/bty609 ◽

2018 ◽

Vol 35 (1) ◽

pp. 152-155 ◽

Cited By ~ 5

Author(s):

Maciej Antczak ◽

Marcin Zablocki ◽

Tomasz Zok ◽

Agnieszka Rybarczyk ◽

Jacek Blazewicz ◽

...

Keyword(s):

Secondary Structures ◽

Base Pairs ◽

Rna Secondary Structures ◽

Canonical Base

Secondary Structural Elements within the 3′ Untranslated Region of Mouse Hepatitis Virus Strain JHM Genomic RNA

Journal of Virology ◽

10.1128/jvi.75.24.12105-12113.2001 ◽

2001 ◽

Vol 75 (24) ◽

pp. 12105-12113 ◽

Cited By ~ 35

Author(s):

Qi Liu ◽

Reed F. Johnson ◽

Julian L. Leibowitz

Keyword(s):

Secondary Structure ◽

Protein Binding ◽

Hepatitis Virus ◽

Mouse Hepatitis Virus ◽

Rna Replication ◽

Secondary Structures ◽

Host Protein ◽

Wild Type ◽

Base Pairing ◽

Stem Loop

ABSTRACT Previously, we characterized two host protein binding elements located within the 3′-terminal 166 nucleotides of the mouse hepatitis virus (MHV) genome and assessed their functions in defective-interfering (DI) RNA replication. To determine the role of RNA secondary structures within these two host protein binding elements in viral replication, we explored the secondary structure of the 3′-terminal 166 nucleotides of the MHV strain JHM genome using limited RNase digestion assays. Our data indicate that multiple stem-loop and hairpin-loop structures exist within this region. Mutant and wild-type DIssEs were employed to test the function of secondary structure elements in DI RNA replication. Three stem structures were chosen as targets for the introduction of transversion mutations designed to destroy base pairing structures. Mutations predicted to destroy the base pairing of nucleotides 142 to 136 with nucleotides 68 to 74 exhibited a deleterious effect on DIssE replication. Destruction of base pairing between positions 96 to 99 and 116 to 113 also decreased DI RNA replication. Mutations interfering with the pairing of nucleotides 67 to 63 with nucleotides 52 to 56 had only minor effects on DIssE replication. The introduction of second complementary mutations which restored the predicted base pairing of positions 142 to 136 with 68 to 74 and nucleotides 96 to 99 with 116 to 113 largely ameliorated defects in replication ability, restoring DI RNA replication to levels comparable to that of wild-type DIssE RNA, suggesting that these secondary structures are important for efficient MHV replication. We also identified a conserved 23-nucleotide stem-loop structure involving nucleotides 142 to 132 and nucleotides 68 to 79. The upstream side of this conserved stem-loop is contained within a host protein binding element (nucleotides 166 to 129).

PSRna: Prediction of small RNA secondary structures based on reverse complementary folding method

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016430010 ◽

2016 ◽

Vol 14 (04) ◽

pp. 1643001 ◽

Cited By ~ 1

Author(s):

Jin Li ◽

Chengzhen Xu ◽

Lei Wang ◽

Hong Liang ◽

Weixing Feng ◽

...

Keyword(s):

Free Energy ◽

Secondary Structure ◽

Small Rna ◽

Small Rnas ◽

Dynamic Programming Algorithm ◽

Real Data ◽

Secondary Structures ◽

Minimum Free Energy ◽

Programming Algorithm ◽

Rna Secondary Structures

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms that usually generate multi-branch loops and 5′-end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures using a dynamic programming algorithm, constructing a path matrix at the same time. Second, the backtracking paths are extracted from the path matrix by a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered by their free energy, and only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
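
The first step described above, estimating the maximum number of base pairs by dynamic programming, can be illustrated with a generic Nussinov-style recurrence. This is a sketch under stated assumptions and is not PSRna's actual implementation: the pairing rules (Watson-Crick plus G-U wobble), the minimum loop length of 3, and the name `max_base_pairs` are all choices made here for illustration, and the reverse complementary folding and path matrix of the abstract are not modelled.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] = maximum number of pairs within seq[i..j], inclusive
    table = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = table[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in PAIRS:
                    left = table[i][k - 1] if k > i else 0
                    inner = table[k + 1][j - 1]
                    best = max(best, left + 1 + inner)
            table[i][j] = best
    return table[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))  # three nested pairs are possible here
```

The table is filled in cubic time. A traceback over the same table would recover one optimal structure, and enumerating all tracebacks would correspond to the multiple backtracking paths mentioned in the abstract.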

Visualization of RNA secondary structure with pseudoknots

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691321500363 ◽

2021 ◽

pp. 2150036

Author(s):

Lina Yang ◽

Yang Liu ◽

Huiwu Luo ◽

Xichun Li ◽

Yuan Yan Tang

Keyword(s):

Secondary Structure ◽

Rna Secondary Structure ◽

Large Scale ◽

Secondary Structures ◽

Combination Method ◽

Joint Analysis ◽

Discrete Wavelet ◽

Visualization Method ◽

Number Representation ◽

Rna Secondary Structures

The function of pseudoknots cannot be ignored in the RNA secondary structure. Existing methods for analyzing RNA secondary structures with pseudoknots exhibit many shortcomings. This paper presents a novel RNA secondary structure visualization method for the joint analysis of RNA primary structures and secondary structures. The method is based on the page number representation of the RNA secondary structure. It uses five vectors to represent bases, which are sequentially connected to outline the characteristics of the RNA secondary structure. The method covers almost all the constituent elements of the RNA secondary structure and extracts features completely. Experiments are based on the available techniques for large-scale annotation of RNA secondary structures, using a combination method of discrete wavelet transform and fractal dimension. The classification performance is compared with that of previous RNA secondary structure representation methods. Experimental results show that the RNA secondary structure visualization method proposed in this paper has good application prospects in RNA secondary structure classification.

Asymptotic Number of Hairpins of Saturated RNA Secondary Structures

Bulletin of Mathematical Biology ◽

10.1007/s11538-013-9899-1 ◽

2013 ◽

Vol 75 (12) ◽

pp. 2410-2430

Author(s):

Peter Clote ◽

Evangelos Kranakis ◽

Danny Krizanc

Keyword(s):

Secondary Structures ◽

Rna Secondary Structures ◽

Asymptotic Number

RNA In Silico The Computational Biology of RNA Secondary Structures

Advances in Complex Systems ◽

10.1142/s0219525999000059 ◽

1999 ◽

Vol 02 (01) ◽

pp. 65-90 ◽

Cited By ~ 19

Author(s):

Christoph Flamm ◽

Ivo L. Hofacker ◽

Peter F. Stadler

Keyword(s):

Secondary Structure ◽

Fitness Landscape ◽

Rna Virus ◽

Point Mutations ◽

Secondary Structures ◽

Hill Climbing ◽

Stable States ◽

Punctuated Equilibria ◽

Rna Secondary Structures

RNA secondary structures provide a unique computer model for investigating the most important aspects of structural and evolutionary biology. The existence of efficient algorithms for solving the folding problem, i.e., for predicting the secondary structure given only the sequence, allows the construction of realistic computer simulations. The notion of a "landscape" underlies both the structure formation (folding) and the (in vitro) evolution of RNA. Evolutionary adaptation may be seen as a hill-climbing process on a fitness landscape which is determined by the phenotype of the RNA molecule (within the model this is its secondary structure) and the selection constraints acting on the molecules. We find that a substantial fraction of point mutations do not change an RNA secondary structure. On the other hand, a comparable fraction of mutations leads to very different structures. This interplay of smoothness and ruggedness (or robustness and sensitivity) is a generic feature of both RNA and protein sequence-structure maps. Its consequences, "shape space covering" and "neutral networks", are inherited by the fitness landscapes and determine the dynamics of RNA evolution. Punctuated equilibria at the phenotype level and a diffusion-like evolution of the underlying genotypes are a characteristic feature of such models. As a practical application of these theoretical findings we have designed an algorithm that finds conserved (and therefore potentially functional) substructures of RNA virus genomes from sparse data sets. The folding dynamics of a particular RNA molecule can also be studied successfully based on secondary structure. Given an RNA sequence, we consider the energy landscape formed by all possible conformations (secondary structures). A straightforward implementation of the Metropolis algorithm is sufficient to produce quite realistic folding kinetics, allowing the identification of meta-stable states and folding pathways. Just as in the protein case, there are good and bad folders, which can be distinguished by the properties of their landscapes.
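
The Metropolis approach to folding kinetics mentioned above can be sketched on a toy model. The code below is an illustration only and makes several assumptions that do not come from the paper: the energy is simply minus the number of base pairs (a real simulation would use a thermodynamic energy model), a move toggles a single base pair, and the names `can_add` and `metropolis_fold` as well as the parameters `kT`, `steps`, and `min_loop` are hypothetical.

```python
import math
import random

# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_add(pairs, i, j, seq, min_loop=3):
    """True if pair (i, j), i < j, can be added without creating a base
    triple or a pseudoknot and the two bases are complementary."""
    if j - i <= min_loop or (seq[i], seq[j]) not in PAIRS:
        return False
    for a, b in pairs:
        if len({a, b, i, j}) < 4:  # a base would be shared: base triple
            return False
        if a < i < b < j or i < a < j < b:  # crossing pairs: pseudoknot
            return False
    return True


def metropolis_fold(seq, steps=2000, kT=0.5, seed=0):
    """Metropolis walk over secondary structures with toy energy -(#pairs).

    Returns the final set of pairs and the energy after every step.
    """
    rng = random.Random(seed)
    pairs = set()
    trajectory = [0.0]
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(seq)), 2))  # symmetric proposal
        if (i, j) in pairs:
            # Removing a pair raises the toy energy by 1: accept with
            # probability exp(-delta / kT), the Metropolis criterion.
            if rng.random() < math.exp(-1.0 / kT):
                pairs.remove((i, j))
        elif can_add(pairs, i, j, seq):
            pairs.add((i, j))  # energy decreases, so the move is always accepted
        trajectory.append(-float(len(pairs)))
    return pairs, trajectory


if __name__ == "__main__":
    final_pairs, energies = metropolis_fold("GGGGAAAACCCC", steps=500, seed=1)
    print(sorted(final_pairs), energies[-1])
```

Plateaus in the recorded energy trajectory are the toy analogue of the meta-stable states discussed in the abstract. Lowering `kT` makes pair removal rarer, so the walk gets trapped more easily.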

P-DCFOLD OR HOW TO PREDICT ALL KINDS OF PSEUDOKNOTS IN RNA SECONDARY STRUCTURES

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300500234x ◽

2005 ◽

Vol 14 (05) ◽

pp. 703-716 ◽

Cited By ~ 2

Author(s):

FARIZA TAHI ◽

ENGELEN STEFAN ◽

MIREILLE REGNIER

Keyword(s):

Structure Prediction ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Divide And Conquer ◽

Comparative Approach ◽

Rna Sequences ◽

Rna Secondary Structure Prediction ◽

Rna Secondary Structures ◽

Knowing That ◽

Definition Of

Pseudoknots play important roles in many RNAs. For computational reasons, however, pseudoknots are usually excluded from the definition of RNA secondary structures. Indeed, predicting pseudoknots greatly increases the time complexity of the algorithms, given that all existing algorithms for RNA secondary structure prediction already have a complexity of at least $O(n^3)$. Some algorithms have been developed for searching for pseudoknots, but all of them have very high complexities and generally consider only particular kinds of pseudoknots. We present an algorithm called P-DCFold, based on the comparative approach, for the prediction of RNA secondary structures including all kinds of pseudoknots. The helices are searched recursively using the "Divide and Conquer" approach, from the "most significant" to the "less significant". A selected helix subdivides the sequence into two sub-sequences: the internal one and a concatenation of the two external ones. This approach is used to search for non-interleaved helices and limits the search space. To search for pseudoknots, the processing is reiterated, so that each helix of the pseudoknot is selected in a different step. P-DCFold has been applied to several RNA sequences. In less than two seconds, their respective secondary structures, including their pseudoknots, have been recovered very efficiently.
