ASYMPTOTICS OF CANONICAL AND SATURATED RNA SECONDARY STRUCTURES

2009 ◽  
Vol 07 (05) ◽  
pp. 869-893 ◽  
Author(s):  
PETER CLOTE ◽  
EVANGELOS KRANAKIS ◽  
DANNY KRIZANC ◽  
BRUNO SALVY

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up, by proving that the asymptotic number of canonical RNA secondary structures is $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ and that the expected number of base pairs in a canonical secondary structure is $0.31724 \cdot n$. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$, the asymptotic expected number of base pairs is $0.337361 \cdot n$, and the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is $0.340633 \cdot n$.
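The Stein and Waterman estimate quoted above can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the standard counting convention in which every hairpin loop contains at least one unpaired base, and the function names are hypothetical. Under that assumption the number S(n) of structures on n bases satisfies S(n) = S(n-1) + sum over k of S(k-1) · S(n-k-1), where base n is either unpaired or paired with base k ≤ n-2.

```python
def count_secondary_structures(n_max):
    """Return [S(0), ..., S(n_max)], the number of secondary structures
    on n bases, assuming each hairpin loop has at least one unpaired base."""
    s = [1] * (n_max + 1)  # S(0) = S(1) = 1: only the empty structure
    for n in range(2, n_max + 1):
        total = s[n - 1]  # base n is unpaired
        for k in range(1, n - 1):  # base n pairs with base k <= n - 2
            total += s[k - 1] * s[n - k - 1]  # left part times enclosed part
        s[n] = total
    return s


def stein_waterman_estimate(n):
    """Asymptotic estimate 1.104366 * n^(-3/2) * 2.618034^n from the abstract."""
    return 1.104366 * n ** -1.5 * 2.618034 ** n


if __name__ == "__main__":
    counts = count_secondary_structures(300)
    for n in (50, 100, 300):
        # The ratio should drift toward 1 as n grows.
        print(n, counts[n] / stein_waterman_estimate(n))
```

The exact counts use Python's arbitrary-precision integers, so only the final ratio is computed in floating point.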

2020 ◽  
Author(s):  
Kengo Sato ◽  
Manato Akiyama ◽  
Yasubumi Sakakibara

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10 % improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.


Open Biology ◽  
2019 ◽  
Vol 9 (5) ◽  
pp. 190020 ◽  
Author(s):  
Daniel Gebert ◽  
Julia Jehn ◽  
David Rosenkranz

Codon composition, GC content and local RNA secondary structures can have a profound effect on gene expression, and mutations affecting these parameters, even though they do not alter the protein sequence, are not neutral in terms of selection. Although evidence exists that, in some cases, selection favours more stable RNA secondary structures, we currently lack a concrete idea of how many genes are affected within a species, and whether this is a universal phenomenon in nature. We searched for signs of structural selection in a global manner, analysing a set of 1 million coding sequences from 73 species representing all domains of life, as well as viruses, by means of our newly developed software PACKEIS. We show that codon composition and amino acid identity are main determinants of RNA secondary structure. In addition, we show that the arrangement of synonymous codons within coding sequences is non-random, yielding extremely high, but also extremely low, RNA structuredness significantly more often than expected by chance. Taken together, we demonstrate that selection for high and low levels of secondary structure is a widespread phenomenon. Our results provide another line of evidence that synonymous mutations are less neutral than commonly thought, which is of importance for many evolutionary models.


2018 ◽  
Vol 35 (1) ◽  
pp. 152-155 ◽  
Author(s):  
Maciej Antczak ◽  
Marcin Zablocki ◽  
Tomasz Zok ◽  
Agnieszka Rybarczyk ◽  
Jacek Blazewicz ◽  
...  

2001 ◽  
Vol 75 (24) ◽  
pp. 12105-12113 ◽  
Author(s):  
Qi Liu ◽  
Reed F. Johnson ◽  
Julian L. Leibowitz

ABSTRACT Previously, we characterized two host protein binding elements located within the 3′-terminal 166 nucleotides of the mouse hepatitis virus (MHV) genome and assessed their functions in defective-interfering (DI) RNA replication. To determine the role of RNA secondary structures within these two host protein binding elements in viral replication, we explored the secondary structure of the 3′-terminal 166 nucleotides of the MHV strain JHM genome using limited RNase digestion assays. Our data indicate that multiple stem-loop and hairpin-loop structures exist within this region. Mutant and wild-type DIssEs were employed to test the function of secondary structure elements in DI RNA replication. Three stem structures were chosen as targets for the introduction of transversion mutations designed to destroy base pairing structures. Mutations predicted to destroy the base pairing of nucleotides 142 to 136 with nucleotides 68 to 74 exhibited a deleterious effect on DIssE replication. Destruction of base pairing between positions 96 to 99 and 116 to 113 also decreased DI RNA replication. Mutations interfering with the pairing of nucleotides 67 to 63 with nucleotides 52 to 56 had only minor effects on DIssE replication. The introduction of second complementary mutations which restored the predicted base pairing of positions 142 to 136 with 68 to 74 and nucleotides 96 to 99 with 116 to 113 largely ameliorated defects in replication ability, restoring DI RNA replication to levels comparable to that of wild-type DIssE RNA, suggesting that these secondary structures are important for efficient MHV replication. We also identified a conserved 23-nucleotide stem-loop structure involving nucleotides 142 to 132 and nucleotides 68 to 79. The upstream side of this conserved stem-loop is contained within a host protein binding element (nucleotides 166 to 129).


2016 ◽  
Vol 14 (04) ◽  
pp. 1643001 ◽  
Author(s):  
Jin Li ◽  
Chengzhen Xu ◽  
Lei Wang ◽  
Hong Liang ◽  
Weixing Feng ◽  
...  

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms that usually generate multi-branch loops and 5′ end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures using dynamic programming, constructing a path matrix at the same time. Second, the backtracking paths are extracted from the path matrix by a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered based on their free energy, and only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
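The first step described above, estimating the maximum number of base pairs by dynamic programming, can be illustrated with a generic Nussinov-style recurrence. This is only a sketch under stated assumptions and not PSRna's actual implementation: the allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop length of 3, and the function name `max_base_pairs` are all choices made here for illustration, and the path matrix and backtracking steps are omitted.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of non-crossing base pairs in seq, requiring at
    least min_loop unpaired bases between the two ends of any pair."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

The table is filled in order of increasing span, so every subproblem a cell depends on is already solved. The run time is cubic in the sequence length.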


Author(s):  
Lina Yang ◽  
Yang Liu ◽  
Huiwu Luo ◽  
Xichun Li ◽  
Yuan Yan Tang

The function of pseudoknots cannot be ignored in the RNA secondary structure. Existing methods for analyzing RNA secondary structures with pseudoknots exhibit many shortcomings. This paper presents a novel RNA secondary structure visualization method for the joint analysis of RNA primary and secondary structures. The method is based on the page number representation of the RNA secondary structure. It uses five vectors to represent bases, which are sequentially connected to outline the characteristics of the RNA secondary structure. The method covers almost all the constituent elements of the RNA secondary structure and extracts features completely. Experiments are based on the available techniques for large-scale annotation of RNA secondary structures, using a combination of discrete wavelet transform and fractal dimension. The classification performance is compared with that of previous RNA secondary structure representation methods. Experimental results show that the RNA secondary structure visualization method proposed in this paper has good application prospects in RNA secondary structure classification.


2013 ◽  
Vol 75 (12) ◽  
pp. 2410-2430
Author(s):  
Peter Clote ◽  
Evangelos Kranakis ◽  
Danny Krizanc

1999 ◽  
Vol 02 (01) ◽  
pp. 65-90 ◽  
Author(s):  
Christoph Flamm ◽  
Ivo L. Hofacker ◽  
Peter F. Stadler

RNA secondary structures provide a unique computer model for investigating the most important aspects of structural and evolutionary biology. The existence of efficient algorithms for solving the folding problem, i.e., for predicting the secondary structure given only the sequence, allows the construction of realistic computer simulations. The notion of a "landscape" underlies both the structure formation (folding) and the (in vitro) evolution of RNA. Evolutionary adaptation may be seen as a hill-climbing process on a fitness landscape which is determined by the phenotype of the RNA molecule (within the model this is its secondary structure) and the selection constraints acting on the molecules. We find that a substantial fraction of point mutations do not change an RNA secondary structure. On the other hand, a comparable fraction of mutations leads to very different structures. This interplay of smoothness and ruggedness (or robustness and sensitivity) is a generic feature of both RNA and protein sequence-structure maps. Its consequences, "shape space covering" and "neutral networks", are inherited by the fitness landscapes and determine the dynamics of RNA evolution. Punctuated equilibria at the phenotype level and a diffusion-like evolution of the underlying genotypes are a characteristic feature of such models. As a practical application of these theoretical findings we have designed an algorithm that finds conserved (and therefore potentially functional) substructures of RNA virus genomes from sparse data sets. The folding dynamics of a particular RNA molecule can also be studied successfully based on secondary structure. Given an RNA sequence, we consider the energy landscape formed by all possible conformations (secondary structures). A straightforward implementation of the Metropolis algorithm is sufficient to produce quite realistic folding kinetics, allowing the identification of meta-stable states and folding pathways. Just as in the protein case there are good and bad folders which can be distinguished by the properties of their landscapes.
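The Metropolis approach mentioned above can be sketched on a toy landscape. The code below is an illustration under loud assumptions, not the authors' implementation: the energy of a structure is taken to be minus its number of base pairs (a real simulation would use a thermodynamic energy model), the move set is limited to adding or removing a single pair, and the names `metropolis_fold` and `compatible` are hypothetical.

```python
import math
import random

PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed pairing rules


def compatible(pairs, i, j):
    """True if pair (i, j) shares no base with, and does not cross,
    any pair already in the structure."""
    for p, q in pairs:
        if len({i, j, p, q}) < 4:  # would create a base triple
            return False
        if p < i < q < j or i < p < j < q:  # would create a pseudoknot
            return False
    return True


def metropolis_fold(seq, steps=2000, temperature=1.0, min_loop=3, seed=0):
    """Metropolis walk over secondary structures of seq.
    Toy energy (an assumption): E = -(number of base pairs)."""
    rng = random.Random(seed)
    candidates = [(i, j) for i in range(len(seq))
                  for j in range(i + min_loop + 1, len(seq))
                  if seq[i] + seq[j] in PAIRS]
    pairs, trajectory = set(), [0]
    if not candidates:
        return pairs, trajectory
    for _ in range(steps):
        move = rng.choice(candidates)  # symmetric proposal
        if move in pairs:
            # Removing a pair raises the toy energy by 1, so accept
            # with probability exp(-delta / T).
            if rng.random() < math.exp(-1.0 / temperature):
                pairs.remove(move)
        elif compatible(pairs, *move):
            pairs.add(move)  # energy decreases: always accepted
        trajectory.append(-len(pairs))  # record the current energy
    return pairs, trajectory


if __name__ == "__main__":
    structure, energies = metropolis_fold("GGGAAAUCC", steps=500, seed=1)
    print(sorted(structure), min(energies))
```

Because each proposal picks a candidate pair uniformly and toggles it, the proposal distribution is symmetric, which is what the plain Metropolis acceptance rule requires. Long plateaus in the recorded energy trajectory would correspond to the meta-stable states discussed in the abstract.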


2005 ◽  
Vol 14 (05) ◽  
pp. 703-716 ◽  
Author(s):  
FARIZA TAHI ◽  
ENGELEN STEFAN ◽  
MIREILLE REGNIER

Pseudoknots play important roles in many RNAs. For computational reasons, however, pseudoknots are usually excluded from the definition of RNA secondary structures. Indeed, predicting pseudoknots greatly increases the time complexity of the algorithms, given that all existing algorithms for RNA secondary structure prediction already have complexities of at least $O(n^3)$. Some algorithms have been developed for searching pseudoknots, but all of them have very high complexities and generally consider only particular kinds of pseudoknots. We present an algorithm, called P-DCFold, based on the comparative approach, for the prediction of RNA secondary structures including all kinds of pseudoknots. The helices are searched recursively using the "Divide and Conquer" approach, from the "most significant" to the "less significant". A selected helix subdivides the sequence into two sub-sequences, the internal one and a concatenation of the two external ones. This approach is used to search for non-interleaved helices and limits the search space. To search for pseudoknots, the processing is reiterated, so that each helix of the pseudoknot is selected in a different step. P-DCFold has been applied to several RNA sequences. In less than two seconds, their respective secondary structures, including their pseudoknots, have been recovered very efficiently.


2007 ◽  
Vol 155 (6-7) ◽  
pp. 759-787 ◽  
Author(s):  
Peter Clote ◽  
Evangelos Kranakis ◽  
Danny Krizanc ◽  
Ladislav Stacho
