An algebraic representation for tree alignment of RNA pseudoknotted structures

Author(s):  
Michela Quadrini ◽  
Luca Tesei ◽  
Emanuela Merelli ◽  

The methods proposed in the literature for RNA comparison focus mainly on pseudoknot-free structures. The comparison of pseudoknotted structures is still a challenge. In this work, we propose a new algebraic representation of RNA secondary structures based on relations among hairpins in terms of nesting, crossing, and concatenation. This algebraic representation is obtained from a defined multiple context-free grammar, which maps any kind of RNA secondary structure into extended trees, i.e., ordered trees whose internal nodes are labeled with algebraic operators and whose leaves are labeled with loops. These extended trees allow RNA secondary structure comparison to be defined as a tree alignment problem.
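The three relations named above can be illustrated at the level of individual base pairs (arcs). The sketch below is an assumption-laden illustration, not the authors' grammar: it parses an extended dot-bracket string (using `()`, `[]`, `{}`, `<>` for possibly crossing pairs) and classifies any two arcs as concatenation, nesting, or crossing. The paper itself works with hairpins rather than single pairs, and the function names here are hypothetical.

```python
# Hypothetical helpers: classify arcs of an extended dot-bracket structure.
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {c: o for o, c in OPEN_TO_CLOSE.items()}


def parse_pairs(db: str) -> list[tuple[int, int]]:
    """Return base pairs (i, j), i < j, sorted by their 5' position."""
    stacks = {o: [] for o in OPEN_TO_CLOSE}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(db):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(pos)
        elif ch in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def relation(p: tuple[int, int], q: tuple[int, int]) -> str:
    """Classify two arcs as 'concatenation', 'nesting' or 'crossing'."""
    (i, j), (k, l) = sorted([p, q])  # now i < k
    if j < k:
        return "concatenation"  # i < j < k < l
    if l < j:
        return "nesting"        # i < k < l < j
    return "crossing"           # i < k < j < l (a pseudoknot)
```

For example, `parse_pairs("((..[[..))..]]")` yields `[(0, 9), (1, 8), (4, 13), (5, 12)]`, where `(0, 9)` nests `(1, 8)` but crosses `(4, 13)`.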

2017 ◽  
Author(s):  
Michela Quadrini ◽  
Luca Tesei ◽  
Emanuela Merelli ◽  


2012 ◽  
Vol 2012 ◽  
pp. 1-5 ◽  
Author(s):  
Julien Allali ◽  
Cédric Saule ◽  
Cédric Chauve ◽  
Yves d’Aubenton-Carafa ◽  
Alain Denise ◽  
...  

The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.


2009 ◽  
Vol 07 (05) ◽  
pp. 869-893 ◽  
Author(s):  
PETER CLOTE ◽  
EVANGELOS KRANAKIS ◽  
DANNY KRIZANC ◽  
BRUNO SALVY

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of the Vienna RNA Package is substantially reduced when computations are restricted to canonical structures. Here we provide an explanation for the speed-up by proving that the asymptotic number of canonical RNA secondary structures is $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ and that the expected number of base pairs in a canonical secondary structure is $0.31724 \cdot n$. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pair can be added without violating the definition of secondary structure (i.e., introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$, the asymptotic expected number of base pairs is $0.337361 \cdot n$, and the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is $0.340633 \cdot n$.
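The Stein and Waterman count can be checked numerically. The sketch below assumes the usual model behind that formula: pseudoknot-free structures on $n$ bases in which every hairpin loop contains at least one unpaired base (pair $(i, j)$ requires $j - i \ge 2$). It counts structures exactly with the standard "last base unpaired, or paired with some earlier base $j$" recurrence and compares the result with $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$ on a log scale to avoid float overflow. Function names are illustrative.

```python
import math


def secondary_structure_counts(n_max: int) -> list[int]:
    """S(m), m = 0..n_max: pseudoknot-free structures, hairpins >= 1 unpaired base."""
    s = [1] * (n_max + 1)  # S(0) = S(1) = 1 (empty structure only)
    for m in range(2, n_max + 1):
        # Base m is unpaired: S(m-1) structures.
        # Base m pairs with base j (1 <= j <= m-2, 1-based): S(j-1) choices on the
        # left times S(m-j-1) choices strictly inside the pair.
        s[m] = s[m - 1] + sum(s[j - 1] * s[m - j - 1] for j in range(1, m - 1))
    return s


def ratio_to_asymptotic(n: int) -> float:
    """Exact count divided by 1.104366 * n^(-3/2) * 2.618034^n."""
    exact = secondary_structure_counts(n)[n]
    log_asym = math.log(1.104366) - 1.5 * math.log(n) + n * math.log(2.618034)
    return math.exp(math.log(exact) - log_asym)  # math.log accepts big ints
```

The first values are 1, 1, 1, 2, 4, 8, 17, 37, 82, 185, 423, and the ratio approaches 1 slowly (roughly with a $1/n$ correction). Note that $2.618034 \approx (3 + \sqrt{5})/2$, the square of the golden ratio.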


2005 ◽  
Vol 14 (05) ◽  
pp. 703-716 ◽  
Author(s):  
FARIZA TAHI ◽  
ENGELEN STEFAN ◽  
MIREILLE REGNIER

Pseudoknots play important roles in many RNAs. For computational reasons, however, pseudoknots are usually excluded from the definition of RNA secondary structure. Indeed, predicting pseudoknots greatly increases the time complexity of the algorithms, given that all existing algorithms for RNA secondary structure prediction already have a complexity of at least $O(n^3)$. Some algorithms have been developed for searching pseudoknots, but all of them have very high complexities and generally consider only particular kinds of pseudoknots. We present an algorithm called P-DCFold, based on the comparative approach, for predicting RNA secondary structures including all kinds of pseudoknots. Helices are searched recursively using a "divide and conquer" approach, from the "most significant" to the "least significant". A selected helix subdivides the sequence into two subsequences: the internal one and the concatenation of the two external ones. This approach is used to search for non-interleaved helices and limits the search space. To search for pseudoknots, the processing is reiterated, so that each helix of a pseudoknot is selected in a different step. P-DCFold has been applied to several RNA sequences, and their secondary structures, including their pseudoknots, were recovered efficiently, in less than two seconds.
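The divide-and-conquer step described above can be sketched as follows, under loudly stated assumptions: candidate helices are already given, each with a hypothetical precomputed `score` standing in for P-DCFold's comparative "significance" criterion (which is not reproduced here), and only the non-interleaved pass is shown, not the reiteration that finds pseudoknots. Regions are sets of positions so that "the concatenation of the two externals" is simply a union.

```python
from typing import NamedTuple


class Helix(NamedTuple):
    i: int        # 5' position of the outermost pair
    j: int        # 3' position of the outermost pair
    length: int   # number of stacked pairs (i+t, j-t)
    score: float  # hypothetical significance score

    def positions(self) -> frozenset[int]:
        left = {self.i + t for t in range(self.length)}
        right = {self.j - t for t in range(self.length)}
        return frozenset(left | right)


def select_helices(candidates: list[Helix], region: set[int]) -> list[Helix]:
    """Pick the most significant helix that fits in region, then recurse on the
    internal part and on the concatenation of the two external parts."""
    fitting = [h for h in candidates if h.positions() <= region]
    if not fitting:
        return []
    best = max(fitting, key=lambda h: h.score)
    inner_lo, inner_hi = best.i + best.length - 1, best.j - best.length + 1
    internal = {p for p in region if inner_lo < p < inner_hi}
    external = {p for p in region if p < best.i or p > best.j}
    return [best] + select_helices(candidates, internal) + select_helices(candidates, external)
```

With candidates `Helix(0, 20, 3, 10.0)`, `Helix(5, 15, 2, 5.0)`, `Helix(10, 25, 2, 8.0)` and `Helix(22, 28, 2, 1.0)` on positions 0..29, the sketch keeps the first, second and fourth helices; the third crosses the first, so it never fits a subregion and would be left for a later, pseudoknot-searching pass.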


2018 ◽  
Vol 29 (05) ◽  
pp. 741-767 ◽  
Author(s):  
Cedric Chauve ◽  
Julien Courtiel ◽  
Yann Ponty

Pairwise ordered tree alignments are combinatorial objects that appear in important applications, such as RNA secondary structure comparison. However, the usual representation of tree alignments as supertrees is ambiguous, i.e., two distinct supertrees may induce identical sets of matches between identical pairs of trees. This ambiguity is uninformative and detrimental to any probabilistic analysis. In this work, we consider tree alignments up to equivalence. Our first result is a precise asymptotic enumeration of tree alignments, obtained from a context-free grammar by means of basic analytic combinatorics. Our second result focuses on alignments between two given ordered trees. By refining our grammar to align specific trees, we obtain a decomposition scheme for the space of alignments and use it to design an efficient dynamic programming algorithm for sampling alignments under the Gibbs-Boltzmann probability distribution. This generalizes existing tree alignment algorithms and opens the door to a probabilistic analysis of the space of suboptimal alignments.
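For readers unfamiliar with supertree-style alignments, here is a minimal sketch of computing an optimal pairwise ordered tree alignment cost. It is a plain memoized recursion over forests with unit gap and mismatch costs (both assumptions), not the authors' grammar or their Gibbs-Boltzmann sampler. Trees are hypothetical `(label, children)` tuples. The recursion branches on the leftmost node pair of the supertree: two matched roots, or one root aligned with a gap node that adopts a prefix of the other forest.

```python
from functools import lru_cache


def size(forest: tuple) -> int:
    """Number of nodes in a forest of (label, children) tuples."""
    return sum(1 + size(children) for _, children in forest)


def align_cost(t1: tuple, t2: tuple, gap: int = 1, mismatch: int = 1) -> int:
    @lru_cache(maxsize=None)
    def forest(f: tuple, g: tuple) -> int:
        if not f:
            return gap * size(g)  # every remaining node of g faces a gap
        if not g:
            return gap * size(f)
        (a, fa), rest_f = f[0], f[1:]
        (b, gb), rest_g = g[0], g[1:]
        # 1. The two leftmost roots are aligned with each other.
        best = (0 if a == b else mismatch) + forest(fa, gb) + forest(rest_f, rest_g)
        # 2. Root a faces a gap node whose subtree holds the first k trees of g.
        for k in range(len(g) + 1):
            best = min(best, gap + forest(fa, g[:k]) + forest(rest_f, g[k:]))
        # 3. Symmetric case: root b faces a gap node holding the first k trees of f.
        for k in range(len(f) + 1):
            best = min(best, gap + forest(f[:k], gb) + forest(f[k:], rest_g))
        return best

    return forest((t1,), (t2,))
```

For instance, aligning `("a", (("b", ()), ("c", ())))` with `("a", (("c", ()),))` costs 1 (one deleted leaf). As far as this sketch goes, each branch of the recursion corresponds to a distinct supertree, so summing Boltzmann weights over branches instead of minimizing would weight separately supertrees that induce the same matches, which is the ambiguity the abstract sets out to remove.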


10.29007/bhsr ◽  
2020 ◽  
Author(s):  
Mutlu Mete ◽  
Abdullah Arslan

This study is part of our ongoing effort to develop improved RNA secondary structure analysis tools and databases. In this work, we present a new Graphical Processing Unit (GPU)-based RNA structural analysis framework that supports fast comparison of multiple RNA secondary structures in very large databases. A search-based secondary structure comparison algorithm deployed on the RNASSAC website helps bioinformaticians find common RNA substructures in the underlying database. The algorithm performs two levels of binary search on the database, and its running time is affected by the database size. Experiments on the RNASSAC website show that the algorithm takes seconds for a database of 4,666 RNAs; for example, comparing 25 RNAs from this database takes about 4.4 s. In another case, when many non-overlapping common substructures are desired, a heuristic approach takes as long as 85 s to compare 40 RNAs from the same database. The comparisons by this sequential algorithm take at least 50% more time when the RNAs are drawn from a database of several million RNAs. The most recently curated databases already contain millions of RNA secondary structures, so improving the run-time performance of comparison algorithms is necessary. This study presents a GPU-based RNA substructure comparison algorithm whose running time for multiple RNA secondary structures remains feasible for large databases. Our new parallel algorithm is 12 times faster than the sequential (CPU) comparison algorithm of the RNASSAC website. The significantly reduced response time is a step towards a real-time RNA comparison web service for the bioinformatics community.
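The abstract does not describe the index layout behind the "two levels of binary search", so the following is only a hypothetical sketch of one way such a search could work: a sorted list of substructure keys (level 1), each with a sorted list of RNA identifiers that contain it (level 2). The substructure keys (`"H4"`, `"M3"`, ...) and function names are invented for illustration; the GPU parallelization is not shown.

```python
from bisect import bisect_left


def build_index(db: dict[str, list[str]]) -> tuple[list[str], list[list[str]]]:
    """Hypothetical index: sorted substructure keys, each with sorted RNA ids."""
    postings: dict[str, list[str]] = {}
    for rna_id, keys in db.items():
        for key in set(keys):
            postings.setdefault(key, []).append(rna_id)
    keys = sorted(postings)
    return keys, [sorted(postings[k]) for k in keys]


def contains(sorted_list: list[str], x: str) -> bool:
    """Binary-search membership test."""
    i = bisect_left(sorted_list, x)
    return i < len(sorted_list) and sorted_list[i] == x


def common_substructures(db, index, query_ids: list[str]) -> list[str]:
    """Substructure keys shared by every RNA in query_ids."""
    keys, id_lists = index
    found = []
    for key in sorted(set(db[query_ids[0]])):  # candidates from the first RNA
        i = bisect_left(keys, key)  # level 1: locate the substructure key
        if i == len(keys) or keys[i] != key:
            continue
        if all(contains(id_lists[i], r) for r in query_ids):  # level 2: each RNA
            found.append(key)
    return found
```

Both levels cost logarithmic time in the size of the lists they search, which is consistent with running time that grows with database size, though the real RNASSAC implementation may differ.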


2003 ◽  
Vol 47 (1) ◽  
pp. 1-22 ◽  
Author(s):  
Jaume Casasnovas ◽  
Joe Miro-Julia ◽  
Francesc Rosselló
