Widespread selection for extremely high and low levels of secondary structure in coding sequences across all domains of life

Codon composition, GC content and local RNA secondary structures can have a profound effect on gene expression, and mutations affecting these parameters, even though they do not alter the protein sequence, are not neutral in terms of selection. Although evidence exists that, in some cases, selection favours more stable RNA secondary structures, we currently lack a concrete idea of how many genes are affected within a species, and whether this is a universal phenomenon in nature. We searched for signs of structural selection in a global manner, analysing a set of 1 million coding sequences from 73 species representing all domains of life, as well as viruses, by means of our newly developed software PACKEIS. We show that codon composition and amino acid identity are main determinants of RNA secondary structure. In addition, we show that the arrangement of synonymous codons within coding sequences is non-random, yielding extremely high, but also extremely low, RNA structuredness significantly more often than expected by chance. Taken together, we demonstrate that selection for high and low levels of secondary structure is a widespread phenomenon. Our results provide another line of evidence that synonymous mutations are less neutral than commonly thought, which is of importance for many evolutionary models.
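
The abstract's claim that the arrangement of synonymous codons is non-random implies a comparison against randomized sequences. The following is a minimal sketch of one such null model, not the PACKEIS implementation: it permutes codons among positions that encode the same amino acid, so both the protein sequence and the codon composition are preserved. The function names and the use of the standard genetic code are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    Hypothetical helper: keeps the protein and the codon counts fixed and
    changes only the arrangement of synonymous codons.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


if __name__ == "__main__":
    original = "ATGCTGTTAGCCGCTCTCTAA"
    print(shuffle_synonymous(original, random.Random(0)))
```

Folding many such shuffled variants and comparing their structuredness with that of the native sequence would give an empirical estimate of what is "expected by chance" under this particular null model.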

Widespread selection for high and low secondary structure in coding sequences across all domains of life

10.1101/524538 ◽

2019 ◽

Author(s):

Daniel Gebert ◽

Julia Jehn ◽

David Rosenkranz

Keyword(s):

Secondary Structure ◽

GC Content ◽

Secondary Structures ◽

Amino Acid Identity ◽

Coding Sequences ◽

RNA Secondary Structures ◽

Synonymous Mutations ◽

Codon Composition ◽

Domains Of Life ◽

Selection For

Abstract: Codon composition, GC-content and local RNA secondary structures can have a profound effect on gene expression and mutations affecting these parameters, even though they do not alter the protein sequence, are not neutral in terms of selection. Although evidence exists that in some cases selection favors more stable RNA secondary structures, we currently lack a concrete idea of how many genes are affected within a species, and if this is a universal phenomenon in nature. We searched for signs of structural selection in a global manner, analyzing a set of one million coding sequences from 73 species representing all domains of life, as well as viruses, by means of our newly developed software PACKEIS. We show that codon composition and amino acid identity are main determinants of RNA secondary structure. In addition, we show that the arrangement of synonymous codons within coding sequences is non-random, yielding extremely high, but also extremely low secondary structures significantly more often than expected by chance. Together, we demonstrate that selection for high and low secondary structure is a widespread phenomenon. Our results provide another line of evidence that synonymous mutations are less neutral than commonly thought, which is of importance for many evolutionary models.

RNA secondary structure prediction using deep learning with thermodynamic integration

10.1101/2020.08.10.244442 ◽

2020 ◽

Author(s):

Kengo Sato ◽

Manato Akiyama ◽

Yasubumi Sakakibara

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structure Prediction ◽

RNA Secondary Structure ◽

Secondary Structure Prediction ◽

Secondary Structures ◽

Thermodynamic Integration ◽

RNA Secondary Structure Prediction ◽

RNA Secondary Structures ◽

Non-Coding RNAs

RNA secondary structure prediction is one of the key technologies for revealing the essential roles of functional non-coding RNAs. Although machine learning-based rich-parametrized models have achieved extremely high performance in terms of prediction accuracy, the risk of overfitting for such models has been reported. In this work, we propose a new algorithm for predicting RNA secondary structures that uses deep learning with thermodynamic integration, thereby enabling robust predictions. Similar to our previous work, the folding scores, which are computed by a deep neural network, are integrated with traditional thermodynamic parameters to enable robust predictions. We also propose thermodynamic regularization for training our model without overfitting it to the training data. Our algorithm (MXfold2) achieved the most robust and accurate predictions in computational experiments designed for newly discovered non-coding RNAs, with significant 2–10 % improvements over our previous algorithm (MXfold) and standard algorithms for predicting RNA secondary structures in terms of F-value.
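
The abstract reports accuracy as an F-value without defining it. A minimal sketch follows, assuming the common convention that the F-value is the harmonic mean of positive predictive value and sensitivity computed over base pairs, with structures given in dot-bracket notation without pseudoknots. The function names are illustrative and are not taken from MXfold2.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def f_value(predicted, reference):
    """Harmonic mean of PPV and sensitivity over base pairs (assumed definition)."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    ppv = true_pos / len(pred)          # fraction of predicted pairs that are correct
    sensitivity = true_pos / len(ref)   # fraction of reference pairs that are recovered
    return 2 * ppv * sensitivity / (ppv + sensitivity)


if __name__ == "__main__":
    print(f_value("(....)", "((..))"))
```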

PSRna: Prediction of small RNA secondary structures based on reverse complementary folding method

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720016430010 ◽

2016 ◽

Vol 14 (04) ◽

pp. 1643001 ◽

Cited By ~ 1

Author(s):

Jin Li ◽

Chengzhen Xu ◽

Lei Wang ◽

Hong Liang ◽

Weixing Feng ◽

...

Keyword(s):

Free Energy ◽

Secondary Structure ◽

Small RNA ◽

Small RNAs ◽

Dynamic Programming Algorithm ◽

Real Data ◽

Secondary Structures ◽

Minimum Free Energy ◽

Programming Algorithm ◽

RNA Secondary Structures

Prediction of RNA secondary structures is an important problem in computational biology and bioinformatics, since RNA secondary structures are fundamental for functional analysis of RNA molecules. However, small RNA secondary structures are scarce and few algorithms have been specifically designed for predicting the secondary structures of small RNAs. Here we propose an algorithm named “PSRna” for predicting small-RNA secondary structures using reverse complementary folding and characteristic hairpin loops of small RNAs. Unlike traditional algorithms, which usually generate multi-branch loops and 5′ end self-folding, PSRna first estimates the maximum number of base pairs of RNA secondary structures using a dynamic programming algorithm, constructing a path matrix at the same time. Second, the backtracking paths are extracted from the path matrix using a backtracking algorithm, and each backtracking path represents a secondary structure. To improve accuracy, the predicted RNA secondary structures are filtered by their free energy, and only the secondary structure with the minimum free energy is identified as the candidate secondary structure. Our experiments on real data show that the proposed algorithm is superior to two popular methods, RNAfold and RNAstructure, in terms of sensitivity, specificity and Matthews correlation coefficient (MCC).
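
The first step described above, estimating the maximum number of base pairs by dynamic programming, can be illustrated with a generic Nussinov-style recurrence. This is a sketch under stated assumptions, not the PSRna algorithm itself: the allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin loop length of 3, and the function name are all choices made here for illustration, and the path matrix and backtracking steps are omitted.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    A pair (k, j) is allowed only if at least min_loop bases lie between them.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

Maximizing pair count alone ignores stacking and loop energies, which is consistent with the abstract's later step of filtering candidates by free energy.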

Analysis of Messenger RNA Secondary Structures in Rhodobacter sphaeroides

10.21203/rs.3.rs-36110/v1 ◽

2020 ◽

Author(s):

Damilola Omotajo ◽

Hyuk Cho ◽

Madhusudan Choudhary

Keyword(s):

Secondary Structure ◽

Rhodobacter Sphaeroides ◽

Translation Initiation ◽

Messenger RNA ◽

GC Content ◽

Secondary Structures ◽

Inhibitory Effect ◽

Nucleotide Composition ◽

Protein Coding ◽

Secondary Structure Analysis

Abstract Background: The Shine-Dalgarno (SD) sequence, when present, is known to promote translation initiation in a bacterial cell. However, the thermodynamic stability of the messenger RNA (mRNA) through its secondary structures has an inhibitory effect on the efficiency of translation. This poses the question of whether bacterial mRNAs with SD have low secondary structure formation or not. Results: About 3500 protein-coding genes in Rhodobacter sphaeroides were analyzed and a sliding window analysis of the last 100 nucleotides of the 5′ UTR and the first 100 nucleotides of ORFs was performed using RNAfold, a program for RNA secondary structure analysis. It was shown that mRNAs with SD are less stable than those without SD for genes located on the primary chromosome, but not for the plasmid-encoded genes. Furthermore, mRNA stability is similar for genes within each chromosome except those encoded by the accessory chromosome (second chromosome). Conclusions: Results highlight the possible contribution of other factors like replicon-specific nucleotide composition (GC content), codon bias, and protein stability in determining the efficiency of translation initiation in both SD-dependent and SD-independent translation systems.
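
The sliding window analysis described above can be sketched as follows. The abstract gives the region (last 100 nucleotides of the 5′ UTR plus first 100 nucleotides of the ORF) but not the window size or step, so the values of 30 and 10 used here are assumptions. The study scored windows with RNAfold; to keep this sketch self-contained, GC fraction is used as a stand-in score, and a folding-energy function could be passed in its place through the hypothetical `score` parameter.

```python
def gc_fraction(window):
    """Stand-in window score: fraction of G and C bases (not a folding energy)."""
    return sum(base in "GC" for base in window) / len(window)


def sliding_windows(seq, size, step):
    """Yield (start, window) for every full-length window of seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def scan_region(utr5, orf, flank=100, size=30, step=10, score=gc_fraction):
    """Score windows across the junction of the 5' UTR and the ORF.

    flank follows the abstract (100 nt on each side); size and step are
    assumed values. Requires flank >= 1.
    """
    region = utr5[-flank:] + orf[:flank]
    return [(start, score(win)) for start, win in sliding_windows(region, size, step)]


if __name__ == "__main__":
    for start, value in scan_region("A" * 120, "G" * 300):
        print(start, round(value, 3))
```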

Visualization of RNA secondary structure with pseudoknots

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691321500363 ◽

2021 ◽

pp. 2150036

Author(s):

Lina Yang ◽

Yang Liu ◽

Huiwu Luo ◽

Xichun Li ◽

Yuan Yan Tang

Keyword(s):

Secondary Structure ◽

RNA Secondary Structure ◽

Large Scale ◽

Secondary Structures ◽

Combination Method ◽

Joint Analysis ◽

Discrete Wavelet ◽

Visualization Method ◽

Number Representation ◽

RNA Secondary Structures

The role of pseudoknots in RNA secondary structure cannot be ignored. Existing methods for analyzing RNA secondary structures with pseudoknots exhibit many shortcomings. This paper presents a novel RNA secondary structure visualization method for the joint analysis of RNA primary and secondary structures. The method is based on the page number representation of the RNA secondary structure. It uses five vectors to represent bases, which are sequentially connected to outline the characteristics of the RNA secondary structure. The method covers almost all the constituent elements of the RNA secondary structure and extracts features completely. Experiments are based on the available techniques for large-scale annotation of RNA secondary structures, using a combination of discrete wavelet transform and fractal dimension. The classification performance is compared with that of previous RNA secondary structure representation methods. Experimental results show that the RNA secondary structure visualization method proposed in this paper has good application prospects in RNA secondary structure classification.

RNA In Silico The Computational Biology of RNA Secondary Structures

Advances in Complex Systems ◽

10.1142/s0219525999000059 ◽

1999 ◽

Vol 02 (01) ◽

pp. 65-90 ◽

Cited By ~ 19

Author(s):

Christoph Flamm ◽

Ivo L. Hofacker ◽

Peter F. Stadler

Keyword(s):

Secondary Structure ◽

Fitness Landscape ◽

RNA Virus ◽

Point Mutations ◽

Secondary Structures ◽

Hill Climbing ◽

Stable States ◽

Punctuated Equilibria ◽

RNA Secondary Structures

RNA secondary structures provide a unique computer model for investigating the most important aspects of structural and evolutionary biology. The existence of efficient algorithms for solving the folding problem, i.e., for predicting the secondary structure given only the sequence, allows the construction of realistic computer simulations. The notion of a "landscape" underlies both the structure formation (folding) and the (in vitro) evolution of RNA. Evolutionary adaptation may be seen as a hill-climbing process on a fitness landscape which is determined by the phenotype of the RNA molecule (within the model this is its secondary structure) and the selection constraints acting on the molecules. We find that a substantial fraction of point mutations do not change an RNA secondary structure. On the other hand, a comparable fraction of mutations leads to very different structures. This interplay of smoothness and ruggedness (or robustness and sensitivity) is a generic feature of both RNA and protein sequence-structure maps. Its consequences, "shape space covering" and "neutral networks", are inherited by the fitness landscapes and determine the dynamics of RNA evolution. Punctuated equilibria at the phenotype level and a diffusion-like evolution of the underlying genotypes are a characteristic feature of such models. As a practical application of these theoretical findings we have designed an algorithm that finds conserved (and therefore potentially functional) substructures of RNA virus genomes from sparse data sets. The folding dynamics of a particular RNA molecule can also be studied successfully based on secondary structure. Given an RNA sequence, we consider the energy landscape formed by all possible conformations (secondary structures). A straightforward implementation of the Metropolis algorithm is sufficient to produce quite realistic folding kinetics, allowing the identification of meta-stable states and folding pathways. Just as in the protein case there are good and bad folders which can be distinguished by the properties of their landscapes.

Global importance of RNA secondary structures in protein-coding sequences

Bioinformatics ◽

10.1093/bioinformatics/bty678 ◽

2018 ◽

Vol 35 (4) ◽

pp. 579-583 ◽

Cited By ~ 9

Author(s):

Markus Fricke ◽

Ruman Gerst ◽

Bashar Ibrahim ◽

Michael Niepmann ◽

Manja Marz

Keyword(s):

Secondary Structures ◽

Protein Coding ◽

Coding Sequences ◽

RNA Secondary Structures

Evolutionarily conserved RNA secondary structures in coding and non-coding sequences at the 3′ end of the hepatitis G virus/GB-virus C genome

Journal of General Virology ◽

10.1099/0022-1317-82-4-713 ◽

2001 ◽

Vol 82 (4) ◽

pp. 713-722 ◽

Cited By ~ 13

Author(s):

N. M. Cuceanu ◽

A. Tuplin ◽

P. Simmonds

Keyword(s):

Secondary Structures ◽

Coding Sequences ◽

RNA Secondary Structures ◽

Hepatitis G Virus ◽

GB Virus C ◽

Evolutionarily Conserved ◽

C Genome ◽

Hepatitis G ◽

Virus C

ASYMPTOTICS OF CANONICAL AND SATURATED RNA SECONDARY STRUCTURES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004333 ◽

2009 ◽

Vol 07 (05) ◽

pp. 869-893 ◽

Cited By ~ 11

Author(s):

PETER CLOTE ◽

EVANGELOS KRANAKIS ◽

DANNY KRIZANC ◽

BRUNO SALVY

Keyword(s):

Secondary Structure ◽

Secondary Structures ◽

Expected Number ◽

Greedy Method ◽

Base Pairs ◽

Stem Loop ◽

RNA Secondary Structures ◽

Speed Up ◽

Asymptotic Number ◽

Definition Of

It is a classical result of Stein and Waterman that the asymptotic number of RNA secondary structures is $1.104366 \cdot n^{-3/2} \cdot 2.618034^n$. In this paper, we study combinatorial asymptotics for two special subclasses of RNA secondary structures: canonical and saturated structures. Canonical secondary structures are defined to have no lonely (isolated) base pairs. This class of secondary structures was introduced by Bompfünewerer et al., who noted that the run time of Vienna RNA Package is substantially reduced when restricting computations to canonical structures. Here we provide an explanation for the speed-up, by proving that the asymptotic number of canonical RNA secondary structures is $2.1614 \cdot n^{-3/2} \cdot 1.96798^n$ and that the expected number of base pairs in a canonical secondary structure is $0.31724 \cdot n$. The asymptotic number of canonical secondary structures was obtained much earlier by Hofacker, Schuster and Stadler using a different method. Saturated secondary structures have the property that no base pairs can be added without violating the definition of secondary structure (i.e. introducing a pseudoknot or base triple). Here we show that the asymptotic number of saturated structures is $1.07427 \cdot n^{-3/2} \cdot 2.35467^n$, the asymptotic expected number of base pairs is $0.337361 \cdot n$, and the asymptotic number of saturated stem-loop structures is $0.323954 \cdot 1.69562^n$, in contrast to the number $2^{n-2}$ of (arbitrary) stem-loop structures as classically computed by Stein and Waterman. Finally, we apply the work of Drmota to show that the density of states for [all resp. canonical resp. saturated] secondary structures is asymptotically Gaussian. We introduce a stochastic greedy method to sample random saturated structures, called quasi-random saturated structures, and show that the expected number of base pairs is $0.340633 \cdot n$.
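
The Stein and Waterman asymptotic quoted at the start of this abstract can be checked numerically. The sketch below assumes the usual counting model, in which every hairpin loop contains at least one unpaired base, so that the count S(n) satisfies S(n) = S(n-1) + Σ S(k-1)·S(n-k-1) for k from 1 to n-2, with S(0) = S(1) = 1. The recurrence is not restated in the abstract, and the function names are illustrative. Logarithms are used because the counts overflow floating point for large n.

```python
import math


def count_structures(n_max):
    """Exact counts S(0..n_max) of secondary structures on n unlabeled positions.

    Position n is either unpaired, giving S(n-1), or paired with position k
    (1 <= k <= n-2), which splits the rest into two independent parts.
    """
    s = [1] * (n_max + 1)
    for n in range(2, n_max + 1):
        s[n] = s[n - 1] + sum(s[k - 1] * s[n - k - 1] for k in range(1, n - 1))
    return s


def log_asymptotic(n):
    """Natural log of 1.104366 * n**(-3/2) * 2.618034**n (constants from the abstract)."""
    return math.log(1.104366) - 1.5 * math.log(n) + n * math.log(2.618034)


if __name__ == "__main__":
    counts = count_structures(400)
    for n in (10, 50, 100, 400):
        ratio = math.exp(math.log(counts[n]) - log_asymptotic(n))
        print(n, ratio)  # the ratio should approach 1 as n grows
```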

Graphical Processing Unit - Supported RNA Secondary Structure Comparison

10.29007/bhsr ◽

2020 ◽

Author(s):

Mutlu Mete ◽

Abdullah Arslan

Keyword(s):

Secondary Structure ◽

RNA Secondary Structure ◽

Secondary Structures ◽

Graphical Processing Unit ◽

Processing Unit ◽

Structure Comparison ◽

RNA Secondary Structures ◽

Large Databases ◽

Graphical Processing ◽

RNA Secondary Structure Comparison

This study is part of our ongoing effort to develop improved RNA secondary structure analysis tools and databases. In this work we present a new Graphical Processing Unit (GPU)-based RNA structural analysis framework that supports fast multiple RNA secondary structure comparison for very large databases. A search-based secondary structure comparison algorithm deployed on the RNASSAC website helps bioinformaticians find common RNA substructures in the underlying database. The algorithm performs two levels of binary searches on the database, and its running time depends on the database size. Experiments on the RNASSAC website show that the algorithm takes seconds for a database of 4,666 RNAs. For example, it takes about 4.4 sec to compare 25 RNAs from this database. In another case, when many non-overlapping common substructures are desired, a heuristic approach requires as long as 85 sec to compare 40 RNAs from the same database. Comparisons by this sequential algorithm take at least 50% more time when the RNAs are drawn from a database of several million RNAs. The most recently curated databases already contain millions of RNA secondary structures, so the run-time performance of comparison algorithms must improve. This study presents a GPU-based RNA substructure comparison algorithm with which the running time for multiple RNA secondary structures remains feasible for large databases. Our new parallel algorithm is 12 times faster than the sequential CPU version of the comparison algorithm on the RNASSAC website. The reduced response time is a step towards a real-time RNA comparison web service for the bioinformatics community.
