Efficient RNA structure comparison algorithms

The recently proposed relative addressing-based RNA secondary structure representation has important features that allow an RNA structure database to be stored in a suffix array. A fast substructure search algorithm based on binary search over this suffix array has also been proposed. Using this substructure search algorithm, we present a fast algorithm that finds the largest common substructure of multiple RNA structures given in this relative addressing-based format. In its general formulation, the multiple RNA structure comparison problem is NP-hard. We introduce a new problem for comparing multiple RNA structures that has a stricter similarity definition and objective, and we propose an algorithm that solves it efficiently. We also develop a second comparison algorithm that calls this algorithm iteratively to locate nonoverlapping large common substructures in the compared RNAs. With these new tools, we improved the RNASSAC website (linked from http://faculty.tamuc.edu/aarslan ). The website now also includes two drawing tools: one specialized for preparing RNA substructures that can be used as input to the search tool, and another for automatically drawing an entire RNA structure from a given structure sequence.
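
The general technique named above, storing many structures in one suffix array and answering substructure queries with binary search, can be sketched in Python. This is a minimal illustration, not the authors' implementation: it does not reproduce the relative addressing-based encoding. As a stated assumption, each structure is a plain string (dot-bracket here), "#" is a separator assumed absent from both structures and queries, matches are purely textual (no base-pairing consistency check), and all function names are hypothetical.

```python
from bisect import bisect_right


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: suffix start positions sorted by suffix text.

    O(n^2 log n) in the worst case; fine for a sketch, not for a large database.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return sorted start positions of `pattern` in `text` via two binary searches."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def search_database(structures: list[str], pattern: str,
                    sep: str = "#") -> list[tuple[int, int]]:
    """Find (structure index, offset) pairs where `pattern` occurs.

    Assumes `sep` occurs in neither the structures nor the pattern, so no
    match can span two structures.
    """
    text = sep.join(structures) + sep
    starts, pos = [], 0
    for s in structures:
        starts.append(pos)
        pos += len(s) + len(sep)
    sa = build_suffix_array(text)
    hits = []
    for p in find_occurrences(text, sa, pattern):
        k = bisect_right(starts, p) - 1  # index of the structure containing p
        hits.append((k, p - starts[k]))
    return hits
```

For example, `search_database(["((..))", "(((...)))", "..((..)).."], "((..))")` returns `[(0, 0), (2, 2)]`. Because suffixes are sorted, all suffixes sharing a prefix form one contiguous block, so two binary searches bound that block with O(m log n) character comparisons for a pattern of length m.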

A database of flavivirus RNA structures with a search algorithm for pseudoknots and triple base interactions

Bioinformatics; DOI: 10.1093/bioinformatics/btaa759; 2020; Cited By ~ 1

Author(s): Alan Zammit, Leon Helwerda, René C L Olsthoorn, Fons J Verbeek, Alexander P Gultyaev

Keyword(s): RNA Structure, Yellow Fever Virus, Search Algorithm, General Pattern, Pattern Search, Supplementary Information, Untranslated Regions, RNA Structures, RNA Sequences, Structure Database

Abstract Motivation: The Flavivirus genus includes several important pathogens, such as Zika, dengue and yellow fever viruses. Flavivirus RNA genomes contain a number of functionally important structures in their 3′ untranslated regions (3′UTRs). Because the sequences and topologies of these structures are diverse, identifying them is often difficult. At the same time, predicting such structures is important for understanding flavivirus replication cycles and for developing antiviral strategies. Results: We have developed an algorithm for structured pattern search in RNA sequences, covering secondary structures, pseudoknots and triple base interactions. Using data on known conserved flavivirus 3′UTR structures, we constructed structural descriptors that cover the diversity of patterns in these motifs. The descriptors and the search algorithm were used to construct a database of flavivirus 3′UTR structures. Validating this approach, we identified a number of domains matching a general pattern of exoribonuclease Xrn1-resistant RNAs in the growing group of insect-specific flaviviruses. Availability and implementation: The Leiden Flavivirus RNA Structure Database is available at https://rna.liacs.nl. The search algorithm is available at https://github.com/LeidenRNA/SRHS. Supplementary information: Supplementary data are available at Bioinformatics online.
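
To make "structured pattern search" concrete, here is a deliberately small Python sketch that scans a sequence for one kind of descriptor: a simple stem-loop with a fixed stem length and a loop-length range. It is not SRHS and handles no pseudoknots or triple base interactions. The descriptor format, the allowed pairs (Watson-Crick plus G-U wobble), the mismatch tolerance and the function names are all assumptions made for illustration.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def hairpin_descriptor(stem_len: int, loop_len: int) -> str:
    """Dot-bracket string for a simple stem-loop, e.g. '(((....)))'."""
    return "(" * stem_len + "." * loop_len + ")" * stem_len


def find_hairpins(seq: str, stem_len: int, loop_min: int, loop_max: int,
                  max_mismatch: int = 0) -> list[tuple[int, int, str]]:
    """Scan `seq` for stem-loops with a fixed stem length and a loop length range.

    Returns (start, end_exclusive, dot_bracket) for every window whose stem has
    at most `max_mismatch` non-pairing positions.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    n = len(seq)
    hits = []
    for start in range(n):
        for loop in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop
            if end > n:
                break  # longer loops would also run past the end
            # Pair the k-th base of the 5' strand with the k-th base from the 3' end.
            mismatches = sum(
                (seq[start + k], seq[end - 1 - k]) not in PAIRS
                for k in range(stem_len)
            )
            if mismatches <= max_mismatch:
                hits.append((start, end, hairpin_descriptor(stem_len, loop)))
    return hits
```

For instance, `find_hairpins("GGGAAAACCC", 3, 3, 5)` reports a single hit, `(0, 10, "(((....)))")`. A real descriptor language would combine several such elements and add constraints between them.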

RNA-Puzzles toolkit: a computational resource of RNA 3D structure benchmark datasets, structure manipulation, and evaluation tools

Nucleic Acids Research; DOI: 10.1093/nar/gkz1108; 2019; Cited By ~ 2

Author(s): Marcin Magnus, Maciej Antczak, Tomasz Zok, Jakub Wiedemann, Piotr Lukasiak, ...

Keyword(s): RNA Structure, Structure Prediction, 3D Structure, Prediction Methods, Computational Resource, Structure Comparison, Assessment Protocol, Benchmark Datasets, 3D Structure Prediction, Comparison Algorithms

Abstract Significant improvements have been made in the efficiency and accuracy of RNA 3D structure prediction methods over successive challenges of RNA-Puzzles, a community-wide effort to assess blind prediction of RNA tertiary structures. Among other things, the RNA-Puzzles contest has shown that the development and validation of computational methods for RNA fold prediction depend strongly on benchmark datasets and structure comparison algorithms. Yet no systematic benchmark set or set of decoy structures has been available for RNA 3D structure prediction, hindering the standardization of comparative tests in RNA structure modeling. Furthermore, there has been no unified set of tools that allows deep and complete RNA structure analysis while remaining easy to use. Here, we present the RNA-Puzzles toolkit, a computational resource including (i) decoy sets generated by different RNA 3D structure prediction methods (raw, for-evaluation and standardized datasets), (ii) 3D structure normalization, analysis, manipulation and visualization tools (RNA_format, RNA_normalizer, rna-tools) and (iii) 3D structure comparison metric tools (RNAQUA, MCQ4Structures). This resource provides a full list of computational tools as well as a standard RNA 3D structure prediction assessment protocol for the community.
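
Root-mean-square deviation (RMSD) over paired atoms is one of the standard metrics used when comparing predicted and reference RNA 3D models in assessments such as RNA-Puzzles. The Python sketch below shows only the arithmetic; it is not code from RNAQUA or MCQ4Structures. It assumes both models list the same atoms in the same order (the kind of consistency that normalization tools aim to provide) and that the models are already superimposed, so no optimal fitting is performed. The function name is hypothetical.

```python
import math


def rmsd(model: list[tuple[float, float, float]],
         reference: list[tuple[float, float, float]]) -> float:
    """Root-mean-square deviation between paired atom coordinates.

    Assumes both lists hold the same atoms in the same order and that the
    structures are already superimposed; no optimal fitting is done here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("models must be non-empty and have matching atom counts")
    squared = sum(
        (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
        for a, b in zip(model, reference)
    )
    return math.sqrt(squared / len(model))
```

With a single atom displaced from (0, 0, 0) to (3, 4, 0), `rmsd` returns 5.0. Full comparison tools typically superimpose the models first, because RMSD on unaligned coordinates mixes rigid-body offset with genuine structural difference.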

RNA thermoswitches modulate Staphylococcus aureus adaptation to ambient temperatures

Nucleic Acids Research; DOI: 10.1093/nar/gkab117; 2021; Vol 49 (6); pp. 3409-3426

Author(s): Arancha Catalan-Moreno, Marta Cela, Pilar Menendez-Gil, Naiara Irurzun, Carlos J Caballero, ...

Keyword(s): Staphylococcus aureus, RNA Structure, Cold Shock, mRNA Translation, Site-Directed Mutagenesis, RNA Structures, Double-Stranded RNA, Ambient Temperatures, Shock Proteins, RNA Hairpin

Abstract Thermoregulation of virulence genes in bacterial pathogens is essential for the environment-to-host transition. However, the mechanisms governing cold adaptation outside the host remain poorly understood. Here, we found that the production of the cold shock proteins CspB and CspC in Staphylococcus aureus is controlled by two paralogous RNA thermoswitches. Through in silico prediction, enzymatic probing and site-directed mutagenesis, we demonstrated that the cspB and cspC 5′UTRs adopt alternative RNA structures that interconvert upon temperature shifts. The open (O) conformation, which facilitates mRNA translation, is favoured at ambient temperatures (22°C). Conversely, the alternative locked (L) conformation, in which the ribosome binding site (RBS) is sequestered in a double-stranded RNA structure, forms at host-related temperatures (37°C). These structural rearrangements depend on a long RNA hairpin, found in the O conformation, that sequesters the anti-RBS sequence. Notably, the remaining S. aureus CSP, CspA, may interact with a UUUGUUU motif located in the loop of this long hairpin and favour folding of the L conformation, which represses CspB and CspC production at 37°C. Simultaneous deletion of the cspB/cspC genes or their RNA thermoswitches significantly decreases the S. aureus growth rate at ambient temperatures, highlighting the importance of CspB/CspC thermoregulation when S. aureus transitions from the host to the environment.

Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis; DOI: 10.3233/ida-205087; 2021; Vol 25 (2); pp. 283-303

Author(s): Na Liu, Fei Xie, Xindong Wu

Keyword(s): Dynamic Programming, Data Structure, Pattern Matching, Edit Distance, State of the Art, Suffix Array, Variable Length, Distance Method, Efficient Data, Comparison Algorithms

Approximate multi-pattern matching when patterns contain variable-length wildcards is an important and widely used problem. In this paper, two suffix array-based algorithms are proposed to solve it. In existing studies, the suffix array has proven to be an efficient data structure for exact string matching as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles short exact-character segments of a pattern using dynamic programming, while the second, MMSA-L, handles long exact-character segments using the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that, in most cases, the two new algorithms are more time-efficient than state-of-the-art comparison algorithms.
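
The core idea of suffix-array matching with variable-length wildcards can be sketched for the exact case: locate every exact segment of a pattern with the suffix array, then chain occurrences whose separations fall inside each wildcard's length range. This Python sketch is a simplification and is neither MMSA-S nor MMSA-L; it performs no approximate matching (no dynamic programming or edit distance). The pattern representation (a list of segments plus one `(min_len, max_len)` gap per adjacent pair) and all function names are assumptions.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text: str) -> list[int]:
    # Naive construction; adequate for a sketch.
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: list[int], seg: str) -> list[int]:
    """Sorted start positions of exact segment `seg`, found by binary search on `sa`."""
    m = len(seg)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= seg
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < seg:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > seg
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= seg:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def match_pattern(text, sa, segments, gaps):
    """Start positions where segments[0] gap[0] segments[1] ... matches exactly.

    gaps[j] = (min_len, max_len) of the wildcard run between segments j and j+1.
    """
    if len(gaps) != len(segments) - 1:
        raise ValueError("need exactly one gap per pair of adjacent segments")
    occ = [occurrences(text, sa, s) for s in segments]
    starts = []
    for start in occ[0]:
        # All end positions reachable by a partial match beginning at `start`.
        ends = {start + len(segments[0])}
        for j in range(1, len(segments)):
            gmin, gmax = gaps[j - 1]
            nxt = set()
            for e in ends:
                lo = bisect_left(occ[j], e + gmin)
                hi = bisect_right(occ[j], e + gmax)
                nxt.update(p + len(segments[j]) for p in occ[j][lo:hi])
            ends = nxt
            if not ends:
                break
        if ends:
            starts.append(start)
    return starts


def multi_match(text, patterns):
    """Match several (segments, gaps) patterns against one shared suffix array."""
    sa = build_suffix_array(text)
    return [match_pattern(text, sa, segs, gaps) for segs, gaps in patterns]
```

For example, on the text `"ACGTTACGAAT"` the pattern `AC`, then 1 to 6 arbitrary characters, then `AA` matches at positions 0 and 5. Building the suffix array once and reusing it for every pattern is what makes the multi-pattern setting attractive.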

GHOSTX: An Improved Sequence Homology Search Algorithm Using a Query Suffix Array and a Database Suffix Array

PLoS ONE; DOI: 10.1371/journal.pone.0103833; 2014; Vol 9 (8); pp. e103833; Cited By ~ 43

Author(s): Shuji Suzuki, Masanori Kakuta, Takashi Ishida, Yutaka Akiyama

Keyword(s): Sequence Homology, Search Algorithm, Suffix Array, Homology Search

The global and local distribution of RNA structure throughout the SARS-CoV-2 genome

DOI: 10.1101/2020.07.06.190660; 2020; Cited By ~ 2

Author(s): Rafael de Cesaris Araujo Tavares, Gandhar Mahadeshwar, Anna Marie Pyle

Keyword(s): RNA Structure, Drug Targets, Structural Complexity, Viral Agent, RNA Structures, Viral RNAs, Functional Regions, Structural Density, RNA Genome, High Base

Abstract SARS-CoV-2 is the causative viral agent of COVID-19, the disease at the center of the current global pandemic. While knowledge of highly structured regions is integral to mechanistic insights into the viral infection cycle, very little is known about the location and folding stability of functional elements within the massive, ~30 kb SARS-CoV-2 RNA genome. In this study, we analyze the folding stability of this RNA genome relative to the structural landscape of other well-known viral RNAs. We present an in silico pipeline that locates regions of high base pair content across this long genome and identifies well-defined RNA structures, allowing direct comparisons of RNA structural complexity across the domains of the SARS-CoV-2 genome. We report that the SARS-CoV-2 genome's propensity for stable RNA folding is exceptional among RNA viruses, exceeding even that of HCV, one of the most highly structured viral RNAs in nature. Furthermore, our analysis reveals varying levels of RNA structure across genomic functional regions, with accessory and structural ORFs having the highest structural density in the viral genome. Finally, we go a step further and examine how individual RNA structures formed by these ORFs are affected by differences between genomic and subgenomic contexts. The conclusions reported in this study provide a foundation for structure-function hypotheses in SARS-CoV-2 biology and, in turn, may guide the 3D structural characterization of potential RNA drug targets for COVID-19 therapeutics.

Evidence for the Emergence of β-Trefoils by ‘Peptide Budding’ from an IgG-like β-Sandwich

DOI: 10.1101/2021.10.04.462989; 2021

Author(s): Liam M. Longo, Rachel Kolodny, Shawn E. McGlynn

Keyword(s): De Novo, General Trend, Sequence Structure, Structure Comparison, Related Sequence, Structure Space, Protein Universe, Remote Islands, Comparison Algorithms, Hallmark Feature

Abstract As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: they are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both ‘β-trefoil bridging themes’ (evolutionarily related sequence segments) and ‘β-trefoil-like motifs’ (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself: namely, that it is a derived fold formed by ‘budding’ from an immunoglobulin-like β-sandwich protein. These results demonstrate that the emergence of a folded domain from a peptide need not be a signature of antiquity, and they underpin an emerging truth: few protein lineages escape nature’s sewing table.

Quantitative Structure Activity Relationship (QSAR) study predicts small molecule binding to RNA structure

DOI: 10.33774/chemrxiv-2021-czl9p-v2; 2021

Author(s): Zhengguo Cai, Martina Zafferani, Olanrewaju Akande, Amanda Hargrove

Keyword(s): High Resolution, Small Molecules, Small Molecule, RNA Structure, Quantitative Structure, RNA Structures, QSAR Study, Structure Activity, QSAR Models, HIV-1

The diversity of RNA structural elements and their documented role in human diseases make RNA an attractive therapeutic target. However, progress in drug discovery and development has been hindered by challenges in determining high-resolution RNA structures and by a limited understanding of the parameters that drive RNA recognition by small molecules, including a lack of validated quantitative structure-activity relationships (QSAR). Herein, we developed QSAR models that quantitatively predict both thermodynamic and kinetic binding parameters of small molecules for the HIV-1 TAR model RNA system. A set of small molecules bearing diverse scaffolds was screened against the HIV-1 TAR construct using surface plasmon resonance, which provided the binding kinetics and affinities. The data were then analyzed using multiple linear regression (MLR) combined with feature selection to produce robust models for the binding of diverse RNA-targeted scaffolds. The predictivity of the models was validated on untested small molecules. The QSAR models presented herein represent the first application of validated and predictive 2D-QSAR using multiple scaffolds against an RNA target. We expect the workflow to be generally applicable to other RNA structures, ultimately providing essential insight into the small molecule descriptors that drive selective binding interactions and, consequently, providing a platform that can exponentially increase the efficiency of ligand design and optimization without the need for high-resolution RNA structures.
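
Multiple linear regression combined with feature selection can be illustrated with a dependency-free Python sketch: ordinary least squares via the normal equations, wrapped in greedy forward selection that adds whichever descriptor most improves adjusted R². The abstract does not say which feature-selection procedure was used, so forward selection by adjusted R² is an assumption here, as are all function names. Real QSAR work would also validate on held-out compounds rather than rely on training-set fit alone.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors?)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(X, y, cols):
    """Least-squares coefficients [intercept, b_1, ...] via the normal equations."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[k] for r in rows) for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def adjusted_r2(X, y, cols, beta):
    """Adjusted R^2, which penalizes extra descriptors; needs len(y) > len(cols) + 1."""
    n, k = len(y), len(cols)
    pred = [beta[0] + sum(b * x[j] for b, j in zip(beta[1:], cols)) for x in X]
    mean = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def forward_select(X, y, max_features, tol=1e-9):
    """Greedily add the descriptor that most improves adjusted R^2."""
    max_features = min(max_features, len(y) - 2)  # keep adjusted R^2 defined
    selected, best, beta = [], 0.0, fit_mlr(X, y, [])
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        candidates = []
        for j in remaining:
            try:
                b = fit_mlr(X, y, selected + [j])
            except ValueError:
                continue  # skip descriptors that make the system singular
            candidates.append((adjusted_r2(X, y, selected + [j], b), j, b))
        if not candidates:
            break
        score, j, b = max(candidates)
        if score <= best + tol:
            break  # no meaningful improvement: stop adding descriptors
        selected.append(j)
        remaining.remove(j)
        best, beta = score, b
    return selected, beta
```

On a toy set where the response is exactly 2 + 3·x₀ and a second descriptor is unrelated noise, `forward_select` keeps only descriptor 0 and recovers coefficients of about 2 and 3. Using adjusted rather than plain R² matters because plain training R² never decreases when a descriptor is added.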

The global and local distribution of RNA structure throughout the SARS-CoV-2 genome

Journal of Virology; DOI: 10.1128/jvi.02190-20; 2020; Cited By ~ 2

Author(s): Rafael de Cesaris Araujo Tavares, Gandhar Mahadeshwar, Han Wan, Nicholas C. Huston, Anna Marie Pyle

Keyword(s): In Silico, RNA Structure, Drug Targets, RNA Folding, RNA Viruses, Viral Agent, RNA Structures, Viral RNAs, Viral Genomes, RNA Genome

SARS-CoV-2 is the causative viral agent of COVID-19, the disease at the center of the current global pandemic. While knowledge of highly structured regions is integral to mechanistic insights into the viral infection cycle, very little is known about the location and folding stability of functional elements within the massive, ∼30 kb SARS-CoV-2 RNA genome. In this study, we analyze the folding stability of this RNA genome relative to the structural landscape of other well-known viral RNAs. We present an in silico pipeline that predicts regions of high base pair content across long genomes and pinpoints hotspots of well-defined RNA structures, allowing direct comparisons of RNA structural complexity across the domains of the SARS-CoV-2 genome. We report that the SARS-CoV-2 genome's propensity for stable RNA folding is exceptional among RNA viruses, exceeding even that of HCV, one of the most structured viral RNAs in nature. Furthermore, our analysis suggests varying levels of RNA structure across genomic functional regions, with accessory and structural ORFs having the highest structural density in the viral genome. Finally, we go a step further and examine how individual RNA structures formed by these ORFs are affected by differences between genomic and subgenomic contexts; given the technical difficulty of experimentally separating sgRNA from gRNA in cellular mixtures, this is a unique advantage of our in silico pipeline. The resulting findings provide a useful roadmap for planning focused empirical studies of SARS-CoV-2 RNA biology and a preliminary guide for exploring potential SARS-CoV-2 RNA drug targets. Importance: The RNA genome of SARS-CoV-2 is among the largest and most complex viral genomes, and yet its RNA structural features remain relatively unexplored.
Because RNA elements guide function in most RNA viruses and represent potential drug targets, it is essential to chart the architectural features of SARS-CoV-2 and pinpoint regions that merit focused study. Here we show that the RNA folding stability of the SARS-CoV-2 genome is exceptional among viral genomes, and we develop a method to directly compare levels of predicted secondary structure across SARS-CoV-2 domains. Remarkably, we find that coding regions display the highest structural propensity in the genome, forming motifs that differ between the genomic and subgenomic contexts. Our approach provides an attractive strategy for rapidly screening candidate structured regions based on base-pairing potential and a readily interpretable roadmap to guide functional studies of RNA viruses and other pharmacologically relevant RNA transcripts.
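
The abstract does not spell out how base pair content is computed, but the basic bookkeeping behind "regions of high base pair content" can be sketched: given a predicted secondary structure in dot-bracket notation (from any folding tool), measure the fraction of paired nucleotides in sliding windows and merge windows that exceed a threshold. This Python sketch is an illustration only; the window size, step, threshold, the set of bracket characters treated as paired, and the function names are hypothetical choices, not values or code from the study.

```python
PAIRED = set("()[]{}<>")  # bracket characters treated as paired (an assumption)


def window_pair_content(dot_bracket: str, window: int, step: int) -> list[tuple[int, float]]:
    """Fraction of paired positions in each sliding window of a dot-bracket string."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    paired = [c in PAIRED for c in dot_bracket]
    return [
        (start, sum(paired[start:start + window]) / window)
        for start in range(0, len(dot_bracket) - window + 1, step)
    ]


def high_content_regions(dot_bracket: str, window: int, step: int,
                         threshold: float) -> list[tuple[int, int]]:
    """Merge overlapping or adjacent windows whose pair content is >= threshold."""
    regions: list[tuple[int, int]] = []
    for start, frac in window_pair_content(dot_bracket, window, step):
        if frac < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # Extend the previous region instead of starting a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

For the toy structure `"((((....))))......"` with 6-nt windows and a step of 6, the first two windows are two-thirds paired and the last is unpaired, so a 0.5 threshold yields one region, `(0, 12)`. On a ~30 kb genome, the window and step would be tuned to the scale of the structures of interest.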

Intrinsic Regulatory Role of RNA Structural Arrangement in Alternative Splicing Control

International Journal of Molecular Sciences; DOI: 10.3390/ijms21145161; 2020; Vol 21 (14); pp. 5161; Cited By ~ 1

Author(s): Katarzyna Taylor, Krzysztof Sobczak

Keyword(s): Alternative Splicing, RNA Structure, The Other, Splicing Regulation, RNA Structures, Structural Arrangement, Biological Functionality, Cis and Trans, Structural Aspects

Alternative splicing is a highly sophisticated process that plays a significant role in posttranscriptional gene expression and underlies the diversity and complexity of organisms. Its regulation is multilayered and includes an intrinsic role of RNA structural arrangement, which undergoes time- and tissue-specific alterations. In this review, we describe the principles of RNA structural arrangement and briefly outline its cis- and trans-acting cellular modulators, which are crucial determinants of the biological functionality of RNA structure. We then discuss the RNA structure-mediated mechanisms of alternative splicing regulation. On the one hand, impaired formation of optimal RNA structures may have critical consequences for the splicing outcome, and studying it may further our understanding of the pathomechanisms of severe disorders. On the other hand, the structural aspects of RNA have become important considerations in the search for potential therapeutic treatments. We address both aspects, emphasizing the importance of ongoing studies in both fields.
