LaRA 2: parallel and vectorized program for sequence–structure alignment of RNA sequences

Background: The function of non-coding RNA sequences is largely determined by their spatial conformation, namely the secondary structure of the molecule, formed by Watson–Crick interactions between nucleotides. Hence, modern RNA alignment algorithms routinely take structural information into account. In order to discover yet unknown RNA families and infer their possible functions, the structural alignment of RNAs is an essential task. This task demands a lot of computational resources, especially for aligning many long sequences, and it therefore requires efficient algorithms that utilize modern hardware when available. A subset of the secondary structures contains overlapping interactions (called pseudoknots), which add complexity to the problem and are often ignored in available software. Results: We present the SeqAn-based software LaRA 2, which is significantly faster than comparable software for accurate pairwise and multiple alignments of structured RNA sequences. In contrast to other programs, our approach can handle arbitrary pseudoknots. As an improved re-implementation of the LaRA tool for structural alignments, LaRA 2 uses multi-threading and vectorization for parallel execution and a new heuristic for computing a lower bound of the solution. Our algorithmic improvements yield a program that is up to 130 times faster than the previous version. Conclusions: With LaRA 2 we provide a tool to analyse large sets of RNA secondary structures in relatively short time, based on structural alignment. The produced alignments can be used to derive structural motifs for the search in genomic databases.
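
The notion of overlapping interactions can be made concrete with a small check. The sketch below is not taken from LaRA 2; it assumes a secondary structure is given as a list of 0-based position pairs (the function names are hypothetical) and reports two interactions (i, j) and (k, l) as crossing when i < k < j < l.

```python
from itertools import combinations


def crossing_pairs(base_pairs):
    """Return every pair of interactions (i, j), (k, l) with i < k < j < l.

    `base_pairs` is an iterable of position pairs; this input format is an
    assumption made for illustration only.
    """
    # Put each pair in (smaller, larger) order, then sort by first position.
    normalized = sorted(tuple(sorted(pair)) for pair in base_pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # Sorting guarantees i <= k, so a crossing means k lies strictly
        # inside (i, j) while l lies strictly outside it.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


def has_pseudoknot(base_pairs):
    """True if at least two interactions cross."""
    return bool(crossing_pairs(base_pairs))


nested = [(0, 9), (1, 8), (2, 7)]   # a plain stem: fully nested pairs
knotted = [(0, 6), (9, 3)]          # the second pair crosses the first
print(has_pseudoknot(nested), has_pseudoknot(knotted))
```

The check is quadratic in the number of interactions, which is enough for illustration; it says nothing about how an aligner would score or exploit such crossings.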

DSARna: RNA Secondary Structure Alignment Based on Digital Sequence Representation

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207323666200811100338 ◽

2020 ◽

Vol 23 ◽

Author(s):

Longjian Gao ◽

Chengzhen Xu ◽

Wangan Song ◽

Feng Xiao ◽

Xiaomin Wu ◽

...

Keyword(s):

Secondary Structure ◽

Rna Structure ◽

Rna Secondary Structure ◽

High Throughput Sequencing ◽

Structural Information ◽

Dynamic Programming Algorithm ◽

Structure Alignment ◽

Programming Algorithm ◽

Structure Information ◽

Alignment Algorithms

Background: With the growing application and development of high-throughput sequencing, knowledge of the primary structure of RNA has expanded exponentially. Moreover, the function of RNA is determined by its secondary or higher-order structure, and similar structures are related to similar functions, as with the cloverleaf secondary structure of tRNA. Therefore, RNA structure alignment is an important subject in computational biology and bioinformatics for accurately predicting function. However, traditional RNA structure alignment algorithms have drawbacks such as high complexity and easy loss of secondary structure information. Objective: To study RNA secondary structure alignment in light of the shortcomings of existing secondary structure alignment algorithms and the characteristics of RNA secondary structure. Method: We propose a new digital sequence RNA structure representation algorithm named “DSARna”. Then, based on a dynamic programming algorithm, the scoring matrix and binary path matrix are constructed simultaneously. The backtracking path is identified in the path matrix, and the optimal result is predicted according to the path length. Conclusions: In an experimental comparison with the existing SimTree algorithm, the proposed method showed higher accuracy, with improved specificity, sensitivity, and Matthews correlation coefficient, and it better preserved structural information.
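
The general idea of filling a scoring matrix and a path matrix in the same pass, then backtracking through the path matrix, can be sketched as follows. This is a generic global alignment over integer-coded sequences, not the DSARna method: the integer codes, the scoring parameters, and the use of one move label per cell (rather than a binary path matrix) are all assumptions made for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two integer-coded sequences.

    Returns (score, aligned_a, aligned_b), where gaps appear as None.
    Scoring values are placeholders, not parameters from any paper.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    path = [[None] * (m + 1) for _ in range(n + 1)]

    # Borders: only gaps are possible along the first row and column.
    for i in range(1, n + 1):
        score[i][0], path[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        score[0][j], path[0][j] = j * gap, "left"

    # Fill both matrices in one pass.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            path[i][j] = "diag" if best == diag else ("up" if best == up else "left")

    # Backtrack from the bottom-right corner using only the path matrix.
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        move = path[i][j]
        if move == "diag":
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif move == "up":
            row_a.append(a[i - 1]); row_b.append(None); i -= 1
        else:
            row_a.append(None); row_b.append(b[j - 1]); j -= 1
    return score[n][m], row_a[::-1], row_b[::-1]


total, row_a, row_b = align([1, 2, 3], [1, 3])
print(total, row_a, row_b)
```

Storing the chosen move while scoring avoids recomputing the three candidates during backtracking, at the cost of a second matrix of the same size.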

RNAfamProb Plus NeoFold: Estimations of Posterior Probabilities on RNA Structural Alignment and RNA Secondary Structures with Incorporating Homologous-RNA Sequences

10.1101/812891 ◽

2019 ◽

Author(s):

Masaki Tagashira ◽

Kiyoshi Asai

Keyword(s):

Secondary Structure ◽

Sequence Alignment ◽

Structural Alignment ◽

Secondary Structures ◽

Simultaneous Optimization ◽

Supplementary Information ◽

Sequence Alignments ◽

Rna Sequences ◽

Link Type ◽

Rna Structural Alignment

Motivation: Structural alignment, the simultaneous optimization of the sequence alignment and secondary structures among RNAs, is required for a more appropriate comparison of functional ncRNAs than sequence alignment alone provides. Pseudo-probabilities on structural alignment, given RNA sequences, are desirable for more accurate secondary structures, sequence alignments, consensus secondary structures, and structural alignments. However, no algorithms have been proposed for these pseudo-probabilities. Results: We developed RNAfamProb, an algorithm for estimating these pseudo-probabilities. We applied these pseudo-probabilities to two biological problems: visualization and maximum-expected-accuracy secondary-structure estimation. The RNAfamProb program, an implementation of this algorithm, plus the NeoFold program, a maximum-expected-accuracy secondary-structure program that uses these pseudo-probabilities, demonstrated better prediction accuracy than three state-of-the-art maximum-expected-accuracy secondary-structure programs, while demanding far longer running time than these three programs, as expected given the intrinsically greater complexity of structural alignment compared with independent secondary structure prediction and sequence alignment. Both the RNAfamProb and NeoFold programs estimate more accurately when homologous RNA sequences are incorporated. Availability: The source code of the two programs is available at “https://github.com/heartsh/rnafamprob” and “https://github.com/heartsh/neofold”, respectively. Contact: “[email protected]” and “[email protected]”. Supplementary information: Supplementary data are available at Bioinformatics online.

SCOT: Rethinking the classification of secondary structure elements

Bioinformatics ◽

10.1093/bioinformatics/btz826 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2417-2428

Author(s):

Tobias Brinkjost ◽

Christiane Ehrt ◽

Oliver Koch ◽

Petra Mutzel

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Structural Alignment ◽

Secondary Structure Element ◽

Structure Alignment ◽

Supplementary Information ◽

Structure Quality ◽

Alignment Algorithms ◽

Geometric Consistency

Motivation: Secondary structure classification is one of the most important issues in structure-based analyses due to its impact on secondary structure prediction, structural alignment and protein visualization. There are still open challenges concerning helix and sheet assignments which are currently not addressed by a single multi-purpose software. Results: We introduce SCOT (Secondary structure Classification On Turns) as a novel secondary structure element assignment software which supports the assignment of turns, right-handed α-, 3₁₀- and π-helices, left-handed α- and 3₁₀-helices, 2.2₇- and polyproline II helices, β-sheets and kinks. We demonstrate that the introduction of helix Purity values enables a clear differentiation between helix classes. SCOT’s unique strengths are highlighted by comparing it to six state-of-the-art methods (DSSP, STRIDE, ASSP, SEGNO, DISICL and SHAFT). The assignment approaches were compared concerning geometric consistency, protein structure quality and flexibility dependency and their impact on secondary structure element-based structural alignments. We show that only SCOT’s combination of hydrogen bonds, geometric criteria and dihedral angles enables robust assignments independent of the structure quality and flexibility. We demonstrate that this combination and the elaborate kink detection lead to SCOT’s clear superiority for protein alignments. As the resulting helices and strands are provided in a PDB-conformant output format, they can immediately be used for structure alignment algorithms. Taken together, the application of our new method and the straightforward visualization using the accompanying PyMOL scripts enable the comprehensive analysis of regular backbone geometries in proteins. Availability and implementation: https://this-group.rocks Supplementary information: Supplementary data are available at Bioinformatics online.

TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

BMC Bioinformatics ◽

10.1186/1471-2105-12-108 ◽

2011 ◽

Vol 12 (1) ◽

pp. 108 ◽

Cited By ~ 57

Author(s):

Arif O Harmanci ◽

Gaurav Sharma ◽

David H Mathews

Keyword(s):

Secondary Structures ◽

Rna Sequences ◽

Probabilistic Estimation

Phases of the secondary structures of RNA sequences

EPL (Europhysics Letters) ◽

10.1209/epl/i2002-00128-3 ◽

2002 ◽

Vol 59 (6) ◽

pp. 903-909 ◽

Cited By ~ 19

Author(s):

R Bundschuh ◽

T Hwa

Keyword(s):

Secondary Structures ◽

Rna Sequences

TOPAS: network-based structural alignment of RNA sequences

Bioinformatics ◽

10.1093/bioinformatics/btz001 ◽

2019 ◽

Vol 35 (17) ◽

pp. 2941-2948 ◽

Cited By ~ 2

Author(s):

Chun-Chi Chen ◽

Hyundoo Jeong ◽

Xiaoning Qian ◽

Byung-Jun Yoon

Keyword(s):

Computational Complexity ◽

Secondary Structure ◽

Large Scale ◽

Structural Alignment ◽

Programming Approach ◽

Rna Sequences ◽

Optimal Sequence ◽

Dynamic Programming Approach ◽

Probabilistic Network ◽

Rna Structural Alignment

Motivation: For many RNA families, the secondary structure is known to be better conserved among the member RNAs compared to the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N⁶), which is too costly to be used for large-scale analysis. Results: In order to address this shortcoming, in this work, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy.
Availability and implementation: Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
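
A network with sequential edges and probability-weighted structural edges can be represented very simply. The sketch below is not the TOPAS implementation: the function name, the dictionary format for base-pairing probabilities, the `min_prob` cutoff, and the fixed weight of 1.0 on sequential edges are all assumptions chosen for illustration.

```python
def build_topological_network(sequence, pair_probs, min_prob=0.0):
    """Map each edge (i, j) with i < j to a (kind, weight) tuple.

    Nodes are 0-based sequence positions. `pair_probs` maps position pairs
    to base-pairing probabilities (hypothetical input format).
    """
    edges = {}
    # Sequential edges connect neighbouring positions along the backbone.
    for i in range(len(sequence) - 1):
        edges[(i, i + 1)] = ("sequential", 1.0)  # weight 1.0 is an assumption
    # Structural edges connect positions predicted to pair.
    for (i, j), prob in pair_probs.items():
        low, high = min(i, j), max(i, j)
        if not (0 <= low and high < len(sequence)):
            raise ValueError(f"pair {(i, j)} is outside the sequence")
        # Skip weak pairs and neighbours, which already share a backbone edge.
        if prob > min_prob and high - low > 1:
            edges[(low, high)] = ("structural", prob)
    return edges


network = build_topological_network(
    "GCAUGC", {(0, 5): 0.9, (4, 1): 0.05}, min_prob=0.1
)
for edge, (kind, weight) in sorted(network.items()):
    print(edge, kind, weight)
```

Because structural edges are stored independently of one another, crossing pairs need no special handling in this representation, which is consistent with the abstract's remark about pseudoknots.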

Structure-based prediction of ligand–protein interactions on a genome-wide scale

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1705381114 ◽

2017 ◽

Vol 114 (52) ◽

pp. 13685-13690 ◽

Cited By ~ 21

Author(s):

Howook Hwang ◽

Fabian Dey ◽

Donald Petrey ◽

Barry Honig

Keyword(s):

Binding Site ◽

Protein Interactions ◽

Kinase Inhibitors ◽

Structural Information ◽

Structural Alignment ◽

Scoring Function ◽

A Genome ◽

Small Molecule Ligands ◽

Wide Scale ◽

Approved Drugs

We report a template-based method, LT-scanner, which scans the human proteome using protein structural alignment to identify proteins that are likely to bind ligands that are present in experimentally determined complexes. A scoring function that rapidly accounts for binding site similarities between the template and the proteins being scanned is a crucial feature of the method. The overall approach is first tested based on its ability to predict the residues on the surface of a protein that are likely to bind small-molecule ligands. The algorithm that we present, LBias, is shown to compare very favorably to existing algorithms for binding site residue prediction. LT-scanner’s performance is evaluated based on its ability to identify known targets of Food and Drug Administration (FDA)-approved drugs and it too proves to be highly effective. The specificity of the scoring function that we use is demonstrated by the ability of LT-scanner to identify the known targets of FDA-approved kinase inhibitors based on templates involving other kinases. Combining sequence with structural information further improves LT-scanner performance. The approach we describe is extendable to the more general problem of identifying binding partners of known ligands even if they do not appear in a structurally determined complex, although this will require the integration of methods that combine protein structure and chemical compound databases.

Deep Sequencing of Foot-and-Mouth Disease Virus Reveals RNA Sequences Involved in Genome Packaging

Journal of Virology ◽

10.1128/jvi.01159-17 ◽

2017 ◽

Vol 92 (1) ◽

Cited By ~ 4

Author(s):

Grace Logan ◽

Joseph Newman ◽

Caroline F. Wright ◽

Lidia Lasecka-Dykes ◽

Daniel T. Haydon ◽

...

Keyword(s):

Disease Virus ◽

Deep Sequencing ◽

Rna Viruses ◽

Foot And Mouth Disease ◽

Secondary Structures ◽

Mouth Disease ◽

Genome Packaging ◽

Rna Sequences ◽

Mouth Disease Virus ◽

Foot And Mouth

Abstract: Nonenveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome, as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals, and humans. In the picornavirus family of nonenveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here, we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA, we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem-loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses.
Importance: In order to transmit their genetic material to a new host, nonenveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many nonenveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. The experimental methods developed to carry out this work are novel, simple, and transferable to the study of packaging signals in other RNA viruses. Improved understanding of RNA packaging may lead to novel vaccine approaches or targets for antiviral drugs with broad-spectrum activity.

LOCAL SEQUENCE-STRUCTURE MOTIFS IN RNA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000818 ◽

2004 ◽

Vol 02 (04) ◽

pp. 681-698 ◽

Cited By ~ 28

Author(s):

ROLF BACKOFEN ◽

SEBASTIAN WILL

Keyword(s):

Information Structure ◽

Structure Alignment ◽

General Definition ◽

Local Alignment ◽

Sequence Information ◽

Sequence Structure ◽

Worst Case ◽

Rna Molecules ◽

Alignment Algorithms ◽

Local Sequence

Ribonucleic acid (RNA) enjoys increasing interest in molecular biology; despite this interest, fundamental algorithms are lacking, e.g. for identifying local motifs. Like proteins, RNA molecules have a distinctive structure. Therefore, in addition to sequence information, structure plays an important part in assessing the similarity of RNAs. Furthermore, common sequence-structure features in two or several RNA molecules are often only spatially local, where possibly large parts of the molecules are dissimilar. Consequently, we address the problem of comparing RNA molecules by computing an optimal local alignment with respect to sequence and structure information. While local alignment is superior to global alignment for identifying local similarities, no general local sequence-structure alignment algorithms are currently known. We suggest a new general definition of locality for sequence-structure alignments that is biologically motivated and efficiently tractable. To show the former, we discuss locality of RNA and prove that the defined locality means connectivity by atomic and non-atomic bonds. To show the latter, we present an efficient algorithm for the newly defined pairwise local sequence-structure alignment (lssa) problem for RNA. For molecules of lengths n and m, the algorithm has a worst-case time complexity of O(n²·m²·max(n,m)) and a space complexity of only O(n·m). An implementation of our algorithm is available at . Its runtime is competitive with global sequence-structure alignment.
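
To illustrate what "local" means in alignment terms, the sketch below computes a classic Smith-Waterman local alignment score on sequence alone. It is a deliberately simpler stand-in, not the lssa algorithm described above: it ignores structure entirely, runs in O(n·m) time, and its scoring values are arbitrary placeholders.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings of `a` and `b`.

    Sequence-only Smith-Waterman recurrence with linear gap costs; the
    default scores are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 is what makes the alignment local: a poorly
            # scoring prefix is dropped instead of being carried along.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared motif GCGC is found even though the flanks are dissimilar.
print(local_alignment_score("UUUGCGC", "GCGCAAA"))
```

Only two rows are kept because this version returns the score without a traceback; recovering the aligned region itself would require the full matrix or a second pass.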

Multiple structural alignment by secondary structures: Algorithm and applications

Protein Science ◽

10.1110/ps.03200603 ◽

2009 ◽

Vol 12 (11) ◽

pp. 2492-2507 ◽

Cited By ~ 45

Author(s):

Oranit Dror ◽

Hadar Benyamini ◽

Ruth Nussinov ◽

Haim J. Wolfson

Keyword(s):

Structural Alignment ◽

Secondary Structures
