RNA structural alignment
Recently Published Documents


TOTAL DOCUMENTS

15
(FIVE YEARS 6)

H-INDEX

5
(FIVE YEARS 2)

2021 ◽  
Author(s):  
Manato Akiyama ◽  
Yasubumi Sakakibara

Deep learning is being actively applied to obtain effective embeddings of biomolecular information. Better embeddings improve the quality of downstream analyses such as DNA sequence motif detection and protein function prediction. In this study, we adopt a pre-training algorithm for the effective embedding of RNA bases to acquire semantically rich representations, and apply it to two fundamental RNA sequence problems: structural alignment and clustering. By using the pre-training algorithm to embed the four RNA bases in a position-dependent manner, trained on a large number of RNA sequences from various RNA families, we obtain a context-sensitive embedding representation. As a result, each base embedding captures not only base information but also the secondary structure and context information of the RNA sequence. We call this informative base embedding and use it to achieve accuracy superior to that of existing state-of-the-art methods in RNA structural alignment and RNA family clustering tasks. Furthermore, by combining this informative base embedding with a simple Needleman-Wunsch alignment algorithm, we succeed in calculating a structural alignment with O(n^2) time complexity instead of the O(n^6) time complexity of Sankoff-style algorithms.
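As a rough illustration of the last point, the sketch below runs a plain Needleman-Wunsch global alignment in which the match score comes from per-base embedding vectors rather than from a substitution matrix. It is not the authors' implementation: the function names `cosine` and `nw_align`, the use of cosine similarity as the match score, the linear gap penalty of -1.0, and the one-hot toy embeddings in the usage note are all assumptions made for this example. The two nested loops over the sequences are what give the O(n^2) running time.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors (assumed match score)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nw_align(emb_a, emb_b, gap=-1.0):
    """Needleman-Wunsch over per-base embeddings.

    emb_a, emb_b: lists of embedding vectors, one per base.
    Returns (score, pairs), where pairs lists aligned index pairs (i, j).
    """
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: only gaps in emb_b
        score[i][0], move[i][0] = i * gap, "U"
    for j in range(1, m + 1):          # first row: only gaps in emb_a
        score[0][j], move[0][j] = j * gap, "L"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            if diag >= up and diag >= left:
                score[i][j], move[i][j] = diag, "D"
            elif up >= left:
                score[i][j], move[i][j] = up, "U"
            else:
                score[i][j], move[i][j] = left, "L"
    # Trace back from the bottom-right corner to recover aligned positions.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i][j] == "D":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move[i][j] == "U":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

With hypothetical one-hot embeddings for the sequences ACG and AG, `nw_align([[1,0,0,0],[0,1,0,0],[0,0,1,0]], [[1,0,0,0],[0,0,1,0]])` aligns A with A and G with G and leaves C opposite a gap. In the setting described by the abstract, the vectors would instead be the context-sensitive embeddings produced by the pre-trained model.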


2020 ◽  
Author(s):  
Sizhen Li ◽  
He Zhang ◽  
Liang Zhang ◽  
Kaibo Liu ◽  
Boxiang Liu ◽  
...  

Many functional RNA structures are conserved across evolution, and such conserved structures provide critical targets for diagnostics and treatment. TurboFold II is a state-of-the-art software package that can predict conserved structures and alignments given homologous sequences, but its cubic runtime and quadratic memory usage with respect to sequence length prevent it from being applied to most full-length viral genomes. As the COVID-19 outbreak spreads, there is a growing need for a fast and accurate tool to identify conserved regions of SARS-CoV-2. To address this issue, we present LinearTurboFold, which successfully accelerates TurboFold II without sacrificing accuracy on secondary structure and multiple sequence alignment prediction. LinearTurboFold is orders of magnitude faster than TurboFold II, e.g., 372× faster (12 minutes vs. 3.1 days) on a group of five HIV-1 homologs with average length 9,686 nt. LinearTurboFold is able to scale up to the full sequence of SARS-CoV-2, and identifies conserved structures that have been supported by previous studies. Additionally, LinearTurboFold finds a list of novel conserved regions, including long-range base pairs, which may be useful for better understanding the virus.
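The quoted speedup can be checked directly from the two running times given in the abstract, taking 3.1 days and 12 minutes at face value (the rounding of the original measurements is not stated):

```latex
\[
\frac{3.1\ \text{days} \times 24\ \text{h/day} \times 60\ \text{min/h}}{12\ \text{min}}
= \frac{4464\ \text{min}}{12\ \text{min}}
= 372
\]
```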


2019 ◽  
Author(s):  
Masaki Tagashira ◽  
Kiyoshi Asai

Abstract Motivation Structural alignment, the simultaneous optimization of sequence alignment and secondary structures among RNAs, is required for comparing functional ncRNAs more appropriately than sequence alignment alone. Pseudo-probabilities given RNA sequences on structural alignment are desirable for more accurate secondary structures, sequence alignments, consensus secondary structures, and structural alignments. However, no algorithm has been proposed for estimating these pseudo-probabilities. Results We invented the RNAfamProb algorithm, an algorithm for estimating these pseudo-probabilities. We applied these pseudo-probabilities to two biological problems: visualization and maximum-expected-accuracy secondary-structure estimation. The RNAfamProb program, an implementation of this algorithm, plus the NeoFold program, a maximum-expected-accuracy secondary-structure program using these pseudo-probabilities, demonstrated prediction accuracy better than three state-of-the-art maximum-expected-accuracy secondary-structure programs, while demanding far longer running time than these three programs, as expected given the intrinsically higher problem complexity of structural alignment compared with independent secondary structure prediction and sequence alignment. Both the RNAfamProb and NeoFold programs produce more accurate estimates when homologous RNA sequences are incorporated.
Availability The source code of the two programs is available at https://github.com/heartsh/rnafamprob and https://github.com/heartsh/neofold, respectively.
Contact “[email protected]” and “[email protected]”.
Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (17) ◽  
pp. 2941-2948 ◽  
Author(s):  
Chun-Chi Chen ◽  
Hyundoo Jeong ◽  
Xiaoning Qian ◽  
Byung-Jun Yoon

Abstract Motivation For many RNA families, the secondary structure is known to be better conserved among the member RNAs than the primary sequence. For this reason, it is important to consider the underlying folding structures when aligning RNA sequences, especially for those with relatively low sequence identity. Given a set of RNAs with unknown structures, simultaneous RNA alignment and folding algorithms aim to accurately align the RNAs by jointly predicting their consensus secondary structure and the optimal sequence alignment. Despite the improved accuracy of the resulting alignment, the computational complexity of simultaneous alignment and folding for a pair of RNAs is O(N^6), which is too costly for large-scale analysis. Results To address this shortcoming, we propose a novel network-based scheme for pairwise structural alignment of RNAs. The proposed algorithm, TOPAS, builds on the concept of topological networks that provide structural maps of the RNAs to be aligned. For each RNA sequence, TOPAS first constructs a topological network based on the predicted folding structure, which consists of sequential edges and structural edges weighted by the base-pairing probabilities. The obtained networks can then be efficiently aligned by using probabilistic network alignment techniques, thereby yielding the structural alignment of the RNAs. The computational complexity of our proposed method is significantly lower than that of the Sankoff-style dynamic programming approach, while yielding favorable alignment results. Furthermore, another important advantage of the proposed algorithm is its capability of handling RNAs with pseudoknots while predicting the RNA structural alignment. We demonstrate that TOPAS generally outperforms previous RNA structural alignment methods on RNA benchmarks in terms of both speed and accuracy.
Availability and implementation Source code of TOPAS and the benchmark data used in this paper are available at https://github.com/bjyoontamu/TOPAS.
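To make the network construction step concrete, the following sketch builds a simple graph of the kind the abstract describes: sequential edges between neighbouring bases and structural edges weighted by base-pairing probabilities. It is only an illustration and is not taken from the TOPAS source code. The function name `build_topological_network`, the dictionary input format, and the `min_prob` cutoff of 0.05 are assumptions for this example, and the base-pairing probabilities would in practice come from a folding tool rather than being typed in by hand.

```python
def build_topological_network(n, base_pair_probs, min_prob=0.05):
    """Build a toy topological network for an RNA of length n.

    base_pair_probs: dict mapping (i, j) index pairs (0-based) to the
        predicted probability that bases i and j pair.
    min_prob: hypothetical cutoff below which a pairing is ignored.
    Returns (sequential_edges, structural_edges).
    """
    # Sequential edges connect each base to its successor in the chain.
    sequential_edges = [(i, i + 1) for i in range(n - 1)]

    # Structural edges connect bases predicted to pair, weighted by
    # their base-pairing probability.
    structural_edges = {}
    for (i, j), prob in base_pair_probs.items():
        if not (0 <= i < n and 0 <= j < n) or i == j:
            raise ValueError(f"invalid base pair ({i}, {j}) for length {n}")
        if prob >= min_prob:
            structural_edges[(min(i, j), max(i, j))] = prob
    return sequential_edges, structural_edges
```

For a hypothetical five-base RNA with `base_pair_probs = {(0, 4): 0.9, (1, 3): 0.02}`, the function returns four sequential edges and a single structural edge between positions 0 and 4 with weight 0.9, since the second pairing falls below the cutoff. Aligning two such networks is the part handled by the probabilistic network alignment techniques mentioned in the abstract and is not shown here.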


RNA ◽  
2012 ◽  
Vol 18 (7) ◽  
pp. 1319-1327 ◽  
Author(s):  
J. Widmann ◽  
J. Stombaugh ◽  
D. McDonald ◽  
J. Chocholousova ◽  
P. Gardner ◽  
...  
