Indexed Dynamic Programming to Boost Edit Distance and LCSS Computation

Author(s):  
Jérémy Barbay ◽  
Andrés Olivares
2021 ◽  
Vol 25 (2) ◽  
pp. 283-303
Author(s):  
Na Liu ◽  
Fei Xie ◽  
Xindong Wu

Approximate multi-pattern matching is an important and widely used problem, particularly when the patterns contain variable-length wildcards. This paper proposes two suffix-array-based algorithms to solve it. The suffix array is an efficient data structure that existing studies have used for exact string matching as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles the short exact characters in a pattern by dynamic programming, while the second, MMSA-L, handles the long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus show that, in most cases, the two proposed algorithms are more time-efficient than the state-of-the-art comparison algorithms.
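As background for the abstract above, the following is a minimal sketch of exact matching with a suffix array. It is not MMSA-S or MMSA-L, whose details the abstract does not give. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive construction by sorting suffixes is an assumption chosen for brevity, not an efficient method.

```python
def build_suffix_array(text):
    # Naive construction: sort suffix start positions by the suffix text.
    # Simple but slow for long texts; real indexes use faster algorithms.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # All suffixes that start with `pattern` form one contiguous block
    # in the suffix array, so two binary searches locate the block.
    k = len(pattern)

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-k prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    hi = len(sa)
    while lo < hi:  # first suffix whose length-k prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(find_occurrences(text, sa, "ana"))
```

Each query costs a logarithmic number of prefix comparisons once the array is built, which is why the structure suits repeated matching against the same text.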


2012 ◽  
Vol 19 (10) ◽  
pp. 1089-1104 ◽  
Author(s):  
Tomoya Mori ◽  
Takeyuki Tamura ◽  
Daiji Fukagawa ◽  
Atsuhiro Takasu ◽  
Etsuji Tomita ◽  
...  

Author(s):  
Maren Brand ◽  
Nguyen Khoa Tran ◽  
Philipp Spohr ◽  
Sven Schrinner ◽  
Gunnar W. Klau

Abstract: We consider the homo-edit distance problem, which asks for the minimum number of homo-deletions or homo-insertions needed to convert one string into another. A homo-insertion is the insertion of a string of equal characters into another string, while a homo-deletion is the inverse operation. We show how to compute the homo-edit distance of two strings in polynomial time: we first demonstrate that the problem is equivalent to computing a common subsequence of the two input strings with a minimum number of homo-deletions, and then present a dynamic programming solution for the reformulated problem.

2012 ACM Subject Classification: Applied computing → Bioinformatics; Applied computing → Molecular sequence analysis; Theory of computation → Dynamic programming
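To make the definitions above concrete, here is a brute-force sketch for very short strings. It relies on the reformulation stated in the abstract: reduce both strings by homo-deletions to a common string and minimize the total number of deletions. It is exponential and is not the authors' polynomial-time dynamic program; the function names are hypothetical.

```python
from collections import deque


def homo_deletion_costs(s):
    # Breadth-first search over every string reachable from s by
    # homo-deletions; maps each one to the fewest deletions needed.
    cost = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        for i in range(len(cur)):
            j = i
            # Grow the block cur[i:j] while it consists of equal characters,
            # trying the deletion of every such block that starts at i.
            while j < len(cur) and cur[j] == cur[i]:
                j += 1
                nxt = cur[:i] + cur[j:]
                if nxt not in cost:
                    cost[nxt] = cost[cur] + 1
                    queue.append(nxt)
    return cost


def homo_edit_distance(s, t):
    # Cheapest common string reachable from both inputs by homo-deletions.
    # The empty string is always reachable, so the minimum is well defined.
    from_s = homo_deletion_costs(s)
    from_t = homo_deletion_costs(t)
    return min(c + from_t[u] for u, c in from_s.items() if u in from_t)


print(homo_edit_distance("aba", ""))
```

For example, turning "aba" into the empty string takes two operations: deleting "b" leaves "aa", which is a single block of equal characters and can be removed in one further homo-deletion.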


2009 ◽  
Vol 14 (6) ◽  
pp. 739-745 ◽  
Author(s):  
Hongfei Pan ◽  
Dong Liang ◽  
Jun Tang ◽  
Nian Wang ◽  
Wei Li

2021 ◽  
Author(s):  
Pesho Ivanov ◽  
Benjamin Bichsel ◽  
Martin Vechev

We present a novel A* seed heuristic enabling fast and optimal sequence-to-graph alignment, guaranteed to minimize the edit distance of the alignment assuming non-negative edit costs. We phrase optimal alignment as a shortest path problem and solve it by instantiating the A* algorithm with our novel seed heuristic. The key idea of the seed heuristic is to extract seeds from the read, locate them in the reference, mark preceding reference positions with crumbs, and use the crumbs to direct the A* search. We prove admissibility of the seed heuristic, thus guaranteeing alignment optimality. Our implementation extends the free and open-source AStarix aligner and demonstrates that the seed heuristic outperforms all state-of-the-art optimal aligners, including GraphAligner, Vargas, PaSGAL, and the prefix heuristic previously employed by AStarix. Specifically, we achieve a consistent speedup of >60x on both short Illumina reads and long HiFi reads (up to 25kbp), on both the E. coli linear reference genome (1Mbp) and the MHC variant graph (5Mbp). Our speedup is enabled by the seed heuristic consistently skipping >99.99% of the table cells that optimal aligners based on dynamic programming compute.
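The idea of phrasing alignment as a shortest path problem can be sketched for the simpler case of two plain strings. The example below runs A* over the edit graph with unit costs and a basic length-difference heuristic. This is an assumption made for illustration: it is not the seed heuristic of the paper, it does not handle reference graphs, and `astar_edit_distance` is a hypothetical name.

```python
import heapq


def astar_edit_distance(a, b):
    # Nodes are cells (i, j) of the alignment table; the goal is (n, m).
    # Returns (edit distance, number of cells expanded).
    n, m = len(a), len(b)

    def h(i, j):
        # Admissible lower bound: the remaining lengths must be equalized
        # by at least this many insertions or deletions.
        return abs((n - i) - (m - j))

    best = {(0, 0): 0}
    heap = [(h(0, 0), 0, 0, 0)]  # entries are (f = g + h, g, i, j)
    expanded = 0
    while heap:
        f, g, i, j = heapq.heappop(heap)
        if g > best.get((i, j), float("inf")):
            continue  # stale queue entry
        expanded += 1
        if i == n and j == m:
            return g, expanded
        steps = []
        if i < n and j < m:  # match (cost 0) or substitution (cost 1)
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        if i < n:  # deletion from a
            steps.append((i + 1, j, 1))
        if j < m:  # insertion from b
            steps.append((i, j + 1, 1))
        for ni, nj, cost in steps:
            ng = g + cost
            if ng < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = ng
                heapq.heappush(heap, (ng + h(ni, nj), ng, ni, nj))
    raise RuntimeError("goal cell is always reachable")


distance, cells = astar_edit_distance("kitten", "sitting")
print(distance, cells)
```

Because the heuristic never overestimates the remaining cost, the first time the goal cell is popped its cost is optimal. A more informative heuristic lets the search expand fewer cells than the full table, which is the effect the abstract attributes to its seed heuristic.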

