Indexed Dynamic Programming to Boost Edit Distance and LCSS Computation

Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽ 10.3233/ida-205087 ◽ 2021 ◽ Vol 25 (2) ◽ pp. 283-303

Author(s): Na Liu ◽ Fei Xie ◽ Xindong Wu

Keyword(s): Dynamic Programming ◽ Data Structure ◽ Pattern Matching ◽ Edit Distance ◽ State Of The Art ◽ Suffix Array ◽ Variable Length ◽ Distance Method ◽ Efficient Data ◽ Comparison Algorithms

Approximate multi-pattern matching is an important and widely used task, particularly when patterns contain variable-length wildcards. This paper proposes two suffix array-based algorithms for this problem. The suffix array is an efficient data structure that existing studies use for exact string matching, as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles short exact characters in a pattern by dynamic programming, while the second, MMSA-L, handles long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that, in most cases, the two proposed algorithms are more time-efficient than the state-of-the-art comparison algorithms.
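
As a rough illustration of the data structure this abstract builds on, the following is a minimal sketch of exact matching with a suffix array. It does not implement MMSA-S or MMSA-L and handles neither wildcards nor approximate matches. The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the naive sort-based construction is an assumption chosen for brevity rather than efficiency.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (sorts full suffixes); fine for small inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return all start positions of pattern in text, in increasing order."""
    sa = build_suffix_array(text)
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


print(find_occurrences("banana", "ana"))
```

All suffixes that start with the pattern are adjacent in the sorted order, which is why two binary searches are enough to locate every occurrence.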

Speeding-Up the Dynamic Programming Procedure for the Edit Distance of Two Strings

Communications in Computer and Information Science - Database and Expert Systems Applications ◽ 10.1007/978-3-030-27684-3_9 ◽ 2019 ◽ pp. 59-66

Author(s): Giuseppe Lancia ◽ Marcello Dalpasso

Keyword(s): Dynamic Programming ◽ Edit Distance

A Clique-Based Method Using Dynamic Programming for Computing Edit Distance Between Unordered Trees

Journal of Computational Biology ◽ 10.1089/cmb.2012.0133 ◽ 2012 ◽ Vol 19 (10) ◽ pp. 1089-1104 ◽ Cited By ~ 12

Author(s): Tomoya Mori ◽ Takeyuki Tamura ◽ Daiji Fukagawa ◽ Atsuhiro Takasu ◽ Etsuji Tomita ◽ ...

Keyword(s): Dynamic Programming ◽ Edit Distance ◽ Unordered Trees

The Homo-Edit Distance Problem

10.1101/2020.05.27.118273 ◽ 2020 ◽ Cited By ~ 1

Author(s): Maren Brand ◽ Nguyen Khoa Tran ◽ Philipp Spohr ◽ Sven Schrinner ◽ Gunnar W. Klau

Keyword(s): Dynamic Programming ◽ Edit Distance ◽ Theory Of Computation ◽ Molecular Sequence ◽ Analysis Theory ◽ Common Subsequence ◽ Minimum Number ◽ Distance Problem ◽ Input Strings ◽ Programming Solution

Abstract: We consider the homo-edit distance problem, which is the minimum number of homo-deletions or homo-insertions to convert one string into another. A homo-insertion is the insertion of a string of equal characters into another string, while a homo-deletion is the inverse operation. We show how to compute the homo-edit distance of two strings in polynomial time: We first demonstrate that the problem is equivalent to computing a common subsequence of the two input strings with a minimum number of homo-deletions and then present a dynamic programming solution for the reformulated problem.

2012 ACM Subject Classification: Applied computing → Bioinformatics; Applied computing → Molecular sequence analysis; Theory of computation → Dynamic programming
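
To make the homo-deletion operation concrete, here is a small sketch for one special case only: the minimum number of homo-deletions needed to reduce a single string to the empty string. It uses a standard interval dynamic program and is not the algorithm from the paper, which handles two input strings. The function name `homo_deletions_to_empty` is hypothetical.

```python
from functools import lru_cache


def homo_deletions_to_empty(s: str) -> int:
    """Minimum number of homo-deletions that reduce s to the empty string.

    A homo-deletion removes one contiguous run of equal characters.
    """

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # Cost of deleting the substring s[i..j] (inclusive).
        if i > j:
            return 0
        # Option 1: delete s[i] in an operation of its own.
        best = 1 + solve(i + 1, j)
        # Option 2: clear s[i+1..k-1] first, so that s[i] becomes adjacent
        # to an equal character s[k] and is removed together with it.
        for k in range(i + 1, j + 1):
            if s[k] == s[i]:
                best = min(best, solve(i + 1, k - 1) + solve(k, j))
        return best

    return solve(0, len(s) - 1)


print(homo_deletions_to_empty("abcba"))
```

For "abcba", one optimal sequence deletes "c", then the now-adjacent "bb", then "aa".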

A Dynamic Programming A* Algorithm for Computing Unordered Tree Edit Distance

2013 Second IIAI International Conference on Advanced Applied Informatics ◽ 10.1109/iiai-aai.2013.71 ◽ 2013

Author(s): Takuya Yoshino ◽ Shoichi Higuchi ◽ Kouichi Hirata

Keyword(s): Dynamic Programming ◽ Edit Distance ◽ A* Algorithm ◽ Tree Edit Distance ◽ Unordered Tree

Shape recognition and retrieval based on edit distance and dynamic programming

Tsinghua Science & Technology ◽ 10.1016/s1007-0214(09)70144-0 ◽ 2009 ◽ Vol 14 (6) ◽ pp. 739-745 ◽ Cited By ~ 1

Author(s): Hongfei Pan ◽ Dong Liang ◽ Jun Tang ◽ Nian Wang ◽ Wei Li

Keyword(s): Dynamic Programming ◽ Edit Distance ◽ Shape Recognition

Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

10.1101/2021.11.05.467453 ◽ 2021

Author(s): Pesho Ivanov ◽ Benjamin Bichsel ◽ Martin Vechev

Keyword(s): Dynamic Programming ◽ Edit Distance ◽ Reference Genome ◽ State Of The Art ◽ Optimal Alignment ◽ Reference Mark ◽ A* Algorithm ◽ Optimal Sequence ◽ E Coli ◽ Graph Alignment

We present a novel A* seed heuristic enabling fast and optimal sequence-to-graph alignment, guaranteed to minimize the edit distance of the alignment assuming non-negative edit costs. We phrase optimal alignment as a shortest path problem and solve it by instantiating the A* algorithm with our novel seed heuristic. The key idea of the seed heuristic is to extract seeds from the read, locate them in the reference, mark preceding reference positions by crumbs, and use the crumbs to direct the A* search. We prove admissibility of the seed heuristic, thus guaranteeing alignment optimality. Our implementation extends the free and open source AStarix aligner and demonstrates that the seed heuristic outperforms all state-of-the-art optimal aligners including GraphAligner, Vargas, PaSGAL, and the prefix heuristic previously employed by AStarix. Specifically, we achieve a consistent speedup of >60x on both short Illumina reads and long HiFi reads (up to 25kbp), on both the E. coli linear reference genome (1Mbp) and the MHC variant graph (5Mbp). Our speedup is enabled by the seed heuristic consistently skipping >99.99% of the table cells that optimal aligners based on dynamic programming compute.
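
The idea of phrasing alignment as a shortest path problem can be sketched for the simpler case of two plain strings with unit edit costs. The example below runs Dijkstra's algorithm, which corresponds to A* with a zero heuristic, on the alignment grid. It does not implement the seed heuristic, sequence-to-graph alignment, or anything from AStarix; the function name `edit_distance_shortest_path` is hypothetical.

```python
import heapq


def edit_distance_shortest_path(a: str, b: str) -> int:
    """Unit-cost edit distance of a and b via shortest path on the alignment grid.

    Node (i, j) means the first i characters of a are aligned with the
    first j characters of b.
    """
    target = (len(a), len(b))
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (cost so far, i, j)

    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale queue entry, a cheaper path was already found

        steps = []
        if i < len(a) and j < len(b):
            # Diagonal edge: match (cost 0) or substitution (cost 1).
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        if i < len(a):
            steps.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            steps.append((i, j + 1, 1))  # insert b[j]

        for ni, nj, cost in steps:
            nd = d + cost
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))

    return dist[target]  # not reached: the target is always reachable


print(edit_distance_shortest_path("kitten", "sitting"))
```

Unlike a full dynamic programming table, the search only expands grid cells whose cost does not exceed the final distance; an admissible heuristic such as the one described in the abstract would prune the explored region further.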
