The Homo-Edit Distance Problem

Abstract. We consider the homo-edit distance problem: computing the minimum number of homo-insertions and homo-deletions needed to convert one string into another. A homo-insertion inserts a string of equal characters into another string, and a homo-deletion is the inverse operation. We show how to compute the homo-edit distance of two strings in polynomial time. We first show that the problem is equivalent to finding a common subsequence of the two input strings that requires the minimum number of homo-deletions, and then present a dynamic programming solution for the reformulated problem.

2012 ACM Subject Classification: Applied computing → Bioinformatics; Applied computing → Molecular sequence analysis; Theory of computation → Dynamic programming
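
The abstract does not state the dynamic program itself, so the Python sketch below is only an exhaustive reference implementation for tiny strings, not the authors' polynomial-time algorithm. It assumes the reformulation described above: homo-delete x down to a common subsequence t, then homo-insert up to y, where the second phase is costed as homo-deleting y down to t (a homo-insertion being the inverse of a homo-deletion). The function names `homo_deletion_distances` and `homo_edit_distance` are hypothetical.

```python
from collections import deque


def homo_deletion_distances(s: str) -> dict[str, int]:
    """Breadth-first search over every string reachable from s.

    One step is a homo-deletion: removing a non-empty substring s[i:j]
    whose characters are all equal. Returns a map from each reachable
    string (every subsequence of s) to its minimum number of steps.
    """
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        n = len(cur)
        for i in range(n):
            j = i + 1
            # Grow cur[i:j] while it is still a run of one character.
            while j <= n and cur[j - 1] == cur[i]:
                nxt = cur[:i] + cur[j:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
                j += 1
    return dist


def homo_edit_distance(x: str, y: str) -> int:
    """Exhaustive homo-edit distance for tiny inputs.

    Assumes the reformulation from the abstract: homo-delete x down to a
    common subsequence t, then homo-insert up to y. Homo-insertions are
    the inverse of homo-deletions, so the second phase costs the same as
    homo-deleting y down to t.
    """
    dx = homo_deletion_distances(x)
    dy = homo_deletion_distances(y)
    # The empty string is always a common subsequence, so min() is safe.
    return min(dx[t] + dy[t] for t in dx.keys() & dy.keys())
```

For example, `homo_edit_distance("aab", "b")` is 1 (a single homo-deletion of `aa`), while `homo_edit_distance("ab", "ba")` is 2. The search visits every subsequence of its input, so its cost grows exponentially with string length; it is useful only for cross-checking a faster implementation on small cases.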

Automata Technique for The LCS Problem

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/35/1/13293 ◽

2019 ◽

Vol 35 (1) ◽

pp. 21-37

Author(s):

Trường Huy Nguyễn

Keyword(s):

Dynamic Programming ◽

Parallel Algorithms ◽

Time Complexity ◽

Dynamic Programming Algorithm ◽

Longest Common Subsequence ◽

Programming Algorithm ◽

Worst Case ◽

Common Subsequence ◽

Input Strings ◽

Upper Estimate

In this paper, we introduce two algorithms, one sequential and one parallel, that use an automata technique to compute the length of a longest common subsequence of two strings efficiently in practice. For two input strings of lengths m and n with m ≤ n, the parallel algorithm uses k processors (k ≤ m) and runs in O(n) time in the worst case, where k is an upper estimate of the length of a longest common subsequence of the two strings. These results are based on the Knapsack Shaking approach proposed by P. T. Huy et al. in 2002. Experimental results show that, for an alphabet of size 256, our sequential and parallel algorithms are about 65.85 and 3.41m times faster, respectively, than the standard dynamic programming algorithm proposed by Wagner and Fischer in 1974.
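
For reference, the sketch below is a standard quadratic dynamic program for LCS length, of the kind the abstract uses as its baseline. It illustrates that baseline only, not the paper's automata or Knapsack Shaking method. It keeps just the previous row of the table, so it uses O(n) memory and O(mn) time; the name `lcs_length` is hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b."""
    # prev[j] = LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend an LCS of the two shorter prefixes by this match.
                cur[j] = prev[j - 1] + 1
            else:
                # Drop the last character of one prefix or the other.
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` returns 4 (one LCS is `BCBA`).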

Edit distance computation with minimum number of edit operations in database management system and information retrieval

International Journal of Scientific and Research Publications (IJSRP) ◽

10.29322/ijsrp.9.09.2019.p9385 ◽

2019 ◽

Vol 9 (9) ◽

pp. p9385

Author(s):

Yi Mar Myint

Keyword(s):

Information Retrieval ◽

Management System ◽

Database Management ◽

Edit Distance ◽

Database Management System ◽

Distance Computation ◽

Minimum Number

Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽

10.3233/ida-205087 ◽

2021 ◽

Vol 25 (2) ◽

pp. 283-303

Author(s):

Na Liu ◽

Fei Xie ◽

Xindong Wu

Keyword(s):

Dynamic Programming ◽

Data Structure ◽

Pattern Matching ◽

Edit Distance ◽

State Of The Art ◽

Suffix Array ◽

Variable Length ◽

Distance Method ◽

Efficient Data ◽

Comparison Algorithms

Approximate multi-pattern matching in which the patterns contain variable-length wildcards is an important and widely used problem. In this paper, two suffix array-based algorithms are proposed to solve it. In existing studies, the suffix array has proved an efficient data structure for exact string matching as well as for approximate and multi-pattern matching. The algorithm MMSA-S handles patterns whose exact-character segments are short, using dynamic programming, while the algorithm MMSA-L handles long exact-character segments using an edit-distance method. Experimental results on the Pizza & Chili corpus show that, in most cases, both new algorithms are more time-efficient than the state-of-the-art algorithms they are compared with.
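
The abstract does not describe MMSA-S or MMSA-L in detail. As a minimal sketch of the underlying data structure only, the Python code below builds a suffix array naively and uses two binary searches to find all exact occurrences of a pattern; a hypothetical helper `match_with_gap` then pairs occurrences of two exact segments separated by a variable-length wildcard gap of `min_gap` to `max_gap` characters. It performs exact matching only, with no edit-distance tolerance, and all function names are illustrative assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction: sort start positions by the suffix each begins.
    # Adequate for a demo, far too slow for large corpora.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Matching suffixes form one contiguous block of the suffix array.
    return sorted(sa[start:lo])


def match_with_gap(text: str, sa: list[int], p1: str, p2: str,
                   min_gap: int, max_gap: int) -> list[tuple[int, int]]:
    # Hypothetical helper: pairs (i, j) with p1 at i, p2 at j, and
    # min_gap <= j - (i + len(p1)) <= max_gap wildcard characters between.
    starts2 = set(find_occurrences(text, sa, p2))
    result = []
    for i in find_occurrences(text, sa, p1):
        for gap in range(min_gap, max_gap + 1):
            j = i + len(p1) + gap
            if j in starts2:
                result.append((i, j))
    return result
```

On `text = "banana"`, `find_occurrences(text, sa, "ana")` returns `[1, 3]`, and `match_with_gap(text, sa, "b", "na", 1, 3)` returns `[(0, 2), (0, 4)]`. Practical suffix arrays use O(n log n) or linear-time construction instead of sorting suffix copies.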

Speeding-Up the Dynamic Programming Procedure for the Edit Distance of Two Strings

Communications in Computer and Information Science - Database and Expert Systems Applications ◽

10.1007/978-3-030-27684-3_9 ◽

2019 ◽

pp. 59-66

Author(s):

Giuseppe Lancia ◽

Marcello Dalpasso

Keyword(s):

Dynamic Programming ◽

Edit Distance

Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening

Current Topics in Medicinal Chemistry ◽

10.2174/1568026620666200603122000 ◽

2020 ◽

Vol 20 (18) ◽

pp. 1582-1592 ◽

Cited By ~ 1

Author(s):

Carlos Garcia-Hernandez ◽

Alberto Fernández ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Graph Matching ◽

Optimization Techniques ◽

Graph Edit Distance ◽

Structure Activity ◽

Future Drug ◽

Minimum Number ◽

Type Node ◽

Activity Information

Background: Graph edit distance is a methodology for error-tolerant graph matching. It estimates the distance between two graphs as the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, each have an associated edit cost that must be chosen for the problem at hand. Objective: This study focuses on using optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs serve as reduced structural representations of molecules, using pharmacophore-type node descriptions to encode the relevant molecular properties; this reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and RDKit were used. Results: In the experiments, the graph edit distance with learned costs performed as well as or better than with predefined costs, as shown on six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance with learned edit costs is useful for identifying bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.
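
To make the definition concrete, here is a brute-force Python sketch of exact graph edit distance for tiny node-labelled, undirected graphs. The cost parameters `node_sub`, `node_indel`, and `edge_indel` stand in for the edit costs the study learns; their default values, the graph encoding, and the function name are assumptions for illustration. Exhaustive search over node mappings is neither the study's cost-learning method nor practical beyond a handful of nodes.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        node_sub=1.0, node_indel=1.0, edge_indel=1.0):
    """Exact GED by exhaustive search (tiny simple graphs only).

    nodes*: dict node -> label; edges*: set of frozenset({u, v}).
    Each node of graph 1 maps to a distinct node of graph 2 or to None
    (deleted); graph-2 nodes left unmatched count as insertions.
    """
    g1 = list(nodes1)
    candidates = list(nodes2) + [None] * len(g1)
    best = float("inf")
    for image in set(permutations(candidates, len(g1))):
        phi = dict(zip(g1, image))
        cost = 0.0
        for u, v in phi.items():
            if v is None:
                cost += node_indel          # node deletion
            elif nodes1[u] != nodes2[v]:
                cost += node_sub            # label substitution
        used = {v for v in image if v is not None}
        cost += node_indel * (len(nodes2) - len(used))  # node insertions
        kept = set()
        for e in edges1:
            a, b = (phi[x] for x in e)
            if a is not None and b is not None and frozenset((a, b)) in edges2:
                kept.add(frozenset((a, b)))  # edge preserved by the mapping
            else:
                cost += edge_indel          # edge deletion
        cost += edge_indel * (len(edges2) - len(kept))  # edge insertions
        best = min(best, cost)
    return best
```

With unit costs, a triangle and a three-node path whose nodes all share one label are at distance 1 (one edge deletion). Changing the three cost arguments changes which edit path is cheapest, which is the quantity a cost-learning procedure would tune.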

Efficient Approximate Approach for Graph Edit Distance Problem

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.08.027 ◽

2021 ◽

Author(s):

Adel Dabah ◽

Ibrahim Chegrane ◽

Saïd Yahiaoui

Keyword(s):

Edit Distance ◽

Graph Edit Distance ◽

Approximate Approach ◽

Distance Problem

Approximate dynamic programming solution for the optimal nitrogen oxides/particulate matter trade-off control of a WAPS engine

Advances in Mechanical Engineering ◽

10.1177/1687814017751781 ◽

2018 ◽

Vol 10 (2) ◽

pp. 168781401775178

Author(s):

Zhijian Huang ◽

Huan Zheng ◽

Wentao Chen ◽

Qin Zhang ◽

Qili Wu ◽

...

Keyword(s):

Dynamic Programming ◽

Particulate Matter ◽

Nitrogen Oxides ◽

Approximate Dynamic Programming ◽

Trade Off ◽

Programming Solution ◽

Dynamic Programming Solution

Invariance of edit-distance to tempo in rhythm similarity

Psychology of Music ◽

10.1177/0305735620971030 ◽

2020 ◽

pp. 030573562097103

Author(s):

Matthew Moritz ◽

Matthew Heard ◽

Hyun-Woong Kim ◽

Yune S Lee

Keyword(s):

Edit Distance ◽

Ecological Validity ◽

Perceptual Similarity ◽

Music Psychology ◽

Mantel Tests ◽

Distance Model ◽

Minimum Number ◽

History Of ◽

Similarity Ratings ◽

Similarity Judgments

Despite the long history of music psychology, rhythm similarity perception remains largely unexplored. Several studies suggest that edit-distance, the minimum number of notational changes required to transform one rhythm into another, predicts similarity judgments. However, the ecological validity of edit-distance remains elusive. We investigated whether the edit-distance model can predict perceptual similarity between rhythms that also differed in tempo, a fundamental characteristic of music. Eighteen participants rated the similarity between a series of rhythms presented in pairs. The edit-distance of these rhythms varied from 1 to 4, and tempo was set at either 90 or 150 beats per minute (BPM). A test of congruence among distance matrices (CADM) indicated significant inter-participant reliability of ratings, and non-metric multidimensional scaling (nMDS) showed that the ratings clustered by both tempo and whether rhythms shared an identical onset pattern, a novel effect we termed rhythm primacy. Finally, Mantel tests revealed significant correlations of edit-distance with similarity ratings for both within- and between-tempo rhythms. Our findings corroborate that edit-distance predicts rhythm similarity and demonstrate that it accounts for the similarity of rhythms that differ markedly in tempo. This suggests that rhythmic gestalt is invariant to differences in tempo.
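
The abstract does not spell out which notational changes are counted. Assuming each rhythm is written as a fixed-grid onset string (`x` for an onset, `.` for a rest) and that a change is a unit-cost insertion, deletion, or substitution of one grid symbol, the edit-distance reduces to the Levenshtein distance sketched below in Python. Because the strings encode notation rather than timing, the value is the same at 90 or 150 BPM. The encoding and the function name are hypothetical.

```python
def rhythm_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two onset strings (unit costs)."""
    # prev[j] = distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # substitute or match
        prev = cur
    return prev[-1]
```

For example, `rhythm_edit_distance("x.x.x.x.", "x.x.x.xx")` is 1: the final rest becomes an onset.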

Dynamic programming solution for the optimal allocation of mine manpower to multiple work activities

Mining Science and Technology ◽

10.1016/s0167-9031(89)90924-9 ◽

1989 ◽

Vol 8 (1) ◽

pp. 65-71

Author(s):

R. Larry Grayson

Keyword(s):

Dynamic Programming ◽

Optimal Allocation ◽

Work Activities ◽

Programming Solution ◽

Dynamic Programming Solution

Pipe resistance coefficients identification of water networks considering solvable conditions

Canadian Journal of Civil Engineering ◽

10.1139/cjce-2019-0680 ◽

2020 ◽

Author(s):

Yongxin Liu ◽

Qingting Du ◽

Peng Luo ◽

Pinghua Zou ◽

Zhongyi He

Keyword(s):

Linearization Method ◽

Matrix Analysis ◽

Analysis Process ◽

Analysis Theory ◽

Hydraulic Conditions ◽

The Matrix ◽

Successive Linearization Method ◽

Minimum Number ◽

Coefficients Identification ◽

Resistance Coefficients

To make hydraulic models more accurate and realistic, this paper proposes a method to identify pipe resistance coefficients (PRCs) from heads measured at a subset of nodes. A successive linearization method is adopted to solve for the pipe flows, node heads, and PRCs. Based on matrix analysis theory, the relationships among the number and location of measurement sites, the number of hydraulic conditions (HCs), and the solvable condition of PRC identification are established. The proposed method can identify all the PRCs whenever the solvable condition is satisfied. In addition, the analysis process can be used as a tool to evaluate whether a given arrangement of measurement sites meets the solvable condition of PRC identification, and to determine the minimum number of HCs. Case studies verified the feasibility of the proposed method, and the accuracy of the identified PRCs satisfied practical engineering requirements.