The Homo-Edit Distance Problem

Author(s):  
Maren Brand ◽  
Nguyen Khoa Tran ◽  
Philipp Spohr ◽  
Sven Schrinner ◽  
Gunnar W. Klau

Abstract

We consider the homo-edit distance problem, which asks for the minimum number of homo-deletions or homo-insertions needed to convert one string into another. A homo-insertion is the insertion of a string of equal characters into another string, while a homo-deletion is the inverse operation. We show how to compute the homo-edit distance of two strings in polynomial time: we first demonstrate that the problem is equivalent to computing a common subsequence of the two input strings with a minimum number of homo-deletions and then present a dynamic programming solution for the reformulated problem.

2012 ACM Subject Classification: Applied computing → Bioinformatics; Applied computing → Molecular sequence analysis; Theory of computation → Dynamic programming
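The reformulation stated in the abstract can be illustrated with a small brute-force sketch. It is not the polynomial-time dynamic program of the paper: it simply enumerates, by breadth-first search, every string reachable from each input through homo-deletions and then picks the common string with the smallest combined deletion count. The function names are hypothetical, and the running time is exponential, so it is only suitable for very short strings.

```python
from collections import deque


def homo_deletion_distances(s):
    """Map every string reachable from s by homo-deletions to the
    minimum number of homo-deletions needed to reach it (BFS)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        for i in range(len(cur)):
            j = i
            # Extend the block cur[i:j] while it consists of equal characters;
            # every such block (not only maximal runs) may be deleted.
            while j < len(cur) and cur[j] == cur[i]:
                j += 1
                nxt = cur[:i] + cur[j:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


def homo_edit_distance_bruteforce(s, t):
    """Illustrative only: minimise deletions from s plus deletions from t
    over all strings reachable from both (the empty string always is)."""
    ds = homo_deletion_distances(s)
    dt = homo_deletion_distances(t)
    return min(ds[c] + dt[c] for c in ds.keys() & dt.keys())
```

For example, `homo_edit_distance_bruteforce("aab", "b")` evaluates to 1, because deleting the block `aa` is a single homo-deletion.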

2019 ◽  
Vol 35 (1) ◽  
pp. 21-37
Author(s):  
Trường Huy Nguyễn

In this paper, we introduce two algorithms, one sequential and one parallel, that are efficient in practice for computing the length of a longest common subsequence of two strings using an automata technique. For two input strings of lengths m and n with m ≤ n, the parallel algorithm uses k processors (k ≤ m) and runs in O(n) time in the worst case, where k is an upper estimate of the length of a longest common subsequence of the two strings. These results are based on the Knapsack Shaking approach proposed by P. T. Huy et al. in 2002. Experimental results show that, for an alphabet of size 256, our sequential and parallel algorithms are about 65.85 and 3.41m times faster, respectively, than the standard dynamic programming algorithm proposed by Wagner and Fischer in 1974.
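For orientation, the standard dynamic programming baseline mentioned above can be sketched as follows. This is the textbook quadratic-time recurrence for the length of a longest common subsequence, not the automata-based algorithms of the paper; the function name is hypothetical.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b.

    Textbook O(len(a) * len(b)) dynamic program, keeping only two rows.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                # Matching characters extend the LCS of the two prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # Otherwise drop the last character of one prefix.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

On the classic example, `lcs_length("ABCBDAB", "BDCABA")` evaluates to 4.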


2021 ◽  
Vol 25 (2) ◽  
pp. 283-303
Author(s):  
Na Liu ◽  
Fei Xie ◽  
Xindong Wu

Approximate multi-pattern matching is an important and widely used task, particularly when the patterns contain variable-length wildcards. In this paper, two suffix array-based algorithms are proposed to solve this problem. The suffix array is an efficient data structure that existing studies have used for exact string matching as well as for approximate pattern matching and multi-pattern matching. The first algorithm, MMSA-S, handles the short exact characters in a pattern by dynamic programming, while the second, MMSA-L, handles the long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus demonstrate that the two proposed algorithms are, in most cases, more time-efficient than the state-of-the-art comparison algorithms.
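As background on the underlying data structure, the following minimal sketch shows exact matching with a suffix array. It does not implement MMSA-S or MMSA-L and does not handle wildcards; the construction is the naive sort of all suffixes, chosen for brevity rather than speed, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction (sorts full suffixes); fine for short inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, via two binary searches
    over the suffix array sa."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

With `text = "banana"` and `sa = build_suffix_array(text)`, the call `find_occurrences(text, sa, "ana")` returns `[1, 3]`.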


2020 ◽  
Vol 20 (18) ◽  
pp. 1582-1592
Author(s):  
Carlos Garcia-Hernandez ◽  
Alberto Fernández ◽  
Francesc Serratosa

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. This methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, each have an associated edit cost that has to be determined depending on the problem. Objective: This study focuses on the use of optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used. Results: In the experiments, the graph edit distance using learned costs performed as well as or better than the graph edit distance using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.


2020 ◽  
pp. 030573562097103
Author(s):  
Matthew Moritz ◽  
Matthew Heard ◽  
Hyun-Woong Kim ◽  
Yune S Lee

Despite the long history of music psychology, rhythm similarity perception remains largely unexplored. Several studies suggest that edit-distance—the minimum number of notational changes required to transform one rhythm into another—predicts similarity judgments. However, the ecological validity of edit-distance remains elusive. We investigated whether the edit-distance model can predict perceptual similarity between rhythms that also differed in a fundamental characteristic of music—tempo. Eighteen participants rated the similarity between a series of rhythms presented in a pairwise fashion. The edit-distance of these rhythms varied from 1 to 4, and tempo was set at either 90 or 150 beats per minute (BPM). A test of congruence among distance matrices (CADM) indicated significant inter-participant reliability of ratings, and non-metric multidimensional scaling (nMDS) visualized that the ratings were clustered based upon both tempo and whether rhythms shared an identical onset pattern, a novel effect we termed rhythm primacy. Finally, Mantel tests revealed significant correlations of edit-distance with similarity ratings on both within- and between-tempo rhythms. Our findings corroborated that the edit-distance predicts rhythm similarity and demonstrated that the edit-distance accounts for similarity of rhythms that are markedly different in tempo. This suggests that rhythmic gestalt is invariant to differences in tempo.
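To make the notion of edit distance concrete, here is a minimal sketch of the classic string edit distance (insertions, deletions, substitutions, each at cost 1). The encoding of a rhythm as a string with `x` for an onset and `.` for a rest is an assumption made for illustration; the abstract does not specify the notation or the exact set of notational changes used in the study.

```python
def edit_distance(a, b):
    """Classic edit distance between sequences a and b, with unit-cost
    insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]  # deleting the first i symbols of a
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


# Hypothetical rhythm encoding: 'x' = onset, '.' = rest.
rhythm_a = "x..x..x."
rhythm_b = "x..x.x.."
```

Under this assumed encoding, `edit_distance(rhythm_a, rhythm_b)` evaluates to 2, since one onset is shifted by one position.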


Author(s):  
Yongxin Liu ◽  
Qingting Du ◽  
Peng Luo ◽  
Pinghua Zou ◽  
Zhongyi He

To make hydraulic models more accurate and realistic, this paper proposes a method to identify pipe resistance coefficients (PRCs) by using the measured heads at partial nodes. A successive linearization method is adopted to solve for the pipe flows, node heads, and PRCs. Based on the matrix analysis theory, the relationships among the number and location of measurement sites, number of hydraulic conditions (HCs), and solvable condition of PRC identification are established. The proposed method can identify all the PRCs when a solvable condition can be satisfied. In addition, the analysis process can be used as a tool to evaluate whether a given arrangement of measurement sites can meet the solvable condition of PRC identification, and to determine the minimum number of HCs. The performed case studies verified the feasibility of the proposed method, and the determined accuracy of PRC identification was noted to satisfy the actual engineering requirements.

