MLS-Join: An Efficient MapReduce-Based Algorithm for String Similarity Self-joins with Edit Distance Constraint

Author(s): Decai Sun, Xiaoxia Wang

2020, Vol 34 (02), pp. 1676-1683
Author(s): Felix Winter, Nysret Musliu, Peter Stuckey

The computation of string similarity measures has been studied extensively in the scientific literature and has applications in a wide variety of areas. One of the most widely used measures is the so-called string edit distance, which captures the minimum number of edit operations required to transform one string into another. Although polynomial-time algorithms are known for calculating the edit distance between two strings, there are also NP-hard problems from practical applications such as scheduling or computational biology that constrain the minimum edit distance between arrays of decision variables. In this work, we propose a novel global constraint to formulate restrictions on the minimum edit distance for such problems. Furthermore, we describe a propagation algorithm and investigate an explanation strategy for an edit distance constraint propagator that can be incorporated into state-of-the-art lazy clause generation solvers. Experimental results show that the proposed propagator significantly improves the performance of existing exact methods, in both solution quality and computation speed, on benchmark problems from the literature.
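For reference, the polynomial-time computation mentioned above is usually the textbook Wagner-Fischer dynamic program, which runs in O(|a|·|b|) time. The minimal sketch below shows that standard algorithm only; it is not the propagator proposed in the paper. The helper `satisfies_min_edit_distance` is hypothetical and merely checks the constraint on a complete assignment, whereas a propagator reasons about partially assigned variables during search.

```python
from collections.abc import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            )
        prev = curr
    return prev[len(b)]


def satisfies_min_edit_distance(x: Sequence, y: Sequence, k: int) -> bool:
    """Hypothetical checker: True if two fully assigned sequences differ by at least k edits."""
    return levenshtein(x, y) >= k


print(levenshtein("kitten", "sitting"))                  # 3
print(satisfies_min_edit_distance([1, 2, 3], [1, 3, 2], 2))  # True
```

Because the function only compares elements for equality, it works on any sequences, including arrays of integer values such as those a solver would assign to decision variables.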


2016, Vol 26 (2), pp. 249-274
Author(s): Minghe Yu, Jin Wang, Guoliang Li, Yong Zhang, Dong Deng, ...

2014, Vol 9 (10)
Author(s): Peisen Yuan, Haoyun Wang, Jianghua Che, Shougang Ren, Huanliang Xu, ...

2016, Vol 42 (1), pp. 48-54
Author(s): Abbas Al-Bakry, Marwa Al-Rikaby

Levenshtein is a minimum edit distance method that is commonly used in spell-checking applications to generate candidates. The method computes the number of edit operations required to transform one string into another, and it recognizes three types of edit operations: deletion, insertion, and substitution of a single letter. Damerau modified the Levenshtein method to consider a fourth type of edit operation, the transposition of two adjacent letters, in addition to these three. However, this modification adds time overhead on top of the quadratic time complexity of the original method. In this paper, we propose a modification of the original Levenshtein method that considers the same four types of operations using a very small number of matching operations, which results in a shorter execution time. We also derive a similarity measure that uses the distance produced by any edit distance method to quantify the similarity between two given strings.
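To make the four operation types concrete, the minimal sketch below implements the well-known optimal string alignment (OSA) distance, a restricted form of Damerau's extension that adds an adjacent-transposition case to the Levenshtein recurrence. It is not the authors' reduced-matching modification, whose details are not given here. The `similarity` function is a hypothetical normalization (1 minus distance divided by the longer length), chosen only for illustration; the paper's own similarity measure may differ.

```python
from collections.abc import Callable, Sequence


def osa_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance plus adjacent transpositions, with each
    substring edited at most once (the restricted Damerau variant)."""
    n, m = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent letters, e.g. "ca" -> "ac".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def similarity(a: Sequence, b: Sequence,
               distance: Callable[[Sequence, Sequence], int] = osa_distance) -> float:
    """Hypothetical normalization: 1.0 for identical inputs, 0.0 when every
    position of the longer input must be edited."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - distance(a, b) / longest


print(osa_distance("abc", "acb"))  # 1 (one transposition)
print(osa_distance("ca", "abc"))   # 3
```

The second call illustrates the "edited at most once" restriction: unrestricted Damerau-Levenshtein gives 2 for "ca" and "abc", but OSA cannot transpose "ca" and then insert between the swapped letters, so it returns 3. Passing a different `distance` function lets the same normalization be applied to the output of any edit distance method.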


2014, Vol 26 (12), pp. 2983-2996
Author(s): Wei Lu, Xiaoyong Du, Marios Hadjieleftheriou, Beng Chin Ooi
