Efficiently Supporting Edit Distance Based String Similarity Search Using B$^+$-Trees

2014 ◽  
Vol 26 (12) ◽  
pp. 2983-2996 ◽  
Author(s):  
Wei Lu ◽  
Xiaoyong Du ◽  
Marios Hadjieleftheriou ◽  
Beng Chin Ooi
2016 ◽  
Vol 26 (2) ◽  
pp. 249-274 ◽  
Author(s):  
Minghe Yu ◽  
Jin Wang ◽  
Guoliang Li ◽  
Yong Zhang ◽  
Dong Deng ◽  
...  

Author(s):  
Snehal Bobhate

In this project, we study string similarity search based on edit distance, which is supported by many database management systems such as Oracle and PostgreSQL. Given the edit distance ed(s, t) between two strings s and t, string similarity search finds every string t in a string database D that is similar to a query string s, such that ed(s, t) ≤ τ for a given threshold τ. In the literature, most existing work takes a filter-and-verify approach, where a filter step is introduced to reduce the high verification cost of two strings by using an index built offline for D. The two state-of-the-art approaches are prefix filtering and local filtering. We propose two new hash-based labeling techniques, named OX label and XX label, for string similarity search. We assign a hash-label, H_s, to a string s, and prune dissimilar strings by comparing the two hash-labels, H_s and H_t, of two strings s and t in the filter step. The key idea is to use the dissimilar bit-patterns between two hash-labels. Our hash-based approaches achieve high efficiency while keeping the index size and index construction time one order of magnitude smaller than those of existing approaches in our experiments.
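The filter-and-verify pattern named in this abstract can be illustrated with a minimal Python sketch. It does not implement the OX or XX labels, whose construction the abstract does not give. As a stand-in it assumes a simple length filter, which is safe because two strings whose lengths differ by more than the threshold cannot be within that edit distance. The function names `edit_distance` and `similarity_search` and the parameter `tau` are hypothetical, chosen only for this illustration.

```python
def edit_distance(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance, O(len(s) * len(t)) time."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_search(query: str, database: list[str], tau: int) -> list[str]:
    """Return every string t in database with edit_distance(query, t) <= tau."""
    results = []
    for t in database:
        # Filter step (assumed length filter, not the paper's hash labels):
        # a length gap larger than tau already implies a distance above tau.
        if abs(len(query) - len(t)) > tau:
            continue
        # Verify step: run the expensive exact computation on survivors only.
        if edit_distance(query, t) <= tau:
            results.append(t)
    return results


if __name__ == "__main__":
    words = ["sitten", "sitting", "kit", "mitten", "kitchen"]
    print(similarity_search("kitten", words, 2))
```

A stronger filter prunes more candidates before the quadratic verification runs, which is the role the abstract assigns to the hash-label comparison.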


2014 ◽  
Vol 9 (10) ◽  
Author(s):  
Peisen Yuan ◽  
Haoyun Wang ◽  
Jianghua Che ◽  
Shougang Ren ◽  
Huanliang Xu ◽  
...  

2016 ◽  
Vol 42 (1) ◽  
pp. 48-54
Author(s):  
Abbas Al-Bakry ◽  
Marwa Al-Rikaby

Levenshtein is a minimum edit distance method; it is commonly used in spell-checking applications for generating candidates. The method computes the number of edit operations required to transform one string into another, and it recognizes three types of edit operations: deletion, insertion, and substitution of one letter. Damerau modified the Levenshtein method to consider another type of edit operation, the transposition of two adjacent letters, in addition to those three. However, the modification adds to the quadratic time complexity of the original method. In this paper, we propose a modification of the original Levenshtein method that considers the same four types using a very small number of matching operations, which results in a shorter execution time. We also derive a similarity measure that uses the distance produced by any edit distance method to quantify the similarity between two given strings.
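The four edit operations mentioned in this abstract can be sketched in Python with the textbook optimal string alignment (restricted Damerau-Levenshtein) recurrence. This is not the modification proposed by the authors, which the abstract does not specify. The similarity formula, 1 - distance / max(len(s), len(t)), is likewise an assumed common normalization rather than the paper's measure, and the names `osa_distance` and `similarity` are hypothetical.

```python
def osa_distance(s: str, t: str) -> int:
    """Edit distance with deletion, insertion, substitution, and
    transposition of two adjacent letters (optimal string alignment)."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i letters of s[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j letters of t[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Transposition: the last two letters of each prefix are swapped.
            if (i > 1 and j > 1
                    and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(s: str, t: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 when the
    distance equals the longer string's length."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - osa_distance(s, t) / longest


if __name__ == "__main__":
    print(osa_distance("abcd", "abdc"), similarity("abcd", "abdc"))
```

With transposition counted as one operation, a swapped pair such as "abcd" versus "abdc" has distance 1 rather than the 2 that plain Levenshtein reports.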

