Enhanced Levenshtein Edit Distance Method functioning as a String-to-String Similarity Measure

Levenshtein is a minimum edit distance method, commonly used in spell-checking applications to generate candidates. The method computes the number of edit operations required to transform one string into another, and it recognizes three types of edit operation: deletion, insertion, and substitution of one letter. Damerau modified the Levenshtein method to consider a fourth type of edit operation, the transposition of two adjacent letters. However, this modification adds to the quadratic time complexity of the original method. In this paper, we propose a modification of the original Levenshtein method that considers the same four types using a very small number of matching operations, which results in a shorter execution time. We also derive a similarity measure that uses the distance produced by any edit distance method to quantify the similarity between two given strings.
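
For orientation, the distances described in this abstract can be sketched as follows. This is the textbook dynamic-programming formulation, not the enhanced method proposed in the paper, whose details are not given here. The function names `levenshtein`, `osa_distance`, and `similarity` are illustrative, `osa_distance` is the restricted "optimal string alignment" variant of the Damerau extension, and the normalization by the longer string's length is an assumption rather than the paper's similarity measure.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus transposition of two adjacent letters
    (optimal string alignment variant; an assumption, see lead-in)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent transposition: "ab" <-> "ba" costs one operation.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(distance: int, a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - distance / longest
```

With these definitions, swapping two adjacent letters costs two operations under `levenshtein` but only one under `osa_distance`, which is the difference the Damerau modification introduces.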


An Improved String Similarity Measure Based on Combining Information-Theoretic and Edit Distance Methods

Communications in Computer and Information Science - Knowledge Discovery, Knowledge Engineering and Knowledge Management ◽

10.1007/978-3-319-25840-9_15 ◽

2015 ◽

pp. 228-239 ◽

Cited By ~ 1

Author(s):

Thi Thuy Anh Nguyen ◽

Stefan Conrad

Keyword(s):

Similarity Measure ◽

Edit Distance ◽

String Similarity ◽

Information Theoretic ◽

Distance Methods ◽

Combining Information


Suffix array for multi-pattern matching with variable length wildcards

Intelligent Data Analysis ◽

10.3233/ida-205087 ◽

2021 ◽

Vol 25 (2) ◽

pp. 283-303

Author(s):

Na Liu ◽

Fei Xie ◽

Xindong Wu

Keyword(s):

Dynamic Programming ◽

Data Structure ◽

Pattern Matching ◽

Edit Distance ◽

State Of The Art ◽

Suffix Array ◽

Variable Length ◽

Distance Method ◽

Efficient Data ◽

Comparison Algorithms

Approximate multi-pattern matching is an important and widely used problem when patterns contain variable-length wildcards. In this paper, two suffix array-based algorithms are proposed to solve it. The suffix array is an efficient data structure that existing studies use for exact string matching as well as for approximate pattern matching and multi-pattern matching. One algorithm, MMSA-S, handles the short exact characters in a pattern by dynamic programming, while the other, MMSA-L, handles the long exact characters by the edit distance method. Experimental results on the Pizza & Chili corpus show that, in most cases, the two proposed algorithms are more time-efficient than the state-of-the-art comparison algorithms.
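
As background for the data structure named in this abstract, a suffix array and exact pattern lookup can be sketched as below. This covers only the exact-matching building block; it is not MMSA-S or MMSA-L and does not handle wildcards. The names `build_suffix_array` and `find_occurrences` are illustrative, and the construction is deliberately naive (it sorts full suffixes), which is an assumption made for brevity.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes, in lexicographic order of the suffixes.
    Naive construction; fine for short inputs only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of `pattern` in `text`, found by two binary
    searches over the suffix array."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

For example, `find_occurrences("banana", build_suffix_array("banana"), "ana")` locates the two overlapping occurrences of `"ana"`.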


A New Approach to Measuring the Similarity of Indoor Semantic Trajectories

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020090 ◽

2021 ◽

Vol 10 (2) ◽

pp. 90

Author(s):

Jin Zhu ◽

Dayu Cheng ◽

Weiwei Zhang ◽

Ci Song ◽

Jie Chen ◽

...

Keyword(s):

Similarity Measure ◽

Semantic Information ◽

Edit Distance ◽

Similarity Measures ◽

Indoor Positioning ◽

Synthetic Dataset ◽

Shopping Mall ◽

Indoor Space ◽

Trajectory Similarity ◽

Indoor Spaces

People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information, such as indoor points of interest, into the indoor trajectory similarity measurement helps to discover pedestrians with similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is modified from the edit distance, which is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph that is transformed from an indoor floor plan representing the indoor space and is used to compute accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and a real dataset from a shopping mall. The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.


MLS-Join: An Efficient MapReduce-Based Algorithm for String Similarity Self-joins with Edit Distance Constraint

Cloud Computing and Security - Lecture Notes in Computer Science ◽

10.1007/978-3-030-00006-6_60 ◽

2018 ◽

pp. 662-674

Author(s):

Decai Sun ◽

Xiaoxia Wang

Keyword(s):

Edit Distance ◽

Distance Constraint ◽

String Similarity


A Novel Similarity Measure for Group Recommender Systems with Optimal Time Complexity

Communications in Computer and Information Science - Bias and Social Aspects in Search and Recommendation ◽

10.1007/978-3-030-52485-2_10 ◽

2020 ◽

pp. 95-109

Author(s):

Guilherme Ramos ◽

Carlos Caleiro

Keyword(s):

Recommender Systems ◽

Similarity Measure ◽

Time Complexity ◽

Optimal Time ◽

Group Recommender Systems ◽

Group Recommender


An Edge-Based Approach for Virtual Network Embedding Based on the Graph Edit Distance

10.21203/rs.3.rs-1029589/v1 ◽

2021 ◽

Author(s):

Ze Xi Xu ◽

Lei Zhuang ◽

Meng Yang He ◽

Si Jin Yang ◽

Yu Song ◽

...

Keyword(s):

Edit Distance ◽

Virtual Network ◽

Cost Ratio ◽

Virtual Network Embedding ◽

Network Embedding ◽

Graph Edit Distance ◽

Network Resources ◽

Network Resource ◽

Distance Method ◽

Edge Based

Virtualization and resource isolation techniques have enabled the efficient sharing of networked resources. How to control network resource allocation accurately and flexibly has gradually become a research hotspot due to the growth in user demands. Therefore, this paper presents a new edge-based virtual network embedding approach to this problem that employs a graph edit distance method to accurately control resource usage. In particular, to manage network resources efficiently, we restrict the conditions of use of network resources and restrict the structure based on common substructure isomorphism, and we employ an improved spider monkey optimization algorithm to prune redundant information from the substrate network. Experimental results show that the proposed method achieves better performance than existing algorithms in terms of resource management capacity, including energy savings and the revenue-cost ratio.


A New LCS-Neutrosophic Similarity Measure for Text Information Retrieval

Neutrosophic Sets in Decision Analysis and Operations Research - Advances in Logistics, Operations, and Management Science ◽

10.4018/978-1-7998-2555-5.ch012 ◽

2020 ◽

pp. 258-280

Author(s):

Misturah Adunni Alaran ◽

AbdulAkeem Adesina Agboola ◽

Adio Taofiki Akinwale ◽

Olusegun Folorunso

Keyword(s):

Information Retrieval ◽

Similarity Measure ◽

Information Search ◽

Longest Common Subsequence ◽

Data Set ◽

String Similarity ◽

True Match ◽

Neutrosophic Logic ◽

Common Subsequence ◽

Text Information

The reality of human existence and of people's interactions with the things that surround them reveals that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day by day. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method is compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing measures in the information retrieval task, with the evaluation based on precision, recall, highest false match, lowest true match, and separation.
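
The classical LCS computation that this measure builds on can be sketched as below. This is only the standard longest common subsequence, not the neutrosophic measure proposed in the chapter, whose definition is not given here. The function names `lcs_length` and `lcs_similarity` are illustrative, and dividing by the length of the longer string is an assumed normalization.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming
    with two rolling rows."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)              # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))    # skip a letter of a or of b
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Hypothetical normalization of the LCS length into [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest
```

Under this normalization, two identical strings score 1.0 and two strings with no letters in common score 0.0.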
