Serial Computations of Levenshtein Distances

Author(s):  
D.S. Hirschberg

In the previous chapters, we discussed problems involving an exact match of string patterns. We now turn to problems involving similar but not necessarily exact pattern matches. There are a number of similarity or distance measures, and many of them are special cases or generalizations of the Levenshtein metric. The problem of evaluating a measure of string similarity has numerous applications, including one arising in the study of the evolution of long molecules such as proteins. In this chapter, we focus on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance.

The Levenshtein distance is a metric that measures the similarity of two strings. In its simple form, the Levenshtein distance D(x, y) between strings x and y is the minimum number of character insertions and/or deletions (indels) required to transform string x into string y. A commonly used generalization of the Levenshtein distance is the minimum cost of transforming x into y when the allowable operations are character insertion, deletion, and substitution, with costs δ(λ, σ), δ(σ, λ), and δ(σ₁, σ₂) that are functions of the involved character(s).

There are direct correspondences between the Levenshtein distance of two strings, the length of the shortest edit sequence from one string to the other, and the length of the longest common subsequence (LCS) of those strings. If D is the simple Levenshtein distance between two strings having lengths m and n, SES is the length of the shortest edit sequence between the strings, and L is the length of an LCS of the strings, then SES = D and L = (m + n − D)/2. We will focus on the problem of determining the length of an LCS and also on the related problem of recovering an LCS. Another related problem, which will be discussed in Chapter 6, is that of approximate string matching, in which it is desired to locate all positions within string y that begin an approximation to string x containing at most D errors (insertions or deletions).
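The identities above are easy to check in code. The following minimal sketch (illustrative, not taken from the chapter) computes the LCS length by the standard O(mn) dynamic program and derives the simple, indel-only Levenshtein distance from it via D = m + n − 2L:

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence, by the standard O(mn) DP."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def indel_distance(x: str, y: str) -> int:
    """Simple Levenshtein distance: insertions and deletions only."""
    # From L = (m + n - D) / 2 it follows that D = m + n - 2L.
    return len(x) + len(y) - 2 * lcs_length(x, y)

x, y = "CGATAATTGAGA", "GTTCCTAATA"
D, L = indel_distance(x, y), lcs_length(x, y)
assert L == (len(x) + len(y) - D) // 2   # the identity from the text
print(D, L)
```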

2020
Author(s):  
Omer Sabary ◽  
Alexander Yucovich ◽  
Guy Shapira ◽  
Eitan Yaakobi

In the trace reconstruction problem, a length-n string x yields a collection of noisy copies, called traces, y1, …, yt, where each yi is independently obtained from x by passing through a deletion channel, which deletes every symbol with some fixed probability. The main goal under this paradigm is to determine the minimum number of i.i.d. traces required to reconstruct x with high probability. The trace reconstruction problem can be extended to the model where each trace is the result of x passing through a deletion-insertion-substitution channel, which introduces insertions and substitutions as well. Motivated by the DNA storage channel, this work focuses on another variation of the trace reconstruction problem, referred to as the DNA reconstruction problem. A DNA reconstruction algorithm is a mapping which receives t traces y1, …, yt as input and produces x̂, an estimation of x. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm's estimation. For the deletion channel case, the problem is referred to as the deletion DNA reconstruction problem, and the goal is to minimize the Levenshtein distance between the original string and the estimation.

In this work, we present several new algorithms for these reconstruction problems. Our algorithms look globally at the entire set of traces and use dynamic programming algorithms, of the kind used for the shortest common supersequence and longest common subsequence problems, in order to decode the original sequence. Our algorithms do not impose any limitations on the input or the number of traces; moreover, they perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data as well as on data from previous DNA experiments and are shown to outperform all previous algorithms.
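As a concrete illustration of the problem setup (not of the authors' algorithms), the sketch below simulates a deletion channel, draws i.i.d. traces at the error probability of 0.27 mentioned above, and scores a deliberately trivial baseline estimate by edit distance. All names, the random alphabet, and the longest-trace baseline are assumptions for illustration only:

```python
import random

def deletion_channel(x: str, p: float, rng: random.Random) -> str:
    """Each symbol of x is deleted independently with probability p."""
    return "".join(c for c in x if rng.random() >= p)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

rng = random.Random(0)
x = "".join(rng.choice("ACGT") for _ in range(100))
traces = [deletion_channel(x, 0.27, rng) for _ in range(10)]
# Trivial baseline estimator: just take the longest trace. Real
# reconstruction algorithms align all traces globally, e.g. via the
# SCS/LCS dynamic programs referenced in the abstract.
estimate = max(traces, key=len)
print(edit_distance(x, estimate))
```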


Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and their interactions with various things that surround them reveal that the world is imprecise, incomplete, vague, and even sometimes indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day by day. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. This new method has been compared with four other existing classical string similarity measures using a wordlist as the data set. The analyses show the performance of the proposed neutrosophic similarity measure to be better than the existing measures in information retrieval tasks, with the evaluation based on precision, recall, highest false match, lowest true match, and separation.
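The abstract does not give the formulas of the neutrosophic measure itself, so the sketch below shows only the classical LCS-based similarity ratio that such a measure builds on; the normalization by the longer string is an assumption for illustration:

```python
def lcs_length(x: str, y: str) -> int:
    """Standard O(mn) dynamic program for the LCS length."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]

def lcs_similarity(x: str, y: str) -> float:
    """Similarity in [0, 1]: LCS length normalized by the longer string."""
    if not x and not y:
        return 1.0
    return lcs_length(x, y) / max(len(x), len(y))

print(lcs_similarity("receive", "recieve"))  # common misspelling, ~0.857
```

A neutrosophic variant would additionally attach truth, indeterminacy, and falsity components to such a score; their exact form is specified in the chapter, not here.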


2006
Vol 4 (13)
pp. 207-222
Author(s):  
L Wei ◽  
E Keogh ◽  
X Xi ◽  
S.-H Lee

The matching of two-dimensional shapes is an important problem with many applications in anthropology. Examples of objects that anthropologists are interested in classifying, clustering and indexing based on shape include bone fragments, projectile points (arrowheads/spearpoints), petroglyphs and ceramics. Interest in matching such objects originates from the fundamental question for many biological anthropologists and archaeologists: how can we best quantify differences and similarities? This interest is fuelled in part by a movement that notes: ‘an increasing number of archaeologists are showing interest in employing Darwinian evolutionary theory to explain variation in the material record’. Aiding such research efforts with computers requires a shape similarity measure that is invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However, rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of poor discrimination ability, or in the distance measure, at the expense of efficiency. In this work, we show that we can take the slow but accurate approaches and dramatically speed them up. On real world problems, our technique can take current approaches and make them four orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures, including Euclidean distance, dynamic time warping and longest common subsequence. We show the applications of our work to several important problems in anthropology, including clustering and indexing of skulls, projectile points and petroglyphs.
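For context, here is a minimal sketch of the slow-but-accurate baseline such work starts from: with a shape encoded as a centroid-distance time series, rotating the shape becomes a circular shift of the series, and a rotation-invariant distance is the minimum over all shifts. The representation choice and function names are assumptions, and the paper's pruning machinery that yields the reported speedups is not reproduced here:

```python
import math

def rotation_invariant_distance(a: list[float], b: list[float]) -> float:
    """Minimum Euclidean distance over all circular shifts of b (O(n^2))."""
    n = len(a)
    assert len(b) == n
    best = math.inf
    for s in range(n):                      # try every rotation of b
        d = 0.0
        for i in range(n):
            diff = a[i] - b[(i + s) % n]
            d += diff * diff
            if d >= best:                   # early abandoning of this shift
                break
        best = min(best, d)
    return math.sqrt(best)

a = [1.0, 2.0, 3.0, 4.0]
b = [3.0, 4.0, 1.0, 2.0]                    # a circular shift of a
print(rotation_invariant_distance(a, b))    # 0.0: same shape, rotated
```

The same min-over-shifts wrapper applies unchanged around other base measures such as dynamic time warping or longest common subsequence.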


2013
Vol 2013
pp. 1-9
Author(s):  
Wei Li ◽  
Shouzhen Zeng

We introduce a method based on distance measures for group decision making under an uncertain linguistic environment. We develop some uncertain linguistic aggregation distance measures called the uncertain linguistic weighted distance (ULWD) measure, the uncertain linguistic ordered weighted distance (ULOWD) measure, and the uncertain linguistic hybrid weighted distance (ULHWD) measure. We study some of their characteristics, and we prove that the ULWD and the ULOWD are special cases of the ULHWD measure. Finally, we develop an application of the ULHWD measure in a group decision making problem concerning the evaluation of university faculty for tenure and promotion with uncertain linguistic information.
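The paper's exact definitions are not reproduced in the abstract; the sketch below only illustrates the generic pattern that distinguishes a weighted distance (weights tied to arguments) from an ordered weighted distance (weights tied to rank-ordered distances). Modeling uncertain linguistic values as intervals of term indices, and the interval distance used, are assumptions for illustration:

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # assumed: lower/upper linguistic term index

def interval_distance(a: Interval, b: Interval) -> float:
    """An illustrative distance between two interval values."""
    return (abs(a[0] - b[0]) + abs(a[1] - b[1])) / 2

def ulwd(xs: List[Interval], ys: List[Interval], w: List[float]) -> float:
    """Weighted distance: weight w[i] stays attached to argument i."""
    return sum(wi * interval_distance(x, y) for wi, x, y in zip(w, xs, ys))

def ulowd(xs: List[Interval], ys: List[Interval], w: List[float]) -> float:
    """Ordered weighted distance: weights are applied to the individual
    distances after sorting them in descending order (OWA-style)."""
    ds = sorted((interval_distance(x, y) for x, y in zip(xs, ys)),
                reverse=True)
    return sum(wi * d for wi, d in zip(w, ds))

w = [0.4, 0.35, 0.25]
xs = [(3, 4), (5, 6), (2, 3)]
ys = [(4, 5), (5, 5), (0, 2)]
print(ulwd(xs, ys, w), ulowd(xs, ys, w))   # differ: 0.95 vs 1.075
```

A hybrid measure in the spirit of the ULHWD would interpolate between these two weighting schemes, recovering each as a special case.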

