Serial Computations of Levenshtein Distances

In the trace reconstruction problem, a length-n string x yields a collection of noisy copies, called traces, y1, …, yt, where each yi is obtained independently from x by passing it through a deletion channel, which deletes every symbol with some fixed probability. The main goal under this paradigm is to determine the minimum number of i.i.d. traces required to reconstruct x with high probability. The trace reconstruction problem can be extended to a model in which each trace results from x passing through a deletion-insertion-substitution channel, which also introduces insertions and substitutions. Motivated by the storage channel of DNA, this work focuses on another variation of the trace reconstruction problem, referred to as the DNA reconstruction problem. A DNA reconstruction algorithm is a mapping that receives t traces y1, …, yt as input and produces an estimate of x. The goal in the DNA reconstruction problem is to minimize the edit distance between the original string and the algorithm’s estimate. For the deletion channel case, the problem is referred to as the deletion DNA reconstruction problem, and the goal is to minimize the Levenshtein distance between the original string and the estimate. In this work, we present several new algorithms for these reconstruction problems. Our algorithms look globally at the entire sequence of the traces and use dynamic programming algorithms, of the kind used for the shortest common supersequence and longest common subsequence problems, to decode the original sequence. Our algorithms impose no restrictions on the input or the number of traces and, moreover, perform well even for error probabilities as high as 0.27. The algorithms have been tested on simulated data as well as on data from previous DNA experiments and are shown to outperform all previous algorithms.
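
The setting described in this abstract can be illustrated with a small sketch. It simulates a deletion channel and scores a candidate against the original string using the standard unit-cost edit distance. The function names `deletion_channel` and `edit_distance`, the example string, and the deletion probability are all assumptions made for illustration; the paper may define its distance measures differently, and this is not one of its reconstruction algorithms.

```python
import random


def deletion_channel(x, p, rng):
    """Return one trace of x: each symbol is deleted independently with probability p."""
    return "".join(ch for ch in x if rng.random() >= p)


def edit_distance(a, b):
    """Unit-cost edit distance (insertions, deletions, substitutions), row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


rng = random.Random(0)      # fixed seed so the run is repeatable
x = "ACGTACGTAC"            # hypothetical original string
traces = [deletion_channel(x, 0.2, rng) for _ in range(5)]
for y in traces:
    print(y, edit_distance(x, y))
```

Because every trace produced by a pure deletion channel is a subsequence of x, its edit distance to x equals the number of deleted symbols.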


A New LCS-Neutrosophic Similarity Measure for Text Information Retrieval

Neutrosophic Sets in Decision Analysis and Operations Research - Advances in Logistics, Operations, and Management Science ◽

10.4018/978-1-7998-2555-5.ch012 ◽

2020 ◽

pp. 258-280

Author(s):

Misturah Adunni Alaran ◽

AbdulAkeem Adesina Agboola ◽

Adio Taofiki Akinwale ◽

Olusegun Folorunso

Keyword(s):

Information Retrieval ◽

Similarity Measure ◽

Information Search ◽

Longest Common Subsequence ◽

Data Set ◽

String Similarity ◽

True Match ◽

Neutrosophic Logic ◽

Common Subsequence ◽

Text Information

The reality of human existence, and of people's interactions with the things that surround them, reveals that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day by day. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method has been compared with four existing classical string similarity measures, using a word list as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing measures in the information retrieval task, where the evaluation is based on precision, recall, highest false match, lowest true match, and separation.
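
For orientation, a classical (crisp) LCS-based string similarity can be sketched as follows. This is only a baseline of the kind the chapter compares against, not the neutrosophic measure it proposes, whose definition is not given in this abstract. The names `lcs_length` and `lcs_similarity` and the normalization by the sum of the two lengths are assumptions chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)            # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a symbol of a or of b
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Similarity in [0, 1]: twice the LCS length over the total length of both strings."""
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


print(lcs_similarity("retrieval", "retrieve"))
```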


Supporting anthropological research with efficient rotation invariant shape similarity measurement

Journal of The Royal Society Interface ◽

10.1098/rsif.2006.0168 ◽

2006 ◽

Vol 4 (13) ◽

pp. 207-222 ◽

Cited By ~ 4

Author(s):

L Wei ◽

E Keogh ◽

X Xi ◽

S.-H Lee

Keyword(s):

Similarity Measure ◽

Distance Measure ◽

Rotation Invariance ◽

Longest Common Subsequence ◽

Distance Measures ◽

Shape Similarity ◽

Shape Representations ◽

Discrimination Ability ◽

Projectile Points ◽

Common Subsequence

The matching of two-dimensional shapes is an important problem with many applications in anthropology. Examples of objects that anthropologists are interested in classifying, clustering and indexing based on shape include bone fragments, projectile points (arrowheads/spearpoints), petroglyphs and ceramics. Interest in matching such objects originates from the fundamental question for many biological anthropologists and archaeologists: how can we best quantify differences and similarities? This interest is fuelled in part by a movement that notes: ‘an increasing number of archaeologists are showing interest in employing Darwinian evolutionary theory to explain variation in the material record’. Aiding such research efforts with computers requires a shape similarity measure that is invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However, rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense of poor discrimination ability, or in the distance measure, at the expense of efficiency. In this work, we show that we can take the slow but accurate approaches and dramatically speed them up. On real world problems, our technique can take current approaches and make them four orders of magnitude faster, without false dismissals. Moreover, our technique can be used with any of the dozens of existing shape representations and with all the most popular distance measures, including Euclidean distance, dynamic time warping and longest common subsequence. We show the applications of our work to several important problems in anthropology, including clustering and indexing of skulls, projectile points and petroglyphs.
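
The "slow but accurate" style of approach mentioned in this abstract, where rotation invariance is handled in the distance measure, can be sketched as a brute-force search over all rotations. The sketch assumes each shape has already been converted to an equal-length one-dimensional sequence (for example, a profile of distances from the centroid to the boundary); that representation and the function names are assumptions for illustration. It does not implement the speed-up technique the paper proposes.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def rotation_invariant_distance(a, b):
    """Minimum Euclidean distance between a and every circular shift of b."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        return 0.0
    # Try each of the len(b) rotations: O(n) per rotation, O(n^2) in total.
    return min(euclidean(a, b[k:] + b[:k]) for k in range(len(b)))


shape = [1.0, 2.0, 3.0, 4.0]
rotated = [3.0, 4.0, 1.0, 2.0]   # the same profile, starting at a different point
print(rotation_invariant_distance(shape, rotated))
```

The quadratic cost per comparison is what makes this approach expensive when clustering or indexing large collections.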


XLCS: A New Bit-Parallel Longest Common Subsequence Algorithm on Xeon Phi Clusters

2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS) ◽

10.1109/hpcc/smartcity/dss.2019.00204 ◽

2019 ◽

Author(s):

Zekun Yin ◽

Hao Zhang ◽

Kai Xu ◽

Yuandong Chan ◽

Shaoliang Peng ◽

...

Keyword(s):

Longest Common Subsequence ◽

Xeon Phi ◽

Common Subsequence


Longest common subsequence as private search

Proceedings of the 8th ACM workshop on Privacy in the electronic society - WPES '09 ◽

10.1145/1655188.1655200 ◽

2009 ◽

Cited By ~ 5

Author(s):

Mark Gondree ◽

Payman Mohassel

Keyword(s):

Longest Common Subsequence ◽

Common Subsequence ◽

Private Search


Side Channel Leakage Alignment Based on Longest Common Subsequence

2020 IEEE 14th International Conference on Big Data Science and Engineering (BigDataSE) ◽

10.1109/bigdatase50710.2020.00025 ◽

2020 ◽

Author(s):

Anni Jia ◽

Wei Yang ◽

Gongxuan Zhang

Keyword(s):

Longest Common Subsequence ◽

Side Channel ◽

Common Subsequence


Longest Common Subsequence based Multistage Collaborative Filtering for Recommender Systems

2020 21st International Arab Conference on Information Technology (ACIT) ◽

10.1109/acit50332.2020.9300068 ◽

2020 ◽

Author(s):

Dilip Singh Sisodia ◽

Inakollu NehaPriyanka ◽

Prodduturi Amulya

Keyword(s):

Collaborative Filtering ◽

Recommender Systems ◽

Longest Common Subsequence ◽

Common Subsequence


Research on longest common subsequence fast algorithm

2011 International Conference on Consumer Electronics, Communications and Networks (CECNet) ◽

10.1109/cecnet.2011.5768323 ◽

2011 ◽

Cited By ~ 3

Author(s):

Jiamei Liu ◽

Suping Wu

Keyword(s):

Fast Algorithm ◽

Longest Common Subsequence ◽

Common Subsequence


Uncertain Linguistic Aggregation Distance Measures and Their Application to Group Decision Making

Journal of Applied Mathematics ◽

10.1155/2013/563650 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9

Author(s):

Wei Li ◽

Shouzhen Zeng

Keyword(s):

Decision Making ◽

Group Decision Making ◽

Group Decision ◽

Distance Measures ◽

University Faculty ◽

Linguistic Information ◽

Weighted Distance ◽

Special Cases ◽

Decision Making Problem

We introduce a method based on distance measures for group decision making in an uncertain linguistic environment. We develop several uncertain linguistic aggregation distance measures: the uncertain linguistic weighted distance (ULWD) measure, the uncertain linguistic ordered weighted distance (ULOWD) measure, and the uncertain linguistic hybrid weighted distance (ULHWD) measure. We study some of their characteristics, and we prove that the ULWD and the ULOWD are special cases of the ULHWD measure. Finally, we develop an application of the ULHWD measure to a group decision making problem concerning the evaluation of university faculty for tenure and promotion with uncertain linguistic information.


Algorithms for computing variants of the longest common subsequence problem

Theoretical Computer Science ◽

10.1016/j.tcs.2008.01.009 ◽

2008 ◽

Vol 395 (2-3) ◽

pp. 255-267 ◽

Cited By ~ 15

Author(s):

Costas S. Iliopoulos ◽

M. Sohel Rahman

Keyword(s):

Longest Common Subsequence ◽

Longest Common Subsequence Problem ◽

Common Subsequence
