A note on the longest common substring with k-mismatches problem

Given a set $\mathcal{D}$ of q documents, the Longest Common Substring (LCS) problem asks, for any integer 2 ⩽ k ⩽ q, for the longest substring that appears in k documents. LCS is a well-studied problem with a wide range of applications in bioinformatics, from microarrays to DNA sequence alignment and analysis. The problem was solved by Hui (2000, International Journal of Computer Science and Engineering 15, 73–76) using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees, coupled with suffix trees. In this article, we present a simple method for solving the LCS problem using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted to work with other space-efficient data structures such as enhanced suffix arrays (ESA) and the compressed suffix tree.
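
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-tree and union-find method the abstract describes, only a baseline that binary-searches the answer length and counts, for each candidate substring, how many documents contain it. The function name `longest_common_substring`, the reading of "appears in k documents" as "at least k documents", and the lexicographic tie-breaking are assumptions made for illustration.

```python
from collections import Counter


def longest_common_substring(docs, k):
    """Return a longest string occurring in at least k of docs ('' if none).

    Brute-force baseline (assumption: 'in k documents' means 'in at least k').
    """
    if not 2 <= k <= len(docs):
        raise ValueError("k must satisfy 2 <= k <= len(docs)")

    def witness(length):
        # Count, for every substring of this length, how many documents contain it.
        counts = Counter()
        for doc in docs:
            seen = {doc[i:i + length] for i in range(len(doc) - length + 1)}
            counts.update(seen)  # each document contributes at most 1 per substring
        hits = [s for s, c in counts.items() if c >= k]
        return min(hits) if hits else None  # smallest hit, for determinism

    # A substring shared by k documents cannot be longer than the k-th longest one.
    lo, hi, best = 1, sorted(len(d) for d in docs)[-k], ""
    while lo <= hi:
        mid = (lo + hi) // 2
        found = witness(mid)
        if found is not None:
            best, lo = found, mid + 1  # feasible: try longer
        else:
            hi = mid - 1  # infeasible: try shorter
    return best
```

The binary search is valid because feasibility is monotone: if a substring of some length occurs in at least k documents, so does each of its prefixes. This baseline is far slower than linear-time suffix-tree approaches and is meant only to pin down the input and output of the problem.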

Longest Common Substring with Approximately k Mismatches

Algorithmica ◽

10.1007/s00453-019-00548-x ◽

2019 ◽

Vol 81 (6) ◽

pp. 2633-2652 ◽

Cited By ~ 2

Author(s):

Tomasz Kociumaka ◽

Jakub Radoszewski ◽

Tatiana Starikovskaya

Keyword(s):

Longest Common Substring

The extended longest common substring algorithm for spoken document retrieval

2015 9th International Conference on Application of Information and Communication Technologies (AICT) ◽

10.1109/icaict.2015.7338523 ◽

2015 ◽

Cited By ~ 2

Author(s):

Dmitriy Prozorov ◽

Alexandra Yashina

Keyword(s):

Document Retrieval ◽

Spoken Document Retrieval ◽

Longest Common Substring

Efficient watermark detection by using the longest common substring technique

Egyptian Informatics Journal ◽

10.1016/j.eij.2011.05.001 ◽

2011 ◽

Vol 12 (2) ◽

pp. 115-123 ◽

Cited By ~ 1

Author(s):

Taha M. Mohamed ◽

Hesham N. Elmahdy ◽

Hoda M. Onsi

Keyword(s):

Longest Common Substring

Computing the longest common substring with one mismatch

Problems of Information Transmission ◽

10.1134/s0032946011010030 ◽

2011 ◽

Vol 47 (1) ◽

pp. 28-33 ◽

Cited By ~ 6

Author(s):

M. A. Babenko ◽

T. A. Starikovskaya

Keyword(s):

Longest Common Substring

Time-Space Trade-Offs for the Longest Common Substring Problem

Combinatorial Pattern Matching - Lecture Notes in Computer Science ◽

10.1007/978-3-642-38905-4_22 ◽

2013 ◽

pp. 223-234 ◽

Cited By ~ 4

Author(s):

Tatiana Starikovskaya ◽

Hjalte Wedel Vildhøj

Keyword(s):

Time Space ◽

Trade Offs ◽

Longest Common Substring

Linear Time Algorithms for Generalizations of the Longest Common Substring Problem

Algorithmica ◽

10.1007/s00453-009-9369-1 ◽

2009 ◽

Vol 60 (4) ◽

pp. 806-818 ◽

Cited By ~ 16

Author(s):

Michael Arnold ◽

Enno Ohlebusch

Keyword(s):

Linear Time ◽

Longest Common Substring ◽

Linear Time Algorithms

Methodology for the Assessment of the Text Similarity of Documents in the CORE Open Access Data Set of Scholarly Documents

10.18690/978-961-286-516-0.12 ◽

2021 ◽

Author(s):

Ivan Kovačič ◽

David Bajs ◽

Milan Ojsteršek

Keyword(s):

Second Step ◽

Plagiarism Detection ◽

Text Similarity ◽

Data Set ◽

The Core ◽

Metadata Enrichment ◽

Feature Based ◽

Longest Common Substring ◽

Hashing Algorithm ◽

Access Data

This paper describes the methodology of data preparation and analysis of the text similarity required for plagiarism detection on the CORE data set. First, we used the CrossREF API and the Microsoft Academic Graph data set for metadata enrichment and for eliminating duplicate documents from the CORE 2018 data set. In the second step, we extracted 4-gram sequences of words from every document and transformed them into SHA-256 hash values. The features retrieved using the hashing algorithm are compared, and the result is a list of documents and the percentages of coverage between the features of each pair of documents. In the third step, called pairwise feature-based exhaustive analysis, pairs of documents are checked using the longest common substring.
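
The second step described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the authors' implementation: the function names `word_ngram_hashes` and `coverage`, the lowercasing and whitespace tokenization, and the definition of coverage as the share of one document's features that also occur in the other are all hypothetical choices.

```python
import hashlib


def word_ngram_hashes(text, n=4):
    """Return the set of SHA-256 hex digests of all word n-grams in text.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def coverage(doc_a, doc_b, n=4):
    """Percentage of doc_a's n-gram features that also occur in doc_b.

    Assumption: this asymmetric ratio stands in for the 'percentage of coverage'.
    """
    features_a = word_ngram_hashes(doc_a, n)
    features_b = word_ngram_hashes(doc_b, n)
    if not features_a:
        return 0.0  # document shorter than n words has no features
    return 100.0 * len(features_a & features_b) / len(features_a)
```

Under these assumptions, pairs whose coverage exceeds some chosen threshold would be the candidates passed on to the third, longest-common-substring step.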

Sublinear Space Algorithms for the Longest Common Substring Problem

Algorithms - ESA 2014 - Lecture Notes in Computer Science ◽

10.1007/978-3-662-44777-2_50 ◽

2014 ◽

pp. 605-617 ◽

Cited By ~ 7

Author(s):

Tomasz Kociumaka ◽

Tatiana Starikovskaya ◽

Hjalte Wedel Vildhøj

Keyword(s):

Longest Common Substring
