A note on the longest common substring with k-mismatches problem

2015 ◽  
Vol 115 (6-8) ◽  
pp. 640-642 ◽  
Author(s):  
Szymon Grabowski
2015 ◽  
Vol 27 (2) ◽  
pp. 277-295 ◽  
Author(s):  
MAXIME CROCHEMORE ◽  
COSTAS S. ILIOPOULOS ◽  
ALESSIO LANGIU ◽  
FILIPPO MIGNOSI

Given a set $\mathcal{D}$ of q documents, the Longest Common Substring (LCS) problem asks, for any integer 2 ⩽ k ⩽ q, for the longest substring that appears in k documents. LCS is a well-studied problem with a wide range of applications in bioinformatics, from microarrays to DNA sequence alignment and analysis. The problem was solved by Hui (2000, International Journal of Computer Science and Engineering 15, 73–76) using a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees, coupled with suffix trees. In this article, we present a simple method for solving the LCS problem using suffix trees (STs) and classical union-find data structures. In turn, we show how this simple algorithm can be adapted to work with other space-efficient data structures such as enhanced suffix arrays (ESA) and compressed suffix trees.
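
To make the problem statement concrete, here is a minimal Python sketch of the k-of-q variant. It does not implement the suffix-tree and union-find method from the abstract. It swaps in a much simpler (and slower) technique: binary search over the answer length, with per-document sets of substrings. It also assumes "appears in k documents" means "appears in at least k documents". The function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def common_substring_of_length(docs, k, length):
    """Return some substring of the given length that occurs in at
    least k of the documents, or None if there is none."""
    counts = Counter()
    for doc in docs:
        # Use a set so each document contributes at most once per substring.
        seen = {doc[i:i + length] for i in range(len(doc) - length + 1)}
        counts.update(seen)
    for sub, c in counts.items():
        if c >= k:
            return sub
    return None


def longest_common_substring_k(docs, k):
    """Longest substring occurring in at least k documents (assumed reading).

    Binary search is valid because the property is monotone: if a string
    of length L occurs in k documents, so does its prefix of length L - 1.
    """
    lo, hi = 0, max((len(d) for d in docs), default=0)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_substring_of_length(docs, k, mid)
        if found is not None:
            best, lo = found, mid
        else:
            hi = mid - 1
    return best


docs = ["banana", "bandana", "cabana"]
print(longest_common_substring_k(docs, 2))  # a longest substring shared by at least 2 documents
```

This sketch materializes every substring of each candidate length, so it is only suitable for small inputs; the suffix-tree and enhanced-suffix-array approaches named in the abstract exist precisely to avoid that cost.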


Algorithmica ◽  
2019 ◽  
Vol 81 (6) ◽  
pp. 2633-2652 ◽  
Author(s):  
Tomasz Kociumaka ◽  
Jakub Radoszewski ◽  
Tatiana Starikovskaya

2011 ◽  
Vol 12 (2) ◽  
pp. 115-123 ◽  
Author(s):  
Taha M. Mohamed ◽  
Hesham N. Elmahdy ◽  
Hoda M. Onsi

2011 ◽  
Vol 47 (1) ◽  
pp. 28-33 ◽  
Author(s):  
M. A. Babenko ◽  
T. A. Starikovskaya

2021 ◽  
Author(s):  
Ivan Kovačič ◽  
David Bajs ◽  
Milan Ojsteršek

This paper describes the methodology of data preparation and text-similarity analysis required for plagiarism detection on the CORE data set. First, we used the CrossREF API and the Microsoft Academic Graph data set for metadata enrichment and elimination of duplicate documents from the CORE 2018 data set. In the second step, we extracted 4-gram sequences of words from every document and transformed them into SHA-256 hash values. The features retrieved using the hashing algorithm are compared, and the result is a list of documents and the percentages of coverage between the features of each pair of documents. In the third step, called pairwise feature-based exhaustive analysis, pairs of documents are checked using the longest common substring.
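
The second step above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not specify tokenization or how coverage is computed, so the lowercase whitespace tokenization, the one-directional coverage definition, and the names `shingle_hashes` and `coverage_percent` are all assumptions.

```python
import hashlib


def shingle_hashes(text, n=4):
    """Set of SHA-256 hex digests of all word n-grams in the text.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }


def coverage_percent(a, b, n=4):
    """Percentage of a's n-gram hashes that also occur in b.

    Assumption: coverage is measured relative to the first document.
    """
    hashes_a = shingle_hashes(a, n)
    hashes_b = shingle_hashes(b, n)
    if not hashes_a:
        return 0.0  # document shorter than n words has no features
    return 100.0 * len(hashes_a & hashes_b) / len(hashes_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "a quick brown fox jumps over a fence"
print(round(coverage_percent(doc_a, doc_b), 2))
```

Pairs whose coverage exceeds some threshold would then be passed on to the longest-common-substring check described in the third step; the threshold itself is not given in the abstract.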


Author(s):  
Tomasz Kociumaka ◽  
Tatiana Starikovskaya ◽  
Hjalte Wedel Vildhøj
