Efficient Data Structures for Range Shortest Unique Substring Queries

Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 276
Author(s):  
Paniz Abedin ◽  
Arnab Ganguly ◽  
Solon P. Pissis ◽  
Sharma V. Thankachan

Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an $O(n \log n)$-word data structure with $O(\log_w n)$ query time, where $w = \Omega(\log n)$ is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an $O(n)$-word data structure with $O(\sqrt{n} \log^{\epsilon} n)$ query time, where $\epsilon > 0$ is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
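To make the query definition concrete, the following is a minimal brute-force sketch in Python. It is not the data structure from the paper; it only illustrates what a query is expected to return. The function name `range_sus` is hypothetical, positions are 1-indexed and inclusive as in the abstract, and the sketch assumes that an occurrence counts only if it lies entirely inside [α,β]. Ties are broken by taking the leftmost candidate.

```python
from collections import Counter


def range_sus(T: str, alpha: int, beta: int) -> tuple[int, int]:
    """Return 1-indexed (i, j) of a shortest substring with exactly one
    occurrence inside T[alpha..beta] (brute force, illustration only).

    Assumption: an occurrence must lie entirely within [alpha, beta].
    """
    if not 1 <= alpha <= beta <= len(T):
        raise ValueError("expected 1 <= alpha <= beta <= len(T)")

    window = T[alpha - 1:beta]  # the text restricted to the query range
    m = len(window)

    # Try lengths in increasing order; the first unique candidate is shortest.
    for length in range(1, m + 1):
        counts = Counter(window[s:s + length] for s in range(m - length + 1))
        for s in range(m - length + 1):
            if counts[window[s:s + length]] == 1:
                # Map the window offset back to positions in T.
                return (alpha + s, alpha + s + length - 1)

    # Unreachable for a non-empty range: the whole window occurs exactly once.
    raise AssertionError("no unique substring found")
```

For example, `range_sus("abab", 1, 4)` inspects lengths 1 and 2 and returns the positions of `"ba"`, the only length-2 substring occurring once in the range. The running time of this sketch is polynomial in the range length per query, which is the cost the data structures in the paper are designed to avoid.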

2013 ◽  
Vol 756-759 ◽  
pp. 1387-1391
Author(s):  
Xiao Dong Wang ◽  
Jun Tian

Building an efficient data structure for range selection problems is considered. While there are several theoretical solutions to the problem, only a few have been tried out, and little is known about how the others would perform. The computation model used in this paper is the RAM model with word-size . Our data structure is a practical linear space data structure that supports range selection queries in time with preprocessing time.
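For orientation, a range selection query asks for the k-th smallest element within a subrange of an array. The sketch below is a naive baseline in Python, not the data structure described in this abstract; the function name `range_select` and the 1-indexed, inclusive convention are assumptions made for illustration.

```python
def range_select(A: list[int], i: int, j: int, k: int) -> int:
    """Return the k-th smallest element of A[i..j] (1-indexed, inclusive).

    Naive baseline: copy the range and sort it on every query.
    """
    if not 1 <= i <= j <= len(A):
        raise ValueError("expected 1 <= i <= j <= len(A)")
    window = A[i - 1:j]
    if not 1 <= k <= len(window):
        raise ValueError("k must be between 1 and the range length")
    return sorted(window)[k - 1]
```

This baseline needs no preprocessing but sorts the range on each query, so it mainly serves as a correctness reference when testing a faster structure.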


Algorithmica ◽  
2020 ◽  
Vol 82 (12) ◽  
pp. 3707-3743
Author(s):  
Amihood Amir ◽  
Panagiotis Charalampopoulos ◽  
Solon P. Pissis ◽  
Jakub Radoszewski

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an $\mathcal{O}(n)$-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in $\tilde{\mathcal{O}}(n^{2/3})$ time, after $\tilde{\mathcal{O}}(n)$-time and space preprocessing. This line of research has been recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an $\tilde{\mathcal{O}}(n)$-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in $\tilde{\mathcal{O}}(1)$ time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
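As a point of reference for the static problem, here is a minimal Python sketch of the textbook dynamic-programming solution for the longest common substring. It runs in time proportional to |S|·|T|, so it is neither the linear-time solution mentioned in the abstract nor the dynamic algorithm of the paper; the function name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(S: str, T: str) -> str:
    """Return one longest substring common to S and T (quadratic-time DP)."""
    best_len = 0  # length of the best match found so far
    best_end = 0  # end position (exclusive) of that match in S

    # prev[j] = length of the longest common suffix of S[:i-1] and T[:j]
    prev = [0] * (len(T) + 1)
    for i in range(1, len(S) + 1):
        cur = [0] * (len(T) + 1)
        for j in range(1, len(T) + 1):
            if S[i - 1] == T[j - 1]:
                # Extend the common suffix that ended one character earlier.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur

    return S[best_end - best_len:best_end]
```

Recomputing this table from scratch after every edit costs quadratic time per update, which is the kind of baseline that a sublinear-time-per-edit algorithm improves upon.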


1991 ◽  
Vol 01 (03) ◽  
pp. 207-226
Author(s):  
SESHAGIRI RAO ALA

In this paper we propose a universal data structure (UDS), which will aid in the design of optimal boundary data structures. We later show, with the aid of some recently published data structures, that any data structure can be expressed as a special case of UDS. We demonstrate how applying the optimality concepts of the UDS can lead to the discovery of data structures that are more efficient than popular ones. We also discuss two approaches to optimization. We show that a globally optimal data structure is better than a special-purpose optimal data structure.


2009 ◽  
pp. 196-204
Author(s):  
Ioannis N. Kouris ◽  
Christos Makris ◽  
Evangelos Theodoridis ◽  
Athanasios Tsakalidis

Information retrieval is the computational discipline that deals with the efficient representation and organization of, and access to, information objects that represent natural language texts (Baeza-Yates & Ribeiro-Neto, 1999; Salton & McGill, 1983; Witten, Moffat, & Bell, 1999). A crucial subproblem in the information retrieval area is the design and implementation of efficient data structures and algorithms for indexing and searching information objects that are vaguely described. In this article, we present the latest developments in the indexing area, with special emphasis on data structures and algorithmic techniques for string manipulation, space-efficient implementations, and compression techniques for efficient storage of information objects.

