Efficient Data Structures for Range Shortest Unique Substring Queries

Let $T[1,n]$ be a string of length $n$ and $T[i,j]$ be the substring of $T$ starting at position $i$ and ending at position $j$. A substring $T[i,j]$ of $T$ is a repeat if it occurs more than once in $T$; otherwise, it is a unique substring of $T$. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string $T$ as input, the Shortest Unique Substring problem is to find a shortest substring of $T$ that does not occur elsewhere in $T$. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over $T$ that efficiently answers the following type of online query: given a range $[\alpha,\beta]$, return a shortest substring $T[i,j]$ of $T$ with exactly one occurrence in $[\alpha,\beta]$. We present an $O(n\log n)$-word data structure with $O(\log_w n)$ query time, where $w=\Omega(\log n)$ is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an $O(n)$-word data structure with $O(\sqrt{n}\log^{\epsilon} n)$ query time, where $\epsilon>0$ is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
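
To make the query semantics concrete, the following Python sketch answers a range shortest unique substring query by brute force. It assumes 1-indexed, inclusive ranges and reads "exactly one occurrence in [α,β]" as exactly one occurrence lying entirely inside T[α,β]; the function name `range_sus` is hypothetical. It is only a reference baseline for checking answers, not either of the data structures described above.

```python
from collections import Counter
from typing import Optional, Tuple


def range_sus(T: str, alpha: int, beta: int) -> Optional[Tuple[int, int]]:
    """Hypothetical brute-force baseline: return 1-indexed (i, j) of a
    shortest substring of T[alpha..beta] that occurs exactly once inside
    T[alpha..beta]. Ties are broken by the leftmost start position."""
    if not (1 <= alpha <= beta <= len(T)):
        raise ValueError("need 1 <= alpha <= beta <= len(T)")
    window = T[alpha - 1:beta]  # T[alpha, beta] as a 0-indexed Python slice
    m = len(window)
    for length in range(1, m + 1):
        # Count every occurrence of every length-`length` substring of the window.
        counts = Counter(window[s:s + length] for s in range(m - length + 1))
        for s in range(m - length + 1):
            if counts[window[s:s + length]] == 1:
                return (alpha + s, alpha + s + length - 1)
    return None  # unreachable: the whole window always occurs exactly once
```

For example, `range_sus("abcab", 1, 5)` returns `(3, 3)`, since `c` is the only letter occurring once in the whole string, while `range_sus("aaaa", 2, 3)` returns `(2, 3)`. Each query costs roughly cubic time in the window length, which is exactly what the indexed solutions avoid.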

Efficient Data Structures for Range Selections Problem

Advanced Materials Research, Vol 756-759, pp. 1387-1391, 2013. DOI: 10.4028/www.scientific.net/amr.756-759.1387

Author(s): Xiao Dong Wang, Jun Tian

Keyword(s): Data Structure; Linear Space; Data Structures; Computation Model; Word Size; Selection Problems; Efficient Data; Space Data; Efficient Data Structures; Range Selection

This paper considers building an efficient data structure for range selection problems. While there are several theoretical solutions to the problem, only a few have been implemented, and little is known about how the others would perform in practice. The computation model used in this paper is the RAM model with word-size . Our data structure is a practical linear-space data structure that supports range selection queries in time with preprocessing time.
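
The paper's own linear-space structure is not described here, so the sketch below shows a textbook alternative for the same query (the k-th smallest element of a[l..r]): a merge sort tree, i.e. a segment tree whose nodes store sorted copies of their ranges, combined with binary search over the distinct values. It uses O(n log n) words rather than linear space and answers a query in O(log^3 n) time. The class and method names are hypothetical, and indices are assumed 0-indexed and inclusive.

```python
from bisect import bisect_right
from typing import List


class MergeSortTree:
    """Hypothetical range-selection sketch: a segment tree whose nodes store
    sorted copies of their ranges. Uses O(n log n) words, NOT linear space."""

    def __init__(self, a: List[int]) -> None:
        self.n = len(a)
        self.values = sorted(set(a))  # candidate answers, in increasing order
        self.tree: List[List[int]] = [[] for _ in range(4 * max(1, self.n))]
        if self.n:
            self._build(1, 0, self.n - 1, a)

    def _build(self, node: int, lo: int, hi: int, a: List[int]) -> None:
        if lo == hi:
            self.tree[node] = [a[lo]]
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, a)
        self._build(2 * node + 1, mid + 1, hi, a)
        # Store the sorted contents of a[lo..hi] at this node.
        self.tree[node] = sorted(self.tree[2 * node] + self.tree[2 * node + 1])

    def _count_leq(self, node: int, lo: int, hi: int, l: int, r: int, x: int) -> int:
        """Count elements <= x in a[l..r] (0-indexed, inclusive)."""
        if r < lo or hi < l:
            return 0
        if l <= lo and hi <= r:
            return bisect_right(self.tree[node], x)
        mid = (lo + hi) // 2
        return (self._count_leq(2 * node, lo, mid, l, r, x)
                + self._count_leq(2 * node + 1, mid + 1, hi, l, r, x))

    def select(self, l: int, r: int, k: int) -> int:
        """Return the k-th smallest (k >= 1) element of a[l..r]."""
        if not (0 <= l <= r < self.n and 1 <= k <= r - l + 1):
            raise ValueError("invalid query")
        # Binary search for the smallest value v with count_leq(v) >= k.
        lo, hi = 0, len(self.values) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self._count_leq(1, 0, self.n - 1, l, r, self.values[mid]) >= k:
                hi = mid
            else:
                lo = mid + 1
        return self.values[lo]
```

For instance, with `a = [5, 1, 4, 2, 3]`, `MergeSortTree(a).select(1, 3, 2)` returns `2`, the second smallest of `[1, 4, 2]`.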

Dynamic and Internal Longest Common Substring

Algorithmica, Vol 82 (12), pp. 3707-3743, 2020. DOI: 10.1007/s00453-020-00744-0

Author(s): Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski

Keyword(s): Data Structure; Computer Science; Data Structures; Classical Problem; Edit Operation; Time And Space; Dynamic Setting; Efficient Data; Longest Common Substring; Efficient Data Structures

Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an $\mathcal{O}(n)$-time solution. In the fully dynamic setting, edit operations are allowed in either of the two strings, and the problem is to find an LCS after each edit. We present the first solution to the fully dynamic LCS problem requiring sublinear time in n per edit operation. In particular, we show how to find an LCS after each edit operation in $\tilde{\mathcal{O}}(n^{2/3})$ time, after $\tilde{\mathcal{O}}(n)$-time and space preprocessing. This line of research was recently initiated in a somewhat restricted dynamic variant by Amir et al. [SPIRE 2017]. More specifically, the authors presented an $\tilde{\mathcal{O}}(n)$-sized data structure that returns an LCS of the two strings after a single edit operation (that is reverted afterwards) in $\tilde{\mathcal{O}}(1)$ time. At CPM 2018, three papers (Abedin et al., Funakoshi et al., and Urabe et al.) studied analogously restricted dynamic variants of problems on strings; specifically, computing the longest palindrome and the Lyndon factorization of a string after a single edit operation. We develop dynamic sublinear-time algorithms for both of these problems as well. We also consider internal LCS queries, that is, queries in which we are to return an LCS of a pair of substrings of S and T. We show that answering such queries is hard in general and propose efficient data structures for several restricted cases.
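
As a reference point for the problem definition (not for the dynamic algorithm), the classical quadratic dynamic program below computes an LCS of two strings. It runs in O(|S|·|T|) time, far from the O(n)-time solution mentioned above; recomputing it from scratch after every edit is the naive dynamic baseline that a sublinear per-edit algorithm improves on. The function name is hypothetical.

```python
def longest_common_substring(S: str, T: str) -> str:
    """Hypothetical helper: return a longest common substring of S and T
    using the classical O(|S| * |T|) dynamic program."""
    best_len, best_end = 0, 0  # LCS length and its end index (exclusive) in S
    prev = [0] * (len(T) + 1)  # prev[j]: longest common suffix of S[:i-1], T[:j]
    for i in range(1, len(S) + 1):
        cur = [0] * (len(T) + 1)
        for j in range(1, len(T) + 1):
            if S[i - 1] == T[j - 1]:
                # Extend the common suffix ending at S[i-2], T[j-2] by one.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return S[best_end - best_len:best_end]
```

For example, `longest_common_substring("banana", "ananas")` returns `"anana"`, and two strings with no common letter yield the empty string.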

Design Methodology of Boundary Data Structures

International Journal of Computational Geometry & Applications, Vol 01 (03), pp. 207-226, 1991. DOI: 10.1142/s0218195991000165. Cited by ~4.

Author(s): Seshagiri Rao Ala

Keyword(s): Data Structure; Data Structures; Boundary Data; Design Methodology; Published Data; Optimal Boundary; Efficient Data; Efficient Data Structures; Special Case; Better Than

In this paper we propose a universal data structure (UDS) that aids in the design of optimal boundary data structures. We then show, with the aid of some recently published data structures, that any data structure can be expressed as a special case of UDS. We demonstrate how applying the optimality concepts of the UDS can lead to the discovery of data structures more efficient than popular ones. We also discuss two approaches to optimization and show that a globally optimal data structure is better than a special-purpose optimal data structure.

Indexing Textual Information

Database Technologies, pp. 196-204, 2009. DOI: 10.4018/978-1-60566-058-5.ch014

Author(s): Ioannis N. Kouris, Christos Makris, Evangelos Theodoridis, Athanasios Tsakalidis

Keyword(s): Information Retrieval; Data Structures; Textual Information; Design And Implementation; Efficient Data; Data Structures And Algorithms; Efficient Representation; Algorithmic Techniques; Efficient Data Structures; Information Objects

Information retrieval is the computational discipline that deals with the efficient representation, organization, and access to information objects that represent natural language texts (Baeza-Yates & Ribeiro-Neto, 1999; Salton & McGill, 1983; Witten, Moffat, & Bell, 1999). A crucial subproblem in information retrieval is the design and implementation of efficient data structures and algorithms for indexing and searching information objects that are vaguely described. In this article, we present the latest developments in indexing, with special emphasis on data structures and algorithmic techniques for string manipulation, space-efficient implementations, and compression techniques for efficient storage of information objects.
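
As one classic example of the string-indexing structures such surveys cover (not taken from the chapter itself), the sketch below builds a suffix array naively and finds all occurrences of a pattern with two binary searches. The naive construction sorts full suffixes and is only for illustration; practical indexes use linear or near-linear construction algorithms. Function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Hypothetical naive construction: sort suffix start positions by the
    suffixes they begin. O(n^2 log n) worst case; illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted 0-indexed start positions of pattern in text.
    Length-m prefixes of sorted suffixes are themselves sorted, so the
    matches form one contiguous block of the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For `text = "banana"`, the suffix array is `[5, 3, 1, 0, 4, 2]` and searching for `"ana"` returns `[1, 3]`. Each search costs O(m log n) character comparisons for a pattern of length m.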
