Indexing Textual Information

Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given a string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
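
To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not the data structure from the abstract, only a baseline that follows the problem statement directly. It assumes that an occurrence counts as being "in [α,β]" when its starting position lies in that range; the abstract does not spell this out, so treat that reading, along with the hypothetical function names `occurrences` and `range_shortest_unique_substring`, as assumptions.

```python
def occurrences(text, pattern):
    """Return the 1-based starting positions of all (possibly overlapping)
    occurrences of pattern in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)           # convert to 1-based
        start = text.find(pattern, start + 1)
    return positions


def range_shortest_unique_substring(text, alpha, beta):
    """Brute-force baseline for a range query [alpha, beta] (1-based, inclusive).

    Assumption: an occurrence is 'in the range' if it STARTS in [alpha, beta].
    Returns (i, j) such that text[i..j] is a shortest substring with exactly
    one such occurrence, or None if the range is empty.
    """
    n = len(text)
    for length in range(1, n + 1):            # try shorter lengths first
        for i in range(alpha, beta + 1):      # candidate must start in range
            j = i + length - 1
            if j > n:
                break                         # later starts would also overflow
            candidate = text[i - 1:j]
            in_range = [p for p in occurrences(text, candidate)
                        if alpha <= p <= beta]
            if len(in_range) == 1:
                return (i, j)
    return None


if __name__ == "__main__":
    # Querying the whole string is the classic Shortest Unique Substring problem.
    print(range_shortest_unique_substring("banana", 1, 6))
    print(range_shortest_unique_substring("banana", 2, 6))
```

Each query rescans the string for every candidate, so this baseline is only practical for short inputs; it is useful mainly as a reference against which a faster index can be checked.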


Using a Library of Efficient Data Structures and Algorithms as a Neural Network Research Tool

Artificial Neural Networks ◽ 10.1016/b978-0-444-89488-5.50095-6 ◽ 1992 ◽ pp. 1273-1276

Cited By ~ 2

Author(s): Bernd Fritzke

Keyword(s): Neural Network ◽ Data Structures ◽ Research Tool ◽ Network Research ◽ Efficient Data ◽ Data Structures And Algorithms ◽ Efficient Data Structures


Indexing Textual Information

Encyclopedia of Information Science and Technology, Second Edition ◽ 10.4018/978-1-60566-026-4.ch302 ◽ 2011 ◽ pp. 1917-1922

Author(s): Ioannis N. Kouris ◽ Christos Makris ◽ Evangelos Theodoridis ◽ Athanasios Tsakalidis

Keyword(s): Information Retrieval ◽ Data Structures ◽ DNA Sequences ◽ Digital Libraries ◽ Web Mining ◽ Web Search ◽ Information Filtering ◽ Molecular Sequence ◽ Algorithmic Techniques ◽ Information Objects

Information retrieval is the computational discipline that deals with the efficient representation and organization of, and access to, information objects that represent natural language texts (Baeza-Yates & Ribeiro-Neto, 1999; Salton & McGill, 1983; Witten, Moffat, & Bell, 1999). A crucial subproblem in the information retrieval area is the design and implementation of efficient data structures and algorithms for indexing and searching information objects that are vaguely described. In this article, we present the latest developments in the indexing area, with special emphasis on data structures and algorithmic techniques for string manipulation, space-efficient implementations, and compression techniques for efficient storage of information objects. These problems appear in a range of applications, such as digital libraries, molecular sequence databases (DNA sequences, protein databases [Gusfield, 1997]), the implementation of Web search engines, Web mining, and information filtering.
