Suffix Tree Data Structures for Matrices

Pattern Matching Algorithms ◽

10.1093/oso/9780195113679.003.0013 ◽

1997 ◽

Author(s):

R. Giancarlo ◽

R. Grossi

Keyword(s):

Linear Space ◽

Suffix Tree ◽

Linear Time ◽

Suffix Trees ◽

Construction Time ◽

Matching Problems ◽

Tree Construction ◽

The Matrix ◽

Visual Databases ◽

Efficient Construction

This chapter discusses the generalization of suffix trees to matrices. We extend the suffix tree notion (described in Chapter 3) from text strings to text matrices whose entries are taken from an ordered alphabet, with the aim of solving pattern-matching problems. This generalization can be used efficiently to implement low-level routines for Computer Vision, Data Compression, Geographic Information Systems and Visual Databases. The submatrices we examine are contiguous parts of the text that still have a matrix shape. Representing these text submatrices as “suitably formatted” strings stored in a compacted trie is the rationale behind suffix trees for matrices. The choice of the format inevitably influences suffix tree construction time and space complexity. We first deal with square matrices and show that many suffix tree families can be defined for the same input matrix according to the matrix’s string representations. We can store each suffix tree in linear space and give an efficient construction algorithm whose input is both the matrix and the string representation chosen. We then treat rectangular matrices and define their corresponding suffix trees by means of some general rules which we list formally. We show that there is a super-linear lower bound to the space required (in contrast with the linear space required by suffix trees for square matrices). We give a simple example of one of these suffix trees. The last part of the chapter illustrates some technical results regarding suffix trees for square matrices: we show how to achieve an expected linear-time suffix tree construction for a constant-size alphabet under some mild probabilistic assumptions about the input distribution. We begin by defining a wide class of string representations for square matrices. We let Σ denote an ordered alphabet of characters and introduce another alphabet of five special characters, called shapes. A shape is one of the special characters taken from the set {IN, SW, NW, SE, NE}. Shape IN encodes the 1×1 matrix generated from the empty matrix by creating a square.


THE VIRTUAL SUFFIX TREE

International Journal of Foundations of Computer Science ◽

10.1142/s0129054109007066 ◽

2009 ◽

Vol 20 (06) ◽

pp. 1109-1133 ◽

Cited By ~ 2

Author(s):

JIE LIN ◽

YUE JIANG ◽

DON ADJEROH

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Suffix Array ◽

Intermediate Step ◽

Suffix Trees ◽

String Length ◽

Space Requirement ◽

Suffix Arrays ◽

Tree Construction ◽

Efficient Data

We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. Later, we remove the intermediate step of suffix tree construction, and build the VST directly from the suffix array. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large alphabets, Σ, requires O(n) space to store (n is the string length), and allows searching for a pattern of length m to be performed in O(m log |Σ|) time, the same time needed for a suffix tree. Given the VST, we show an algorithm that computes all the suffix links in linear time, independent of Σ. The VST requires less space than other recently proposed data structures for suffix trees and suffix arrays, such as the enhanced suffix array [1], and the linearized suffix tree [17]. On average, the space requirement (including that for suffix arrays and suffix links) is 13.8n bytes for the regular VST, and 12.05n bytes in its compact form.
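For orientation, the plain suffix array that the VST starts from can be sketched as follows. This is not the VST itself, and it does not reach the O(m log |Σ|) search bound quoted in the abstract: it is a naive baseline that sorts the suffixes directly and answers a pattern query with two binary searches, costing O(m log n) character comparisons. The function names `build_suffix_array` and `find_occurrences` are illustrative choices, not names from the paper.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by direct sorting (not linear time).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Sorted start positions of `pattern` in `text`, found by binary search.

    Suffixes that start with `pattern` form one contiguous block of `sa`,
    because the length-m prefixes of the sorted suffixes are themselves sorted.
    """
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

The gap between this baseline and a suffix tree (no suffix links, a log n factor in search) is the kind of functionality the abstract says the VST recovers at lower space cost.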


OPTIMAL PARALLEL CONSTRUCTION OF MINIMAL SUFFIX AND FACTOR AUTOMATA

Parallel Processing Letters ◽

10.1142/s0129626496000054 ◽

1996 ◽

Vol 06 (01) ◽

pp. 35-44 ◽

Cited By ~ 5

Author(s):

DANY BRESLAUER ◽

RAMESH HARIHARAN

Keyword(s):

Parallel Algorithms ◽

Data Structures ◽

Suffix Tree ◽

Finite Automata ◽

Suffix Trees ◽

Deterministic Finite Automata ◽

Tree Construction ◽

Parallel Construction ◽

Construction Algorithms

This paper gives optimal parallel algorithms for the construction of the smallest deterministic finite automata recognizing all the suffixes and the factors of a string. The algorithms use recently discovered optimal parallel suffix tree construction algorithms together with data structures for the efficient manipulation of trees, exploiting the well known relation between suffix and factor automata and suffix trees.
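To make the object being constructed concrete, here is a minimal sequential sketch of a suffix automaton, using the well-known online construction with suffix links and state cloning. It is not the parallel algorithm of the paper, which builds the automata from a suffix tree. The class and method names are illustrative assumptions. Note that `is_factor` simply treats every state of the suffix automaton as accepting; that recognizes exactly the factors, but the result is not necessarily the minimal factor automaton.

```python
class SuffixAutomaton:
    """Sequential online construction of the suffix automaton of a string."""

    def __init__(self, text: str) -> None:
        self.next: list[dict[str, int]] = [{}]  # transitions per state
        self.link: list[int] = [-1]             # suffix links
        self.length: list[int] = [0]            # longest string reaching each state
        self.last = 0                           # state of the whole text read so far
        for ch in text:
            self._extend(ch)

    def _new_state(self, length: int, link: int, trans: dict[str, int]) -> int:
        self.next.append(trans)
        self.link.append(link)
        self.length.append(length)
        return len(self.length) - 1

    def _extend(self, ch: str) -> None:
        cur = self._new_state(self.length[self.last] + 1, -1, {})
        p = self.last
        # Add the new transition along the suffix-link chain where it is missing.
        while p != -1 and ch not in self.next[p]:
            self.next[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
        else:
            q = self.next[p][ch]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = self._new_state(self.length[p] + 1, self.link[q],
                                        dict(self.next[q]))
                while p != -1 and self.next[p].get(ch) == q:
                    self.next[p][ch] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur

    def accepting_states(self) -> set[int]:
        """States on the suffix-link path from `last` (includes the initial state)."""
        states, p = set(), self.last
        while p != -1:
            states.add(p)
            p = self.link[p]
        return states

    def _run(self, word: str):
        state = 0
        for ch in word:
            state = self.next[state].get(ch)
            if state is None:
                return None
        return state

    def is_factor(self, word: str) -> bool:
        return self._run(word) is not None

    def is_suffix(self, word: str) -> bool:
        state = self._run(word)
        return state is not None and state in self.accepting_states()


if __name__ == "__main__":
    automaton = SuffixAutomaton("abcbc")
    print(automaton.is_suffix("cbc"), automaton.is_factor("bcb"))
```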


Reversed Lempel–Ziv Factorization with Suffix Trees

Algorithms ◽

10.3390/a14060161 ◽

2021 ◽

Vol 14 (6) ◽

pp. 161

Author(s):

Dominik Köppl

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Suffix Trees ◽

Tree Representations ◽

Linear Time Algorithms

We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the same space but pay a multiplicative logarithmic time penalty.


From Ukkonen to McCreight and Weiner: A Unifying View of Linear-Time Suffix Tree Construction

Algorithmica ◽

10.1007/pl00009177 ◽

1997 ◽

Vol 19 (3) ◽

pp. 331-353 ◽

Cited By ~ 74

Author(s):

R. Giegerich ◽

S. Kurtz

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Tree Construction


Non-Overlapping LZ77 Factorization and LZ78 Substring Compression Queries with Suffix Trees

Algorithms ◽

10.3390/a14020044 ◽

2021 ◽

Vol 14 (2) ◽

pp. 44

Author(s):

Dominik Köppl

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Suffix Trees ◽

Small Space ◽

Tree Representation ◽

Limited Space ◽

Tree Representations

We present algorithms computing the non-overlapping Lempel–Ziv-77 factorization and the longest previous non-overlapping factor table within small space in linear or near-linear time with the help of modern suffix tree representations fitting into limited space. With similar techniques, we show how to answer substring compression queries for the Lempel–Ziv-78 factorization with a possible logarithmic multiplicative slowdown depending on the suffix tree representation used.
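As a point of reference for what is being computed, the following is a naive sketch of the non-overlapping LZ77 factorization. It assumes the usual definition: each factor is either a character with no earlier occurrence, or the longest prefix of the remaining text that occurs entirely inside the already factorized prefix. The sketch runs in polynomial time with plain substring checks and uses no suffix tree, so it illustrates only the output, not the small-space linear-time algorithms of the paper. The function name `nonoverlapping_lz77` is an illustrative choice.

```python
def nonoverlapping_lz77(text: str) -> list[str]:
    """Naive non-overlapping LZ77 factorization of `text`.

    Assumed definition: a factor starting at position i is the longest prefix
    of text[i:] that occurs completely inside text[:i], or a single character
    if no such non-empty prefix exists.
    """
    factors = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Grow the candidate while it still occurs inside the processed prefix.
        # If a prefix of length k+1 occurs there, so does the one of length k,
        # so growing one character at a time finds the longest match.
        while i + length < n and text[i:i + length + 1] in text[:i]:
            length += 1
        length = max(length, 1)  # fresh character: factor of length one
        factors.append(text[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(nonoverlapping_lz77("abababab"))  # ['a', 'b', 'ab', 'abab']
```

In the classic overlapping variant, the third factor of `abababab` would be `ababab`, because a factor may extend past the start of its own earlier occurrence; the non-overlapping restriction forces the split into `ab` and `abab`.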


DTA-SiST: de novo transcriptome assembly by using simplified suffix trees

BMC Bioinformatics ◽

10.1186/s12859-019-3272-9 ◽

2019 ◽

Vol 20 (S25) ◽

Author(s):

Jin Zhao ◽

Haodi Feng ◽

Daming Zhu ◽

Chi Zhang ◽

Ying Xu

Keyword(s):

Suffix Tree ◽

High Throughput Sequencing ◽

De Novo ◽

State Of The Art ◽

Linear Time ◽

Transcriptome Assembly ◽

De Novo Transcriptome Assembly ◽

Suffix Trees ◽

De Novo Transcriptome ◽

Hybrid Strategy

Background: Alternative splicing allows the pre-mRNAs of a gene to be spliced into various mRNAs, which greatly increases the diversity of proteins. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive number of short reads makes de novo transcript assembly an algorithmic challenge. Results: We develop a novel framework, called DTA-SiST, for de novo transcriptome assembly based on suffix trees. DTA-SiST first extends contigs by reads that have the longest overlaps with the contigs’ ends. These reads can be found in time linear in the lengths of the reads through a well-designed suffix tree structure. Then, DTA-SiST constructs splicing graphs based on contigs for each gene locus. Finally, DTA-SiST proposes two strategies to extract transcript-representing paths: a depth-first enumeration strategy and a hybrid strategy based on length and coverage. We implemented these two strategies and compared them with state-of-the-art de novo assemblers on both simulated and real datasets. Experimental results showed that the depth-first enumeration strategy always achieves better recall, and also better precision on smaller datasets, while the hybrid strategy leads in precision on big datasets. Conclusions: DTA-SiST is more competitive than the other de novo assemblers compared, especially in precision, owing to the read-based contig extension strategy and the elegant transcript extraction rules.


Optimal Parallel Construction of Minimal Suffix and Factor Automata

BRICS Report Series ◽

10.7146/brics.v2i16.19884 ◽

1995 ◽

Vol 2 (16) ◽

Author(s):

Dany Breslauer ◽

Ramesh Hariharan

Keyword(s):

Parallel Algorithms ◽

Data Structures ◽

Suffix Tree ◽

Finite Automata ◽

Suffix Trees ◽

Deterministic Finite Automata ◽

Tree Construction ◽

Parallel Construction ◽

Construction Algorithms


Space-Efficient Construction Algorithm for the Circular Suffix Tree

2013 Data Compression Conference ◽

10.1109/dcc.2013.76 ◽

2013 ◽

Author(s):

Wing Kai Hon ◽

Tsung Han Ku ◽

R. Shah ◽

S. V. Thankachan

Keyword(s):

Suffix Tree ◽

Construction Algorithm ◽

Efficient Construction


The Suffix Tree of a Tree and Minimizing Sequential Transducers

BRICS Report Series ◽

10.7146/brics.v2i47.19948 ◽

1995 ◽

Vol 2 (47) ◽

Author(s):

Dany Breslauer

Keyword(s):

Suffix Tree ◽

Linear Time ◽

Time Algorithm ◽

Linear Time Algorithm

This paper gives a linear-time algorithm for the construction of the suffix tree of a tree. The suffix tree of a tree is used to obtain an efficient algorithm for the minimization of sequential transducers.


Suffix Tree Construction

Encyclopedia of Algorithms ◽

10.1007/978-3-642-27848-8_414-2 ◽

2014 ◽

pp. 1-6

Author(s):

Jens Stoye

Keyword(s):

Suffix Tree ◽

Tree Construction
