Optimal Prefix and Suffix Queries on Texts

In this paper, we study a restricted version of the position-restricted pattern matching problem introduced and studied by Mäkinen and Navarro [Position-Restricted Substring Searching, LATIN 2006]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lie in a suffix or in a prefix of the given text. We achieve optimal query time for our problem using a data structure that extends the classic suffix tree. The time and space complexity of the data structure are dominated by those of the suffix tree. Notably, the (best) algorithm by Mäkinen and Navarro, if applied to our problem, gives sub-optimal query time, and the corresponding data structure also requires more time and space.
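To make the query in this abstract concrete, the following is a minimal brute-force sketch of what is being asked: report the occurrences of a pattern that lie entirely within a prefix or a suffix of the text. It is not the suffix-tree-based structure from the paper and makes no claim to optimal query time. The function names and the 0-based, inclusive boundary parameters `r` and `l` are assumptions chosen for illustration.

```python
def occurrences(text, pattern):
    """Return all 0-based start positions of pattern in text, overlaps included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def occurrences_in_prefix(text, pattern, r):
    """Occurrences that lie entirely inside the prefix text[0..r] (r inclusive)."""
    last = len(pattern) - 1
    return [i for i in occurrences(text, pattern) if i + last <= r]


def occurrences_in_suffix(text, pattern, l):
    """Occurrences that lie entirely inside the suffix text[l..] (l inclusive)."""
    return [i for i in occurrences(text, pattern) if i >= l]


if __name__ == "__main__":
    text = "abracadabra"
    print(occurrences_in_prefix(text, "abra", 3))
    print(occurrences_in_suffix(text, "a", 5))
```

This baseline scans the whole text on every query, so its cost depends on the text length; an index such as the one described in the abstract is meant to avoid exactly that.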


A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering

International Journal of Database Management Systems ◽

10.5121/ijdms.2013.5602 ◽

2013 ◽

Vol 5 (6) ◽

pp. 17-33 ◽

Cited By ~ 5

Author(s):

Issam SAHMOUDI ◽

Hanane FROUD ◽

Abdelmonaime LACHKAR

Keyword(s):

Data Structure ◽

Extraction Method ◽

Suffix Tree ◽

Tree Data ◽

Tree Data Structure


Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts

International Journal of Foundations of Computer Science ◽

10.1142/s0129054112400163 ◽

2012 ◽

Vol 23 (02) ◽

pp. 343-356 ◽

Cited By ~ 2

Author(s):

DOMENICO CANTONE ◽

SIMONE FARO ◽

EMANUELE GIAQUINTA

Keyword(s):

Data Structure ◽

String Matching ◽

Matching Problem ◽

Efficient Approach ◽

Tree Data ◽

Tree Data Structure

In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the Boyer-Moore strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit sublinear behavior on average, as shown by extensive experimentation.
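The two phases described in this abstract (locate a candidate shift, then verify that it is codeword aligned) can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' algorithm: the prefix code `CODE` is a small hypothetical table, the search works on a string of '0'/'1' characters with a Horspool-style shift, and alignment is checked against a set of codeword start offsets obtained by decoding the whole text, which stands in for the skeleton tree used in the paper.

```python
# Hypothetical prefix code, chosen only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(s):
    """Encode a plain string into a string of '0'/'1' characters."""
    return "".join(CODE[ch] for ch in s)


def horspool(text, pattern):
    """Return all start offsets of pattern in text using Horspool shifts."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Distance from the rightmost occurrence of each symbol
    # (last position excluded) to the end of the pattern.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits


def codeword_starts(bits):
    """Decode bits greedily and collect the offset where each codeword begins."""
    codewords = set(CODE.values())
    starts, begin = set(), 0
    for end in range(1, len(bits) + 1):
        if bits[begin:end] in codewords:
            starts.add(begin)
            begin = end
    return starts


def search_encoded(bits, pattern):
    """Bit offsets where the encoded pattern occurs on a codeword boundary."""
    starts = codeword_starts(bits)
    return [p for p in horspool(bits, encode(pattern)) if p in starts]


if __name__ == "__main__":
    bits = encode("ba")                 # "100"
    print(horspool(bits, encode("aa")))  # raw candidate shifts
    print(search_encoded(bits, "aa"))    # candidates that are codeword aligned
```

In the example, the encoded pattern "00" matches inside "100" at bit offset 1, but that offset falls in the middle of the codeword for "b", so the verification step rejects it. Decoding the full text to find boundaries costs linear time, which is the overhead a dedicated alignment structure is meant to reduce.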


A Scalable Algorithm for Constructing Frequent Pattern Tree

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2014010103 ◽

2014 ◽

Vol 10 (1) ◽

pp. 42-56 ◽

Cited By ~ 3

Author(s):

Zailani Abdullah ◽

Tutut Herawan ◽

A. Noraziah ◽

Mustafa Mat Deris

Keyword(s):

Data Structure ◽

Frequent Pattern ◽

Frequent Patterns ◽

Scalable Algorithm ◽

Tree Construction ◽

Frequent Pattern Tree ◽

Support Threshold ◽

Benchmark Datasets ◽

Tree Data ◽

Tree Data Structure

Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets. Constructing the FP-Tree is an essential step before mining frequent patterns. However, few efforts have focused specifically on constructing the FP-Tree data structure from anything other than its original database. Typical FP-Tree construction requires prior knowledge of the support threshold and two database scans: the first to build and sort the frequent patterns, and the second to build the prefix paths. Scanning the database twice is thus a major limitation in constructing the FP-Tree. Therefore, this paper proposes a scalable Trie Transformation Technique Algorithm (T3A) to convert our predefined tree data structure, Disorder Support Trie Itemset (DOSTrieIT), into an FP-Tree. Experimental results on two UCI benchmark datasets show that the proposed T3A generates the FP-Tree up to three orders of magnitude faster than the benchmarked FP-Growth.
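For reference, the conventional two-scan construction that this abstract describes as the baseline can be sketched as below. This is a minimal illustration of the standard FP-Tree build, not of T3A or DOSTrieIT, whose details are not given here. The class and function names are assumptions, ties in support are broken alphabetically for determinism, and the header table with node links is omitted for brevity.

```python
from collections import Counter


class Node:
    """One FP-Tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-Tree with the classic two passes over the transactions."""
    # Scan 1: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Scan 2: insert each transaction's frequent items, ordered by
    # descending support, as a path that shares prefixes with earlier paths.
    root = Node(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


if __name__ == "__main__":
    data = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
    tree, freq = build_fp_tree(data, min_support=2)
    print(freq)
    print({k: v.count for k, v in tree.children.items()})
```

Both loops iterate over the full transaction list, which is the "two database scans" cost the abstract identifies as the limitation.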
