scholarly journals Optimal Prefix and Suffix Queries on Texts

2007 ◽  
Vol DMTCS Proceedings vol. AH,... (Proceedings) ◽  
Author(s):  
Maxime Crochemore ◽  
Costas S. Iliopoulos ◽  
M. Sohel Rahman

International audience In this paper, we study a restricted version of the position restricted pattern matching problem introduced and studied by Mäkinen and Navarro [Position-Restricted Substring Searching, LATIN 2006]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lies in a suffix or in a prefix of the given text. We achieve optimal query time for our problem against a data structure which is an extension of the classic suffix tree data structure. The time and space complexity of the data structure is dominated by that of the suffix tree. Notably, the (best) algorithm by Mäkinen and Navarro, if applied to our problem, gives sub-optimal query time and the corresponding data structure also requires more time and space.

2012 ◽  
Vol 23 (02) ◽  
pp. 343-356 ◽  
Author(s):  
DOMENICO CANTONE ◽  
SIMONE FARO ◽  
EMANUELE GIAQUINTA

In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the BOYER-MOORE strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior on the average, as shown by extensive experimentation.


2014 ◽  
Vol 10 (1) ◽  
pp. 42-56 ◽  
Author(s):  
Zailani Abdullah ◽  
Tutut Herawan ◽  
A. Noraziah ◽  
Mustafa Mat Deris

Frequent Pattern Tree (FP-Tree) is a compact data structure of representing frequent itemsets. The construction of FP-Tree is very important prior to frequent patterns mining. However, there have been too limited efforts specifically focused on constructing FP-Tree data structure beyond from its original database. In typical FP-Tree construction, besides the prior knowledge on support threshold, it also requires two database scans; first to build and sort the frequent patterns and second to build its prefix paths. Thus, twice database scanning is a key and major limitation in completing the construction of FP-Tree. Therefore, this paper suggests scalable Trie Transformation Technique Algorithm (T3A) to convert our predefined tree data structure, Disorder Support Trie Itemset (DOSTrieIT) into FP-Tree. Experiment results through two UCI benchmark datasets show that the proposed T3A generates FP-Tree up to 3 magnitudes faster than that the benchmarked FP-Growth.


Sign in / Sign up

Export Citation Format

Share Document