A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering

2013 ◽  
Vol 5 (6) ◽  
pp. 17-33 ◽  
Author(s):  
Issam SAHMOUDI ◽  
Hanane FROUD ◽  
Abdelmonaime LACHKAR
2007 ◽  
Vol DMTCS Proceedings vol. AH,... (Proceedings) ◽  
Author(s):  
Maxime Crochemore ◽  
Costas S. Iliopoulos ◽  
M. Sohel Rahman

In this paper, we study a restricted version of the position-restricted pattern matching problem introduced and studied by Mäkinen and Navarro [Position-Restricted Substring Searching, LATIN 2006]. In the problem handled in this paper, we are interested in those occurrences of the pattern that lie in a suffix or in a prefix of the given text. We achieve optimal query time for our problem using a data structure that extends the classic suffix tree. The time and space complexity of the data structure is dominated by that of the suffix tree. Notably, the best algorithm by Mäkinen and Navarro, if applied to our problem, gives sub-optimal query time, and the corresponding data structure also requires more time and space.
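To make the query itself concrete, the following is a brute-force sketch of prefix- and suffix-restricted matching. It is not the extended suffix tree described in the abstract and has none of its query-time guarantees. The function names, the 0-based indexing, and the convention that a prefix is given by an exclusive end index and a suffix by a start index are all assumptions made for illustration.

```python
def occurrences(text, pattern):
    """Return all 0-based start positions of pattern in text (overlaps included)."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def occurrences_in_prefix(text, pattern, end):
    """Occurrences that lie entirely inside the prefix text[:end] (hypothetical helper)."""
    return [i for i in occurrences(text, pattern) if i + len(pattern) <= end]


def occurrences_in_suffix(text, pattern, begin):
    """Occurrences that lie entirely inside the suffix text[begin:] (hypothetical helper)."""
    return [i for i in occurrences(text, pattern) if i >= begin]
```

Each query here scans the whole text, which is the cost an index-based approach such as the one in the abstract is designed to avoid.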


2014 ◽  
Vol 10 (1) ◽  
pp. 42-56 ◽  
Author(s):  
Zailani Abdullah ◽  
Tutut Herawan ◽  
A. Noraziah ◽  
Mustafa Mat Deris

Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets. Constructing the FP-Tree is an essential step before frequent pattern mining. However, only limited effort has focused specifically on constructing the FP-Tree from sources other than its original database. Typical FP-Tree construction requires, besides prior knowledge of the support threshold, two database scans: the first to build and sort the frequent patterns and the second to build the prefix paths. Scanning the database twice is therefore a major limitation in constructing the FP-Tree. This paper proposes a scalable Trie Transformation Technique Algorithm (T3A) to convert our predefined tree data structure, Disorder Support Trie Itemset (DOSTrieIT), into an FP-Tree. Experimental results on two UCI benchmark datasets show that the proposed T3A generates the FP-Tree up to three orders of magnitude faster than the benchmarked FP-Growth.
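The conventional two-scan construction that the abstract identifies as the bottleneck can be sketched roughly as follows. This is a minimal illustration of standard FP-Tree building, not T3A or DOSTrieIT, whose details the abstract does not give. The names `FPNode` and `build_fp_tree`, the absolute-count support threshold, and the alphabetical tie-breaking between equally frequent items are assumptions.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, its count, and its children (hypothetical class)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-Tree with the usual two passes over the transactions."""
    # Scan 1: count how many transactions contain each item, keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction's frequent items, ordered by descending
    # support (ties broken by item name), sharing common prefix paths.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


transactions = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

With these four transactions, every path starts at `b`, the most frequent item, so the tree has a single branch under the root that then splits into `a` and `c`. Both loops over `transactions` correspond to the two database scans mentioned in the abstract.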

