A Scalable Algorithm for Constructing Frequent Pattern Tree

2014 ◽  
Vol 10 (1) ◽  
pp. 42-56 ◽  
Author(s):  
Zailani Abdullah ◽  
Tutut Herawan ◽  
A. Noraziah ◽  
Mustafa Mat Deris

The Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets, and constructing it is an essential step before mining frequent patterns. However, few efforts have focused specifically on constructing the FP-Tree from anything other than its original database. Typical FP-Tree construction requires prior knowledge of the support threshold and two database scans: the first to find and sort the frequent items, and the second to build the prefix paths. Scanning the database twice is therefore a major limitation of FP-Tree construction. This paper proposes a scalable Trie Transformation Technique Algorithm (T3A) that converts our predefined tree data structure, the Disorder Support Trie Itemset (DOSTrieIT), into an FP-Tree. Experimental results on two UCI benchmark datasets show that the proposed T3A generates the FP-Tree up to 3 orders of magnitude faster than the benchmarked FP-Growth.
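For orientation, the conventional two-scan construction that the abstract describes as the bottleneck can be sketched roughly as follows. This is a minimal illustration of the standard FP-Tree build, not the T3A or DOSTrieIT method from the paper; the names `FPNode` and `build_fp_tree` are hypothetical, and ties in support are broken alphabetically as an assumption.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    """Build an FP-Tree with the classic two passes over the transactions.

    Hypothetical helper: returns the root node and the support of each
    frequent item. Header-table links are omitted for brevity.
    """
    # Scan 1: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction as a prefix path, with its frequent
    # items ordered by descending support (ties broken alphabetically).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Both loops touch every transaction, which is the double scan the paper aims to avoid by transforming a prebuilt trie instead.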

2012 ◽  
Vol 3 (2) ◽  
pp. 279-283
Author(s):  
Rahul Sharma ◽  
Dr. Manish Manoria

The essential aspect of mining association rules is mining the frequent patterns. Because of the inherent difficulty of the problem, it is impossible to mine the complete set of frequent patterns from a dense database. The FP-growth algorithm has been implemented using an array-based structure, known as the FP-tree, which stores compressed frequency information. Numerous experimental results have demonstrated that the algorithm performs extremely well. However, FP-growth needs two traversals of an FP-tree to construct each new conditional FP-tree. In this paper we present a novel Array Based Without Scanning Frequent Pattern (ABWSFP) tree technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree based algorithms. The technique works especially well for large datasets. We then present a new algorithm that uses the QFP-tree data structure in combination with the FP-tree. Experimental results show that the new algorithm outperforms other algorithms not only in speed, but also in CPU consumption and scalability.


2016 ◽  
Author(s):  
Chen Sun ◽  
Robert S. Harris ◽  
Rayan Chikhi ◽  
Paul Medvedev

The ubiquity of next generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39-85%, at the price of up to 3x memory consumption during queries. Notably, it can process a batch of 198,074 queries in under 8 hours (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 minutes.
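As background, the basic Sequence Bloom Tree idea that this work builds on can be sketched as follows. This is a simplified illustration only: it does not implement the AllSome variant, it pairs leaves in input order rather than clustering similar filters, and the names `BloomFilter`, `SBTNode`, `build_tree`, and `query`, along with the filter size and hash scheme, are assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1 << 16, num_hashes=2, bits=0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        return BloomFilter(self.num_bits, self.num_hashes, self.bits | other.bits)


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class SBTNode:
    def __init__(self, bloom, name=None, left=None, right=None):
        self.bloom, self.name, self.left, self.right = bloom, name, left, right


def build_tree(samples, k):
    """Build a binary tree over a non-empty dict of sample name -> sequence."""
    nodes = []
    for name, seq in samples.items():
        bloom = BloomFilter()
        for km in kmers(seq, k):
            bloom.add(km)
        nodes.append(SBTNode(bloom, name=name))
    # Merge neighbours pairwise; each internal filter is the union of its children.
    while len(nodes) > 1:
        merged = [
            SBTNode(nodes[i].bloom.union(nodes[i + 1].bloom),
                    left=nodes[i], right=nodes[i + 1])
            for i in range(0, len(nodes) - 1, 2)
        ]
        if len(nodes) % 2:
            merged.append(nodes[-1])
        nodes = merged
    return nodes[0]


def query(node, query_kmers, theta):
    """Return names of leaves whose filter holds at least theta of the k-mers."""
    hits = sum(1 for km in query_kmers if km in node.bloom)
    if hits < theta * len(query_kmers):
        return []  # prune: no sample below this node can pass the threshold
    if node.name is not None:
        return [node.name]
    return query(node.left, query_kmers, theta) + query(node.right, query_kmers, theta)
```

Because Bloom filters have no false negatives, a sample that truly contains the queried k-mers is always reported; false positives are possible, which is why such hits are only "potentially expressed".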

