A Scalable Algorithm for Constructing Frequent Pattern Tree

The Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets. Constructing the FP-Tree is an essential step before frequent pattern mining. However, few efforts have focused specifically on constructing the FP-Tree other than from its original database. Typical FP-Tree construction requires prior knowledge of the support threshold and two database scans: the first to find and sort the frequent items, and the second to build the prefix paths. Scanning the database twice is therefore a major limitation of FP-Tree construction. This paper proposes the scalable Trie Transformation Technique Algorithm (T3A), which converts our predefined tree data structure, the Disorder Support Trie Itemset (DOSTrieIT), into an FP-Tree. Experimental results on two UCI benchmark datasets show that the proposed T3A generates the FP-Tree up to three orders of magnitude faster than the benchmarked FP-Growth.
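For readers unfamiliar with the two-scan construction that the abstract describes as the baseline, the following is a minimal Python sketch of it. It illustrates the conventional approach only, not T3A or DOSTrieIT, whose details the abstract does not give. The names `FPNode` and `build_fp_tree` are hypothetical, and breaking support ties by item name is an assumption made here to keep the output deterministic.

```python
from collections import Counter


class FPNode:
    """One node of a prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Conventional two-scan FP-tree construction (illustrative sketch)."""
    # Scan 1: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction's frequent items as a prefix path,
    # ordered by descending support (ties broken by item name, an assumption).
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent
```

Both loops iterate over the full transaction list, which is the double scan the abstract identifies as the bottleneck.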

Novel Approach for Frequent Pattern Algorithm for Maximizing Frequent Patterns in Effective Time

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽ 10.24297/ijct.v3i2b.2876 ◽ 2012 ◽ Vol 3 (2) ◽ pp. 279-283

Author(s): Rahul Sharma ◽ Dr. Manish Manoria

Keyword(s): Data Structure ◽ Large Datasets ◽ Experimental Results ◽ Frequent Pattern ◽ Frequent Patterns ◽ Effective Time ◽ Novel Approach ◽ Tree Data ◽ Improved Performance ◽ Tree Data Structure

The essential step in mining association rules is mining the frequent patterns. Because of the inherent difficulty of the problem, it is impossible to mine the complete set of frequent patterns from a dense database. The FP-growth algorithm has been implemented using an array-based structure, known as the FP-tree, which stores compressed frequency information. Numerous experimental results have demonstrated that the algorithm performs extremely well. However, FP-growth needs two traversals of an FP-tree to construct each new conditional FP-tree. In this paper we present a novel Array Based Without Scanning Frequent Pattern (ABWSFP) tree technique that greatly reduces the need to traverse FP-trees, thus significantly improving the performance of FP-tree-based algorithms. The technique works especially well for large datasets. We then present a new algorithm that uses the QFP-tree data structure in combination with the FP-tree. Experimental results show that the new algorithm outperforms other algorithms not only in speed but also in CPU consumption and scalability.
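The two traversals mentioned in the abstract can be illustrated with a minimal Python sketch of conventional conditional FP-tree construction. This is not the ABWSFP or QFP-tree technique, which the abstract does not specify. The names `Node`, `insert`, `prefix_path`, and `conditional_tree` are hypothetical, and the sketch assumes the conditional tree keeps the original item order for simplicity.

```python
from collections import Counter, defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def insert(root, items, header, count=1):
    """Insert an ordered item list as a path; header maps item -> its nodes."""
    node = root
    for item in items:
        if item not in node.children:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]
        node.count += count


def prefix_path(node):
    """Items on the path from the root down to (but excluding) node."""
    path = []
    node = node.parent
    while node.item is not None:
        path.append(node.item)
        node = node.parent
    return path[::-1]


def conditional_tree(header, item, min_support):
    # Traversal 1: follow the node links for `item` and count the
    # support of every item on its prefix paths.
    counts = Counter()
    for n in header[item]:
        for i in prefix_path(n):
            counts[i] += n.count
    keep = {i for i, c in counts.items() if c >= min_support}

    # Traversal 2: walk the same prefix paths again and insert the
    # surviving items into a new, smaller tree.
    root, cond_header = Node(None, None), defaultdict(list)
    for n in header[item]:
        path = [i for i in prefix_path(n) if i in keep]
        insert(root, path, cond_header, n.count)
    return root, cond_header
```

Each recursive step of FP-growth repeats this pair of traversals, which is the cost the abstract says the proposed technique reduces.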

The Benefits of Using Prefix Tree Data Structure in Multi-Level Frequent Pattern Mining

2007 2nd International Workshop on Soft Computing Applications ◽ 10.1109/sofa.2007.4318326 ◽ 2007

Author(s): Mirela Pater ◽ Daniela E. Popescu

Keyword(s): Data Structure ◽ Pattern Mining ◽ Frequent Pattern Mining ◽ Frequent Pattern ◽ Prefix Tree ◽ Tree Data ◽ Multi Level ◽ Tree Data Structure

AllSome Sequence Bloom Trees

10.1101/090464 ◽ 2016

Author(s): Chen Sun ◽ Robert S. Harris ◽ Rayan Chikhi ◽ Paul Medvedev

Keyword(s): Data Structure ◽ Rna Seq ◽ Memory Consumption ◽ Construction Time ◽ Sequence Read Archive ◽ Tree Construction ◽ Sequencing Experiment ◽ Tree Data ◽ Tree Data Structure ◽ Generation Sequencing

The ubiquity of next generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive. Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39-85%, with a price of up to 3x memory consumption during queries. Notably, it can query a batch of 198,074 queries in under 8 hours (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 minutes.
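The abstract does not describe how the tree is queried, so the following Python sketch is only a rough illustration of the general idea, under stated assumptions. It assumes each node records which k-mers occur in all leaves below it and which occur in only some of them, and it uses exact Python sets where the real structure uses Bloom filters (so there are no false positives here). The names `kmers`, `TreeNode`, and `query`, and the parameter `needed` (the minimum number of query k-mers a sample must contain), are hypothetical.

```python
import math


def kmers(seq, k):
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class TreeNode:
    def __init__(self, name=None, leaf_kmers=None, children=()):
        self.name = name
        self.children = list(children)
        if self.children:
            self.union = set().union(*(c.union for c in self.children))
            self.inter = set.intersection(*(c.inter for c in self.children))
        else:
            self.union = set(leaf_kmers)
            self.inter = set(leaf_kmers)
        self.all = self.inter                 # k-mers in every leaf below
        self.some = self.union - self.inter   # k-mers in some leaves only

    def leaves(self):
        if not self.children:
            return [self.name]
        return [n for c in self.children for n in c.leaves()]


def query(node, query_kmers, needed, hits=0):
    """Names of leaves containing at least `needed` of the query k-mers."""
    hits += len(query_kmers & node.all)      # settled for the whole subtree
    undetermined = query_kmers & node.some   # must be resolved further down
    if hits >= needed:
        return node.leaves()                 # every leaf below qualifies
    if hits + len(undetermined) < needed:
        return []                            # no leaf below can qualify
    return [n for c in node.children
            for n in query(c, undetermined, needed, hits)]


a = TreeNode("A", kmers("ACGTAC", 3))
b = TreeNode("B", kmers("ACGTTT", 3))
c = TreeNode("C", kmers("GGGCCC", 3))
root = TreeNode(children=[TreeNode(children=[a, b]), c])
q = kmers("ACGTA", 3)
print(query(root, q, math.ceil(1.0 * len(q))))  # all three k-mers required
```

In this sketch a subtree is reported or pruned as soon as the counts settle the outcome, so whole branches can be handled without visiting their leaves.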
