A Scalable Vertical Model for Mining Association Rules
Association rule mining (ARM) is the data-mining task of finding all association rules in a dataset that satisfy user-defined measures of interest such as support and confidence. ARM usually proceeds by first mining all frequent itemsets, a step known to be computationally intensive, from which rules are then derived in a straightforward manner. Frequent-itemset mining generally prunes the search space using the downward closure (or anti-monotonicity) property of support, which states that no itemset can be frequent unless all of its subsets are frequent. Many papers have addressed ARM, but few have focused on scalability over very large datasets (i.e., datasets containing a very large number of transactions). In this paper, we propose a new model for representing data and mining frequent itemsets. The model is based on P-tree technology, which provides compression and fast logical operations over vertically structured data, and on set enumeration trees for fast itemset enumeration. Experimental results presented in this paper show substantial improvements for our approach on large datasets compared to other contemporary approaches in the literature.
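To make the ideas of vertical data, logical operations, and set enumeration concrete, the following is a minimal sketch and not the model proposed in this paper. It assumes that plain, uncompressed Python integers stand in for P-trees (bit t of an item's integer is set when transaction t contains that item), and the names build_vertical and mine_frequent are hypothetical. Support is counted with a bitwise AND, and a depth-first set enumeration extends an itemset only while it remains frequent, which is where downward closure prunes the search.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, List, Tuple


def build_vertical(transactions: List[Iterable[Hashable]]) -> Dict[Hashable, int]:
    """Map each item to an int whose bit t is set when transaction t contains it.

    Assumption: an uncompressed Python int stands in for a P-tree.
    """
    columns: Dict[Hashable, int] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the number of supporting transactions."""
    return bin(bits).count("1")


def mine_frequent(
    transactions: List[Iterable[Hashable]], min_support: int
) -> Dict[FrozenSet[Hashable], int]:
    """Return every itemset whose absolute support is at least min_support."""
    columns = build_vertical(transactions)
    # Keep only frequent single items, in a fixed order for set enumeration.
    items = sorted(i for i, b in columns.items() if popcount(b) >= min_support)
    result: Dict[FrozenSet[Hashable], int] = {}

    def extend(prefix: Tuple[Hashable, ...], prefix_bits: int, start: int) -> None:
        # Each itemset is generated once: only items after `start` are appended.
        for k in range(start, len(items)):
            bits = prefix_bits & columns[items[k]]  # transactions holding all items
            support = popcount(bits)
            if support >= min_support:
                itemset = prefix + (items[k],)
                result[frozenset(itemset)] = support
                extend(itemset, bits, k + 1)
            # Otherwise stop here: by downward closure no superset can be frequent.

    all_transactions = (1 << len(transactions)) - 1  # the empty itemset matches all
    extend((), all_transactions, 0)
    return result
```

With five transactions {a,b,c}, {a,b}, {a,c}, {b,c}, {a,b,c} and a minimum support of 3, this sketch reports the three single items and the three pairs, while {a,b,c} is rejected because only two transactions contain it.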