GridWall: A Novel Condensed Representation of Frequent Itemsets

Author(s):  
Weidong Tian ◽  
Jianqiang Mei ◽  
Hongjuan Zhou ◽  
Zhongqiu Zhao

Frequent itemset mining (FIM) is a major research topic in data mining. Mining frequent itemsets (FIs) usually generates a very large number of itemsets from a database, which leads to high memory usage and long execution times. Frequent closed itemsets (FCIs) and frequent maximal itemsets (FMIs) are reduced, lossless representations of frequent itemsets. FCIs reduce memory usage and execution time compared with FMIs. The complete set of FIs can be derived from FCIs and FMIs with suitable methods. Various studies have presented efficient approaches for mining FCIs and FMIs. In view of this, we propose an algorithm called DCFI-Mine for efficiently deriving FIs from FCIs, and an algorithm called RFMI for deriving FIs from FMIs. DCFI-Mine has two advantages. First, efficiency: unlike existing algorithms, which tend to generate an enormous number of itemsets during processing, DCFI-Mine processes the itemsets directly, without candidate generation. RFMI, in contrast, requires multiple scans to look up item supports, so it is less efficient than DCFI-Mine. Second, losslessness: both DCFI-Mine and RFMI discover the complete set of frequent itemsets without loss. Experimental results show that DCFI-Mine performs best at deriving FIs in terms of memory usage and execution time.
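The idea of recovering FIs from FCIs can be illustrated with a small sketch. It relies only on the standard property that the support of a frequent itemset equals the largest support among the closed itemsets that contain it. This is not DCFI-Mine: the function name `derive_frequent_itemsets` and the toy data are assumptions made for illustration, and the naive subset enumeration below does generate candidates, which the abstract says DCFI-Mine avoids.

```python
from itertools import combinations


def derive_frequent_itemsets(closed):
    """Naively expand frequent closed itemsets into all frequent itemsets.

    `closed` maps each frequent closed itemset (a frozenset) to its support.
    Every non-empty subset of a closed itemset is frequent, and its support
    is the maximum support over the closed itemsets that contain it.
    """
    derived = {}
    for closed_set, support in closed.items():
        items = sorted(closed_set)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                # Keep the largest support seen among closed supersets.
                if support > derived.get(key, 0):
                    derived[key] = support
    return derived


# Hypothetical toy input: the closed itemsets of the transactions
# {a,b,c}, {a,b,c}, {a,b}, {b} with a minimum support of 2.
closed = {
    frozenset("b"): 4,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
}
frequent = derive_frequent_itemsets(closed)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), support)
```

The enumeration is exponential in the size of the largest closed itemset, so it is useful only for checking the losslessness property on small inputs.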


2013 ◽  
Vol 33 (11) ◽  
pp. 3045-3048
Author(s):  
Hongmei WANG ◽  
Ming HU

2021 ◽  
Vol 16 (2) ◽  
pp. 1-30
Author(s):  
Guangtao Wang ◽  
Gao Cong ◽  
Ying Zhang ◽  
Zhen Hai ◽  
Jieping Ye

Streams in which multiple transactions are associated with the same key are prevalent in practice; e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging, since sampling-based methods, such as the widely used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
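As a rough illustration of the KMV idea, the sketch below builds one synopsis per item and estimates an itemset's frequency with the classic KMV intersection estimator. It is not the novel estimator proposed in the article. It assumes that an item's synopsis summarizes the set of distinct keys whose records contain the item, and that an itemset's frequency is the number of keys shared by all of its items; the helper names (`hash01`, `kmv_synopsis`, `estimate_itemset_frequency`) and the data are hypothetical.

```python
import hashlib
import heapq


def hash01(key):
    """Map a key to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(keys, k):
    """Return the k smallest distinct hash values of the keys, ascending."""
    return heapq.nsmallest(k, {hash01(key) for key in keys})


def estimate_itemset_frequency(synopses, k):
    """Classic KMV estimate of how many keys appear in every synopsis."""
    as_sets = [set(s) for s in synopses]
    # The k smallest hash values of the union of all per-item key sets.
    union = heapq.nsmallest(k, set().union(*synopses))
    hits = sum(1 for v in union if all(v in s for s in as_sets))
    if len(union) < k:
        # Fewer than k distinct keys overall: the synopses are complete,
        # so the count is exact.
        return float(hits)
    # Fraction of shared keys in the sample, scaled by the estimated
    # number of distinct keys in the union, (k - 1) / (k-th smallest hash).
    return hits / k * (k - 1) / union[-1]


k = 256
# Hypothetical key sets: customers who bought item "a" and item "b".
synopsis_a = kmv_synopsis(range(0, 50), k)
synopsis_b = kmv_synopsis(range(25, 75), k)
# Only 75 distinct keys exist, fewer than k, so this count is exact.
print(estimate_itemset_frequency([synopsis_a, synopsis_b], k))  # 25.0
```

With more than k distinct keys the result becomes an estimate whose error shrinks as k grows, which is the trade-off between synopsis size and accuracy that the abstract refers to.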

