Efficient weighted probabilistic frequent itemset mining in uncertain databases

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights, reflecting their importance. It can thus discover more useful and meaningful itemsets by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in real-life environments is often inaccurate, imprecise, or incomplete. Recently, these two ideas were combined in the HEWI-Uapriori algorithm, which considers both item weights and transaction uncertainty to mine high expected weighted itemsets (HEWIs) using a two-phase, Apriori-based approach. Although the upper bound proposed in HEWI-Uapriori can reduce the size of the search space, the algorithm still generates a large number of candidates and relies on a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating candidates. The algorithm relies on three novel structures, named the element (E)-table, the weighted-probability (WP)-table, and the WP-tree, to maintain the information required to identify and prune unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional WFIM and UFIM methods, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
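The abstract does not spell out the definitions, so the following is only a minimal Python sketch of the quantities behind HEWI mining, under assumptions that are not taken from the paper: each transaction carries a single existence probability, an itemset's weight is the mean of its item weights, and the threshold is an absolute expected weighted support. The upper bound used here (the maximum item weight of a transaction times its probability) is an illustrative anti-monotone bound, not necessarily the exact bound of HEWI-Uapriori, and the level-wise search loosely mirrors the Apriori-style approach rather than HEWI-Utree, whose E-table, WP-table and WP-tree are not reproduced. `DB`, `WEIGHTS` and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is (set of items, existence probability).
DB = [
    ({"a", "b", "c"}, 0.9),
    ({"a", "c"}, 0.7),
    ({"b", "d"}, 0.5),
    ({"a", "b", "c", "d"}, 0.8),
]
# Hypothetical item weights (importance).
WEIGHTS = {"a": 0.6, "b": 0.9, "c": 0.3, "d": 1.0}


def itemset_weight(itemset, weights):
    """Assumed definition: the weight of an itemset is the mean of its item weights."""
    return sum(weights[i] for i in itemset) / len(itemset)


def expected_support(itemset, db):
    """Sum of the existence probabilities of the transactions containing the itemset."""
    return sum(p for items, p in db if itemset <= items)


def expected_weighted_support(itemset, db, weights):
    return itemset_weight(itemset, weights) * expected_support(itemset, db)


def upper_bound(itemset, db, weights):
    """Sum over transactions T containing the itemset of (max weight in T) * P(T).

    The mean weight of any X contained in T is at most T's maximum item weight, and a
    superset of X occurs in no more transactions than X, so this value bounds the
    expected weighted support of X and of all its supersets (it is anti-monotone).
    """
    return sum(max(weights[i] for i in items) * p for items, p in db if itemset <= items)


def mine_hewis(db, weights, min_ews):
    """Level-wise search that prunes with the upper bound, then checks exact values."""
    results = {}
    level = [frozenset([i]) for i in sorted(weights)]
    while level:
        # Keep only candidates whose upper bound reaches the threshold.
        promising = [x for x in level if upper_bound(x, db, weights) >= min_ews]
        # Compute the exact expected weighted support of the survivors.
        for x in promising:
            ews = expected_weighted_support(x, db, weights)
            if ews >= min_ews:
                results[x] = ews
        # Join promising k-itemsets into (k+1)-itemsets whose k-subsets all survived.
        survivors = set(promising)
        next_level = set()
        for x, y in combinations(promising, 2):
            cand = x | y
            if len(cand) == len(x) + 1 and all(cand - {i} in survivors for i in cand):
                next_level.add(cand)
        level = list(next_level)
    return results
```

For instance, `mine_hewis(DB, WEIGHTS, 1.0)` returns eight itemsets: {a}, {b}, {d}, {a, b}, {a, c}, {b, c}, {b, d} and {a, b, c}. The candidate {a, d} is discarded because its upper bound (0.8) is already below the threshold, which is the kind of early pruning the abstract attributes to the upper bound.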

GPU acceleration of probabilistic frequent itemset mining from uncertain databases

Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12, 2012

DOI: 10.1145/2396761.2396874

Cited By ~ 4

Author(s): Yusuke Kozawa, Toshiyuki Amagasa, Hiroyuki Kitagawa

Keyword(s): Frequent Itemset, Frequent Itemset Mining, GPU Acceleration, Uncertain Databases, Itemset Mining

Probabilistic frequent itemset mining in uncertain databases

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09, 2009

DOI: 10.1145/1557019.1557039

Cited By ~ 141

Author(s): Thomas Bernecker, Hans-Peter Kriegel, Matthias Renz, Florian Verhein, Andreas Zuefle

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Uncertain Databases, Itemset Mining

Probabilistic maximal frequent itemset mining methods over uncertain databases

Intelligent Data Analysis, Vol 23 (6), pp. 1219-1241, 2019

DOI: 10.3233/ida-184255

Author(s): Haifeng Li, Mo Hai, Ning Zhang, Jianming Zhu, Yue Wang, ...

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Uncertain Databases, Itemset Mining, Mining Methods

Probabilistic Maximal Frequent Itemset Mining Over Uncertain Databases

Database Systems for Advanced Applications - Lecture Notes in Computer Science, pp. 149-163, 2016

DOI: 10.1007/978-3-319-32025-0_10

Cited By ~ 4

Author(s): Haifeng Li, Ning Zhang

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Uncertain Databases, Itemset Mining

Weighted frequent itemset mining over uncertain databases

Applied Intelligence, Vol 44 (1), pp. 232-250, 2015

DOI: 10.1007/s10489-015-0703-9

Cited By ~ 31

Author(s): Jerry Chun-Wei Lin, Wensheng Gan, Philippe Fournier-Viger, Tzung-Pei Hong, Vincent S. Tseng

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Uncertain Databases, Itemset Mining

Frequent Itemset Mining for a Combination of Certain and Uncertain Databases

Recent Developments and the New Direction in Soft-Computing Foundations and Applications - Studies in Fuzziness and Soft Computing, pp. 25-39, 2018

DOI: 10.1007/978-3-319-75408-6_3

Cited By ~ 1

Author(s): Samar Wazir, Tanvir Ahmad, M. M. Sufyan Beg

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Uncertain Databases, Itemset Mining

Frequent Itemset Mining A Metadata Based Approach for Knowledge Discovery

International Journal of Computer Sciences and Engineering, Vol 6 (3), pp. 316-320, 2018

DOI: 10.26438/ijcse/v6i3.316320

Author(s): Basavaraj A. Goudannavar, Prashant Bhat

Keyword(s): Knowledge Discovery, Frequent Itemset, Frequent Itemset Mining, Itemset Mining

Inverse Frequent Itemset Mining Based on FP-Tree

Journal of Software, Vol 19 (2), pp. 338-350, 2008

DOI: 10.3724/sp.j.1001.2008.00338

Cited By ~ 2

Author(s): Yu-Hong GUO

Keyword(s): Frequent Itemset, Frequent Itemset Mining, Itemset Mining

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data, Vol 16 (2), pp. 1-30, 2021

DOI: 10.1145/3465238

Author(s): Guangtao Wang, Gao Cong, Ying Zhang, Zhen Hai, Jieping Ye

Keyword(s): Frequency Estimation, Frequent Itemsets, Frequent Itemset, Experimental Results, Closure Property, Frequent Itemset Mining, Itemset Mining, Minimum Value, Downward Closure, Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice; for example, a customer may have multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging because sampling-based methods, such as the widely used reservoir sampling, cannot be applied. In this article, we propose a novel k-Minimum Value (KMV) synopsis-based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis of each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset from the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to compute but also satisfies the downward-closure property. These properties allow our new estimator to be incorporated into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis-based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it guarantees the accuracy of FIM with a bounded KMV synopsis size. Experimental results on massive streams show that our estimator significantly improves the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
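To make the KMV idea concrete, here is a minimal Python sketch of per-item KMV synopses over a multi-transaction stream, paired with the classic KMV intersection estimator from the distinct-value estimation literature. It is not the novel estimator proposed in the article, and it is not designed to guarantee downward closure. It assumes that an itemset's frequency is the number of distinct keys associated with every item of the itemset (possibly through different transactions), and it uses SHA-1 as the hash; `KMVSynopsis`, `build_synopses` and `estimate_itemset_frequency` are hypothetical names.

```python
import hashlib
import heapq
from collections import defaultdict


def key_hash(key):
    """Map a key to a pseudo-uniform float in [0, 1) using 53 bits of its SHA-1 digest."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


class KMVSynopsis:
    """Retains the k smallest distinct hash values of the keys seen with one item."""

    def __init__(self, k):
        self.k = k
        self._heap = []      # negated hash values, so the largest retained value is on top
        self.values = set()  # the retained hash values

    def add(self, key):
        h = key_hash(key)
        if h in self.values:
            return  # this key is already represented
        if len(self.values) < self.k:
            heapq.heappush(self._heap, -h)
            self.values.add(h)
        elif h < -self._heap[0]:
            # Replace the current largest retained value with the smaller new one.
            evicted = -heapq.heapreplace(self._heap, -h)
            self.values.discard(evicted)
            self.values.add(h)


def build_synopses(stream, k):
    """Build one KMV synopsis per item from a stream of (key, transaction) pairs."""
    synopses = defaultdict(lambda: KMVSynopsis(k))
    for key, transaction in stream:
        for item in transaction:
            synopses[item].add(key)
    return synopses


def estimate_itemset_frequency(synopses, itemset, k):
    """Classic KMV intersection estimate of how many keys co-occur with every item."""
    sketches = [synopses[item] for item in itemset]

    def in_all(h):
        return all(h in s.values for s in sketches)

    union = sorted(set().union(*(s.values for s in sketches)))
    if len(union) < k:
        # No synopsis has been truncated yet, so the count is exact.
        return float(sum(1 for h in union if in_all(h)))
    smallest = union[:k]  # the k smallest hashes of the union of the key sets
    hits = sum(1 for h in smallest if in_all(h))
    # (fraction of the sample lying in the intersection) * (KMV estimate of the union size)
    return (hits / k) * (k - 1) / smallest[-1]
```

Because a customer's records for different items may arrive in different transactions, the synopses are keyed by customer rather than by transaction, which is why reservoir sampling over individual transactions does not fit this setting. With k = 1024 and two items sharing 1500 of 4500 customers, the estimate is typically within a few percent of 1500; its error grows as the intersection becomes a smaller fraction of the union.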

Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases
