Parallel architecture for implementation of frequent itemset mining using FP-growth

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the popularly used reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset from the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to compute but also satisfies the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
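To make the idea of a per-item KMV synopsis concrete, the following is a minimal Python sketch. It shows the classic textbook KMV distinct-count and intersection estimators, not the novel estimator proposed in the abstract, whose details are not given here. The function names (`kmv_synopsis`, `estimate_intersection`, and so on), the SHA-1 based hash, and the assumption that each item's synopsis is built over the keys (e.g., customer IDs) whose transactions contain that item are all illustrative choices.

```python
import hashlib

def h(key: str) -> float:
    """Map a key to a pseudo-uniform value in [0, 1) (illustrative hash choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def kmv_synopsis(keys, k):
    """Keep the k smallest distinct hash values of the given keys."""
    return sorted({h(x) for x in keys})[:k]

def estimate_distinct(synopsis, k):
    """Classic KMV estimate: (k - 1) / (k-th smallest hash value).

    If fewer than k values were seen, the synopsis is complete and the
    count is exact.
    """
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]

def estimate_intersection(syn_a, syn_b, k):
    """Estimate the number of keys present in both underlying sets.

    Assumes both synopses were built with the same k and the same hash.
    """
    set_a, set_b = set(syn_a), set(syn_b)
    union = sorted(set_a | set_b)[:k]      # KMV synopsis of the union
    if len(union) < k:
        # Both synopses are complete, so the answer is exact.
        return float(len(set_a & set_b))
    shared = sum(1 for v in union if v in set_a and v in set_b)
    # (fraction of the union sample in both sets) * (estimated union size)
    return (shared / k) * ((k - 1) / union[k - 1])
```

Under these assumptions, the estimated frequency of the itemset {bread, milk} would be `estimate_intersection(syn_bread, syn_milk, k)`, where each synopsis summarizes the keys that bought the corresponding item. A larger `k` costs more memory per item and reduces the variance of the estimate.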


Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases

Mathematics ◽

10.3390/math9040450 ◽

2021 ◽

Vol 9 (4) ◽

pp. 450

Author(s):

Gergely Honti ◽

János Abonyi

Keyword(s):

Climate Change ◽

Extraction Process ◽

Knowledge Extraction ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Underlying Structure ◽

Multilayer Network ◽

Interdisciplinary Science ◽

Academic Knowledge ◽

Itemset Mining

Triplestores or resource description framework (RDF) stores are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables a consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change is structured using the Microsoft Academic Knowledge Graph. In the case study, the FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases which can be simultaneously analysed as layers of a multilayer network.


MapReduce-based frequent itemset mining for analysis of electronic evidence

2013 8th International Workshop on Systematic Approaches to Digital Forensics Engineering (SADFE) ◽

10.1109/sadfe.2013.6911549 ◽

2013 ◽

Author(s):

Xueqing Jiang ◽

Guozi Sun

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Electronic Evidence ◽

Itemset Mining


Frequent Itemset Mining techniques — A technical review

2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave) ◽

10.1109/startup.2016.7583968 ◽

2016 ◽

Cited By ~ 4

Author(s):

Tushar M. Chaure ◽

Kavita R. Singh

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Technical Review


Human resource recommendation algorithm based on improved frequent itemset mining

Future Generation Computer Systems ◽

10.1016/j.future.2021.08.017 ◽

2021 ◽

Author(s):

Liu Zhaoshan ◽

Ma Yiming ◽

Zheng Huihua ◽

Liu Dege ◽

Liu Jing

Keyword(s):

Human Resource ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Recommendation Algorithm ◽

Itemset Mining


A new closed frequent itemset mining algorithm based on GPU and improved vertical structure

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.3904 ◽

2016 ◽

Vol 29 (6) ◽

pp. e3904 ◽

Cited By ~ 6

Author(s):

Yun Li ◽

Jie Xu ◽

Yun-Hao Yuan ◽

Ling Chen

Keyword(s):

Vertical Structure ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithm ◽

Closed Frequent Itemset


Fuzzy Maximal Frequent Itemset Mining Over Quantitative Databases

Intelligent Information and Database Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-54472-4_45 ◽

2017 ◽

pp. 476-486 ◽

Cited By ~ 2

Author(s):

Haifeng Li ◽

Yue Wang ◽

Ning Zhang ◽

Yuejin Zhang

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining


Mining Weighted Frequent Itemsets without Candidate Generation in Uncertain Databases

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622017500341 ◽

2017 ◽

Vol 16 (06) ◽

pp. 1549-1579 ◽

Cited By ~ 7

Author(s):

Jerry Chun-Wei Lin ◽

Wensheng Gan ◽

Philippe Fournier-Viger ◽

Tzung-Pei Hong ◽

Han-Chieh Chao

Keyword(s):

Real Life ◽

Search Space ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Uncertain Databases ◽

Two Phase ◽

Itemset Mining ◽

Meaningful Relationships ◽

Weighted Probability ◽

Novel Structures

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weights (importance). It can thus discover itemsets that are more useful and meaningful by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas have been combined in the HEWI-Uapriori algorithm. The latter considers both item weights and transaction uncertainty to mine the high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper bound proposed in HEWI-Uapriori can reduce the size of the search space, it still generates a large number of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional methods for WFIM and UFIM, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.
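The abstract does not define how item weights and transaction uncertainty are combined, so the following Python sketch rests on an assumed formulation: the weight of an itemset is the mean of its item weights, the probability of an itemset in a transaction is the product of its item probabilities, and the expected weighted support is the itemset weight times the summed probabilities. The function names and the data layout (each transaction as a dict from item to existence probability) are hypothetical, and the sketch does not reproduce the E-table, WP-table, or WP-tree structures.

```python
from math import prod

def expected_weighted_support(itemset, transactions, weights):
    """Expected weighted support under the assumed formulation.

    itemset      -- iterable of item names
    transactions -- list of dicts mapping item -> existence probability
    weights      -- dict mapping item -> weight
    """
    items = list(itemset)
    # Assumed: itemset weight is the mean of its item weights.
    itemset_weight = sum(weights[i] for i in items) / len(items)
    # Assumed: items are independent, so probabilities multiply.
    expected_support = sum(
        prod(t[i] for i in items)
        for t in transactions
        if all(i in t for i in items)
    )
    return itemset_weight * expected_support

def is_hewi(itemset, transactions, weights, threshold):
    """True if the itemset's expected weighted support reaches
    threshold * (number of transactions); threshold is in [0, 1]."""
    score = expected_weighted_support(itemset, transactions, weights)
    return score >= threshold * len(transactions)

# Toy uncertain database (made-up values for illustration only).
transactions = [
    {"a": 0.5, "b": 0.8},
    {"a": 1.0},
    {"a": 0.5, "b": 0.5, "c": 0.4},
]
weights = {"a": 0.2, "b": 0.6, "c": 1.0}
score_ab = expected_weighted_support({"a", "b"}, transactions, weights)
```

For the toy data, the itemset {a, b} has weight 0.4 and expected support 0.4 + 0.25 = 0.65, giving a score of about 0.26. A brute-force check like this scans the database once per itemset, which is the cost that tree-based methods aim to avoid.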


