Review on high utility itemset mining algorithms for big data

Abstract: The revolution in technologies for storing and processing big data has led to data-intensive computing as a new paradigm. Finding valuable and precise knowledge in big data requires efficient and scalable data mining techniques. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. This thesis addresses the problem of designing scalable and efficient FIM algorithms on the Spark RDD framework. The research aims to improve the performance (in terms of execution time) of existing Spark-based FIM algorithms and to efficiently re-design other FIM algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm for Spark's distributed computing environment. The work proposes and implements a parallel Eclat algorithm on the Spark RDD architecture, dubbed RDD-Eclat, in five versions: EclatV1 is the earliest, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version results from applying a different technique or heuristic to the preceding variant: after EclatV1, the filtered-transaction technique is used, and heuristics for equivalence-class partitioning are applied in EclatV4 and EclatV5. EclatV2 and EclatV3 differ only slightly algorithmically, as do EclatV4 and EclatV5. Experiments are conducted on synthetic and real-world datasets.
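
For readers unfamiliar with Eclat, the core idea that RDD-Eclat parallelizes can be sketched on a single machine. Each item is mapped to the set of IDs of the transactions that contain it (its tidset), and itemsets are extended depth-first within equivalence classes (itemsets sharing a common prefix) by intersecting tidsets. The sketch below is a plain sequential Eclat, not any of the Spark RDD-Eclat versions described above: it does not implement filtered transactions or the partitioning heuristics. The function name `eclat` and the use of an absolute support count for `min_support` are our own assumptions.

```python
def eclat(transactions, min_support):
    """Sequential Eclat: return {itemset tuple: support count} for all frequent itemsets.

    transactions: iterable of item collections; min_support: absolute count (assumption).
    """
    # Build the vertical database: item -> set of IDs of transactions containing it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, eq_class):
        # eq_class: list of (item, tidset) pairs that each extend `prefix` frequently.
        for i, (item, tids) in enumerate(eq_class):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # New equivalence class: intersect with every later member's tidset.
            next_class = []
            for other, other_tids in eq_class[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    next_class.append((other, common))
            if next_class:
                extend(itemset, next_class)

    # Frequent 1-itemsets form the root equivalence class (empty prefix).
    roots = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), roots)
    return frequent


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for itemset, support in sorted(eclat(db, min_support=2).items()):
        print(itemset, support)
```

On this four-transaction toy database with `min_support=2`, the sketch reports the three single items (support 3) and the pairs {a, b}, {a, c} and {b, c} (support 2), while {a, b, c} (support 1) is pruned. The equivalence classes built by `extend` correspond to the units that, according to the abstract, EclatV4 and EclatV5 partition with heuristics; how they are distributed over Spark is not shown here.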


MINING OF HIGH-UTILITY ITEMSETS WITH NEGATIVE UTILITY

JOURNAL OF TECHNOLOGY & INNOVATION ◽

10.26480/jtin.02.2021.44.47 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-47

Author(s):

Tung N.T ◽

Nguyen Le Van ◽

Trinh Cong Nhut ◽

Tran Van Sang

Keyword(s):

State Of The Art ◽

Upper Bounds ◽

Itemset Mining ◽

Novel Structure ◽

Transactional Databases ◽

Speed Up ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

The goal of the high-utility itemset mining (HUIM) task is to discover combinations of items that yield high profits in transactional databases. HUIM is a useful tool for retail stores to analyze customer behavior. However, in the real world, items can have both positive and negative utility values. To address this issue, we propose an algorithm named Modified Efficient High-utility Itemsets mining with Negative utility (MEHIN) to find all high-utility itemsets (HUIs) with negative utility. This algorithm is an improved version of the EHIN algorithm. MEHIN uses two new upper bounds for pruning, named revised subtree utility and revised local utility. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. An array-based utility-counting technique is also used to calculate the upper bounds efficiently. MEHIN employs a novel structure called P-set to reduce the number of transaction scans and to speed up the mining process. Experimental results show that the proposed algorithm considerably outperforms state-of-the-art algorithms for mining HUIs with negative utility in retail databases in terms of runtime.
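
To make the role of negative utility concrete, the following brute-force sketch computes the utility of an itemset as the sum, over all transactions containing it, of purchased quantity times unit profit for each of its items, where some unit profits are negative. It enumerates every itemset and is therefore exponential, so it only serves as a reference for what MEHIN computes; it does not implement MEHIN's revised upper bounds, transaction merging, dataset projection, or the P-set structure. The data layout (a dict of item quantities per transaction plus a `profit` dict) and the function names are our own assumptions.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum of quantity * unit profit of `itemset`'s items over transactions containing it."""
    total = 0
    for t in transactions:  # t: dict item -> purchased quantity
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def brute_force_huis(transactions, profit, min_util):
    """Enumerate every itemset and keep those whose utility reaches min_util."""
    items = sorted(profit)
    huis = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            u = itemset_utility(itemset, transactions, profit)
            if u >= min_util:
                huis[itemset] = u
    return huis


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": -3}  # item c has a negative unit profit
    db = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "c": 1}, {"b": 3}]
    print(brute_force_huis(db, profit, min_util=9))
```

With `min_util=9`, this toy database yields {a}, {b}, {a, b} and {a, c}. The itemset {a, c} qualifies even though c contributes a negative utility of -3 per unit, because the positive contribution of a outweighs it.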


An empirical evaluation of high utility itemset mining algorithms

Expert Systems with Applications ◽

10.1016/j.eswa.2018.02.008 ◽

2018 ◽

Vol 101 ◽

pp. 91-115 ◽

Cited By ~ 10

Author(s):

Chongsheng Zhang ◽

George Almpanidis ◽

Wanwan Wang ◽

Changchang Liu

Keyword(s):

Empirical Evaluation ◽

Itemset Mining ◽

High Utility ◽

Mining Algorithms


Review on high utility itemset mining algorithms

2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup Conclave) ◽

10.1109/startup.2016.7583939 ◽

2016 ◽

Author(s):

V. Kavitha ◽

B. G. Geetha

Keyword(s):

Itemset Mining ◽

High Utility ◽

Mining Algorithms


A Survey of incremental high-utility pattern mining based on storage structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202745 ◽

2021 ◽

pp. 1-26

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Xiaojuan Li ◽

Le Wang

Keyword(s):

Pattern Mining ◽

Business Decisions ◽

Practical Applications ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

High Utility Patterns ◽

Mining Algorithms ◽

Purchase Quantity ◽

Storage Structures

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit of items and purchase quantities. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering both the number of items purchased and their unit profit. However, most HUIM algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated dynamically by inserting new data. Some researchers have proposed algorithms for finding high-utility itemsets in such dynamically updated databases. Unlike batch algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets incrementally, reducing the cost of finding them. This paper surveys the latest research on incremental HUIM algorithms, including methods for storing itemsets and utilities in tree, list, array, and hash-set storage structures. It also points out several important derivative algorithms and research challenges for incremental HUIM.
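
The contrast between batch and incremental processing can be illustrated with a deliberately naive hash-map structure. Exact utilities of all itemsets up to a fixed length are accumulated, and each inserted batch of transactions is scanned once, without revisiting earlier transactions. This is a hypothetical toy (the class name, the `max_len` cap and the data layout are our own assumptions), not one of the tree-, list-, array- or hash-based structures covered by the survey. Because it stores every itemset up to `max_len`, it misses longer HUIs and its memory grows combinatorially.

```python
from collections import defaultdict
from itertools import combinations


class IncrementalUtilityTable:
    """Hypothetical toy: hash map from itemset to exact utility, updated per insertion."""

    def __init__(self, profit, max_len=2):
        self.profit = profit          # item -> unit profit
        self.max_len = max_len        # longest itemset tracked (assumed cap)
        self.utility = defaultdict(int)
        self.n_transactions = 0

    def insert_batch(self, transactions):
        # Only the newly inserted transactions are scanned; old ones are never revisited.
        for t in transactions:  # t: dict item -> purchased quantity
            items = sorted(t)
            for k in range(1, min(self.max_len, len(items)) + 1):
                for itemset in combinations(items, k):
                    self.utility[itemset] += sum(t[i] * self.profit[i] for i in itemset)
            self.n_transactions += 1

    def high_utility_itemsets(self, min_util):
        # Report the current HUIs (up to max_len) without rescanning any transaction.
        return {x: u for x, u in self.utility.items() if u >= min_util}


if __name__ == "__main__":
    table = IncrementalUtilityTable({"a": 4, "b": 1, "c": 2})
    table.insert_batch([{"a": 1, "b": 3}, {"b": 2, "c": 1}])
    print(table.high_utility_itemsets(6))
    table.insert_batch([{"a": 2, "c": 2}])  # new data arrives
    print(table.high_utility_itemsets(6))
```

After the first batch only {a, b} reaches the threshold of 6. Inserting the third transaction updates the stored utilities in place, so {a}, {c} and {a, c} become high-utility without the first two transactions being reprocessed.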
