MINING OF HIGH-UTILITY ITEMSETS WITH NEGATIVE UTILITY

JOURNAL OF TECHNOLOGY & INNOVATION ◽

10.26480/jtin.02.2021.44.47 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-47

Author(s):

Tung N.T ◽

Nguyen Le Van ◽

Trinh Cong Nhut ◽

Tran Van Sang

Keyword(s):

State Of The Art ◽

Upper Bounds ◽

Itemset Mining ◽

Novel Structure ◽

Transactional Databases ◽

Speed Up ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

The goal of high-utility itemset mining (HUIM) is to discover combinations of items that yield high profits in transactional databases. HUIM is a useful tool for retail stores to analyze customer behavior. In real-world data, however, items can carry both positive and negative utility values. To address this issue, we propose an algorithm named Modified Efficient High-utility Itemsets mining with Negative utility (MEHIN) to find all high-utility itemsets (HUIs) in the presence of negative utility. The algorithm is an improved version of the EHIN algorithm. MEHIN uses two new upper bounds for pruning, named revised subtree utility and revised local utility. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. An array-based utility-counting technique is also used to calculate the upper bounds efficiently. MEHIN further employs a novel structure called P-set to reduce the number of transaction scans and speed up the mining process. Experimental results show that the proposed algorithm considerably outperforms state-of-the-art algorithms for mining HUIs with negative utility on retail databases in terms of runtime.
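
To make the role of negative utility concrete, the following minimal Python sketch computes itemset utilities on a toy database and shows why an upper bound built from plain transaction utilities can fail once negative profits appear. The data, the function names, and the "positive-only" bound are illustrative assumptions; the positive-only bound is a common remedy in the negative-utility literature and is not claimed to be the revised bounds that MEHIN defines.

```python
from itertools import combinations

# Hypothetical toy data: unit profits (a negative value models an item sold at a loss).
PROFIT = {"a": 5, "b": -2, "c": 3}
# Each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 2},
    {"a": 1, "b": 1, "c": 1},
]


def utility(itemset, transactions=TRANSACTIONS, profit=PROFIT):
    """Sum quantity * unit profit of the itemset's items over every transaction containing it."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def plain_twu(itemset, transactions=TRANSACTIONS, profit=PROFIT):
    """Sum full transaction utilities (negative items included) of supporting transactions."""
    return sum(
        sum(q * profit[i] for i, q in t.items())
        for t in transactions
        if all(item in t for item in itemset)
    )


def positive_twu(itemset, transactions=TRANSACTIONS, profit=PROFIT):
    """Same as plain_twu, but counting only items with positive profit (assumed remedy)."""
    return sum(
        sum(q * profit[i] for i, q in t.items() if profit[i] > 0)
        for t in transactions
        if all(item in t for item in itemset)
    )


def brute_force_huis(min_util, transactions=TRANSACTIONS, profit=PROFIT):
    """Enumerate every itemset and keep those whose utility reaches min_util."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            u = utility(combo, transactions, profit)
            if u >= min_util:
                result[combo] = u
    return result


if __name__ == "__main__":
    print(brute_force_huis(10))
    # The plain bound for {a} falls below the utility of its superset {a, c},
    # so pruning on it could discard a valid high-utility itemset.
    print(plain_twu(("a",)), utility(("a", "c")), positive_twu(("a",)))
```

In this toy database the plain bound for {a} is 20 while the superset {a, c} has utility 21, so the plain bound is not safe for pruning; the positive-only bound of 26 is. Exhaustive enumeration is exponential and is used here only to keep the sketch short.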

Incrementally updating the high average-utility patterns with pre-large concept

Applied Intelligence ◽

10.1007/s10489-020-01743-y ◽

2020 ◽

Vol 50 (11) ◽

pp. 3788-3807

Author(s):

Jerry Chun-Wei Lin ◽

Matin Pirouz ◽

Youcef Djenouri ◽

Chien-Fu Cheng ◽

Usman Ahmed

Keyword(s):

State Of The Art ◽

The State ◽

Batch Mode ◽

Itemset Mining ◽

The Past ◽

Dynamic Databases ◽

Speed Up ◽

Average Utility ◽

High Utility ◽

High Utility Patterns

High-utility itemset mining (HUIM) is an emerging approach for detecting high-utility patterns in databases. Most existing HUIM algorithms consider only the utility of an itemset, regardless of its length, so utility tends to rise simply because the itemset grows. High average-utility itemset mining (HAUIM) takes the size of the itemset into account, thus providing a more balanced measure, the average utility, for decision-making. Several algorithms have been presented to efficiently mine the set of high average-utility itemsets (HAUIs), but most of them handle only static databases. Previously, a fast-updated (FUP)-based algorithm was developed to handle the incremental problem efficiently, but it still has to re-scan the database when an itemset is small in the original database but is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI is developed for transaction insertion in dynamic databases; it relies on average-utility-list (AUL) structures. Moreover, we apply the pre-large concept to HAUIM. The pre-large concept speeds up mining by ensuring that, if the total utility of the newly inserted transactions is within the safety bound, the small itemsets in the original database cannot become large after the database is updated. This, in turn, reduces repeated database scans while still obtaining the correct HAUIs. Experiments demonstrate that PRE-HAUIMI outperforms the state-of-the-art batch-mode HAUI-Miner, as well as the state-of-the-art incremental IHAUPM and FUP-based algorithms, in terms of runtime, memory, number of assessed patterns and scalability.
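
The difference between utility and average utility can be illustrated with a short Python sketch. The toy database and the function names below are assumptions made for illustration; the sketch only shows the measure itself (utility divided by itemset length), not the PRE-HAUIMI framework or its pre-large safety bound.

```python
# Hypothetical toy data: unit profits and a database of item -> quantity transactions.
PROFIT = {"a": 6, "b": 1, "c": 1}
DB = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 2, "b": 1, "c": 2},
]


def utility(itemset, db=DB, profit=PROFIT):
    """Total utility of the itemset over all transactions that contain it."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in db
        if all(item in t for item in itemset)
    )


def average_utility(itemset, db=DB, profit=PROFIT):
    """Utility divided by the number of items in the itemset."""
    return utility(itemset, db, profit) / len(itemset)


if __name__ == "__main__":
    for itemset in [("a",), ("a", "b"), ("a", "b", "c")]:
        print(itemset, utility(itemset), round(average_utility(itemset), 3))
```

Here the utility grows with itemset length (18, 20, 23) while the average utility shrinks (18, 10, about 7.67), which is the length bias that the average-utility measure is meant to correct.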

Efficient Algorithm for Mining High Utility Pattern Considering Length Constraints

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2019070101 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1-27

Author(s):

Kuldeep Singh ◽

Bhaskar Biswas

Keyword(s):

Data Mining ◽

Upper Bound ◽

Efficient Algorithm ◽

State Of The Art ◽

Memory Usage ◽

Important Data ◽

Benchmark Datasets ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets

High utility itemset (HUI) mining is a popular and important data mining task. Existing approaches often discover a very large number of itemsets and rules, which reduces both the efficiency and the effectiveness of HUI mining. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm for discovering HUIs with length constraints, named EHIL (Efficient High utility Itemsets with Length constraints), which decreases the number of HUIs by removing tiny itemsets. EHIL adopts two upper bounds for pruning, named sub-tree utility and local utility, and modifies them by incorporating length constraints. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. The execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. The memory usage is up to twenty-eight times less than that of the state-of-the-art algorithm FHM+.
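
As a rough illustration of the transaction merging and dataset projection techniques mentioned above, the following Python sketch merges transactions that contain exactly the same items and projects a database on a single item. The function names, the data layout (item mapped to its utility in that transaction) and the processing order are assumptions for illustration, not EHIL's actual implementation.

```python
def merge_identical_transactions(transactions):
    """Merge transactions with the same set of items by summing their per-item utilities."""
    merged = {}
    for t in transactions:
        key = tuple(sorted(t))  # identical item sets share one key
        if key not in merged:
            merged[key] = dict.fromkeys(key, 0)
        for item, util in t.items():
            merged[key][item] += util
    return list(merged.values())


def project(transactions, item, order):
    """Project the database on `item`.

    For each transaction containing `item`, return a pair
    (utility of `item` in that transaction, remaining items that follow `item` in `order`).
    """
    rank = {x: r for r, x in enumerate(order)}
    projected = []
    for t in transactions:
        if item in t:
            suffix = {i: u for i, u in t.items() if rank[i] > rank[item]}
            projected.append((t[item], suffix))
    return projected


if __name__ == "__main__":
    # Hypothetical database: each transaction maps item -> utility in that transaction.
    db = [{"a": 5, "b": 2}, {"a": 10, "b": 4}, {"a": 5, "c": 3}]
    merged = merge_identical_transactions(db)
    print(merged)
    print(project(merged, "a", ["a", "b", "c"]))
```

Merging shrinks the database that later scans must touch, and projection restricts each scan to the transactions and items that can still extend the current itemset.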

Mining High Utility Itemsets Based on Pattern Growth without Candidate Generation

Mathematics ◽

10.3390/math9010035 ◽

2020 ◽

Vol 9 (1) ◽

pp. 35

Author(s):

Yiwei Liu ◽

Le Wang ◽

Lin Feng ◽

Bo Jin

Keyword(s):

State Of The Art ◽

Research Topic ◽

Performance Gap ◽

Utility Values ◽

Original Dataset ◽

Pattern Growth ◽

Active Research ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

Mining high utility itemsets (HUIs) has been an active research topic in data mining in recent years. Existing HUI mining algorithms typically take two steps: generating candidates and identifying the utility values of these candidate itemsets. The performance of these algorithms depends on the efficiency of both steps, which are usually time-consuming. In this study, we propose an efficient pattern-growth based HUI mining algorithm, called tail-node tree-based high-utility itemset (TNT-HUI) mining. Supported by a novel tree structure named the tail-node tree (TN-Tree), this algorithm avoids the time-consuming candidate generation step as well as the need to scan the original dataset multiple times for exact utility values. The performance of TNT-HUI was evaluated in comparison with state-of-the-art benchmark methods on different datasets. Experimental results showed that TNT-HUI outperformed benchmark algorithms in both execution time and memory use by orders of magnitude. The performance gap is larger for denser datasets and lower thresholds.

A Survey of incremental high-utility pattern mining based on storage structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202745 ◽

2021 ◽

pp. 1-26

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Xiaojuan Li ◽

Le Wang

Keyword(s):

Pattern Mining ◽

Business Decisions ◽

Practical Applications ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

High Utility Patterns ◽

Mining Algorithms ◽

Purchase Quantity ◽

Storage Structures

Traditional association rule mining has been widely studied, but this is not applicable to practical applications that must consider factors such as the unit profit of the item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated by inserting new data dynamically. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Different from the batch processing algorithms that always process the databases from scratch, the incremental HUIM algorithms update and output high-utility itemsets in an incremental manner, thereby reducing the cost of finding high-utility itemsets. This paper provides the latest research on incremental high-utility itemset mining algorithms, including methods of storing itemsets and utilities based on tree, list, array and hash set storage structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.

HUIL-TN & HUI-TN: Mining high utility itemsets based on pattern-growth

PLoS ONE ◽

10.1371/journal.pone.0248349 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248349

Author(s):

Le Wang ◽

Shui Wang

Keyword(s):

Data Mining ◽

State Of The Art ◽

Research Topic ◽

Running Time ◽

Original Dataset ◽

Pattern Growth ◽

Active Research ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

In recent years, high utility itemset (HUI) mining has been an active research topic in data mining. In this study, we propose two efficient pattern-growth based HUI mining algorithms, called High Utility Itemset based on Length and Tail-Node tree (HUIL-TN) and High Utility Itemset based on Tail-Node tree (HUI-TN). These two algorithms avoid the time-consuming candidate generation stage and the need to scan the original dataset multiple times for exact utility values. A novel tree structure, named the tail-node tree (TN-tree), is proposed as a key element of our algorithms to maintain complete utility information for the existing itemsets of a dataset. The performance of HUIL-TN and HUI-TN was evaluated against state-of-the-art reference methods on various datasets. Experimental results showed that our algorithms achieve or come close to the best performance on all datasets in terms of running time, while other algorithms excel only on certain types of dataset. Scalability tests were also performed, and our algorithms obtained the flattest curves among all competitors.

Dynamic maintenance model for high average-utility pattern mining with deletion operation

Applied Intelligence ◽

10.1007/s10489-021-02539-4 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Qian Teng ◽

Shahab Tayeb ◽

Jerry Chun-Wei Lin

Keyword(s):

Pattern Mining ◽

Computational Cost ◽

Practical Applications ◽

Itemset Mining ◽

Dynamic Databases ◽

Speed Up ◽

Dynamic Maintenance ◽

Average Utility ◽

High Utility ◽

Maintenance Model

High average-utility itemset mining (HAUIM) was established to provide a fairer measure than generic high-utility itemset mining (HUIM) for revealing satisfactory and interesting patterns. In practical applications, databases change dynamically as insertion and deletion operations are performed on them. Several works were designed to handle insertion, but fewer studies have focused on handling deletion for knowledge maintenance. In this paper, we develop the PRE-HAUI-DEL algorithm, which applies the pre-large concept to HAUIM to handle transaction deletion in dynamic databases. The pre-large concept serves as a buffer in HAUIM that reduces the number of database scans while the database is updated, particularly under transaction deletion. Two upper-bound values are also established to discard unpromising candidates early, which reduces the computational cost. The experimental results show that the designed PRE-HAUI-DEL algorithm performs well compared with the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201357 ◽

2020 ◽

pp. 1-16

Author(s):

Rui Sun ◽

Meng Han ◽

Chunyan Zhang ◽

Mingyao Shen ◽

Shiyu Du

Keyword(s):

Data Mining ◽

Search Space ◽

Experimental Results ◽

Effective Algorithm ◽

Memory Usage ◽

Utility Value ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although top-k HUIM methods are very common, they can only mine itemsets with positive items, and itemsets go missing when they are applied to data with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically raising the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection techniques. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
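
The idea of automatically raising the minimum utility threshold can be sketched with a bounded min-heap: once k itemsets have been found, the k-th best utility becomes the new threshold. The function name, the starting threshold of 0 and the input format below are assumptions for illustration; this is a generic top-k pattern and not necessarily the specific raising strategy that THN uses.

```python
import heapq


def top_k_itemsets(scored_itemsets, k):
    """Keep the k highest-utility itemsets seen so far and raise the threshold as they arrive.

    scored_itemsets: iterable of (itemset_tuple, utility) pairs.
    Returns (list of (utility, itemset) sorted from best to worst, final threshold).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heap = []     # min-heap of (utility, itemset); the root is the k-th best so far
    min_util = 0  # assumed starting threshold
    for itemset, util in scored_itemsets:
        if util < min_util:
            continue  # a real miner would prune the search space here
        if len(heap) < k:
            heapq.heappush(heap, (util, itemset))
        elif util > heap[0][0]:
            heapq.heapreplace(heap, (util, itemset))
        if len(heap) == k:
            min_util = heap[0][0]  # raise the threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util


if __name__ == "__main__":
    # Hypothetical candidate itemsets with precomputed utilities (one is negative).
    candidates = [
        (("a",), 20), (("b",), -8), (("c",), 12), (("a", "b"), 4),
        (("a", "c"), 21), (("b", "c"), 5), (("a", "b", "c"), 6),
    ]
    print(top_k_itemsets(candidates, 3))
```

With k = 3, the threshold rises from 0 to 4 after the third qualifying itemset and then to 12, so the later candidates with utilities 5 and 6 are skipped without entering the heap.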

A marketing solution for cross-selling by high utility itemset mining with dynamic transactional databases

2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT) ◽

10.1109/icctict.2016.7514609 ◽

2016 ◽

Author(s):

Prajakta R. Padhye ◽

R. J. Deshmukh

Keyword(s):

Itemset Mining ◽

Transactional Databases ◽

High Utility

A Systematic Survey on High Utility Itemset Mining

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019300027 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1113-1185 ◽

Cited By ~ 2

Author(s):

Bahareh Rahmati ◽

Mohammad Karim Sohrabi

Keyword(s):

Data Structures ◽

Search Space ◽

Frequent Itemset ◽

Itemset Mining ◽

Efficient Data ◽

Average Utility ◽

High Utility ◽

High Utility Itemsets ◽

Downward Closure ◽

Efficient Data Structures

High utility itemset mining considers the unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and enhance its efficiency. The main solutions are to use an anti-monotonic upper bound of the utility function and to exploit efficient data structures that store and compact the dataset so that pruning strategies can be applied efficiently. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and their variants, including high average-utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to present a comprehensive systematic review of high utility itemset mining techniques and to classify them based on their problem-solving approaches.
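
The failure of downward closure for utility, and the way an anti-monotonic upper bound restores pruning, can be seen on a small example. The sketch below uses the transaction-weighted utility (TWU), a widely used upper bound in this literature; the toy data and function names are assumptions for illustration, and all profits are positive.

```python
# Hypothetical toy data with positive unit profits only.
PROFIT = {"a": 1, "b": 5, "c": 2}
DB = [
    {"a": 1, "b": 1},
    {"a": 2, "c": 1},
    {"a": 1, "b": 2, "c": 1},
]


def supports(transaction, itemset):
    """True if the transaction contains every item of the itemset."""
    return all(item in transaction for item in itemset)


def utility(itemset, db=DB, profit=PROFIT):
    """Utility of the itemset: quantity * unit profit of its items in supporting transactions."""
    return sum(
        sum(t[i] * profit[i] for i in itemset) for t in db if supports(t, itemset)
    )


def twu(itemset, db=DB, profit=PROFIT):
    """Transaction-weighted utility: total utility of every transaction containing the itemset."""
    return sum(
        sum(q * profit[i] for i, q in t.items()) for t in db if supports(t, itemset)
    )


if __name__ == "__main__":
    for x in [("a",), ("a", "b"), ("a", "b", "c")]:
        print(x, "utility =", utility(x), "TWU =", twu(x))
```

In this database the utility of {a} is 4 while the utility of its superset {a, b} is 17, so a low-utility itemset cannot be pruned on its utility alone. The TWU values (23, 19, 13) never increase as the itemset grows and always bound the utility from above, so any itemset whose TWU falls below the threshold can be discarded together with all of its supersets.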

A parallel method for mining high utility itemsets based on projection indexing (Phương pháp song song khai phá tập lợi ích cao dựa trên chỉ số hình chiếu)

Research and Development on Information and Communication Technology ◽

10.32913/rd-ict.vol1.no37.349 ◽

2017 ◽

pp. 31

Author(s):

Đậu Hải Phong

Keyword(s):

Data Mining ◽

Parallel Algorithms ◽

Experimental Results ◽

Sequential Algorithms ◽

Parallel Method ◽

Speed Up ◽

High Utility ◽

High Utility Itemsets ◽

Better Than

High utility itemset (HUI) mining is one of the popular problems in data mining. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. The parallel algorithms all try to reduce the synchronization cost and the cost of calculating the global profit of itemsets. In this paper, we present a parallel method for mining HUIs based on projection-based indexing to speed up performance and reduce memory requirements. The experimental results show that our algorithm is better than some non-parallel algorithms in both performance and number of candidates.
