Mining High Utility Itemsets Based on Pattern Growth without Candidate Generation

Yiwei Liu; Le Wang; Lin Feng; Bo Jin

doi:10.3390/math9010035

Mining High Utility Itemsets Based on Pattern Growth without Candidate Generation

Mathematics ◽

10.3390/math9010035 ◽

2020 ◽

Vol 9 (1) ◽

pp. 35

Author(s):

Yiwei Liu ◽

Le Wang ◽

Lin Feng ◽

Bo Jin

Keyword(s):

State Of The Art ◽

Research Topic ◽

Performance Gap ◽

Utility Values ◽

Original Dataset ◽

Pattern Growth ◽

Active Research ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

Mining high utility itemsets (HUIs) has been an active research topic in data mining in recent years. Existing HUI mining algorithms typically take two steps: generating candidate itemsets and then computing the exact utility values of those candidates. Their performance depends on the efficiency of both steps, and both are usually time-consuming. In this study, we propose an efficient pattern-growth based HUI mining algorithm, called tail-node tree-based high-utility itemset (TNT-HUI) mining. Supported by a novel tree structure, named the tail-node tree (TN-Tree), this algorithm avoids the time-consuming candidate generation step as well as the need to scan the original dataset multiple times for exact utility values. The performance of TNT-HUI was evaluated in comparison with state-of-the-art benchmark methods on different datasets. Experimental results showed that TNT-HUI outperformed benchmark algorithms in both execution time and memory use by orders of magnitude. The performance gap is larger for denser datasets and lower thresholds.
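
For readers unfamiliar with the two-step scheme the abstract contrasts itself with, the following is a minimal sketch of a generic two-phase miner. It is not TNT-HUI and not taken from the paper. It assumes the common transaction-weighted utility (TWU) upper bound for the candidate step and enumerates itemsets by brute force; the toy transactions, the `unit_profit` table, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}  # assumed profit per unit


def transaction_utility(tx):
    """Total utility of one transaction (quantity * unit profit, summed)."""
    return sum(qty * unit_profit[item] for item, qty in tx.items())


def twu(itemset):
    """Transaction-weighted utility: an upper bound on the itemset's utility."""
    return sum(transaction_utility(tx) for tx in transactions if itemset.issubset(tx))


def exact_utility(itemset):
    """Exact utility: only the itemset's own items count, per containing transaction."""
    return sum(
        sum(tx[i] * unit_profit[i] for i in itemset)
        for tx in transactions
        if itemset.issubset(tx)
    )


def two_phase(min_util):
    items = sorted(unit_profit)
    # Step 1: keep every itemset whose TWU reaches the threshold (the candidates).
    candidates = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
        if twu(frozenset(c)) >= min_util
    ]
    # Step 2: scan the data again to get the exact utility of each candidate.
    huis = {}
    for c in candidates:
        u = exact_utility(c)
        if u >= min_util:
            huis[c] = u
    return candidates, huis


candidates, huis = two_phase(min_util=12)
print(len(candidates), "candidates,", len(huis), "high utility itemsets")
```

On this toy data the first step keeps five candidates but only two of them turn out to be high utility itemsets, which illustrates the wasted work that candidate-free approaches aim to avoid.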

HUIL-TN & HUI-TN: Mining high utility itemsets based on pattern-growth

PLoS ONE ◽

10.1371/journal.pone.0248349 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248349

Author(s):

Le Wang ◽

Shui Wang

Keyword(s):

Data Mining ◽

State Of The Art ◽

Research Topic ◽

Running Time ◽

Original Dataset ◽

Pattern Growth ◽

Active Research ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

In recent years, high utility itemset (HUI) mining has been an active research topic in data mining. In this study, we propose two efficient pattern-growth based HUI mining algorithms, called High Utility Itemset based on Length and Tail-Node tree (HUIL-TN) and High Utility Itemset based on Tail-Node tree (HUI-TN). These two algorithms avoid the time-consuming candidate generation stage and the need to scan the original dataset multiple times for exact utility values. A novel tree structure, named the tail-node tree (TN-tree), is proposed as a key element of our algorithms to maintain complete utility information for the existing itemsets of a dataset. The performance of HUIL-TN and HUI-TN was evaluated against state-of-the-art reference methods on various datasets. Experimental results showed that our algorithms exceeded or came close to the best running time on all datasets, while the other algorithms excelled only on certain types of dataset. Scalability tests were also performed, and our algorithms obtained the flattest curves among all competitors.

Mining of High-Utility Itemsets with Negative Utility

Journal of Technology & Innovation ◽

10.26480/jtin.02.2021.44.47 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-47

Author(s):

Tung N.T ◽

Nguyen Le Van ◽

Trinh Cong Nhut ◽

Tran Van Sang

Keyword(s):

State Of The Art ◽

Upper Bounds ◽

Itemset Mining ◽

Novel Structure ◽

Transactional Databases ◽

Speed Up ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

The goal of high-utility itemset mining (HUIM) is to discover combinations of items that yield high profits from transactional databases. HUIM is a useful tool for retail stores to analyze customer behavior. However, in the real world, items occur with both positive and negative utility values. To address this issue, we propose an algorithm named Modified Efficient High-utility Itemsets mining with Negative utility (MEHIN) to find all HUIs with negative utility. This algorithm is an improved version of the EHIN algorithm. MEHIN utilizes two new upper bounds for pruning, named revised subtree and revised local utility. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. An array-based utility-counting technique is also used to calculate the upper bounds efficiently. MEHIN also employs a novel structure called P-set to reduce the number of transaction scans and to speed up the mining process. Experimental results show that the proposed algorithm considerably outperforms the state-of-the-art HUI mining algorithms for negative utility on retail databases in terms of runtime.

Mining Approximate Frequent Itemsets Using Pattern Growth Approach

Information Technology And Control ◽

10.5755/j01.itc.50.4.29060 ◽

2021 ◽

Vol 50 (4) ◽

pp. 627-644

Author(s):

Shariq Bashir ◽

Daphne Teck Ching Lai

Keyword(s):

Processing Time ◽

Single Phase ◽

State Of The Art ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Second Phase ◽

Pattern Growth ◽

Database Size ◽

Mining Algorithms ◽

Growth Approach

Mining approximate frequent itemsets (AFIs) from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm to mine AFIs using a pattern growth approach. Its major contribution is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase with two full scans of the database. Related algorithms apply an Apriori-based candidate generation-and-test approach and require multiple phases to obtain the complete AFIs: the first phase generates core patterns, and the second phase examines the approximate conditions of the core patterns. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of the approximate FP-tree. The approximate FP-tree maps transactions onto shared branches when the transactions share a similar set of items. This reduces the size of the database and helps to compute the approximate conditions of candidate itemsets efficiently. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases. The experiments compare the processing time and scalability of the algorithms under varying database size and transaction length. The results show that the pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.

Efficient Algorithm for Mining High Utility Pattern Considering Length Constraints

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2019070101 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1-27

Author(s):

Kuldeep Singh ◽

Bhaskar Biswas

Keyword(s):

Data Mining ◽

Upper Bound ◽

Efficient Algorithm ◽

State Of The Art ◽

Memory Usage ◽

Important Data ◽

Benchmark Datasets ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets

High utility itemset (HUI) mining is one of the popular and important data mining tasks. Several studies have been carried out on this topic, but they often discover a very large number of itemsets and rules, which reduces both the efficiency and the effectiveness of HUI mining. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm named EHIL (Efficient High utility Itemsets with Length constraints), which discovers HUIs under length constraints and decreases the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bounds for pruning, named sub-tree and local utility, and modifies them by incorporating length constraints. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. The execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. The memory usage is up to twenty-eight times less than that of the state-of-the-art algorithm FHM+.
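
The abstract names dataset projection and transaction merging without describing them. The sketch below shows one common reading of the two techniques, assuming a fixed total order on items: projecting on an item keeps, for each transaction containing it, only the items that follow it in the order, and projected transactions with identical item sets are then merged by summing their utilities. This is an illustration under those assumptions, not EHIL's implementation; `project_and_merge` and the toy data are hypothetical.

```python
def project_and_merge(transactions, item, order):
    """Project transactions on `item`, then merge identical projections.

    transactions: list of dicts mapping item -> utility in that transaction.
    order: sequence giving the assumed total order of items.
    Returns a dict: suffix items (tuple) -> (prefix utility, summed suffix utilities).
    """
    rank = {x: r for r, x in enumerate(order)}
    merged = {}
    for tx in transactions:
        if item not in tx:
            continue  # the projection only keeps transactions containing `item`
        # Keep only the items that come after `item` in the total order.
        suffix = tuple(
            sorted((i for i in tx if rank[i] > rank[item]), key=rank.__getitem__)
        )
        prefix_util, utils = merged.get(suffix, (0, [0] * len(suffix)))
        # Identical suffixes collapse into one row whose utilities are summed.
        merged[suffix] = (
            prefix_util + tx[item],
            [u + tx[i] for u, i in zip(utils, suffix)],
        )
    return merged


# Hypothetical toy data: item -> utility (quantity already multiplied by profit).
toy = [
    {"a": 5, "b": 4, "c": 1},
    {"a": 10, "c": 3},
    {"a": 5, "c": 2},
    {"b": 2, "c": 1},
]
print(project_and_merge(toy, "a", "abc"))
```

Here the three transactions containing `a` shrink to two projected rows, because the second and third share the same suffix and are merged. Later scans of the projected dataset touch fewer and shorter rows, which is the saving the abstract refers to.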

A Survey of incremental high-utility pattern mining based on storage structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202745 ◽

2021 ◽

pp. 1-26

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Xiaojuan Li ◽

Le Wang

Keyword(s):

Pattern Mining ◽

Business Decisions ◽

Practical Applications ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

High Utility Patterns ◽

Mining Algorithms ◽

Purchase Quantity ◽

Storage Structures

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit of an item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated dynamically by inserting new data. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Unlike batch processing algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets in an incremental manner, thereby reducing the cost of finding high-utility itemsets. This paper surveys the latest research on incremental high-utility itemset mining algorithms, including methods of storing itemsets and utilities based on tree, list, array and hash set storage structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.
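
To make the contrast between batch and incremental processing concrete, here is a deliberately naive sketch that keeps itemset utilities in a hash map and updates only the entries affected by a newly inserted transaction. It assumes utility is quantity times unit profit, as described above, and it enumerates every subset of the new transaction, so it is only practical for short transactions. The class `IncrementalUtilityIndex` is hypothetical and does not correspond to any of the surveyed algorithms.

```python
from collections import defaultdict
from itertools import combinations


class IncrementalUtilityIndex:
    """Naive hash-map store of itemset utilities, updated per inserted transaction."""

    def __init__(self, unit_profit):
        self.unit_profit = unit_profit
        self.utility = defaultdict(int)  # frozenset(items) -> accumulated utility

    def insert(self, tx):
        """Add one transaction (item -> quantity) without rescanning older data."""
        # Utility of each item in this transaction = quantity * unit profit.
        item_util = {i: q * self.unit_profit[i] for i, q in tx.items()}
        items = sorted(item_util)
        # Only subsets of the new transaction can change, so only they are touched.
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                self.utility[frozenset(combo)] += sum(item_util[i] for i in combo)

    def high_utility(self, min_util):
        """Return the itemsets whose accumulated utility reaches min_util."""
        return {s: u for s, u in self.utility.items() if u >= min_util}


index = IncrementalUtilityIndex({"a": 5, "b": 2, "c": 1})
index.insert({"a": 1, "c": 2})
index.insert({"a": 2, "b": 1})
print(index.high_utility(min_util=10))
```

The second insertion updates three entries and leaves the rest untouched. The tree, list, and array structures covered by the survey exist largely to avoid the exponential subset enumeration used here.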

Improved Strategy for High-Utility Pattern Mining Algorithm

Mathematical Problems in Engineering ◽

10.1155/2020/1971805 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Le Wang ◽

Shui Wang ◽

Haiyan Li ◽

Chunliang Zhou

Keyword(s):

Pattern Mining ◽

State Of The Art ◽

Search Space ◽

Research Topics ◽

Main Research ◽

Mining Algorithm ◽

Temporal Efficiency ◽

High Utility ◽

High Utility Patterns ◽

Mining Algorithms

High-utility pattern mining is a research hotspot in the field of pattern mining, and one of its main research topics is how to improve the efficiency of the mining algorithm. Based on a study of state-of-the-art high-utility pattern mining algorithms, this paper proposes an improved strategy that removes noncandidate items from the global header table and the local header table as early as possible, thus reducing the search space and improving the efficiency of the algorithm. The proposed strategy is applied to the algorithm EFIM (EFficient high-utility Itemset Mining). Experimental verification was carried out on nine typical datasets (including two large datasets); the results show that our strategy can effectively improve temporal efficiency for mining high-utility patterns.

Mining Profitable and Concise Patterns in Large-Scale Internet of Things Environments

Wireless Communications and Mobile Computing ◽

10.1155/2021/6653816 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Jerry Chun-Wei Lin ◽

Youcef Djenouri ◽

Gautam Srivastava ◽

Philippe Fournier-Viger

Keyword(s):

Large Scale ◽

State Of The Art ◽

Frequency Factor ◽

Market Analysis ◽

Smart Devices ◽

Mapreduce Framework ◽

High Utilization ◽

Mapreduce Model ◽

High Utility ◽

High Utility Itemsets

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many areas, especially market-basket analysis and related applications. Since current market-basket scenarios also involve IoT equipment, i.e., sensors or smart devices, to collect information, it is necessary to consider the mining of high-utility itemsets (HUIs) in large-scale databases, especially in IoT settings. In this work, a GA-based MapReduce model, known as GMR-Miner, is presented for mining closed patterns with high utilization in large-scale databases. The k-means model is first adopted to group transactions by their correlation based on the frequency factor. A genetic algorithm (GA) is then used in the developed MapReduce framework to explore potential candidates within a limited time. The developed 3-tier MapReduce model can also be easily deployed in Spark to handle any large-scale database for knowledge discovery of closed patterns with high utilization. We created extensive experimental environments to evaluate the developed GMR-Miner against the well-known, state-of-the-art CLS-Miner. Our in-depth results show that GMR-Miner outperforms CLS-Miner on many criteria, i.e., memory usage, scalability, and runtime.

Mining High-Utility Itemsets of Generalized Quantity with Pattern-Growth Structures

Sensor Networks and Signal Processing - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-15-4917-5_33 ◽

2020 ◽

pp. 447-464

Author(s):

Ming-Yen Lin ◽

Tzer-Fu Tu ◽

Sue-Chen Hsueh

Keyword(s):

Pattern Growth ◽

High Utility ◽

High Utility Itemsets

Depth Impurity Pruned Strategies for Extracting High Utility Itemsets

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.4.16747 ◽

2018 ◽

Vol 7 (3.4) ◽

pp. 52

Author(s):

K Santhi ◽

B Valarmathi ◽

T Chellatamilan

Keyword(s):

State Of The Art ◽

Database Mining ◽

Huge Amount ◽

Utility Mining ◽

Organizational Behaviour ◽

Mining Technique ◽

Transaction Database ◽

High Utility ◽

High Utility Itemsets ◽

Time And Space Complexity

In a transaction database, mining high utility itemsets (HUIs) refers to finding the itemsets that yield high utility, such as profit. Although a number of relevant algorithms have been proposed recently, they suffer from generating a huge number of itemsets while mining to discover HUIs. Such a large number of itemsets degrades mining performance in terms of execution time and space complexity, and the situation worsens when the database contains a large number of transactions. In this research paper, we address this concern by proposing an algorithm named Depth Impurity Quality Index Pruned strategies, which considers the complexity of sub-trees to identify high-utility itemsets more efficiently. Mining HUIs is significantly harder and less flexible than mining common itemsets. This is attributable to the absence of an intrinsic structural property of HUIs that could otherwise have been exploited. This paper proposes a high utility mining technique that makes use of novel pruning approaches. The experimental results show that the proposed method is highly effective at eliminating unpromising candidates in the database transactions.

ETKDS: An efficient algorithm of Top-K high utility itemsets mining over data streams under sliding window model

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210610 ◽

2021 ◽

pp. 1-22

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Le Wang ◽

Xiaojuan Li

Keyword(s):

Data Streams ◽

Efficient Algorithm ◽

Large Scale ◽

Search Space ◽

Real Time Processing ◽

Pruning Strategy ◽

Complex Construction ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

Researchers have proposed the concept of Top-K high-utility itemset mining over data streams, in which users directly specify the number K of high-utility itemsets they wish to obtain, with no need to set a minimum utility threshold. Current Top-K high-utility itemset mining algorithms over data streams have several problems, including the complex construction process of the storage structure, the inefficiency of threshold raising strategies and utility pruning strategies, and the large scale of the search space, so they still cannot meet the requirement of real-time processing over data streams under limited time and memory. To solve these problems, this paper proposes an efficient algorithm based on dataset projection for mining Top-K high-utility itemsets from a data stream. A data structure, CIUDataListSW, is also proposed, which stores the position of each item in the transaction so that the initial projected dataset of the item can be obtained efficiently. To improve projection efficiency, this paper introduces a new reorganization technique for projected transactions in common batches, which maintains the sort order of transactions during dataset projection. A dual pruning strategy and a transaction merging mechanism are also used to further reduce the search space and dataset scanning costs. In addition, based on the proposed CUDHSW structure, an efficient threshold raising strategy, CUD, is used, and a new threshold raising strategy, CUDCB, is designed to further shorten the mining time. Experimental results show that the algorithm has great advantages in running time and memory consumption, and that it is especially suitable for mining high-utility itemsets in dense datasets.
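
The idea of asking for K itemsets instead of a minimum utility threshold can be sketched with a min-heap: the smallest utility currently in the heap acts as an internal threshold that rises as better itemsets are found. The code below is a brute-force illustration of that general idea only. It is not ETKDS, does not implement the CUD or CUDCB strategies, and ignores the sliding window; `top_k_huis` and the toy data are hypothetical.

```python
import heapq
from itertools import combinations


def top_k_huis(transactions, unit_profit, k):
    """Return the k itemsets with the highest utility and the final threshold."""
    if k <= 0:
        return [], 0
    # Brute-force exact utilities (quantity * unit profit) of all occurring itemsets.
    utilities = {}
    for tx in transactions:
        items = sorted(tx)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                gain = sum(tx[i] * unit_profit[i] for i in combo)
                utilities[combo] = utilities.get(combo, 0) + gain

    heap = []       # min-heap of (utility, itemset); the root is the weakest entry
    threshold = 0   # internal minimum utility, raised as the heap fills up
    for itemset, u in utilities.items():
        if len(heap) == k and u <= threshold:
            continue  # cannot enter the current top-k
        if len(heap) < k:
            heapq.heappush(heap, (u, itemset))
        else:
            heapq.heapreplace(heap, (u, itemset))
        if len(heap) == k:
            threshold = heap[0][0]  # threshold raising
    return sorted(heap, reverse=True), threshold


# Hypothetical toy data: item -> quantity, plus assumed unit profits.
toy = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
profit = {"a": 5, "b": 2, "c": 1, "d": 4}
print(top_k_huis(toy, profit, k=3))
```

In a real miner the rising threshold is used during the search itself to prune branches early, which is why the abstract treats the speed of threshold raising as a performance factor.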
