Efficient Algorithm for Mining High Utility Pattern Considering Length Constraints

High utility itemset (HUI) mining is a popular and important data mining task. Existing approaches often discover a very large number of itemsets and rules, which reduces both the efficiency and the effectiveness of HUI mining. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose EHIL (Efficient High utility Itemsets with Length constraints), an algorithm that discovers HUIs under length constraints and decreases the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bounds for pruning, named sub-tree utility and local utility, and modifies them by incorporating the length constraints. To reduce the number of dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. Execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. Memory usage is up to twenty-eight times lower than that of the state-of-the-art algorithm FHM+.
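To make the length-constrained problem concrete, here is a minimal brute-force sketch in Python. It is not EHIL: it omits the sub-tree and local utility bounds, transaction merging, and dataset projection, and simply enumerates every itemset whose size lies between `min_len` and `max_len`. The toy database and all function and parameter names are assumptions chosen for illustration.

```python
from itertools import combinations

# Toy database (illustrative values only): each transaction maps an item to
# the utility of that item in the transaction, e.g. quantity * unit profit.
DB = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 2},
    {"a": 5, "b": 2, "d": 8},
]

def utility(itemset, db):
    """Sum the utility of `itemset` over the transactions that contain all of it."""
    total = 0
    for t in db:
        if all(i in t for i in itemset):
            total += sum(t[i] for i in itemset)
    return total

def mine_hui_with_length(db, min_util, min_len, max_len):
    """Brute force: return {itemset: utility} for min_len <= |itemset| <= max_len."""
    items = sorted({i for t in db for i in t})
    result = {}
    for k in range(min_len, min(max_len, len(items)) + 1):
        for combo in combinations(items, k):
            u = utility(combo, db)
            if u >= min_util:
                result[combo] = u
    return result

# Itemsets of size 2 or 3 with utility of at least 15.
huis = mine_hui_with_length(DB, min_util=15, min_len=2, max_len=3)
print(huis)
```

With `min_len=2`, the single item `a` (utility 20) is excluded even though it passes the utility threshold, which is the "removing tiny itemsets" effect the abstract describes. A real miner would avoid the exponential enumeration by pruning with upper bounds.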

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201357 ◽

2020 ◽

pp. 1-16

Author(s):

Rui Sun ◽

Meng Han ◽

Chunyan Zhang ◽

Mingyao Shen ◽

Shiyu Du

Keyword(s):

Data Mining ◽

Search Space ◽

Experimental Results ◽

Effective Algorithm ◽

Memory Usage ◽

Utility Value ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although top-k HUIM methods are very common, they can only mine itemsets with positive items, and itemsets are missed when negative items are present. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It prunes the search space with a redefined sub-tree utility value and a redefined local utility value. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
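The idea of raising the threshold automatically can be sketched with a size-k min-heap: once k itemsets have been found, the smallest utility in the heap becomes the working minimum utility. The Python sketch below is a brute-force illustration under that assumption, not THN itself; it performs no pruning, merging, or projection, and the toy database and names are hypothetical.

```python
import heapq
from itertools import combinations

# Toy database with one negative-utility item ("d"); values are illustrative.
DB = [
    {"a": 5, "b": 2, "d": -3},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": -1},
    {"a": 5, "b": 2, "c": 1},
]

def top_k_itemsets(db, k):
    """Return the k itemsets of highest utility and the final raised threshold."""
    items = sorted({i for t in db for i in t})
    heap = []                    # min-heap of (utility, itemset), at most k entries
    min_util = float("-inf")     # internal threshold, raised as results accumulate
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            covering = [t for t in db if all(i in t for i in combo)]
            if not covering:
                continue         # itemset never occurs, so it has no utility
            u = sum(t[i] for t in covering for i in combo)
            if len(heap) < k:
                heapq.heappush(heap, (u, combo))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, combo))
            if len(heap) == k:
                min_util = heap[0][0]   # raise threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util

top, threshold = top_k_itemsets(DB, k=3)
print(top, threshold)
```

Because negative items make utility non-monotone (adding `d` to an itemset can lower its utility), this sketch cannot skip supersets based on the threshold alone; that is the gap a redefined upper bound is meant to close.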

MINING OF HIGH-UTILITY ITEMSETS WITH NEGATIVE UTILITY

JOURNAL OF TECHNOLOGY & INNOVATION ◽

10.26480/jtin.02.2021.44.47 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-47

Author(s):

Tung N.T ◽

Nguyen Le Van ◽

Trinh Cong Nhut ◽

Tran Van Sang

Keyword(s):

State Of The Art ◽

Upper Bounds ◽

Itemset Mining ◽

Novel Structure ◽

Transactional Databases ◽

Speed Up ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

The goal of the high-utility itemset mining (HUIM) task is to discover combinations of items that yield high profits from transactional databases. HUIM is a useful tool for retail stores to analyze customer behaviors. However, in the real world, items are found with both positive and negative utility values. To address this issue, we propose an algorithm named Modified Efficient High-utility Itemsets mining with Negative utility (MEHIN) to find all HUIs with negative utility. This algorithm is an improved version of the EHIN algorithm. MEHIN utilizes two new upper bounds for pruning, named revised subtree utility and revised local utility. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. An array-based utility-counting technique is also utilized to calculate the upper bounds efficiently. MEHIN employs a novel structure called P-set to reduce the number of transaction scans and to speed up the mining process. Experimental results show that the proposed algorithm considerably outperforms the state-of-the-art HUI-mining algorithms on negative utility in retail databases in terms of runtime.
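Why negative utilities require revised upper bounds can be seen in a small Python sketch. It compares a naive transaction-weighted bound (summing all utilities in a transaction) with a variant that sums only the positive utilities. The positive-only variant is an assumption used here for illustration and is not claimed to be MEHIN's revised subtree or local utility; the toy database and names are likewise hypothetical.

```python
from itertools import combinations

# Toy database with a negative-utility item ("d"); values are illustrative.
DB = [
    {"a": 10, "d": -8},
    {"a": 4, "b": 3},
    {"b": 5, "c": 2, "d": -1},
]

def utility(itemset, db):
    """Utility of `itemset` summed over the transactions containing all of it."""
    return sum(sum(t[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

def twu(db, positive_only):
    """Per-item bound: sum of transaction utilities over transactions with the item."""
    bound = {}
    for t in db:
        tu = sum(u for u in t.values() if u > 0 or not positive_only)
        for i in t:
            bound[i] = bound.get(i, 0) + tu
    return bound

naive = twu(DB, positive_only=False)
safe = twu(DB, positive_only=True)

items = sorted(safe)
all_itemsets = [c for k in range(1, len(items) + 1)
                for c in combinations(items, k)]

def violations(bound):
    """Pairs (itemset, item) where the itemset's utility exceeds the item's bound."""
    return [(x, i) for x in all_itemsets for i in x if utility(x, DB) > bound[i]]

violations_naive = violations(naive)
violations_safe = violations(safe)
print(naive, safe, violations_naive, violations_safe)
```

Here the naive bound for `a` is 9 while the utility of `{a}` is 14, so pruning on the naive bound would wrongly discard a candidate; the positive-only bound (17) never underestimates in this example.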

HUIL-TN & HUI-TN: Mining high utility itemsets based on pattern-growth

PLoS ONE ◽

10.1371/journal.pone.0248349 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248349

Author(s):

Le Wang ◽

Shui Wang

Keyword(s):

Data Mining ◽

State Of The Art ◽

Research Topic ◽

Running Time ◽

Original Dataset ◽

Pattern Growth ◽

Active Research ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

In recent years, high utility itemset (HUI) mining has been an active research topic in data mining. In this study, we propose two efficient pattern-growth based HUI mining algorithms, called High Utility Itemset based on Length and Tail-Node tree (HUIL-TN) and High Utility Itemset based on Tail-Node tree (HUI-TN). These two algorithms avoid the time-consuming candidate generation stage and the need to scan the original dataset multiple times for exact utility values. A novel tree structure, named tail-node tree (TN-tree), is proposed as a key element of our algorithms to maintain the complete utility information of the existing itemsets of a dataset. The performance of HUIL-TN and HUI-TN was evaluated against state-of-the-art reference methods on various datasets. Experimental results showed that our algorithms achieve or come close to the best running time on all datasets, while other algorithms excel only on certain types of dataset. Scalability tests were also performed, and our algorithms obtained the flattest curves among all competitors.

Mining correlated high-utility itemsets using various measures

Logic Journal of IGPL ◽

10.1093/jigpal/jzz068 ◽

2020 ◽

Vol 28 (1) ◽

pp. 19-32 ◽

Cited By ~ 2

Author(s):

Philippe Fournier-Viger ◽

Yimin Zhang ◽

Jerry Chun-Wei Lin ◽

Duy-Tai Dinh ◽

Hoai Bac Le

Keyword(s):

Efficient Algorithm ◽

Real Life ◽

Huge Amount ◽

Utility Measure ◽

Itemset Mining ◽

High Profit ◽

Benchmark Datasets ◽

High Utility ◽

High Utility Itemsets ◽

Highly Correlated

Discovering high-utility itemsets (HUIs) consists of finding sets of items that yield a high profit in customer transaction databases. An important limitation of traditional high-utility itemset mining (HUIM) is that only the utility measure is used for assessing the interestingness of patterns. This leads to finding many itemsets that have a high profit but contain items that are weakly correlated. To address this issue, this paper proposes to integrate the concept of correlation in HUIM to find profitable itemsets that are highly correlated, using the all-confidence and bond measures. An algorithm named FCHM (fast correlated high-utility itemset miner) is proposed to efficiently discover correlated high-utility itemsets (CHIs). Two versions of the algorithm are proposed: FCHM$_{all\text{-}confidence}$ and FCHM$_{bond}$, which are based on the all-confidence and bond measures, respectively. An experimental evaluation was done using four real-life benchmark datasets from the HUIM literature: mushroom, retail, kosarak and foodmart. Results show that FCHM is efficient and can prune a huge number of weakly correlated HUIs.
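The two correlation measures can be illustrated with a short Python sketch using their standard definitions: the bond of an itemset is its support divided by the number of transactions containing at least one of its items, and its all-confidence is its support divided by the largest support of any single item in it. This shows only the measures, not FCHM's search or pruning; the toy database and function names are assumptions.

```python
# Toy transaction database (items only; utilities are not needed here).
DB = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c", "d"},
]

def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t)

def disjunctive_support(itemset, db):
    """Number of transactions that contain at least one item of `itemset`."""
    return sum(1 for t in db if set(itemset) & t)

def bond(itemset, db):
    """Support divided by disjunctive support (0.0 if no item ever occurs)."""
    dis = disjunctive_support(itemset, db)
    return support(itemset, db) / dis if dis else 0.0

def all_confidence(itemset, db):
    """Support divided by the largest single-item support in the itemset."""
    largest = max(support({i}, db) for i in itemset)
    return support(itemset, db) / largest if largest else 0.0

print(bond({"a", "b"}, DB), all_confidence({"a", "b"}, DB))
print(bond({"c", "d"}, DB), all_confidence({"c", "d"}, DB))
```

In this toy data, `{a, b}` scores higher than `{c, d}` on both measures, so a miner with a minimum bond or all-confidence threshold would keep the former and could discard the latter regardless of its profit.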

FHN: An efficient algorithm for mining high-utility itemsets with negative unit profits

Knowledge-Based Systems ◽

10.1016/j.knosys.2016.08.022 ◽

2016 ◽

Vol 111 ◽

pp. 283-298 ◽

Cited By ~ 27

Author(s):

Jerry Chun-Wei Lin ◽

Philippe Fournier-Viger ◽

Wensheng Gan

Keyword(s):

Efficient Algorithm ◽

High Utility ◽

High Utility Itemsets

An efficient algorithm for hiding sensitive-high utility itemsets

Intelligent Data Analysis ◽

10.3233/ida-194697 ◽

2020 ◽

Vol 24 (4) ◽

pp. 831-845

Author(s):

Vy Huynh Trieu ◽

Hai Le Quoc ◽

Chau Truong Ngoc

Keyword(s):

Efficient Algorithm ◽

High Utility ◽

High Utility Itemsets

EHNL: An efficient algorithm for mining high utility itemsets with negative utility value and length constraints

Information Sciences ◽

10.1016/j.ins.2019.01.056 ◽

2019 ◽

Vol 484 ◽

pp. 44-70 ◽

Cited By ~ 2

Author(s):

Kuldeep Singh ◽

Ajay Kumar ◽

Shashank Sheshar Singh ◽

Harish Kumar Shakya ◽

Bhaskar Biswas

Keyword(s):

Efficient Algorithm ◽

Utility Value ◽

High Utility ◽

High Utility Itemsets

An efficient algorithm for mining temporal high utility itemsets from data streams

Journal of Systems and Software ◽

10.1016/j.jss.2007.07.026 ◽

2008 ◽

Vol 81 (7) ◽

pp. 1105-1117 ◽

Cited By ~ 52

Author(s):

Chun-Jung Chu ◽

Vincent S. Tseng ◽

Tyne Liang

Keyword(s):

Data Streams ◽

Efficient Algorithm ◽

High Utility ◽

High Utility Itemsets

Efficient Algorithm for Mining Non-Redundant High-Utility Association Rules

Sensors ◽

10.3390/s20041078 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1078 ◽

Cited By ~ 7

Author(s):

Thang Mai ◽

Loan T.T. Nguyen ◽

Bay Vo ◽

Unil Yun ◽

Tzung-Pei Hong

Keyword(s):

Association Rules ◽

Business Strategy ◽

Efficient Algorithm ◽

Business Managers ◽

Competitive Strategies ◽

Computing Systems ◽

Other Information ◽

High Utility ◽

High Utility Itemsets ◽

The Internet Of Things

In business, managers may use the association information among products to define promotion and competitive strategies. The mining of high-utility association rules (HARs) from high-utility itemsets enables users to select their own weights for rules, based on either the utility or the confidence values. This approach also provides more information, which can help managers to make better decisions. Some efficient methods for mining HARs have been developed in recent years. However, in some decision-support systems, users only need to mine the smallest set of HARs for efficient use. Therefore, this paper proposes a method for the efficient mining of non-redundant high-utility association rules (NR-HARs). We first build a semi-lattice of mined high-utility itemsets, and then identify closed and generator itemsets within it. Following this, an efficient algorithm is developed for generating rules from the built lattice. This new approach was verified on different types of datasets to demonstrate that it has a faster runtime and does not require more memory than existing methods. The proposed algorithm can be integrated with a variety of applications and would combine well with external systems, such as the Internet of Things (IoT) and distributed computer systems. Many companies have been applying IoT and such computing systems to their business activities, data monitoring, or decision-making. The data can be sent into the system continuously through the IoT or any other information system. Selecting an appropriate and fast approach helps management to visualize customer needs as well as make more timely decisions on business strategy.
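The step of identifying closed and generator itemsets can be sketched in Python using the usual definitions: an itemset is closed if no proper superset has the same support, and it is a generator if no proper subset has the same support. In this sketch both checks are made only against the supplied collection, which stands in for the mined high-utility itemsets; that restriction, the toy supports, and the function name are assumptions, and the paper's semi-lattice construction and rule generation are not reproduced.

```python
def closed_and_generators(supports):
    """Split a collection of itemsets into closed itemsets and generators.

    `supports` maps frozenset -> support count. Both properties are checked
    relative to the itemsets present in `supports` only.
    """
    closed, generators = [], []
    for x, s in supports.items():
        # Closed: no proper superset in the collection has the same support.
        if not any(x < y and supports[y] == s for y in supports):
            closed.append(x)
        # Generator: no proper subset in the collection has the same support.
        if not any(y < x and supports[y] == s for y in supports):
            generators.append(x)
    return closed, generators

# Hypothetical supports for four itemsets.
SUPPORTS = {
    frozenset("a"): 3,
    frozenset("b"): 4,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
}

closed, generators = closed_and_generators(SUPPORTS)
print(closed, generators)
```

Here `{a}` is a generator but not closed, because `{a, b}` has the same support, while `{a, b}` is closed but not a generator. Pairing generators with closed itemsets of equal support is a common basis for compact rule sets, since such rules carry the same support information with fewer redundant variants.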

CHN: an efficient algorithm for mining closed high utility itemsets with negative utility

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2018.2882421 ◽

2018 ◽

pp. 1-1 ◽

Cited By ~ 2

Author(s):

Kuldeep Singh ◽

Shashank Sheshar Singh ◽

Ajay Kumar ◽

Harish Kumar Shakya ◽

Bhaskar Biswas

Keyword(s):

Efficient Algorithm ◽

High Utility ◽

High Utility Itemsets
