Mining correlated high-utility itemsets using various measures

2020, Vol 28 (1), pp. 19-32
Author(s): Philippe Fournier-Viger, Yimin Zhang, Jerry Chun-Wei Lin, Duy-Tai Dinh, Hoai Bac Le

Abstract: Discovering high-utility itemsets (HUIs) consists of finding sets of items that yield a high profit in customer transaction databases. An important limitation of traditional high-utility itemset mining (HUIM) is that only the utility measure is used to assess the interestingness of patterns. As a result, many of the itemsets found have a high profit but contain weakly correlated items. To address this issue, this paper proposes integrating the concept of correlation into HUIM to find profitable itemsets that are highly correlated, using the all-confidence and bond measures. An efficient algorithm named FCHM (fast correlated high-utility itemset miner) is proposed to discover correlated high-utility itemsets (CHIs). Two versions of the algorithm are proposed, FCHM_all-confidence and FCHM_bond, which are based on the all-confidence and bond measures, respectively. An experimental evaluation was performed on four real-life benchmark datasets from the HUIM literature: mushroom, retail, kosarak and foodmart. Results show that FCHM is efficient and can prune a huge number of weakly correlated HUIs.
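The abstract names all-confidence and bond without defining them. Using their standard definitions, all-confidence(X) = sup(X) / max over i in X of sup({i}), and bond(X) = sup(X) / (number of transactions containing at least one item of X), the brute-force sketch below scores the itemsets of a small made-up database. It assumes that a CHI is an itemset whose utility reaches a minimum utility threshold and whose correlation reaches a minimum correlation threshold; the data, unit profits and thresholds are hypothetical, and FCHM itself avoids this exhaustive enumeration through pruning.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchase quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 6},
    {"a": 1, "b": 2, "d": 1},
    {"b": 4, "c": 3},
    {"c": 2, "d": 1},
]
# Hypothetical unit profits (external utilities).
PROFIT = {"a": 5, "b": 2, "c": 1, "d": 3}


def utility(itemset, transactions=TRANSACTIONS):
    """Sum of quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * PROFIT[i] for i in itemset)
    return total


def support(itemset, transactions=TRANSACTIONS):
    """Number of transactions containing all items of the itemset."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def all_confidence(itemset, transactions=TRANSACTIONS):
    """sup(X) divided by the largest support of any single item of X."""
    return support(itemset, transactions) / max(support([i], transactions) for i in itemset)


def bond(itemset, transactions=TRANSACTIONS):
    """sup(X) divided by the number of transactions containing at least one item of X."""
    disjunctive = sum(1 for t in transactions if any(i in t for i in itemset))
    return support(itemset, transactions) / disjunctive


def correlated_huis(min_util, min_corr, measure):
    """Brute force: keep itemsets with utility >= min_util and measure >= min_corr."""
    items = sorted(PROFIT)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            if support(itemset) == 0:
                continue  # itemset never occurs
            u = utility(itemset)
            if u >= min_util and measure(itemset) >= min_corr:
                result[itemset] = u
    return result
```

On this data, {a, c} has the highest utility (22) but a bond of only 0.4, so `correlated_huis(15, 0.5, all_confidence)` keeps it while `correlated_huis(15, 0.5, bond)` discards it. Bond is never larger than all-confidence, so it is the stricter of the two filters.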

2019, Vol 15 (3), pp. 1-27
Author(s): Kuldeep Singh, Bhaskar Biswas

High utility itemset (HUI) mining is a popular and important data mining task. Many studies have been carried out on this topic, but HUI mining often discovers a very large number of itemsets and rules, which reduces both its efficiency and its effectiveness. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm named EHIL (Efficient High utility Itemsets with Length constraints) that discovers HUIs under length constraints, reducing the number of HUIs by removing very small itemsets. EHIL adopts two new upper bounds, named sub-tree utility and local utility, for pruning, and modifies them to incorporate length constraints. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. Execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets, and memory usage is up to twenty-eight times lower than that of the state-of-the-art algorithm FHM+.
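Sub-tree utility and local utility are the upper bounds introduced by EFIM. The following is a minimal depth-first sketch, assuming a fixed alphabetical item order and a made-up database of precomputed item utilities. It prunes with the unmodified sub-tree utility only, projects each transaction onto the items that follow the current prefix, and applies the length limits `min_len` and `max_len` (hypothetical parameter names) directly. It omits local utility, transaction merging, and whatever length-aware modifications of the bounds EHIL itself makes.

```python
# Hypothetical toy database: each transaction is a list of (item, utility) pairs,
# where utility = quantity * unit profit has already been computed.
DB = [
    [("a", 5), ("b", 4), ("c", 1)],
    [("a", 10), ("c", 6)],
    [("a", 5), ("b", 4), ("d", 3)],
    [("b", 8), ("c", 3)],
    [("c", 2), ("d", 3)],
]


def mine(db, min_util, min_len, max_len):
    """Depth-first HUI search with sub-tree utility pruning and length limits."""
    order = sorted({item for t in db for item, _ in t})  # any fixed total order works
    rank = {item: r for r, item in enumerate(order)}
    # Projected transaction = (utility of the current prefix, items after the prefix).
    root = [(0, sorted(t, key=lambda p: rank[p[0]])) for t in db]
    results = {}

    def sub_tree_utility(proj, w):
        """Prefix utility + u(w) + utility of the items after w, over transactions with w."""
        total = 0
        for prefix_u, rest in proj:
            for pos, (item, u) in enumerate(rest):
                if item == w:
                    total += prefix_u + u + sum(v for _, v in rest[pos + 1:])
                    break
        return total

    def search(prefix, proj, candidates):
        for z in candidates:
            beta = prefix + (z,)
            beta_proj, beta_util = [], 0
            for prefix_u, rest in proj:
                for pos, (item, u) in enumerate(rest):
                    if item == z:
                        # Dataset projection: keep only the items that follow z.
                        beta_proj.append((prefix_u + u, rest[pos + 1:]))
                        beta_util += prefix_u + u
                        break
            if len(beta) >= min_len and beta_util >= min_util:
                results[beta] = beta_util
            if len(beta) < max_len:  # length constraint: stop growing long itemsets
                later = [w for w in order if rank[w] > rank[z]]
                nxt = [w for w in later if sub_tree_utility(beta_proj, w) >= min_util]
                search(beta, beta_proj, nxt)

    search((), root, [w for w in order if sub_tree_utility(root, w) >= min_util])
    return results
```

For example, `mine(DB, 15, 2, 2)` returns only the pairs {a, b}, {a, c} and {b, c}; the single items a and b also reach the utility threshold but are filtered out by `min_len`.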


2020, pp. 1-16
Author(s): Rui Sun, Meng Han, Chunyan Zhang, Mingyao Shen, Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) that contain negative items. Although top-k HUIM methods are common, they can only mine itemsets with positive items, and itemsets are missed when they are applied to data containing negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple database scans, it uses transaction merging and dataset projection. It prunes the search space using a redefined sub-tree utility and a redefined local utility. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
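A common way to raise the threshold automatically in top-k mining is to keep the k best itemsets found so far in a min-heap and use the smallest of them as the current threshold. The sketch below combines that idea with an upper bound that stays valid when negative items are present. This bound is the sum, over the transactions containing X, of the positive utilities in each transaction, sometimes called positive TWU. It is a generic brute-force illustration on made-up data, assuming the item "n" has a negative unit profit. It is not THN, which relies on redefined sub-tree and local utilities and on transaction merging.

```python
import heapq

# Hypothetical toy database: item -> utility (quantity * unit profit) per transaction.
# Item "n" has a negative unit profit (e.g. a give-away sold at a loss).
DB = [
    {"a": 5, "b": 4, "n": -3},
    {"a": 10, "c": 7},
    {"a": 5, "b": 4, "n": -2},
    {"b": 8, "c": 3, "n": -1},
    {"c": 2, "d": 3},
]


def top_k_hui(db, k):
    """Return the k highest-utility itemsets and the final (raised) threshold."""
    items = sorted({i for t in db for i in t})
    # Positive transaction utility: only positive utilities count, so the bound
    # below remains an upper bound even for itemsets containing negative items.
    ptu = [sum(u for u in t.values() if u > 0) for t in db]
    heap = []  # min-heap of (utility, itemset) holding the best k found so far
    threshold = float("-inf")  # raised automatically once the heap is full

    def visit(prefix, start):
        nonlocal threshold
        for idx in range(start, len(items)):
            x = prefix + (items[idx],)
            rows = [r for r, t in enumerate(db) if all(i in t for i in x)]
            if not rows:
                continue  # neither x nor any superset of x occurs
            util = sum(db[r][i] for r in rows for i in x)
            if len(heap) < k:
                heapq.heappush(heap, (util, x))
            elif util > heap[0][0]:
                heapq.heappushpop(heap, (util, x))
            if len(heap) == k:
                threshold = heap[0][0]  # k-th best utility so far
            # Positive TWU of x bounds the utility of x and every superset of x.
            if sum(ptu[r] for r in rows) >= threshold:
                visit(x, idx + 1)

    visit((), 0)
    return sorted(heap, reverse=True), threshold
```

With `k = 5`, the result includes {a, b, n} with utility 13 even though n has negative utility, and the final threshold is 13.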


2016, Vol 111, pp. 283-298
Author(s): Jerry Chun-Wei Lin, Philippe Fournier-Viger, Wensheng Gan

2020, Vol 24 (4), pp. 831-845
Author(s): Vy Huynh Trieu, Hai Le Quoc, Chau Truong Ngoc

2019, Vol 484, pp. 44-70
Author(s): Kuldeep Singh, Ajay Kumar, Shashank Sheshar Singh, Harish Kumar Shakya, Bhaskar Biswas

2016, Vol 51 (2), pp. 595-625
Author(s): Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, Vincent S. Tseng

2008, Vol 81 (7), pp. 1105-1117
Author(s): Chun-Jung Chu, Vincent S. Tseng, Tyne Liang

Sensors, 2020, Vol 20 (4), pp. 1078
Author(s): Thang Mai, Loan T.T. Nguyen, Bay Vo, Unil Yun, Tzung-Pei Hong

In business, managers may use the association information among products to define promotion and competitive strategies. Mining high-utility association rules (HARs) from high-utility itemsets lets users choose their own weights for rules, based on either utility or confidence values. This approach also provides more information, which can help managers make better decisions. Several efficient methods for mining HARs have been developed in recent years. However, in some decision-support systems, users only need the smallest set of HARs for efficient use. Therefore, this paper proposes a method for efficiently mining non-redundant high-utility association rules (NR-HARs). We first build a semi-lattice of the mined high-utility itemsets and then identify the closed and generator itemsets within it. An efficient algorithm is then developed for generating rules from the constructed lattice. The new approach was evaluated on different types of datasets, showing that it runs faster than existing methods without requiring more memory. The proposed algorithm can be integrated into a variety of applications and would combine well with external systems, such as the Internet of Things (IoT) and distributed computing systems. Many companies have been applying IoT and similar computing systems to their business activities, such as data monitoring and decision-making. Data can be sent into the system continuously through the IoT or any other information system. Selecting an appropriate and fast approach helps management understand customer needs and make more timely decisions on business strategy.
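The abstract does not spell out how rules are derived from closed and generator itemsets. This sketch assumes a common scheme from frequent-pattern mining. Rules take the form G → C \ G, where G is a generator, meaning no smaller subset occurs in exactly the same transactions, and C is a closed itemset containing G, meaning no larger superset occurs in exactly the same transactions. Such rules have minimal antecedents and maximal consequents. The brute-force code below applies that scheme to the closed high-utility itemsets of a made-up database and uses support-based confidence. It does not build the paper's semi-lattice and is not the NR-HAR algorithm.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility (quantity * unit profit).
DB = [
    {"a": 5, "b": 4, "c": 1},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "d": 3},
    {"b": 8, "c": 3},
    {"c": 2, "d": 3},
]


def tidset(x):
    """Indices of the transactions containing every item of x."""
    return frozenset(r for r, t in enumerate(DB) if all(i in t for i in x))


def utility(x):
    return sum(DB[r][i] for r in tidset(x) for i in x)


def is_closed(x):
    """x is closed if it equals the intersection of all transactions containing it."""
    common = set.intersection(*(set(DB[r]) for r in tidset(x)))
    return common == set(x)


def is_generator(x):
    """x is a generator if dropping any one item changes its tidset."""
    # Checking immediate (non-empty) subsets suffices because support is anti-monotone.
    return all(tidset(s) != tidset(x) for s in combinations(x, len(x) - 1) if s)


def rules(min_util, min_conf):
    """Rules G -> C \\ G with C a closed HUI and G a generator strictly inside C."""
    items = sorted({i for t in DB for i in t})
    present = [x for k in range(1, len(items) + 1)
               for x in combinations(items, k) if tidset(x)]
    closed_huis = [c for c in present if utility(c) >= min_util and is_closed(c)]
    out = []
    for c in closed_huis:
        for k in range(1, len(c)):
            for g in combinations(c, k):
                if is_generator(g):
                    conf = len(tidset(c)) / len(tidset(g))
                    if conf >= min_conf:
                        out.append((g, tuple(i for i in c if i not in g), conf))
    return out
```

For example, `rules(12, 1.0)` yields the two exact rules {a, d} → {b} and {b, d} → {a}, both drawn from the closed high-utility itemset {a, b, d}; {a} → {b, d} is not produced because its confidence is only 1/3.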


2019, Vol 18 (04), pp. 1113-1185
Author(s): Bahareh Rahmati, Mohammad Karim Sohrabi

High utility itemset mining considers the unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and enhance its efficiency. The main solutions to the high utility itemset mining problem are to use an anti-monotonic upper bound of the utility function and to exploit efficient data structures for storing and compacting the dataset so that efficient pruning strategies can be applied. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and several of their variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, by using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper presents a comprehensive systematic review of high utility itemset mining techniques and classifies them according to their problem-solving approaches.
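The review states that utility lacks downward closure and that an anti-monotonic upper bound restores pruning. Both points can be checked on a toy database. The sketch uses transaction-weighted utility (TWU), the classic such bound from the Two-Phase algorithm, on made-up data. The helper names are illustrative only.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility (quantity * unit profit).
DB = [
    {"a": 5, "b": 4, "c": 1},
    {"a": 10, "c": 6},
    {"a": 5, "b": 4, "d": 3},
    {"b": 8, "c": 3},
    {"c": 2, "d": 3},
]
ITEMS = sorted({i for t in DB for i in t})
ALL_ITEMSETS = [x for k in range(1, len(ITEMS) + 1) for x in combinations(ITEMS, k)]


def tids(x):
    """Indices of transactions containing every item of x."""
    return [r for r, t in enumerate(DB) if all(i in t for i in x)]


def utility(x):
    return sum(DB[r][i] for r in tids(x) for i in x)


def twu(x):
    """Transaction-weighted utility: total utility of the transactions containing x."""
    return sum(sum(DB[r].values()) for r in tids(x))


def extensions(x):
    """All supersets of x obtained by adding one item (kept in sorted order)."""
    return [tuple(sorted(x + (i,))) for i in ITEMS if i not in x]


def utility_violations():
    """Pairs (x, y) where y extends x by one item yet has higher utility."""
    return [(x, y) for x in ALL_ITEMSETS for y in extensions(x) if utility(y) > utility(x)]


def twu_is_valid_bound():
    """TWU bounds utility from above and never increases when an item is added."""
    bounds = all(utility(x) <= twu(x) for x in ALL_ITEMSETS)
    anti_monotone = all(twu(y) <= twu(x) for x in ALL_ITEMSETS for y in extensions(x))
    return bounds and anti_monotone


def prunable_items(min_util):
    """Items whose TWU is below min_util cannot appear in any high utility itemset."""
    return [i for i in ITEMS if twu((i,)) < min_util]
```

Here u({c}) = 12 while u({a, c}) = 22, so a superset can have a higher utility than its subset. TWU never increases when an item is added, so with a threshold of 18, item d (TWU 17) can be discarded before any itemset containing it is examined.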


Author(s): Kuldeep Singh, Shashank Sheshar Singh, Ajay Kumar, Harish Kumar Shakya, Bhaskar Biswas
