Mining of top-k high utility itemsets with negative utility

2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although top-k HUIM methods are common, they can only mine itemsets with positive items, and itemsets are missed when these methods are applied to data with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It prunes the search space with a redefined sub-tree utility and a redefined local utility. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and scales well. Moreover, THN performs particularly well on dense datasets.
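
One ingredient named above is automatically raising the minimum utility threshold. A common way to do this in top-k mining, sketched here as an assumption rather than THN's exact strategy, is to keep the k best itemsets found so far in a min-heap and set minutil to the utility of the k-th best. The function name `topk_threshold` and the (itemset, utility) candidate stream are hypothetical; candidate generation and the handling of negative items are not shown.

```python
import heapq


def topk_threshold(candidates, k):
    """Keep the k highest-utility itemsets seen so far and raise minutil.

    candidates: iterable of (itemset, utility) pairs, itemsets as tuples.
    Returns (top-k list sorted by utility descending, final minutil).
    """
    heap = []      # min-heap of (utility, itemset); heap[0] is the k-th best
    minutil = 0    # assumed starting threshold, raised as the heap fills
    for itemset, utility in candidates:
        if utility < minutil:
            continue                  # cannot enter the current top-k
        heapq.heappush(heap, (utility, itemset))
        if len(heap) > k:
            heapq.heappop(heap)       # evict the weakest itemset
        if len(heap) == k:
            minutil = heap[0][0]      # threshold = utility of the k-th best
    return sorted(heap, key=lambda e: e[0], reverse=True), minutil


cands = [(("a",), 5), (("b",), 12), (("a", "b"), 9), (("c",), -3), (("a", "c"), 7)]
top, minutil = topk_threshold(cands, 3)
# top == [(12, ("b",)), (9, ("a", "b")), (7, ("a", "c"))], minutil == 7
```

Because minutil only grows, later candidates below it can be skipped, which is what lets a top-k miner prune without a user-supplied threshold.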

2019 ◽  
Vol 18 (04) ◽  
pp. 1113-1185 ◽  
Author(s):  
Bahareh Rahmati ◽  
Mohammad Karim Sohrabi

High utility itemset mining considers the unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and improve its efficiency. The main solutions to the high utility itemset mining problem are using an anti-monotonic upper bound of the utility function and exploiting efficient data structures that store and compact the dataset so that efficient pruning strategies can be applied. Different mining methods and techniques have tried to improve the performance of extracting high utility itemsets and several of their variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, by using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper presents a comprehensive systematic review of high utility itemset mining techniques and classifies them by their problem-solving approaches.
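
To make the loss of downward closure concrete, the sketch below computes itemset utility and the transaction-weighted utility (TWU), a widely used anti-monotonic upper bound. It assumes positive unit profits (with negative profits TWU is no longer a valid bound); the function names and toy data are illustrative only.

```python
def utility(itemset, transaction, profit):
    """u(X, T): sum of quantity * unit profit over items of X, if X is in T."""
    if not all(i in transaction for i in itemset):
        return 0
    return sum(transaction[i] * profit[i] for i in itemset)


def transaction_utility(transaction, profit):
    """TU(T): utility of the whole transaction."""
    return sum(q * profit[i] for i, q in transaction.items())


def total_utility(itemset, db, profit):
    return sum(utility(itemset, t, profit) for t in db)


def twu(itemset, db, profit):
    """TWU(X): sum of TU(T) over transactions that contain X."""
    return sum(transaction_utility(t, profit)
               for t in db if all(i in t for i in itemset))


profit = {"a": 5, "b": 2, "c": 1}
db = [{"a": 1, "c": 10}, {"a": 2, "b": 1}, {"b": 4, "c": 2}]  # item -> quantity
# u({c}) = 12 but u({a, c}) = 15: a superset can have higher utility,
# so utility is not anti-monotonic. TWU({c}) = 25 still bounds both.
```

With a threshold of 14, {a, c} is a high utility itemset while its subset {c} is not, so pruning on utility alone would be unsafe; pruning on TWU is safe because TWU never decreases when an itemset shrinks.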


Author(s):  
Đậu Hải Phong

High utility itemset (HUI) mining is one of the popular problems in data mining. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Parallel algorithms all try to reduce the synchronization cost and the cost of calculating the global profit of itemsets. In this paper, we present a parallel method for mining HUIs based on projection-based indexing that speeds up mining and reduces memory requirements. The experimental results show that our algorithm outperforms some non-parallel algorithms in both performance and the number of candidates generated.
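
A minimal sketch of why projection helps parallel mining, assuming a fixed global item order: the itemsets whose first item is i can be found using only the database projected on i, so each prefix item is an independent task and workers only need to merge results at the end. This is not the paper's indexing structure; `project`, `mine_prefix`, and `parallel_hui` are hypothetical names, the per-prefix search is brute force, and `ThreadPoolExecutor` only shows the task split (Python threads do not run CPU-bound work in parallel).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def project(db, item, order):
    """Transactions containing `item`, restricted to `item` and later items."""
    rank = {x: r for r, x in enumerate(order)}
    return [{x: u for x, u in t.items() if rank[x] >= rank[item]}
            for t in db if item in t]


def mine_prefix(item, db, order, minutil):
    """HUIs whose first item is `item`, found from its projection only."""
    proj = project(db, item, order)
    later = sorted({x for t in proj for x in t if x != item}, key=order.index)
    found = {}
    for r in range(len(later) + 1):
        for rest in combinations(later, r):
            itemset = (item,) + rest
            u = sum(sum(t[x] for x in itemset)
                    for t in proj if all(x in t for x in itemset))
            if u >= minutil:
                found[itemset] = u
    return found


def parallel_hui(db, order, minutil, workers=2):
    # Prefix items define disjoint search spaces, so no synchronization
    # is needed beyond merging each task's result dictionary.
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda i: mine_prefix(i, db, order, minutil), order):
            result.update(part)
    return result


db = [{"a": 5, "b": 2}, {"a": 3, "c": 4}, {"b": 6, "c": 1}]  # item -> utility
print(parallel_hui(db, ["a", "b", "c"], minutil=7))
```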


2019 ◽  
Vol 15 (3) ◽  
pp. 1-27
Author(s):  
Kuldeep Singh ◽  
Bhaskar Biswas

High utility itemset (HUI) mining is one of the popular and important data mining tasks. Many studies have been carried out on this topic, and HUI mining often discovers a very large number of itemsets and rules, which reduces not only its efficiency but also its effectiveness. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm named EHIL (Efficient High utility Itemsets with Length constraints) that discovers HUIs with length constraints, reducing the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bounds, sub-tree utility and local utility, for pruning and modifies them to incorporate length constraints. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. The execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. Memory usage is up to twenty-eight times lower than that of the state-of-the-art algorithm FHM+.
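
Transaction merging, mentioned above, can be sketched as follows, assuming transactions are stored as dictionaries mapping items to utilities: transactions that contain exactly the same items are replaced by one transaction whose per-item utilities are summed, so later scans touch fewer rows. The function name is hypothetical and the sketch omits how EHIL combines merging with projection and length constraints.

```python
def merge_transactions(db):
    """Merge transactions with identical item sets by summing item utilities."""
    merged = {}
    for t in db:
        key = tuple(sorted(t))          # identical item sets share a key
        if key in merged:
            acc = merged[key]
            for item, u in t.items():
                acc[item] += u
        else:
            merged[key] = dict(t)       # copy so the input is not mutated
    return list(merged.values())


db = [{"a": 2, "b": 3}, {"b": 1, "a": 4}, {"a": 1, "c": 5}]
print(merge_transactions(db))  # [{'a': 6, 'b': 4}, {'a': 1, 'c': 5}]
```

Merging is exact for utility computation: any itemset contained in the merged transaction has the same total utility as before, because it was contained in every original copy.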


2014 ◽  
Vol 10 (1) ◽  
pp. 1-15 ◽  
Author(s):  
Wei Song ◽  
Yu Liu ◽  
Jinhong Li

Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Although a number of relevant approaches have been proposed in recent years, they produce a large number of candidate itemsets for high utility itemsets. In this paper, the authors propose an efficient algorithm, BAHUI (Bitmap-based Algorithm for High Utility Itemsets), for mining high utility itemsets with a bitmap database representation. BAHUI uses bitmaps both vertically and horizontally. On the one hand, BAHUI uses bitmaps vertically in a divide-and-conquer traversal of the itemset lattice. On the other hand, BAHUI uses bitmaps horizontally to calculate the real utilities of candidates. With a bitmap compression scheme, BAHUI reduces memory usage and takes advantage of efficient bitwise operations. Furthermore, BAHUI only records candidate high utility itemsets of maximal length and inherits the pruning and search strategies of maximal itemset mining. Extensive experimental results show that the BAHUI algorithm is both efficient and scalable.
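
The vertical use of bitmaps can be illustrated with plain Python integers as uncompressed bitmaps (BAHUI's compression scheme is not reproduced here): bit j of an item's bitmap is set when transaction j contains the item, the bitmap of an itemset is the bitwise AND of its items' bitmaps, and the set bits tell which transactions to visit when computing the real utility. All names are hypothetical.

```python
def build_bitmaps(db):
    """Vertical bitmap per item: bit j is set iff transaction j has the item."""
    bitmaps = {}
    for j, t in enumerate(db):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << j)
    return bitmaps


def itemset_bitmap(itemset, bitmaps, n):
    bm = (1 << n) - 1                  # start with all n transactions
    for item in itemset:
        bm &= bitmaps.get(item, 0)     # keep transactions containing item
    return bm


def itemset_utility(itemset, db, bitmaps):
    """Sum utilities only over transactions whose bit survives the AND."""
    bm, total, j = itemset_bitmap(itemset, bitmaps, len(db)), 0, 0
    while bm:
        if bm & 1:
            total += sum(db[j][i] for i in itemset)
        bm >>= 1
        j += 1
    return total


db = [{"a": 5, "b": 2}, {"a": 3, "c": 4}, {"a": 1, "b": 6}]  # item -> utility
bitmaps = build_bitmaps(db)                 # a: 0b111, b: 0b101, c: 0b010
print(itemset_utility(("a", "b"), db, bitmaps))  # 14
```

The support of an itemset is the number of set bits, for example `bin(itemset_bitmap(("a", "b"), bitmaps, len(db))).count("1")` gives 2.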


Author(s):  
Nguyen Manh Hung ◽  
Dau Hai Phong

Mining high utility itemsets in transaction databases is an important data mining task that is widely applied in many areas. Many algorithms have been proposed recently, but most of them identify high utility itemsets by generating candidate sets with overestimated utilities and then calculating their exact utility values. Therefore, the number of candidate itemsets is much larger than the actual number of high utility itemsets. In this paper, we introduce the Retail Transaction-Weighted Utility (RTWU) structure and propose two algorithms: the EAHUIMiner algorithm and the parallel PEAHUI-Miner algorithm. They were evaluated against two of the most efficient algorithms, EFIM and FHM. The results show that our algorithm performs better on sparse datasets. DOI: 10.32913/rd-ict.vol3.no14.519


Author(s):  
Tiantian Xu ◽  
Jianliang Xu ◽  
Xiangjun Dong

High utility sequential pattern (HUSP) mining has recently received a lot of attention from researchers. Many algorithms have been proposed to mine HUSPs, and most of them use only a single minimum utility, which implicitly assumes that all items in the database are equally important in terms of profit or other information users care about. This is often not the case in real-life applications. Although a few methods have been proposed to mine high utility itemsets (HUIs) with multiple minimum utilities (MMU), they are not suitable for mining HUSPs with MMU, because an item may occur more than once in a sequence and may have multiple utility values. In this paper, we propose a novel method, called HUSpan-MMU, to efficiently mine HUSPs with MMU from sequential utility-based databases. A lexicographic quantitative sequence tree (LQS-tree) is used to extract the complete set of HUSPs, and two pruning methods are used to reduce the search space in the LQS-tree. Experimental results on both synthetic and real datasets show that HUSpan-MMU can efficiently mine HUSPs with MMU from utility-based databases.
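
The difficulty that an item can occur several times in a sequence is usually handled by defining a pattern's utility in a sequence as the maximum over all of its occurrences. The sketch below computes that maximum with dynamic programming, under the simplifying assumptions that each pattern element is a single item and each sequence element is a dictionary from item to utility. It does not implement HUSpan-MMU, the LQS-tree, or the multiple minimum utilities; `sequence_utility` is a hypothetical name.

```python
def sequence_utility(pattern, sequence):
    """Max utility of `pattern` (non-empty list of single items) in `sequence`
    (list of elements, each a dict item -> utility). Pattern items must match
    strictly increasing element positions. Returns 0 if there is no match."""
    NEG = float("-inf")
    n = len(sequence)
    # best[j]: max utility of the matched prefix with its last item in element j
    best = [sequence[j][pattern[0]] if pattern[0] in sequence[j] else NEG
            for j in range(n)]
    for item in pattern[1:]:
        new = [NEG] * n
        running = NEG                  # max of best[0..j-1]
        for j in range(n):
            if running > NEG and item in sequence[j]:
                new[j] = running + sequence[j][item]
            running = max(running, best[j])
        best = new
    result = max(best, default=NEG)
    return 0 if result == NEG else result


seq = [{"a": 2}, {"b": 5, "a": 4}, {"a": 1, "b": 3}]
print(sequence_utility(["a", "b"], seq))  # 7
```

Here <a, b> occurs three times, with utilities 7, 5, and 7, so a single occurrence-based value would be ambiguous; taking the maximum gives 7.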


Webology ◽  
2021 ◽  
Vol 18 (1) ◽  
pp. 92-103
Author(s):  
Vandna Dahiya ◽  
Sandeep Dalal

Utility itemset mining, which finds itemsets based on utility factors, has established itself as an essential form of data mining. Utility is defined in terms of quantity and an interest factor. Researchers have developed various methods to mine these itemsets, but most of them are not scalable. A scalable approach is now required to meet the growing needs of data mining. This paper proposes a novel Spark-based technique, called Absolute High Utility Itemset Mining (AHUIM), for mining data in a distributed way. The technique is suitable for small as well as large datasets. Its performance is measured on parameters such as speed, scalability, and accuracy.
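
Distributed utility mining typically starts with a map-and-reduce aggregation over data partitions. The sketch below shows that shape in plain Python, computing transaction-weighted utilities per partition and then merging them; it is not AHUIM and does not use Spark. In PySpark the same pattern could be expressed with RDD operations such as `mapPartitions` and `reduceByKey`. Function names and data are hypothetical, and positive unit profits are assumed.

```python
from collections import Counter
from functools import reduce


def local_twu(partition, profit):
    """Map step: TWU contributions of one partition of transactions."""
    counts = Counter()
    for t in partition:
        tu = sum(q * profit[i] for i, q in t.items())  # transaction utility
        for item in t:
            counts[item] += tu
    return counts


def merge(acc, part):
    """Reduce step: update() adds counts and keeps zero or negative totals."""
    acc.update(part)
    return acc


def distributed_twu(partitions, profit):
    return reduce(merge, (local_twu(p, profit) for p in partitions), Counter())


profit = {"a": 5, "b": 2, "c": 1}
partitions = [[{"a": 1, "c": 10}], [{"a": 2, "b": 1}, {"b": 4, "c": 2}]]
print(dict(distributed_twu(partitions, profit)))  # {'a': 27, 'c': 25, 'b': 22}
```

`Counter.update` is used instead of `+` because `+` silently drops non-positive totals.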


2019 ◽  
Vol 484 ◽  
pp. 44-70 ◽  
Author(s):  
Kuldeep Singh ◽  
Ajay Kumar ◽  
Shashank Sheshar Singh ◽  
Harish Kumar Shakya ◽  
Bhaskar Biswas

2004 ◽  
Vol 03 (02) ◽  
pp. 143-154
Author(s):  
Chin-Chen Chang ◽  
Chih-Yang Lin ◽  
Pei-Yu Lin

Parallel association rule mining is a notable problem in data mining. However, little work has addressed three important issues: (1) reducing memory usage; (2) reducing communication over the network among the participating computers; and (3) balancing the load among computers. In this paper, we present a graph-based scheme that solves the parallel mining problem by applying independent groups (clusters of maximal cliques). To address all three issues, the independent groups divide a database into several independent sub-databases, so that mining algorithms can be run on each sub-database independently. To emphasize the effectiveness of the graph-based scheme, we apply the independent groups not only to maximal large itemset mining but also to general large itemset mining. The experimental results show that our scheme improves the efficiency of parallel mining when the independent groups are well organized and designed.
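
A simplified way to obtain groups that never share a transaction is to take connected components of the item co-occurrence graph; this is a swapped-in technique, not the paper's clusters of maximal cliques. No itemset can span two components, so each component's sub-database can be mined on a separate computer without communication. The sketch uses union-find; names and data are hypothetical.

```python
def independent_groups(db):
    """Connected components of the item co-occurrence graph (union-find).
    Items in different groups never appear in the same transaction."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for t in db:
        items = list(t)
        for x in items:
            parent.setdefault(x, x)
        for x in items[1:]:
            rx, ry = find(items[0]), find(x)
            if rx != ry:
                parent[ry] = rx            # union the two components
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


def split_database(db, groups):
    """One sub-database per group; each transaction lands in exactly one."""
    return [[t for t in db if t & g] for g in groups]


db = [{"a", "b"}, {"b", "c"}, {"d", "e"}, {"e"}, {"f"}]
groups = independent_groups(db)    # {a, b, c}, {d, e}, {f} in some order
subs = split_database(db, groups)
```

On real data a single component can dominate, which is one reason a finer decomposition such as the paper's clique-based groups may be needed for load balance.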


2019 ◽  
Vol 8 (4) ◽  
pp. 8083-8091

High utility itemset (HUI) mining has attracted many researchers in recent years. However, HUI mining methods involve an exponential search space and return a very large number of high utility itemsets. The temporal periodicity of itemsets has recently been considered an important criterion for mining high utility itemsets in many applications. Periodic high utility itemset mining methods have the limitation that they do not consider frequency and are not suitable for large databases. To address this problem, we propose two efficient algorithms for mining periodic frequent high utility itemsets: FPHUI (mining periodic frequent HUIs) and MFPHM (efficient mining of periodic frequent HUIs). The first algorithm, the FPHUI miner, generates all periodic frequent itemsets. Because mining periodic frequent high utility itemsets incurs more computational cost on very large databases, we further developed MFPHM to overcome this limitation. The performance of the FPHUI miner is evaluated through experiments on various real datasets. Experimental results show that the proposed algorithms are efficient and effective.
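
Periodicity is commonly measured, as in earlier periodic HUI work, by the largest gap between consecutive occurrences of an itemset, counting the gap from the start of the database and the gap to its end. The sketch below computes support and maximum period under that assumed definition and checks both thresholds; it does not reproduce FPHUI or MFPHM, the utility check is omitted, and the names `periodicity` and `is_periodic_frequent` are hypothetical.

```python
def periodicity(itemset, db):
    """Return (support, max period) of `itemset` in `db` (list of sets),
    with transactions numbered 1..n and boundary points 0 and n."""
    n = len(db)
    tids = [j for j, t in enumerate(db, start=1) if itemset <= t]
    points = [0] + tids + [n]
    periods = [b - a for a, b in zip(points, points[1:])]
    return len(tids), max(periods)


def is_periodic_frequent(itemset, db, min_sup, max_per):
    sup, max_period = periodicity(itemset, db)
    return sup >= min_sup and max_period <= max_per


db = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}, {"a", "b"}]
print(periodicity({"a"}, db))  # (4, 2)
print(periodicity({"c"}, db))  # (2, 3)
```

An itemset can be frequent yet not periodic: {a, b} occurs twice, but its occurrences at positions 1 and 5 leave a gap of 4.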

