Mining of top-k high utility itemsets with negative utility

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although top-k HUIM methods are common, they can only mine itemsets with positive items, and itemsets are missed when they are applied to data with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It prunes the search space with a redefined sub-tree utility and a redefined local utility. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, the experiments show that THN performs particularly well on dense datasets.
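The threshold-raising idea can be illustrated with a small brute-force sketch. This is not THN itself: it enumerates every itemset and applies no pruning, transaction merging, or projection. The toy transactions, the unit profits, and the function names `utility` and `top_k_itemsets` are all assumptions made for illustration. Item `c` carries a negative unit profit, so adding it to an itemset can lower the utility.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 2, "b": 1, "c": 1},
    {"a": 1, "c": 3},
    {"b": 2, "c": 1, "d": 1},
    {"a": 1, "b": 1, "d": 2},
]
# Hypothetical unit profits; item "c" is sold at a loss (negative utility).
unit_profit = {"a": 5, "b": 3, "c": -2, "d": 4}


def utility(itemset, transactions, unit_profit):
    """Sum the utility of `itemset` over every transaction containing all its items."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def top_k_itemsets(transactions, unit_profit, k):
    """Brute-force top-k search; the threshold rises as better itemsets are found."""
    items = sorted(unit_profit)
    heap = []                 # min-heap of (utility, itemset); heap[0] is the k-th best
    min_util = float("-inf")  # internal threshold, only ever increases
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, unit_profit)
            if u <= min_util:
                continue      # cannot enter the current top-k
            heapq.heappush(heap, (u, itemset))
            if len(heap) > k:
                heapq.heappop(heap)    # drop the weakest itemset
            if len(heap) == k:
                min_util = heap[0][0]  # raise the threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util


result, threshold = top_k_itemsets(transactions, unit_profit, 3)
for u, itemset in result:
    print(itemset, u)
print("final threshold:", threshold)
```

With this toy data and k = 3, the threshold starts unbounded and ends at 20, and none of the returned itemsets contains the negative item `c`. Note that ties at the threshold are skipped in this sketch; a real algorithm would need an explicit tie policy.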

A Systematic Survey on High Utility Itemset Mining

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019300027 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1113-1185 ◽

Cited By ~ 2

Author(s):

Bahareh Rahmati ◽

Mohammad Karim Sohrabi

Keyword(s):

Data Structures ◽

Search Space ◽

Frequent Itemset ◽

Itemset Mining ◽

Efficient Data ◽

Average Utility ◽

High Utility ◽

High Utility Itemsets ◽

Downward Closure ◽

Efficient Data Structures

High utility itemset mining considers unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and enhance its efficiency. The main solutions to the high utility itemset mining problem are to use an anti-monotonic upper bound of the utility function and to exploit efficient data structures that store and compact the dataset so that pruning strategies can be applied efficiently. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to present a comprehensive systematic review of high utility itemset mining techniques and to classify them based on their problem-solving approaches.
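A small worked example may help show why downward closure fails for utility and how an anti-monotonic upper bound restores pruning. The sketch below uses transaction-weighted utility (TWU), one well-known bound of this kind, and assumes all unit profits are positive. The toy database and the function names are assumptions for illustration and are not taken from the survey.

```python
# Hypothetical toy database: each transaction maps item -> quantity.
transactions = [
    {"a": 1, "b": 5},
    {"a": 1, "c": 1},
    {"b": 2, "c": 1},
]
# Hypothetical unit profits (all positive in this sketch).
unit_profit = {"a": 10, "b": 1, "c": 3}


def transaction_utility(t, unit_profit):
    """Total utility of all items in one transaction."""
    return sum(q * unit_profit[i] for i, q in t.items())


def utility(itemset, transactions, unit_profit):
    """Exact utility: sum over transactions that contain every item of the itemset."""
    return sum(
        sum(t[i] * unit_profit[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )


def twu(itemset, transactions, unit_profit):
    """Transaction-weighted utility: sum of the full utilities of the
    transactions containing the itemset. It never underestimates the exact
    utility and never increases when items are added to the itemset."""
    return sum(
        transaction_utility(t, unit_profit)
        for t in transactions
        if all(i in t for i in itemset)
    )


# Utility is not anti-monotonic: the superset {a, b} beats its subset {b}.
print(utility(("b",), transactions, unit_profit))      # 7
print(utility(("a", "b"), transactions, unit_profit))  # 15

# TWU is anti-monotonic and bounds the utility from above.
print(twu(("b",), transactions, unit_profit))          # 20
print(twu(("a", "b"), transactions, unit_profit))      # 15
```

With a minimum utility of 14, discarding `{b}` because its exact utility is 7 would wrongly lose `{a, b}`. Its TWU of 20 keeps it in the search, while any itemset whose TWU falls below the threshold can be pruned together with all of its supersets.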

A parallel method for mining high utility itemsets based on projection indexing (Phương pháp song song khai phá tập lợi ích cao dựa trên chỉ số hình chiếu)

Research and Development on Information and Communication Technology ◽

10.32913/rd-ict.vol1.no37.349 ◽

2017 ◽

pp. 31

Author(s):

Đậu Hải Phong

Keyword(s):

Data Mining ◽

Parallel Algorithms ◽

Experimental Results ◽

Sequential Algorithms ◽

Parallel Method ◽

Speed Up ◽

High Utility ◽

High Utility Itemsets ◽

Better Than

High utility itemset (HUI) mining is one of the popular problems in data mining. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. All the parallel algorithms try to reduce the synchronization cost and the calculation of the global profit of itemsets. In this paper, we present a parallel method for mining HUIs with projection-based indexing to speed up performance and reduce memory requirements. The experimental results show that our algorithm is better than some non-parallel algorithms in both performance and number of candidates.

Efficient Algorithm for Mining High Utility Pattern Considering Length Constraints

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2019070101 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1-27

Author(s):

Kuldeep Singh ◽

Bhaskar Biswas

Keyword(s):

Data Mining ◽

Upper Bound ◽

Efficient Algorithm ◽

State Of The Art ◽

Memory Usage ◽

Important Data ◽

Benchmark Datasets ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets

High utility itemset (HUI) mining is one of the popular and important data mining tasks. Several studies have been carried out on this topic, but they often discover a very large number of itemsets and rules, which reduces not only the efficiency but also the effectiveness of HUI mining. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm named EHIL (Efficient High utility Itemsets with Length constraints), which discovers HUIs with length constraints and decreases the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bounds, named sub-tree utility and local utility, for pruning and modifies them by incorporating length constraints. To reduce the number of dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. The execution time improvements ranged from a modest five percent to two orders of magnitude across benchmark datasets. The memory usage is up to twenty-eight times less than that of the state-of-the-art algorithm FHM+.

BAHUI

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014010101 ◽

2014 ◽

Vol 10 (1) ◽

pp. 1-15 ◽

Cited By ~ 30

Author(s):

Wei Song ◽

Yu Liu ◽

Jinhong Li

Keyword(s):

The Other ◽

Divide And Conquer ◽

Memory Usage ◽

Important Research ◽

Compression Scheme ◽

Research Issues ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

The One

Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Although a number of relevant approaches have been proposed in recent years, they suffer from producing a large number of candidate itemsets for high utility itemsets. In this paper, the authors propose an efficient algorithm, namely BAHUI (Bitmap-based Algorithm for High Utility Itemsets), for mining high utility itemsets with a bitmap database representation. In BAHUI, the bitmap is used both vertically and horizontally. On the one hand, BAHUI exploits a divide-and-conquer approach to visit the itemset lattice by using the bitmap vertically. On the other hand, BAHUI uses the bitmap horizontally to calculate the real utilities of candidates. Using a bitmap compression scheme, BAHUI reduces memory usage and makes use of efficient bitwise operations. Furthermore, BAHUI only records candidate high utility itemsets with maximal length, and inherits the pruning and searching strategies from the maximal itemset mining problem. Extensive experimental results show that the BAHUI algorithm is both efficient and scalable.
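The vertical use of a bitmap can be sketched as follows. This is a minimal illustration of the general idea, not the BAHUI algorithm: it stores one Python integer per item as a bitmap over transaction ids, intersects bitmaps with bitwise AND to find the transactions covered by an itemset, and then reads the exact utility from the original transactions. It omits compression and the maximal-itemset search, and the data and function names are assumptions.

```python
from functools import reduce

# Hypothetical toy database: each transaction maps item -> quantity.
transactions = [
    {"a": 1, "b": 5},
    {"a": 1, "c": 1},
    {"b": 2, "c": 1},
    {"a": 2, "b": 1, "c": 1},
]
# Hypothetical unit profits.
unit_profit = {"a": 10, "b": 1, "c": 3}


def build_vertical_bitmaps(transactions):
    """For each item, set bit `tid` when transaction `tid` contains the item."""
    bitmaps = {}
    for tid, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def cover(itemset, bitmaps):
    """Bitmap of transactions containing every item of a non-empty itemset."""
    return reduce(lambda x, y: x & y, (bitmaps.get(i, 0) for i in itemset))


def utility_from_cover(itemset, cover_bits, transactions, unit_profit):
    """Exact utility, visiting only the transactions whose bit is set."""
    total, tid = 0, 0
    while cover_bits:
        if cover_bits & 1:
            t = transactions[tid]
            total += sum(t[i] * unit_profit[i] for i in itemset)
        cover_bits >>= 1
        tid += 1
    return total


bitmaps = build_vertical_bitmaps(transactions)
ab = cover(("a", "b"), bitmaps)
print(bin(ab))                                                      # 0b1001
print(bin(ab).count("1"))                                           # support: 2
print(utility_from_cover(("a", "b"), ab, transactions, unit_profit))  # 36
```

Extending an itemset by one item costs a single AND on the bitmaps, which is the kind of efficient bitwise operation the abstract refers to.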

Parallel Mining for High Utility Itemsets Mining by Efficient Data Structure

Research and Development on Information and Communication Technology ◽

10.32913/rd-ict.vol3.no14.519 ◽

2017 ◽

Author(s):

Nguyen Manh Hung ◽

Dau Hai Phong

Keyword(s):

Data Mining ◽

Data Structure ◽

Actual Number ◽

Utility Value ◽

Weighted Utility ◽

Parallel Mining ◽

Efficient Data ◽

Transaction Database ◽

High Utility ◽

High Utility Itemsets

Mining high utility itemsets in transaction databases is an important task in data mining and is widely applied in many areas. Recently, many algorithms have been proposed, but most algorithms for identifying high utility itemsets need to generate candidate sets by overestimating their utility and then calculating their exact utility value. Therefore, the number of candidate itemsets is much larger than the actual number of high utility itemsets. In this paper, we introduce the Retail Transaction-Weighted Utility (RTWU) structure and propose two algorithms: the EAHUI-Miner algorithm and the PEAHUI-Miner parallel algorithm. They are evaluated experimentally and compared with the two most efficient algorithms, EFIM and FHM. The results show that our algorithms perform better on sparse datasets.

Mining High Utility Sequential Patterns Using Multiple Minimum Utility

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418590176 ◽

2018 ◽

Vol 32 (10) ◽

pp. 1859017 ◽

Cited By ~ 1

Author(s):

Tiantian Xu ◽

Jianliang Xu ◽

Xiangjun Dong

Keyword(s):

Real Life ◽

Search Space ◽

Experimental Results ◽

Sequential Patterns ◽

Other Information ◽

Novel Method ◽

Complete Set ◽

High Utility ◽

High Utility Itemsets ◽

Pruning Methods

High utility sequential pattern (HUSP) mining has recently received a lot of attention from researchers. Many algorithms have been proposed to mine HUSPs, and most of them use only a single minimum utility, which implicitly assumes that all items in the database are of the same importance (such as profit), or other information based on users’ concerns in the database. This is often not the case in real-life applications. Although a few methods have been proposed to mine high utility itemsets (HUIs) with multiple minimum utilities (MMU), they are not suitable for mining HUSPs with MMU because an item may occur more than once in a sequence and may have multiple utility values. In this paper, we propose a novel method, called HUSpan-MMU, to efficiently mine HUSPs with MMU from sequential utility-based databases. A lexicographic quantitative sequence tree (LQS-tree) is used to extract the complete set of HUSPs. Meanwhile, two pruning methods are used to reduce the search space in the LQS-tree. Experimental results on both synthetic and real datasets show that HUSpan-MMU can efficiently mine HUSPs with MMU from utility-based databases.

A Scalable Approach for Data Mining – AHUIM

Webology ◽

10.14704/web/v18i1/web18029 ◽

2021 ◽

Vol 18 (1) ◽

pp. 92-103

Author(s):

Vandna Dahiya ◽

Sandeep Dalal

Keyword(s):

Data Mining ◽

Research Paper ◽

Large Datasets ◽

Novel Technique ◽

Itemset Mining ◽

Essential Form ◽

High Utility

Utility itemset mining, which finds itemsets based on utility factors, has established itself as an essential form of data mining. The utility is defined in terms of quantity and some interest factor. Researchers have developed various methods to mine these itemsets, but most of them are not scalable. A scalable approach is now required to meet the growing needs of data mining. This research paper proposes a novel Spark-based technique, called Absolute High Utility Itemset Mining (AHUIM), for mining the data in a distributed way. The technique is suitable for small as well as large datasets. Its performance is measured on parameters such as speed, scalability, and accuracy.

EHNL: An efficient algorithm for mining high utility itemsets with negative utility value and length constraints

Information Sciences ◽

10.1016/j.ins.2019.01.056 ◽

2019 ◽

Vol 484 ◽

pp. 44-70 ◽

Cited By ~ 2

Author(s):

Kuldeep Singh ◽

Ajay Kumar ◽

Shashank Sheshar Singh ◽

Harish Kumar Shakya ◽

Bhaskar Biswas

Keyword(s):

Efficient Algorithm ◽

Utility Value ◽

High Utility ◽

High Utility Itemsets

An Efficient Graph-Based Method for Parallel Mining Problems

Journal of Information & Knowledge Management ◽

10.1142/s0219649204000778 ◽

2004 ◽

Vol 03 (02) ◽

pp. 143-154

Author(s):

Chin-Chen Chang ◽

Chih-Yang Lin ◽

Pei-Yu Lin

Keyword(s):

Data Mining ◽

Association Rules ◽

Load Balance ◽

Experimental Results ◽

Memory Usage ◽

Association Rules Mining ◽

Parallel Mining ◽

Mining Algorithms

Parallel association rule mining is a notable problem in data mining. However, little work has addressed three important issues: (1) reducing memory usage; (2) reducing communication over the network among the computers involved; and (3) balancing the load among computers. In this paper, we present a graph-based scheme that solves the parallel mining problem by applying independent groups (clusters of maximal cliques). To address all three issues, the independent groups divide a database into several independent sub-databases, so that mining algorithms can be run on each sub-database independently. To emphasize the effectiveness of the graph-based scheme, we apply the independent groups not only to maximal large itemset mining but also to general large itemset mining. The experimental results show that our scheme can improve the efficiency of parallel mining when the independent groups are well organized and designed.

Towards Efficient Mining of Periodic High-Utility Itemsets in Large Databases

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8445.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 8083-8091

Keyword(s):

Computational Cost ◽

Frequent Itemsets ◽

Experimental Results ◽

Large Databases ◽

Very Large Databases ◽

Frequent Itemsets Mining ◽

Mining Methods ◽

High Utility ◽

High Utility Itemsets ◽

Temporal Periodicity

High utility itemset (HUI) mining has attracted many researchers in recent years. However, HUI mining methods involve an exponential search space and return a very large number of high-utility itemsets. Temporal periodicity of itemsets has recently been considered an important interestingness criterion for mining high-utility itemsets in many applications. Periodic high utility itemset mining methods have the limitation that they do not consider frequency and are not suitable for large databases. To address this problem, we propose two efficient algorithms for mining periodic frequent high-utility itemsets: FPHUI (mining periodic frequent HUIs) and MFPHM (efficient mining of periodic frequent HUIs). The first algorithm, the FPHUI miner, generates all periodic frequent itemsets. Mining periodic frequent high-utility itemsets incurs a high computational cost in very large databases, so we further developed another algorithm, MFPHM, to overcome this limitation. The performance of the FPHUI miner is evaluated by conducting experiments on various real datasets. Experimental results show that the proposed algorithms are efficient and effective.
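The abstract does not define periodicity, so the sketch below assumes the definition commonly used in periodic pattern mining: the periods of an itemset are the gaps between consecutive transactions that contain it, including the gap before its first occurrence and after its last one, and an itemset is periodic when its largest period does not exceed a user-given bound. The sketch checks only periodicity and support and leaves out the utility condition. The data, the thresholds `min_sup` and `max_per`, and the function names are hypothetical and are not taken from FPHUI or MFPHM.

```python
# Hypothetical toy database: each transaction is a set of items,
# and transactions are numbered 1..n in time order.
transactions = [
    {"a", "b"},
    {"a"},
    {"b", "c"},
    {"a", "b"},
    {"c"},
    {"a", "b"},
]


def periods(itemset, transactions):
    """Gaps between consecutive occurrences of `itemset`, including the gap
    from position 0 to the first occurrence and from the last one to n."""
    needed = set(itemset)
    tids = [tid for tid, t in enumerate(transactions, start=1) if needed <= t]
    boundaries = [0] + tids + [len(transactions)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


def is_periodic_frequent(itemset, transactions, min_sup, max_per):
    """True when the itemset occurs at least `min_sup` times and its
    largest period is at most `max_per`."""
    needed = set(itemset)
    support = sum(1 for t in transactions if needed <= t)
    return support >= min_sup and max(periods(itemset, transactions)) <= max_per


print(periods(("a",), transactions))       # [1, 1, 2, 2, 0]
print(periods(("a", "b"), transactions))   # [1, 3, 2, 0]
print(is_periodic_frequent(("a", "b"), transactions, min_sup=3, max_per=3))  # True
print(is_periodic_frequent(("a", "b"), transactions, min_sup=3, max_per=2))  # False
```

An itemset that never occurs has a single period equal to the database size, so it fails any `max_per` smaller than the number of transactions.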
