Mining High Utility Itemsets with Hill Climbing and Simulated Annealing

2022 ◽  
Vol 13 (1) ◽  
pp. 1-22
Author(s):  
M. Saqib Nawaz ◽  
Philippe Fournier-Viger ◽  
Unil Yun ◽  
Youxi Wu ◽  
Wei Song

High utility itemset mining (HUIM) is the task of finding all sets of items that, when purchased together, generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space as the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
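The bitmap idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy database, the function names `build_bitmap` and `itemset_utility`, and the use of Python integers as bit vectors are all assumptions made for illustration.

```python
# Toy transaction database (hypothetical values): each transaction maps
# an item to its utility in that transaction (quantity * unit profit).
transactions = [
    {"a": 5, "b": 2},
    {"a": 10, "c": 1},
    {"a": 5, "b": 4, "c": 3},
    {"b": 6},
]


def build_bitmap(db):
    """Map each item to an integer whose bit t is set when transaction t contains it."""
    bitmap = {}
    for tid, tx in enumerate(db):
        for item in tx:
            bitmap[item] = bitmap.get(item, 0) | (1 << tid)
    return bitmap


def itemset_utility(itemset, db, bitmap):
    """Sum the utilities of `itemset` over the transactions that contain all of its items."""
    cover = (1 << len(db)) - 1  # start with every transaction selected
    for item in itemset:
        cover &= bitmap.get(item, 0)  # bitwise AND keeps only common transactions
    total = 0
    tid = 0
    while cover:
        if cover & 1:
            total += sum(db[tid][item] for item in itemset)
        cover >>= 1
        tid += 1
    return total


bitmap = build_bitmap(transactions)
print(itemset_utility({"a", "b"}, transactions, bitmap))
```

An itemset whose bitwise AND is zero appears in no transaction, so a search procedure can discard it without touching the utility values at all, which is one way a bitmap can support pruning.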

Author(s):  
Tiantian Xu ◽  
Xiangjun Dong ◽  
Jianliang Xu ◽  
Xue Dong

High utility sequential patterns (HUSP) refer to those sequential patterns with high utility (such as profit), which play a crucial role in many real-life applications. Relevant studies of HUSP only consider positive values of sequence utility. In some applications, however, a sequence consists of items with negative values (NIV). For example, a supermarket sells a cartridge with negative profit in a package with a printer at higher positive return. Although a few methods have been proposed to mine high utility itemsets (HUI) with NIV, they are not suitable for mining HUSP with NIV because an item may occur more than once in a sequence and its utility may have multiple values. In this paper, we propose a novel method, High Utility Sequential Patterns with Negative Item Values (HUSP-NIV), to efficiently mine HUSP with NIV from sequential utility-based databases. HUSP-NIV works as follows: (1) using the lexicographic quantitative sequence tree (LQS-tree) to extract the complete set of high utility sequences and using I-Concatenation and S-Concatenation mechanisms to generate newly concatenated sequences; (2) using three pruning methods to reduce the search space in the LQS-tree; (3) traversing the LQS-tree and outputting all the high utility sequential patterns. To the best of our knowledge, HUSP-NIV is the first method to mine HUSP with NIV, and it is shown to be efficient on both synthetic and real datasets.


Author(s):  
Tiantian Xu ◽  
Jianliang Xu ◽  
Xiangjun Dong

High utility sequential patterns (HUSP) mining has recently received a lot of attention from researchers. Many algorithms have been proposed to mine HUSP and most of them only use a single minimum utility, which implicitly assumes that all items in the database are of the same importance (such as profit), or other information based on users’ concern in the database. This is often not the case in real-life applications. Although a few methods have been proposed to mine high utility itemsets (HUI) with multiple minimum utility (MMU), they are not suitable for mining HUSP with MMU because an item may occur more than one time in a sequence and may have multiple utility values. In this paper, we propose a novel method, called HUSpan-MMU, to efficiently mine HUSP with MMU from sequential utility-based databases. A lexicographic quantitative sequence tree (LQS-tree) is used to extract the complete set of HUSP. Meanwhile, two pruning methods are used to reduce the search space in the LQS-tree. Experimental results on both synthetic and real datasets show that HUSpan-MMU can efficiently mine HUSP with MMU from utility-based databases.


2021 ◽  
Vol 16 (2) ◽  
pp. 1-31
Author(s):  
Chunkai Zhang ◽  
Zilin Du ◽  
Yuting Yang ◽  
Wensheng Gan ◽  
Philip S. Yu

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods have a bias toward items that have longer on-shelf time as they have a greater chance to generate a high utility. To eliminate the bias, the problem of on-shelf utility mining (OSUM) is introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed for facilitating the calculation of upper bounds and utilities. Substantial experimental results on certain real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.


Mathematics ◽  
2021 ◽  
Vol 9 (23) ◽  
pp. 3011
Author(s):  
Drishti Yadav

This paper introduces a novel population-based bio-inspired meta-heuristic optimization algorithm, called Blood Coagulation Algorithm (BCA). BCA derives inspiration from the process of blood coagulation in the human body. The underlying concepts and ideas behind the proposed algorithm are the cooperative behavior of thrombocytes and their intelligent strategy of clot formation. These behaviors are modeled and utilized to underscore intensification and diversification in a given search space. A comparison with various state-of-the-art meta-heuristic algorithms over a test suite of 23 renowned benchmark functions reflects the efficiency of BCA. An extensive investigation is conducted to analyze the performance, convergence behavior and computational complexity of BCA. The comparative study and statistical test analysis demonstrate that BCA offers very competitive and statistically significant results compared to other eminent meta-heuristic algorithms. Experimental results also show the consistent performance of BCA in high dimensional search spaces. Furthermore, we demonstrate the applicability of BCA on real-world applications by solving several real-life engineering problems.


2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, the setting of the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, this method can only mine itemsets with positive items, and the problem of missing itemsets occurs when mining itemsets with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. In order to solve the problem of multiple scans of the database, it uses transaction merging and dataset projection technology. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
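The general idea of automatically raising the minimum utility threshold in top-k mining can be sketched with a bounded min-heap. This is a generic illustration, not THN's actual strategy: the function name `top_k_itemsets`, the starting threshold of 0, and the candidate list are assumptions, and the sketch ignores how candidates are generated.

```python
import heapq


def top_k_itemsets(candidates, k):
    """Keep the k highest-utility itemsets while raising an internal threshold.

    `candidates` is an iterable of (itemset, utility) pairs. The threshold
    starts at 0 (an assumption) and rises to the k-th best utility seen so far.
    """
    heap = []      # min-heap of (utility, sorted item tuple)
    min_util = 0   # internal minimum utility threshold
    for itemset, utility in candidates:
        if utility <= min_util:
            continue  # cannot enter the current top-k
        heapq.heappush(heap, (utility, tuple(sorted(itemset))))
        if len(heap) > k:
            heapq.heappop(heap)   # drop the weakest itemset
        if len(heap) == k:
            min_util = heap[0][0]  # raise the threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util


# Hypothetical candidates with precomputed utilities.
candidates = [
    ({"a"}, 20),
    ({"b"}, 12),
    ({"a", "b"}, 16),
    ({"c"}, 4),
    ({"a", "c"}, 19),
]
top, threshold = top_k_itemsets(candidates, k=3)
print(top, threshold)
```

The sooner the threshold rises, the more candidates can be discarded early, which is why threshold-raising strategies matter for runtime in top-k mining.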


2019 ◽  
Vol 18 (04) ◽  
pp. 1113-1185 ◽  
Author(s):  
Bahareh Rahmati ◽  
Mohammad Karim Sohrabi

High utility itemset mining considers unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and to enhance its efficiency. Using an anti-monotonic upper bound of the utility function and exploiting efficient data structures for storing and compacting the dataset to perform efficient pruning strategies are the main solutions to the high utility itemset mining problem. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to present a comprehensive systematic review of high utility itemset mining techniques and to classify them based on their problem-solving approaches.
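A small sketch may help show why utility lacks downward closure and how an anti-monotonic upper bound restores pruning. It uses the transaction-weighted utilization (TWU), a widely used bound of this kind, on a made-up database; the abstract does not say which bounds the review covers, so the choice of TWU, the data, and the function names are assumptions. All utilities are assumed to be positive.

```python
# Hypothetical database: item -> utility in that transaction (all positive).
db = [
    {"a": 5, "b": 2},
    {"a": 10, "c": 1},
    {"a": 5, "b": 4, "c": 3},
    {"b": 6},
]


def contains(tx, itemset):
    """True when the transaction holds every item of the itemset."""
    return all(item in tx for item in itemset)


def utility(itemset, db):
    """Exact utility: sum of the itemset's item utilities in supporting transactions."""
    return sum(sum(tx[i] for i in itemset) for tx in db if contains(tx, itemset))


def twu(itemset, db):
    """Transaction-weighted utilization: sum of whole-transaction utilities
    over supporting transactions. It never underestimates the utility and
    never grows when the itemset is extended."""
    return sum(sum(tx.values()) for tx in db if contains(tx, itemset))


# Utility is neither monotonic nor anti-monotonic:
print(utility({"a"}, db), utility({"a", "b"}, db))  # extension lowers utility
print(utility({"c"}, db), utility({"a", "c"}, db))  # extension raises utility

# TWU is an anti-monotonic upper bound:
print(twu({"a"}, db), twu({"a", "b"}, db))
```

If the TWU of an itemset falls below the minimum utility threshold, none of its supersets can be a high utility itemset, so the whole branch can be pruned.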


2020 ◽  
Vol 9 (4) ◽  
pp. 857
Author(s):  
Jacob Hopkins ◽  
Forrest Joy ◽  
Alaa Sheta ◽  
Hamza Turabieh ◽  
Dulal Kar

The main objective of unmanned aerial vehicle (UAV) path planning is to generate a flight path that links a start point to an endpoint in an indoor space while avoiding obstacles. Path planning is essential for many real-life applications such as autonomous cars, surveillance missions, farming robots, UAV package delivery, space exploration, and many others. To create an optimal path, we need to adopt a specific criterion, such as the Euclidean distance, to minimize the distance the UAV must travel. In this paper, we provide our initial idea of creating an optimal path for an indoor UAV using both the A* and the Late Acceptance Hill Climbing (LAHC) algorithms. We adopt indoor search environments of varying complexity and use the Probabilistic Roadmap algorithm (PRM) to build the search space for both algorithms. The basic idea behind PRM is to generate random sample points in the space and search these points for an optimal path. The results show that the LAHC algorithm outperforms the A* algorithm.
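The acceptance rule that distinguishes LAHC from plain hill climbing can be sketched as follows. This is a toy illustration rather than the authors' setup: there is no PRM and no obstacle handling, and the waypoints, the swap move, the history length, and the function names are all assumptions. A candidate is accepted if it is no worse than the current solution or no worse than the cost recorded `history_length` steps earlier.

```python
import math
import random


def lahc(initial, cost, neighbor, history_length=20, iterations=5000, seed=0):
    """Late Acceptance Hill Climbing for minimization; returns the best solution seen."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_length  # costs from earlier iterations
    for step in range(iterations):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        slot = step % history_length
        # Late acceptance: compare with the current cost and with an older cost.
        if candidate_cost <= current_cost or candidate_cost <= history[slot]:
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        history[slot] = current_cost
    return best, best_cost


def path_length(path):
    """Total Euclidean length of a path given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def swap_two_waypoints(path, rng):
    """Swap two intermediate waypoints; the start and end points stay fixed."""
    i, j = rng.sample(range(1, len(path) - 1), 2)
    new_path = list(path)
    new_path[i], new_path[j] = new_path[j], new_path[i]
    return new_path


# Hypothetical waypoints visited in a poor order.
start_path = [(0, 0), (4, 4), (1, 1), (3, 3), (2, 2), (5, 5)]
best_path, best_len = lahc(start_path, path_length, swap_two_waypoints)
print(best_path, round(best_len, 3))
```

Comparing against an older cost lets the search accept some worsening moves early on, which can help it escape local optima that would trap a greedy hill climber.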


2019 ◽  
Vol 15 (1) ◽  
pp. 58-79 ◽  
Author(s):  
P. Lalitha Kumari ◽  
S. G. Sanjeevi ◽  
T.V. Madhusudhana Rao

Mining high-utility itemsets is an important task in the area of data mining. It involves an exponential search space and returns a very large number of high-utility itemsets. In a real-time scenario, it is often sufficient to mine a small number of high-utility itemsets based on user-specified interestingness. Recently, the temporal regularity of an itemset has come to be considered an important interestingness criterion for many applications. Methods for finding regular high utility itemsets suffer from the difficulty of setting the threshold value. To address this problem, a novel algorithm called TKRHU (Top k Regular High Utility Itemset) Miner is proposed to mine the top-k high utility itemsets that appear regularly, where k represents the desired number of regular high utility itemsets. A novel list structure, RUL, and efficient pruning techniques are developed to discover the top-k regular itemsets with high profit and to reduce the search space. Experimental results show that the proposed algorithm, using the novel list structure, achieves high efficiency in terms of runtime and space.
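One common way to quantify temporal regularity is the largest gap between consecutive occurrences of an itemset. The sketch below assumes that definition, with transactions numbered from 1 and the database boundaries counted as gaps; the abstract does not state the exact measure TKRHU uses, so the definition, the data, and the function name `max_period` are assumptions.

```python
def max_period(itemset, db):
    """Largest gap between consecutive transactions that contain the itemset.

    Transactions are numbered 1..n. The gap before the first occurrence
    and the gap after the last occurrence are included.
    """
    previous = 0
    longest = 0
    for tid, tx in enumerate(db, start=1):
        if all(item in tx for item in itemset):
            longest = max(longest, tid - previous)
            previous = tid
    return max(longest, len(db) - previous)


# Hypothetical database; only item membership matters for regularity.
db = [
    {"a", "b"},
    {"a", "c"},
    {"a", "b", "c"},
    {"b"},
]

for itemset in ({"a"}, {"b"}, {"b", "c"}):
    print(sorted(itemset), max_period(itemset, db))
```

Under this definition an itemset is regular when its maximum period does not exceed a user-given bound, so a smaller value means the itemset recurs more steadily.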


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Ju Wang ◽  
Fuxian Liu ◽  
Chunjie Jin

High utility itemset (HUI) mining has recently been a hot topic; it can be used to mine profitable itemsets by considering both the quantity and profit factors. Up to now, HUI mining over uncertain datasets and over data streams has been studied separately. However, to the best of our knowledge, the issue of HUI mining over uncertain data streams has seldom been studied. In this paper, the PHUIMUS (potential high utility itemsets mining over uncertain data stream) algorithm is proposed to mine potential high utility itemsets (PHUIs), which are itemsets with high utilities and high existential probabilities, over uncertain data streams based on sliding windows. To realize the algorithm, a potential utility list over uncertain data stream (PUS-list) is designed to mine PHUIs without rescanning the analyzed uncertain data stream. A transaction weighted probability and utility tree over uncertain data stream (TWPUS-tree) is also designed to decrease the number of candidate itemsets generated by the PHUIMUS algorithm. Substantial experiments are conducted in terms of run-time, number of discovered PHUIs, memory consumption, and scalability on real-life and synthetic databases. The results show that our proposed algorithm is reasonable and acceptable for mining meaningful PHUIs from uncertain data streams.


2021 ◽  
pp. 1-22
Author(s):  
Haodong Cheng ◽  
Meng Han ◽  
Ni Zhang ◽  
Le Wang ◽  
Xiaojuan Li

Researchers have proposed the concept of Top-K high-utility itemset mining over data streams, in which users directly specify the number K of high-utility itemsets they wish to obtain, with no need to set a minimum utility threshold. Current Top-K high-utility itemset mining algorithms over data streams have several problems, including the complex construction process of the storage structure, the inefficiency of threshold raising strategies and utility pruning strategies, and the large scale of the search space, so they still cannot meet the requirement of real-time processing over data streams under limited time and memory. To solve these problems, this paper proposes an efficient algorithm based on dataset projection for mining Top-K high-utility itemsets from a data stream. A data structure, CIUDataListSW, is also proposed, which stores the position of the item in the transaction to effectively obtain the initial projected dataset of the item. In order to improve the projection efficiency, this paper introduces a new reorganization technique for projected transactions in common batches to maintain the sort order of transactions in the process of dataset projection. A dual pruning strategy and a transaction merging mechanism are also used to further reduce the search space and dataset scanning costs. In addition, based on the proposed CUDHSW structure, an efficient threshold raising strategy, CUD, is used, and a new threshold raising strategy, CUDCB, is designed to further shorten the mining time. Experimental results show that the algorithm has great advantages in running time and memory consumption, and it is especially suitable for mining high-utility itemsets from dense datasets.

