Dynamic maintenance model for high average-utility pattern mining with deletion operation

Applied Intelligence ◽

10.1007/s10489-021-02539-4 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Qian Teng ◽

Shahab Tayeb ◽

Jerry Chun-Wei Lin

Keyword(s):

Pattern Mining ◽

Computational Cost ◽

Practical Applications ◽

Itemset Mining ◽

Dynamic Databases ◽

Speed Up ◽

Dynamic Maintenance ◽

Average Utility ◽

High Utility ◽

Maintenance Model

AbstractThe high average-utility itemset mining (HAUIM) was established to provide a fair measure instead of genetic high-utility itemset mining (HUIM) for revealing the satisfied and interesting patterns. In practical applications, the database is dynamically changed when insertion/deletion operations are performed on databases. Several works were designed to handle the insertion process but fewer studies focused on processing the deletion process for knowledge maintenance. In this paper, we then develop a PRE-HAUI-DEL algorithm that utilizes the pre-large concept on HAUIM for handling transaction deletion in the dynamic databases. The pre-large concept is served as the buffer on HAUIM that reduces the number of database scans while the database is updated particularly in transaction deletion. Two upper-bound values are also established here to reduce the unpromising candidates early which can speed up the computational cost. From the experimental results, the designed PRE-HAUI-DEL algorithm is well performed compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.

Incrementally updating the high average-utility patterns with pre-large concept

Applied Intelligence ◽

10.1007/s10489-020-01743-y ◽

2020 ◽

Vol 50 (11) ◽

pp. 3788-3807

Author(s):

Jerry Chun-Wei Lin ◽

Matin Pirouz ◽

Youcef Djenouri ◽

Chien-Fu Cheng ◽

Usman Ahmed

Keyword(s):

State Of The Art ◽

The State ◽

Batch Mode ◽

Itemset Mining ◽

The Past ◽

Dynamic Databases ◽

Speed Up ◽

Average Utility ◽

High Utility ◽

High Utility Patterns

Abstract High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.

Analytics of high average-utility patterns in the industrial internet of things

Applied Intelligence ◽

10.1007/s10489-021-02751-2 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Zhongcui Li ◽

Gautam Srivastava ◽

Unil Yun ◽

Jerry Chun-Wei Lin

Keyword(s):

High Performance ◽

Pattern Mining ◽

Search Space ◽

Research Field ◽

Industrial Internet Of Things ◽

Utility Measure ◽

Itemset Mining ◽

Industrial Internet ◽

Average Utility ◽

High Utility

AbstractRecently, revealing more valuable information except for quantity value for a database is an essential research field. High utility itemset mining (HAUIM) was suggested to reveal useful patterns by average-utility measure for pattern analytics and evaluations. HAUIM provides a more fair assessment than generic high utility itemset mining and ignores the influence of the length of itemsets. There are several high-performance HAUIM algorithms proposed to gain knowledge from a disorganized database. However, most existing works do not concern the uncertainty factor, which is one of the characteristics of data gathered from IoT equipment. In this work, an efficient algorithm for HAUIM to handle the uncertainty databases in IoTs is presented. Two upper-bound values are estimated to early diminish the search space for discovering meaningful patterns that greatly solve the limitations of pattern mining in IoTs. Experimental results showed several evaluations of the proposed approach compared to the existing algorithms, and the results are acceptable to state that the designed approach efficiently reveals high average utility itemsets from an uncertain situation.

A Survey of incremental high-utility pattern mining based on storage structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202745 ◽

2021 ◽

pp. 1-26

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Xiaojuan Li ◽

Le Wang

Keyword(s):

Pattern Mining ◽

Business Decisions ◽

Practical Applications ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

High Utility Patterns ◽

Mining Algorithms ◽

Purchase Quantity ◽

Storage Structures

Traditional association rule mining has been widely studied, but this is not applicable to practical applications that must consider factors such as the unit profit of the item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated by inserting new data dynamically. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Different from the batch processing algorithms that always process the databases from scratch, the incremental HUIM algorithms update and output high-utility itemsets in an incremental manner, thereby reducing the cost of finding high-utility itemsets. This paper provides the latest research on incremental high-utility itemset mining algorithms, including methods of storing itemsets and utilities based on tree, list, array and hash set storage structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.

Incremental high utility pattern mining with static and dynamic databases

Applied Intelligence ◽

10.1007/s10489-014-0601-6 ◽

2014 ◽

Vol 42 (2) ◽

pp. 323-352 ◽

Cited By ~ 54

Author(s):

Unil Yun ◽

Heungmo Ryang

Keyword(s):

Pattern Mining ◽

Dynamic Databases ◽

High Utility

Efficient Utility Tree-Based Algorithm to Mine High Utility Patterns Having Strong Correlation

Complexity ◽

10.1155/2021/7310137 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Rashad Saeed ◽

Azhar Rauf ◽

Fahmi H. Quradaa ◽

Syed Muhammad Asim

Keyword(s):

Strong Correlation ◽

Pattern Mining ◽

Computational Cost ◽

Search Space ◽

Sufficient Information ◽

Memory Consumption ◽

Cost Cutting ◽

E Learning ◽

High Utility ◽

High Utility Patterns

High Utility Itemset Mining (HUIM) is one of the most investigated tasks of data mining. It has broad applications in domains such as product recommendation, market basket analysis, e-learning, text mining, bioinformatics, and web click stream analysis. Insights from such pattern analysis provide numerous benefits, including cost cutting, improved competitive advantage, and increased revenue. However, HUIM methods may discover misleading patterns as they do not evaluate the correlation of extracted patterns. As a consequence, a number of algorithms have been proposed to mine correlated HUIs. These algorithms still suffer from the issue of the computational cost in terms of both time and memory consumption. This paper presents an algorithm, named Efficient Correlated High Utility Pattern Mining (ECoHUPM), to efficiently mine the high utility patterns having strong correlation items. A new data structure based on utility tree (UTtree) named CoUTlist is proposed to store sufficient information for mining the desired patterns. Three pruning properties are introduced to reduce the search space and improve the mining performance. Experiments on sparse, very sparse, dense, and very dense datasets indicate that the proposed ECoHUPM algorithm is efficient as compared to the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both time and memory consumption.

A Systematic Survey on High Utility Itemset Mining

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019300027 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1113-1185 ◽

Cited By ~ 2

Author(s):

Bahareh Rahmati ◽

Mohammad Karim Sohrabi

Keyword(s):

Data Structures ◽

Search Space ◽

Frequent Itemset ◽

Itemset Mining ◽

Efficient Data ◽

Average Utility ◽

High Utility ◽

High Utility Itemsets ◽

Downward Closure ◽

Efficient Data Structures

High utility itemset mining considers unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. Downward closure property, which causes significant pruning in frequent itemset mining, is not established in the utility of itemsets and so the mining problem will require alternative solutions to reduce its search space and to enhance its efficiency. Using an anti-monotonic upper bound of the utility function and exploiting efficient data structures for storing and compacting the dataset to perform efficient pruning strategies are the main solutions to address high utility itemset mining problem. Different mining methods and techniques have attempted to improve performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to represent a comprehensive systematic review for high utility itemset mining techniques and to classify them based on their problem-solving approaches.

Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3487046 ◽

2022 ◽

Vol 16 (3) ◽

pp. 1-26

Author(s):

Jerry Chun-Wei Lin ◽

Youcef Djenouri ◽

Gautam Srivastava ◽

Yuanfa Li ◽

Philip S. Yu

Keyword(s):

Large Scale ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Main Memory ◽

Frequent Itemset ◽

Sequential Pattern ◽

Sequential Patterns ◽

Speed Up ◽

Mapreduce Model ◽

High Utility

High-utility sequential pattern mining (HUSPM) is a hot research topic in recent decades since it combines both sequential and utility properties to reveal more information and knowledge rather than the traditional frequent itemset mining or sequential pattern mining. Several works of HUSPM have been presented but most of them are based on main memory to speed up mining performance. However, this assumption is not realistic and not suitable in large-scale environments since in real industry, the size of the collected data is very huge and it is impossible to fit the data into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns based on large-scale databases. Two properties are then developed to hold the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures called sidset and utility-linked list are utilized in the developed framework to accelerate the computation for mining the required patterns. From the results, we can observe that the designed model has good performance in large-scale datasets in terms of runtime, memory, efficiency of the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.

The Efficient Mining of Skyline Patterns from a Volunteer Computing Network

ACM Transactions on Internet Technology ◽

10.1145/3423557 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1-20

Author(s):

Jimmy Ming-Tai Wu ◽

Qian Teng ◽

Gautam Srivastava ◽

Matin Pirouz ◽

Jerry Chun-Wei Lin

Keyword(s):

Upper Bound ◽

Pattern Mining ◽

Frequent Itemset ◽

Volunteer Computing ◽

Decision Making Process ◽

Itemset Mining ◽

The Past ◽

Frequency Constraint ◽

New Knowledge ◽

High Utility

In the ever-growing world, the concepts of High-utility Itemset Mining (HUIM) as well as Frequent Itemset Mining (FIM) are fundamental works in knowledge discovery. Several algorithms have been designed successfully. However, these algorithms only used one factor to estimate an itemset. In the past, skyline pattern mining by considering both aspects of frequency and utility has been extensively discussed. In most cases, however, people tend to focus on purchase quantities of itemsets rather than frequencies. In this article, we propose a new knowledge called skyline quantity-utility pattern (SQUP) to provide better estimations in the decision-making process by considering quantity and utility together. Two algorithms, respectively, called SQU-Miner and SKYQUP are presented to efficiently mine the set of SQUPs. Moreover, the usage of volunteer computing is proposed to show the potential in real supermarket applications. Two new efficient utility-max structures are also mentioned for the reduction of the candidate itemsets, respectively, utilized in SQU-Miner and SKYQUP. These two new utility-max structures are used to store the upper-bound of utility for itemsets under the quantity constraint instead of frequency constraint, and the second proposed utility-max structure moreover applies a recursive updated process to further obtain strict upper-bound of utility. Our in-depth experimental results prove that SKYQUP has stronger performance when a comparison is made to SQU-Miner in terms of memory usage, runtime, and the number of candidates.

MINING OF HIGH-UTILITY ITEMSETS WITH NEGATIVE UTILITY

JOURNAL OF TECHNOLOGY & INNOVATION ◽

10.26480/jtin.02.2021.44.47 ◽

2020 ◽

Vol 1 (2) ◽

pp. 44-47

Author(s):

Tung N.T ◽

Nguyen Le Van ◽

Trinh Cong Nhut ◽

Tran Van Sang

Keyword(s):

State Of The Art ◽

Upper Bounds ◽

Itemset Mining ◽

Novel Structure ◽

Transactional Databases ◽

Speed Up ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets ◽

Mining Algorithms

The goal of the high-utility itemset mining task is to discover combinations of items that yield high profits from transactional databases. HUIM is a useful tool for retail stores to analyze customer behaviors. However, in the real world, items are found with both positive and negative utility values. To address this issue, we propose an algorithm named Modified Efficient High‐utility Itemsets mining with Negative utility (MEHIN) to find all HUIs with negative utility. This algorithm is an improved version of the EHIN algorithm. MEHIN utilizes 2 new upper bounds for pruning, named revised subtree and revised local utility. To reduce dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. An array‐based utility‐counting technique is also utilized to calculate upper‐bound efficiently. The MEHIN employs a novel structure called P-set to reduce the number of transaction scans and to speed up the mining process. Experimental results show that the proposed algorithms considerably outperform the state-of-the-art HUI-mining algorithms on negative utility in retail databases in terms of runtime.

Dramatically Reducing Search for High Utility Sequential Patterns by Maintaining Candidate Lists

Information ◽

10.3390/info11010044 ◽

2020 ◽

Vol 11 (1) ◽

pp. 44

Author(s):

Scott Buffett

Keyword(s):

Upper Bound ◽

Pattern Mining ◽

Computational Cost ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Frequent Patterns ◽

Run Time ◽

High Utility

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is centered on the necessity to reduce the time and space required to perform the search. The extent of this reduction proportionally facilitates the ability to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify frequent patterns that are (1) sequential in nature and (2) hold a significant magnitude of utility in a sequence database, by considering the aspect of item value or importance. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, with HUSPM, this property does not hold. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. Such candidates are provably the only items that are ever needed for any extension of a given sequential pattern or its descendants in the search tree. This list is then exploited to significantly further tighten the upper bound on the utilities of descendent patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated in the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. Validation of the techniques was conducted on six public datasets. Tests show that use of the CRUSP algorithm results in a significant reduction in the overall number of candidate sequential patterns that need to be considered, and subsequently a significant reduction in run time, when compared to the current state of the art in bounding techniques. When employing the CRUSPPivot algorithm, the further reduction in the size of the search space was found to be dramatic, with the reduction in run time found to be dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that time required for one particularly complex dataset was reduced from many hours to less than one minute.