Efficient Utility Tree-Based Algorithm to Mine High Utility Patterns Having Strong Correlation

High Utility Itemset Mining (HUIM) is one of the most investigated tasks in data mining. It has broad applications in domains such as product recommendation, market basket analysis, e-learning, text mining, bioinformatics, and web clickstream analysis. Insights from such pattern analysis provide numerous benefits, including cost cutting, improved competitive advantage, and increased revenue. However, HUIM methods may discover misleading patterns because they do not evaluate the correlation of the extracted patterns. Consequently, a number of algorithms have been proposed to mine correlated HUIs, but these algorithms still incur a high computational cost in terms of both time and memory consumption. This paper presents an algorithm, named Efficient Correlated High Utility Pattern Mining (ECoHUPM), to efficiently mine high utility patterns whose items are strongly correlated. A new data structure based on the utility tree (UTtree), named CoUTlist, is proposed to store sufficient information for mining the desired patterns. Three pruning properties are introduced to reduce the search space and improve mining performance. Experiments on sparse, very sparse, dense, and very dense datasets indicate that the proposed ECoHUPM algorithm is more efficient than the state-of-the-art CoHUIM and CoHUI-Miner algorithms in terms of both time and memory consumption.
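
The abstract does not state which correlation measure is used. As a minimal sketch, assuming the Kulczynski (kulc) measure, which is used in some correlated HUI algorithms, and a toy database with made-up quantities and unit profits, the Python code below shows how a pattern can be high-utility yet weakly correlated. It does not reproduce the CoUTlist structure or the ECoHUPM pruning properties; all names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy transaction: item -> purchase quantity.
Transaction = Dict[str, int]


def utility(itemset: FrozenSet[str], db: List[Transaction], profit: Dict[str, int]) -> int:
    """Sum of quantity * unit profit of the itemset's items, over transactions containing all of them."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if itemset.issubset(t)
    )


def support(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def kulc(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """Kulczynski measure: average of sup(X) / sup({i}) over the items i in X."""
    sup_x = support(itemset, db)
    if sup_x == 0:
        return 0.0  # X never occurs, so it cannot be correlated
    return sum(sup_x / support(frozenset([i]), db) for i in itemset) / len(itemset)


def is_correlated_hui(itemset: FrozenSet[str], db: List[Transaction],
                      profit: Dict[str, int], min_util: int, min_kulc: float) -> bool:
    """An itemset qualifies only if it is both high-utility and strongly correlated."""
    return utility(itemset, db, profit) >= min_util and kulc(itemset, db) >= min_kulc


# Toy data (made up): {a, c} has high utility, but c is rare relative to a.
DB = [{"a": 2, "b": 1}, {"a": 1, "b": 3}, {"a": 4, "c": 1}]
PROFIT = {"a": 5, "b": 2, "c": 10}
```

With `min_util = 20` and `min_kulc = 0.75`, {a, b} (utility 23, kulc 5/6) qualifies, while {a, c} (utility 30, kulc 2/3) is rejected despite its higher utility.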

Dramatically Reducing Search for High Utility Sequential Patterns by Maintaining Candidate Lists

Information ◽

10.3390/info11010044 ◽

2020 ◽

Vol 11 (1) ◽

pp. 44

Author(s):

Scott Buffett

Keyword(s):

Upper Bound ◽

Pattern Mining ◽

Computational Cost ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Frequent Patterns ◽

Run Time ◽

High Utility

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is the need to reduce the time and space required to perform the search. The greater this reduction, the easier it becomes to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify patterns that are (1) sequential in nature and (2) of significant utility in a sequence database, by taking into account the value or importance of each item. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, this property does not hold in HUSPM. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. These candidates are provably the only items ever needed for any extension of a given sequential pattern or its descendants in the search tree. The list is then exploited to further tighten the upper bound on the utilities of descendant patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated during the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. The techniques were validated on six public datasets.
Tests show that the CRUSP algorithm significantly reduces the overall number of candidate sequential patterns that need to be considered, and consequently the run time, compared with the current state-of-the-art bounding techniques. With the CRUSPPivot algorithm, the further reduction in the size of the search space was dramatic, while the reduction in run time ranged from dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that the time required for one particularly complex dataset was reduced from many hours to less than one minute.
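
To illustrate why downward closure fails in HUSPM, the Python sketch below computes the utility of a simple sequential pattern as the maximum over all of its occurrences in a q-sequence, together with the classic sequence-weighted utility (SWU) upper bound. For brevity it assumes one item per pattern element; it does not implement the CRUSP candidate lists or their tighter bounds, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# A q-sequence: ordered list of itemsets, each mapping item -> utility in that itemset.
QSequence = List[Dict[str, int]]


def pattern_utility(pattern: List[str], seq: QSequence) -> Optional[int]:
    """Maximum utility of `pattern` (one item per element) over its occurrences in `seq`.

    best[j] holds the highest utility of matching the first j pattern items so far;
    None means that prefix has not been matched yet. Returns None if the pattern never occurs.
    """
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in seq:
        # Walk j downward so one itemset cannot match two consecutive pattern elements.
        for j in range(len(pattern), 0, -1):
            item = pattern[j - 1]
            if item in itemset and best[j - 1] is not None:
                candidate = best[j - 1] + itemset[item]
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[len(pattern)]


def db_utility(pattern: List[str], db: List[QSequence]) -> int:
    """Utility of the pattern in the database: sum of its per-sequence maximum utilities."""
    return sum(u for u in (pattern_utility(pattern, s) for s in db) if u is not None)


def swu(pattern: List[str], db: List[QSequence]) -> int:
    """Sequence-weighted utility: total utility of every sequence that contains the pattern."""
    return sum(
        sum(sum(itemset.values()) for itemset in seq)
        for seq in db
        if pattern_utility(pattern, seq) is not None
    )


# <a> has utility 4, but its extension <a, b> has utility 7: utility is not anti-monotone.
SEQ = [{"a": 2}, {"b": 5}, {"a": 4, "b": 1}]
```

SWU is anti-monotone and therefore safe for pruning, but it is loose (12 for both patterns here), which is the gap that tighter bounds such as those in CRUSP aim to close.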

Improved Strategy for High-Utility Pattern Mining Algorithm

Mathematical Problems in Engineering ◽

10.1155/2020/1971805 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Le Wang ◽

Shui Wang ◽

Haiyan Li ◽

Chunliang Zhou

Keyword(s):

Pattern Mining ◽

State Of The Art ◽

Search Space ◽

Research Topics ◽

Main Research ◽

Mining Algorithm ◽

Temporal Efficiency ◽

High Utility ◽

High Utility Patterns ◽

Mining Algorithms

High-utility pattern mining is a research hotspot in the field of pattern mining, and one of its main research topics is how to improve the efficiency of mining algorithms. Based on a study of state-of-the-art high-utility pattern mining algorithms, this paper proposes an improved strategy that removes noncandidate items from the global and local header tables as early as possible, thus reducing the search space and improving the efficiency of the algorithm. The proposed strategy is applied to the EFIM (EFficient high-utility Itemset Mining) algorithm. Experimental verification was carried out on nine typical datasets (including two large datasets); the results show that the strategy effectively improves the time efficiency of mining high-utility patterns.
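
The abstract does not detail the strategy itself. As a related, generic illustration (not EFIM's or the authors' code), the Python sketch below repeatedly removes items whose transaction-weighted utility (TWU) is below the minimum utility threshold, recomputing transaction utilities after each round. It shows why removing noncandidate items early can expose further noncandidates; the function name is hypothetical.

```python
from typing import Dict, List, Set

# Each transaction maps item -> utility of that item in the transaction (quantity * profit).
Transaction = Dict[str, int]


def promising_items(db: List[Transaction], min_util: int) -> Set[str]:
    """Repeatedly drop items whose TWU is below min_util until nothing changes.

    TWU(i) is the sum of the transaction utilities (TU) of transactions containing i.
    Removing unpromising items shrinks the TUs, so more items may become unpromising.
    """
    alive = {item for t in db for item in t}
    while True:
        twu: Dict[str, int] = {item: 0 for item in alive}
        for t in db:
            kept = [i for i in t if i in alive]
            tu = sum(t[i] for i in kept)  # revised TU ignoring already-removed items
            for i in kept:
                twu[i] += tu
        pruned = {i for i, w in twu.items() if w < min_util}
        if not pruned:
            return alive
        alive -= pruned


# Toy data (made up).
DB = [{"a": 5, "b": 1}, {"a": 5, "c": 1}, {"c": 1, "d": 10}]
```

With `min_util = 12`, the first round removes only b and d, but the revised TWUs of a (11) and c (7) then fall below the threshold, so a second round empties the candidate set.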

Dynamic maintenance model for high average-utility pattern mining with deletion operation

Applied Intelligence ◽

10.1007/s10489-021-02539-4 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Qian Teng ◽

Shahab Tayeb ◽

Jerry Chun-Wei Lin

Keyword(s):

Pattern Mining ◽

Computational Cost ◽

Practical Applications ◽

Itemset Mining ◽

Dynamic Databases ◽

Speed Up ◽

Dynamic Maintenance ◽

Average Utility ◽

High Utility ◽

Maintenance Model

High average-utility itemset mining (HAUIM) was established to provide a fairer measure than generic high-utility itemset mining (HUIM) for revealing satisfactory and interesting patterns. In practical applications, databases change dynamically as insertion and deletion operations are performed. Several works were designed to handle insertion, but fewer studies have focused on handling deletion for knowledge maintenance. In this paper, we develop the PRE-HAUI-DEL algorithm, which applies the pre-large concept to HAUIM to handle transaction deletion in dynamic databases. The pre-large concept serves as a buffer in HAUIM that reduces the number of database scans when the database is updated, particularly by transaction deletion. Two upper-bound values are also established to prune unpromising candidates early, which reduces the computational cost. The experimental results show that the designed PRE-HAUI-DEL algorithm performs well compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.
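
The abstract does not define its two upper bounds. For reference, a minimal Python sketch of the standard average-utility measure and the widely used average-utility upper bound (auub), which, unlike average utility itself, is anti-monotone. The pre-large buffering and the paper's own bounds are not shown; function names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # item -> utility of that item in this transaction


def average_utility(itemset: FrozenSet[str], db: List[Transaction]) -> float:
    """au(X) = u(X) / |X|, where u(X) sums over transactions containing all items of X."""
    total = sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))
    return total / len(itemset)


def auub(itemset: FrozenSet[str], db: List[Transaction]) -> int:
    """Sum of each containing transaction's maximal item utility.

    Within one transaction, an average of |X| item utilities never exceeds the largest
    item utility, and supersets occur in fewer transactions, so this bound is anti-monotone.
    """
    return sum(max(t.values()) for t in db if itemset.issubset(t))


# Toy data (made up).
DB = [{"a": 4, "b": 2}, {"a": 1, "c": 6}]
```

Here au({a}) = 5.0 with auub({a}) = 10, and au({a, b}) = 3.0 with auub({a, b}) = 4, so any itemset whose auub falls below the threshold can be discarded together with all of its supersets.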

Optimization of Evolutionary Algorithm Using Machine Learning Techniques for Pattern Mining in Transactional Database

Handbook of Research on Applications and Implementations of Machine Learning Techniques - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9902-9.ch010 ◽

2020 ◽

pp. 173-200

Author(s):

Logeswaran K. ◽

Suresh P. ◽

Savitha S. ◽

Prasanna Kumar K. R.

Keyword(s):

Evolutionary Algorithm ◽

Pattern Mining ◽

Fitness Function ◽

Search Space ◽

Machine Learning Techniques ◽

Dynamic Selection ◽

Learning Techniques ◽

Optimal Function ◽

High Utility ◽

Mining Algorithms

In recent years, data analysts have faced many challenges in mining high utility itemsets (HUIs) from transactional databases using existing traditional techniques. The main challenges for utility mining algorithms are the exponentially growing search space and the choice of a minimum utility threshold appropriate to the given database. To overcome these challenges, evolutionary algorithm-based techniques can be used to mine HUIs from transactional databases. However, testing each of the supporting functions in the optimization problem is very inefficient and increases the time complexity of the algorithm. To overcome this drawback, a reinforcement learning-based approach is proposed to improve the efficiency of the algorithm, so that the most appropriate fitness function for evaluation can be selected automatically during execution. Furthermore, when different functions perform well at different stages of the optimization process, the currently optimal function is selected dynamically.
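
The abstract does not specify the reinforcement learning scheme. One simple, hypothetical way to select a fitness function dynamically is an epsilon-greedy multi-armed bandit: each candidate fitness function is an arm, and the reward could be, for example, the improvement in the best utility found after a generation. The class name, parameters, and reward definition below are assumptions for illustration only.

```python
import random
from typing import List


class EpsilonGreedySelector:
    """Chooses among candidate fitness functions with an epsilon-greedy bandit rule."""

    def __init__(self, n_functions: int, epsilon: float = 0.1, seed: int = 0) -> None:
        self.counts: List[int] = [0] * n_functions
        self.means: List[float] = [0.0] * n_functions
        self.epsilon = epsilon
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def select(self) -> int:
        # Try every function once before exploiting the observed rewards.
        for k, c in enumerate(self.counts):
            if c == 0:
                return k
        # With probability epsilon, explore a random function.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Otherwise exploit the function with the best mean reward so far.
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, k: int, reward: float) -> None:
        # Incremental running mean of the rewards observed for function k.
        self.counts[k] += 1
        self.means[k] += (reward - self.means[k]) / self.counts[k]
```

With `epsilon = 0.0` and fixed rewards of 0.1, 0.9, and 0.5 for three functions, ten rounds try each function once and then choose the second function for the remaining seven rounds; a positive epsilon keeps re-checking the other functions in case their usefulness changes later in the search.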

Analytics of high average-utility patterns in the industrial internet of things

Applied Intelligence ◽

10.1007/s10489-021-02751-2 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Zhongcui Li ◽

Gautam Srivastava ◽

Unil Yun ◽

Jerry Chun-Wei Lin

Keyword(s):

High Performance ◽

Pattern Mining ◽

Search Space ◽

Research Field ◽

Industrial Internet Of Things ◽

Utility Measure ◽

Itemset Mining ◽

Industrial Internet ◽

Average Utility ◽

High Utility

Recently, revealing valuable information in a database beyond item quantities has become an essential research field. High average-utility itemset mining (HAUIM) was suggested to reveal useful patterns using the average-utility measure for pattern analytics and evaluation. HAUIM provides a fairer assessment than generic high utility itemset mining because it offsets the influence of itemset length. Several high-performance HAUIM algorithms have been proposed to extract knowledge from disorganized databases. However, most existing works do not consider uncertainty, which is a characteristic of data gathered from IoT equipment. In this work, an efficient HAUIM algorithm for handling uncertain databases in the IoT is presented. Two upper-bound values are estimated to shrink the search space early when discovering meaningful patterns, which greatly alleviates the limitations of pattern mining in the IoT. Experimental evaluations comparing the proposed approach with existing algorithms indicate that the designed approach efficiently reveals high average-utility itemsets in uncertain settings.
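
The abstract does not describe the uncertainty model. As a minimal sketch, assuming each item occurrence carries an existence probability and that items are independent (a common simplification in uncertain pattern mining, not necessarily the paper's model), the Python code below evaluates an itemset's average utility together with its expected support. Names and data are hypothetical.

```python
from math import prod
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical uncertain transaction: item -> (utility, existence probability).
UncertainTransaction = Dict[str, Tuple[int, float]]


def evaluate(itemset: FrozenSet[str], db: List[UncertainTransaction]) -> Tuple[float, float]:
    """Return (average utility, expected support) of the itemset.

    Average utility is u(X) / |X| over transactions listing all items of X.
    Expected support sums, per transaction, the probability that all items of X
    are present, computed as a product under the independence assumption.
    """
    containing = [t for t in db if itemset.issubset(t)]
    au = sum(sum(t[i][0] for i in itemset) for t in containing) / len(itemset)
    esup = sum(prod(t[i][1] for i in itemset) for t in containing)
    return au, esup


# Toy data (made up).
DB = [{"a": (4, 0.5), "b": (2, 0.8)}, {"a": (1, 1.0), "c": (6, 0.9)}]
```

An itemset would then be reported only if both values reach their thresholds; for {a, b} this gives an average utility of 3.0 and an expected support of 0.4.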

A Survey of incremental high-utility pattern mining based on storage structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202745 ◽

2021 ◽

pp. 1-26

Author(s):

Haodong Cheng ◽

Meng Han ◽

Ni Zhang ◽

Xiaojuan Li ◽

Le Wang

Keyword(s):

Pattern Mining ◽

Business Decisions ◽

Practical Applications ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets ◽

High Utility Patterns ◽

Mining Algorithms ◽

Purchase Quantity ◽

Storage Structures

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit and purchase quantity of items. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and their unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated dynamically by inserting new data. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Unlike batch processing algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets incrementally, thereby reducing the cost of finding them. This paper surveys the latest research on incremental high-utility itemset mining algorithms, including methods that store itemsets and utilities in tree, list, array, and hash set structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.
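
As a small illustration of the hash-based storage idea (not any specific surveyed algorithm), the Python sketch below keeps each item's transaction-weighted utility (TWU) in a hash map and updates it as new transactions arrive, instead of rescanning the whole database. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class IncrementalTWU:
    """Keeps per-item transaction-weighted utilities up to date as transactions are inserted."""

    def __init__(self) -> None:
        self.twu: Dict[str, int] = defaultdict(int)

    def insert(self, transaction: Dict[str, int]) -> None:
        """Add one transaction (item -> utility); only its own items are touched."""
        tu = sum(transaction.values())  # transaction utility of the new transaction
        for item in transaction:
            self.twu[item] += tu

    def promising(self, min_util: int) -> Set[str]:
        """Items whose TWU currently reaches the threshold (candidates for further mining)."""
        return {i for i, w in self.twu.items() if w >= min_util}
```

Because TWU only grows under insertion, an item that becomes promising stays promising; handling deletions or keeping the exact utilities of longer itemsets requires the richer tree, list, or array structures the survey covers.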

An Efficient Algorithm for Extracting High-Utility Hierarchical Sequential Patterns

Wireless Communications and Mobile Computing ◽

10.1155/2020/8816228 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Chunkai Zhang ◽

Zilin Du ◽

Yiwen Zu

Keyword(s):

Pattern Mining ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Second Phase ◽

Two Phase ◽

High Utility ◽

Synthetic Datasets ◽

Hierarchical Relation

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, where utility is used to measure the importance or weight of a sequence. However, HUSPM ignores the underlying informative knowledge in the hierarchical relations between items, which prevents it from extracting more interesting patterns. In this paper, we incorporate the hierarchical relations of items into HUSPM and propose a two-phase algorithm, MHUH, the first algorithm for high-utility hierarchical sequential pattern mining (HUHSPM). In the first phase, named Extension, we use FHUSpan, an existing algorithm we proposed earlier, to efficiently mine the general high-utility sequences (g-sequences); in the second phase, named Replacement, we mine from the g-sequences the special high-utility sequences with hierarchical relations (s-sequences) as high-utility hierarchical sequential patterns. To further improve efficiency, MHUH adopts several strategies, such as Reduction, FGS, and PBS, and a novel upper bound, TSWU, which greatly reduce the search space. Extensive experiments were conducted on both real and synthetic datasets to assess the performance of MHUH in terms of runtime, number of patterns, and scalability. The experiments show that MHUH efficiently extracts more interesting patterns with underlying informative knowledge in HUHSPM.

RHUPS

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3430767 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1-27

Author(s):

Yoonji Baek ◽

Unil Yun ◽

Heonho Kim ◽

Hyoju Nam ◽

Hyunsoo Kim ◽

...

Keyword(s):

Large Scale ◽

Pattern Mining ◽

Traffic Measurement ◽

Knowledge Based ◽

Efficient Management ◽

Data Market ◽

High Utility ◽

Intelligent Information ◽

Synthetic Datasets ◽

High Utility Patterns

Real-world databases have various characteristics. New data is continuously inserted over time without any limit on the length of the database, and the database contains a variety of information about its constituent items. Recently generated data has a greater influence than previously generated data. Such databases are called time-sensitive non-binary stream databases; examples include web-server click data, market sales data, sensor network data, and network traffic measurements. Many high utility pattern mining and stream pattern mining methods have been proposed so far. However, they are not suitable for analyzing these databases, because they find valid patterns by considering only some of the features described above. Therefore, knowledge-based software that efficiently finds meaningful information in databases with these characteristics is required. In this article, we propose an intelligent information system that calculates the influence of the insertion time of each batch in a large-scale stream database by applying the sliding window model, and mines recent high utility patterns without generating candidate patterns. In addition, a novel list-based data structure is suggested for fast and efficient management of time-sensitive stream databases. Moreover, our technique is compared with state-of-the-art algorithms through various experiments on real and synthetic datasets. The experimental results show that our approach outperforms previously proposed methods in terms of runtime, memory usage, and scalability.
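
The abstract does not give the exact formula for the influence of insertion time. A common way to model it, assumed here purely for illustration, is a per-batch decay factor applied inside a fixed-size sliding window. The class below, with hypothetical parameters `window` and `decay`, computes such a time-weighted utility per item; it does not implement the paper's list-based structure.

```python
from collections import deque
from typing import Deque, Dict


class DecayedSlidingWindow:
    """Keeps the last `window` batches; older batches contribute less via a decay factor."""

    def __init__(self, window: int, decay: float) -> None:
        # A deque with maxlen discards the oldest batch automatically when full.
        self.batches: Deque[Dict[str, int]] = deque(maxlen=window)
        self.decay = decay

    def add_batch(self, item_utilities: Dict[str, int]) -> None:
        """Append a batch summarized as item -> total utility within that batch."""
        self.batches.append(item_utilities)

    def utility(self, item: str) -> float:
        """Time-weighted utility of an item over the batches currently in the window."""
        # The newest batch has age 0 (weight 1); a batch of age k is weighted by decay ** k.
        newest = len(self.batches) - 1
        return sum(
            self.decay ** (newest - idx) * batch.get(item, 0)
            for idx, batch in enumerate(self.batches)
        )
```

For example, with `window = 2` and `decay = 0.5`, adding batches {a: 4}, {a: 2}, and {a: 8} evicts the first batch and gives a utility of 0.5 * 2 + 8 = 9.0 for a.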

On-Shelf Utility Mining of Sequence Data

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3457570 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-31

Author(s):

Chunkai Zhang ◽

Zilin Du ◽

Yuting Yang ◽

Wensheng Gan ◽

Philip S. Yu

Keyword(s):

High Efficiency ◽

Sequence Data ◽

Real Life ◽

Search Space ◽

Upper Bounds ◽

Utility Mining ◽

Limited Memory ◽

Time Periods ◽

High Utility ◽

Synthetic Datasets

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf times, as these have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculations, using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Substantial experimental results on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.
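
To show how on-shelf utility removes the bias toward long-shelved items, here is a minimal Python sketch of relative on-shelf utility in the simpler itemset setting. This is an assumption made for brevity: OSUMS and OSUMS+ work on sequence data and use the TPEU and TRSU bounds, which are not shown. A pattern's utility is compared only against the total utility of the periods in which all of its items were on shelf; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Transaction = Dict[str, int]  # item -> utility


def relative_on_shelf_utility(
    itemset: FrozenSet[str],
    periods: Dict[int, List[Transaction]],
    on_shelf: Dict[str, Set[int]],
) -> float:
    """Utility of X in the periods where all its items are on shelf,
    divided by the total utility of those same periods."""
    common = set(periods)
    for item in itemset:
        common &= on_shelf[item]  # keep only periods where this item is on shelf
    num = 0
    den = 0
    for p in common:
        for t in periods[p]:
            den += sum(t.values())  # total utility of the shared on-shelf periods
            if itemset.issubset(t):
                num += sum(t[i] for i in itemset)
    return num / den if den else 0.0


# Toy data (made up): a is sold only in period 1, b in both periods, c only in period 2.
PERIODS = {1: [{"a": 3, "b": 1}], 2: [{"b": 2, "c": 4}]}
ON_SHELF = {"a": {1}, "b": {1, 2}, "c": {2}}
```

Item a reaches a relative utility of 0.75 because it is judged only against period 1, whereas b, though on shelf longer, gets 0.3; a plain utility comparison over the whole database would not account for a's shorter shelf time.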

A Survey of Correlated High Utility Pattern Mining

IEEE Access ◽

10.1109/access.2021.3065393 ◽

2021 ◽

pp. 1-1

Author(s):

Rashad S. Almoqbily ◽

Azhar Rauf ◽

Fahmi H. Quradaa

Keyword(s):

Pattern Mining ◽

High Utility
