One scan based high average-utility pattern mining in static and dynamic databases

Abstract: High average-utility itemset mining (HAUIM) was established to provide a fairer measure than generic high-utility itemset mining (HUIM) for revealing satisfactory and interesting patterns. In practical applications, a database changes dynamically as insertion and deletion operations are performed on it. Several works have been designed to handle insertion, but fewer studies have focused on handling deletion for knowledge maintenance. In this paper, we develop the PRE-HAUI-DEL algorithm, which applies the pre-large concept to HAUIM to handle transaction deletion in dynamic databases. The pre-large concept serves as a buffer in HAUIM that reduces the number of database scans when the database is updated, particularly by transaction deletion. Two upper-bound values are also established to prune unpromising candidates early, which reduces the computational cost. In the experimental results, the designed PRE-HAUI-DEL algorithm performs well compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.
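The abstract does not name its two upper bounds. As a rough illustration of how an upper bound prunes candidates in HAUIM, the sketch below uses the classic average-utility upper bound (auub) from the HAUIM literature, which sums the largest item utility of each transaction that contains the itemset. This is an assumption made for illustration and is not necessarily one of the PRE-HAUI-DEL bounds; the toy database, the threshold, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> utility
# (quantity * unit profit). The numbers are invented for illustration.
TOY_DB: List[Dict[str, int]] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def auub(itemset: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Sum the maximum item utility of every transaction containing itemset."""
    items = set(itemset)
    return sum(max(t.values()) for t in db if items.issubset(t))


def can_prune(itemset: Iterable[str], db: List[Dict[str, int]],
              min_avg_utility: float) -> bool:
    """True if neither the itemset nor any superset can reach the threshold.

    This holds because the average utility of an itemset in a transaction
    never exceeds that transaction's maximum item utility, and a superset
    appears in a subset of the transactions.
    """
    return auub(itemset, db) < min_avg_utility


if __name__ == "__main__":
    for candidate in (["c"], ["b", "c"], ["d"]):
        print(candidate, auub(candidate, TOY_DB),
              can_prune(candidate, TOY_DB, 14))
```

With an assumed threshold of 14, the candidate {b, c} has an auub of 13, so it and all of its supersets can be skipped without computing their exact average utility.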

Incremental high utility pattern mining with static and dynamic databases

Applied Intelligence ◽

10.1007/s10489-014-0601-6 ◽

2014 ◽

Vol 42 (2) ◽

pp. 323-352 ◽

Cited By ~ 54

Author(s):

Unil Yun ◽

Heungmo Ryang

Keyword(s):

Pattern Mining ◽

Dynamic Databases ◽

High Utility

Incrementally updating the high average-utility patterns with pre-large concept

Applied Intelligence ◽

10.1007/s10489-020-01743-y ◽

2020 ◽

Vol 50 (11) ◽

pp. 3788-3807

Author(s):

Jerry Chun-Wei Lin ◽

Matin Pirouz ◽

Youcef Djenouri ◽

Chien-Fu Cheng ◽

Usman Ahmed

Keyword(s):

State Of The Art ◽

The State ◽

Batch Mode ◽

Itemset Mining ◽

The Past ◽

Dynamic Databases ◽

Speed Up ◽

Average Utility ◽

High Utility ◽

High Utility Patterns

Abstract: High-utility itemset mining (HUIM) is considered an emerging approach for detecting high-utility patterns in databases. Most existing HUIM algorithms consider only the utility of an itemset, regardless of its length. As a consequence, utility tends to rise simply because the itemset grows. High average-utility itemset mining (HAUIM) takes the size of the itemset into account, thus providing a more balanced scale, the average utility, for decision-making. Several algorithms have been presented to efficiently mine the set of high average-utility itemsets (HAUIs), but most of them focus on static databases. In the past, a fast-updated (FUP)-based algorithm was developed to handle the incremental problem efficiently, but it still has to re-scan the database when an itemset is small in the original database but is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI is developed for transaction insertion in dynamic databases; it relies on average-utility-list (AUL) structures. Moreover, we apply the pre-large concept to HAUIM. The pre-large concept is used to speed up mining: it ensures that if the total utility of the newly inserted transactions is within the safety bound, the small itemsets in the original database cannot become large ones after the database is updated. This, in turn, reduces repeated database scans while still obtaining the correct HAUIs. Experiments demonstrate that PRE-HAUIMI outperforms the state-of-the-art batch-mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms, in terms of runtime, memory, number of assessed patterns, and scalability.
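The difference between utility and average utility described in this abstract can be sketched as follows. This is a minimal illustration of the standard definitions (utility summed over the transactions containing the itemset, then divided by the itemset length), not the PRE-HAUIMI algorithm or its AUL structure; the toy database and function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical toy database: each transaction maps item -> utility
# (quantity * unit profit). The numbers are invented for illustration.
TOY_DB: List[Dict[str, int]] = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def utility(itemset: Iterable[str], db: List[Dict[str, int]]) -> int:
    """Total utility of itemset over all transactions that contain it."""
    items = set(itemset)
    return sum(sum(t[i] for i in items) for t in db if items.issubset(t))


def average_utility(itemset: Iterable[str], db: List[Dict[str, int]]) -> float:
    """Utility of itemset divided by the number of items in it."""
    items = set(itemset)
    return utility(items, db) / len(items)


if __name__ == "__main__":
    for candidate in (["a"], ["a", "c"]):
        print(candidate, utility(candidate, TOY_DB),
              average_utility(candidate, TOY_DB))
```

In this toy data, extending {a} to {a, c} raises the utility from 15 to 22, while the average utility falls from 15 to 11, which is the length effect that the average-utility measure is meant to offset.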

Analytics of high average-utility patterns in the industrial internet of things

Applied Intelligence ◽

10.1007/s10489-021-02751-2 ◽

2021 ◽

Author(s):

Jimmy Ming-Tai Wu ◽

Zhongcui Li ◽

Gautam Srivastava ◽

Unil Yun ◽

Jerry Chun-Wei Lin

Keyword(s):

High Performance ◽

Pattern Mining ◽

Search Space ◽

Research Field ◽

Industrial Internet Of Things ◽

Utility Measure ◽

Itemset Mining ◽

Industrial Internet ◽

Average Utility ◽

High Utility

Abstract: Recently, revealing valuable information beyond quantity values in a database has become an essential research field. High average-utility itemset mining (HAUIM) was proposed to reveal useful patterns using the average-utility measure for pattern analytics and evaluation. HAUIM provides a fairer assessment than generic high-utility itemset mining by discounting the influence of itemset length. Several high-performance HAUIM algorithms have been proposed to extract knowledge from a disorganized database. However, most existing works do not consider the uncertainty factor, which is one of the characteristics of data gathered from IoT equipment. In this work, an efficient HAUIM algorithm for handling uncertain databases in the IoT is presented. Two upper-bound values are estimated to shrink the search space early when discovering meaningful patterns, which largely addresses the limitations of pattern mining in the IoT. Experiments compare the proposed approach with existing algorithms, and the results indicate that the designed approach efficiently reveals high average-utility itemsets from uncertain data.

Frequent high minimum average utility sequence mining with constraints in dynamic databases using efficient pruning strategies

Applied Intelligence ◽

10.1007/s10489-021-02520-1 ◽

2021 ◽

Author(s):

Tin Truong ◽

Hai Duong ◽

Bac Le ◽

Philippe Fournier-Viger ◽

Unil Yun

Keyword(s):

Sequence Mining ◽

Dynamic Databases ◽

Average Utility

High average-utility sequential pattern mining based on uncertain databases

Knowledge and Information Systems ◽

10.1007/s10115-019-01385-8 ◽

2019 ◽

Vol 62 (3) ◽

pp. 1199-1228 ◽

Cited By ~ 2

Author(s):

Jerry Chun-Wei Lin ◽

Ting Li ◽

Matin Pirouz ◽

Ji Zhang ◽

Philippe Fournier-Viger

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Uncertain Databases ◽

Average Utility

Efficiently Updating the Discovered Sequential Patterns for Sequence Modification

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016500455 ◽

2016 ◽

Vol 26 (08) ◽

pp. 1285-1313 ◽

Cited By ~ 1

Author(s):

Jerry Chun-Wei Lin ◽

Wensheng Gan ◽

Philippe Fournier-Viger ◽

Tzung-Pei Hong

Keyword(s):

Pattern Mining ◽

State Of The Art ◽

Poor Performance ◽

Sequential Patterns ◽

Batch Mode ◽

Dynamic Databases ◽

The Cost ◽

Mining Algorithms ◽

Sequence Modification ◽

Over Time

Mining sequential patterns (SPs) is a popular data mining task, which consists of finding interesting, unexpected, and useful patterns in sequence databases. It has applications in many domains. However, most sequential pattern mining algorithms assume that databases are static, i.e., that they do not change over time. In real-world applications, however, sequences are often modified. It is therefore important to design algorithms for updating SPs in a dynamic database environment. Although some algorithms have been proposed to maintain SPs in dynamic databases, they may perform poorly, especially when databases contain long sequences or a large number of sequences. This paper addresses the issue by proposing a novel dynamic mining approach named PreFUSP-TREE-MOD for maintaining and updating discovered SPs when sequences in a database are modified. The proposed approach adopts the previously proposed pre-large concept, which uses two support thresholds, to avoid scanning the database when possible while updating the set of discovered patterns. Owing to the pruning properties of the pre-large concept, the PreFUSP-TREE-MOD maintenance algorithm can effectively reduce the cost of the database scans needed to maintain and update the built FUSP-tree under sequence modification. When the number of modified sequences is less than the safety bound of the pre-large concept, the proposed maintenance algorithm outperforms traditional SPM algorithms in batch mode and the state-of-the-art maintenance algorithm in terms of execution time and number of tree nodes.
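The two-threshold idea behind the pre-large concept can be sketched as a simple classification step. This is a minimal illustration assuming an upper and a lower support threshold expressed as ratios; it is not the PreFUSP-TREE-MOD algorithm, and the function name, parameter names, and example values are hypothetical.

```python
def classify_pattern(count: int, db_size: int,
                     upper: float, lower: float) -> str:
    """Label a pattern by comparing its support ratio to two thresholds.

    Patterns at or above the upper threshold are "large", patterns between
    the two thresholds are "pre-large" (kept as a buffer), and the rest are
    "small".
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("expected 0 <= lower <= upper <= 1")
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    ratio = count / db_size
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"


if __name__ == "__main__":
    # Assumed thresholds: upper = 50%, lower = 30%, database of 10 sequences.
    for count in (6, 4, 2):
        print(count, classify_pattern(count, 10, upper=0.5, lower=0.3))
```

Keeping the pre-large patterns alongside the large ones is what lets a maintenance algorithm absorb a limited number of changes without rescanning the original database, since a small pattern would first have to pass through the pre-large band before it could become large.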

HAOP-Miner: Self-adaptive high-average utility one-off sequential pattern mining

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115449 ◽

2021 ◽

pp. 115449

Author(s):

Youxi Wu ◽

Rong Lei ◽

Yan Li ◽

Lei Guo ◽

Xindong Wu

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Average Utility

Mining Closed Sequential Patterns in Progressive Databases

Journal of Information & Knowledge Management ◽

10.1142/s021964921350024x ◽

2013 ◽

Vol 12 (03) ◽

pp. 1350024

Author(s):

R. B. V. Subramanyam ◽

A. Suresh Rao ◽

Ramesh Karnati ◽

Somaraju Suvvari ◽

D. V. L. N. Somayajulu

Keyword(s):

Pattern Mining ◽

Synthetic Data ◽

Window Size ◽

Search Space ◽

Sequential Patterns ◽

Data Sets ◽

Time Stamp ◽

Algorithmic Approach ◽

Dynamic Databases ◽

Search Space Pruning

Previous studies of mining closed sequential patterns suggested several heuristics and proposed some computationally effective techniques, such as bidirectional extension with closure-checking schemas, back-scan search space pruning, and the scan-skip optimization used in the BIDE (BI-Directional Extension) algorithm. Many researchers, inspired by the efficiency of BIDE, have tried to apply its techniques to various kinds of databases; we too felt that it could be applied to progressive databases. However, BIDE cannot be applied to dynamic databases without tailoring. The concept of progressive databases explores the nature of incremental databases by defining parameters such as the Period of Interest (POI) and a user-defined minimum support. An algorithm, PISA (Progressive mIning Sequential pAttern mining), was proposed by Huang et al. for finding all sequential patterns over progressive databases. The structure of PISA helps space utilization by limiting the height of the tree to the length of the POI, and this is also a motivation for further improvement in this work. In this paper, a tree structure, LCT (Label, Customer-id, and Time stamp), is proposed, along with an approach for mining closed sequential patterns using closure-checking schemas over progressive databases. The significance of the LCT structure is that its height is confined to a maximum of two levels. In the algorithmic approach, the window size can be increased by one unit of time. The complexity of the proposed approach is also analysed. The approach is validated using synthetic data sets available on the Internet and shows better performance than the existing methods.

Damped window based high average utility pattern mining over data streams

Knowledge-Based Systems ◽

10.1016/j.knosys.2017.12.029 ◽

2018 ◽

Vol 144 ◽

pp. 188-205 ◽

Cited By ~ 48

Author(s):

Unil Yun ◽

Donggyu Kim ◽

Eunchul Yoon ◽

Hamido Fujita

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Average Utility
