One scan based high average-utility pattern mining in static and dynamic databases

2020 ◽  
Vol 111 ◽  
pp. 143-158
Author(s):  
Jongseong Kim ◽  
Unil Yun ◽  
Eunchul Yoon ◽  
Jerry Chun-Wei Lin ◽  
Philippe Fournier-Viger
Author(s):  
Jimmy Ming-Tai Wu ◽  
Qian Teng ◽  
Shahab Tayeb ◽  
Jerry Chun-Wei Lin

AbstractThe high average-utility itemset mining (HAUIM) was established to provide a fair measure instead of genetic high-utility itemset mining (HUIM) for revealing the satisfied and interesting patterns. In practical applications, the database is dynamically changed when insertion/deletion operations are performed on databases. Several works were designed to handle the insertion process but fewer studies focused on processing the deletion process for knowledge maintenance. In this paper, we then develop a PRE-HAUI-DEL algorithm that utilizes the pre-large concept on HAUIM for handling transaction deletion in the dynamic databases. The pre-large concept is served as the buffer on HAUIM that reduces the number of database scans while the database is updated particularly in transaction deletion. Two upper-bound values are also established here to reduce the unpromising candidates early which can speed up the computational cost. From the experimental results, the designed PRE-HAUI-DEL algorithm is well performed compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.


2020 ◽  
Vol 50 (11) ◽  
pp. 3788-3807
Author(s):  
Jerry Chun-Wei Lin ◽  
Matin Pirouz ◽  
Youcef Djenouri ◽  
Chien-Fu Cheng ◽  
Usman Ahmed

Abstract High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.


Author(s):  
Jimmy Ming-Tai Wu ◽  
Zhongcui Li ◽  
Gautam Srivastava ◽  
Unil Yun ◽  
Jerry Chun-Wei Lin

AbstractRecently, revealing more valuable information except for quantity value for a database is an essential research field. High utility itemset mining (HAUIM) was suggested to reveal useful patterns by average-utility measure for pattern analytics and evaluations. HAUIM provides a more fair assessment than generic high utility itemset mining and ignores the influence of the length of itemsets. There are several high-performance HAUIM algorithms proposed to gain knowledge from a disorganized database. However, most existing works do not concern the uncertainty factor, which is one of the characteristics of data gathered from IoT equipment. In this work, an efficient algorithm for HAUIM to handle the uncertainty databases in IoTs is presented. Two upper-bound values are estimated to early diminish the search space for discovering meaningful patterns that greatly solve the limitations of pattern mining in IoTs. Experimental results showed several evaluations of the proposed approach compared to the existing algorithms, and the results are acceptable to state that the designed approach efficiently reveals high average utility itemsets from an uncertain situation.


2019 ◽  
Vol 62 (3) ◽  
pp. 1199-1228 ◽  
Author(s):  
Jerry Chun-Wei Lin ◽  
Ting Li ◽  
Matin Pirouz ◽  
Ji Zhang ◽  
Philippe Fournier-Viger

Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Philippe Fournier-Viger ◽  
Tzung-Pei Hong

Mining sequential patterns (SPs) is a popular data mining task, which consists in finding interesting, unexpected, and useful patterns in sequence databases. It has several applications in many domains. However, most sequential pattern mining algorithms assume that databases are static, i.e. that they do not change over time. But in real-word applications, sequences are often modified. Thus, it is an important issue to design algorithms for updating SPs in a dynamic database environment. Although some algorithms have been proposed to maintain SPs in dynamic databases, these algorithms may have poor performance, especially when databases contain long sequences or a large number of sequences. This paper addresses this issue by proposing a novel dynamic mining approach named PreFUSP-TREE-MOD to address the problem of maintaining and updating discovered SPs when sequences in a database are modified. The proposed approach adopts the previously proposed pre-large concept using two support thresholds, to avoid scanning the database when possible, for updating the set of discovered patterns. Due to the pruning properties of the pre-large concept, the PreFUSP-TREE-MOD maintenance algorithm can effectively reduce the cost of database scans to maintain and update the built FUSP-tree for sequence modification. When the number of modified sequences is less than the safety bound of the pre-large concept, the proposed maintenance algorithm outperforms traditional SPM algorithms in batch mode, and the state-of-the-art maintenance algorithm in terms of execution time and number of tree nodes.


2013 ◽  
Vol 12 (03) ◽  
pp. 1350024
Author(s):  
R. B. V. Subramanyam ◽  
A. Suresh Rao ◽  
Ramesh Karnati ◽  
Somaraju Suvvari ◽  
D. V. L. N. Somayajulu

Previous studies of Mining Closed Sequential Patterns suggested several heuristics and proposed some computationally effective techniques. Like, Bidirectional Extension with closure checking schemas, Back scan search space pruning, and scan skip optimization used in BIDE (BI-Directional Extension) algorithm. Many researchers were inspired with the efficiency of BIDE, have tried to apply the technique implied by BIDE to various kinds of databases; we toofelt that it can be applied over progressive databases. Without tailoring BIDE, it cannot be applied to dynamic databases. The concept of progressive databases explores the nature of incremental databases by defining the parameters like, Period of Interest (POI), user defined minimum support. An algorithm PISA (Progressive mIning Sequential pAttern mining) was proposed by Huang et al. for finding all sequential patterns over progressive databases. The structure of PISA helps in space utilization by limiting the height of the tree, to the length of POI and this issue is also a motivation for further improvement in this work. In this paper, a tree structure LCT (Label, Customer-id, and Time stamp) is proposed, and an approach formining closed sequential patterns using closure checking schemas across the progressive databases concept. The significance of LCT structure is, confining its height to a maximum of two levels. The algorithmic approach describes that the window size can be increased by one unit of time. The complexity of the proposed algorithmic approach is also analysed. The approach is validated using synthetic data sets available in Internet and shows a better performance in comparison to the existing methods.


2018 ◽  
Vol 144 ◽  
pp. 188-205 ◽  
Author(s):  
Unil Yun ◽  
Donggyu Kim ◽  
Eunchul Yoon ◽  
Hamido Fujita

Sign in / Sign up

Export Citation Format

Share Document