An Efficient Approach for Mining Weighted Sequential Patterns in Dynamic Databases

Author(s):  
Sabrina Zaman Ishita ◽  
Faria Noor ◽  
Chowdhury Farhan Ahmed
2014 ◽  
Vol 35 ◽  
pp. 131-142 ◽  
Author(s):  
Binbin Zhang ◽  
Chun-Wei Lin ◽  
Wensheng Gan ◽  
Tzung-Pei Hong

Author(s):  
Weigang Huo ◽  
Xingjie Feng ◽  
Zhiyuan Zhang

Keeping the generated fuzzy frequent itemsets up to date and discovering new fuzzy frequent itemsets are challenging problems in dynamic databases. In this paper, the classical H-struct structure is extended to mine fuzzy frequent itemsets. The extended H-mine algorithm can use any t-norm operator to calculate the support of a fuzzy itemset. Two FP-tree-based structures, the Initial-FP-tree and the New-FP-tree, are built to maintain the fuzzy frequent itemsets in the original database and in the newly inserted transactions, respectively. Incremental mining of fuzzy frequent itemsets is achieved by breadth-first traversal of the Initial-FP-tree and the New-FP-tree. All of the fuzzy frequent itemsets in the updated database can then be obtained by traversing the Initial-FP-tree. Experiments on real datasets show that the proposed approach runs faster than the batch extended H-mine algorithm. Compared with the existing algorithm for incremental mining of fuzzy frequent itemsets, the proposed approach is superior in terms of execution time. The memory cost of the proposed approach is lower than that of the existing algorithm when the minimum support threshold is low.
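The idea of computing fuzzy support with an interchangeable t-norm can be illustrated with a minimal sketch. It is not the extended H-mine algorithm itself: the names `fuzzy_support`, `t_min`, and `t_product`, the dictionary representation of a transaction, and the choice to normalize by the number of transactions are all assumptions made for illustration.

```python
from functools import reduce
from typing import Callable, Dict, List

# A t-norm combines two membership degrees in [0, 1] into one.
TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def fuzzy_support(
    transactions: List[Dict[str, float]],
    itemset: List[str],
    t_norm: TNorm,
) -> float:
    """Average, over all transactions, of the t-norm of the membership
    degrees of the items in `itemset` (assumed definition of support).

    Each transaction maps a fuzzy item name to its membership degree;
    an item missing from a transaction is treated as degree 0.0.
    """
    if not transactions or not itemset:
        raise ValueError("transactions and itemset must be non-empty")
    total = 0.0
    for row in transactions:
        degrees = [row.get(item, 0.0) for item in itemset]
        # Fold the t-norm over the degrees of this transaction.
        total += reduce(t_norm, degrees)
    return total / len(transactions)
```

Passing `t_min` or `t_product` as the last argument switches the operator without touching the counting loop, which is the property the abstract attributes to the extended algorithm.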


2015 ◽  
Vol 11 (1) ◽  
pp. 1-22 ◽  
Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Tzung-Pei Hong ◽  
Jingliang Zhang

Mining useful information or knowledge from very large databases to help managers or decision makers make appropriate decisions has become a critical issue in recent years. Sequential patterns can be used to discover the purchase behaviors of customers or the usage behaviors of users from Web log data. Most approaches process a static database to discover sequential patterns in batch mode. In real-world applications, however, transactions or sequences in databases change frequently. A fast updated sequential pattern (FUSP)-tree was previously proposed, based on FUP concepts, to handle dynamic databases under sequence insertion, deletion, or modification. With that approach, the original database must be rescanned whenever it is necessary to maintain small sequences that were not kept in the FUSP tree. In this paper, the prelarge concept is adopted to maintain and update the built prelarge FUSP tree for sequence modification. A prelarge FUSP tree is a modified FUSP tree that preserves not only the frequent 1-sequences but also the prelarge 1-sequences in the tree structure. The PRELARGE-FUSP-TREE-MOD maintenance algorithm is proposed to reduce rescans of the original database by exploiting the pruning properties of the prelarge concept. When the number of modified sequences is smaller than the safety bound of the prelarge concept, the proposed PRELARGE-FUSP-TREE-MOD maintenance algorithm obtains better results for sequence modification in dynamic databases.
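The abstract does not state how the safety bound is computed. A minimal sketch follows, assuming that a modification leaves the database size d unchanged and that a sequence absent from the tree has a count below Sl·d: after t modifications its count is below Sl·d + t, so it cannot reach the upper threshold Su·d as long as t ≤ (Su − Sl)·d. The function names and the treatment of the boundary case are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from math import floor


def modification_safety_bound(upper: float, lower: float, db_size: int) -> int:
    """Largest number of modified sequences that cannot turn a small
    sequence into a large one, assuming the bound floor((Su - Sl) * d).

    Thresholds are converted through their decimal strings so that
    values such as 0.3 - 0.2 are handled exactly.
    """
    su, sl = Fraction(str(upper)), Fraction(str(lower))
    if not (0 <= sl <= su <= 1) or db_size < 0:
        raise ValueError("need 0 <= lower <= upper <= 1 and db_size >= 0")
    return floor((su - sl) * db_size)


def can_skip_rescan(modified_count: int, upper: float, lower: float,
                    db_size: int) -> bool:
    """True while the accumulated modifications stay within the bound,
    i.e. while the original database need not be rescanned."""
    return modified_count <= modification_safety_bound(upper, lower, db_size)
```

With an upper threshold of 0.3, a lower threshold of 0.2, and 100 sequences, the sketch allows up to 10 modified sequences before a rescan would be triggered.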


Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Philippe Fournier-Viger ◽  
Tzung-Pei Hong

Mining sequential patterns (SPs) is a popular data mining task, which consists of finding interesting, unexpected, and useful patterns in sequence databases. It has applications in many domains. However, most sequential pattern mining (SPM) algorithms assume that databases are static, i.e. that they do not change over time. In real-world applications, sequences are often modified. It is therefore important to design algorithms for updating SPs in a dynamic database environment. Although some algorithms have been proposed to maintain SPs in dynamic databases, they may perform poorly, especially when databases contain long sequences or a large number of sequences. This paper proposes a novel dynamic mining approach named PreFUSP-TREE-MOD to maintain and update the discovered SPs when sequences in a database are modified. The proposed approach adopts the previously proposed pre-large concept, which uses two support thresholds, to avoid scanning the database when possible while updating the set of discovered patterns. Owing to the pruning properties of the pre-large concept, the PreFUSP-TREE-MOD maintenance algorithm can effectively reduce the cost of database scans needed to maintain and update the built FUSP-tree for sequence modification. When the number of modified sequences is less than the safety bound of the pre-large concept, the proposed maintenance algorithm outperforms both traditional SPM algorithms in batch mode and the state-of-the-art maintenance algorithm in terms of execution time and number of tree nodes.
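The role of the two support thresholds can be sketched as a three-way classification of a sequence by its support count. This is only an illustration of the pre-large idea under assumed conventions (inclusive thresholds, relative support); the function name `classify` and its labels are hypothetical and are not taken from PreFUSP-TREE-MOD.

```python
from fractions import Fraction


def classify(count: int, db_size: int, upper: float, lower: float) -> str:
    """Label a sequence by comparing its relative support with the
    upper and lower thresholds (both assumed inclusive).

    'large'     : support >= upper; reported as a frequent pattern
    'pre-large' : lower <= support < upper; kept as a buffer so that it
                  can be promoted later without rescanning the database
    'small'     : support < lower; not kept
    """
    if db_size <= 0:
        raise ValueError("db_size must be positive")
    support = Fraction(count, db_size)
    if support >= Fraction(str(upper)):
        return "large"
    if support >= Fraction(str(lower)):
        return "pre-large"
    return "small"
```

Only sequences labeled small are dropped, which is why a limited number of modifications can be absorbed without going back to the original data.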


2013 ◽  
Vol 12 (03) ◽  
pp. 1350024
Author(s):  
R. B. V. Subramanyam ◽  
A. Suresh Rao ◽  
Ramesh Karnati ◽  
Somaraju Suvvari ◽  
D. V. L. N. Somayajulu

Previous studies of mining closed sequential patterns suggested several heuristics and proposed some computationally effective techniques, such as the bidirectional extension closure-checking scheme, BackScan search space pruning, and the ScanSkip optimization used in the BIDE (BI-Directional Extension) algorithm. Many researchers, inspired by the efficiency of BIDE, have tried to apply its techniques to various kinds of databases; we too felt that it could be applied to progressive databases. Without tailoring, however, BIDE cannot be applied to dynamic databases. The concept of progressive databases captures the nature of incremental databases through parameters such as the period of interest (POI) and a user-defined minimum support. An algorithm, PISA (Progressive mIning of Sequential pAtterns), was proposed by Huang et al. for finding all sequential patterns over progressive databases. The structure of PISA helps space utilization by limiting the height of the tree to the length of the POI, and this is also a motivation for further improvement in this work. In this paper, a tree structure called LCT (Label, Customer-id, and Time stamp) is proposed, together with an approach for mining closed sequential patterns over progressive databases using closure-checking schemes. The significance of the LCT structure is that its height is confined to a maximum of two levels. In the proposed approach, the window size can be increased by one unit of time. The complexity of the proposed approach is also analysed. The approach is validated using synthetic data sets available on the Internet and shows better performance than existing methods.


2014 ◽  
Vol 41 (2) ◽  
pp. 439-452 ◽  
Author(s):  
Guo-Cheng Lan ◽  
Tzung-Pei Hong ◽  
Hong-Yu Lee
