An Efficient Approach for Mining Weighted Sequential Patterns in Dynamic Databases

Keeping previously generated fuzzy frequent itemsets up to date and discovering new ones are challenging problems in dynamic databases. In this paper, the classical H-struct structure is extended to mine fuzzy frequent itemsets. The extended H-mine algorithm can use any t-norm operator to calculate the support of a fuzzy itemset. Two FP-tree-based structures, the Initial-FP-tree and the New-FP-tree, are built to maintain the fuzzy frequent itemsets in the original database and in the newly inserted transactions, respectively. Incremental mining of fuzzy frequent itemsets is achieved by breadth-first traversal of the Initial-FP-tree and the New-FP-tree. All of the fuzzy frequent itemsets in the updated database can then be obtained by traversing the Initial-FP-tree. Experiments on real datasets show that the proposed approach runs faster than the batch extended H-mine algorithm. Compared with the existing algorithm for incremental mining of fuzzy frequent itemsets, the proposed approach has a shorter execution time. The memory cost of the proposed approach is lower than that of the existing algorithm when the minimum support threshold is low.
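
The t-norm-based support mentioned in this abstract can be illustrated with a minimal sketch. It is not the extended H-mine algorithm itself. It assumes that each transaction maps an item to a membership degree in [0, 1], that an absent item has degree 0, and that support is the average t-norm value over all transactions. The names `fuzzy_support`, `t_min`, and `t_product` are hypothetical.

```python
from functools import reduce
from typing import Callable, Dict, List, Sequence

TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm."""
    return min(a, b)


def t_product(a: float, b: float) -> float:
    """Product t-norm."""
    return a * b


def fuzzy_support(
    itemset: Sequence[str],
    transactions: List[Dict[str, float]],
    t_norm: TNorm = t_min,
) -> float:
    """Average t-norm of the membership degrees of `itemset` (assumed definition)."""
    if not transactions:
        return 0.0
    total = 0.0
    for tx in transactions:
        # An item missing from the transaction is treated as degree 0.0.
        degrees = [tx.get(item, 0.0) for item in itemset]
        # 1.0 is the neutral element of every t-norm, so it is a safe start value.
        total += reduce(t_norm, degrees, 1.0)
    return total / len(transactions)


# Illustrative data only, not taken from the paper.
transactions = [
    {"a": 0.8, "b": 0.5},
    {"a": 0.4},
    {"a": 1.0, "b": 0.5},
]
support_min = fuzzy_support(["a", "b"], transactions, t_min)
support_product = fuzzy_support(["a", "b"], transactions, t_product)
```

Passing the t-norm as a function keeps the support computation independent of the operator chosen, which mirrors the abstract's claim that any t-norm can be used.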

EFFICIENT APPROACH TO DISCOVER INTERVAL-BASED SEQUENTIAL PATTERNS

Journal of Computer Science ◽

10.3844/jcssp.2013.225.234 ◽

2013 ◽

Vol 9 (2) ◽

pp. 225-234 ◽

Cited By ~ 6

Author(s):

Sadasivam

Keyword(s):

Sequential Patterns ◽

Efficient Approach

An efficient approach for mining sequential patterns using multiple threads on very large databases

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2018.06.009 ◽

2018 ◽

Vol 74 ◽

pp. 242-251 ◽

Cited By ~ 8

Author(s):

Bao Huynh ◽

Cuong Trinh ◽

Huy Huynh ◽

Thien-Trang Van ◽

Bay Vo ◽

...

Keyword(s):

Sequential Patterns ◽

Efficient Approach ◽

Large Databases ◽

Very Large Databases ◽

Multiple Threads

Updating the Built Prelarge Fast Updated Sequential Pattern Trees with Sequence Modification

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2015010101 ◽

2015 ◽

Vol 11 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Jerry Chun-Wei Lin ◽

Wensheng Gan ◽

Tzung-Pei Hong ◽

Jingliang Zhang

Keyword(s):

Critical Issue ◽

Decision Makers ◽

Sequential Pattern ◽

Sequential Patterns ◽

Large Database ◽

The Past ◽

Real World Applications ◽

Dynamic Databases ◽

Very Large Database ◽

Sequence Modification

Mining useful information or knowledge from a very large database to help managers or decision makers make appropriate decisions has been a critical issue in recent years. Sequential patterns can be used to discover the purchasing behaviors of customers or the usage behaviors of users from Web log data. Most approaches process a static database to discover sequential patterns in batch mode. In real-world applications, however, transactions or sequences in databases change frequently. Previously, a fast updated sequential pattern (FUSP) tree was proposed to handle sequence insertion, deletion, or modification in dynamic databases based on FUP concepts. The original database must be rescanned, however, whenever it is necessary to maintain small sequences that were not kept in the FUSP tree. In this paper, the prelarge concept is adopted to maintain and update the built prelarge FUSP tree for sequence modification. A prelarge FUSP tree is a modified FUSP tree that preserves not only the frequent 1-sequences but also the prelarge 1-sequences in the tree structure. The PRELARGE-FUSP-TREE-MOD maintenance algorithm is proposed to reduce rescans of the original database by exploiting the pruning properties of the prelarge concept. When the number of modified sequences is smaller than the safety bound of the prelarge concept, the proposed PRELARGE-FUSP-TREE-MOD maintenance algorithm obtains better results for sequence modification in dynamic databases.
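
The prelarge idea can be sketched as follows. The abstract does not give the thresholds or the safety bound formula, so everything here is an assumption: an upper and a lower support threshold split patterns into large, prelarge, and small, and the bound is derived for modification only, where the database size stays fixed and each modified sequence raises a pattern's count by at most one. The function names are hypothetical, and this is not the PRELARGE-FUSP-TREE-MOD algorithm.

```python
import math
from fractions import Fraction


def classify(count: int, db_size: int, upper: Fraction, lower: Fraction) -> str:
    """Label a pattern as large, prelarge, or small (assumed two-threshold scheme)."""
    support = Fraction(count, db_size)
    if support >= upper:
        return "large"
    if support >= lower:
        return "prelarge"
    return "small"


def modification_safety_bound(db_size: int, upper: Fraction, lower: Fraction) -> int:
    """Largest number of modified sequences that cannot turn a small pattern large.

    Assumed derivation: a small pattern has count < lower * db_size, and t
    modifications add at most t to it, so it stays below upper * db_size
    whenever t <= (upper - lower) * db_size.
    """
    return math.floor((upper - lower) * db_size)


# Illustrative thresholds only; Fraction avoids floating-point rounding
# in the bound (for example 0.5 - 0.3 is not exactly 0.2 as a float).
upper, lower = Fraction(1, 2), Fraction(3, 10)
labels = [classify(c, 10, upper, lower) for c in (6, 4, 2)]
bound = modification_safety_bound(100, upper, lower)
```

While the running total of modified sequences stays within `bound`, small patterns that were never stored cannot have become large, which is the reason a rescan of the original database can be skipped under these assumptions.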

Efficiently Updating the Discovered Sequential Patterns for Sequence Modification

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016500455 ◽

2016 ◽

Vol 26 (08) ◽

pp. 1285-1313 ◽

Cited By ~ 1

Author(s):

Jerry Chun-Wei Lin ◽

Wensheng Gan ◽

Philippe Fournier-Viger ◽

Tzung-Pei Hong

Keyword(s):

Pattern Mining ◽

State Of The Art ◽

Poor Performance ◽

Sequential Patterns ◽

Batch Mode ◽

Dynamic Databases ◽

The Cost ◽

Mining Algorithms ◽

Sequence Modification ◽

Over Time

Mining sequential patterns (SPs) is a popular data mining task that consists of finding interesting, unexpected, and useful patterns in sequence databases. It has applications in many domains. However, most sequential pattern mining (SPM) algorithms assume that databases are static, i.e., that they do not change over time. In real-world applications, sequences are often modified, so designing algorithms that update SPs in a dynamic database environment is an important issue. Although some algorithms have been proposed to maintain SPs in dynamic databases, they may perform poorly, especially when databases contain long sequences or a large number of sequences. This paper proposes a novel dynamic mining approach named PreFUSP-TREE-MOD to address the problem of maintaining and updating discovered SPs when sequences in a database are modified. The proposed approach adopts the previously proposed pre-large concept, which uses two support thresholds, to avoid scanning the database when possible while updating the set of discovered patterns. Due to the pruning properties of the pre-large concept, the PreFUSP-TREE-MOD maintenance algorithm can effectively reduce the cost of database scans needed to maintain and update the built FUSP-tree for sequence modification. When the number of modified sequences is less than the safety bound of the pre-large concept, the proposed maintenance algorithm outperforms both traditional SPM algorithms in batch mode and the state-of-the-art maintenance algorithm in terms of execution time and number of tree nodes.
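
To make the contrast between batch recounting and maintenance concrete, the sketch below updates the counts of 1-sequences when a single sequence is modified, without rescanning the rest of the database. It is a simplified illustration and not the PreFUSP-TREE-MOD algorithm: it assumes a sequence is a list of itemsets and that the support count of an item is the number of sequences containing it. All names are hypothetical.

```python
from collections import Counter
from typing import List, Set

Sequence_ = List[Set[str]]  # a sequence is an ordered list of itemsets


def distinct_items(sequence: Sequence_) -> Set[str]:
    """All items that occur anywhere in the sequence."""
    return {item for itemset in sequence for item in itemset}


def count_items(database: List[Sequence_]) -> Counter:
    """Batch count: number of sequences that contain each item."""
    return Counter(item for seq in database for item in distinct_items(seq))


def apply_modification(counts: Counter, old_seq: Sequence_, new_seq: Sequence_) -> Counter:
    """Adjust item counts in place for one modified sequence."""
    old_items = distinct_items(old_seq)
    new_items = distinct_items(new_seq)
    for item in old_items - new_items:  # items that disappeared
        counts[item] -= 1
        if counts[item] == 0:
            del counts[item]
    for item in new_items - old_items:  # items that appeared
        counts[item] += 1
    return counts


# Illustrative data only.
database = [[{"a"}, {"b"}], [{"a", "c"}], [{"b"}, {"c"}]]
counts = count_items(database)
modified = [{"b"}, {"d"}]
apply_modification(counts, database[1], modified)
database[1] = modified
```

Only the old and new versions of the modified sequence are read, so the cost of an update depends on the length of that sequence rather than on the size of the database.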

Mining Closed Sequential Patterns in Progressive Databases

Journal of Information & Knowledge Management ◽

10.1142/s021964921350024x ◽

2013 ◽

Vol 12 (03) ◽

pp. 1350024

Author(s):

R. B. V. Subramanyam ◽

A. Suresh Rao ◽

Ramesh Karnati ◽

Somaraju Suvvari ◽

D. V. L. N. Somayajulu

Keyword(s):

Pattern Mining ◽

Synthetic Data ◽

Window Size ◽

Search Space ◽

Sequential Patterns ◽

Data Sets ◽

Time Stamp ◽

Algorithmic Approach ◽

Dynamic Databases ◽

Search Space Pruning

Previous studies of mining closed sequential patterns suggested several heuristics and proposed some computationally effective techniques, such as bidirectional extension with closure-checking schemes, back-scan search space pruning, and scan-skip optimization, all of which are used in the BIDE (BI-Directional Extension) algorithm. Many researchers, inspired by the efficiency of BIDE, have tried to apply its techniques to various kinds of databases; we too felt that it could be applied to progressive databases. Without tailoring, however, BIDE cannot be applied to dynamic databases. The concept of progressive databases captures the nature of incremental databases by defining parameters such as the Period of Interest (POI) and a user-defined minimum support. An algorithm named PISA (Progressive mIning Sequential pAttern mining) was proposed by Huang et al. for finding all sequential patterns over progressive databases. The structure of PISA helps space utilization by limiting the height of the tree to the length of the POI, and this issue is also a motivation for further improvement in this work. In this paper, a tree structure called LCT (Label, Customer-id, and Time stamp) is proposed, along with an approach for mining closed sequential patterns using closure-checking schemes over progressive databases. The significance of the LCT structure is that its height is confined to a maximum of two levels. The algorithmic approach describes how the window size can be increased by one unit of time. The complexity of the proposed algorithmic approach is also analysed. The approach is validated using synthetic data sets available on the Internet and shows better performance than the existing methods.
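
The notion of a closed sequential pattern used in this abstract can be shown with a brute-force sketch: a pattern is closed when no proper super-sequence has the same support. This is neither BIDE nor the proposed LCT approach. It assumes patterns are tuples of single items and that supports have already been computed; the function names are hypothetical.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Pattern, big: Pattern) -> bool:
    """True if `small` can be obtained from `big` by deleting items."""
    remaining = iter(big)
    # `x in remaining` consumes the iterator, which enforces the item order.
    return all(x in remaining for x in small)


def closed_patterns(supports: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep only patterns with no longer super-sequence of equal support."""
    closed = {}
    for pattern, support in supports.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in supports.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed


# Illustrative supports only.
supports = {("a",): 3, ("b",): 2, ("a", "b"): 2, ("c",): 1}
closed = closed_patterns(supports)
```

Here `("b",)` is dropped because `("a", "b")` contains it and has the same support. The pairwise comparison is quadratic in the number of patterns, which is the kind of cost that closure-checking schemes aim to avoid.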
