A heuristic to predict the optimal pattern-growth direction for the pattern growth-based sequential pattern mining approach

Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets, with a wide range of applications. It aims at extracting sets of attributes shared across time by a large number of objects in a given database. Previous studies have developed two major classes of sequential pattern mining methods: the candidate generation-and-test approach, based on either the horizontal or the vertical data format and represented respectively by GSP and SPADE, and the pattern-growth approach, represented by FreeSpan, PrefixSpan and their extensions. The performance of these algorithms depends on how patterns grow. We therefore introduce a heuristic to predict the optimal pattern-growth direction, i.e. the pattern-growth direction leading to the best performance in terms of runtime and memory usage. We then perform a number of experiments on both real-life and synthetic datasets to test the heuristic. The performance analysis of these experiments shows that the heuristic's prediction is reliable in general.
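
The abstract does not spell out the heuristic, so it is not reproduced here. As a reference point, the following is a minimal Python sketch of the pattern-growth idea it builds on, in the style of PrefixSpan: a frequent prefix is extended one item at a time, and the database is projected onto the suffixes that follow the prefix. It assumes each sequence element is a single item (no itemsets) and that support counts sequences; the function name `prefixspan` is illustrative.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Simplifying assumptions: every sequence element is a single item
    (no itemsets), and support is the number of sequences containing
    the pattern as a (possibly gapped) subsequence.
    """
    results = {}

    def grow(prefix, projected):
        # projected: the suffixes left in each sequence after matching `prefix`
        counts = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):  # count each item once per sequence
                counts[item] += 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            results[new_prefix] = support
            # Project: keep what follows the first occurrence of `item`,
            # which leaves the longest possible suffix for further growth.
            new_projected = [
                suffix[suffix.index(item) + 1:]
                for suffix in projected
                if item in suffix
            ]
            grow(new_prefix, new_projected)

    grow((), [list(s) for s in db])
    return results
```

On `[["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]` with `min_support=2`, this returns seven patterns, including `("a", "b", "c")` with support 2.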

The Impact of the Pattern-Growth Ordering on the Performances of Pattern Growth-Based Sequential Pattern Mining Algorithms

Computer and Information Science ◽

10.5539/cis.v10n1p23 ◽

2016 ◽

Vol 10 (1) ◽

pp. 23

Author(s):

Edith Belise Kenmogne

Keyword(s):

DNA Analysis ◽

Pattern Mining ◽

User Behavior ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sensor Data ◽

Large Field ◽

Pattern Growth ◽

The Impact

Sequential pattern mining is an efficient technique for discovering recurring structures or patterns in very large datasets. It has been widely addressed by the data mining community and has a wide range of applications, such as cross-marketing, DNA analysis, web log analysis, user behavior analysis and sensor data analysis. Sequential pattern mining aims at extracting sets of attributes shared across time by a large number of objects in a given database. Previous studies have developed two major classes of sequential pattern mining methods: the candidate generation-and-test approach, based on either the horizontal or the vertical data format and represented respectively by GSP and SPADE, and the pattern-growth approach, represented by FreeSpan and PrefixSpan. In this paper, we study the impact of the pattern-growth ordering on the performance of pattern growth-based sequential pattern mining algorithms. To this end, we introduce a class of pattern-growth orderings, called linear orderings, in which patterns are grown by extending either the current pattern prefix or the current pattern suffix from the same position at each growth step. We study the problem of pruning and partitioning the search space following linear orderings. Experiments show that the order in which patterns grow has a significant influence on performance.
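
To make the notion of growth direction concrete, here is a hedged Python sketch. It is not the paper's linear orderings or its pruning rules. The same frequent patterns are enumerated twice: once by always appending items on the right, projecting onto what follows the first occurrence, and once by always prepending items on the left, projecting onto what precedes the last occurrence. The `work` counter (total size of all projected databases built) is an assumed stand-in for cost, and `mine_one_direction` is an illustrative name.

```python
from collections import Counter


def mine_one_direction(db, min_support, direction):
    """Enumerate frequent sequential patterns, growing at one end only.

    direction == "right": append x (p + x); in each projected sequence keep
        what follows the FIRST occurrence of x.
    direction == "left":  prepend x (x + p); keep what precedes the LAST
        occurrence of x.
    Returns (patterns_with_support, work), where work is the total length of
    all projected databases built (an assumed, rough cost proxy).
    """
    found = {}
    work = 0

    def grow(pattern, projected):
        nonlocal work
        counts = Counter()
        for part in projected:
            counts.update(set(part))  # count each item once per sequence
        for x in sorted(counts):
            if counts[x] < min_support:
                continue
            if direction == "right":
                new = pattern + (x,)
                proj = [p[p.index(x) + 1:] for p in projected if x in p]
            else:
                new = (x,) + pattern
                proj = [p[:len(p) - 1 - p[::-1].index(x)] for p in projected if x in p]
            found[new] = counts[x]
            work += sum(len(p) for p in proj)
            grow(new, proj)

    grow((), [list(s) for s in db])
    return found, work
```

On the toy database `[["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]` with `min_support=2`, both directions return the same seven patterns, but the projected databases total 17 items for right growth versus 10 for left growth. Which direction is cheaper depends on the data.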

Mining of Sequential Patterns using Directed Graphs

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2242.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4002-4007

Keyword(s):

Pattern Mining ◽

Directed Graphs ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Sequence Database ◽

Directed Paths ◽

Digraph Model

Sequential pattern mining is one of the important functionalities of data mining. It analyzes sequence databases to discover sequential patterns, i.e. interesting subsequences of a set of sequences. Various factors, such as frequency of occurrence, length, and profit, are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications, since sequential data is naturally represented as sequences of symbols in many fields such as bioinformatics, e-learning, market basket analysis, texts, and webpage click-stream analysis. A wide variety of efficient algorithms, such as PrefixSpan, GSP and FreeSpan, have been proposed in the past few years. In this paper we propose a data model for organizing the sequence database, which consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS that represents the sequential data, for discovering sequential patterns from a sequence database. Efficient algorithms for constructing the digraph model (DGS), for extracting all sequential patterns and for mining association rules are proposed. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.
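
The DGS model is not detailed in the abstract. The Python sketch below is a hypothetical simplification of the general idea only. Items become nodes, and each sequence becomes a directed path whose edges are tagged with its sequence id, so parallel edges and cycles arise naturally. `build_digraph` and `edge_support` are illustrative names, and only immediately consecutive item pairs are captured.

```python
from collections import defaultdict


def build_digraph(db):
    """Hypothetical digraph encoding of a sequence database.

    nodes: set of items.
    edges: (u, v) -> list of sequence ids, one entry per parallel edge.
    paths: sequence id -> ordered list of (u, v) edges (the sequence's path).
    """
    nodes = set()
    edges = defaultdict(list)
    paths = {}
    for sid, seq in enumerate(db):
        nodes.update(seq)
        path = list(zip(seq, seq[1:]))  # consecutive pairs; cycles allowed
        for edge in path:
            edges[edge].append(sid)
        paths[sid] = path
    return nodes, edges, paths


def edge_support(edges, u, v):
    """Number of distinct sequences in which u is immediately followed by v."""
    # .get() does not insert missing keys into a defaultdict
    return len(set(edges.get((u, v), ())))
```

For `[["a", "b", "a", "c"], ["a", "b", "c"], ["b", "a"]]`, the first sequence traces the cycle a → b → a, and both `("a", "b")` and `("b", "a")` have support 2.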

HIGH UTILITY ITEM INTERVAL SEQUENTIAL PATTERN MINING ALGORITHM

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/1/1/14398 ◽

2020 ◽

Vol 36 (1) ◽

pp. 1-15

Author(s):

Tran Huy Duong ◽

Nguyen Truong Thang ◽

Vu Duc Thi ◽

Tran The Anh

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequence Database ◽

Mining Algorithm ◽

Pattern Growth ◽

High Utility ◽

Growth Approach

High utility sequential pattern mining is a popular topic in data mining whose main purpose is to extract sequential patterns with high utility from a sequence database. Many recent works have proposed methods to solve this problem. However, most of them do not consider the item intervals of sequential patterns, which can lead to the extraction of sequential patterns with overly long item intervals that make little sense. In this paper, we propose the High Utility Item Interval Sequential Pattern (HUISP) algorithm to solve this problem. Our algorithm uses the pattern-growth approach together with several techniques to improve its performance.
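
As an illustration of why item intervals matter, the sketch below computes a pattern's utility in one sequence of `(item, timestamp, utility)` triples. It keeps only occurrences whose successive matched items are at most `max_interval` time units apart. Taking the maximum utility over occurrences within a sequence and summing across sequences is a common HUSPM convention. It is assumed here rather than taken from HUISP, and the function names are illustrative.

```python
def pattern_utility(pattern, seq, max_interval):
    """Max utility of a non-empty `pattern` in one sequence, over occurrences
    whose successive matched items are at most `max_interval` apart in time.

    seq: list of (item, timestamp, utility) in time order.
    Returns 0 when no occurrence satisfies the interval constraint.
    """
    NEG = float("-inf")
    n = len(seq)
    # best[i]: best utility of a match of pattern[:j + 1] ending at position i
    best = [seq[i][2] if seq[i][0] == pattern[0] else NEG for i in range(n)]
    for j in range(1, len(pattern)):
        new = [NEG] * n
        for i in range(n):
            item, t, u = seq[i]
            if item != pattern[j]:
                continue
            # extend the best earlier match that ends close enough in time
            prev = max(
                (best[h] for h in range(i)
                 if best[h] > NEG and 0 <= t - seq[h][1] <= max_interval),
                default=NEG,
            )
            if prev > NEG:
                new[i] = prev + u
        best = new
    result = max(best, default=NEG)
    return result if result > NEG else 0


def db_utility(pattern, db, max_interval):
    """Assumed convention: sum of per-sequence maximum utilities."""
    return sum(pattern_utility(pattern, s, max_interval) for s in db)
```

With `seq = [("a", 1, 5), ("b", 2, 3), ("a", 4, 2), ("b", 9, 10)]`, the pattern `("a", "b")` has utility 8 when `max_interval=3`, 12 when it is 5, and 15 when it is 10. The high-utility but widely separated match is only admitted once the interval constraint is relaxed.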

From sequential pattern mining to structured pattern mining: A pattern-growth approach

Journal of Computer Science and Technology ◽

10.1007/bf02944897 ◽

2004 ◽

Vol 19 (3) ◽

pp. 257-279 ◽

Cited By ~ 27

Author(s):

Jia-Wei Han ◽

Jian Pei ◽

Xi-Feng Yan

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Pattern Growth ◽

Growth Approach

Generalization of Pattern-Growth Methods for Sequential Pattern Mining with Gap Constraints

Machine Learning and Data Mining in Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/3-540-45065-3_21 ◽

2007 ◽

pp. 239-251 ◽

Cited By ~ 29

Author(s):

Cláudia Antunes ◽

Arlindo L. Oliveira

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Pattern Growth ◽

Growth Methods

A review on sequential pattern mining using pattern growth approach

2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET) ◽

10.1109/wispnet.2016.7566371 ◽

2016 ◽

Cited By ~ 3

Author(s):

Roshani Patel ◽

Tarunika Chaudhari

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Pattern Growth ◽

Growth Approach

An Efficient Algorithm for Extracting High-Utility Hierarchical Sequential Patterns

Wireless Communications and Mobile Computing ◽

10.1155/2020/8816228 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Chunkai Zhang ◽

Zilin Du ◽

Yiwen Zu

Keyword(s):

Pattern Mining ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Second Phase ◽

Two Phase ◽

High Utility ◽

Synthetic Datasets ◽

Hierarchical Relation

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, where utility is used to measure the importance or weight of a sequence. However, HUSPM ignores the informative knowledge carried by hierarchical relations between items, which prevents it from extracting more interesting patterns. In this paper, we incorporate the hierarchical relation of items into HUSPM and propose a two-phase algorithm, MHUH, the first algorithm for high-utility hierarchical sequential pattern mining (HUHSPM). In the first phase, named Extension, we use FHUSpan, an existing algorithm we proposed earlier, to efficiently mine general high-utility sequences (g-sequences). In the second phase, named Replacement, we mine from the g-sequences the special high-utility sequences with hierarchical relations (s-sequences) as high-utility hierarchical sequential patterns. To further improve efficiency, MHUH adopts several strategies, namely Reduction, FGS and PBS, and a novel upper bound, TSWU, which greatly reduce the search space. Extensive experiments were conducted on both real and synthetic datasets to assess the performance of MHUH in terms of runtime, number of patterns, and scalability. The experiments show that MHUH efficiently extracts more interesting patterns with underlying informative knowledge in HUHSPM.
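
The Replacement phase is only summarized in the abstract. A hedged Python sketch of the underlying idea follows. Given a taxonomy encoded as a hypothetical `children` mapping, each item of a general sequence may be replaced by itself or by any of its descendants, yielding candidate specialized sequences. MHUH's utility computation, its pruning strategies and the TSWU bound are not reproduced.

```python
from itertools import product


def descendants(item, children):
    """All items at or below `item` in the taxonomy, `item` included.

    children: hypothetical mapping parent -> list of direct children.
    """
    out, stack = [], [item]
    while stack:
        x = stack.pop()
        out.append(x)
        stack.extend(children.get(x, []))
    return out


def specializations(g_sequence, children):
    """Every sequence obtained by replacing each item of a general sequence
    with itself or one of its descendants (candidate special sequences)."""
    options = [descendants(x, children) for x in g_sequence]
    return [tuple(choice) for choice in product(*options)]
```

With `children = {"fruit": ["apple", "pear"], "drink": ["tea"]}`, the general sequence `("fruit", "drink")` yields six candidates, from `("fruit", "drink")` itself to `("apple", "tea")`. A real miner would then keep only the high-utility candidates.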

Constraint-based sequential pattern mining: the pattern-growth methods

Journal of Intelligent Information Systems ◽

10.1007/s10844-006-0006-z ◽

2007 ◽

Vol 28 (2) ◽

pp. 133-160 ◽

Cited By ~ 125

Author(s):

Jian Pei ◽

Jiawei Han ◽

Wei Wang

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Pattern Growth ◽

Growth Methods

MINING TOP-K FREQUENT SEQUENTIAL PATTERN IN ITEM INTERVAL EXTENDED SEQUENCE DATABASE

Journal of Computer Science and Cybernetics ◽

10.15625/1813-9663/34/3/13053 ◽

2018 ◽

Vol 34 (3) ◽

pp. 249-263

Author(s):

Duong Huy Tran ◽

Thang Truong Nguyen ◽

Thi Duc Vu ◽

Anh The Tran

Keyword(s):

Pattern Mining ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequence Database ◽

Extended Sequence ◽

Support Threshold ◽

Interesting Task ◽

Frequent Sequential Pattern

Frequent sequential pattern mining in item interval extended sequence databases (iSDB) has been an interesting task in recent years. Unlike classic frequent sequential pattern mining, pattern mining in iSDB also considers the item interval between successive items; thus, it may extract sequential patterns that are more meaningful in real life. Most previous algorithms for frequent sequential pattern mining in iSDB need a minimum support threshold (minsup) to perform the mining. However, it is not easy for users to provide an appropriate threshold in practice: too high a minsup value leads to missing valuable patterns, while too low a minsup value may generate too many useless patterns. To address this problem, we propose TopKWFP, an algorithm for top-k weighted frequent sequential pattern mining in item interval extended sequence databases. Our algorithm does not require a fixed minsup value; instead, minsup is raised dynamically during the mining process.

A pattern growth-based sequential pattern mining algorithm called prefixSuffixSpan

ICST Transactions on Scalable Information Systems ◽

10.4108/eai.18-1-2017.152103 ◽

2017 ◽

Vol 4 (12) ◽

pp. 152103

Author(s):

Kenmogne Edith Belise ◽

Tadmon Calvin ◽

Nkambou Roger

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Mining Algorithm ◽

Pattern Growth
