Mining Closed Repetitive Gapped Sequential Patterns Based on Repetition Linked WAP-Tree

Closed repetitive gapped sequential pattern mining has gained increasing attention in recent years. In this paper, we propose a novel method, MRCGP (mining closed repetitive gapped sequential patterns based on a repetition linked WAP-Tree). In the first step of MRCGP, the given sequential database is transformed into a new database in which every item is expressed by its landmark. A positional information table (PIT), which records all of the position information of the 1-frequent items, is then constructed; all of the repetitive gapped 2-sequential patterns of different items (RPDI) can be obtained by searching this table. Next, a repetitive linked web access pattern tree (RLWAP-Tree) is built. In the RLWAP-Tree, the 1-frequent items are stored in a header table; each item in the header table is linked by a solid line to its earliest occurrence in each sequence represented in the tree, and every item in the tree is linked by a broken line to the other occurrences of the same item in the same sequence. By recursively mining the projection trees of the existing repetitive gapped patterns, we obtain the repetitive gapped sequential patterns. Finally, we obtain the closed repetitive gapped sequential patterns by checking the inclusion relation between every pair of patterns. The experimental results show that MRCGP has better time efficiency.
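The positional information table described in this abstract can be illustrated with a minimal sketch. The function names `build_pit` and `ordered_pair_occurrences`, the `min_count` threshold (applied here to total occurrences), and the toy database are assumptions made for illustration; they are not taken from the paper, and the sketch does not compute the paper's repetitive support, since it lists every ordered occurrence pair, including overlapping ones.

```python
from collections import defaultdict


def build_pit(database, min_count):
    """Map each item to its (sequence_id, position) occurrences.

    Items with fewer than min_count total occurrences are dropped
    (an assumed stand-in for the paper's frequency threshold).
    """
    pit = defaultdict(list)
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            pit[item].append((sid, pos))
    return {item: occ for item, occ in pit.items() if len(occ) >= min_count}


def ordered_pair_occurrences(pit, a, b):
    """List (sid, pos_a, pos_b) where a occurs before b in the same sequence.

    Gaps between a and b are allowed; occurrences may overlap.
    """
    return [
        (sid_a, pos_a, pos_b)
        for sid_a, pos_a in pit.get(a, [])
        for sid_b, pos_b in pit.get(b, [])
        if sid_a == sid_b and pos_a < pos_b
    ]


db = ["ABCAB", "BACB"]  # hypothetical toy database
pit = build_pit(db, min_count=2)
pairs = ordered_pair_occurrences(pit, "A", "B")
```

Looking up two items in the table is enough to enumerate candidate 2-sequences of different items without rescanning the database, which is the role the abstract assigns to the PIT.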


Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey

10.1063/1.3526223 ◽

2010 ◽

Cited By ~ 1

Author(s):

G. Shivaprasad ◽

N. V. Subbareddy ◽

U. Dinesh Acharya ◽

R. B. Patel ◽

B. P. Singh

Keyword(s):

Research And Development ◽

Knowledge Discovery ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Access Pattern ◽

Web Usage ◽

Web Access ◽

Web Access Pattern ◽

Usage Data


Efficient Mining of Robust Closed Weighted Sequential Patterns Without Information Loss

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213015500074 ◽

2015 ◽

Vol 24 (01) ◽

pp. 1550007 ◽

Cited By ~ 16

Author(s):

Unil Yun ◽

Gwangbum Pyun ◽

Eunchul Yoon

Keyword(s):

High Performance ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Information Loss ◽

Sequential Pattern ◽

Sequential Patterns ◽

Performance Tests ◽

Web Access ◽

Complete Set ◽

Access Patterns

Sequential pattern mining has become one of the most important topics in data mining. It has broad applications such as analyzing customer purchase data, Web access patterns, network traffic data, DNA sequencing, and so on. Previous studies have concentrated on reducing redundant patterns among the sequential patterns, and on finding meaningful patterns from huge datasets. In sequential pattern mining, closed sequential pattern mining and weighted sequential pattern mining are the two main approaches to perform mining tasks. This is because closed sequential pattern mining finds representative sequential patterns which show exactly the same knowledge as the complete set of frequent sequential patterns, and weight-based sequential pattern mining discovers important sequential patterns by considering the importance of each sequential pattern. In this paper, we study the problem of mining robust closed weighted sequential patterns by integrating two paradigms from large sequence databases. We first show that the joining order between the weight constraints and the closure property in sequential pattern mining leads to different sets of results. From our analysis of joining orders, we suggest robust closed weighted sequential pattern mining without information loss, and present how to discover representative important sequential patterns without information loss. Through performance tests, we show that our approach gives high performance in terms of efficiency, effectiveness, memory usage, and scalability.
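The closure property mentioned in this abstract can be sketched as a post-filter: a pattern is closed when no proper super-sequence has the same support. The example below is a minimal illustration only. It ignores weights entirely, the helper names and the support values are hypothetical, and it is not the authors' algorithm.

```python
def is_subsequence(short, long):
    """True if the items of short appear in long in the same order (gaps allowed)."""
    it = iter(long)
    return all(item in it for item in short)


def closed_patterns(supports):
    """Keep patterns that no longer pattern with equal support contains."""
    closed = {}
    for pattern, sup in supports.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_sup == sup
            and is_subsequence(pattern, other)
            for other, other_sup in supports.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed


# Hypothetical supports for a handful of frequent patterns.
supports = {("a",): 3, ("b",): 2, ("a", "b"): 2, ("a", "c"): 1}
result = closed_patterns(supports)
```

Here `("b",)` is dropped because `("a", "b")` contains it and has the same support, so its support can be recovered from the closed set.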


Web Personalization Based on Enhanced Web Access Pattern using Sequential Pattern Mining

International Journal Of Engineering And Computer Science ◽

10.18535/ijecs/v5i7.08 ◽

2016 ◽

Author(s):

Kuldeep Singh Rathore ◽

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Access Pattern ◽

Web Personalization ◽

Web Access ◽

Web Access Pattern


Web Access Pattern Mining – A Survey

Lecture Notes in Computer Science - Data Engineering and Management ◽

10.1007/978-3-642-27872-3_4 ◽

2012 ◽

pp. 24-31 ◽

Cited By ~ 1

Author(s):

A. Rajimol ◽

G. Raju

Keyword(s):

Pattern Mining ◽

Access Pattern ◽

Web Access ◽

Web Access Pattern


Sequential Pattern Mining Algorithm Based on Text Data: Taking the Fault Text Records as an Example

Sustainability ◽

10.3390/su10114330 ◽

2018 ◽

Vol 10 (11) ◽

pp. 4330 ◽

Cited By ~ 2

Author(s):

Xinglong Yuan ◽

Wenbing Chang ◽

Shenghan Zhou ◽

Yang Cheng

Keyword(s):

Time Series ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Fault Classification ◽

Sequential Patterns ◽

Series Data ◽

Similarity Measurement ◽

Text Similarity ◽

Text Data

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different textual forms, traditional SPM algorithms cannot be applied directly to text data. The proposed algorithm is designed to solve this problem. First, the study measures the similarity of fault text data and groups similar faults into one class. To do so, it proposes a new text similarity measurement model based on the word embedding distance. Compared with the classic text similarity measurement method, this model achieves good results in short text classification. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, which is a soft time constraint for obtaining a certain number of sequential patterns according to needs. Finally, the study uses the fault text records of a certain aircraft as experimental data for mining fault sequential patterns. The experiments show that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business, and finance.
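The abstract does not define the event window precisely. As a rough illustration only, the sketch below assumes the window limits how many later events may follow the first event of a pattern, and it counts ordered pairs of fault classes that fall inside that window. The function name `windowed_pairs`, the `window` parameter, and the fault class labels are all hypothetical.

```python
from collections import Counter


def windowed_pairs(events, window):
    """Count ordered pairs (first, second) where second occurs
    within the next `window` events after first."""
    counts = Counter()
    for i, first in enumerate(events):
        for second in events[i + 1 : i + 1 + window]:
            counts[(first, second)] += 1
    return counts


# Hypothetical sequence of fault classes, already grouped by similarity.
log = ["hydraulic", "sensor", "hydraulic", "engine"]
counts = windowed_pairs(log, window=2)
```

Widening or narrowing `window` changes how many candidate pairs are produced, which matches the abstract's description of the window as a way to tune the number of patterns obtained.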


Detecting Implicit Security Exceptions Using an Improved Variable-Length Sequential Pattern Mining Method

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500462 ◽

2017 ◽

Vol 27 (08) ◽

pp. 1235-1268

Author(s):

Jinfu Chen ◽

Saihua Cai ◽

Dave Towey ◽

Lili Zhu ◽

Rubing Huang ◽

...

Keyword(s):

Visual Inspection ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Variable Length ◽

Sequential Pattern ◽

Sequential Patterns ◽

Mining Method ◽

Security Testing ◽

String Searching ◽

Correct Execution

The process of component security testing can produce massive amounts of monitor logs. Current approaches to detect implicit security exceptions (those which cannot be identified by visual inspection alone) compare correct execution sequences with fixed patterns mined from the execution of sequential patterns in the monitor logs. However, this is not efficient and is not suitable for mining large monitor logs. To enable effective mining of implicit security exceptions from large monitor logs, this paper proposes a method based on improved variable-length sequential pattern mining. The proposed method first mines the variable-length sequential patterns from correct execution sequences and from actual execution sequences, thus reducing the number of patterns. The sequential patterns are then detected using the Sunday string-searching algorithm. We conducted an experimental study based on this method, the results of which show that the proposed method can efficiently detect the implicit security exceptions of components.
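For readers unfamiliar with the Sunday string-searching algorithm named in this abstract, a minimal sketch follows. It shows the standard algorithm on plain strings; how the paper encodes execution sequences and mined patterns as search inputs is not stated in the abstract, so the function name and the example inputs are assumptions.

```python
def sunday_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    On a mismatch, the window is shifted according to the character
    immediately after the current window (the Sunday heuristic).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # Shift for a character = distance from its last occurrence in the
    # pattern to one position past the pattern's end.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    i = 0
    while i <= n - m:
        if text[i : i + m] == pattern:
            return i
        if i + m >= n:
            break  # no character after the window; nothing left to try
        # Characters absent from the pattern allow a jump past them.
        i += shift.get(text[i + m], m + 1)
    return -1


haystack = "here is a simple example"
index = sunday_search(haystack, "example")
```

Because the shift is decided by the character just past the window, the search can often skip more than one position per mismatch, which is why it suits scanning large logs.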


Mining of Sequential Patterns using Directed Graphs

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2242.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4002-4007

Keyword(s):

Pattern Mining ◽

Directed Graphs ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Sequence Database ◽

Directed Paths ◽

Digraph Model

Sequential pattern mining is one of the important functionalities of data mining. It analyzes a sequential database to discover sequential patterns, focusing on extracting interesting subsequences from a set of sequences. Various factors such as rate of occurrence, length, and profit are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications, since data is naturally encoded as sequences of symbols in many fields such as bioinformatics, e-learning, market basket analysis, text analysis, and webpage click-stream analysis. A wide variety of competent algorithms, such as PrefixSpan, GSP, and FreeSpan, have been proposed during the past few years. In this paper we propose a data model for organizing the sequential database, which consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS that represents the sequential data, for discovering sequential patterns from a sequence database. Competent algorithms for constructing the digraph model (DGS) for extracting all sequential patterns and mining association rules are proposed. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.
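The idea of representing a sequence database as a directed graph can be illustrated with a small sketch. This is not the DGS construction from the paper, whose details the abstract does not give. It assumes one edge per pair of consecutive items and collapses parallel edges into a count; the function name `build_digraph` and the toy database are hypothetical.

```python
from collections import Counter


def build_digraph(database):
    """Count directed edges between consecutive items across all sequences.

    Cycles are allowed; parallel edges are merged into an edge weight.
    """
    edges = Counter()
    for seq in database:
        for src, dst in zip(seq, seq[1:]):
            edges[(src, dst)] += 1
    return edges


# Hypothetical toy sequence database.
db = [["a", "b", "c"], ["a", "b", "a"], ["b", "c"]]
graph = build_digraph(db)
```

Each original sequence corresponds to a directed path through this graph, and the edge weights give a quick view of which transitions are frequent.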


Sequential Pattern Mining from Sequential Data

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch067 ◽

2009 ◽

pp. 622-631

Author(s):

Shigeaki Sakurai

Keyword(s):

Pattern Mining ◽

Pattern Discovery ◽

Sequential Pattern ◽

The Other ◽

Sequential Patterns ◽

Sequential Data ◽

Frequent Patterns ◽

New Knowledge ◽

Discovery Method ◽

Time Information

Owing to the progress of computer and network environments, it is easy to collect data with time information such as daily business reports, weblog data, and physiological information. This is the context in which methods of analyzing data with time information have been studied. This chapter focuses on a sequential pattern discovery method from discrete sequential data. The methods proposed by Pei et al. (2001), Srikant & Agrawal (1996), and Zaki (2001) efficiently discover the frequent patterns as characteristic patterns. However, the discovered patterns do not always correspond to the interests of analysts, because the patterns are common and are not a source of new knowledge for the analysts. The problem has been pointed out in connection with the discovery of associative rules. Blanchard et al. (2005), Brin et al. (1997), Silberschatz et al. (1996), and Suzuki et al. (2005) propose other criteria in order to discover other kinds of characteristic patterns. The patterns discovered by the criteria are not always frequent but are characteristic of viewpoints. The criteria may be applicable to discovery methods of sequential patterns. However, these criteria do not satisfy the Apriori property. It is difficult for the methods based on the criteria to efficiently discover the patterns. On the other hand, methods that use the background knowledge of analysts have been proposed in order to discover sequential patterns corresponding to the interests of analysts (Garofalakis et al., 1999; Pei et al., 2002; Sakurai et al., 2008b; Yen, 2005).


Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3487046 ◽

2022 ◽

Vol 16 (3) ◽

pp. 1-26

Author(s):

Jerry Chun-Wei Lin ◽

Youcef Djenouri ◽

Gautam Srivastava ◽

Yuanfa Li ◽

Philip S. Yu

Keyword(s):

Large Scale ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Main Memory ◽

Frequent Itemset ◽

Sequential Pattern ◽

Sequential Patterns ◽

Speed Up ◽

Mapreduce Model ◽

High Utility

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades, since it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining performance. This assumption is not realistic in large-scale environments, since in real industry the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are utilized in the developed framework to accelerate the computation for mining the required patterns. From the results, we can observe that the designed model has good performance on large-scale datasets in terms of runtime, memory, efficiency of the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.


Method of forming multi-leveled sequential patterns

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.02-03.158 ◽

2016 ◽

pp. 158-163

Author(s):

A.V. Moldavskaya ◽

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

New Form

The research addresses the problem of the large volume of results produced by sequential pattern mining. A new form of sequential patterns is proposed. The requirements for a software implementation of the described method are introduced. The results of experiments based on real malware behavior data are demonstrated.
