Sequential Pattern Mining Algorithm Based on Text Data: Taking the Fault Text Records as an Example

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different forms, traditional SPM algorithms cannot be applied to it directly. The proposed algorithm is designed to solve this problem. First, the study measures the similarity of fault text records and groups similar faults into one class. Next, the paper proposes a new text similarity measurement model based on word embedding distance. Compared with classic text similarity measures, this model achieves good results in short text classification. Then, building on the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint that makes it possible to obtain a desired number of sequential patterns. Finally, the study uses the fault text records of an aircraft as experimental data for mining fault sequential patterns. The experiments show that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business, and finance.

A hybrid context-aware approach for e-tourism package recommendation based on asymmetric similarity measurement and sequential pattern mining

Electronic Commerce Research and Applications ◽

10.1016/j.elerap.2020.100978 ◽

2020 ◽

Vol 42 ◽

pp. 100978 ◽

Cited By ~ 2

Author(s):

Maral Kolahkaj ◽

Ali Harounabadi ◽

Alireza Nikravanshalmani ◽

Rahim Chinipardaz

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Similarity Measurement ◽

Context Aware ◽

Asymmetric Similarity

Detecting Implicit Security Exceptions Using an Improved Variable-Length Sequential Pattern Mining Method

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500462 ◽

2017 ◽

Vol 27 (08) ◽

pp. 1235-1268

Author(s):

Jinfu Chen ◽

Saihua Cai ◽

Dave Towey ◽

Lili Zhu ◽

Rubing Huang ◽

...

Keyword(s):

Visual Inspection ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Variable Length ◽

Sequential Pattern ◽

Sequential Patterns ◽

Mining Method ◽

Security Testing ◽

String Searching ◽

Correct Execution

The process of component security testing can produce massive amounts of monitor logs. Current approaches to detect implicit security exceptions (those which cannot be identified by visual inspection alone) compare correct execution sequences with fixed patterns mined from the execution of sequential patterns in the monitor logs. However, this is not efficient and is not suitable for mining large monitor logs. To enable effective mining of implicit security exceptions from large monitor logs, this paper proposes a method based on improved variable-length sequential pattern mining. The proposed method first mines the variable-length sequential patterns from correct execution sequences and from actual execution sequences, thus reducing the number of patterns. The sequential patterns are then detected using the Sunday string-searching algorithm. We conducted an experimental study based on this method, the results of which show that the proposed method can efficiently detect the implicit security exceptions of components.
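
The Sunday string-searching step mentioned in the abstract can be illustrated with a minimal sketch. This is a generic implementation of the Sunday algorithm, not the authors' code; the function name `sunday_search` and the sample inputs are hypothetical. It works on strings, and also on lists of hashable event symbols as long as text and pattern have the same type.

```python
def sunday_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # For each symbol, the shift that aligns its rightmost occurrence in the
    # pattern with the text symbol just past the current window.
    shift = {symbol: m - i for i, symbol in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break  # no symbol after the window, so nothing left to scan
        # Symbols absent from the pattern allow a jump past the whole window.
        i += shift.get(text[i + m], m + 1)
    return hits


if __name__ == "__main__":
    print(sunday_search("abcabcabd", "abd"))
```

The key design point is that the shift is decided by the symbol immediately after the current window, which often lets the search skip more positions than a naive scan.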

Mining of Sequential Patterns using Directed Graphs

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2242.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4002-4007

Keyword(s):

Pattern Mining ◽

Directed Graphs ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Sequence Database ◽

Directed Paths ◽

Digraph Model

Sequential pattern mining is one of the important functionalities of data mining. It analyzes sequential databases to discover sequential patterns, focusing on extracting interesting subsequences from a set of sequences. Various factors, such as rate of occurrence, length, and profit, are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications because data is naturally encoded as sequences of symbols in many fields, such as bioinformatics, e-learning, market basket analysis, text analysis, and webpage click-stream analysis. Many efficient algorithms, such as PrefixSpan, GSP, and FreeSpan, have been proposed in the past few years. In this paper we propose a data model for organizing the sequential database. It consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS that represents sequential data, for discovering sequential patterns from a sequence database. Efficient algorithms are proposed for constructing the digraph model (DGS), extracting all sequential patterns, and mining association rules. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.

Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3487046 ◽

2022 ◽

Vol 16 (3) ◽

pp. 1-26

Author(s):

Jerry Chun-Wei Lin ◽

Youcef Djenouri ◽

Gautam Srivastava ◽

Yuanfa Li ◽

Philip S. Yu

Keyword(s):

Large Scale ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Main Memory ◽

Frequent Itemset ◽

Sequential Pattern ◽

Sequential Patterns ◽

Speed Up ◽

Mapreduce Model ◽

High Utility

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM methods have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industry the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.
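
To make the notion of utility concrete, the following is a small single-machine sketch, not the MapReduce model from the article. It assumes a common HUSPM convention: each event is an `(item, quantity)` pair, each item has a unit profit, and the utility of a pattern in a sequence is the maximum over all of its occurrences. The function name `pattern_utility` and the toy data are hypothetical.

```python
def pattern_utility(sequence, pattern, profit):
    """Maximum utility of pattern over all its occurrences in one sequence.

    sequence: list of (item, quantity) events in time order
    pattern:  tuple of items that must appear in this order
    profit:   dict mapping item -> unit profit
    Returns None when the pattern does not occur in the sequence.
    """
    missing = float("-inf")
    # best[j] = highest utility of any match of pattern[:j] seen so far
    best = [0.0] + [missing] * len(pattern)
    for item, quantity in sequence:
        # Walk j downwards so that one event extends a match at most once.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != missing:
                best[j] = max(best[j], best[j - 1] + quantity * profit[item])
    return None if best[-1] == missing else best[-1]


if __name__ == "__main__":
    seq = [("a", 1), ("b", 2), ("a", 3), ("b", 1)]
    print(pattern_utility(seq, ("a", "b"), {"a": 2, "b": 5}))
```

Summing this value over every sequence in a database gives the pattern's total utility, which can then be compared against a minimum-utility threshold.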

Method of forming multi-leveled sequential patterns

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.02-03.158 ◽

2016 ◽

pp. 158-163

Author(s):

A.V. Moldavskaya ◽

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

New Form

The research addresses the problem of the large volume of results produced by sequential pattern mining. A new form of sequential patterns is proposed. The requirements for a software implementation of the described method are introduced. The results of experiments based on real malware behavior data are demonstrated.

Mining Time-Interval Sequential Patterns with High Utility from Transaction Databases

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2016.p1018 ◽

2016 ◽

Vol 20 (6) ◽

pp. 1018-1026 ◽

Cited By ~ 1

Author(s):

Wen-Yen Wang ◽

Anna Y.-Q. Huang ◽

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Business Practice ◽

Sequential Pattern ◽

Sequential Patterns ◽

Time Interval ◽

Business Managers ◽

Time Intervals ◽

High Utility ◽

Product Sales

The purpose of time-interval sequential pattern mining is to help superstore business managers promote product sales. Sequential pattern mining discovers the time interval patterns for items: for example, if most customers purchase product item A, and then buy items B and C after r to s and t to u days respectively, the time interval between r to s and t to u days can be provided to business managers to facilitate informed marketing decisions. We treat these time intervals as patterns to be mined, to predict the purchasing time intervals between A and B, as well as B and C. Nevertheless, little work considers the significance of product items while mining these time-interval sequential patterns. This work extends previous work and retains high-utility time interval patterns during pattern mining. This type of mining is meant to more closely reflect actual business practice. Experimental results show the differences between three mining approaches when jointly considering item utility and time intervals for purchased items. In addition to yielding more accurate patterns than the other two methods, the proposed UTMining_A method shortens execution times by delaying join processing and removing unnecessary records.
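
The idea of treating purchase gaps as patterns can be sketched as follows. This is only an illustration of collecting the interval between two items, not the UTMining_A method; the function name `interval_gaps` and the sample purchase histories are hypothetical, and each history is assumed to be sorted by day.

```python
def interval_gaps(sequences, first, second):
    """Collect the gap in days between the first purchase of `first` and the
    next later purchase of `second`, one gap per customer sequence.

    sequences: list of customer histories, each a list of (day, item)
               pairs sorted by day
    """
    gaps = []
    for history in sequences:
        day_first = next((day for day, item in history if item == first), None)
        if day_first is None:
            continue  # this customer never bought `first`
        day_second = next(
            (day for day, item in history if item == second and day > day_first),
            None,
        )
        if day_second is not None:
            gaps.append(day_second - day_first)
    return gaps


if __name__ == "__main__":
    histories = [
        [(1, "A"), (4, "B"), (9, "C")],
        [(2, "A"), (7, "B")],
        [(3, "B"), (5, "A")],  # B comes before A, so no A -> B gap
    ]
    gaps = interval_gaps(histories, "A", "B")
    print(min(gaps), max(gaps))
```

The minimum and maximum of the collected gaps give a simple range of the form "B follows A after r to s days", which is the kind of interval the abstract describes.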

Mining Frequent Utility Sequential Patterns in Progressive Databases by U-Pisa

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.8442 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1786-1795

Author(s):

K. M. V. Madan Kumar ◽

B. Srinivasa Rao

Keyword(s):

Decision Making ◽

Pattern Mining ◽

Weather Forecasting ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Recent Effort ◽

Biomedical Analysis ◽

Decision Making System ◽

Traditional Approaches

Sequential pattern mining is one of the most important aspects of data mining and plays a significant role in many applications, such as market analysis, biomedical analysis, and weather forecasting. In mining sequential patterns, using a progressive database as the input database is relatively new and has a wide impact on decision-making systems. In progressive sequential pattern mining, frequent sequences are discovered progressively with the help of a period of interest. Because the traditional frequency-based framework is not very informative for decision making, recent efforts have incorporated a utility framework instead of frequency. This addresses many typical business concerns, such as the profit value associated with each pattern. In this paper, we apply the concept of frequent utility to the progressive database and discover sequential patterns efficiently. To do so, we propose an algorithm called U-Pisa, which works progressively with the help of a quantitative progressive database. We conducted substantial experiments on the proposed algorithm and showed that it performs well.

Trend Analysis of Product Function Using Sequential Pattern Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.519-520.736 ◽

2014 ◽

Vol 519-520 ◽

pp. 736-740

Author(s):

Li Yu ◽

Zai Fang Zhang

Keyword(s):

Pattern Mining ◽

Early Stage ◽

Dynamic Change ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Design Engineers ◽

Product Function ◽

Historical Database

During the early stage of product design, it is important for design engineers to decide the most appropriate functions for various customers. To facilitate this time-consuming task, sequential pattern mining is applied to uncover useful patterns in a historical database. The mined sequential patterns reflect the dynamic change of product functions, which can help design engineers find the most suitable product functions for customers. A case study based on historical computer sales transactions illustrates the proposed method.

DISCOVERING IMPORTANT SEQUENTIAL PATTERNS WITH LENGTH-DECREASING WEIGHTED SUPPORT CONSTRAINTS

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622010003968 ◽

2010 ◽

Vol 09 (04) ◽

pp. 575-599 ◽

Cited By ~ 16

Author(s):

UNIL YUN ◽

KEUN HO RYU

Keyword(s):

Performance Test ◽

Pattern Mining ◽

Extension Property ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Efficiency And Effectiveness ◽

Growth Method ◽

Pruning Techniques

Sequential pattern mining with constraints has been developed to improve the efficiency and effectiveness of the mining process. Specifically, there are two interesting constraints for sequential pattern mining. First, some sequences are more important than others; weight constraints consider the importance of sequences and of items within sequences. Second, patterns including only a few items are interesting if they have high support, while long patterns can be interesting even though their supports are relatively small. Weight constraints and length-decreasing support constraints are two paradigms aimed at finding important sequential patterns and reducing uninteresting patterns. Although weight and length-decreasing support constraints are vital elements, it is hard to consider both constraints using previous approaches. In this paper, we integrate weight and length-decreasing support constraints by pushing the two constraints into the prefix projection growth method. For pruning, we define the Weighted Smallest Valid Extension property and apply it in our pruning methods to reduce the search space. In performance tests, we show that our algorithm mines important sequential patterns with length-decreasing support constraints.
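
A length-decreasing support constraint can be illustrated with a short sketch. The threshold function below is a made-up example, and weights are left out for brevity, so this shows only the unweighted "smallest valid extension" idea: a pattern whose support is below the threshold for its current length can become valid only if it grows to a length whose threshold it meets. The names `min_support` and `smallest_valid_length` are hypothetical and do not come from the paper.

```python
def min_support(length, start=10, floor=2):
    """Hypothetical threshold: 10 for length 1, minus 2 per extra item,
    never dropping below `floor`."""
    return max(floor, start - 2 * (length - 1))


def is_valid(length, support):
    """A pattern is valid when its support meets the threshold for its length."""
    return support >= min_support(length)


def smallest_valid_length(support, max_length=50):
    """Shortest length at which a pattern with this support would be valid,
    or None if no length up to max_length qualifies."""
    for length in range(1, max_length + 1):
        if min_support(length) <= support:
            return length
    return None


if __name__ == "__main__":
    # A length-2 pattern with support 6 is not valid yet (threshold is 8),
    # but it could become valid once extended to length 3.
    print(is_valid(2, 6), smallest_valid_length(6))
```

In a pruning step, a sequence that is too short to let the pattern reach `smallest_valid_length` cannot contribute a valid extension and can be dropped from the projected database.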

The Sequential Pattern Mining Algorithm MHSP Based on MH

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.63-64.425 ◽

2011 ◽

Vol 63-64 ◽

pp. 425-430

Author(s):

Jun Wang ◽

Ya Qiong Jiang

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Experimental Results ◽

Sequential Pattern ◽

Sequential Patterns ◽

Important Method ◽

The Real ◽

Mining Algorithm ◽

Large Projection ◽

Growth Approach

The pattern growth approach is an important method in sequential pattern mining. PrefixSpan introduced projected databases based on this approach, and the PrefixSpan algorithm can solve the problem of mining sequential patterns. However, the performance of PrefixSpan suffers when the projected databases are large. Inspired by the prefix-divide method and the MH structure, this paper proposes a new algorithm, MHSP, for sequential pattern mining. Experimental results on real datasets show that MHSP is more than twice as fast as PrefixSpan.
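
For reference, the pattern growth idea behind PrefixSpan can be sketched in a few lines. This is a simplified version for sequences of single items (no itemsets) that stores projections as `(sequence index, start position)` pairs; it is not the MHSP algorithm, and the function name `prefixspan` and the toy database are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    sequences:   list of sequences, each a list of hashable items
    min_support: minimum number of sequences that must contain a pattern
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence suffix.
        counts = {}
        for seq_index, start in projected:
            for item in set(sequences[seq_index][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in counts.items():
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project each suffix past the first occurrence of the item.
            new_projected = []
            for seq_index, start in projected:
                seq = sequences[seq_index]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((seq_index, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


if __name__ == "__main__":
    database = [list("abc"), list("acb"), list("ab")]
    for pattern, support in sorted(prefixspan(database, 2).items()):
        print(pattern, support)
```

Storing positions instead of copying suffixes keeps each projection small, but the number of projections still grows with the number of frequent prefixes, which is the cost the abstract refers to.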
