A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance

Abstract Patterns have been extensively used to extract hypernym relations from texts. The most popular patterns are Hearst’s patterns, formulated as regular expressions mainly based on lexical information. Experiments have reported good precision and low recall for such patterns. Thus, several approaches have been developed for improving recall. While these approaches perform better in terms of recall, it remains quite difficult to further increase recall without degrading precision. In this paper, we propose a novel 3-phase approach based on sequential pattern mining to improve pattern-based approaches in terms of both precision and recall by (i) using a rich pattern representation based on grammatical dependencies, (ii) discovering new hypernym patterns, and (iii) extending hypernym patterns with anti-hypernym patterns to prune wrongly extracted hypernym relations. The results obtained by performing experiments on three corpora confirm that, using our approach, we are able to learn sequential patterns and combine them to outperform existing hypernym patterns in terms of precision and recall. The comparison to unsupervised distributional baselines for hypernym detection shows that, as expected, our approach yields much better performance. When compared to supervised distributional baselines for hypernym detection, our approach can be shown to be complementary and much less tightly coupled with training datasets and corpora.
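
To make the starting point concrete, here is a minimal sketch of one lexical Hearst-style pattern ("X such as Y, Z and W") written as a regular expression. It is not the approach proposed in the paper: the regex, the function name `extract_hypernyms`, and the example sentences are assumptions for illustration, and matching single words instead of parsed noun phrases is a deliberate simplification.

```python
import re

# Hypothetical, simplified "such as" pattern over raw text.
# Real systems match noun phrases from a chunker or parser, not single words.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?:,? (?:and|or) \w+)?)")


def extract_hypernyms(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumeration on commas and on the final "and"/"or".
        hyponyms = re.split(r",? (?:and|or) |, ", match.group(2))
        pairs.extend((h, hypernym) for h in hyponyms if h)
    return pairs


print(extract_hypernyms("fruits such as apples, pears and plums"))
print(extract_hypernyms("an apple is a kind of fruit"))
```

The second call returns an empty list even though the sentence states a hypernym relation, which illustrates the low recall the abstract attributes to purely lexical patterns.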

Transportation Service Quality Improvement through Closed Sequential Pattern Mining Approach

Cybernetics and Information Technologies ◽

10.1515/cait-2016-0042 ◽

2016 ◽

Vol 16 (3) ◽

pp. 185-194

Author(s):

Haisong Huang ◽

Liguo Yao ◽

Chieh-Yuan Tsai

Keyword(s):

Quality Improvement ◽

Food Safety ◽

Dairy Products ◽

Pattern Mining ◽

Sequential Patterns ◽

Analysis Method ◽

New Approach ◽

Transportation Service ◽

Sequential Mining ◽

Food Safety And Quality

Abstract With the improvement of people’s living standards, more attention has been paid to food safety and quality. This is especially true for perishable agricultural and dairy products. Customers quite often receive poor-quality or damaged products because of mistakes or improper handling during transportation. As a result, customer satisfaction with companies’ products is relatively low. To solve this problem, this paper proposes a new approach that uses frequent closed sequential pattern mining to analyze logistics data and help companies track possible transportation problems. The approach consists of several important steps: RFID-enabled raw data collection, frequent sequential pattern mining, and pattern analysis. The experiment shows that the proposed analysis method can discover many underlying causes of transportation service problems.
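
The abstract does not describe the mining procedure itself. As a rough sketch of what "closed" means here, a frequent pattern is closed when no longer pattern that contains it has the same support. The function names and the checkpoint labels below are hypothetical, and the pattern supports are assumed to have been mined already.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def closed_patterns(patterns):
    """Keep patterns that have no longer super-sequence with equal support.

    patterns maps a tuple of items to its support count.
    """
    closed = {}
    for p, sup in patterns.items():
        absorbed = any(
            len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
            for q, sup_q in patterns.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Hypothetical supports for RFID checkpoint sequences.
mined = {
    ("dock",): 2,
    ("dock", "truck"): 2,
    ("truck",): 3,
}
print(closed_patterns(mined))
```

Here `("dock",)` is dropped because `("dock", "truck")` contains it and has the same support, while `("truck",)` is kept because its support is higher than that of any pattern containing it.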

DRL-Prefixspan: A novel pattern growth algorithm for discovering downturn, revision and launch (DRL) sequential patterns

Open Computer Science ◽

10.2478/s13537-012-0030-8 ◽

2012 ◽

Vol 2 (4) ◽

Cited By ~ 4

Author(s):

Aloysius George ◽

D. Binu

Keyword(s):

Data Mining ◽

Efficient Algorithm ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Experimental Results ◽

Sequential Pattern ◽

Product Launch ◽

Sequential Patterns ◽

Sequential Mining ◽

One Step

Abstract Discovering sequential patterns is a rather well-studied area in data mining and has found many diverse applications, such as basket analysis and telecommunications. In this article, we propose an efficient algorithm that incorporates constraints and promotion-based marketing scenarios for the mining of valuable sequential patterns. Incorporating specific constraints into the sequential mining process has enabled the discovery of more user-centered patterns. We move one step ahead and integrate three significant marketing scenarios for mining promotion-oriented sequential patterns. The promotion-based marketing scenarios considered in the proposed research are 1) product Downturn, 2) product Revision and 3) product Launch (DRL). Each of these scenarios is characterized by distinct item and adjacency constraints. We have developed a novel DRL-PrefixSpan algorithm (a tailored form of PrefixSpan) for mining DRL patterns of all lengths. The proposed algorithm has been validated on synthetic sequential databases. The experimental results demonstrate the effectiveness of incorporating the promotion-based marketing scenarios in the sequential pattern mining process.
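
For orientation, the following is a minimal sketch of plain PrefixSpan-style pattern growth, restricted to sequences of single items. It does not reproduce the DRL item and adjacency constraints, which the abstract does not specify. The function name `prefixspan` and the toy database are assumptions.

```python
def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for sequences of single items."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start of the remaining suffix).
        counts = {}
        for seq_idx, start in projected:
            for item in set(sequences[seq_idx][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project each suffix past the first occurrence of item.
            next_projected = []
            for seq_idx, start in projected:
                seq = sequences[seq_idx]
                try:
                    pos = seq.index(item, start)
                except ValueError:
                    continue
                next_projected.append((seq_idx, pos + 1))
            grow(pattern, next_projected)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Toy database of three purchase sequences (hypothetical).
db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(prefixspan(db, min_support=2))
```

A constrained variant would typically prune inside `grow`, skipping extensions that violate an item or adjacency rule, so that disallowed patterns are never expanded.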

Pattern Mining as Abduction: From Snapshots to Spatio-Temporal Sequential Patterns

Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007) ◽

10.1109/icdmw.2007.71 ◽

2007 ◽

Author(s):

Shyamanta M. Hazarika

Keyword(s):

Pattern Mining ◽

Sequential Patterns ◽

Spatio Temporal

RE-SPaM: Using Regular Expressions for Sequential Pattern Mining in Trajectory Databases

10.1109/icdmw.2008.14 ◽

2008 ◽

Cited By ~ 1

Author(s):

Leticia I. Gómez ◽

Alejandro A. Vaisman

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Regular Expressions

Traversal Pattern Mining in Web Usage Data

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch119 ◽

2008 ◽

pp. 2004-2021

Author(s):

Jenq-Foung Yao ◽

Yongqiao Xiao

Keyword(s):

Pattern Mining ◽

Pattern Discovery ◽

Web Usage Mining ◽

Sequential Patterns ◽

Web Usage ◽

Web Logs ◽

Frequent Episodes ◽

Browsing Behavior ◽

The Web ◽

Usage Data

Web usage mining aims to discover useful patterns in web usage data; these patterns provide useful information about the user’s browsing behavior. This chapter examines different types of web usage traversal patterns and the related techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of the web logs is described. Some important issues, such as privacy and sessionization, are raised, and possible solutions are also discussed.
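
Sessionization can be sketched with a simple inactivity timeout. This is a common heuristic and not necessarily the one the chapter describes. The 30-minute default, the event layout `(user, timestamp in seconds, page)`, and the function name `sessionize` are all assumptions.

```python
def sessionize(events, timeout=1800):
    """Split (user, timestamp, page) events into per-user sessions.

    A new session starts when the gap since the user's previous
    request exceeds `timeout` seconds (an assumed heuristic).
    """
    sessions = []
    last_seen = {}  # user -> timestamp of the previous request
    current = {}    # user -> page list of the session being filled
    for user, ts, page in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in current or ts - last_seen[user] > timeout:
            current[user] = []
            sessions.append((user, current[user]))
        current[user].append(page)
        last_seen[user] = ts
    return sessions


# Hypothetical log entries: (user, seconds since start, requested page).
log = [("u1", 0, "/"), ("u1", 60, "/a"), ("u2", 10, "/"), ("u1", 4000, "/b")]
print(sessionize(log))
```

Each resulting page list is one sequence that a traversal-pattern miner could take as input.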

Sequential Pattern Mining Algorithm Based on Text Data: Taking the Fault Text Records as an Example

Sustainability ◽

10.3390/su10114330 ◽

2018 ◽

Vol 10 (11) ◽

pp. 4330 ◽

Cited By ~ 2

Author(s):

Xinglong Yuan ◽

Wenbing Chang ◽

Shenghan Zhou ◽

Yang Cheng

Keyword(s):

Time Series ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Fault Classification ◽

Sequential Patterns ◽

Series Data ◽

Similarity Measurement ◽

Text Similarity ◽

Text Data

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different forms, traditional SPM algorithms cannot be directly applied to text data. The proposed algorithm is designed to solve this problem. First, this study measures the similarity of fault text data and classifies similar faults into one class. To do so, it proposes a new text similarity measurement model based on the word embedding distance. Compared with the classic text similarity measurement method, this model can achieve good results in short text classification. Then, on the basis of fault classification, this paper proposes an SPM algorithm with an event window, which is a soft time constraint for obtaining a certain number of sequential patterns according to needs. Finally, this study uses the fault text records of a certain aircraft as experimental data for mining fault sequential patterns. Experiments show that this algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business, and finance.

Detecting Implicit Security Exceptions Using an Improved Variable-Length Sequential Pattern Mining Method

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194017500462 ◽

2017 ◽

Vol 27 (08) ◽

pp. 1235-1268

Author(s):

Jinfu Chen ◽

Saihua Cai ◽

Dave Towey ◽

Lili Zhu ◽

Rubing Huang ◽

...

Keyword(s):

Visual Inspection ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Variable Length ◽

Sequential Pattern ◽

Sequential Patterns ◽

Mining Method ◽

Security Testing ◽

String Searching ◽

Correct Execution

The process of component security testing can produce massive amounts of monitor logs. Current approaches to detect implicit security exceptions (those which cannot be identified by visual inspection alone) compare correct execution sequences with fixed patterns mined from the execution of sequential patterns in the monitor logs. However, this is not efficient and is not suitable for mining large monitor logs. To enable effective mining of implicit security exceptions from large monitor logs, this paper proposes a method based on improved variable-length sequential pattern mining. The proposed method first mines the variable-length sequential patterns from correct execution sequences and from actual execution sequences, thus reducing the number of patterns. The sequential patterns are then detected using the Sunday string-searching algorithm. We conducted an experimental study based on this method, the results of which show that the proposed method can efficiently detect the implicit security exceptions of components.
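
The Sunday string-searching algorithm named in the abstract can be sketched as follows. This shows only the generic algorithm on plain strings; how the paper encodes execution sequences and mined patterns as searchable text is not stated, so the example inputs are hypothetical.

```python
def sunday_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift for the character just past the current window: distance from
    # its last occurrence in the pattern to the pattern's end, plus one.
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        if i + m >= n:
            break  # no character after the window, so no further shift
        # Characters absent from the pattern let the window jump past them.
        i += shift.get(text[i + m], m + 1)
    return hits


print(sunday_search("abcabcab", "cab"))
```

The shift is decided by the single character that follows the window, which is what distinguishes Sunday's variant from Boyer-Moore-style searches that inspect characters inside the window.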

Multi-Relational Sequential Pattern Mining Based on Iceberg Concept Lattice

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.109.729 ◽

2011 ◽

Vol 109 ◽

pp. 729-733

Author(s):

Jiang Yin ◽

Yun Li ◽

Cen Cheng Shen ◽

Bo Liu

Keyword(s):

Data Mining ◽

Time Complexity ◽

Pattern Mining ◽

Concept Lattice ◽

Optimization Methods ◽

Sequential Pattern Mining ◽

Experimental Results ◽

Sequential Pattern ◽

Sequential Mining ◽

Mining Methods

Multi-relational sequential mining is one of the areas of data mining that has developed rapidly in recent years. However, the performance of traditional mining methods is not ideal. To mine patterns effectively, we propose an algorithm based on the Iceberg concept lattice that adopts partition and merge optimization methods to mine only the frequent sequences. Experimental results show that this algorithm effectively reduces the time complexity of multi-relational sequential pattern mining.

Mining of Sequential Patterns using Directed Graphs

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2242.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4002-4007

Keyword(s):

Pattern Mining ◽

Directed Graphs ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Sequence Database ◽

Directed Paths ◽

Digraph Model

Sequential pattern mining is one of the important functionalities of data mining. It is used to analyze sequential databases and discover sequential patterns, and it focuses on extracting interesting subsequences from a set of sequences. Various factors such as rate of occurrence, length, and profit are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications, since data is naturally encoded as sequences of symbols in many fields such as bioinformatics, e-learning, market basket analysis, texts, and webpage click-stream analysis. A wide variety of efficient algorithms such as PrefixSpan, GSP, and FreeSpan have been proposed during the past few years. In this paper we propose a data model for organizing the sequential database, which consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS to represent sequential data for discovering sequential patterns from a sequence database. Efficient algorithms for constructing the digraph model (DGS) for extracting all sequential patterns and mining association rules are proposed. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.
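
The abstract gives no construction details for the DGS model, so the sketch below illustrates only the underlying notion of "rate of occurrence": the support of a candidate subsequence in a sequence database. The function names and the toy database are assumptions.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in database that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


# Hypothetical sequence database.
database = [list("abcd"), list("acbd"), list("bd")]
print(support(["b", "d"], database))
print(support(["b", "c"], database))
```

Gaps between matched items are allowed, which is why `["b", "d"]` is found in all three sequences while `["b", "c"]` is found only in the first.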

Frequent Sequential Patterns (FSP) Algorithm for Finding Mutations in BRCA2 Gene

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6507.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 8585-8586

Keyword(s):

Input Data ◽

Pattern Mining ◽

Brca2 Gene ◽

Threshold Value ◽

Biological Data ◽

Frequent Pattern ◽

Sequential Patterns ◽

Cancer Gene ◽

Abnormal Increase ◽

Breast Cancer Gene

Sequential pattern mining (SPM) is a fundamental task in data mining. SPM mines subsequences from a given sequence, which can be used for various analyses. This paper proposes an efficient method for mining frequent sequential patterns in biological data. It also uses k-mers to decompose the sequence according to a user-defined threshold value. The input data are the normal and mutated forms of the breast cancer gene BRCA2. The parameters used for analysis are suffix, candidate pattern, and frequent pattern. The suffix value increases for mono-, di-, and trinucleotides in the mutated gene, and among frequent patterns the trinucleotides show an increase in the mutated gene. This abnormal increase in patterns may lead to cancer in humans.
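
The k-mer decomposition step can be sketched as counting overlapping substrings of length k and keeping those that meet a threshold. This is an assumed reading of the abstract, not the paper's FSP algorithm. The function name `frequent_kmers` and the short sequence are hypothetical, and the sequence is not real BRCA2 data.

```python
from collections import Counter


def frequent_kmers(sequence, k, min_count):
    """Count overlapping k-mers and keep those seen at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: c for kmer, c in counts.items() if c >= min_count}


# Hypothetical toy sequence for illustration only.
toy = "ATGATGCA"
print(frequent_kmers(toy, k=1, min_count=3))  # mononucleotides
print(frequent_kmers(toy, k=3, min_count=2))  # trinucleotides
```

Running the same function on a normal and a mutated sequence and comparing the resulting counts would give the kind of mono-, di-, and trinucleotide comparison the abstract mentions.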
