A 3-phase approach based on sequential mining and dependency parsing for enhancing hypernym patterns performance

2021 ◽  
Vol 36 ◽  
Author(s):  
Ahmad Issa Alaa Aldine ◽  
Mounira Harzallah ◽  
Giuseppe Berio ◽  
Nicolas Béchet ◽  
Ahmad Faour

Abstract Patterns have been extensively used to extract hypernym relations from texts. The most popular patterns are Hearst’s patterns, formulated as regular expressions mainly based on lexical information. Experiments have reported good precision and low recall for such patterns. Thus, several approaches have been developed for improving recall. While these approaches perform better in terms of recall, it remains quite difficult to further increase recall without degrading precision. In this paper, we propose a novel 3-phase approach based on sequential pattern mining to improve pattern-based approaches in terms of both precision and recall by (i) using a rich pattern representation based on grammatical dependencies, (ii) discovering new hypernym patterns, and (iii) extending hypernym patterns with anti-hypernym patterns to prune wrongly extracted hypernym relations. The results obtained by performing experiments on three corpora confirm that using our approach, we are able to learn sequential patterns and combine them to outperform existing hypernym patterns in terms of precision and recall. The comparison to unsupervised distributional baselines for hypernym detection shows that, as expected, our approach yields much better performance. When compared to supervised distributional baselines for hypernym detection, our approach can be shown to be complementary and much less loosely coupled with training datasets and corpora.
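For readers unfamiliar with Hearst's patterns, the following is a minimal sketch of one lexical pattern ("X such as Y1, Y2 and Y3") written as a regular expression. It is not the dependency-based sequential patterns proposed in the paper; the names `SUCH_AS` and `extract_hypernym_pairs` are hypothetical, and the sketch assumes single-word noun phrases.

```python
import re

# One Hearst-style lexical pattern: "<hypernym> such as <hyponym>, <hyponym> and <hyponym>".
# Assumption: every noun phrase is a single word; a real system would use a chunker or parser.
SUCH_AS = re.compile(
    r"(?P<hyper>\w+),? such as "
    r"(?P<hypos>\w+(?:, (?!and\b|or\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper")
        # Split the enumeration on ", ", " and ", " or " (with an optional serial comma).
        hyponyms = re.split(r",? (?:and|or) |, ", match.group("hypos"))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs

print(extract_hypernym_pairs("He plays instruments such as guitar, piano and violin."))
```

A purely lexical pattern like this one illustrates why precision tends to be good and recall low: it only fires on one surface form of the relation.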

2016 ◽  
Vol 16 (3) ◽  
pp. 185-194
Author(s):  
Haisong Huang ◽  
Liguo Yao ◽  
Chieh-Yuan Tsai

Abstract As people’s quality of life improves, more attention is being paid to food safety and quality. This is especially true for perishable agricultural and dairy products. Customers quite often receive spoiled or damaged products because of mistakes or improper handling during transportation. As a result, customer satisfaction with companies’ products is relatively low. To solve this problem, this paper proposes a new approach that uses frequent closed sequential pattern mining to analyze logistics data and help companies track possible transportation problems. The approach consists of several important steps: RFID-enabled raw data collection, frequent sequential pattern mining, and pattern analysis. The experiment shows that the proposed analysis method can discover many underlying causes of transportation service problems.


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Aloysius George ◽  
D. Binu

Abstract Discovering sequential patterns is a rather well-studied area in data mining and has found many diverse applications, such as basket analysis, telecommunications, etc. In this article, we propose an efficient algorithm that incorporates constraints and promotion-based marketing scenarios for the mining of valuable sequential patterns. Incorporating specific constraints into the sequential mining process has enabled the discovery of more user-centered patterns. We move one step ahead and integrate three significant marketing scenarios for mining promotion-oriented sequential patterns. The promotion-based market scenarios considered in the proposed research are 1) product Downturn, 2) product Revision and 3) product Launch (DRL). Each of these scenarios is characterized by distinct item and adjacency constraints. We have developed a novel DRL-PrefixSpan algorithm (a tailored form of PrefixSpan) for mining DRL patterns of all lengths. The proposed algorithm has been validated on synthetic sequential databases. The experimental results demonstrate the effectiveness of incorporating the promotion-based marketing scenarios in the sequential pattern mining process.
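As background for the PrefixSpan family mentioned above, here is a minimal sketch of the basic prefix-projection idea. It is plain PrefixSpan over sequences of single items, without the item and adjacency constraints of DRL-PrefixSpan; the function name `prefixspan` and the toy data are assumptions for illustration.

```python
from collections import defaultdict

def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for all frequent sequential patterns.

    Assumption: each sequence is a list of single items (no itemsets).
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence_index, start_position) pairs: the suffixes
        # that remain after matching the current prefix.
        support = defaultdict(set)
        first_pos = {}
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if (item, seq_id) not in first_pos:
                    first_pos[(item, seq_id)] = pos
                    support[item].add(seq_id)
        for item, seq_ids in support.items():
            if len(seq_ids) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(seq_ids)
            # Project each supporting sequence past the first occurrence of item.
            mine(pattern, [(s, first_pos[(item, s)] + 1) for s in seq_ids])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results

toy_db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
print(prefixspan(toy_db, min_support=2))
```

A constrained variant would filter which items may extend a prefix inside `mine`; how DRL-PrefixSpan does this is not specified in the abstract.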


2008 ◽  
pp. 2004-2021
Author(s):  
Jenq-Foung Yao ◽  
Yongqiao Xiao

Web usage mining aims to discover useful patterns in web usage data; these patterns provide useful information about users’ browsing behavior. This chapter examines different types of web usage traversal patterns and the related techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of the web logs is described. Some important issues, such as privacy and sessionization, are raised, and possible solutions are also discussed.
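Sessionization, the preprocessing step mentioned above, can be sketched with a simple inactivity timeout. This is one common heuristic, not necessarily the one the chapter describes; the `(user, timestamp, url)` record layout, the 30-minute default, and the name `sessionize` are assumptions.

```python
from collections import defaultdict

def sessionize(log_entries, timeout_seconds=1800):
    """Split (user, timestamp, url) entries into per-user sessions.

    Assumption: a gap longer than timeout_seconds between two requests
    from the same user starts a new session.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in log_entries:
        by_user[user].append((timestamp, url))

    sessions = []
    for user in sorted(by_user):
        current, last_time = [], None
        for timestamp, url in sorted(by_user[user]):
            if last_time is not None and timestamp - last_time > timeout_seconds:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_time = timestamp
        sessions.append((user, current))
    return sessions

log = [("u1", 0, "/a"), ("u1", 100, "/b"), ("u1", 5000, "/c"), ("u2", 50, "/a")]
print(sessionize(log))
```

Each resulting session is an ordered list of pages, which is the input shape that sequential pattern miners expect.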


2018 ◽  
Vol 10 (11) ◽  
pp. 4330 ◽  
Author(s):  
Xinglong Yuan ◽  
Wenbing Chang ◽  
Shenghan Zhou ◽  
Yang Cheng

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Because text data is poorly structured and the same concept can be expressed in many different forms, traditional SPM algorithms cannot be directly applied to text data. The proposed algorithm is designed to solve this problem. First, this study measures the similarity of fault text data and groups similar faults into one class. To do so, the paper proposes a new text similarity measurement model based on word embedding distance. Compared with classic text similarity measurement methods, this model achieves good results in short text classification. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint for obtaining a certain number of sequential patterns as needed. Finally, the study uses the fault text records of a certain aircraft as experimental data for mining fault sequential patterns. Experiments show that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business, and finance.


Author(s):  
Jinfu Chen ◽  
Saihua Cai ◽  
Dave Towey ◽  
Lili Zhu ◽  
Rubing Huang ◽  
...  

The process of component security testing can produce massive amounts of monitor logs. Current approaches to detect implicit security exceptions (those which cannot be identified by visual inspection alone) compare correct execution sequences with fixed patterns mined from the execution of sequential patterns in the monitor logs. However, this is not efficient and is not suitable for mining large monitor logs. To enable effective mining of implicit security exceptions from large monitor logs, this paper proposes a method based on improved variable-length sequential pattern mining. The proposed method first mines the variable-length sequential patterns from correct execution sequences and from actual execution sequences, thus reducing the number of patterns. The sequential patterns are then detected using the Sunday string-searching algorithm. We conducted an experimental study based on this method, the results of which show that the proposed method can efficiently detect the implicit security exceptions of components.
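The Sunday string-searching algorithm named above can be sketched as follows. This is a generic textbook version, not the paper's detection procedure; the name `sunday_search` is an assumption, and the function accepts either strings or lists of hashable event names.

```python
def sunday_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    # For each symbol, the shift that aligns its last occurrence in the pattern
    # with the symbol just past the current window.
    shift = {symbol: m - i for i, symbol in enumerate(pattern)}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        if pos + m >= n:
            break
        # A symbol that is absent from the pattern lets the window jump past it.
        pos += shift.get(text[pos + m], m + 1)
    return -1

events = ["open", "read", "write", "close"]
print(sunday_search(events, ["write", "close"]))
```

The shift is decided by the symbol immediately after the current window, which is what lets the search skip large stretches of a long log.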


2011 ◽  
Vol 109 ◽  
pp. 729-733
Author(s):  
Jiang Yin ◽  
Yun Li ◽  
Cen Cheng Shen ◽  
Bo Liu

Multi-relational sequential mining is an area of data mining that has developed rapidly in recent years. However, the performance of traditional mining methods is not ideal. To mine patterns effectively, we propose an algorithm based on the Iceberg concept lattice that adopts partition and merge optimizations so that only the frequent sequences are mined. Experimental results show that this algorithm effectively reduces the time complexity of multi-relational sequential pattern mining.


Sequential pattern mining is one of the important functionalities of data mining. It is used to analyze sequential databases and discover sequential patterns, with a focus on extracting interesting subsequences from a set of sequences. Various factors, such as frequency of occurrence, length, and profit, are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications, since data is naturally encoded as sequences of symbols in many fields such as bioinformatics, e-learning, market basket analysis, texts, and webpage click-stream analysis. A wide variety of efficient algorithms, such as PrefixSpan, GSP and FreeSpan, have been proposed during the past few years. In this paper we propose a data model for organizing the sequential database, which consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS that represents sequential data for discovering sequential patterns from a sequence database. Efficient algorithms are proposed for constructing the digraph model (DGS), extracting all sequential patterns, and mining association rules. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.


2019 ◽  
Vol 8 (3) ◽  
pp. 8585-8586

Sequential pattern mining (SPM) is a fundamental task in data mining. SPM mines subsequences from a given sequence, which can then be used for various analyses. This paper proposes an efficient method for mining frequent sequential patterns in biological data. It also uses k-mers to decompose the sequence according to a user-defined threshold value. The input data are the normal and mutated forms of the breast cancer gene BRCA2. The parameters used for analysis are suffix, candidate pattern and frequent pattern. In the mutated gene, the suffix value increases for mono-, di- and trinucleotides, and among the frequent patterns the trinucleotides show an increased count. This abnormal increase in patterns may lead to cancer in humans.
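The k-mer decomposition mentioned above can be sketched as a sliding window followed by a frequency filter. The overlapping window, the name `frequent_kmers`, and the toy sequence are assumptions; the abstract does not say how the paper defines its threshold.

```python
from collections import Counter

def frequent_kmers(sequence, k, min_count):
    """Count overlapping k-mers and keep those seen at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer: count for kmer, count in counts.items() if count >= min_count}

# Toy sequence for illustration only, not BRCA2 data.
print(frequent_kmers("ATGATGA", k=3, min_count=2))
```

Setting `k` to 1, 2 or 3 corresponds to the mono-, di- and trinucleotide cases.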

