A New Approach to String Pattern Mining with Approximate Match

Author(s):  
Tetsushi Matsui
Takeaki Uno
Juzoh Umemori
Tsuyoshi Koide
2012
Vol 25 (1)
pp. 45-50
Author(s):  
M. Baena-García
R. Morales-Bueno

Author(s):  
Dong (Haoyuan) Li
Anne Laurent
Pascal Poncelet

Sequential pattern mining has received much attention in sequence data mining research and applications; however, a drawback is that it does not take advantage of prior domain knowledge. In our previous work, we proposed a belief-driven method based on fuzzy set theory for discovering unexpected sequences that contradict existing knowledge of the data, including occurrence constraints and semantic contradictions. In this paper, we present a new approach that discovers unexpected sequences by determining semantic contradictions with the concept hierarchies associated with the data. We evaluate the effectiveness of our approach with experiments on Web usage analysis.
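To make the idea of a concept-level semantic contradiction concrete, here is a toy Python sketch. It is not the authors' fuzzy belief-driven method: the hierarchy, the page names, the `OPPOSITES` table, and the helper names are all hypothetical. A belief here says that an item under an antecedent concept should be followed by an item under an expected concept; a sequence is flagged as unexpected when the antecedent is instead followed by an item under a concept declared semantically opposite to the expected one.

```python
# Hypothetical concept hierarchy for Web usage data: child -> parent.
HIERARCHY = {
    "/cart/add": "AddToCart",
    "/checkout/pay": "Checkout",
    "/cart/remove": "Abandon",
    "/logout": "Abandon",
    "AddToCart": "Shopping",
    "Checkout": "Shopping",
    "Abandon": "Leaving",
}

# Hypothetical semantic contradictions: expected concept -> opposite concepts.
OPPOSITES = {"Checkout": {"Abandon"}}


def ancestors(item, hierarchy):
    """Return the item followed by every concept above it in the hierarchy."""
    chain = [item]
    while item in hierarchy:
        item = hierarchy[item]
        chain.append(item)
    return chain


def is_a(item, concept, hierarchy):
    """True if the item equals the concept or falls under it."""
    return concept in ancestors(item, hierarchy)


def contradicts_belief(sequence, antecedent, expected, opposites, hierarchy):
    """Flag a sequence in which an item under `antecedent` is later followed
    by an item under a concept declared opposite to `expected`."""
    contrary = opposites.get(expected, set())
    for i, item in enumerate(sequence):
        if not is_a(item, antecedent, hierarchy):
            continue
        for later in sequence[i + 1:]:
            if any(is_a(later, c, hierarchy) for c in contrary):
                return True
    return False


session_a = ["/home", "/cart/add", "/cart/remove"]
session_b = ["/cart/add", "/checkout/pay"]
print(contradicts_belief(session_a, "AddToCart", "Checkout", OPPOSITES, HIERARCHY))  # True
print(contradicts_belief(session_b, "AddToCart", "Checkout", OPPOSITES, HIERARCHY))  # False
```

Because the check runs against concepts rather than raw pages, a session that visits `/logout` after `/cart/add` is flagged by the same belief without a separate rule.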


2010
Vol 50 (1)
pp. 270-280
Author(s):  
Hongyan Liu
Fangzhou Lin
Jun He
Yunjue Cai

2015 ◽  
Vol 24 (2) ◽  
pp. 181-197
Author(s):  
Lan Vu
Gita Alaghband

In this article, we present a new approach for frequent pattern mining (FPM) that runs fast on both sparse and dense databases. We also introduce two algorithms based on this approach, FEM and DFEM. FEM uses a fixed threshold as the condition for switching between two mining strategies, whereas DFEM adapts this threshold dynamically at runtime to best fit the characteristics of the database during mining, especially when the minimum support threshold is low. Additionally, we present optimization techniques for the proposed algorithms that speed up mining, reduce memory usage, and lower I/O cost. We also analyze the performance of FEM and DFEM in depth and compare them with several existing algorithms. The experimental results show that FEM and DFEM run significantly faster and consume less memory than many popular FPM algorithms, including the well-known Apriori, FP-growth, and Eclat.
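The abstract does not say what FEM's two strategies are, so the following Python sketch only illustrates the general idea of switching representations at a fixed threshold inside an Eclat-style depth-first miner (Eclat being one of the baselines named above). The choice between an integer bitmask and a Python set of transaction ids, the `dense_ratio` parameter, and all function names are hypothetical and are not FEM's actual design.

```python
def popcount(mask):
    """Number of set bits, i.e. the support of a bitmask tidset."""
    return bin(mask).count("1")


def to_mask(tids):
    """Convert a set of transaction ids to an int bitmask."""
    mask = 0
    for tid in tids:
        mask |= 1 << tid
    return mask


def to_set(mask):
    """Convert an int bitmask back to a set of transaction ids."""
    return {i for i in range(mask.bit_length()) if mask >> i & 1}


def support(tids):
    return popcount(tids) if isinstance(tids, int) else len(tids)


def choose_repr(tids, n, dense_ratio):
    """Hypothetical fixed-threshold switch: dense tidsets become bitmasks,
    sparse ones stay (or become) Python sets."""
    dense = support(tids) >= dense_ratio * n
    if dense and not isinstance(tids, int):
        return to_mask(tids)
    if not dense and isinstance(tids, int):
        return to_set(tids)
    return tids


def intersect(a, b):
    """Intersect two tidsets, whatever their representation."""
    if isinstance(a, int) and isinstance(b, int):
        return a & b
    a = to_set(a) if isinstance(a, int) else a
    b = to_set(b) if isinstance(b, int) else b
    return a & b


def mine_frequent_itemsets(transactions, min_support, dense_ratio=0.5):
    """Eclat-style depth-first mining; returns {itemset tuple: support}."""
    n = len(transactions)
    vertical = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            vertical.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = support(tids)
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                joined = intersect(tids, other_tids)
                if support(joined) >= min_support:
                    suffix.append((other, choose_repr(joined, n, dense_ratio)))
            if suffix:
                extend(itemset, suffix)

    roots = sorted(
        ((item, choose_repr(tids, n, dense_ratio))
         for item, tids in vertical.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), roots)
    return frequent


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(mine_frequent_itemsets(db, min_support=3))
```

Whichever representation is chosen, the supports are identical, so the threshold affects only speed and memory; any switching scheme of this kind has to preserve that property.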


2015
Vol 2015
pp. 1-11
Author(s):  
Yun Xue
Zhengling Liao
Meihang Li
Jie Luo
Qiuhua Kuang
...  

Order-preserving submatrices (OPSMs) are an important unsupervised learning model and have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems. Unfortunately, because discovering OPSMs is an NP-complete problem, most existing methods are heuristic algorithms that cannot reveal all OPSMs. In particular, deep OPSMs, which correspond to long patterns with few supporting sequences, incur explosive computational costs and are pruned entirely by most popular methods. In this paper, we propose an exact method to discover all OPSMs based on frequent sequential pattern mining. First, an existing algorithm was adapted to find all common subsequences (ACS) between every pair of row sequences, so that no deep OPSM is missed. Then, an improved prefix-tree data structure was used to store and traverse the ACS, and the Apriori principle was employed to mine the frequent sequential patterns efficiently. Finally, experiments were conducted on gene and synthetic datasets. The results demonstrate the effectiveness and efficiency of this method.
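The link between OPSMs and sequential patterns can be sketched in Python: each row becomes the sequence of its column indices sorted by value, and an ordered set of columns is supported by exactly the rows whose sequences contain it as a subsequence. The level-wise growth below extends only supported patterns, which is the Apriori principle at work. It is a brute-force illustration with hypothetical function names, not the authors' ACS computation or their prefix-tree structure.

```python
def row_to_sequence(row):
    """Column indices of the row, ordered by ascending value."""
    return sorted(range(len(row)), key=lambda j: row[j])


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(col in remaining for col in pattern)


def mine_opsms(matrix, min_rows):
    """Return (column pattern, supporting row indices) for every
    order-preserving pattern of two or more columns supported by at
    least `min_rows` rows."""
    n_cols = len(matrix[0])
    sequences = [row_to_sequence(row) for row in matrix]

    def supporting_rows(pattern):
        return [i for i, seq in enumerate(sequences) if is_subsequence(pattern, seq)]

    results = []
    level = [(c,) for c in range(n_cols)]  # a single column is supported by every row
    while level:
        next_level = []
        for pattern in level:
            for c in range(n_cols):
                if c in pattern:
                    continue
                candidate = pattern + (c,)
                rows = supporting_rows(candidate)
                # Apriori principle: only supported patterns are extended further.
                if len(rows) >= min_rows:
                    next_level.append(candidate)
                    results.append((candidate, rows))
        level = next_level
    return results


matrix = [
    [1, 3, 2, 4],
    [2, 5, 3, 9],
    [4, 1, 3, 2],
]
for pattern, rows in mine_opsms(matrix, min_rows=2):
    print(pattern, rows)
```

Because every prefix of a supported pattern is itself supported, extending only the surviving patterns never loses an OPSM; long patterns with few rows are kept as long as they meet `min_rows`.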


2016
Vol 16 (3)
pp. 185-194
Author(s):  
Haisong Huang
Liguo Yao
Chieh-Yuan Tsai

With improvements in people's quality of life, more attention has been paid to food safety and quality, especially for perishable agricultural and dairy products. Customers quite often receive poor-quality or damaged products because of mistakes or improper handling during transportation, which lowers customer satisfaction with companies' products. To address this problem, this paper proposes a new approach that uses frequent closed sequential pattern mining to analyze logistics data and help companies trace possible transportation problems. The approach consists of three main steps: RFID-enabled raw data collection, frequent sequential pattern mining, and pattern analysis. The experiments show that the proposed analysis method can discover many underlying causes of transportation service problems.
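Frequent closed sequential patterns can be illustrated with a brute-force Python sketch over a few hypothetical RFID event sequences. The event names and helper functions are invented for illustration, and practical miners avoid enumerating every subsequence as this sketch does. A pattern is closed when no longer pattern containing it has the same support, so the output keeps the informative patterns and drops their redundant sub-patterns.

```python
from collections import Counter
from itertools import combinations


def subsequences(sequence, max_len):
    """All distinct subsequences (order kept, gaps allowed) of up to max_len events."""
    found = set()
    for k in range(1, min(max_len, len(sequence)) + 1):
        for positions in combinations(range(len(sequence)), k):
            found.add(tuple(sequence[p] for p in positions))
    return found


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def closed_frequent_sequences(database, min_support, max_len=4):
    """Map each closed frequent sequential pattern to its support
    (number of sequences containing it). Patterns are capped at max_len."""
    counts = Counter()
    for sequence in database:
        counts.update(subsequences(sequence, max_len))  # counted once per sequence
    frequent = {p: c for p, c in counts.items() if c >= min_support}
    return {
        p: c
        for p, c in frequent.items()
        if not any(len(q) > len(p) and cq == c and is_subsequence(p, q)
                   for q, cq in frequent.items())
    }


# Hypothetical RFID event logs, one sequence per shipment.
shipments = [
    ["pick", "load", "temp_alarm", "deliver"],
    ["pick", "load", "temp_alarm", "deliver"],
    ["pick", "load", "deliver"],
]
print(closed_frequent_sequences(shipments, min_support=2))
```

In a logistics setting, a closed pattern such as the one containing `temp_alarm` would be a candidate for the pattern-analysis step. The `max_len` cap is an assumption of this sketch: a pattern at the cap may be reported as closed only because longer patterns were never counted.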

