Status Set Sequential Pattern Mining Considering Time Windows and Periodic Analysis of Patterns

Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 738
Author(s):  
Shenghan Zhou ◽  
Houxiang Liu ◽  
Bang Chen ◽  
Wenkui Hou ◽  
Xinpeng Ji ◽  
...  

Traditional sequential pattern mining considers the whole time period and often ignores sequential patterns that occur only in local time windows, as well as their possible periodicity. To overcome these limitations, this paper proposes status set sequential pattern mining with time windows (SSPMTW). In contrast to traditional methods, SSPMTW considers item status and adds constraints such as time windows, minimum confidence, minimum coverage, and minimum factor set ratios to mine more valuable rules in local time windows. The periodicity of these rules is also analyzed. Based on the proposed method, the paper improves the Apriori algorithm into the TW-Apriori algorithm and explains its basic idea. The feasibility, validity and efficiency of the method and algorithm are then verified on small-scale and large-scale examples. For the large-scale numerical example, the influence of the various constraints on the mining results is analyzed. Finally, the results of SSPM and SSPMTW are compared. The comparison suggests that SSPMTW can mine rules that exist in local time windows and analyze their periodicity, which addresses SSPM's neglect of such rules and overcomes the limitations of traditional sequential pattern mining algorithms. In addition, the rules mined by SSPMTW reduce the entropy of the system.
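The idea of checking a rule separately in each local time window, and then looking at how the qualifying windows are spaced, can be illustrated with a small Python sketch. This is not the TW-Apriori algorithm from the paper: the fixed-length windowing, the confidence definition, the function names and the toy pump/valve data are all assumptions made for illustration, and the minimum coverage and minimum factor set ratio constraints are not modeled.

```python
from collections import defaultdict


def split_into_windows(events, window_length):
    """Group (timestamp, status_set) events into fixed-length windows.

    Returns a dict mapping window index -> time-ordered list of events.
    Fixed-length windows are an assumption of this sketch.
    """
    windows = defaultdict(list)
    for timestamp, status_set in events:
        windows[timestamp // window_length].append((timestamp, status_set))
    return {k: sorted(v, key=lambda e: e[0]) for k, v in windows.items()}


def rule_confidence(window, antecedent, consequent):
    """Confidence of 'antecedent is later followed by consequent' in one window.

    Counts events containing the antecedent and the share of those that are
    followed, later in the same window, by an event containing the consequent.
    Returns None when the antecedent never occurs in the window.
    """
    antecedent_count = 0
    followed_count = 0
    for i, (_, status_set) in enumerate(window):
        if antecedent <= status_set:
            antecedent_count += 1
            if any(consequent <= later for _, later in window[i + 1:]):
                followed_count += 1
    if antecedent_count == 0:
        return None
    return followed_count / antecedent_count


def windows_where_rule_holds(events, window_length, antecedent, consequent,
                             min_confidence):
    """Indices of the windows in which the rule meets the minimum confidence."""
    windows = split_into_windows(events, window_length)
    holding = []
    for index in sorted(windows):
        confidence = rule_confidence(windows[index], antecedent, consequent)
        if confidence is not None and confidence >= min_confidence:
            holding.append(index)
    return holding


def window_gaps(window_indices):
    """Gaps between consecutive qualifying windows; equal gaps hint at a period."""
    return [b - a for a, b in zip(window_indices, window_indices[1:])]


# Hypothetical data: each item is an (item, status) pair.
events = [
    (0, {("pump", "on")}), (1, {("valve", "open")}), (5, {("pump", "off")}),
    (10, {("pump", "on")}), (12, {("valve", "open")}),
    (20, {("pump", "on")}), (23, {("valve", "closed")}),
    (30, {("pump", "on")}), (31, {("valve", "open")}),
]
antecedent = frozenset({("pump", "on")})
consequent = frozenset({("valve", "open")})

holding = windows_where_rule_holds(events, 10, antecedent, consequent, 0.8)
gaps = window_gaps(holding)
print(holding, gaps)
```

In this toy data the rule holds in three of the four windows, which a single whole-period confidence value would blur together. Inspecting the gaps between qualifying windows is one simple way to start a periodicity analysis.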

2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in industry, the collected data is too large to fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability, compared to the serial HUSP-Span approach.
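To make the map and reduce steps concrete, the following Python sketch computes the sequence-weighted utility (SWU) of each item, a standard upper bound used in HUSPM to discard unpromising items, in a map/reduce style. It is only an illustration run in a single process: it does not reproduce the article's three-stage model, its sidset or its utility-linked list, and the data layout, function names and threshold are assumptions.

```python
from collections import Counter
from functools import reduce


def sequence_utility(sequence):
    """Total utility of a sequence: the sum of all item utilities in it.

    A sequence is a list of itemsets, and each itemset maps
    item -> utility (for example quantity times unit profit).
    """
    return sum(sum(itemset.values()) for itemset in sequence)


def map_partition(partition):
    """Map step: SWU of each item within a single data partition.

    The SWU of an item is the summed utility of the sequences containing it.
    """
    swu = Counter()
    for sequence in partition:
        total = sequence_utility(sequence)
        items = {item for itemset in sequence for item in itemset}
        for item in items:
            swu[item] += total
    return swu


def merge_counts(left, right):
    """Reduce step: add up the per-partition SWU values."""
    return left + right


def promising_items(partitions, min_utility):
    """Items whose global SWU reaches the minimum utility threshold."""
    global_swu = reduce(merge_counts, map(map_partition, partitions), Counter())
    return {item for item, value in global_swu.items() if value >= min_utility}


# Hypothetical database, split into two partitions (one per worker).
partitions = [
    [
        [{"a": 2, "b": 3}, {"c": 1}],  # sequence utility 6
        [{"b": 4}],                    # sequence utility 4
    ],
    [
        [{"a": 5}, {"c": 2}],          # sequence utility 7
    ],
]

kept = promising_items(partitions, min_utility=12)
print(sorted(kept))
```

Because the per-partition results are combined by simple addition, each partition could be processed on a different node without holding the full database in one machine's memory, which is the motivation the abstract gives for a distributed design.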


Author(s):  
Houxiang Liu ◽  
Shenghan Zhou ◽  
Bang Chen ◽  
XinPeng Ji ◽  
Yue Zhang ◽  
...  
