Mining Top-K Click Stream Sequences Patterns

Author(s):  
MEHDI Haj Ali ◽  
Qun-Xiong Zhu ◽  
Yan-Lin He

<p><em>Sequential pattern mining is not only important in the data mining field; it is also the basis of many applications. However, running such applications costs time and memory, especially on dense datasets. The choice of minimum support threshold is one of the factors that most affects memory and time consumption. It is also difficult for users to set a threshold that yields the appropriate patterns: a poorly chosen threshold may produce too many sequential patterns, which makes the results difficult to comprehend. The problem becomes worse with long click stream sequences or huge datasets. As a solution, we developed an efficient algorithm, called TopK (Top-K click stream sequence pattern mining), which outputs the top-k patterns, that is, the k most relevant patterns with the highest support. The algorithm is based on pseudo-projection to avoid consuming extra time and memory, and it uses several efficient search space pruning methods together with BI-Directional Extension. Our extensive experiments on real click stream datasets show that TopK significantly outperforms previous algorithms.</em></p>
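The abstract gives no pseudocode, so the following is only a minimal sketch of the general top-k idea, under stated assumptions: each click stream is a list of page names, a pattern is an ordered subsequence, and the support threshold rises once k patterns have been collected. The function name `top_k_patterns` and the sample data are hypothetical, and the BI-Directional Extension closure check mentioned in the abstract is deliberately left out.

```python
import heapq
from collections import defaultdict


def top_k_patterns(sequences, k):
    """Return up to k (support, pattern) pairs with the highest support."""
    heap = []      # min-heap holding the best k patterns found so far
    min_sup = 1    # internal threshold; raised once the heap is full

    def grow(prefix, projected):
        nonlocal min_sup
        # Pseudo-projection: store (sequence id, offset) pairs instead of
        # copying the suffix of every sequence.
        occurrences = defaultdict(list)
        for sid, start in projected:
            seen = set()
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:          # first occurrence only
                    seen.add(item)
                    occurrences[item].append((sid, pos + 1))
        # Try the most frequent extensions first so the threshold rises early.
        for item, proj in sorted(occurrences.items(), key=lambda kv: -len(kv[1])):
            support = len(proj)
            if support < min_sup:
                continue                      # cannot enter the top k; prune subtree
            pattern = prefix + (item,)
            heapq.heappush(heap, (support, pattern))
            if len(heap) > k:
                heapq.heappop(heap)
            if len(heap) == k:
                min_sup = max(min_sup, heap[0][0])
            grow(pattern, proj)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return sorted(heap, key=lambda sp: (-sp[0], sp[1]))


# Hypothetical click streams, one list of visited pages per session.
clicks = [
    ["home", "search", "cart"],
    ["home", "cart"],
    ["home", "search"],
    ["search", "cart"],
]
top3 = top_k_patterns(clicks, 3)
```

The user supplies only k; the pruning relies on the fact that extending a pattern can never increase its support, so a pattern below the current k-th best support can be discarded together with all of its extensions.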

2013 ◽  
Vol 12 (03) ◽  
pp. 1350024
Author(s):  
R. B. V. Subramanyam ◽  
A. Suresh Rao ◽  
Ramesh Karnati ◽  
Somaraju Suvvari ◽  
D. V. L. N. Somayajulu

Previous studies of closed sequential pattern mining suggested several heuristics and proposed some computationally effective techniques, such as bidirectional extension with closure-checking schemes, BackScan search space pruning, and the ScanSkip optimization used in the BIDE (BI-Directional Extension) algorithm. Many researchers, inspired by the efficiency of BIDE, have tried to apply its techniques to various kinds of databases; we likewise felt that it could be applied to progressive databases. Without tailoring, however, BIDE cannot be applied to dynamic databases. The concept of progressive databases captures the nature of incremental databases by defining parameters such as the Period of Interest (POI) and a user-defined minimum support. The algorithm PISA (Progressive mIning Sequential pAttern mining) was proposed by Huang et al. for finding all sequential patterns over progressive databases. The structure of PISA helps space utilization by limiting the height of the tree to the length of the POI, and this is also a motivation for further improvement in this work. In this paper, we propose a tree structure, LCT (Label, Customer-id, and Time stamp), and an approach for mining closed sequential patterns using closure-checking schemes over progressive databases. The significance of the LCT structure is that its height is confined to a maximum of two levels. In the algorithmic approach, the window size can be increased by one unit of time. The complexity of the proposed approach is also analysed. The approach is validated using synthetic data sets available on the Internet and shows better performance than the existing methods.
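To make the notion of a closed pattern concrete, here is a small sketch of the definition only: a pattern is closed when no proper super-sequence has the same support. It is a naive post-filter over already-mined supports, not the bidirectional closure check of BIDE and not the LCT structure; the names `is_subsequence` and `closed_only` and the sample supports are assumptions made for illustration.

```python
def is_subsequence(small, big):
    """True if the items of `small` appear in `big` in the same order."""
    remaining = iter(big)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in small)


def closed_only(supports):
    """Keep patterns that have no longer super-sequence with equal support."""
    return {
        pattern: support
        for pattern, support in supports.items()
        if not any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in supports.items()
        )
    }


# Hypothetical supports of already-mined frequent patterns.
mined = {("a",): 3, ("b",): 2, ("a", "b"): 2}
closed = closed_only(mined)   # ("b",) is absorbed by ("a", "b")
```

This filter is quadratic in the number of patterns, which is why algorithms in this family check closure during the search instead of afterwards.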


2012 ◽  
Vol 2 (4) ◽  
Author(s):  
Aloysius George ◽  
D. Binu

Discovering sequential patterns is a rather well-studied area in data mining and has found many diverse applications, such as basket analysis and telecommunications. In this article, we propose an efficient algorithm that incorporates constraints and promotion-based marketing scenarios for the mining of valuable sequential patterns. Incorporating specific constraints into the sequential mining process has enabled the discovery of more user-centered patterns. We move one step ahead and integrate three significant marketing scenarios for mining promotion-oriented sequential patterns. The promotion-based market scenarios considered in the proposed research are 1) product Downturn, 2) product Revision and 3) product Launch (DRL). Each of these scenarios is characterized by distinct item and adjacency constraints. We have developed a novel DRL-PrefixSpan algorithm (a tailored form of PrefixSpan) for mining DRL patterns of all lengths. The proposed algorithm has been validated on synthetic sequential databases. The experimental results demonstrate the effectiveness of incorporating the promotion-based marketing scenarios in the sequential pattern mining process.


Author(s):  
UNIL YUN ◽  
KEUN HO RYU

Sequential pattern mining with constraints has been developed to improve the efficiency and effectiveness of the mining process. Specifically, there are two interesting constraints for sequential pattern mining. First, some sequences are more important than others; weight constraints consider the importance of sequences and of items within sequences. Second, patterns that include only a few items are interesting if they have high support, whereas long patterns can be interesting even when their support is relatively small. Weight constraints and length-decreasing support constraints are two paradigms aimed at finding important sequential patterns and reducing uninteresting ones. Although both constraints are vital, it is hard to consider them together using previous approaches. In this paper, we integrate weight and length-decreasing support constraints by pushing both constraints into the prefix projection growth method. For pruning, we define the Weighted Smallest Valid Extension property and apply it in our pruning methods to reduce the search space. In performance tests, we show that our algorithm mines important sequential patterns with length-decreasing support constraints.
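A length-decreasing support constraint can be illustrated with a threshold that falls as patterns get longer. The sketch below covers that one idea only and ignores weights and the Weighted Smallest Valid Extension property; the linear threshold shape, its parameters `start`, `step`, and `floor`, and the sample supports are all hypothetical choices.

```python
def length_decreasing_threshold(length, start=5, step=1, floor=2):
    """Minimum support required for a pattern of the given length.

    Assumed shape: starts at `start` for length 1, drops by `step`
    per extra item, and never goes below `floor`.
    """
    return max(floor, start - step * (length - 1))


def filter_by_length_decreasing_support(supports, **params):
    """Keep patterns whose support meets the threshold for their length."""
    return {
        pattern: support
        for pattern, support in supports.items()
        if support >= length_decreasing_threshold(len(pattern), **params)
    }


# Hypothetical supports: the short pattern is rejected at support 4,
# while longer patterns pass with equal or lower support.
candidates = {
    ("a",): 4,
    ("a", "b"): 4,
    ("a", "b", "c"): 3,
    ("a", "b", "c", "d"): 1,
}
kept = filter_by_length_decreasing_support(candidates)
```

Because a short infrequent pattern may still have a long valid extension, such a threshold cannot be used for pruning as directly as a fixed minimum support, which is the difficulty the abstract's pruning property addresses.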


2020 ◽  
Vol 36 (1) ◽  
pp. 1-15
Author(s):  
Tran Huy Duong ◽  
Nguyen Truong Thang ◽  
Vu Duc Thi ◽  
Tran The Anh

High utility sequential pattern mining is a popular topic in data mining whose main purpose is to extract sequential patterns with high utility from a sequence database. Many recent works have proposed methods to solve this problem. However, most of them do not consider the item intervals of sequential patterns, which can lead to the extraction of patterns whose item intervals are too long to be meaningful. In this paper, we propose a High Utility Item Interval Sequential Pattern (HUISP) algorithm to solve this problem. Our algorithm uses a pattern growth approach and several techniques to improve its performance.
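The item interval idea can be sketched as a check that consecutive matched items lie at most `max_gap` time units apart. This is an illustration of the constraint only, not the HUISP algorithm, and it leaves utilities out; the function name, the `(timestamp, item)` representation, and the sample data are assumptions.

```python
def occurs_within_interval(sequence, pattern, max_gap):
    """True if `pattern` occurs in `sequence` with every consecutive pair
    of matched items separated by at most `max_gap` time units.

    `sequence` is a list of (timestamp, item) pairs sorted by timestamp.
    """
    ends = None  # timestamps at which the prefix matched so far may end
    for wanted in pattern:
        times = [t for t, item in sequence if item == wanted]
        if ends is None:
            ends = times
        else:
            # Keep every feasible end time, not just the earliest one:
            # a greedy first match can miss a valid later occurrence.
            ends = [t for t in times if any(0 < t - e <= max_gap for e in ends)]
        if not ends:
            return False
    return True


# Hypothetical timestamped sequence.
events = [(1, "a"), (2, "b"), (10, "a"), (11, "c")]
near = occurs_within_interval(events, ("a", "c"), max_gap=2)   # uses the "a" at t=10
far = occurs_within_interval(events, ("b", "c"), max_gap=2)    # gap of 9 is too long
```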


Author(s):  
Yue-Shi Lee ◽  
Show-Jane Yen

Web mining applies data mining techniques to large amounts of web data in order to improve web services. Web traversal pattern mining discovers most of the users’ access patterns from web logs. This information can provide navigation suggestions for web users so that appropriate actions can be taken. However, web data grows rapidly in a short time, and some of it may become outdated. User behavior may change when new web data is inserted into the web logs and old web data is deleted from them. Besides, it is considerably difficult to select a perfect minimum support threshold during the mining process to find the interesting rules. Even experienced experts cannot determine the appropriate minimum support. Thus, we must constantly adjust the minimum support until satisfactory mining results are found. The essence of incremental or interactive data mining is that the previous mining results can be used to avoid unnecessary processing when the minimum support is changed or the web logs are updated. In this paper, we propose efficient incremental and interactive data mining algorithms to discover web traversal patterns and make the mining results satisfy the users’ requirements. The experimental results show that our algorithms are more efficient than the other approaches.


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Scott Buffett

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is centered on the necessity to reduce the time and space required to perform the search. The extent of this reduction proportionally facilitates the ability to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify frequent patterns that are (1) sequential in nature and (2) hold a significant magnitude of utility in a sequence database, by considering the aspect of item value or importance. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, with HUSPM, this property does not hold. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. Such candidates are provably the only items that are ever needed for any extension of a given sequential pattern or its descendants in the search tree. This list is then exploited to significantly further tighten the upper bound on the utilities of descendent patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated in the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. Validation of the techniques was conducted on six public datasets. 
Tests show that use of the CRUSP algorithm results in a significant reduction in the overall number of candidate sequential patterns that need to be considered, and subsequently a significant reduction in run time, when compared to the current state of the art in bounding techniques. When employing the CRUSPPivot algorithm, the further reduction in the size of the search space was found to be dramatic, with the reduction in run time found to be dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that time required for one particularly complex dataset was reduced from many hours to less than one minute.
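The claim that downward closure does not hold for utility can be checked on a toy example. The sketch below assumes one common convention: the utility of a pattern in a sequence is the maximum summed utility over its occurrences, and its utility in the database is the sum over sequences. It is not CRUSP or CRUSPPivot; `pattern_utility`, `database_utility`, and the data are hypothetical.

```python
def pattern_utility(qseq, pattern):
    """Maximum total utility over all occurrences of `pattern` in `qseq`.

    `qseq` is a list of (item, utility) pairs; returns 0 if no occurrence.
    """
    neg = float("-inf")
    # best[j] = best utility of matching the first j pattern items so far
    best = [0] + [neg] * len(pattern)
    for item, util in qseq:
        # Go right to left so one element is never matched twice.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg:
                best[j] = max(best[j], best[j - 1] + util)
    return best[-1] if best[-1] != neg else 0


def database_utility(db, pattern):
    """Sum of per-sequence utilities of `pattern` over the database."""
    return sum(pattern_utility(qseq, pattern) for qseq in db)


# Hypothetical database: "a" is cheap, "b" is valuable.
db = [
    [("a", 1), ("b", 5)],
    [("a", 2), ("b", 4)],
    [("a", 1)],
]
u_a = database_utility(db, ("a",))        # 1 + 2 + 1 = 4
u_ab = database_utility(db, ("a", "b"))   # 6 + 6 + 0 = 12
```

Here the longer pattern appears in fewer sequences than its prefix yet has a higher utility, so a low-utility pattern cannot simply be pruned along with its extensions; this is why an upper bound on the utility of descendants is needed.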


2011 ◽  
Vol 341-342 ◽  
pp. 530-534
Author(s):  
Zai Ping Tao

In this paper, a new algorithm named TCSP is proposed to mine sequential patterns with different time constraints. It scans the database into memory and constructs time-index sets for efficient processing. It mines the desired sequential patterns without generating any candidates. We have evaluated the new algorithm with the well-known GSP algorithm and the DELISP algorithm for various datasets and constraints. The comprehensive experiments show that the TCSP algorithm works better and it has good scalability.


Author(s):  
Céline Fiot

The explosive growth of collected and stored data has generated a need for new techniques that transform these large amounts of data into useful, comprehensible knowledge. Among these techniques, referred to as data mining, sequential pattern approaches handle sequence databases, extracting frequently occurring patterns related to time. Since most real-world databases consist of historical and quantitative data, some work has been done on mining the quantitative information stored within such sequence databases, uncovering fuzzy sequential patterns. In this chapter, we first introduce the various fuzzy sequential pattern approaches and the general principles they are based on. Then, we focus on a complete framework for mining fuzzy sequential patterns that handles different levels of consideration of quantitative information. This framework is then applied to two real-life data sets: Web access logs and a textual database. We conclude with a discussion of future trends in fuzzy pattern mining.


Author(s):  
Yue-Shi Lee

Web mining applies data mining techniques to large amounts of Web data in order to improve Web services. Web traversal pattern mining discovers most of the users’ access patterns from Web logs. This information can provide navigation suggestions for Web users so that appropriate actions can be taken. However, Web data grows rapidly in a short time, and some of it may become outdated. User behavior may change when new Web data is inserted into the Web logs and old Web data is deleted from them. Besides, it is considerably difficult to select a perfect minimum support threshold during the mining process to find the interesting rules. Even experienced experts cannot determine the appropriate minimum support. Thus, we must constantly adjust the minimum support until satisfactory mining results are found. The essence of incremental or interactive data mining is that the previous mining results can be used to avoid unnecessary processing when the minimum support is changed or the Web logs are updated. In this chapter, we propose efficient incremental and interactive data mining algorithms to discover Web traversal patterns and make the mining results satisfy the users’ requirements. The experimental results show that our algorithms are more efficient than the other approaches.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Chunkai Zhang ◽  
Zilin Du ◽  
Yiwen Zu

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, where utility is used to measure the importance or weight of a sequence. However, the informative knowledge carried by hierarchical relations between items is ignored in HUSPM, which prevents it from extracting more interesting patterns. In this paper, we incorporate the hierarchical relation of items into HUSPM and propose a two-phase algorithm, MHUH, the first algorithm for high-utility hierarchical sequential pattern mining (HUHSPM). In the first phase, named Extension, we use the existing algorithm FHUSpan, which we proposed earlier, to efficiently mine the general high-utility sequences (g-sequences); in the second phase, named Replacement, we mine the special high-utility sequences with the hierarchical relation (s-sequences) as high-utility hierarchical sequential patterns from the g-sequences. To further improve efficiency, MHUH adopts several strategies, such as Reduction, FGS, and PBS, and a novel upper bound, TSWU, which greatly reduce the search space. Substantial experiments were conducted on both real and synthetic datasets to assess the performance of MHUH in terms of runtime, number of patterns, and scalability. The experiments show that MHUH efficiently extracts more interesting patterns with underlying informative knowledge in HUHSPM.

