User Behaviour Pattern Mining from Weblog

2012 ◽  
Vol 8 (2) ◽  
pp. 1-22 ◽  
Author(s):  
Vishnu Priya ◽  
A. Vadivel

In this paper, the authors build a tree from both frequent and non-frequent items in a single scan and name it the Revised PLWAP with Non-frequent Items (RePLNI) tree. While sequential patterns are mined, the links related to the non-frequent items are virtually discarded. Hence, nodes need not be deleted or maintained when the tree is revised to mine an updated weblog. Because the algorithm supports both incremental and interactive mining, the tree does not have to be reconstructed from scratch, nor the patterns re-computed, each time the weblog is updated or the minimum support is changed. The proposed tree performs better even when the incremental database is more than 50% of the size of the existing one, which is not the case for a recently proposed algorithm. For evaluation, the authors used a benchmark weblog and found the performance of the proposed tree encouraging compared with some recently proposed approaches.
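The idea of keeping non-frequent items in the tree and only skipping them at mining time can be illustrated with a simplified prefix tree. This is a hypothetical Python sketch, not the RePLNI algorithm. It omits PLWAP's position codes, header links and the actual pattern-mining step, and the class and method names (`AllItemsTree`, `filtered_database`) are illustrative. It shows two things: new sequences can be inserted incrementally, and the same tree can be re-read under a different minimum support without deleting any nodes.

```python
from collections import Counter


class Node:
    """One tree node; every item is stored, frequent or not."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0          # number of sequences passing through this node
        self.children = {}


class AllItemsTree:
    """Hypothetical simplified tree, not the RePLNI structure itself."""

    def __init__(self):
        self.root = Node()
        self.support = Counter()   # number of sequences containing each item
        self.n_sequences = 0

    def insert(self, sequence):
        """Add one sequence; the same call handles incremental weblog updates."""
        self.n_sequences += 1
        self.support.update(set(sequence))
        node = self.root
        for item in sequence:
            if item not in node.children:
                node.children[item] = Node(item)
            node = node.children[item]
            node.count += 1

    def frequent_items(self, min_support):
        threshold = min_support * self.n_sequences
        return {item for item, c in self.support.items() if c >= threshold}

    def filtered_database(self, min_support):
        """Rebuild the sequence database with non-frequent items skipped.

        No node is deleted, so the same tree can be re-read after an
        update or with a different min_support.
        """
        frequent = self.frequent_items(min_support)
        out = Counter()

        def walk(node, prefix):
            for child in node.children.values():
                # skip (virtually discard) non-frequent items, keep walking below them
                kept = prefix + (child.item,) if child.item in frequent else prefix
                # number of sequences that end exactly at this child
                ended = child.count - sum(g.count for g in child.children.values())
                if ended and kept:
                    out[kept] += ended
                walk(child, kept)

        walk(self.root, ())
        return out
```

For example, suppose the sequences `["a","b","c"]`, `["a","c"]` and `["b","d"]` have been inserted. A minimum support of 0.5 drops `d` from the filtered view, while 0.3 keeps it. The tree itself is the same in both cases.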

2012 ◽  
Vol 197 ◽  
pp. 283-291
Author(s):  
Jun Dong ◽  
Yu Jie Xie ◽  
Jia Dong Ren ◽  
Wei Wei Zhou

Closed repetitive gapped sequential pattern mining has gained increasing attention in recent years. In this paper, we propose a novel method, MRCGP (mining closed repetitive gapped sequential patterns based on a repetition-linked WAP-Tree). In the first step of MRCGP, the given sequential database is transformed into a new database in which every item is expressed by its landmark. Then a positional information table (PIT), which stores the position information of all 1-frequent items, is constructed, and all repetitive gapped 2-sequential patterns of different items (RPDI) are obtained by searching the PIT. Next, a repetition-linked web access pattern tree (RLWAP-Tree) is built. In the RLWAP-Tree, the 1-frequent items are stored in a header table, and each item in the header table is linked with a solid line to the earliest occurrence of the same item in each sequence of the tree. In addition, every item in the tree is linked with a broken line to the occurrences of the same item in the same sequence. By recursively mining the projection trees of the existing repetitive gapped patterns, we obtain the repetitive gapped sequential patterns. Finally, the closed repetitive gapped sequential patterns are obtained by checking the inclusion relation between every pair of patterns. Experimental results show that MRCGP achieves better time efficiency.
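The PIT step and the final closedness check can be sketched as follows. This Python sketch rests on several assumptions:

- Items are plain symbols rather than landmark-encoded items.
- The repetitive support of a 2-pattern `(a, b)` is taken to be the number of non-overlapping `a ... b` instances, matched greedily.
- The RLWAP-Tree and projection-tree mining are not reproduced.

All function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_pit(database, frequent_items):
    """Positional information table: item -> {sequence id: [positions]}."""
    pit = defaultdict(dict)
    for sid, seq in enumerate(database):
        for pos, item in enumerate(seq):
            if item in frequent_items:
                pit[item].setdefault(sid, []).append(pos)
    return pit


def repetitive_support(pit, first, second):
    """Count non-overlapping occurrences of `first ... second` (any gap)."""
    total = 0
    for sid in pit.get(first, {}).keys() & pit.get(second, {}).keys():
        events = sorted([(p, 0) for p in pit[first][sid]] +
                        [(p, 1) for p in pit[second][sid]])
        open_starts = 0
        for _, kind in events:
            if kind == 0:
                open_starts += 1           # an unmatched `first`
            elif open_starts:
                open_starts -= 1           # pair it with this `second`
                total += 1
    return total


def two_patterns(database, frequent_items, min_sup):
    """Repetitive gapped 2-patterns of different items, found via the PIT."""
    pit = build_pit(database, frequent_items)
    result = {}
    for a, b in permutations(sorted(frequent_items), 2):
        sup = repetitive_support(pit, a, b)
        if sup >= min_sup:
            result[(a, b)] = sup
    return result


def is_subsequence(p, q):
    it = iter(q)
    return all(x in it for x in p)


def closed_patterns(patterns):
    """Keep a pattern unless a longer super-pattern has the same support."""
    return {
        p: s for p, s in patterns.items()
        if not any(len(q) > len(p) and t == s and is_subsequence(p, q)
                   for q, t in patterns.items())
    }
```

Greedy matching is safe here. Each `second` occurrence is paired with an earlier, still-unmatched `first`, so no position is counted twice.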


2021 ◽  
Vol 14 (1) ◽  
pp. 244-256
Author(s):  
Gokulapriya Raman ◽  
Ganesh Raj ◽  

Web usage behaviour mining is a substantial research problem, as it identifies different users' behaviour patterns by analysing web log files. However, existing methods identify the web patterns that users frequently access with limited accuracy and at a high time cost. The Mutual Information Pre-processing based Broken-Stick Linear Regression (MIP-BSLR) technique is proposed to improve the accuracy of web user behaviour pattern mining. First, web log files from the Apache web log dataset and the NASA dataset are taken as input. Then, the Mutual Information based Pre-processing (MI-P) method computes the mutual dependence between pairs of web patterns. Based on the computed value, relevant web access patterns are retained for further processing and irrelevant patterns are removed. After that, Broken-Stick Linear Regression Analysis (BLRA) is performed in MIP-BSLR for web user behaviour analysis, which identifies the frequently visited web patterns. From these frequently visited patterns, MIP-BSLR accurately predicts the usage behaviour of web users and improves the performance of web usage behaviour mining. MIP-BSLR is evaluated on pattern mining accuracy, false positives, time requirements and space requirements with respect to the number of web patterns. On the Apache web log dataset, the proposed technique improves pattern mining accuracy by 14% compared with conventional methods. It also reduces the false positive rate by 52%, the time requirement by 19% and space complexity by 21%. On the NASA dataset, pattern mining accuracy increases by 16%, while the false positive rate, time requirement and space complexity fall by 47%, 20% and 22%, respectively, compared with conventional methods.
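The two building blocks named in the abstract, mutual information between web patterns and broken-stick regression, can be sketched in plain Python. The abstract does not say how MIP-BSLR defines its inputs, so the following is a hypothetical reading:

- `mutual_information` takes two per-session indicator vectors (whether the session visited pattern A, and whether it visited pattern B).
- `fit_broken_stick` fits `y = b0 + b1*x + b2*max(0, x - c)` to visit counts sorted by rank and chooses the breakpoint `c` by grid search.

Under this reading, patterns ranked before `c` would be treated as frequently visited.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """MI (in nats) between two discrete variables observed per session."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_broken_stick(xs, ys):
    """Fit y = b0 + b1*x + b2*max(0, x - c); grid-search c over interior xs."""
    best = None
    for c in sorted(set(xs))[1:-1]:
        rows = [[1.0, x, max(0.0, x - c)] for x in xs]
        # normal equations (A^T A) beta = A^T y for this breakpoint
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        beta = solve(ata, aty)
        sse = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
                  for r, y in zip(rows, ys))
        if best is None or sse < best[1]:
            best = (c, sse, beta)
    return best
```

Consider the ranked visit counts `[10, 7, 4, 1, 1, 1, 1]` at ranks 0 to 6. The fit places the breakpoint at rank 3, so under this reading the first three patterns are the frequently visited ones.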


2017 ◽  
Vol 71 (1) ◽  
pp. 100-116 ◽  
Author(s):  
Kai Sheng ◽  
Zhong Liu ◽  
Dechao Zhou ◽  
Ailin He ◽  
Chengxu Feng

It is important for maritime authorities to classify and identify ships of unknown type in historical trajectory data effectively. This paper uses a logistic regression model to construct a ship classifier from features extracted from ship trajectories. First, three basic movement patterns are proposed according to ship sailing characteristics, together with the related sub-trajectory partitioning algorithms. Next, three categories of trajectory features and their extraction methods are presented. Finally, a case study is given in which a model that classifies fishing boats and cargo ships is built from real Automatic Identification System (AIS) data. Experimental results indicate that the proposed classification method meets the need to recognise targets of uncertain type in historical trajectory data. This lays a foundation for further research on camouflaged ship identification, behaviour pattern mining, outlier behaviour detection and other applications.
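A logistic regression classifier over trajectory features can be sketched without external libraries. This sketch uses several hypothetical choices:

- Two features (mean speed in tens of knots, and the fraction of time spent turning) stand in for the paper's three feature categories.
- The training data are invented for illustration.
- The label convention (1 for fishing boats, 0 for cargo ships) is an assumption.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x):
    """1 = fishing boat, 0 = cargo ship (hypothetical label convention)."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used. The hand-written gradient descent only makes the model explicit.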


2021 ◽  
Vol 36 ◽  
Author(s):  
Ahmad Issa Alaa Aldine ◽  
Mounira Harzallah ◽  
Giuseppe Berio ◽  
Nicolas Béchet ◽  
Ahmad Faour

Patterns have been extensively used to extract hypernym relations from texts. The most popular patterns are Hearst's patterns, formulated as regular expressions based mainly on lexical information. Prior experiments have reported good precision but low recall for such patterns. Several approaches have therefore been developed to improve recall. While these approaches perform better in terms of recall, it remains quite difficult to increase recall further without degrading precision. In this paper, we propose a novel 3-phase approach based on sequential pattern mining that improves pattern-based approaches in terms of both precision and recall. It works by (i) using a rich pattern representation based on grammatical dependencies, (ii) discovering new hypernym patterns, and (iii) extending hypernym patterns with anti-hypernym patterns to prune wrongly extracted hypernym relations. Experiments on three corpora confirm that our approach can learn sequential patterns and combine them to outperform existing hypernym patterns in terms of precision and recall. Compared with unsupervised distributional baselines for hypernym detection, our approach, as expected, yields much better performance. Compared with supervised distributional baselines, our approach is shown to be complementary and much less tightly coupled to the training datasets and corpora.
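For contrast with the dependency-based patterns the paper proposes, classic lexical Hearst patterns can be written as regular expressions, and an anti-hypernym pattern can prune wrong extractions. This Python sketch is deliberately simplified. Noun phrases are single words and there is no lemmatization. The anti-pattern `X are not Y` is a hypothetical example, not one learned by the authors' approach.

```python
import re

# Simplified lexical Hearst patterns over single-word noun phrases.
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:(?:, and |, or |, | and | or )\w+)*)")
AND_OTHER = re.compile(r"(\w+) and other (\w+)")
# Hypothetical anti-hypernym pattern: "X are not Y" means Y is not a hypernym of X.
ARE_NOT = re.compile(r"(\w+) are not (\w+)")
LIST_SEP = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract(text):
    """Return (hyponym, hypernym) pairs, pruned by the anti-pattern."""
    text = text.lower()
    pairs = set()
    for m in SUCH_AS.finditer(text):
        for hypo in LIST_SEP.split(m.group(2)):
            pairs.add((hypo, m.group(1)))
    for m in AND_OTHER.finditer(text):
        pairs.add((m.group(1), m.group(2)))
    anti = {(m.group(1), m.group(2)) for m in ARE_NOT.finditer(text)}
    return pairs - anti
```

Take the sample text "Animals such as cats, dogs and whales. Apples and other fruits. Fish such as whales are rare. Whales are not fish." The phrase "fish such as whales" yields the wrong pair `(whales, fish)`, which the anti-pattern then removes.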


2008 ◽  
pp. 2004-2021
Author(s):  
Jenq-Foung Yao ◽  
Yongqiao Xiao

Web usage mining aims to discover useful patterns in web usage data, and these patterns provide useful information about users' browsing behavior. This chapter examines different types of web usage traversal patterns and the techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of web logs is described. Important issues such as privacy and sessionization are raised, and possible solutions are discussed.
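Two steps mentioned in the chapter, sessionization during preprocessing and the extraction of forward sequences, can be sketched in Python. The 30-minute timeout is a common heuristic, not a value taken from the chapter. The forward-reference routine follows one widely used definition, in which a path ends whenever the user returns to a page already on it. The chapter's exact definitions may differ.

```python
from itertools import groupby


def sessionize(records, timeout=1800):
    """Split (user, timestamp_seconds, page) records into sessions.

    A new session starts when the user changes or the gap between two
    consecutive requests exceeds `timeout` (assumed 30-minute heuristic).
    """
    sessions = []
    for _, reqs in groupby(sorted(records), key=lambda r: r[0]):
        last_ts = None
        for _, ts, page in reqs:
            if last_ts is None or ts - last_ts > timeout:
                sessions.append([])
            sessions[-1].append(page)
            last_ts = ts
    return sessions


def maximal_forward_references(session):
    """Break a session into maximal forward paths at each backward move."""
    refs, path, moving_forward = [], [], True
    for page in session:
        if page in path:                      # revisit = backward move
            if moving_forward:
                refs.append(tuple(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(tuple(path))
    return refs
```

For the session `A B C D C B E G H G W A O U O V`, `maximal_forward_references` returns the paths `ABCD`, `ABEGH`, `ABEGW`, `AOU` and `AOV`.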


2010 ◽  
Vol 2 (1) ◽  
pp. 66-72 ◽  
Author(s):  
María J. Santofimia ◽  
Francisco Moya ◽  
Félix J. Villanueva ◽  
David Villa ◽  
Juan C. López

Since the appearance of the Ambient Intelligence paradigm as an evolution of Ubiquitous Computing, much of the research effort in this field has been aimed at anticipating user actions and needs out of a prefixed set. However, Ambient Intelligence is not limited to user behaviour pattern matching. It must also wisely supervise the whole environment, satisfying unforeseen requirements or needs by means of rational decisions. This work identifies the lack of commonsense reasoning as the main reason for the existence of these idiot savant systems. Such systems can accomplish very specific and complex tasks but cannot make decisions outside the prefixed behavioral patterns. This work advocates integrating commonsense reasoning and understanding capabilities as the key to bridging the gap between idiot savant systems and real Ambient Intelligence systems.


2018 ◽  
Vol 10 (11) ◽  
pp. 4330 ◽  
Author(s):  
Xinglong Yuan ◽  
Wenbing Chang ◽  
Shenghan Zhou ◽  
Yang Cheng

Sequential pattern mining (SPM) is an effective and important method for analyzing time series. This paper proposes an SPM algorithm to mine fault sequential patterns in text data. Text data are poorly structured, and the same concept can be expressed in many different forms, so traditional SPM algorithms cannot be applied directly to text data. The proposed algorithm is designed to solve this problem. First, the similarity of fault text data is measured and similar faults are grouped into one class. Next, a new text similarity measurement model based on word embedding distance is proposed. Compared with classic text similarity measurement methods, this model achieves good results in short text classification. Then, on the basis of the fault classification, the paper proposes an SPM algorithm with an event window, a soft time constraint used to obtain a certain number of sequential patterns as needed. Finally, the fault text records of an aircraft are used as experimental data for mining fault sequential patterns. Experiments show that the algorithm can effectively mine sequential patterns in text data. The proposed algorithm can be widely applied to text time series data in many fields, such as industry, business and finance.
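The two ingredients, an embedding-based similarity for grouping fault texts and an event-window constraint for counting sequential patterns, can be sketched in Python. This sketch makes several substitutions:

- The similarity below (the symmetric average of each word's best cosine match) stands in for the paper's word-embedding-distance model.
- The toy three-dimensional embeddings are invented for illustration.
- The window is applied as a hard limit, so the paper's soft-constraint mechanism is not reproduced.

All function names are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_similarity(a, b, emb):
    """Symmetric best-match cosine similarity between two short texts."""
    wa = [w for w in a.lower().split() if w in emb]
    wb = [w for w in b.lower().split() if w in emb]
    if not wa or not wb:
        return 0.0

    def directed(xs, ys):
        # average, over words in xs, of the best cosine match found in ys
        return sum(max(cosine(emb[x], emb[y]) for y in ys) for x in xs) / len(xs)

    return (directed(wa, wb) + directed(wb, wa)) / 2


def window_support(sequences, first, second, window):
    """Number of sequences in which `second` follows `first` within `window`."""
    count = 0
    for seq in sequences:                 # seq: list of (time, fault_class)
        t_first = [t for t, e in seq if e == first]
        t_second = [t for t, e in seq if e == second]
        if any(0 < t2 - t1 <= window for t1 in t_first for t2 in t_second):
            count += 1
    return count
```

With toy vectors in which "engine" is close to "motor" and "overheat" is close to "hot", "engine overheat" and "motor hot" score above 0.9 and would fall into the same fault class. "Engine overheat" and "engine leak" score only 0.5.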


2020 ◽  
Author(s):  
Alfonso González-Briones ◽  
Javier Prieto ◽  
Fernando De La Prieta ◽  
Yves Demazeau ◽  
Juan M. Corchado
