A Relative Performance of Dissimilarity Measures for Matching Relational Web Access Patterns Between User Sessions

Author(s):  
Dilip Singh Sisodia

Customized web services are offered to users by grouping them according to their access patterns. Clustering techniques are very useful in grouping users and analyzing web access patterns. Clustering can be either object clustering, performed on feature vectors, or relational clustering, performed on relational data. Relational clustering is preferred over object clustering for web users' sessions because of the high dimensionality and sparsity of web users' data. However, relational clustering of web users depends on the underlying dissimilarity measure used. Therefore, choosing the correct dissimilarity measure for matching relational web access patterns between user sessions is very important. In this chapter, the various dissimilarity measures used in relational clustering of web users' data are discussed. The concept of an augmented user session is also discussed to derive different augmented session dissimilarity measures. The discussed session dissimilarity measures are used with relational fuzzy clustering algorithms. The comparative performance of binary session similarity and augmented session similarity measures is evaluated using an intra-cluster and inter-cluster distance-based cluster quality ratio. The results suggest that the augmented session dissimilarity measures in general, and the intuitive augmented session (dis)similarity measure in particular, performed better than the other measures.
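As a rough illustration of the evaluation idea in this abstract, the sketch below computes a dissimilarity between binary session vectors and then an intra-cluster to inter-cluster distance ratio over the resulting relational matrix. The cosine-based measure, the toy sessions, the cluster labels, and the function names are all assumptions made for illustration; the chapter's own measures and ratio may be defined differently.

```python
import math

def binary_session_dissimilarity(a, b):
    """1 - cosine similarity between two binary URL-visit vectors (assumed measure)."""
    common = sum(x and y for x, y in zip(a, b))
    norm = math.sqrt(sum(a) * sum(b))
    return 1.0 if norm == 0 else 1.0 - common / norm

def quality_ratio(dissim, labels):
    """Mean intra-cluster distance over mean inter-cluster distance (lower is better)."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (intra if labels[i] == labels[j] else inter).append(dissim[i][j])
    # Assumes at least one intra-cluster pair and one inter-cluster pair exist.
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))

# Hypothetical sessions: each position marks whether a given URL was visited.
sessions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
labels = [0, 0, 1, 1]  # hypothetical hard cluster assignment

# Relational data: the pairwise dissimilarity matrix, not the feature vectors.
dissim = [[binary_session_dissimilarity(a, b) for b in sessions] for a in sessions]
ratio = quality_ratio(dissim, labels)
```

A ratio well below 1 indicates that sessions within a cluster are closer to each other than to sessions in other clusters, which is the sense in which one dissimilarity measure can be said to cluster better than another.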

2019 ◽  
Vol 20 (S15) ◽  
Author(s):  
Yuping Lu ◽  
Charles A. Phillips ◽  
Michael A. Langston

Abstract. Background: Cluster analysis is a core task in modern data-centric computation. Algorithmic choice is driven by factors such as data size and heterogeneity, the similarity measures employed, and the type of clusters sought. Familiarity and mere preference often play a significant role as well. Comparisons between clustering algorithms tend to focus on cluster quality. Such comparisons are complicated by the fact that algorithms often have multiple settings that can affect the clusters produced. Such a setting may represent, for example, a preset variable, a parameter of interest, or various sorts of initial assignments. A question of interest then is this: to what degree do the clusters produced vary as setting values change? Results: This work introduces a new metric, termed simply “robustness”, designed to answer that question. Robustness is an easily-interpretable measure of the propensity of a clustering algorithm to maintain output coherence over a range of settings. The robustness of eleven popular clustering algorithms is evaluated over some two dozen publicly available mRNA expression microarray datasets. Given their straightforwardness and predictability, hierarchical methods generally exhibited the highest robustness on most datasets. Of the more complex strategies, the paraclique algorithm yielded consistently higher robustness than other algorithms tested, approaching and even surpassing hierarchical methods on several datasets. Other techniques exhibited mixed robustness, with no clear distinction between them. Conclusions: Robustness provides a simple and intuitive measure of the stability and predictability of a clustering algorithm. It can be a useful tool to aid both in algorithm selection and in deciding how much effort to devote to parameter tuning.


2001 ◽  
Vol 9 (4) ◽  
pp. 595-607 ◽  
Author(s):  
R. Krishnapuram ◽  
A. Joshi ◽  
O. Nasraoui ◽  
L. Yi

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xin Wang ◽  
Xinzheng Niu ◽  
Jiahui Zhu ◽  
Zuoyan Liu

Nowadays, large volumes of multimodal data have been collected for analysis. An important type of data is trajectory data, which contains both time and space information. Trajectory analysis and clustering are essential to learn the pattern of moving objects. Computing trajectory similarity is a key aspect of trajectory analysis, but it is very time consuming. To address this issue, this paper presents an improved branch and bound strategy based on time slice segmentation, which reduces the time to obtain the similarity matrix by decreasing the number of distance calculations required to compute similarity. Then, the similarity matrix is transformed into a trajectory graph and a community detection algorithm is applied on it for clustering. Extensive experiments were done to compare the proposed algorithms with existing similarity measures and clustering algorithms. Results show that the proposed method can effectively mine the trajectory cluster information from the spatiotemporal trajectories.


Author(s):  
Muhammad Zia Aftab Khan ◽  
Jihyun Park

The purpose of this paper is to develop the WebSecuDMiner algorithm to discover unusual web access patterns by analysing the potential rules hidden in the web server log and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data, which are used to identify user access behaviour patterns and detect unusual ones. Data extracted from the web server log and user browsing behaviour is exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to take appropriate decisions regarding the review of the original rights of the suspicious user. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page may be viewed by the user without the visit being recorded in the server log file, since it can be accessed from the browser's cache.
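For readers unfamiliar with ECLAT, the following is a minimal sketch of its core idea: store, for each item, the set of transaction ids that contain it, and grow itemsets by intersecting those sets. It is not the WebSecuDMiner implementation; the session data, page names, and the `eclat` function are hypothetical.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    # Vertical layout: item -> set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, each frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect tid-sets with the remaining items to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(set(), initial)
    return frequent

# Hypothetical sessions, each a set of pages requested by one user.
sessions = [
    {"/home", "/login", "/admin"},
    {"/home", "/login"},
    {"/home", "/products"},
    {"/login", "/admin"},
]
patterns = eclat(sessions, min_support=2)
```

In a detection setting such as the one described, frequent itemsets of this kind would serve as the baseline of usual behaviour against which a session could be compared; how WebSecuDMiner makes that comparison is not specified in the abstract.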


2013 ◽  
Vol 12 (5) ◽  
pp. 3443-3451
Author(s):  
Rajesh Pasupuleti ◽  
Narsimha Gugulothu

Clustering analysis initiates a new direction in data mining that has a major impact in various domains, including machine learning, pattern recognition, image processing, information retrieval and bioinformatics. Current clustering techniques do not adequately address some of the requirements and have failed to standardize clustering algorithms to support all real applications. Many clustering methods depend on user-specified parameters, and the initial seeds of clusters are randomly selected by the user. In this paper, we propose a new clustering method based on linear approximation of a function: after obtaining an overall idea of the behavior of the clustering function, we pick the initial seeds of clusters as points on the linear approximation line and perform clustering operations, unlike traditional clustering methods that group data objects into clusters by using distance measures, similarity measures and statistical distributions. Experimental results, illustrated with an example of business data, show that clusters based on linear approximation yield good results in practice. The paper also explains privacy-preserving clusters of sensitive data objects.


Fuzzy Systems ◽  
2017 ◽  
pp. 573-608
Author(s):  
Mahfuzur Rahman Siddiquee ◽  
Naimul Haider ◽  
Rashedur M. Rahman

One of the most prominent features that social networks and e-commerce sites now provide is the recommendation of items. However, the recommendation task is challenging because a high degree of accuracy is required. This paper analyzes the improvement in recommendation of movies using a Fuzzy Inference System (FIS) and an Adaptive Neuro Fuzzy Inference System (ANFIS). Two similarity measures have been used: one takes into account the choices of similar users, and the other matches the genres of similar movies rated by the user. For similarity calculation, four different techniques, namely Euclidean Distance, Manhattan Distance, Pearson Coefficient and Cosine Similarity, are used. The FIS and ANFIS systems are used in decision making. The experiments have been carried out on the Movie Lens dataset and a comparative performance analysis has been reported. Experimental results demonstrate that ANFIS outperforms FIS in most of the cases when the Pearson Correlation metric is used for similarity calculation.
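The four similarity techniques named in this abstract have standard textbook definitions, sketched below for two users' ratings of the same movies. The rating vectors and function names are illustrative assumptions, and the paper may normalize or combine the measures differently.

```python
import math

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_distance(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Returning 0.0 for an all-zero vector is a convention chosen here.
    return dot / norm if norm else 0.0

def pearson_coefficient(u, v):
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread = math.sqrt(sum((a - mean_u) ** 2 for a in u) * sum((b - mean_v) ** 2 for b in v))
    # Returning 0.0 when a user rates everything identically is a convention chosen here.
    return cov / spread if spread else 0.0

# Hypothetical ratings by two users for the same four movies.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 3]

scores = {
    "euclidean": euclidean_distance(alice, bob),
    "manhattan": manhattan_distance(alice, bob),
    "cosine": cosine_similarity(alice, bob),
    "pearson": pearson_coefficient(alice, bob),
}
```

Note that the first two are distances (smaller means more similar) while the last two are similarities (larger means more similar). Pearson subtracts each user's mean rating, so it is insensitive to one user rating consistently higher than another.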


2014 ◽  
Vol 13 (4) ◽  
pp. 746-753 ◽  
Author(s):  
Yonglong Ge ◽  
Chungang Yan ◽  
Zhijun Ding ◽  
Wangyang Yu

Author(s):  
Amina Kemmar ◽  
Yahia Lebbah ◽  
Samir Loudni

Mining web access patterns consists of extracting knowledge from server log files. This problem is represented as a sequential pattern mining (SPM) problem, in which the extracted patterns are sequences of accesses that occur frequently in the web log file. The literature contains many efficient algorithms for solving SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express or handle new constraints defined on patterns without new implementations. Recently, many approaches based on constraint programming (CP) have been proposed to solve SPM in a declarative and generic way. Since no CP-based approach has been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
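To make the notion of a frequent access sequence concrete, the sketch below shows the support test that underlies SPM in general: a pattern is supported by a session if its pages appear in the same order, not necessarily contiguously. This is only the basic definition, not the authors' CP model; the log contents and function names are hypothetical.

```python
def occurs_in(pattern, sequence):
    """True if pattern is an ordered (not necessarily contiguous) subsequence."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched page,
    # so later pages in the pattern must appear after earlier ones.
    return all(page in remaining for page in pattern)

def support(pattern, log):
    """Number of sessions in the log that contain the pattern."""
    return sum(occurs_in(pattern, session) for session in log)

# Hypothetical web log: one ordered list of requested pages per session.
log = [
    ["home", "search", "item", "cart"],
    ["home", "item", "cart"],
    ["search", "home", "cart"],
    ["item", "search"],
]

home_then_cart = support(["home", "cart"], log)
search_then_item = support(["search", "item"], log)
```

A pattern is frequent when its support reaches a chosen threshold; additional constraints (for example on pattern length or on required pages) would further restrict which patterns are reported.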

