A Relative Performance of Dissimilarity Measures for Matching Relational Web Access Patterns Between User Sessions

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Pattern Engineering System Development for Big Data Analytics ◽

10.4018/978-1-5225-3870-7.ch010 ◽

2018 ◽

pp. 153-176

Author(s):

Dilip Singh Sisodia

Keyword(s):

Clustering Algorithms ◽

Similarity Measures ◽

Comparative Performance ◽

Dissimilarity Measures ◽

User Session ◽

Relational Clustering ◽

Cluster Distance ◽

Web Access ◽

Access Patterns ◽

Cluster Quality

Customized web services are offered to users by grouping them according to their access patterns. Clustering techniques are very useful for grouping users and analyzing web access patterns. Clustering can be object clustering, performed on feature vectors, or relational clustering, performed on relational data. Relational clustering is preferred over object clustering for web users' sessions because of the high dimensionality and sparsity of web users' data. However, relational clustering of web users depends on the underlying dissimilarity measure used. Therefore, choosing the correct dissimilarity measure for matching relational web access patterns between user sessions is very important. In this chapter, the various dissimilarity measures used in relational clustering of web users' data are discussed. The concept of an augmented user session is also discussed to derive different augmented session dissimilarity measures. The discussed session dissimilarity measures are used with relational fuzzy clustering algorithms. The comparative performance of binary session similarity and augmented session similarity measures is evaluated using an intra-cluster and inter-cluster distance-based cluster quality ratio. The results suggest that the augmented session dissimilarity measures in general, and the intuitive augmented session (dis)similarity measure in particular, performed better than the other measures.
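
As a rough illustration of the kind of evaluation this abstract describes, the sketch below computes pairwise session dissimilarities and an intra-cluster to inter-cluster distance ratio. Jaccard dissimilarity on sets of visited pages is used only as a stand-in for a binary session measure; the function names, the toy sessions, and the exact form of the ratio are assumptions, not the chapter's definitions.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b| for two sets of visited pages (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def quality_ratio(sessions, labels):
    """Mean intra-cluster distance divided by mean inter-cluster distance.

    Lower values mean tighter, better separated clusters. This form of the
    ratio is an assumption made for illustration.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(sessions)), 2):
        d = jaccard_dissimilarity(sessions[i], sessions[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-cluster pair")
    mean_inter = sum(inter) / len(inter)
    if mean_inter == 0:
        raise ValueError("clusters are indistinguishable under this measure")
    return (sum(intra) / len(intra)) / mean_inter


# Hypothetical sessions (sets of requested URLs) and a hard cluster assignment.
sessions = [
    {"/home", "/news"},
    {"/home", "/news", "/sports"},
    {"/shop", "/cart"},
    {"/shop", "/cart", "/checkout"},
]
labels = [0, 0, 1, 1]
ratio = quality_ratio(sessions, labels)
```

Because the ratio only needs pairwise dissimilarities, any other session measure can be swapped in for `jaccard_dissimilarity` without changing `quality_ratio`.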


A robustness metric for biological data clustering algorithms

BMC Bioinformatics ◽

10.1186/s12859-019-3089-6 ◽

2019 ◽

Vol 20 (S15) ◽

Author(s):

Yuping Lu ◽

Charles A. Phillips ◽

Michael A. Langston

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Similarity Measures ◽

Parameter Tuning ◽

Biological Data ◽

Algorithm Selection ◽

The Stability ◽

Microarray Datasets ◽

Cluster Quality

Background: Cluster analysis is a core task in modern data-centric computation. Algorithmic choice is driven by factors such as data size and heterogeneity, the similarity measures employed, and the type of clusters sought. Familiarity and mere preference often play a significant role as well. Comparisons between clustering algorithms tend to focus on cluster quality. Such comparisons are complicated by the fact that algorithms often have multiple settings that can affect the clusters produced. Such a setting may represent, for example, a preset variable, a parameter of interest, or various sorts of initial assignments. A question of interest then is this: to what degree do the clusters produced vary as setting values change? Results: This work introduces a new metric, termed simply “robustness”, designed to answer that question. Robustness is an easily-interpretable measure of the propensity of a clustering algorithm to maintain output coherence over a range of settings. The robustness of eleven popular clustering algorithms is evaluated over some two dozen publicly available mRNA expression microarray datasets. Given their straightforwardness and predictability, hierarchical methods generally exhibited the highest robustness on most datasets. Of the more complex strategies, the paraclique algorithm yielded consistently higher robustness than other algorithms tested, approaching and even surpassing hierarchical methods on several datasets. Other techniques exhibited mixed robustness, with no clear distinction between them. Conclusions: Robustness provides a simple and intuitive measure of the stability and predictability of a clustering algorithm. It can be a useful tool to aid both in algorithm selection and in deciding how much effort to devote to parameter tuning.


Low-complexity fuzzy relational clustering algorithms for Web mining

IEEE Transactions on Fuzzy Systems ◽

10.1109/91.940971 ◽

2001 ◽

Vol 9 (4) ◽

pp. 595-607 ◽

Cited By ~ 228

Author(s):

R. Krishnapuram ◽

A. Joshi ◽

O. Nasraoui ◽

L. Yi

Keyword(s):

Web Mining ◽

Clustering Algorithms ◽

Low Complexity ◽

Relational Clustering


Improving the Clustering Algorithms Automatic Generation Process with Cluster Quality Indexes

Computational Science and Its Applications – ICCSA 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-58799-4_73 ◽

2020 ◽

pp. 1017-1031

Author(s):

Michel Montenegro ◽

Aruanda Meiguins ◽

Bianchi Meiguins ◽

Jefferson Morais

Keyword(s):

Clustering Algorithms ◽

Automatic Generation ◽

Generation Process ◽

Quality Indexes ◽

Cluster Quality


An Approach to Spatiotemporal Trajectory Clustering Based on Community Detection

Wireless Communications and Mobile Computing ◽

10.1155/2021/5582341 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Xin Wang ◽

Xinzheng Niu ◽

Jiahui Zhu ◽

Zuoyan Liu

Keyword(s):

Community Detection ◽

Trajectory Analysis ◽

Moving Objects ◽

Clustering Algorithms ◽

Similarity Measures ◽

Detection Algorithm ◽

Similarity Matrix ◽

Trajectory Data ◽

Trajectory Similarity ◽

Community Detection Algorithm

Nowadays, large volumes of multimodal data have been collected for analysis. An important type of data is trajectory data, which contains both time and space information. Trajectory analysis and clustering are essential to learn the pattern of moving objects. Computing trajectory similarity is a key aspect of trajectory analysis, but it is very time consuming. To address this issue, this paper presents an improved branch and bound strategy based on time slice segmentation, which reduces the time to obtain the similarity matrix by decreasing the number of distance calculations required to compute similarity. Then, the similarity matrix is transformed into a trajectory graph and a community detection algorithm is applied on it for clustering. Extensive experiments were done to compare the proposed algorithms with existing similarity measures and clustering algorithms. Results show that the proposed method can effectively mine the trajectory cluster information from the spatiotemporal trajectories.


Application of Data Mining on Web Usage Data for Security: WebSecuDMiner

10.20944/preprints201909.0040.v1 ◽

2019 ◽

Author(s):

Muhammad Zia Aftab Khan ◽

Jihyun Park

Keyword(s):

Design Methodology ◽

Access Pattern ◽

User Research ◽

Web Log ◽

Web Access ◽

User Access ◽

Log File ◽

Access Patterns ◽

Web Access Pattern ◽

The Web

The purpose of this paper is to develop the WebSecuDMiner algorithm to discover unusual web access patterns by analysing the potential rules hidden in the web server log and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data, which are then used to identify users' access behaviour patterns and detect unusual ones. Data extracted from the web server log and user browsing behaviour is exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to take appropriate decisions regarding the review of the original rights of a suspicious user. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page may be viewed by the user without the visit being recorded in the server log file, since the page can be accessed from the browser's cache.
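
The abstract names ECLAT but does not describe how WebSecuDMiner applies it, so the following is only a generic sketch of ECLAT-style frequent itemset mining using a vertical layout and tid-set intersection. The function name `eclat`, the toy sessions, and the support threshold are hypothetical.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): support} via tid-set intersection.

    Generic ECLAT sketch; not the WebSecuDMiner implementation.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs already frequent together with prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the remaining items of the same equivalence class.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend(set(), initial)
    return frequent


# Hypothetical sessions reconstructed from a web server log.
sessions = [
    {"/login", "/account"},
    {"/login", "/account", "/admin"},
    {"/login", "/admin"},
    {"/login", "/account"},
]
patterns = eclat(sessions, min_support=2)
```

In this toy data, `/account` and `/admin` each occur often enough on their own but appear together in only one session, so that pair is not reported as frequent.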


PRIVACY PRESERVING CLUSTERING BASED ON LINEAR APPROXIMATION OF FUNCTION

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v12i5.2914 ◽

2013 ◽

Vol 12 (5) ◽

pp. 3443-3451

Author(s):

Rajesh Pasupuleti ◽

Narsimha Gugulothu

Keyword(s):

Linear Approximation ◽

Clustering Algorithms ◽

Similarity Measures ◽

Privacy Preserving ◽

Distance Measures ◽

Clustering Methods ◽

Sensitive Data ◽

Processing Information ◽

Data Objects ◽

Approximation Of Function

Clustering analysis initiatives a new direction in data mining that has major impact in various domains including machine learning, pattern recognition, image processing, information retrieval and bioinformatics. Current clustering techniques address some of the requirements not adequately and failed in standardizing clustering algorithms to support for all real applications. Many clustering methods mostly depend on user specified parametric methods and initial seeds of clusters are randomly selected by user. In this paper, we proposed new clustering method based on linear approximation of function by getting over all idea of behavior knowledge of clustering function, then pick the initial seeds of clusters as the points on linear approximation line and perform clustering operations, unlike grouping data objects into clusters by using distance measures, similarity measures and statistical distributions in traditional clustering methods. We have shown experimental results as clusters based on linear approximation yields good results in practice with an example of business data are provided. It also explains privacy preserving clusters of sensitive data objects.


Mining frequent web access patterns with partial enumeration

Proceedings of the 45th annual southeast regional conference on - ACM-SE 45 ◽

10.1145/1233341.1233382 ◽

2007 ◽

Cited By ~ 2

Author(s):

Peiyi Tang ◽

Markus P. Turkia

Keyword(s):

Partial Enumeration ◽

Web Access ◽

Access Patterns


Movie Recommendation System Based on Fuzzy Inference System and Adaptive Neuro Fuzzy Inference System

Fuzzy Systems ◽

10.4018/978-1-5225-1908-9.ch026 ◽

2017 ◽

pp. 573-608

Author(s):

Mahfuzur Rahman Siddiquee ◽

Naimul Haider ◽

Rashedur M. Rahman

Keyword(s):

Fuzzy Inference System ◽

Recommendation System ◽

Fuzzy Inference ◽

Pearson Correlation ◽

Similarity Measures ◽

Manhattan Distance ◽

Comparative Performance ◽

Inference System ◽

Neuro Fuzzy ◽

Similarity Calculation

One of the most prominent features that social networks or e-commerce sites now provide is recommendation of items. However, the recommendation task is challenging because a high degree of accuracy is required. This paper analyzes the improvement in recommendation of movies using a Fuzzy Inference System (FIS) and an Adaptive Neuro Fuzzy Inference System (ANFIS). Two similarity measures have been used: one taking into account similar users' choices and the other matching genres of similar movies rated by the user. For similarity calculation, four different techniques, namely Euclidean Distance, Manhattan Distance, Pearson Coefficient and Cosine Similarity, are used. The FIS and ANFIS systems are used in decision making. The experiments have been carried out on the MovieLens dataset and a comparative performance analysis has been reported. Experimental results demonstrate that ANFIS outperforms FIS in most cases when the Pearson Correlation metric is used for similarity calculation.
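
The four measures named in this abstract have standard textbook definitions, sketched below for two equal-length rating vectors. The function names and the sample ratings are hypothetical, and how the paper feeds these values into FIS or ANFIS is not shown here.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson_correlation(x, y):
    """Cosine similarity of the mean-centred vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical ratings by two users for the same four movies.
user_a = [5.0, 3.0, 4.0, 4.0]
user_b = [3.0, 1.0, 2.0, 2.0]

distance_l2 = euclidean_distance(user_a, user_b)
distance_l1 = manhattan_distance(user_a, user_b)
cosine = cosine_similarity(user_a, user_b)
pearson = pearson_correlation(user_a, user_b)
```

Here `user_b` rates every movie exactly two points lower than `user_a`, so the two distances are non-zero while the Pearson correlation is 1: mean-centring removes the constant offset, which is one reason correlation is often preferred when users differ in how generously they rate.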


Web Access Patterns Mining for Individuals with Timing and Link Sequence

Information Technology Journal ◽

10.3923/itj.2014.746.753 ◽

2014 ◽

Vol 13 (4) ◽

pp. 746-753 ◽

Cited By ~ 2

Author(s):

Yonglong Ge ◽

Chungang Yan ◽

Zhijun Ding ◽

Wangyang Yu

Keyword(s):

Web Access ◽

Access Patterns


A Constraint Programming Approach for Web Log Mining

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016100102 ◽

2016 ◽

Vol 11 (4) ◽

pp. 24-42 ◽

Cited By ~ 2

Author(s):

Amina Kemmar ◽

Yahia Lebbah ◽

Samir Loudni

Keyword(s):

Constraint Programming ◽

Pattern Mining ◽

Programming Approach ◽

Web Log Mining ◽

Web Log ◽

Web Access ◽

Log Mining ◽

Log File ◽

Access Patterns ◽

The Web

Mining web access patterns consists of extracting knowledge from server log files. This problem is represented as a sequential pattern mining (SPM) problem, which extracts patterns that are sequences of accesses occurring frequently in the web log file. There are many efficient algorithms in the literature for solving SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite the effectiveness of these methods, they cannot express or handle new constraints defined on patterns without new implementations. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
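
To make the notion of a frequent access sequence concrete, the sketch below counts how many logged sessions contain a pattern as an ordered subsequence, with gaps allowed. It illustrates only the support test at the core of SPM, not the authors' constraint programming model; the function names and the toy log are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's pages occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched page,
    # so later pages of the pattern must appear after earlier ones.
    return all(page in remaining for page in pattern)


def support(pattern, sequences):
    """Number of sequences that contain the pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


def is_frequent(pattern, sequences, min_support):
    """A pattern is frequent when its support reaches the threshold."""
    return support(pattern, sequences) >= min_support


# Hypothetical access sequences, one per session in the web log.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["c", "d", "a"],
]

support_ad = support(["a", "d"], log)
support_acd = support(["a", "c", "d"], log)
support_da = support(["d", "a"], log)
```

Order matters: `["a", "d"]` is supported by three sessions while `["d", "a"]` is supported by only one, which is what distinguishes sequential patterns from plain itemsets.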
