Research on Improved Clustering Algorithm on Web Usage Mining Based on Scientific Analysis of Web Materials

Clustering analysis is an important method for studying Web users' browsing behavior and identifying potential customers in Web usage mining. Traditional user clustering algorithms are not very accurate. In this paper, we give two improved user clustering algorithms based on the associated matrix of users' hits recorded while they browse a website. From this matrix, an improved Hamming distance matrix is generated by defining the minimum norm or the generalized relative Hamming distance between any two vectors. Clusters of similar users are then obtained by setting a threshold value. In the last step of our algorithm, the clustering results are confirmed by defining a Similar Index for the clustering and applying a sub-algorithm. Finally, test examples show that the new algorithms are more accurate than the old one, and real log data show that the improved algorithms are practical.
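
The threshold step described in this abstract can be illustrated with a minimal sketch. It assumes binary hit vectors (1 if the user visited a page), the plain relative Hamming distance (mismatches divided by vector length) rather than the paper's improved or generalized definition, and a simple single-linkage grouping. The function names, the sample data, and the threshold of 0.25 are hypothetical.

```python
from itertools import combinations


def relative_hamming(u, v):
    """Fraction of positions where two equal-length hit vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)


def cluster_users(hits, threshold):
    """Group users linked by a pairwise distance <= threshold (assumed rule)."""
    users = list(hits)
    parent = {u: u for u in users}

    def find(u):
        # Follow parent links to the representative of u's group.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in combinations(users, 2):
        if relative_hamming(hits[a], hits[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for u in users:
        clusters.setdefault(find(u), []).append(u)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical user-by-page hit matrix (rows: users, columns: pages).
hits = {
    "u1": [1, 1, 0, 0],
    "u2": [1, 1, 1, 0],
    "u3": [0, 0, 1, 1],
    "u4": [0, 0, 0, 1],
}
clusters = cluster_users(hits, threshold=0.25)
print(clusters)
```

Lowering the threshold splits users into more, tighter groups; raising it merges them. The confirmation step with the Similar Index is not sketched here because the abstract does not define it.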

A user clustering algorithm on web usage mining

2017 First International Conference on Electronics Instrumentation & Information Systems (EIIS) ◽

10.1109/eiis.2017.8298745 ◽

2017 ◽

Cited By ~ 1

Author(s):

Sun Hao ◽

Shen Zhaoxiang ◽

Zhang Bingbing

Keyword(s):

Clustering Algorithm ◽

Web Usage Mining ◽

Web Usage ◽

User Clustering

Semantic Based Weighted Web Session Clustering Using Adapted K-Means and Hierarchical Agglomerative Algorithms

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2125 ◽

2022 ◽

Author(s):

Sowmya HK ◽

R. J. Anandhi

Keyword(s):

Clustering Algorithms ◽

Threshold Value ◽

Semantic Distance ◽

Web Usage Mining ◽

Identification Algorithm ◽

Agglomerative Clustering ◽

Dissimilarity Matrix ◽

Identification Methods ◽

Web Usage ◽

Stay Time

The WWW has a large number of pages and URLs that supply the user with a great amount of content. In an era of rapidly growing information, analysing users' browsing behaviour is an important task. Web usage mining techniques are applied to the web server log to analyse user behaviour. Identification of user sessions is one of the key and demanding tasks in the pre-processing stage of web usage mining. This paper focuses on two important shortcomings of existing session identification methods such as Time based and Referrer based sessionization. The first concerns comparing the current request's referrer field with the URL of the previous request. The second concerns session creation: requests are split into new sessions or merged into one session depending on the threshold values for page stay time and session time. The authors therefore developed an enhanced semantic distance based session identification algorithm that tackles these issues of traditional session identification methods. The enhanced semantic based method has an accuracy of 84 percent, which is higher than that of the Time based and Time-Referrer based session identification approaches. The authors also used adapted K-Means and Hierarchical Agglomerative clustering algorithms to improve the prediction of user browsing patterns. Clusters were found using a weighted dissimilarity matrix, which is calculated using two key parameters: page weight and session weight. The Dunn Index and Davies-Bouldin Index are then used to evaluate the clusters. Experimental results show that purer and more accurate session clusters are formed when the adapted clustering algorithms are applied to the weighted sessions rather than to the sessions obtained from traditional sessionization algorithms. The accuracy of the semantic session clusters is higher than that of the clusters of sessions obtained using traditional sessionization.
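
For context, the traditional time based sessionization that this abstract treats as a baseline can be sketched as follows. This is not the authors' semantic method. The thresholds of 10 minutes for page stay time and 30 minutes for session time are assumed values chosen for illustration, and the function name and sample log are hypothetical.

```python
from datetime import datetime, timedelta


def sessionize(requests,
               page_stay=timedelta(minutes=10),
               session_len=timedelta(minutes=30)):
    """Split one user's time-ordered (timestamp, url) requests into sessions.

    A new session starts when the gap since the previous request exceeds
    page_stay, or when the time since the session's first request exceeds
    session_len. Both thresholds are assumptions, not values from the paper.
    """
    sessions = []
    current = []
    for ts, url in requests:
        if current:
            gap = ts - current[-1][0]
            duration = ts - current[0][0]
            if gap > page_stay or duration > session_len:
                sessions.append(current)
                current = []
        current.append((ts, url))
    if current:
        sessions.append(current)
    return sessions


# Hypothetical requests from a single user.
t0 = datetime(2022, 1, 1, 9, 0)
log = [
    (t0, "/home"),
    (t0 + timedelta(minutes=2), "/products"),
    (t0 + timedelta(minutes=5), "/cart"),
    (t0 + timedelta(minutes=40), "/home"),
    (t0 + timedelta(minutes=43), "/help"),
]
sessions = sessionize(log)
urls = [[url for _, url in s] for s in sessions]
print(urls)
```

The sketch makes the second shortcoming concrete: whether the fourth request opens a new session depends entirely on the two fixed thresholds, not on what the user was doing.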

Traversal Pattern Mining in Web Usage Data

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch119 ◽

2008 ◽

pp. 2004-2021

Author(s):

Jenq-Foung Yao ◽

Yongqiao Xiao

Keyword(s):

Pattern Mining ◽

Pattern Discovery ◽

Web Usage Mining ◽

Sequential Patterns ◽

Web Usage ◽

Web Logs ◽

Frequent Episodes ◽

Browsing Behavior ◽

The Web ◽

Usage Data

Web usage mining aims to discover useful patterns in web usage data; these patterns provide useful information about users' browsing behavior. This chapter examines different types of web usage traversal patterns and the related techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of the web logs is described. Some important issues, such as privacy and sessionization, are raised, and possible solutions are discussed.
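
As an illustration of one of the pattern types named above, the sketch below splits a single session into maximal forward references, the paths a user follows before backing up to an earlier page. This is one common way such forward sequences are derived and is an assumption here; the chapter's own procedure may differ, and counting how often each sequence occurs across sessions is omitted. The function name and the sample session are hypothetical.

```python
def maximal_forward_references(session):
    """Return the forward paths a user completed before each backtrack."""
    path = []
    results = []
    moving_forward = False
    for page in session:
        if page in path:
            # Backward reference: emit the path only if we had been
            # moving forward, then cut the path back to the revisited page.
            if moving_forward:
                results.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward:
        results.append(list(path))
    return results


# Hypothetical session; each letter stands for one page.
session = list("ABCDCBEGHGWAOUOV")
refs = ["".join(p) for p in maximal_forward_references(session)]
print(refs)
```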

Hamming Distance based Clustering Algorithm

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2012010102 ◽

2012 ◽

Vol 2 (1) ◽

pp. 11-20 ◽

Cited By ~ 3

Author(s):

Ritu Vijay ◽

Prerna Mahajan ◽

Rekha Kandwal

Keyword(s):

Machine Learning ◽

Clustering Algorithm ◽

Hamming Distance ◽

Promising Result ◽

Clustering Algorithms ◽

Distribution Patterns ◽

Mixed Data ◽

Binary Representation ◽

Data Sets ◽

Performance Study

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in data. Clustering algorithms are generally based on a distance metric that partitions the data into small groups such that data instances in the same group are more similar to each other than to instances belonging to different groups. In this paper, the authors extend the concept of Hamming distance to categorical data. As a data processing step, they transform the data into a binary representation. The authors then use the proposed algorithm to group data points into clusters. The experiments are carried out on data sets from the UCI machine learning repository to study its performance. They conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
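
The binary transformation mentioned in this abstract can be illustrated with a small sketch. It assumes a one-hot encoding (one bit per column and category value) followed by the ordinary Hamming distance; the paper's actual encoding and its extended distance may differ. All names and the sample records are hypothetical.

```python
def one_hot_encode(records):
    """Encode categorical records as bit lists, one bit per (column, value)."""
    n_cols = len(records[0])
    # Sorted category lists keep the bit order deterministic.
    categories = [sorted({r[i] for r in records}) for i in range(n_cols)]
    encoded = []
    for r in records:
        bits = []
        for i, value in enumerate(r):
            bits.extend(1 if value == c else 0 for c in categories[i])
        encoded.append(bits)
    return encoded


def hamming(u, v):
    """Number of positions where two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(u, v))


# Hypothetical categorical records: (colour, size).
records = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "large"),
]
vectors = one_hot_encode(records)
print(hamming(vectors[0], vectors[1]), hamming(vectors[0], vectors[2]))
```

Under one-hot encoding, each mismatching attribute contributes exactly two differing bits, so the Hamming distance is twice the number of attributes on which two records disagree.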

Traversal Pattern Mining in Web Usage Data

Web Information Systems ◽

10.4018/978-1-59140-208-4.ch010 ◽

2004 ◽

pp. 335-358 ◽

Cited By ~ 2

Author(s):

Yongqiao Xiao ◽

Jenq-Foung (J.F.) Yao

Keyword(s):

Pattern Mining ◽

Pattern Discovery ◽

Web Usage Mining ◽

Sequential Patterns ◽

Web Usage ◽

Web Logs ◽

Frequent Episodes ◽

Browsing Behavior ◽

The Web ◽

Usage Data

A Coupled User Clustering Algorithm Based on Mixed Data for Web-Based Learning Systems

Mathematical Problems in Engineering ◽

10.1155/2015/747628 ◽

2015 ◽

Vol 2015 ◽

pp. 1-14 ◽

Cited By ~ 1

Author(s):

Ke Niu ◽

Zhendong Niu ◽

Yan Su ◽

Can Wang ◽

Hao Lu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Learning Systems ◽

Mixed Data ◽

Continuous Data ◽

Significant Information ◽

Web Based ◽

Study Guides ◽

Web Based Learning ◽

User Clustering

In traditional Web-based learning systems, a few user clustering algorithms have been introduced to address insufficient analysis of learning behaviors and the lack of personalized study guides. When analyzing behaviors with these algorithms, researchers generally focus on continuous data and easily neglect discrete data, although both are generated by online learning actions. Moreover, implicit coupled interactions among the data are frequently ignored by these algorithms. As a result, a large amount of significant information that could improve clustering accuracy is neglected. To solve these issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as the intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate the superior performance of the proposed algorithm.

ANALYSIS OF THE PERFORMANCE OF PUSBINDIKLAT PENELITI LIPI EMPLOYEES BASED ON INTERNET USAGE PATTERNS THROUGH A WEB USAGE MINING APPROACH

Jurnal Penelitian Pos dan informatika ◽

10.17933/jppi.v8i2.212 ◽

2018 ◽

Vol 8 (2) ◽

pp. 141-153

Author(s):

Sutrisno Heru Sukoco ◽

Imas Sukaesih Sitanggang ◽

Heru Sukoco

Keyword(s):

Internet Use ◽

Clustering Algorithm ◽

User Behavior ◽

Employee Performance ◽

Web Usage Mining ◽

Internet Services ◽

Employee Productivity ◽

Proxy Server ◽

Web Usage ◽

Performance Target

Measurement of employee performance in the use of internet services can be conducted as part of the employee's performance target. A web usage mining approach, based on observing internet access records stored on the proxy server, can be applied to understand user behavior. This study aims to obtain an overview of how employees of Pusbindiklat Peneliti LIPI use internet services, to measure the level of employee productivity based on the length of time spent accessing sites that do not support their work, and to map the categories of sites accessed against the duties and functions of each employee's position. The K-Means clustering algorithm is used to group user access patterns. The data used are proxy server logs and the employees' performance targets at Pusbindiklat Peneliti LIPI for the period August-December 2016. The results show that the pattern of internet use by employees of Pusbindiklat Peneliti LIPI does not yet fully support their job functions. About 83% of employees fall in the low level (0-4 hours per week) of time spent accessing sites that do not support their work. Based on these results, it can be concluded that internet use by employees of Pusbindiklat Peneliti LIPI does not significantly affect their productivity. Keywords: clustering, K-Means, log proxy server, performance of employees, web usage mining
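
The K-Means grouping mentioned in this abstract can be sketched in one dimension. The sketch assumes that each employee is described by a single number, weekly hours spent on sites that do not support work, and that the starting centroids are supplied by the caller. The hours below are invented for illustration and are not the study's data; the study's actual features and number of clusters are not stated in the abstract.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain K-Means on scalar values with caller-supplied start centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical weekly hours on non-work sites for six employees.
hours = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
centroids, clusters = kmeans_1d(hours, centroids=[0.0, 15.0])
print(centroids, clusters)
```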

Genetically-Modified K-Medoid Clustering Algorithm for Heterogeneous Data Set

Handbook of Research on Applications and Implementations of Machine Learning Techniques - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9902-9.ch004 ◽

2020 ◽

pp. 63-76

Author(s):

Dhayanithi Jaganathan ◽

Akilandeswari Jeyapal

Keyword(s):

Clustering Algorithm ◽

Genetically Modified ◽

Clustering Algorithms ◽

Distance Matrix ◽

Heterogeneous Data ◽

Distance Measures ◽

Experimental Result ◽

Data Set ◽

Individual Distance ◽

Modified Algorithm

Recently, researchers have been studying the clustering of data that are heterogeneous in nature. The data generated in many real-world applications, such as data from IoT environments and big data domains, are heterogeneous. Most available clustering algorithms deal with homogeneous data, and few algorithms discussed in the literature deal with data of both numeric and categorical nature. Applying a clustering algorithm designed for homogeneous data to heterogeneous data leads to information loss. This chapter proposes a new genetically-modified k-medoid clustering algorithm (GMODKMD) that takes as input a fused distance matrix obtained by applying an individual distance measure to each attribute based on its characteristics. GMODKMD is a modified algorithm in which the Davies-Bouldin index is applied in the iteration phase. The proposed algorithm is compared with existing techniques based on accuracy. The experimental results show that the modified algorithm with the fused distance matrix outperforms the existing clustering technique.
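
The idea of a fused distance matrix can be illustrated with a rough sketch. It assumes a range-scaled absolute difference for numeric attributes, a simple mismatch (0 or 1) for categorical attributes, and an equal-weight average across attributes. The abstract does not give the chapter's actual distance measures or fusion rule, so these choices, the function name, and the sample rows are all assumptions.

```python
def fused_distance_matrix(rows, kinds):
    """Pairwise distances averaging one per-attribute measure per column.

    kinds[j] is "numeric" or "categorical" and selects the measure used
    for column j (an assumed scheme, not the chapter's).
    """
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j, kind in enumerate(kinds):
        if kind == "numeric":
            col = [r[j] for r in rows]
            ranges[j] = (max(col) - min(col)) or 1.0

    def dist(a, b):
        total = 0.0
        for j, kind in enumerate(kinds):
            if kind == "numeric":
                total += abs(a[j] - b[j]) / ranges[j]
            else:
                total += 0.0 if a[j] == b[j] else 1.0
        return total / len(kinds)

    n = len(rows)
    return [[dist(rows[i], rows[k]) for k in range(n)] for i in range(n)]


# Hypothetical mixed records: (numeric reading, device type).
rows = [
    (20.0, "sensor"),
    (30.0, "sensor"),
    (40.0, "gateway"),
]
matrix = fused_distance_matrix(rows, kinds=["numeric", "categorical"])
for line in matrix:
    print(line)
```

A matrix like this can be passed to any clustering method that accepts precomputed distances, which is how a k-medoid style algorithm would consume it.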

Traversal Pattern Mining in Web Usage Data

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch508 ◽

2005 ◽

pp. 2857-2860

Author(s):

Jenq-Foung (J.F.) Yao ◽

Yongqiao Xiao

Keyword(s):

Web Site ◽

Pattern Mining ◽

Web Server ◽

Web Design ◽

Web Usage Mining ◽

Web Usage ◽

Web Logs ◽

Browsing Behavior ◽

Web Server Performance ◽

Usage Data

Web usage mining is designed to discover useful patterns in Web usage data, i.e., Web logs. Web logs record the user’s browsing of a Web site, and the patterns provide useful information about the user’s browsing behavior. Such patterns can be used for Web design, improving Web server performance, personalization, etc.

Chameleon Clustering Algorithm with Semantic Analysis Algorithm for Efficient Web Usage Mining

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v10i6.6298 ◽

2015 ◽

Vol 10 (6) ◽

pp. 580

Author(s):

Anupama Prasanth ◽

M. Hemalatha

Keyword(s):

Clustering Algorithm ◽

Semantic Analysis ◽

Web Usage Mining ◽

Analysis Algorithm ◽

Web Usage
