Research on Improved Clustering Algorithm on Web Usage Mining Based on Scientific Analysis of Web Materials

2011 ◽  
Vol 63-64 ◽  
pp. 863-867 ◽  
Author(s):  
Bin Li ◽  
Jin Yang ◽  
Cai Ming Liu ◽  
Jian Dong Zhang ◽  
Yan Zhang

Clustering analysis is an important method for studying Web users' browsing behavior and identifying potential customers in Web usage mining. Traditional user clustering algorithms are not very accurate. In this paper, we give two improved user clustering algorithms, both based on the associated matrix of users' hits recorded while they browse a website. From this matrix, an improved Hamming distance matrix is generated by defining the minimum norm or the generalized relative Hamming distance between any two vectors. Clusters of similar users are then obtained by setting a threshold value. In the last step of the algorithm, the clustering results are confirmed by defining a Similar Index for the clustering and applying a sub-algorithm. Finally, test examples show that the new algorithms are more accurate than the old one, and real log data show that the improved algorithms are practical.
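The distance-then-threshold idea in this abstract can be illustrated with a small sketch. It is not the paper's method: the abstract does not define the minimum norm, the generalized relative Hamming distance, or the Similar Index, so the code assumes a plain relative Hamming distance on visited/not-visited flags and groups users by connected components. The names `relative_hamming` and `threshold_clusters`, the threshold 0.25, and the hit counts are all made up for illustration.

```python
from itertools import combinations


def relative_hamming(u, v):
    """Fraction of pages on which two users' visited/not-visited flags differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    diffs = sum((a > 0) != (b > 0) for a, b in zip(u, v))
    return diffs / len(u)


def threshold_clusters(hits, threshold):
    """Group users linked by pairwise distances <= threshold (assumed rule)."""
    n = len(hits)
    parent = list(range(n))  # union-find forest over user indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if relative_hamming(hits[i], hits[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Rows are users, columns are pages, values are hit counts (invented data).
hits = [
    [3, 0, 1, 0],
    [1, 0, 2, 0],
    [0, 4, 0, 1],
    [0, 2, 0, 0],
]
clusters = threshold_clusters(hits, threshold=0.25)
print(clusters)
```

With this toy data, users 0 and 1 visit the same pages and users 2 and 3 differ on one page out of four, so the two pairs end up in separate clusters.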

Author(s):  
Sowmya HK ◽  
R. J. Anandhi

The WWW has a vast number of pages and URLs that supply users with a great amount of content. In an era of rapidly growing information, analysing users' browsing behaviour is an important task. Web usage mining techniques are applied to the web server log to analyse user behaviour. Identifying user sessions is one of the key and most demanding tasks in the pre-processing stage of web usage mining. This paper highlights two important shortcomings of existing session identification methods such as time-based and referrer-based sessionization. The first concerns the comparison of the current request's referrer field with the URL of the previous request. The second concerns session creation: requests are split into new sessions or merged into one session depending on the threshold values for page stay time and session time. The authors therefore developed an enhanced semantic-distance-based session identification algorithm that tackles these issues of traditional session identification methods. The enhanced semantic-based method has an accuracy of 84 percent, which is higher than that of the time-based and time-referrer-based session identification approaches. The authors also used adapted K-Means and Hierarchical Agglomerative clustering algorithms to improve the prediction of user browsing patterns. Clusters were found using a weighted dissimilarity matrix, which is calculated from two key parameters: page weight and session weight. The Dunn Index and Davies-Bouldin Index are then used to evaluate the clusters. Experimental results show that purer and more accurate session clusters are formed when the adapted clustering algorithms are applied to the weighted sessions rather than to sessions obtained from traditional sessionization algorithms. The accuracy of the semantic session clusters is higher than that of clusters of sessions obtained using traditional sessionization.
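The traditional time-based baseline that this abstract criticizes can be sketched as follows. This is not the authors' semantic-distance algorithm. The function name `sessionize` and the limits of 600 seconds for page stay and 1800 seconds for total session time are assumptions chosen for illustration, as are the request times.

```python
def sessionize(timestamps, page_stay_limit=600, session_limit=1800):
    """Split one user's sorted request times (in seconds) into sessions.

    A new session starts when the gap since the previous request exceeds
    page_stay_limit, or when the time since the session's first request
    exceeds session_limit. Both limits are illustrative assumptions.
    """
    sessions = []
    current = []
    for t in timestamps:
        if current and (t - current[-1] > page_stay_limit
                        or t - current[0] > session_limit):
            sessions.append(current)
            current = []
        current.append(t)
    if current:
        sessions.append(current)
    return sessions


# Invented request times for a single user.
requests = [0, 500, 1000, 1500, 2000, 4000]
sessions = sessionize(requests)
print(sessions)
```

Here the request at 2000 s is cut off by the session-time limit and the request at 4000 s by the page-stay limit, which shows how both thresholds alone decide where sessions begin and end.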


2008 ◽  
pp. 2004-2021
Author(s):  
Jenq-Foung Yao ◽  
Yongqiao Xiao

Web usage mining aims to discover useful patterns in web usage data; these patterns provide useful information about users' browsing behavior. This chapter examines different types of web usage traversal patterns and the related techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of the web logs is described. Some important issues, such as privacy and sessionization, are raised, and possible solutions are also discussed.
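As background for the forward-sequence patterns listed above, here is a minimal sketch of splitting one traversal path into maximal forward references. It follows the commonly described idea that a revisit to a page already on the current path counts as a backward move. The chapter's own definitions may differ, and the function name and sample path are illustrative assumptions.

```python
def maximal_forward_references(path):
    """Split a traversal path into maximal forward references.

    A revisit to a page already on the current path is treated as a
    backward move: the path walked so far is emitted (if the previous
    step was a forward move) and then truncated back to that page.
    """
    result = []
    current = []
    moving_forward = False
    for page in path:
        if page in current:
            if moving_forward:
                result.append(list(current))
            current = current[:current.index(page) + 1]
            moving_forward = False
        else:
            current.append(page)
            moving_forward = True
    if moving_forward and current:
        result.append(current)
    return result


# Invented single-user path; each letter is a page.
path = list("ABCDCBEGHGWAOUOV")
references = ["".join(r) for r in maximal_forward_references(path)]
print(references)
```

Counting how often each such sequence occurs across many users would be the next step towards frequent forward sequences; that counting is not shown here.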


2012 ◽  
Vol 2 (1) ◽  
pp. 11-20 ◽  
Author(s):  
Ritu Vijay ◽  
Prerna Mahajan ◽  
Rekha Kandwal

Cluster analysis has been extensively used in machine learning and data mining to discover distribution patterns in data. Clustering algorithms generally rely on a distance metric to partition the data into small groups such that data instances in the same group are more similar to each other than to instances in different groups. In this paper, the authors extend the concept of Hamming distance to categorical data. As a preprocessing step, they transform the data into a binary representation. The authors then use the proposed algorithm to group data points into clusters. Experiments are carried out on data sets from the UCI machine learning repository to analyze its performance. They conclude that the proposed algorithm shows promising results and can be extended to handle numeric as well as mixed data.
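A small sketch can make the binary-representation step concrete. The abstract does not say which encoding the authors use, so one-hot encoding is assumed here, and the helper names and sample records are invented for illustration.

```python
def one_hot_encode(rows):
    """Turn rows of categorical values into flat 0/1 vectors (one-hot)."""
    columns = list(zip(*rows))
    # Sorted category list per attribute keeps the bit layout deterministic.
    categories = [sorted(set(col)) for col in columns]
    encoded = []
    for row in rows:
        bits = []
        for value, cats in zip(row, categories):
            bits.extend(1 if value == c else 0 for c in cats)
        encoded.append(bits)
    return encoded


def hamming(a, b):
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Invented categorical records: (colour, size).
rows = [("red", "small"), ("red", "large"), ("blue", "large")]
encoded = one_hot_encode(rows)
print(encoded)
print(hamming(encoded[0], encoded[1]), hamming(encoded[0], encoded[2]))
```

Under one-hot encoding each mismatching attribute contributes two differing bits, so the bit-level distance is twice the number of attributes on which two records disagree.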


2004 ◽  
pp. 335-358 ◽  
Author(s):  
Yongqiao Xiao ◽  
Jenq-Foung (J.F.) Yao

Web usage mining aims to discover useful patterns in web usage data; these patterns provide useful information about users' browsing behavior. This chapter examines different types of web usage traversal patterns and the related techniques used to uncover them, including Association Rules, Sequential Patterns, Frequent Episodes, Maximal Frequent Forward Sequences, and Maximal Frequent Sequences. As a necessary step for pattern discovery, the preprocessing of the web logs is described. Some important issues, such as privacy and sessionization, are raised, and possible solutions are also discussed.


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Ke Niu ◽  
Zhendong Niu ◽  
Yan Su ◽  
Can Wang ◽  
Hao Lu ◽  
...  

Because traditional Web-based learning systems lack sufficient learning behavior analysis and personalized study guidance, a few user clustering algorithms have been introduced. When analyzing behaviors with these algorithms, researchers generally focus on continuous data but easily neglect discrete data, both of which are generated by online learning actions. Moreover, the implicit coupled interactions among the data are frequently ignored by these algorithms. As a result, a large amount of significant information that could improve clustering accuracy is neglected. To solve these issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as the intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate the superior performance of the proposed algorithm.


2018 ◽  
Vol 8 (2) ◽  
pp. 141-153
Author(s):  
Sutrisno Heru Sukoco ◽  
Imas Sukaesih Sitanggang ◽  
Heru Sukoco

Measuring employee performance in the use of internet services can be carried out as part of performance appraisal. A web usage mining approach, based on observing the internet access records stored on a proxy server, is one way to understand user behavior. This study aims to obtain an overview of how employees of Pusbindiklat Peneliti LIPI use internet services, to measure the level of employee productivity based on the length of time spent accessing sites that do not support their work, and to map the categories of sites accessed according to whether they support the employees' job functions. The K-Means clustering algorithm is used to group user access patterns and make them easier to understand. The data used are proxy server logs and employee behavior scores at Pusbindiklat Peneliti LIPI for the period August-December 2016. The results show that the pattern of internet use by employees of Pusbindiklat Peneliti LIPI does not yet fully support their job functions. About 83% of employees access sites that do not support their work at a low level (0-4 hours per week). Based on these results, it can be concluded that internet use by employees of Pusbindiklat Peneliti LIPI does not affect their productivity significantly.
Keywords: clustering, K-Means, log proxy server, performance of employees, web usage mining
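To make the K-Means step concrete, here is a minimal one-dimensional sketch that groups weekly hours spent on non-work sites. It is not the study's pipeline. The hour values, the two starting centroids, and the function name `kmeans_1d` are invented, and a real analysis would use more features taken from the proxy log.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Plain K-Means on scalar values with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, clusters


# Invented weekly hours on non-work sites for six employees.
hours = [1, 2, 3, 10, 11, 12]
centroids, clusters = kmeans_1d(hours, centroids=[0, 5])
print(centroids, clusters)
```

On this toy input the procedure separates a low-usage group from a high-usage group, which mirrors the kind of level-based grouping the abstract reports.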


Author(s):  
Dhayanithi Jaganathan ◽  
Akilandeswari Jeyapal

Researchers have recently been studying the clustering of heterogeneous data. The data generated in many real-world applications, such as data from IoT environments and big data domains, are heterogeneous in nature. Most available clustering algorithms deal with homogeneous data, and only a few algorithms discussed in the literature handle data that mix numeric and categorical attributes. Applying a clustering algorithm designed for homogeneous data to heterogeneous data leads to information loss. This chapter proposes a new genetically-modified k-medoid clustering algorithm (GMODKMD) that takes as input a fused distance matrix obtained by applying an individual distance measure to each attribute according to its characteristics. GMODKMD is a modified algorithm in which the Davies-Bouldin index is applied in the iteration phase. The proposed algorithm is compared with existing techniques in terms of accuracy. The experimental results show that the modified algorithm with the fused distance matrix outperforms the existing clustering technique.
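The idea of a fused distance matrix can be sketched as below. This is not GMODKMD, and the abstract does not name the per-attribute measures. The sketch assumes a range-normalized absolute difference for numeric attributes and a 0/1 mismatch for categorical ones, averaged over attributes. The function name, the `kinds` labels, and the records are hypothetical.

```python
def fused_distance_matrix(records, kinds):
    """Pairwise distances for mixed data using one measure per attribute.

    kinds[k] is "num" (range-normalized absolute difference) or "cat"
    (0 if equal, 1 otherwise). Per-attribute distances are averaged.
    """
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            column = [r[k] for r in records]
            # Fall back to 1.0 when all values are equal to avoid dividing by zero.
            ranges.append((max(column) - min(column)) or 1.0)
        else:
            ranges.append(None)

    def dist(a, b):
        total = 0.0
        for k, kind in enumerate(kinds):
            if kind == "num":
                total += abs(a[k] - b[k]) / ranges[k]
            else:
                total += 0.0 if a[k] == b[k] else 1.0
        return total / len(kinds)

    return [[dist(a, b) for b in records] for a in records]


# Invented mixed records: (temperature reading, device type).
records = [(20.0, "sensor"), (30.0, "sensor"), (40.0, "gateway")]
matrix = fused_distance_matrix(records, kinds=["num", "cat"])
for row in matrix:
    print(row)
```

A matrix like this could then be passed to any medoid-based clustering routine that accepts precomputed distances.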


Author(s):  
Jenq-Foung (J.F.) Yao ◽  
Yongqiao Xiao

Web usage mining is designed to discover useful patterns in Web usage data, i.e., Web logs. Web logs record the user’s browsing of a Web site, and the patterns provide useful information about the user’s browsing behavior. Such patterns can be used for Web design, improving Web server performance, personalization, etc.

