Indexing Techniques for Web Access Logs

2004 ◽  
pp. 305-334 ◽  
Author(s):  
Yannis Manolopoulos ◽  
Mikolaj Morzy ◽  
Tadeusz Morzy ◽  
Alexandros Nanopoulos ◽  
Marek Wojciechowski ◽  
...  

Access histories of users visiting a web server are automatically recorded in web access logs. Conceptually, web-log data can be regarded as a collection of clients’ access sequences, where each sequence is a list of pages accessed by a single user in a single session. This chapter presents novel indexing techniques that support efficient processing of so-called pattern queries, which find all access sequences that contain a given subsequence. Pattern queries are a key element of advanced analyses of web-log data, especially those concerning typical navigation schemes. In this chapter, we discuss the particularities of efficiently processing pattern queries over user access sequences, compared with searching unordered sets. Extensive experimental results examine a variety of factors and illustrate the superiority of the proposed methods over indexing techniques for unordered data adapted to access sequences.
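To make the notion of a pattern query concrete, the sketch below assumes that "contains a given subsequence" means the query pages appear in the access sequence in the same order, with other pages allowed in between. It is not the chapter's proposed index. It illustrates the kind of baseline the abstract compares against: an inverted index built for unordered data (page to sequence ids) filters candidates, and a second pass checks the order. The names `build_inverted_index` and `pattern_query` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set


def contains_subsequence(seq: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern pages occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # `page in it` consumes the iterator up to the first match, so order is enforced.
    return all(page in it for page in pattern)


def build_inverted_index(sequences: List[List[str]]) -> Dict[str, Set[int]]:
    """Hypothetical unordered index: map each page to the ids of sequences containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for page in seq:
            index[page].add(sid)
    return index


def pattern_query(sequences: List[List[str]], index: Dict[str, Set[int]],
                  pattern: List[str]) -> List[int]:
    """Return ids of sequences that contain `pattern` as an ordered subsequence."""
    if not pattern:
        return list(range(len(sequences)))
    # Filter step: treat the pattern as an unordered set of pages.
    candidates = set.intersection(*(index.get(p, set()) for p in pattern))
    # Verification step: discard candidates whose page order does not match.
    return sorted(sid for sid in candidates
                  if contains_subsequence(sequences[sid], pattern))
```

With the sequences [A, B, C, D], [B, A, C], [A, C, B] and [D, E], the query [A, B] returns ids 0 and 2. Sequence 1 passes the unordered filter but fails the order check; such false candidates are what an order-aware index would aim to prune.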

Author(s):  
Muhammad Zia Aftab Khan ◽  
Jihyun Park

The purpose of this paper is to develop the WebSecuDMiner algorithm, which discovers unusual web access patterns by analysing the potential rules hidden in the web server log and the user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data; these patterns are used to identify users' access behaviour patterns and to detect unusual ones. Data extracted from the web server log and from user browsing behaviour are exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to take appropriate decisions regarding the review of the original rights of a suspicious user. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. Even when a page is viewed by the user, the visit may not be recorded in the server log file, since the page can be accessed from the browser's cache.
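ECLAT, which WebSecuDMiner builds on, mines frequent itemsets using a vertical layout: each page is mapped to the set of transaction ids (its tidset) that contain it, and the support of a larger itemset is the size of the intersection of its members' tidsets. The sketch below is a textbook ECLAT, assuming each session is treated as an unordered set of pages and `min_support` is an absolute count. How WebSecuDMiner turns the resulting patterns into unusual-access decisions is not described in the abstract and is not shown.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count (textbook ECLAT)."""
    # Vertical layout: item -> set of transaction ids that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], members: List[Tuple[str, Set[int]]]) -> None:
        # `members` is one equivalence class: items that extend the same prefix.
        for i, (item, tids) in enumerate(members):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Next class: intersect this tidset with those of the later members.
            suffix = []
            for other, other_tids in members[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_support:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    roots = sorted((item, tids) for item, tids in tidsets.items()
                   if len(tids) >= min_support)
    extend(frozenset(), roots)
    return frequent
```

For the sessions {login, home, admin}, {login, home}, {home, search} and {login, home, admin} with `min_support=2`, the sketch reports seven frequent itemsets, including {admin, home, login} with support 2. The page "search" is pruned at the first level because it appears in only one session.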


Author(s):  
Yannis Manolopoulos ◽  
Alexandros Nanopoulos ◽  
Mikolaj Morzy ◽  
Tadeusz Morzy ◽  
Marek Wojciechowski ◽  
...  

Web servers have recently become the main source of information on the Internet. Every Web server uses a Web log to automatically record its users' accesses. Each Web-log entry represents a single user’s access to a Web resource (e.g., an HTML document) and contains the client’s IP address, the timestamp, the URL address of the requested resource, and some additional information. An example log file is depicted in Figure 1. Each row contains the IP address of the requesting client, the timestamp of the request, the name of the request method together with the URL of the resource, the return code issued by the server, and the size of the requested object.
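Figure 1 is not reproduced here. As a hedged illustration of the fields listed above, the sketch below parses one log line, assuming the NCSA Common Log Format (`host ident authuser [time] "request" status bytes`), which includes these fields. Real servers may be configured with other formats. `parse_line` and `LogEntry` are hypothetical names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Assumed layout: NCSA Common Log Format.
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: [^"]*)?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class LogEntry(NamedTuple):
    ip: str
    timestamp: datetime
    method: str
    url: str
    status: int  # return code issued by the server
    size: int    # bytes sent; "-" (no body) is stored as 0


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one Common Log Format line, or return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None  # malformed or differently formatted line
    return LogEntry(
        ip=m["ip"],
        timestamp=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        url=m["url"],
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),
    )
```

Returning `None` instead of raising keeps a single malformed line from aborting a pass over a large log.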


Author(s):  
Xiangji Huang

With the rapid growth of the World Wide Web, the use of automated Web-mining techniques to discover useful and relevant information has become increasingly important. One challenging direction is Web usage mining, which attempts to discover users' navigation patterns from Web access logs. Properly exploited, the information obtained from Web usage logs can help us improve the design of a Web site, refine queries for effective Web search, and build personalized search engines. However, Web log data are usually large and extremely detailed, because they are likely to record every aspect of a user request to a Web server. It is thus of great importance to process the raw Web log data appropriately and to identify the target information intelligently. In this chapter, we first briefly review the concept of Web Usage Mining and discuss how it differs from classic Knowledge Discovery techniques, and then focus on exploiting Web log sessions, defined as a group of requests made by a single user for a single navigation purpose, in Web usage mining. We also compare some of the state-of-the-art techniques for identifying log sessions from Web server logs, and present some popular Web mining techniques, including Association Rule Mining, Clustering, Classification, Collaborative Filtering, and Sequential Pattern Learning, that can be applied to Web log data for different research and application purposes.
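One widely used way to identify log sessions is a time-oriented heuristic: requests from the same client are grouped, and a new session starts whenever the gap between consecutive requests exceeds a timeout. The sketch below assumes the client is identified by a single key such as the IP address and uses a 30-minute timeout, a common default rather than a value taken from the chapter. It only approximates the chapter's "single navigation purpose" definition. `sessionize` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Request = Tuple[str, float, str]  # (client id, timestamp in seconds, URL)


def sessionize(requests: List[Request], timeout: float = 30 * 60) -> List[List[str]]:
    """Split each client's requests into sessions at gaps longer than `timeout` seconds."""
    by_client: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for client, ts, url in requests:
        by_client[client].append((ts, url))

    sessions: List[List[str]] = []
    for client in sorted(by_client):
        hits = sorted(by_client[client])  # chronological order for this client
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)  # inactivity gap closes the session
                current = []
            current.append(url)
        sessions.append(current)
    return sessions
```

Keying on the IP address is a simplification: proxies and NAT can merge several users under one address, which is one reason the chapter compares several session-identification techniques.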


Author(s):  
Amina Kemmar ◽  
Yahia Lebbah ◽  
Samir Loudni

Mining web access patterns consists of extracting knowledge from server log files. This problem can be formulated as a sequential pattern mining (SPM) problem, whose goal is to extract patterns, i.e., sequences of accesses, that occur frequently in the web log file. The literature offers many efficient algorithms for SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express and handle new constraints defined on patterns; each new constraint requires a new implementation. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
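The sketch below is not the authors' CP model. It is a plain PrefixSpan-style miner (pattern growth over projected databases, one of the algorithms the abstract cites), written for access sequences where each element is a single page, to show the SPM problem and why constraints are awkward to bolt onto a dedicated algorithm. The hypothetical `max_len` bound is anti-monotone, so it can prune the search, whereas an arbitrary user predicate (`accept`, also hypothetical) can only filter the output.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(db: List[List[str]], min_support: int, max_len: int = 5,
               accept: Callable[[Pattern], bool] = lambda p: True) -> Dict[Pattern, int]:
    """Frequent sequential patterns (single page per element) with support counts."""
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # projected: (sequence id, start of the suffix after the prefix's leftmost match)
        if len(prefix) >= max_len:
            return  # anti-monotone constraint: prune the whole subtree
        # For each page, record its first occurrence in every projected suffix.
        first_pos: Dict[str, List[Tuple[int, int]]] = {}
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(db[sid])):
                page = db[sid][pos]
                if page not in seen:
                    seen.add(page)
                    first_pos.setdefault(page, []).append((sid, pos + 1))
        for page, new_proj in sorted(first_pos.items()):
            if len(new_proj) >= min_support:  # support = number of sequences
                pattern = prefix + (page,)
                if accept(pattern):  # arbitrary constraint: filters output only
                    results[pattern] = len(new_proj)
                grow(pattern, new_proj)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results
```

On the sequences [A, B, C], [A, C], [B, C] and [A, B] with `min_support=2`, it finds the single pages A, B and C (support 3 each) and the patterns (A, B), (A, C) and (B, C) (support 2 each). Each further constraint type would need its own hook in code like this, which is the rigidity the CP formulation is meant to remove.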


Author(s):  
Zhifang Liao ◽  
Min Liu ◽  
Tianhui Song ◽  
Li Kuang ◽  
Yan Zhang ◽  
...  

Web pages visited by users contain a variety of data resources, yet the clustering algorithms frequently used for web data do not take this heterogeneity into account when processing the data. This paper therefore proposes a new algorithm, IHPSOC, to cluster web log data on the basis of web log mining. Based on particle swarm optimization (PSO), the IHPSOC algorithm clusters the web log data through particle swarm iteration. Based on the clustering results, the paper establishes Markov chain-like models that create a separate Markov chain for the users in each category in order to predict the web resources users need. Experimental results show that the proposed model gives better prediction.
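The prediction step can be illustrated with a first-order Markov chain estimated separately for each user category. The sketch assumes cluster labels are already available (the PSO-based IHPSOC clustering itself is not reproduced) and uses maximum-likelihood transition probabilities, i.e. transition counts divided by row totals. The paper's "Markov chain-like models" may differ in detail, and `MarkovPredictor` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class MarkovPredictor:
    """First-order Markov chain over page transitions, one chain per user cluster."""

    def __init__(self) -> None:
        # counts[cluster][current_page][next_page] = number of observed transitions
        self.counts: Dict[int, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))

    def fit(self, sessions: List[List[str]], clusters: List[int]) -> "MarkovPredictor":
        # clusters[i] is the (assumed, precomputed) category of sessions[i]
        for session, cluster in zip(sessions, clusters):
            for cur, nxt in zip(session, session[1:]):
                self.counts[cluster][cur][nxt] += 1
        return self

    def _row(self, cluster: int, cur: str) -> Counter:
        # .get avoids inserting empty entries into the defaultdicts
        return self.counts.get(cluster, {}).get(cur, Counter())

    def transition_prob(self, cluster: int, cur: str, nxt: str) -> float:
        row = self._row(cluster, cur)
        total = sum(row.values())
        return row[nxt] / total if total else 0.0

    def predict(self, cluster: int, cur: str) -> Optional[str]:
        # Most frequent successor of `cur` within the cluster, or None if unseen
        row = self._row(cluster, cur)
        return row.most_common(1)[0][0] if row else None
```

Keeping one chain per cluster means users who reach the same page can receive different predictions depending on their category, which is the point of combining clustering with the Markov model.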


2020 ◽  
Vol 9 (1) ◽  
pp. 1045-1050

Nowadays, the WWW has grown into a significant and vast data store. All of the clients' activities are stored in log files, and the log file reflects the interest shown in the website. With the abundant use of the web, the size of log files is growing rapidly. Web mining is the application of data mining technologies to huge data repositories; it is the process of uncovering information from web data. Before web mining techniques are applied, the data in the web log must be pre-processed, integrated and transformed. It is essential for web miners to use intelligent tools in order to find, extract, filter and evaluate the desired information. The data preprocessing stage is the most significant stage in the web mining process, and it is essential and complex for the successful extraction of useful information. Web logs are distributed in nature, and they are non-scalable and impractical. Consequently, an extensive learning algorithm is required to obtain the desired information.
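The abstract stresses that web-log data must be cleaned before mining but does not list concrete rules. The sketch below applies a few common cleaning filters, all of them assumptions: keep only GET requests, drop responses outside the 2xx/3xx range, drop embedded resources such as images, stylesheets and scripts, and drop clients whose user-agent string looks like a crawler. `Hit`, `clean` and the suffix and marker lists are hypothetical.

```python
from typing import Iterable, List, NamedTuple

# Hypothetical filtering rules: common cleaning choices, not taken from the paper.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_MARKERS = ("bot", "crawler", "spider")


class Hit(NamedTuple):
    ip: str
    method: str
    url: str
    status: int
    user_agent: str


def clean(hits: Iterable[Hit]) -> List[Hit]:
    """Keep only requests that plausibly reflect a user's own page views."""
    kept = []
    for h in hits:
        path = h.url.split("?", 1)[0].lower()  # ignore the query string
        if h.method != "GET":
            continue  # keep only page retrievals
        if not 200 <= h.status < 400:
            continue  # drop failed requests (4xx/5xx)
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded resources are fetched automatically by the browser
        if any(marker in h.user_agent.lower() for marker in ROBOT_MARKERS):
            continue  # likely automated crawler traffic
        kept.append(h)
    return kept
```

Substring matching on the user agent is crude; a production pipeline would more likely rely on a maintained list of known crawlers.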


2002 ◽  
Vol 22 (1 Supplement) ◽  
pp. 111-114
Author(s):  
Yumi YAMAGUCHI ◽  
Yuko IKEHATA ◽  
Takayuki ITOH ◽  
Yasumasa KAJINAGA
