Indexing Techniques for Web Access Logs

Access histories of users visiting a web server are automatically recorded in web access logs. Conceptually, the web-log data can be regarded as a collection of clients’ access-sequences, where each sequence is a list of pages accessed by a single user in a single session. This chapter presents novel indexing techniques that support efficient processing of so-called pattern queries, which consist of finding all access sequences that contain a given subsequence. Pattern queries are a key element of advanced analyses of web-log data, especially those concerning typical navigation schemes. In this chapter, we discuss the particularities of efficiently processing user access-sequences with pattern queries, compared to the case of searching unordered sets. Extensive experimental results are given, which examine a variety of factors and illustrate the superiority of the proposed methods over indexing techniques for unordered data adapted to access sequences.
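To make the notion of a pattern query concrete, the following is a minimal sketch of the unindexed baseline: a linear scan that tests every access sequence for the queried subsequence. It is not the chapter's indexing technique. The function names and the sample sessions are hypothetical, and the sketch assumes that "contains a given subsequence" means the pages appear in the same order, with gaps allowed.

```python
from typing import List, Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if the pages of `pattern` occur in `sequence` in order.

    Assumption: gaps between matched pages are allowed.
    """
    remaining = iter(sequence)
    # `page in remaining` consumes the iterator up to the first match,
    # so each pattern page must be found after the previous one.
    return all(page in remaining for page in pattern)


def pattern_query(sequences: List[Sequence[str]], pattern: Sequence[str]) -> List[int]:
    """Return the positions of all access sequences that contain `pattern`."""
    return [i for i, seq in enumerate(sequences) if contains_subsequence(seq, pattern)]


# Hypothetical sessions: all three contain the same set of pages,
# but only two of them visit A before B.
sessions = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
matches = pattern_query(sessions, ["A", "B"])
```

The second session holds the same pages as the others yet does not match, which is the difference between searching ordered sequences and searching unordered sets.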

Application of Data Mining on Web Usage Data for Security: WebSecuDMiner

10.20944/preprints201909.0040.v1 ◽

2019 ◽

Author(s):

Muhammad Zia Aftab Khan ◽

Jihyun Park

Keyword(s):

Design Methodology ◽

Access Pattern ◽

User Research ◽

Web Log ◽

Web Access ◽

User Access ◽

Log File ◽

Access Patterns ◽

Web Access Pattern ◽

The Web

The purpose of this paper is to develop the WebSecuDMiner algorithm, which discovers unusual web access patterns by analysing the potential rules hidden in web server logs and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data; these patterns are used to characterise users' access behaviour and to detect unusual behaviour. Data extracted from the web server log and from user browsing behaviour is exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to support appropriate decisions regarding the review of a suspicious user's original rights. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page may be viewed by the user without the visit being recorded in the server log file, since the page can be served from the browser's cache.
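As an illustration of the ECLAT step mentioned in the abstract, here is a minimal sketch of the generic algorithm: each page is mapped to the set of session ids that contain it (the vertical layout), and larger page sets are grown by intersecting those id sets. It is not the WebSecuDMiner implementation. The function name, the sample sessions, and the support threshold are assumptions chosen for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every page set that occurs in at least `min_support` transactions."""
    # Vertical layout: page -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, pages in enumerate(transactions):
        for page in pages:
            tidsets.setdefault(page, set()).add(tid)

    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue  # infrequent sets cannot have frequent supersets
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Intersecting id sets gives the support of each extended set.
            suffix = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            extend(itemset, suffix)

    extend(frozenset(), sorted(tidsets.items()))
    return frequent


# Hypothetical sessions of one user, each reduced to its set of visited pages.
sessions = [{"/login", "/admin"}, {"/login", "/home"}, {"/login", "/admin", "/home"}]
patterns = eclat(sessions, min_support=2)
```

In a setting like the one the abstract describes, page sets that fall outside a user's frequent patterns would be the candidates for "unusual" access; how WebSecuDMiner actually scores them is not stated in the abstract.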

Signature-Based Indexing Techniques for Web Access Logs

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch439 ◽

2005 ◽

pp. 2481-2485

Author(s):

Yannis Manolopoulos ◽

Alexandros Nanopoulos ◽

Mikolaj Morzy ◽

Tadeusz Morzy ◽

Marek Wojciechowski ◽

...

Keyword(s):

The Internet ◽

Ip Address ◽

Web Log ◽

Additional Information ◽

Web Resource ◽

Web Access ◽

Log File ◽

Access Logs ◽

Source Of Information ◽

Web Access Logs

Web servers have recently become the main source of information on the Internet. Every Web server uses a Web log to automatically record access of its users. Each Web-log entry represents a single user’s access to a Web resource (e.g., HTML document) and contains the client’s IP address, the timestamp, the URL address of the requested resource, and some additional information. An example log file is depicted in Figure 1. Each row contains the IP address of the requesting client, the timestamp of the request, the name of the method used with the URL of the resource, the return code issued by the server, and the size of the requested object.
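The fields listed above can be pulled out of a log row with a short parser. The sketch below assumes the rows follow the widely used Common Log Format; Figure 1 is not reproduced here, so the exact layout, the regular expression, the function name, and the sample line are all assumptions.

```python
import re
from typing import Dict, Optional, Union

# Assumed layout (Common Log Format):
# ip ident user [timestamp] "METHOD url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split one log row into its fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry: Dict[str, Union[str, int]] = dict(match.groupdict())
    entry["status"] = int(match.group("status"))
    # A size of "-" means no body was returned; treat it as zero bytes.
    size = match.group("size")
    entry["size"] = 0 if size == "-" else int(size)
    return entry


# Hypothetical log row, not taken from the chapter's Figure 1.
sample = '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
```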

Web Usage Mining with Web Logs

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch321 ◽

2011 ◽

pp. 2096-2102 ◽

Cited By ~ 1

Author(s):

Xiangji Huang

Keyword(s):

Web Mining ◽

Web Search ◽

Relevant Information ◽

Web Usage Mining ◽

Log Data ◽

Web Log ◽

Web Usage ◽

Web Access ◽

Navigation Patterns ◽

Access Logs

With the rapid growth of the World Wide Web, the use of automated Web-mining techniques to discover useful and relevant information has become increasingly important. One challenging direction is Web usage mining, wherein one attempts to discover user navigation patterns from Web access logs. Properly exploited, the information obtained from Web usage logs can help us improve the design of a Web site, refine queries for effective Web search, and build personalized search engines. However, Web log data are usually large in size and extremely detailed, because they are likely to record every aspect of a user request to a Web server. It is thus of great importance to process the raw Web log data in an appropriate way and to identify the target information intelligently. In this chapter, we first briefly review the concept of Web Usage Mining and discuss how it differs from classic Knowledge Discovery techniques, and then focus on exploiting Web log sessions, defined as a group of requests made by a single user for a single navigation purpose, in Web usage mining. We also compare some of the state-of-the-art techniques for identifying log sessions from Web servers, and present some popular Web mining techniques, including Association Rule Mining, Clustering, Classification, Collaborative Filtering, and Sequential Pattern Learning, that can be applied to Web log data for different research and application purposes.
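One common way to approximate the sessions defined above is a timeout heuristic: consecutive requests from the same user belong to one session unless the gap between them exceeds a threshold. The sketch below illustrates that heuristic only; the chapter compares several session-identification techniques and the abstract does not say which it favours. The 30-minute default, the function name, and the sample requests are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Request = Tuple[str, int, str]  # (user id, timestamp in seconds, url)


def sessionize(requests: Iterable[Request], timeout: int = 1800) -> List[Tuple[str, List[str]]]:
    """Group each user's requests into sessions split at gaps longer than `timeout`."""
    by_user: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))

    sessions: List[Tuple[str, List[str]]] = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap too long: close the running session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


# Hypothetical requests; u1 pauses for more than 30 minutes before "/c".
requests = [("u1", 0, "/a"), ("u1", 600, "/b"), ("u1", 4000, "/c"), ("u2", 100, "/a")]
sessions = sessionize(requests)
```

A fixed timeout cannot tell whether a user changed navigation purpose within the time window, which is one reason session identification remains a topic of comparison.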

Framework for Analyzing Web Access Logs using Hadoop and MapReduce

2018 International Conference on Recent Innovations in Electrical, Electronics & Communication Engineering (ICRIEECE) ◽

10.1109/icrieece44171.2018.9009325 ◽

2018 ◽

Author(s):

Pranjali Borgaonkar ◽

Gaurav Kumar ◽

Jyoti Yaduwanshi

Keyword(s):

Web Access ◽

Access Logs ◽

Web Access Logs

Query Recommendation Using Large-Scale Web Access Logs and Web Page Archive

Lecture Notes in Computer Science - Database and Expert Systems Applications ◽

10.1007/978-3-540-85654-2_16 ◽

2008 ◽

pp. 134-141 ◽

Cited By ~ 6

Author(s):

Lin Li ◽

Shingo Otsuka ◽

Masaru Kitsuregawa

Keyword(s):

Large Scale ◽

Web Page ◽

Query Recommendation ◽

Web Access ◽

Access Logs ◽

Web Access Logs

A Constraint Programming Approach for Web Log Mining

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016100102 ◽

2016 ◽

Vol 11 (4) ◽

pp. 24-42 ◽

Cited By ~ 2

Author(s):

Amina Kemmar ◽

Yahia Lebbah ◽

Samir Loudni

Keyword(s):

Constraint Programming ◽

Pattern Mining ◽

Programming Approach ◽

Web Log Mining ◽

Web Log ◽

Web Access ◽

Log Mining ◽

Log File ◽

Access Patterns ◽

The Web

Mining web access patterns consists of extracting knowledge from server log files. The problem is represented as a sequential pattern mining (SPM) problem, in which the patterns to extract are sequences of accesses that occur frequently in the web log file. The literature offers many efficient algorithms for solving SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express or handle new constraints defined on patterns; handling such constraints requires new implementations. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
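For readers unfamiliar with SPM, the following is a minimal sketch in the style of PrefixSpan, one of the classical algorithms listed above. It is not the authors' CP-based approach. It assumes each session is a plain list of single pages, and the function name, sample sessions, and support threshold are hypothetical.

```python
from typing import List, Tuple


def prefixspan(sequences: List[List[str]], min_support: int) -> List[Tuple[List[str], int]]:
    """Return (pattern, support) for every sequential pattern with enough support."""
    results: List[Tuple[List[str], int]] = []

    def mine(prefix: List[str], projected: List[List[str]]) -> None:
        # Count, for each page, how many projected sequences contain it.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_support:
                continue
            pattern = prefix + [item]
            results.append((pattern, counts[item]))
            # Project: keep only what follows the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine([], [list(seq) for seq in sequences])
    return results


# Hypothetical sessions from a web log.
sessions = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
frequent = prefixspan(sessions, min_support=2)
```

Adding a constraint such as a maximum gap between pages would mean changing the projection step by hand, which is the kind of reimplementation the CP formulation is meant to avoid.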

Markov Chain-Like Model for Prediction Service Based on Improved Hierarchical Particle Swarm Optimization Cluster Algorithm

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194016400064 ◽

2016 ◽

Vol 26 (04) ◽

pp. 653-674 ◽

Cited By ~ 4

Author(s):

Zhifang Liao ◽

Min Liu ◽

Tianhui Song ◽

Li Kuang ◽

Yan Zhang ◽

...

Keyword(s):

Particle Swarm Optimization ◽

Markov Chain ◽

Clustering Algorithms ◽

Particle Swarm ◽

Heterogeneous Data ◽

Swarm Optimization ◽

Log Data ◽

Web Log ◽

Proposed Model ◽

The Web

Since web pages visited by users contain a variety of data resources, and the clustering algorithms frequently used for web data do not take this heterogeneity into account, this paper proposes a new algorithm, IHPSOC, to cluster web log data on the basis of web log mining. Based on particle swarm optimization (PSO), the IHPSOC algorithm clusters the web log data through particle swarm iteration. Based on the clustering results, the paper establishes Markov chain-like models, which create a corresponding Markov chain for the users in each category so as to predict the web resources those users need. The experimental results show that the proposed model gives better prediction.
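The prediction step can be pictured with an ordinary first-order Markov chain fitted to the sessions of one user cluster. The sketch below shows only that generic idea; the paper's "Markov chain-like" model and the IHPSOC clustering are not reproduced, and the function names and sample sessions are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(sessions: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next page | current page) from consecutive page pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }


def predict_next(transitions: Dict[str, Dict[str, float]], page: str) -> Optional[str]:
    """Return the most probable next page, or None if `page` was never left."""
    followers = transitions.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical sessions belonging to a single user cluster.
cluster_sessions = [["/home", "/news", "/sport"], ["/home", "/news"], ["/home", "/shop"]]
transitions = fit_transitions(cluster_sessions)
prediction = predict_next(transitions, "/home")
```

Fitting one such table per cluster, rather than one for all users, matches the per-category structure the abstract describes.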

The Role of Data Preprocessing System on Web Log Files for Mining Students Access Logs

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f8354.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1045-1050

Keyword(s):

Data Storage ◽

Web Mining ◽

Web Log ◽

Log Files ◽

Significant Stage ◽

Log File ◽

Access Logs ◽

The Web ◽

The Ideal

Nowadays, the WWW has grown into a vast and significant data store. All client activities are stored in a log record. The log file reflects the interest shown in the website. With the abundant use of the web, log files are growing rapidly. Web mining is the application of data mining technologies to huge data repositories; it is the process of uncovering information from web data. Before web mining techniques are applied, the data in the web log must be pre-processed, integrated and transformed. It is essential for web miners to use intelligent tools to find, extract, filter and evaluate the desired information. The data preprocessing stage is the most significant stage in the process of web mining, and it is both critical and complex for the successful extraction of useful data. Web logs are distributed in nature, and they are also non-scalable and impractical. Consequently, an extensive learning algorithm is required to obtain the desired information.
