Signature-Based Indexing Techniques for Web Access Logs

Author(s):  
Yannis Manolopoulos ◽  
Alexandros Nanopoulos ◽  
Mikolaj Morzy ◽  
Tadeusz Morzy ◽  
Marek Wojciechowski ◽  
...  

Web servers have recently become the main source of information on the Internet. Every Web server uses a Web log to automatically record access of its users. Each Web-log entry represents a single user’s access to a Web resource (e.g., HTML document) and contains the client’s IP address, the timestamp, the URL address of the requested resource, and some additional information. An example log file is depicted in Figure 1. Each row contains the IP address of the requesting client, the timestamp of the request, the name of the method used with the URL of the resource, the return code issued by the server, and the size of the requested object.
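As an illustration of the fields listed above, the following minimal sketch parses one log row into a record. It assumes the rows follow the widely used Common Log Format; the abstract does not name the format, and the `LogEntry` type, the `parse_log_line` helper, and the sample line are hypothetical names and data introduced only for this example.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """One Web-log row (hypothetical record type for this sketch)."""
    ip: str              # IP address of the requesting client
    timestamp: datetime  # time of the request
    method: str          # HTTP method, e.g. GET
    url: str             # URL of the requested resource
    status: int          # return code issued by the server
    size: int            # size of the returned object in bytes


# Assumption: Common Log Format, e.g.
# 192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return the parsed entry, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return LogEntry(
        ip=m.group("ip"),
        timestamp=datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        method=m.group("method"),
        url=m.group("url"),
        status=int(m.group("status")),
        # A "-" size means no body was returned; treat it as 0 bytes.
        size=0 if m.group("size") == "-" else int(m.group("size")),
    )


sample = '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_log_line(sample)
```

Lines that do not match the assumed layout yield `None` rather than raising, so a caller can count and skip malformed rows.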

2004 ◽  
pp. 305-334 ◽  
Author(s):  
Yannis Manolopoulos ◽  
Mikolaj Morzy ◽  
Tadeusz Morzy ◽  
Alexandros Nanopoulos ◽  
Marek Wojciechowski ◽  
...  

Access histories of users visiting a web server are automatically recorded in web access logs. Conceptually, the web-log data can be regarded as a collection of clients’ access-sequences, where each sequence is a list of pages accessed by a single user in a single session. This chapter presents novel indexing techniques that support efficient processing of so-called pattern queries, which consist of finding all access sequences that contain a given subsequence. Pattern queries are a key element of advanced analyses of web-log data, especially those concerning typical navigation schemes. In this chapter, we discuss the particularities of efficiently processing user access-sequences with pattern queries, compared to the case of searching unordered sets. Extensive experimental results are given, which examine a variety of factors and illustrate the superiority of the proposed methods over indexing techniques for unordered data adapted to access sequences.
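To make the notion of a pattern query concrete, here is a minimal sketch of the unindexed baseline: a linear scan that tests every access sequence for the queried subsequence. It assumes that "contains a given subsequence" means the pattern's pages appear in the same order but not necessarily consecutively. The function names and the sample sessions are hypothetical, and this is not the indexing technique proposed in the chapter, only the scan that such an index is meant to avoid.

```python
from typing import List, Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pages of `pattern` occur in `sequence` in the same order.

    Assumption: gaps between matched pages are allowed.
    """
    pos = 0  # index of the next pattern page still to be matched
    for page in sequence:
        if pos < len(pattern) and page == pattern[pos]:
            pos += 1
    return pos == len(pattern)


def pattern_query(sequences: List[Sequence[str]], pattern: Sequence[str]) -> List[int]:
    """Return the positions of all access sequences that contain `pattern`."""
    return [i for i, seq in enumerate(sequences) if contains_subsequence(seq, pattern)]


# Hypothetical sessions: each list is the pages one user visited in one session.
sessions = [
    ["A", "B", "C", "D"],
    ["B", "A", "C"],
    ["A", "C", "B"],
    ["C", "D"],
]

hits_ac = pattern_query(sessions, ["A", "C"])
hits_bc = pattern_query(sessions, ["B", "C"])
```

Because order matters, the session `["A", "C", "B"]` matches the pattern `["A", "C"]` but not `["B", "C"]`, which is the distinction from searching unordered sets that the abstract highlights.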


Author(s):  
Muhammad Zia Aftab Khan ◽  
Jihyun Park

The purpose of this paper is to develop the WebSecuDMiner algorithm, which discovers unusual web access patterns by analysing the potential rules hidden in the web server log and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data; these patterns are used to characterise users' access behaviour and to detect unusual behaviour. Data extracted from the web server log and from user browsing behaviour is exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to support appropriate decisions regarding the review of a suspicious user's original rights. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page may be viewed by the user without the visit being recorded in the server log file, since the page can be served from the browser's cache.
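For readers unfamiliar with ECLAT, the sketch below shows the general idea of the algorithm: store, for each item, the set of transaction ids that contain it, and grow frequent itemsets by intersecting those sets. It is a generic textbook-style illustration, not WebSecuDMiner itself, whose details the abstract does not give. The `eclat` function, the page names, and the support threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def eclat(transactions: List[Set[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every itemset contained in at least `min_support` transactions."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent: Dict[Tuple[str, ...], int] = {}

    def extend(prefix: Tuple[str, ...], candidates: List[Tuple[str, Set[int]]]) -> None:
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)  # support = size of the tid set
            # Intersect with the tid sets of later items to grow the itemset.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), initial)
    return frequent


# Hypothetical sessions, each reduced to the set of pages visited.
visits = [
    {"home", "cart", "pay"},
    {"home", "cart"},
    {"home", "pay"},
    {"cart", "pay"},
]
patterns = eclat(visits, min_support=2)
```

Note that ECLAT in this form mines unordered itemsets, so the order in which pages were visited is ignored; how the paper relates such itemsets to a user's usual behaviour is not described in the abstract.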


Author(s):  
Amina Kemmar ◽  
Yahia Lebbah ◽  
Samir Loudni

Mining web access patterns consists of extracting knowledge from server log files. The problem is modelled as a sequential pattern mining (SPM) problem, in which the patterns to extract are sequences of accesses that occur frequently in the web log file. The literature offers many efficient algorithms for solving SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express or handle new constraints defined on patterns without a new implementation. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Miguel Rodriguez ◽  
Antoinette A. Danvers ◽  
Carolina Sanabia ◽  
Siobhan M. Dolan

Background: The objective of the study was to understand how pregnant women learned about Zika infection and to identify what sources of information were likely to influence them during their pregnancy. Methods: We conducted 13 semi-structured interviews in English and Spanish with women receiving prenatal care who were tested for Zika virus infection. We analyzed the qualitative data using a descriptive approach. Results: Pregnant women in the Bronx learned about Zika from family, television, the internet and their doctor. Informational sources played different roles. Television, specifically Spanish language networks, was often the initial source of information. Women searched the internet for additional information about Zika. Later, they engaged in further discussions with their healthcare providers. Conclusions: Television played an important role in providing awareness about Zika to pregnant women in the Bronx, but that information was incomplete. The internet and healthcare providers were sources of more complete information and are likely the most influential. Efforts to educate pregnant women about emerging infectious diseases will benefit from using a variety of approaches, including television messages that promote public awareness, followed up by reliable information via the internet and healthcare providers.


2020 ◽  
Vol 9 (1) ◽  
pp. 1045-1050

Nowadays, the WWW has grown into a significant and vast data store. All of a client's activities are stored in a log file. The log file reflects users' interest in the website. With the abundant use of the web, log file sizes are growing rapidly. Web mining is the application of data mining technologies to huge data repositories. It is the process of uncovering information from web data. Before web mining techniques can be applied, the data in the web log must be pre-processed, integrated and transformed. It is essential for web miners to use intelligent tools to find, extract, filter and evaluate the desired information. Data preprocessing is the most significant stage of the web mining process, and it is both critical and complex for the successful extraction of useful data. Web logs are distributed in nature, and they are also non-scalable and impractical. Consequently, an extensive learning algorithm is required to obtain the desired information.


2002 ◽  
Vol 22 (1Supplement) ◽  
pp. 111-114
Author(s):  
Yumi YAMAGUCHI ◽  
Yuko IKEHATA ◽  
Takayuki ITOH ◽  
Yasumasa KAJINAGA

2019 ◽  
Vol 23 (22) ◽  
pp. 11947-11965
Author(s):  
Te-En Wei ◽  
Hahn-Ming Lee ◽  
Albert B. Jeng ◽  
Hemank Lamba ◽  
Christos Faloutsos

Web Mining ◽  
2011 ◽  
pp. 373-392 ◽  
Author(s):  
Yew-Kwong Woon ◽  
Wee-Keong Ng ◽  
Ee-Peng Lim

The rising popularity of electronic commerce makes data mining an indispensable technology for several applications, especially online business competitiveness. The World Wide Web provides abundant raw data in the form of Web access logs. However, without data mining techniques, it is difficult to make any sense out of such massive data. In this chapter, we focus on the mining of Web access logs, commonly known as Web usage mining. We analyze algorithms for preprocessing and extracting knowledge from such logs. We will also propose our own techniques to mine the logs in a more holistic manner. Experiments conducted on real Web server logs verify the practicality as well as the efficiency of the proposed techniques as compared to an existing technique. Finally, challenges in Web usage mining are discussed.

