Signature-Based Indexing Techniques for Web Access Logs

Web servers have recently become the main source of information on the Internet. Every Web server keeps a Web log that automatically records its users' accesses. Each Web-log entry represents a single user’s access to a Web resource (e.g., an HTML document) and contains the client’s IP address, the timestamp, the URL of the requested resource, and some additional information. An example log file is depicted in Figure 1. Each row contains the IP address of the requesting client, the timestamp of the request, the name of the method used together with the URL of the resource, the return code issued by the server, and the size of the requested object.
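As a rough illustration of the row layout described above, the sketch below parses one log line into its fields. It assumes the widely used Common Log Format; the abstract lists only the fields, not the exact layout, so the regular expression, the `parse_entry` helper, and the sample line are all hypothetical.

```python
import re
from datetime import datetime

# Assumption: rows follow the Common Log Format. The source only names
# the fields (IP, timestamp, method + URL, return code, size).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_entry(line):
    """Return the fields of one log row as a dict, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "timestamp": datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "url": m.group("url"),
        "status": int(m.group("status")),
        # Servers commonly write "-" when no body was sent; treated as 0 here.
        "size": 0 if m.group("size") == "-" else int(m.group("size")),
    }

# Hypothetical sample row, not taken from Figure 1.
sample = '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_entry(sample)
```

Rows that do not match the assumed layout yield `None`, so a caller can count or skip malformed entries instead of failing on them.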

Indexing Techniques for Web Access Logs

Web Information Systems ◽

10.4018/978-1-59140-208-4.ch009 ◽

2004 ◽

pp. 305-334 ◽

Cited By ~ 2

Author(s):

Yannis Manolopoulos ◽

Mikolaj Morzy ◽

Tadeusz Morzy ◽

Alexandros Nanopoulos ◽

Marek Wojciechowski ◽

...

Keyword(s):

Log Data ◽

Web Log ◽

Indexing Techniques ◽

Web Access ◽

User Access ◽

Single User ◽

Efficient Processing ◽

Access Logs ◽

Web Access Logs ◽

The Web

Access histories of users visiting a web server are automatically recorded in web access logs. Conceptually, the web-log data can be regarded as a collection of clients’ access-sequences, where each sequence is a list of pages accessed by a single user in a single session. This chapter presents novel indexing techniques that support efficient processing of so-called pattern queries, which consist of finding all access sequences that contain a given subsequence. Pattern queries are a key element of advanced analyses of web-log data, especially those concerning typical navigation schemes. In this chapter, we discuss the particularities of efficiently processing user access-sequences with pattern queries, compared to the case of searching unordered sets. Extensive experimental results are given, which examine a variety of factors and illustrate the superiority of the proposed methods over indexing techniques for unordered data adapted to access sequences.
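To make the notion of a pattern query concrete, here is a minimal sketch of the unindexed baseline: a linear scan that keeps every access sequence containing the query pages in order, though not necessarily adjacently. It is not the chapter's indexing technique; the function names and the sample sessions are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """True if the pages of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `page in remaining` advances the iterator, so each match must come
    # after the previous one.
    return all(page in remaining for page in pattern)

def pattern_query(sequences, pattern):
    """Naive scan: return all access sequences that contain `pattern`."""
    return [s for s in sequences if contains_subsequence(s, pattern)]

# Hypothetical sessions, one list of pages per user session.
sessions = [["A", "B", "C"], ["B", "A", "C"], ["A", "C"]]
ordered_hits = pattern_query(sessions, ["A", "B"])
# Treating sessions as unordered sets ignores page order and over-matches.
unordered_hits = [s for s in sessions if {"A", "B"} <= set(s)]
```

In this example the ordered query returns only the first session, while the set-based test also returns `["B", "A", "C"]`. That gap is why an index built for unordered sets can at best act as a filter whose candidates still need an order check.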

Framework for Analyzing Web Access Logs using Hadoop and MapReduce

2018 International Conference on Recent Innovations in Electrical, Electronics & Communication Engineering (ICRIEECE) ◽

10.1109/icrieece44171.2018.9009325 ◽

2018 ◽

Author(s):

Pranjali Borgaonkar ◽

Gaurav Kumar ◽

Jyoti Yaduwanshi

Keyword(s):

Web Access ◽

Access Logs ◽

Web Access Logs

Application of Data Mining on Web Usage Data for Security: WebSecuDMiner

10.20944/preprints201909.0040.v1 ◽

2019 ◽

Author(s):

Muhammad Zia Aftab Khan ◽

Jihyun Park

Keyword(s):

Design Methodology ◽

Access Pattern ◽

User Research ◽

Web Log ◽

Web Access ◽

User Access ◽

Log File ◽

Access Patterns ◽

Web Access Pattern ◽

The Web

The purpose of this paper is to develop the WebSecuDMiner algorithm, which discovers unusual web access patterns by analysing the potential rules hidden in the web server log and user navigation history. Design/methodology/approach: WebSecuDMiner uses the equivalence class transformation (ECLAT) algorithm to extract user access patterns from the web log data; these patterns are used to characterise users' access behaviour and to detect unusual behaviour. Data extracted from the web server log and from user browsing behaviour is exploited to retrieve the web access patterns produced by the same user. Findings: WebSecuDMiner is used to detect whether any unauthorized access has occurred and to support appropriate decisions regarding the review of the original rights of a suspicious user. Research limitations/implications: The present work uses a database extracted from the web server log file and user browsing behaviour. A page viewed by the user is not always recorded in the server log file, since it can be served from the browser's cache.
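For readers unfamiliar with ECLAT, the following is a minimal sketch of the general technique: each item is mapped to the set of transaction IDs that contain it, and larger itemsets are grown by intersecting those sets. It is not the WebSecuDMiner implementation; the `eclat` function, the sample visits, and the support threshold are hypothetical.

```python
def eclat(transactions, min_support):
    """Return {itemset (tuple): support} for all frequent itemsets."""
    # Vertical layout: item -> IDs of the transactions that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Intersect with the later candidates to grow the itemset.
            next_candidates = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= min_support:
                    next_candidates.append((other, shared))
            if next_candidates:
                extend(itemset, next_candidates)

    initial = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), initial)
    return frequent

# Hypothetical visits: the set of pages requested in each session.
visits = [
    ["/home", "/login", "/admin"],
    ["/home", "/login"],
    ["/home", "/admin"],
    ["/login"],
]
patterns = eclat(visits, min_support=2)
```

With these assumptions, `patterns` holds the support of every page set seen in at least two sessions. A detector along the lines the abstract describes could then flag sessions that match none of the frequent patterns, although the paper's actual decision rule is not given here.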

Query Recommendation Using Large-Scale Web Access Logs and Web Page Archive

Lecture Notes in Computer Science - Database and Expert Systems Applications ◽

10.1007/978-3-540-85654-2_16 ◽

2008 ◽

pp. 134-141 ◽

Cited By ~ 6

Author(s):

Lin Li ◽

Shingo Otsuka ◽

Masaru Kitsuregawa

Keyword(s):

Large Scale ◽

Web Page ◽

Query Recommendation ◽

Web Access ◽

Access Logs ◽

Web Access Logs

A Constraint Programming Approach for Web Log Mining

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016100102 ◽

2016 ◽

Vol 11 (4) ◽

pp. 24-42 ◽

Cited By ~ 2

Author(s):

Amina Kemmar ◽

Yahia Lebbah ◽

Samir Loudni

Keyword(s):

Constraint Programming ◽

Pattern Mining ◽

Programming Approach ◽

Web Log Mining ◽

Web Log ◽

Web Access ◽

Log Mining ◽

Log File ◽

Access Patterns ◽

The Web

Mining web access patterns consists in extracting knowledge from server log files. This problem is represented as a sequential pattern mining (SPM) problem, which extracts patterns that are sequences of accesses occurring frequently in the web log file. The literature contains many efficient algorithms for solving SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite the effectiveness of these methods, they cannot express or handle new constraints defined on patterns without new implementations. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
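To illustrate what a sequential pattern mining algorithm computes, here is a minimal sketch in the style of PrefixSpan, one of the classical algorithms named above, restricted to sequences of single page accesses. It is not the authors' CP-based approach; the `prefixspan` function, the sample sessions, and the support threshold are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern (tuple of pages): support} for all frequent patterns."""
    results = {}

    def mine(prefix, projected):
        # `projected` lists (sequence index, start position) pairs: the
        # suffixes that remain after matching `prefix`.
        occurrences = {}
        for sid, start in projected:
            seen = set()
            for pos in range(start, len(sequences[sid])):
                item = sequences[sid][pos]
                if item not in seen:
                    # Keep only the first occurrence per sequence.
                    seen.add(item)
                    occurrences.setdefault(item, []).append((sid, pos + 1))
        for item, suffixes in sorted(occurrences.items()):
            if len(suffixes) >= min_support:
                pattern = prefix + (item,)
                results[pattern] = len(suffixes)
                mine(pattern, suffixes)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results

# Hypothetical sessions, one ordered list of pages per user.
sessions = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"]]
frequent = prefixspan(sessions, min_support=2)
```

Adding a new constraint to a procedure like this (for example a maximum gap between consecutive pages) means editing the recursion itself, which is the limitation the abstract attributes to specialised algorithms.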

Educational behaviors of pregnant women in the Bronx during Zika’s International emerging epidemic: “First mom … and then I’d Google. And then my doctor”

BMC Pregnancy and Childbirth ◽

10.1186/s12884-021-04170-0 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Miguel Rodriguez ◽

Antoinette A. Danvers ◽

Carolina Sanabia ◽

Siobhan M. Dolan

Keyword(s):

Pregnant Women ◽

Public Awareness ◽

Healthcare Providers ◽

Emerging Infectious Diseases ◽

Complete Information ◽

The Internet ◽

Sources Of Information ◽

Additional Information ◽

Initial Source ◽

Source Of Information

Abstract. Background: The objective of the study was to understand how pregnant women learned about Zika infection and to identify what sources of information were likely to influence them during their pregnancy. Methods: We conducted 13 semi-structured interviews in English and Spanish with women receiving prenatal care who were tested for Zika virus infection. We analyzed the qualitative data using a descriptive approach. Results: Pregnant women in the Bronx learned about Zika from family, television, the internet and their doctor. Informational sources played different roles. Television, specifically Spanish language networks, was often the initial source of information. Women searched the internet for additional information about Zika. Later, they engaged in further discussions with their healthcare providers. Conclusions: Television played an important role in providing awareness about Zika to pregnant women in the Bronx, but that information was incomplete. The internet and healthcare providers were sources of more complete information and are likely the most influential. Efforts to educate pregnant women about emerging infectious diseases will benefit from using a variety of approaches, including television messages that promote public awareness followed up by reliable information via the internet and healthcare providers.

The Role of Data Preprocessing System on Web Log Files for Mining Students Access Logs

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f8354.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 1045-1050

Keyword(s):

Data Storage ◽

Web Mining ◽

Web Log ◽

Log Files ◽

Significant Stage ◽

Log File ◽

Access Logs ◽

The Web ◽

The Ideal

Nowadays, the WWW has grown into a significant and vast data store. All of a client's activities are stored in a log file. The log file reflects the interest shown in the website. With the abundant use of the web, log files are growing rapidly. Web mining is the application of data mining technologies to huge data repositories. It is the process of uncovering information from web data. Before applying web mining techniques, the data in the web log must be pre-processed, integrated and transformed. It is essential for web miners to use intelligent tools in order to find, extract, filter and evaluate the desired information. The data preprocessing stage is the most significant stage in the process of web mining, and it is critical and complex for the successful extraction of useful data. Web logs are distributed in nature; they are also non-scalable and impractical. Consequently, an extensive learning algorithm is required in order to obtain the desired information.

Visualization of Web access logs using Data Jewelry Box

Journal of the Visualization Society of Japan ◽

10.3154/jvs.22.1supplement_111 ◽

2002 ◽

Vol 22 (1Supplement) ◽

pp. 111-114

Author(s):

Yumi YAMAGUCHI ◽

Yuko IKEHATA ◽

Takayuki ITOH ◽

Yasumasa KAJINAGA

Keyword(s):

Web Access ◽

Using Data ◽

Access Logs ◽

Web Access Logs

WebHound: a data-driven intrusion detection from real-world web access logs

Soft Computing ◽

10.1007/s00500-018-03750-1 ◽

2019 ◽

Vol 23 (22) ◽

pp. 11947-11965

Author(s):

Te-En Wei ◽

Hahn-Ming Lee ◽

Albert B. Jeng ◽

Hemank Lamba ◽

Christos Faloutsos

Keyword(s):

Intrusion Detection ◽

Real World ◽

Data Driven ◽

Web Access ◽

Access Logs ◽

Web Access Logs

Web Usage Mining

Web Mining ◽

10.4018/978-1-59140-414-9.ch018 ◽

2011 ◽

pp. 373-392 ◽

Cited By ~ 1

Author(s):

Yew-Kwong Woon ◽

Wee-Keong Ng ◽

Ee-Peng Lim

Keyword(s):

Data Mining ◽

World Wide ◽

Web Usage Mining ◽

Web Usage ◽

Online Business ◽

Web Access ◽

Business Competitiveness ◽

Access Logs ◽

Web Server Logs ◽

Web Access Logs

The rising popularity of electronic commerce makes data mining an indispensable technology for several applications, especially online business competitiveness. The World Wide Web provides abundant raw data in the form of Web access logs. However, without data mining techniques, it is difficult to make any sense out of such massive data. In this chapter, we focus on the mining of Web access logs, commonly known as Web usage mining. We analyze algorithms for preprocessing and extracting knowledge from such logs. We will also propose our own techniques to mine the logs in a more holistic manner. Experiments conducted on real Web server logs verify the practicality as well as the efficiency of the proposed techniques as compared to an existing technique. Finally, challenges in Web usage mining are discussed.
