The Research of Preprocessing and Pattern Discovery Techniques on Web Log Files

The web is a rich, dynamic, and fast-growing source for data mining, offering great opportunities that are often not exploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and a website, and analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web administrators primarily use web logs to measure how much traffic their sites receive and to detect broken links and other types of errors. Web usage mining extracts useful information that can benefit a number of application areas, such as web personalization, website restructuring, system performance improvement, and business intelligence. The web usage mining process involves three main phases: preprocessing, pattern discovery, and pattern analysis. Various preprocessing techniques have been proposed to extract information from log files and to group primitive data items into meaningful, lighter-level abstractions suitable for mining, usually in the form of visitor sessions. The major data mining techniques used for pattern discovery in web usage mining are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedure, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.
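
The preprocessing phase described above usually ends with visitor sessions. A minimal Python sketch of that step follows, assuming Apache Common Log Format input, visitors identified by IP address alone, and a 30-minute inactivity timeout; all three are common heuristics chosen here for illustration, not details taken from the chapter.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Apache Common Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse(line):
    """Return (host, timestamp, path, status), or None for a malformed line."""
    m = CLF.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return m["host"], ts, m["path"], int(m["status"])


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Split each host's requests into sessions wherever the gap exceeds `timeout`.

    Assumptions for illustration: one visitor per IP address, 30-minute timeout.
    """
    records = sorted(filter(None, map(parse, lines)), key=lambda r: (r[0], r[1]))
    sessions = []
    last_host, last_time = None, None
    for host, ts, path, _status in records:
        if host != last_host or ts - last_time > timeout:
            sessions.append([])  # new visitor or idle gap: open a new session
        sessions[-1].append(path)
        last_host, last_time = host, ts
    return sessions
```

Identifying users by IP alone merges visitors behind a shared proxy, so real pipelines often add the user agent or a cookie to the session key.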

They Know What You Will Do Next Click

Interdisciplinary Approaches to Digital Transformation and Innovation - Advances in E-Business Research

10.4018/978-1-7998-1879-3.ch005

2020

pp. 100-122

Author(s):

Serra Çelik

Keyword(s):

Focus Group, User Behavior, Web Usage Mining, Web Log, Web Usage, User Behaviors, Log Files, The Web

This chapter focuses on predicting web user behavior. When users visit a website, every action they take on it is recorded in web log files. Unlike focus groups or questionnaires, log files reflect real user behavior, and access to actual user behavior is highly valuable to organizations. This chapter examines ways of extracting user patterns (user behavior) from log files. In this context, the web usage mining process is explained and several web usage mining techniques are introduced.
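
One simple way to predict a user's next click from sessions is a first-order Markov model: count which page most often follows each page. The Python sketch below illustrates that idea; it assumes sessions are already lists of page paths, and it is an illustrative technique, not the chapter's own method.

```python
from collections import Counter, defaultdict


def train_transitions(sessions):
    """Count page-to-successor transitions inside each session."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, page):
    """Return the most frequent successor of `page`, or None if it was never followed."""
    successors = counts.get(page)
    if not successors:
        return None
    return successors.most_common(1)[0][0]
```

A first-order model looks only at the current page; higher-order variants condition on the last few pages at the cost of sparser counts.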

Performance Comparison of Pattern Discovery Methods on Web Log Data

IEEE International Conference on Computer Systems and Applications, 2006.

10.1109/aiccsa.2006.205129

2006

Cited By ~ 4

Author(s):

M.A. Bayir, I.H. Toroslu, A. Cosar

Keyword(s):

Pattern Discovery, Performance Comparison, Log Data, Web Log

Analyse and Detect the IP Spoofing Attack in Web Log Files Using BPNN for Classification

International Journal of Computer Trends and Technology

10.14445/22312803/ijctt-v42p120

2016

Vol 42 (2)

pp. 117-123

Author(s):

Vedna Sharma, Monika Thakur

Keyword(s):

Web Log, Log Files, IP Spoofing

The Role of Data Preprocessing System on Web Log Files for Mining Students Access Logs

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.f8354.059120

2020

Vol 9 (1)

pp. 1045-1050

Keyword(s):

Data Storage, Web Mining, Web Log, Log Files, Significant Stage, Log File, Access Logs, The Web, The Ideal

Nowadays, the WWW has grown into a significant and vast data store. All users' activities are stored in log files, and the log file reflects interest in the website. With the widespread use of the web, log file sizes are growing rapidly. Web mining is the application of data mining technologies to huge data repositories; it is the process of uncovering information from web data. Before web mining techniques are applied, the data in the web log must be preprocessed, consolidated, and transformed. Web miners need smart tools to discover, extract, filter, and evaluate the desired data. Data preprocessing is the most significant stage in the web mining process, and it is both essential and complex for the successful extraction of useful information. Web logs are distributed in nature, and they are also non-scalable and impractical to use directly. Consequently, a broad learning algorithm is required to obtain the desired data.
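
The cleaning part of preprocessing typically removes requests that do not represent a user viewing a page. The Python sketch below applies three such rules to already-parsed entries: it keeps only 2xx/3xx responses, drops embedded resources such as images and stylesheets, and drops user agents that look like crawlers. The extension list, status range, and bot markers are assumptions for illustration, not rules stated in the paper.

```python
# Hypothetical cleaning rules; the extension list and bot markers are assumptions.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")


def keep_request(path, status, user_agent=""):
    """Return True if a log entry should survive the cleaning step."""
    if not 200 <= status < 400:  # drop client and server errors (4xx/5xx)
        return False
    resource = path.split("?", 1)[0].lower()  # ignore the query string
    if resource.endswith(IGNORED_EXTENSIONS):  # embedded files, not page views
        return False
    agent = user_agent.lower()
    return not any(marker in agent for marker in BOT_MARKERS)  # drop crawlers


def clean(entries):
    """Filter (path, status, user_agent) tuples down to human page views."""
    return [entry for entry in entries if keep_request(*entry)]
```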

Web Usage Mining using Patterns with Different Algorithm

VFAST Transactions on Software Engineering

10.21015/vtse.v12i1.497

2017

pp. 1-9

Author(s):

Keyword(s):

Data Mining, Web Usage Mining, Data Discovery, Web Log, Data Usage, Web Usage, Log Files, Content Mining, User Data, Data Content

Web usage mining is a part of data mining. Web mining is divided into three parts: 1) web content mining, 2) web structure mining, and 3) web usage mining. In this paper I discuss log files, which are used in web usage mining. Log files store users' activity on a web server, so websites can be improved by gathering user data. Web usage mining has three subphases: preprocessing, data discovery, and data analysis. The paper also discusses web log files in detail, describes three algorithms used to find patterns in log files, and compares them with the help of graphs.

EFFICIENT PREPROCESSING FOR WEB LOG COMPRESSION

International Journal of Computing

10.47839/ijc.7.1.487

2014

pp. 35-42

Author(s):

Sebastian Deorowicz, Szymon Grabowski

Keyword(s):

General Purpose, Test Results, Log Data, Web Log, Log Files, User Activity, Sequence Substitution

Web log files, which store user activity on a server, may grow by hundreds of megabytes a day or more on popular sites. They are usually archived, as this enables further analysis, e.g., for detecting attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. Our method works on individual fields of log data (each storing information such as the client's IP, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratio by a factor of 2.70 for gzip and 1.86 for bzip2.
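
Two of the techniques named in the abstract can be sketched on a single log field (column). Below, `front_code` replaces each value with the length of the prefix it shares with the previous value plus the remaining suffix, and `move_to_front` replaces repeated values (for example, client IPs) with small recency indices that a general-purpose compressor can then encode compactly. This is a simplified Python illustration assuming string-valued fields; it is not the authors' preprocessor, which also handles suffixes and dictionary-based substitution.

```python
def front_code(values):
    """Encode each string as (length of prefix shared with the previous string, rest)."""
    encoded, prev = [], ""
    for value in values:
        n = 0
        limit = min(len(prev), len(value))
        while n < limit and prev[n] == value[n]:
            n += 1
        encoded.append((n, value[n:]))
        prev = value
    return encoded


def front_decode(pairs):
    """Invert front_code: rebuild each string from the previous one."""
    decoded, prev = [], ""
    for n, suffix in pairs:
        prev = prev[:n] + suffix
        decoded.append(prev)
    return decoded


def move_to_front(values):
    """Emit the recency-list index of each previously seen value (an int),
    or the value itself (a str) the first time it appears; then move it to the front."""
    table, out = [], []
    for value in values:
        if value in table:
            i = table.index(value)
            out.append(i)
            table.pop(i)
        else:
            out.append(value)
        table.insert(0, value)
    return out
```

Both transforms are lossless, which matches the abstract's requirement: the original column can be rebuilt exactly before or after general-purpose compression.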

AN INDISCERNIBILITY APPROACH FOR PRE PROCESSING OF WEB LOG FILES

International Journal of Computer and Communication Technology

10.47893/ijcct.2012.1147

2012

pp. 231-234

Author(s):

JEEVA JOSE, P. SOJAN LAL

Keyword(s):

World Wide Web, Set Theory, Rough Set, World Wide, Rough Set Theory, Web Usage Mining, Web Log, Log Files, Challenging Tasks, And Behavior

The World Wide Web has grown spectacularly, not only in the number of websites and the volume of information but also in the number of visitors. Web log files contain a tremendous amount of information about user traffic and behavior. Eliminating noise requires a large amount of preprocessing, which is one of the challenging tasks in web usage mining. This paper proposes an indiscernibility approach from rough set theory for preprocessing web log files.
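
In rough set theory, two objects are indiscernible with respect to an attribute subset B when they have the same value for every attribute in B; the relation IND(B) partitions the objects into equivalence classes. The Python sketch below computes those classes for log records represented as dictionaries. The field names are hypothetical, and the abstract does not say how the paper uses the classes to remove noise, so only the relation itself is shown.

```python
from collections import defaultdict


def indiscernibility_classes(records, attributes):
    """Partition record indices into the equivalence classes of IND(B).

    Two records share a class exactly when they agree on every attribute
    in `attributes` (the subset B). Classes are listed in order of first appearance.
    """
    classes = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[a] for a in attributes)
        classes[key].append(index)
    return list(classes.values())
```

For example, with B = ("ip", "url"), two requests that differ only in status code fall into one class, so a preprocessing step could keep a single representative per class.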

Mutual Information Pre-processing Based Broken-stick Linear Regression Technique for Web User Behaviour Pattern Mining

International Journal of Intelligent Engineering and Systems

10.22266/ijies2021.0228.24

2021

Vol 14 (1)

pp. 244-256

Author(s):

Gokulapriya Raman, Ganesh Raj

Keyword(s):

Linear Regression, Mutual Information, Pattern Mining, False Positive Rate, Behaviour Pattern, Time Requirement, User Behaviour, Web Log, Log Files, Positive Rate

Web usage behaviour mining is a substantial research problem, as it identifies different users' behaviour patterns by analysing web log files. However, the accuracy of identifying users' frequently accessed web patterns has been limited, and existing methods require more time. The Mutual Information Pre-processing based Broken-Stick Linear Regression (MIP-BSLR) technique is proposed to improve the performance of web user behaviour pattern mining with higher accuracy. Initially, web log files from the Apache web log dataset and the NASA dataset are taken as input. Then, the Mutual Information based Pre-processing (MI-P) method is applied to compute the mutual dependence between two web patterns. Based on the computed value, relevant web access patterns are retained for further processing and irrelevant patterns are removed. After that, Broken-Stick Linear Regression analysis (BLRA) is performed in MIP-BSLR for web user behaviour analysis. BLRA identifies the frequently visited web patterns, which allows MIP-BSLR to accurately predict the usage behaviour of web users and improves the performance of web usage behaviour mining. MIP-BSLR is evaluated on pattern mining accuracy, false positives, time requirements, and space requirements with respect to the number of web patterns. Compared with conventional methods on the Apache web log dataset, the proposed technique improves pattern mining accuracy by 14% and reduces the false positive rate by 52%, time requirement by 19%, and space complexity by 21%. On the NASA dataset, pattern mining accuracy increases by 16%, with reductions of 47% in false positive rate, 20% in time requirement, and 22% in space complexity, compared with conventional methods.
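
The abstract does not define how MI-P scores a pair of web patterns. As one illustrative reading, each pattern can be treated as the binary event "a session contains this page", and mutual information computed from empirical frequencies as I(X;Y) = Σ p(x,y) log2(p(x,y) / (p(x)p(y))). The Python sketch below assumes that reading; it is not the authors' implementation, and the broken-stick regression step is not shown.

```python
import math
from collections import Counter


def mutual_information(sessions, page_a, page_b):
    """Mutual information, in bits, between the binary events
    'session contains page_a' and 'session contains page_b'."""
    n = len(sessions)
    pairs = [(page_a in s, page_b in s) for s in sessions]
    joint = Counter(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():  # only observed cells, so p_ab > 0
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

A value near zero suggests the two pages are visited independently; a hypothetical relevance threshold on this score could then decide which patterns to keep, in the spirit of the MI-P filtering step.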

Conceptualizing Mining of Firm’s Web Log Files

Journal of Systems Science and Information

10.21078/jssi-2017-489-22

2017

Vol 5 (6)

pp. 489-510

Author(s):

Ruangsak Trakunphutthirak, Yen Cheung, Vincent C. S. Lee

Keyword(s):

Web Sites, Decision Quality, Business Decision, Social Media Data, External Resources, Web Log, Implicit Information, Log Files, Use Of Social Media, Media Data

In this era of a data-driven society, useful data (Big Data) is often unintentionally ignored due to a lack of convenient tools and expensive software. For example, web log files can be used to identify explicit information about browsing patterns when users access websites. Some hidden information, however, cannot be derived directly from the log files, and external resources may be needed to discover more knowledge from browsing patterns. The purpose of this study is to investigate the application of web usage mining based on web log files. The outcome of this study sets further directions for investigating what implicit information is embedded in log files and how it can be extracted efficiently and effectively. Further work involves combining social media data to improve the quality of business decisions.