The Research of Preprocessing and Pattern Discovery Techniques on Web Log Files

Author(s):  
P. Dhanalakshmi ◽  
K. Ramani ◽  
B. Eswara Reddy
Keyword(s):  
Web Log ◽  
Big Data ◽  
2016 ◽  
pp. 899-928
Author(s):  
Abubakr Gafar Abdalla ◽  
Tarig Mohamed Ahmed ◽  
Mohamed Elhassan Seliaman

The web is a rich, dynamic, and fast-growing source for data mining, offering opportunities that often go unexploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and the website. Analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web logs are primarily used by web administrators to measure traffic and to detect broken links and other types of errors. Web usage mining extracts information that can benefit a number of application areas, such as web personalization, website restructuring, system performance improvement, and business intelligence. The web usage mining process involves three main phases: preprocessing, pattern discovery, and pattern analysis. Various preprocessing techniques have been proposed to extract information from log files and group primitive data items into meaningful, lighter level abstractions that are suitable for mining, usually in the form of visitors' sessions. The major data mining techniques used in pattern discovery are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedure, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.
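
The session-building step of preprocessing can be sketched as follows. This is a minimal illustration and not the chapter's own method: it assumes requests have already been parsed into (visitor, timestamp, URL) tuples, identifies a visitor by IP address alone, and uses a 30-minute inactivity timeout, a common heuristic that the abstract does not specify. The names `sessionize` and `timeout` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(requests, timeout=timedelta(minutes=30)):
    """Group (visitor, timestamp, url) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same visitor exceeds `timeout` (an assumed heuristic).
    Returns a list of (visitor, [url, ...]) pairs.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, url in requests:
        by_visitor[visitor].append((ts, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort(key=lambda h: h[0])  # order each visitor's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap too long: close the running session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(url)
        sessions.append((visitor, current))
    return sessions


# Illustrative records, made up for this example.
t0 = datetime(2016, 1, 1, 10, 0)
example = [
    ("10.0.0.1", t0, "/index.html"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/products.html"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/index.html"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/about.html"),
]
example_sessions = sessionize(example)
```

In this example the first visitor's third request arrives almost two hours after the second, so it opens a new session. The resulting sessions are the kind of input that clustering or sequential pattern discovery would then consume.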


Author(s):  
Serra Çelik

This chapter focuses on predicting web user behavior. When users visit a website, every action they take there is recorded in web log files. Unlike focus groups or questionnaires, log files reflect real user behavior. Data on actual user behavior is highly valuable to organizations. This chapter explores ways of extracting user patterns (user behavior) from log files. In this context, the web usage mining process is explained and some web usage mining techniques are described.


2020 ◽  
Vol 9 (1) ◽  
pp. 1045-1050

Nowadays, the WWW has grown into a vast and significant data repository. All of a user's activities are stored in a log file, which reflects the interest shown in the website. With the widespread use of the web, log file sizes are growing rapidly. Web mining is the application of data mining technologies to very large data repositories; it is the process of uncovering information from web data. Before web mining techniques can be applied, the data in the web log must be preprocessed, integrated, and transformed. Web miners need intelligent tools to find, extract, filter, and evaluate the desired information. Data preprocessing is the most important phase of the web mining process; it is critical and complex, and the successful extraction of useful data depends on it. Web logs are distributed in nature, and they are also non-scalable and impractical. Consequently, an extensive learning algorithm is required to obtain the desired information.
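
A data cleaning pass of the kind described above might look like the sketch below. It is an illustration under stated assumptions, not the paper's procedure: the input is assumed to be in the Common Log Format, and the filtering rules (keep only successful GET requests, drop embedded resources such as images) are common conventions chosen for the example. `CLF_PATTERN`, `STATIC_SUFFIXES`, and `clean_log` are hypothetical names.

```python
import re

# Pattern for the Common Log Format (assumed input format):
# host ident authuser [date] "method path protocol" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

# File extensions treated as noise here (an illustrative choice).
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def clean_log(lines):
    """Parse log lines and keep successful GET requests for pages only."""
    kept = []
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines
        entry = match.groupdict()
        if entry["method"] != "GET":
            continue
        if not entry["status"].startswith("2"):
            continue  # drop redirects, client errors, and server errors
        if entry["path"].lower().endswith(STATIC_SUFFIXES):
            continue  # drop embedded resources such as images and stylesheets
        kept.append(entry)
    return kept


# Made-up sample lines for illustration.
sample = [
    '10.0.0.1 - - [01/Jan/2020:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Jan/2020:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2020:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 312',
    'not a log line',
]
cleaned = clean_log(sample)
```

Of the four sample lines, only the request for `/index.html` survives. Real pipelines usually add further rules, for example removing crawler traffic, which this sketch leaves out.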


Web usage mining is a part of data mining. Web mining is divided into three parts: 1) web content mining, 2) web structure mining, and 3) web usage mining. This paper discusses the log files used in web usage mining. Log files record users' activity on the websites hosted by a web server, so that the websites can be improved using the gathered data. Web usage mining has three phases: preprocessing, pattern discovery, and pattern analysis. The paper describes web log files in detail and discusses three algorithms used to find patterns in log files. Their comparison is shown with the help of graphs.


2014 ◽  
pp. 35-42
Author(s):  
Sebastian Deorowicz ◽  
Szymon Grabowski

Web log files, which store user activity on a server, may grow at a pace of hundreds of megabytes a day, or even more, on popular sites. They are usually archived, as this enables further analysis, e.g., for detecting attacks or other server abuse patterns. In this work we present a specialized lossless Apache web log preprocessor and test it in combination with several popular general-purpose compressors. Our method works on individual fields of log data (each storing information such as the client’s IP, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratio by a factor of 2.70 for gzip and 1.86 for bzip2.
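
Of the techniques listed, move-to-front coding is the easiest to show in isolation. The sketch below is a generic textbook version applied to one log field and is not the authors' preprocessor; the function names and the choice of a sorted initial symbol table are assumptions made for the example.

```python
def mtf_encode(symbols):
    """Move-to-front encode a sequence of hashable, sortable symbols.

    Returns (alphabet, indices). `alphabet` is the initial symbol table
    (the sorted distinct symbols), which the decoder needs as well.
    """
    alphabet = sorted(set(symbols))
    table = list(alphabet)
    indices = []
    for sym in symbols:
        pos = table.index(sym)
        indices.append(pos)
        table.insert(0, table.pop(pos))  # move the symbol to the front
    return alphabet, indices


def mtf_decode(alphabet, indices):
    """Invert mtf_encode given the same initial symbol table."""
    table = list(alphabet)
    out = []
    for pos in indices:
        sym = table.pop(pos)
        out.append(sym)
        table.insert(0, sym)
    return out


# Made-up status-code field values from consecutive log lines.
status_codes = ["200", "200", "200", "404", "200", "200", "304", "200"]
alphabet, encoded = mtf_encode(status_codes)
# encoded is [0, 0, 0, 2, 1, 0, 2, 1]
```

Recently seen values get small indices, so a field dominated by a few repeated values turns into a stream of mostly zeros and ones, which a general-purpose compressor such as gzip or bzip2 can then encode compactly.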


Author(s):  
JEEVA JOSE ◽  
P. SOJAN LAL

The World Wide Web has seen spectacular growth not only in the number of websites and the volume of information, but also in the number of visitors. Web log files contain a tremendous amount of information about user traffic and behavior. Eliminating the noise requires a large amount of preprocessing, which is one of the challenging tasks in web usage mining. This paper proposes an indiscernibility approach based on rough set theory for preprocessing web log files.
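
In rough set theory, two objects are indiscernible with respect to a set of attributes if they have the same value on every one of those attributes, and the indiscernibility relation partitions the objects into equivalence classes. The sketch below computes that partition for log records. It only illustrates the relation itself: the attributes used (`ip`, `agent`, `url`) and the function name are assumptions, and the abstract does not say which attributes the paper uses or how it applies the resulting classes.

```python
from collections import defaultdict


def indiscernibility_classes(records, attributes):
    """Partition records into equivalence classes of IND(attributes).

    Two records are indiscernible if they agree on every attribute
    in `attributes`. Returns a list of lists of record indices.
    """
    classes = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[a] for a in attributes)
        classes[key].append(index)
    return list(classes.values())


# Made-up log records for illustration.
log_records = [
    {"ip": "10.0.0.1", "agent": "Firefox", "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "Firefox", "url": "/news.html"},
    {"ip": "10.0.0.2", "agent": "Chrome", "url": "/index.html"},
    {"ip": "10.0.0.1", "agent": "Chrome", "url": "/index.html"},
]
by_visitor = indiscernibility_classes(log_records, ["ip", "agent"])
by_url = indiscernibility_classes(log_records, ["url"])
# by_visitor is [[0, 1], [2], [3]]; by_url is [[0, 2, 3], [1]]
```

Grouping on `ip` and `agent` treats records 0 and 1 as coming from the same visitor, while record 3 shares the IP but not the user agent and so falls in its own class.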


2021 ◽  
Vol 14 (1) ◽  
pp. 244-256
Author(s):  
Gokulapriya Raman ◽  
Ganesh Raj

Web usage behaviour mining is a substantial research problem, as it identifies different users' behaviour patterns by analysing web log files. However, existing methods find users' frequently accessed web patterns with limited accuracy and require considerable time. A Mutual Information Pre-processing based Broken-Stick Linear Regression (MIP-BSLR) technique is proposed to improve the performance of web user behaviour pattern mining with higher accuracy. Initially, web log files from the Apache web log dataset and the NASA dataset are taken as input. Then, the Mutual Information based Pre-processing (MI-P) method is applied to compute the mutual dependence between two web patterns. Based on the computed value, relevant web access patterns are retained for further processing and irrelevant patterns are removed. After that, Broken-Stick Linear Regression analysis (BLRA) is performed in MIP-BSLR for web user behaviour analysis. By applying BLRA, the frequently visited web patterns are identified. With these patterns identified, the MIP-BSLR technique accurately predicts the usage behaviour of web users and improves the performance of web usage behaviour mining. The MIP-BSLR method is evaluated experimentally on pattern mining accuracy, false positive rate, time requirements and space requirements with respect to the number of web patterns. The results show that, on the Apache web log dataset, the proposed technique improves pattern mining accuracy by 14% and reduces the false positive rate by 52%, the time requirement by 19% and space complexity by 21% compared to conventional methods. Similarly, on the NASA dataset, pattern mining accuracy increases by 16%, while the false positive rate falls by 47%, the time requirement by 20% and space complexity by 22% compared to conventional methods.
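
The mutual dependence between two web patterns can be measured with the standard mutual information of two discrete variables, I(X;Y) = Σ p(x,y) log2( p(x,y) / (p(x) p(y)) ). The sketch below estimates it from paired samples. It is the textbook definition only: representing each pattern as a per-session 0/1 indicator is an assumption made for the example, and the abstract does not give the paper's exact formulation or the threshold it uses to separate relevant from irrelevant patterns.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from paired samples of two discrete variables."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


# Made-up indicators: 1 if the pattern occurs in a session, else 0.
pattern_a = [1, 1, 0, 0]
pattern_b = [1, 1, 0, 0]  # always co-occurs with pattern_a
pattern_c = [1, 0, 1, 0]  # statistically independent of pattern_a

dependent = mutual_information(pattern_a, pattern_b)
independent = mutual_information(pattern_a, pattern_c)
```

Here `dependent` is 1 bit and `independent` is 0 bits, so a filter based on this score would keep the first pair as related and discard the second.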


2017 ◽  
Vol 5 (6) ◽  
pp. 489-510
Author(s):  
Ruangsak Trakunphutthirak ◽  
Yen Cheung ◽  
Vincent C. S. Lee

In this era of a data-driven society, useful data (Big Data) is often unintentionally ignored owing to a lack of convenient tools and the cost of software. For example, web log files can be used to identify explicit information about browsing patterns when users access websites. Some hidden information, however, cannot be directly derived from the log files. We may need external resources to discover more knowledge from browsing patterns. The purpose of this study is to investigate the application of web usage mining based on web log files. The outcome of this study sets further directions for investigating what implicit information embedded in log files can be extracted, and how it can be extracted efficiently and effectively. Further work involves combining the use of social media data to improve business decision quality.

