Extracting knowledge from web server logs using web usage mining

Author(s):  
Mirghani. A. Eltahir ◽  
Anour F. A. Dafa-Alla

Data reduction is the process of minimizing the amount of data that needs to be stored in a data storage environment. Data reduction can increase storage efficiency and reduce costs. Data cleaning plays a central role in data preprocessing and web usage mining. In existing work on cleaning web server logs, irrelevant items and useless data cannot be completely removed, and overlapping data causes difficulty when retrieving data from the database. In this paper, we present an ant-based pattern clustering algorithm to obtain pattern data for mining. We also present a log cleaner that can filter out a large amount of irrelevant and inconsistent data based on the characteristics of their URLs. Essentially, we remove unwanted records using the k-means clustering algorithm. This methodology can be applied to e-commerce platforms such as Amazon and Flipkart.
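
To make the cleaning step concrete, below is a minimal sketch of a log cleaner in Python, assuming web server logs in Common Log Format; the filename, regular expression, and suffix filter are illustrative assumptions, not the paper's implementation.

```python
import re

# Minimal log-cleaner sketch: drop static-resource requests and failed
# responses from Common Log Format entries. The pattern and filters
# below are illustrative, not the paper's exact cleaning rules.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IRRELEVANT_SUFFIXES = ('.jpg', '.png', '.gif', '.css', '.js', '.ico')

def clean_log(lines):
    """Yield (host, time, url) for relevant, successful page requests."""
    for line in lines:
        m = LOG_PATTERN.match(line)
        if m is None:
            continue                        # malformed entry
        url = m.group('url').split('?')[0]  # drop the query string
        if url.lower().endswith(IRRELEVANT_SUFFIXES):
            continue                        # static resource, not a page view
        if not 200 <= int(m.group('status')) < 400:
            continue                        # failed request
        yield m.group('host'), m.group('time'), url

# 'access.log' is a placeholder path for illustration.
with open('access.log') as f:
    records = list(clean_log(f))
```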


Author(s):  
P. K. Nizar Banu ◽  
H. Inbarani

As websites increase in complexity, locating needed information becomes a difficult task. Such difficulty is often related not only to the websites’ design but also to ineffective and inefficient navigation processes. Research in web mining addresses this problem by applying techniques from data mining and machine learning to web data and documents. In this study, the authors examine web usage mining, applying data mining techniques to web server logs. Web usage mining has gained much attention as a potential approach to fulfill the requirement of web personalization. In this paper, the authors propose K-means biclustering, rough biclustering and fuzzy biclustering approaches to disclose the duality between users and pages by grouping them in both dimensions simultaneously. The simultaneous clustering of users and pages discovers biclusters that correspond to groups of users that exhibit highly correlated ratings on groups of pages. The results indicate that the fuzzy C-means biclustering algorithm performs best and is able to detect partial matching of preferences.
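
As an illustration of the user–page duality, the sketch below builds a user × page visit matrix from log records and clusters both dimensions with plain k-means — a simple stand-in for the K-means, rough, and fuzzy biclustering variants the paper proposes; the matrix construction, toy records, and cluster counts are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def user_page_matrix(records):
    """records: iterable of (user, page) pairs taken from web server logs."""
    users = sorted({u for u, _ in records})
    pages = sorted({p for _, p in records})
    u_idx = {u: i for i, u in enumerate(users)}
    p_idx = {p: j for j, p in enumerate(pages)}
    M = np.zeros((len(users), len(pages)))
    for u, p in records:
        M[u_idx[u], p_idx[p]] += 1  # visit counts serve as implicit "ratings"
    return M, users, pages

# Toy records for illustration only.
records = [('u1', '/home'), ('u1', '/docs'), ('u2', '/home'),
           ('u2', '/cart'), ('u3', '/docs'), ('u3', '/home')]
M, users, pages = user_page_matrix(records)

user_labels = KMeans(n_clusters=2, n_init=10).fit_predict(M)    # group users
page_labels = KMeans(n_clusters=2, n_init=10).fit_predict(M.T)  # group pages
# A bicluster is a (user group, page group) pair whose submatrix is dense.
```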


2012 ◽  
Vol 3 (1) ◽  
pp. 30
Author(s):  
Mona M. Abu Al-Khair ◽  
M. Koutb ◽  
H. Kelash

Each year the number of consumers and the variety of their interests increase. As a result, providers are seeking ways to infer customers' interests and to adapt their websites to make the content of interest more easily accessible. Assuming that past navigation behavior is an indicator of a user's interests, the records of this behavior, kept in the web-server logs, can be mined to extract those interests. On this principle, recommendations can be generated to help both new and returning website visitors find information about their interests faster.
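
As a hedged sketch of this principle, the following Python code counts how often pages are co-visited within sessions reconstructed from the web-server logs and recommends the most frequent co-visits; the toy session data and the scoring are illustrative, not the authors' exact method.

```python
from collections import defaultdict
from itertools import combinations

def co_visit_counts(sessions):
    """sessions: list of page lists, one per reconstructed user session."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages in sessions:
        for a, b in combinations(set(pages), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def recommend(counts, page, k=3):
    """Top-k pages most often co-visited with `page`."""
    ranked = sorted(counts[page].items(), key=lambda x: -x[1])
    return [p for p, _ in ranked[:k]]

# Toy sessions for illustration only.
sessions = [['/home', '/laptops', '/cart'],
            ['/home', '/laptops', '/reviews'],
            ['/laptops', '/reviews']]
counts = co_visit_counts(sessions)
print(recommend(counts, '/laptops'))  # e.g. ['/home', '/reviews', '/cart']
```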


2021 ◽  
Author(s):  
Ramon Abilio ◽  
Cristiano Garcia ◽  
Victor Fernandes

Browsing the Internet is part of the world population’s daily routine. The number of web pages is increasing, and so is the amount of published content (news, tutorials, images, videos) they provide. Search engines use web robots to index web content and to offer better results to their users. However, web robots have also been used to exploit vulnerabilities in web pages. Thus, monitoring and detecting web robots’ accesses is important in order to keep the web server as safe as possible. Data mining methods have been applied to web server logs (used as a data source) in order to detect web robots. The main objective of this work was therefore to observe evidence of the definition or use of web robot detection by analyzing web server-side logs using data mining methods. To that end, we conducted a systematic literature mapping, analyzing papers published between 2013 and 2020. In the systematic mapping, we analyzed 34 studies, which allowed us to better understand the area of web robot detection, mapping what is being done, the data used to perform web robot detection, and the tools and algorithms used in the literature. From those studies, we extracted 33 machine learning algorithms, 64 features, and 13 tools. This study is helpful for researchers seeking machine learning algorithms, features, and tools to detect web robots by analyzing web server logs.
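
As a rough illustration of the pipeline such studies describe, the sketch below computes a few session-level features often cited for web robot detection and trains a classifier; the feature set, toy labels, and classifier choice are assumptions for illustration, not the mapping study's definitive list.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def session_features(requests):
    """requests: list of (url, method, status) tuples for one session."""
    n = len(requests)
    return [
        n,                                                      # request volume
        sum(m == 'HEAD' for _, m, _ in requests) / n,           # HEAD ratio
        sum(u.endswith('robots.txt') for u, _, _ in requests),  # robots.txt hits
        sum(s == 404 for _, _, s in requests) / n,              # error ratio
    ]

# Toy labeled sessions for illustration only: 1 = robot, 0 = human.
sessions = [
    [('/robots.txt', 'GET', 200), ('/a', 'HEAD', 200), ('/b', 'HEAD', 404)],
    [('/home', 'GET', 200), ('/docs', 'GET', 200)],
]
X = np.array([session_features(s) for s in sessions])
y = np.array([1, 0])

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X))
```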

