web server logs
Recently Published Documents


TOTAL DOCUMENTS

59
(FIVE YEARS 9)

H-INDEX

10
(FIVE YEARS 1)

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Husna Sarirah Husin ◽  
James Thom ◽  
Xiuzhen Zhang

Purpose
The purpose of the study is to use web server logs to analyze changes in user behavior in reading online news, in terms of desktop and mobile users. Advances in mobile technology and social media have paved the way for online news consumption to evolve. There is an absence of research into changes in user behavior in terms of desktop versus mobile users, particularly research that analyzes server logs.

Design/methodology/approach
In this paper, the authors investigate the evolution of user behavior using logs from the Malaysian newspaper Berita Harian Online in April 2012 and April 2017. Web usage mining techniques were used for pre-processing the logs and identifying user sessions. A Markov model is used to analyze navigation flows, and association rule mining is used to analyze user behavior within sessions.

Findings
It was found that page accesses have increased tremendously, particularly from Android phones, and about half of the requests in 2017 were referred from Facebook. The navigation flow between the main page, articles and section pages changed from 2012 to 2017; while most users started navigation with the main page in 2012, readers often started with an article in 2017. Based on association rules, National and Sports were the most frequent section pages in both 2012 and 2017, for desktop and mobile alike. However, based on the lift and conviction, these two sections were not read together in the same session as frequently as might be expected. Other, less popular sections had a higher probability of being read together in a session.

Research limitations/implications
The localized data set is from Berita Harian Online; although unique to this particular newspaper, the findings and the methodology for investigating user behavior can be applied to other online news sites. The data set could also be extended beyond a month. Although data for the whole of 2012 was initially collected, unfortunately only the data for April 2012 is complete; other months have missing days. Therefore, to make an impartial comparison of the evolution of user behavior over five years, the web server logs for April 2017 were used.

Originality/value
User behavior in 2012 and 2017 was compared using association rules and Markov flows. Unlike existing studies analyzing online newspaper web server logs, this paper uniquely investigates changes in user behavior as a result of mobile phones becoming a mainstream technology for accessing the web.
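The lift and conviction metrics that the abstract relies on can be sketched in a few lines. The sessions below are invented toy data, not the Berita Harian logs; only the metric definitions are standard.

```python
# Toy sketch of the association-rule metrics (support, lift, conviction)
# used in the abstract, computed over hypothetical reading sessions.
# Section names are illustrative, not taken from the actual data set.

sessions = [
    {"National", "Sports"},
    {"National"},
    {"Sports", "Entertainment"},
    {"National", "Entertainment"},
    {"Entertainment", "Sports"},
]

def support(itemset):
    """Fraction of sessions containing every item in the itemset."""
    return sum(itemset <= s for s in sessions) / len(sessions)

def lift(a, b):
    """Lift < 1 means a and b co-occur less than expected by chance."""
    return support(a | b) / (support(a) * support(b))

def conviction(a, b):
    """Conviction < 1 likewise indicates a negative association a -> b."""
    confidence = support(a | b) / support(a)
    if confidence == 1.0:
        return float("inf")
    return (1 - support(b)) / (1 - confidence)

print(lift({"National"}, {"Sports"}))
print(conviction({"National"}, {"Sports"}))
```

Here both metrics come out below 1, mirroring the paper's observation that the two most popular sections need not be read together in the same session.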


Author(s):  
Alisa Bilal Zorić

Statistics is an old scientific discipline, but its application has never been more topical. Statistics is a branch of mathematics that deals with the collection, analysis, interpretation and presentation of data. Increased computing power has had a huge impact on the popularisation of statistical practice. With new technologies such as the Internet of Things, we collect data from various sources: web server logs, online transaction records, tweet streams, social media, and data from all kinds of sensors. With increased access to big data, there is a need for professionals with applied statistics knowledge who can visualize and analyze data, make sense of it, and use it to solve complex real-world problems. Applied statistics is the root of data analysis, and the practice of applied statistics involves analyzing data to help define and determine organizational needs. Today we find applied statistics in fields such as medicine, information technology, engineering, finance, marketing, accounting and business. The goal of this paper is to explain applied statistics and its principles, and to present its application in various fields.


Author(s):  
Sayalee Ghule

Log records generally contain data such as User Name, IP Address, Time Stamp, Access Request, Number of Bytes Transferred, Result Status, Referrer URL, and User Agent. These log records are maintained by web servers, and analyzing them gives a clear picture of the client's behavior. The World Wide Web is a vast repository of web pages that provides users with large amounts of information. With the growth in the number and complexity of websites, the scale of the web has become enormous. Web Usage Mining is a branch of web mining that applies mining techniques to web server logs in order to extract the behavior of users. Log records also contain essential information about the performance of a system; this information is frequently used for debugging, operational profiling, finding anomalies, detecting security threats, and measuring performance.
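The fields listed above correspond to the widely used Apache "combined" log format. A minimal parsing sketch, with an invented sample line for illustration:

```python
import re

# Minimal sketch: parsing one line of the Apache "combined" log format,
# which carries the fields listed in the text (IP address, user name,
# time stamp, request, result status, bytes transferred, referrer URL,
# user agent). The sample line below is invented for illustration.

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('203.0.113.7 - alice [10/Apr/2017:13:55:36 +0800] '
        '"GET /sports/article-123.html HTTP/1.1" 200 5120 '
        '"https://www.facebook.com/" "Mozilla/5.0 (Linux; Android 7.0)"')

entry = LOG_PATTERN.match(line).groupdict()
print(entry["ip"], entry["status"], entry["referrer"])
```

Each named group maps directly onto one of the fields named in the abstract, which is what makes these logs a convenient data source for Web Usage Mining.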


2021 ◽  
Author(s):  
Jin Zhou

In this thesis, a novel method is proposed to improve retrieval performance by using web server logs. Web server logs are grouped into sessions, terms are extracted for each page in a session, and weights for the terms are calculated. A new representation of each web page, from the user's perspective, is generated after processing the entire log. The new representation and the anchor-based representation are combined with the original text-based representation. Two combination methods are investigated: combination of document representations and combination of ranking scores. In the experiments, three measurements are employed to evaluate performance, and the results show that the highest improvement on top-10 precision is around 38% for the Cosine Similarity model, around 13% for the Okapi model, around 48% for the TFIDF model and around 17% for the Indri model.
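The second combination method, fusing ranking scores from different representations, can be sketched as a weighted linear interpolation of normalized scores. The interpolation weight and the scores below are illustrative assumptions; the thesis's actual fusion details are not reproduced here.

```python
# Hedged sketch of score-level combination: fusing ranking scores from
# a text-based and a log-based document representation via min-max
# normalization and linear interpolation. Weight alpha and the example
# scores are invented for illustration.

def min_max_normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero on constant scores
    return {doc: (s - lo) / span for doc, s in scores.items()}

def combine(text_scores, log_scores, alpha=0.7):
    t = min_max_normalize(text_scores)
    g = min_max_normalize(log_scores)
    docs = set(t) | set(g)
    return {d: alpha * t.get(d, 0.0) + (1 - alpha) * g.get(d, 0.0)
            for d in docs}

text_scores = {"d1": 2.1, "d2": 1.4, "d3": 0.3}
log_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
ranked = sorted(combine(text_scores, log_scores).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Normalizing before interpolating matters because raw scores from different retrieval models (Cosine, Okapi, TFIDF, Indri) live on incompatible scales.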




2021 ◽  
Author(s):  
Ramon Abilio ◽  
Cristiano Garcia ◽  
Victor Fernandes

Browsing the Internet is part of the world population's daily routine. The number of web pages is increasing, and so is the amount of published content (news, tutorials, images, videos) they provide. Search engines use web robots to index web content and to offer better results to their users. However, web robots have also been used to exploit vulnerabilities in web pages. Thus, monitoring and detecting web robots' accesses is important in order to keep web servers as safe as possible. Data mining methods have been applied to web server logs (used as the data source) in order to detect web robots. The main objective of this work was therefore to observe evidence of the definition or use of web robot detection by analyzing web server-side logs using data mining methods. To this end, we conducted a systematic literature mapping, analyzing papers published between 2013 and 2020. In the mapping, we analyzed 34 studies, which allowed us to better understand the area of web robot detection, mapping what is being done, the data used to perform web robot detection, and the tools and algorithms used in the literature. From those studies, we extracted 33 machine learning algorithms, 64 features, and 13 tools. This study helps researchers find machine learning algorithms, features, and tools to detect web robots by analyzing web server logs.
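The session-level features that such detectors typically compute from server logs can be illustrated with a small sketch. A simple threshold rule stands in here for the machine learning classifiers surveyed in the mapping; the feature names and thresholds are assumptions, not taken from any single study.

```python
# Illustrative sketch of session-level features commonly used for web
# robot detection (request rate, fraction of image requests, whether
# robots.txt was fetched). A hand-written threshold rule stands in for
# a trained classifier; thresholds are invented for illustration.

def session_features(requests):
    """requests: list of (timestamp_seconds, path) tuples, one session."""
    duration = max(1.0, requests[-1][0] - requests[0][0])
    paths = [p for _, p in requests]
    return {
        "req_per_sec": len(paths) / duration,
        "image_ratio": sum(p.endswith((".png", ".jpg", ".gif"))
                           for p in paths) / len(paths),
        "hit_robots_txt": any(p == "/robots.txt" for p in paths),
    }

def looks_like_robot(f):
    # Crawlers often fetch robots.txt, request pages quickly,
    # and rarely load embedded images.
    return f["hit_robots_txt"] or (
        f["req_per_sec"] > 2.0 and f["image_ratio"] < 0.1)

crawler = [(0, "/robots.txt"), (1, "/a.html"), (2, "/b.html")]
human = [(0, "/a.html"), (5, "/img/logo.png"), (40, "/b.html")]

print(looks_like_robot(session_features(crawler)))
print(looks_like_robot(session_features(human)))
```

In the surveyed literature the decision step would be a trained model (one of the 33 extracted algorithms) rather than fixed thresholds, but the feature-extraction shape is the same.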


Data reduction is the process of minimizing the amount of data that needs to be stored in a data storage environment. Data reduction can increase storage efficiency and reduce costs. Data cleaning plays a role in data preprocessing and Web Usage Mining. In existing work on cleaning web server logs, irrelevant items and useless data cannot be completely removed, and overlapping data causes difficulty when retrieving information from the database. In this work, we present an ant-based pattern clustering algorithm to obtain pattern data for mining. We also present a log cleaner that can filter out large amounts of irrelevant, inconsistent data based on their URLs. Essentially, we remove unwanted records using the k-means clustering algorithm. This methodology can be applied to e-commerce platforms such as Amazon and Flipkart.
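The URL-based cleaning step described above can be sketched as a simple filter. Dropping embedded-resource extensions and non-successful responses are common Web Usage Mining conventions; the specific suffix list and status filter are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch of URL-based log cleaning: dropping requests for
# embedded resources and failed responses so that only user page views
# remain for mining. The suffix list and status filter are common
# conventions, assumed here for illustration.

IRRELEVANT_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

def clean(log_entries):
    """log_entries: list of (url, status) pairs; returns kept page views."""
    kept = []
    for url, status in log_entries:
        if url.lower().endswith(IRRELEVANT_SUFFIXES):
            continue  # embedded resource, not a deliberate page view
        if status != 200:
            continue  # keep only successful requests
        kept.append((url, status))
    return kept

logs = [("/index.html", 200), ("/style.css", 200),
        ("/product/42", 200), ("/missing", 404)]
print(clean(logs))
```

Only after this filtering step would clustering (k-means or the ant-based variant) be applied to the surviving records.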

