Pillar 3–Pre-processed web server log file dataset of the banking institution

Data in Brief ◽  
2021 ◽  
Vol 39 ◽  
pp. 107672 ◽  
Author(s):  
Michal Munk ◽  
Anna Pilkova ◽  
Ľubomír Benko ◽  
Petra Blazekova ◽  
Peter Svec
Author(s):  
Jozef Kapusta ◽  
Michal Munk ◽  
Dominik Halvoník ◽  
Martin Drlík

When we talk about user behavior analytics, we have to understand what the main sources of valuable information are. One of these sources is the web server. The necessary data can be extracted from several places, most commonly the access log, error log, and custom log files of the web server, proxy server log files, web browser logs, browser cookies, etc. In its default form, a web server log is known as a Common Log File (W3C, 1995) and keeps information about the IP address, the date and time of the visit, and the accessed and referenced resource. There are standardized methodologies consisting of several steps that lead to extracting new knowledge from the provided data. Usually, the first step in each of them is to identify users, users’ sessions, page views, and clickstreams. This process is called pre-processing. The main goal of this stage is to take an unprocessed web server log file as input and, after processing, output meaningful representations that can be used in the next phase. In this paper, we describe in detail user session identification, which can be considered the most important part of data pre-processing. Our paper aims to compare user/session identification using the STT with user/session identification using cookies. The comparison was made with respect to the quality of the sequential rules generated, i.e., regarding the generation of useful, trivial, and inexplicable rules.
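As a rough illustration of the pre-processing step described in this abstract, the sketch below parses Common Log Format lines and splits each visitor's requests into sessions with a time-threshold heuristic. It is not the authors' implementation: the function names, the use of the IP address as the user key, the reading of STT as a time threshold, and the 30-minute value are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_clf_line(line):
    """Return (host, timestamp, url) for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    parts = match.group("request").split()
    url = parts[1] if len(parts) >= 2 else ""
    return match.group("host"), timestamp, url


def identify_sessions(records, threshold=timedelta(minutes=30)):
    """Group (host, timestamp, url) records into sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same host exceeds `threshold` (hypothetical default: 30 minutes).
    """
    sessions = []       # list of (host, [urls])
    last_seen = {}      # host -> (last timestamp, index into sessions)
    for host, timestamp, url in sorted(records, key=lambda r: (r[0], r[1])):
        previous = last_seen.get(host)
        if previous is None or timestamp - previous[0] > threshold:
            sessions.append((host, [url]))
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index][1].append(url)
        last_seen[host] = (timestamp, index)
    return sessions


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2021:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2021:14:05:00 +0000] "GET /rates.html HTTP/1.0" 200 1200',
    '192.0.2.1 - - [10/Oct/2021:15:30:00 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2021:13:56:00 +0000] "GET /contact.html HTTP/1.0" 200 512',
]

parsed = [r for r in (parse_clf_line(line) for line in SAMPLE_LINES) if r is not None]
sessions = identify_sessions(parsed)
```

Using the IP address as the user key is the weakest part of this heuristic, since several users can share one address; cookie-based identification, the alternative the abstract compares against, avoids that problem.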


Data Mining ◽  
2013 ◽  
pp. 1312-1319
Author(s):  
Marco Scarnò

CASPUR allows many Italian academic institutions located in the Centre-South of Italy to access more than 7 million articles through a digital library platform. The behaviour of its users was analyzed by considering their “traces”, which are stored in the web server log file. Using several web mining and data mining techniques, the author discovered a gradual and dynamic change in the way articles are accessed. In particular, there is evidence of an increase in journal browsing in comparison to the searching mode. This phenomenon was interpreted using the idea that browsing better meets the needs of users when they want to keep abreast of the latest advances in their scientific field, in comparison to a more generic search inside the digital library.


Author(s):  
R. Rathipriya ◽  
K. Thangavel

This chapter focuses on recommender systems based on coherent user browsing patterns. A biclustering approach is used to discover aggregate usage profiles from the preprocessed Web data. A combination of Discrete Artificial Bees Colony Optimization and Simulated Annealing is used to optimize the aggregate usage profiles obtained from the preprocessed clickstream data. The Web page recommendation process is structured into two components, performed online and offline with respect to Web server activity. The offline component builds the usage profiles or usage models by analyzing historical data, such as the server access log file or Web logs from the server, using the hybrid biclustering approach. The recommendation process is the online component: the current user's session is used to capture the user's interest so as to recommend pages for the user's next navigation step. The experiment was conducted on benchmark clickstream data (i.e. the MSNBC and MSWEB datasets from the UCI repository). The results indicate improved prediction accuracy of recommendations using the biclustering approach.


2018 ◽  
Vol 3 (1) ◽  
pp. 37
Author(s):  
Harni Yusnidar Muhammad ◽  
Jasni Mohamad Zain

Log data is one of the most significant information resources owned by modern organizations, yet it is often overlooked. Log data analytics is practised in many industries for different purposes, including website/system performance improvement, web development, information architecture, web-based campaigns/programs, network traffic monitoring, e-commerce optimization, marketing/advertising, etc. Many tools and approaches are available for this purpose; some are proprietary and some are open source. Studying the nature of these tools in order to find a suitable log analyzer that performs log analytics economically, efficiently and effectively helps the organization use this primary source of information to identify threats and problems that occur in the system at any time, by visualizing insights from the source using Elastic Stack. Such threats and problems can be identified by analyzing the log file and finding patterns of possibly suspicious behaviour. A case study of UMMAIL’s access logs is proposed to visualise web server logs. The system administrator can then be provided with appropriate infographic representations of these security threats and problems, generated after the log files are analysed. Based on these signs, the administrator can take appropriate action.


2010 ◽  
Vol 2 (2) ◽  
pp. 52-59
Author(s):  
Marco Scarnò

CASPUR allows many Italian academic institutions located in the Centre-South of Italy to access more than 7 million articles through a digital library platform. The behaviour of its users was analyzed by considering their “traces”, which are stored in the web server log file. Using several web mining and data mining techniques, the author discovered a gradual and dynamic change in the way articles are accessed. In particular, there is evidence of an increase in journal browsing in comparison to the searching mode. This phenomenon was interpreted using the idea that browsing better meets the needs of users when they want to keep abreast of the latest advances in their scientific field, in comparison to a more generic search inside the digital library.


2016 ◽  
Vol 151 (3) ◽  
pp. 32-36
Author(s):  
Sweta Singh ◽  
Prashant Shukla

Author(s):  
Minh-Tri Nguyen ◽  
Thanh-Dang Diep ◽  
Tran Hoang Vinh ◽  
Takuma Nakajima ◽  
Nam Thoai

Author(s):  
Siti Fairuz Nurr Sadikan ◽  
Azizul Azhar Ramli ◽  
Mohd Farhan Md. Fudzee ◽  
Siti Sapura Jailani ◽  
Mohd Ali Mohd Isa ◽  
...  

Web server log files contain a complete record of the user’s browsing history, such as the referrer, date and time of access, path, operating system (OS), browser and IP address. User navigation pattern discovery involves learning the user’s browsing behaviour to derive patterns from the web server log file. This paper focuses on identifying user navigation patterns from the web server log file data of the iLearn portal. The study implements a framework for user navigation comprising the phases of weblog acquisition, log query parsing, preprocessing, navigational pattern modelling, clustering, and classification. The study is conducted on the actual data logs of the iLearn portal of Universiti Teknologi MARA (UiTM). It revealed navigational patterns of online learners that are related to their intake or group over the 14-week semester. In addition, students’ access patterns differ over the semester and can be classified into three (3) quarters, namely Q1, Q2 and Q3, based on the total number of weeks per semester. Future work will focus on developing a prototype to improve the security of online learning, especially during assessments such as online quizzes, tests and examinations.
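To make the listed fields concrete, here is a minimal sketch that pulls the IP address, access time, path, referrer, OS and browser out of a single log line. It assumes the Apache-style Combined Log Format, which the abstract does not confirm for the iLearn portal. The `LogEntry` class, the helper names, and the substring-based OS/browser guesses are hypothetical simplifications, not the study's log query parser.

```python
import re
from dataclasses import dataclass

# Combined Log Format: CLF fields followed by "referrer" and "user agent".
COMBINED_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


@dataclass
class LogEntry:
    ip: str
    time: str
    path: str
    referrer: str
    os: str
    browser: str


def guess_os(agent):
    """Crude substring heuristic; order matters (Android agents also contain 'Linux')."""
    for token, name in (("Windows", "Windows"), ("Android", "Android"),
                        ("iPhone", "iOS"), ("Mac OS X", "macOS"), ("Linux", "Linux")):
        if token in agent:
            return name
    return "unknown"


def guess_browser(agent):
    """Crude substring heuristic; order matters (Chrome agents also contain 'Safari')."""
    for token, name in (("Edg/", "Edge"), ("Firefox/", "Firefox"),
                        ("Chrome/", "Chrome"), ("Safari/", "Safari")):
        if token in agent:
            return name
    return "unknown"


def parse_combined_line(line):
    """Return a LogEntry for one Combined Log Format line, or None if it does not match."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    agent = match.group("agent")
    return LogEntry(
        ip=match.group("ip"),
        time=match.group("time"),
        path=match.group("path"),
        referrer=match.group("referrer"),
        os=guess_os(agent),
        browser=guess_browser(agent),
    )


# Made-up sample line; the address and URLs are placeholders.
SAMPLE_LINE = (
    '198.51.100.4 - - [01/Mar/2020:09:15:02 +0800] "GET /course/view HTTP/1.1" 200 5120 '
    '"https://example.org/login" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.122 Safari/537.36"'
)

entry = parse_combined_line(SAMPLE_LINE)
```

Real user-agent strings are far messier than these substring checks allow, so a production preprocessor would more likely rely on a maintained user-agent parsing library.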


Author(s):  
Sayalee Ghule

Log files generally contain information such as User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, Referring URL, and User Agent. The log files are maintained by the web servers. Analysing these log files gives a clear picture of the user. The World Wide Web is a huge repository of web pages that provides Internet users with vast amounts of information. With the growth in the number and complexity of websites, the size of the web has become massive. Web Usage Mining is a branch of web mining that applies mining techniques to web server logs in order to extract the behaviour of users. Log files contain essential information about the execution of a system. This information is often used for debugging, operational profiling, finding anomalies, detecting security threats, measuring performance,

