Multiple evidence combination for web site search using server log analysis

2021 ◽  
Author(s):  
Jin Zhou

In this thesis, a novel method is proposed to improve retrieval performance by using web server logs. The logs are grouped into sessions, terms are then extracted for each page in a session, and term weights are computed. After the entire log has been processed, a new representation of each web page, reflecting the users' perspective, is generated. This log-based representation and the anchor-based representation are combined with the original text-based representation. Two combination methods are investigated: combination of document representations and combination of ranking scores. Three evaluation measures are used in the experiments. For top-10 precision, the highest improvement is around 38% for the Cosine Similarity model, around 13% for the Okapi model, around 48% for the TFIDF model, and around 17% for the Indri model.
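The abstract does not say how ranking scores from the different representations are merged. A common data-fusion approach is to min-max normalize each ranker's scores and take a weighted sum (weighted CombSUM). The sketch below assumes that approach; it is not necessarily the thesis's method. The representation names (`text`, `anchor`, `log`) and the weights are hypothetical. `precision_at_k` computes the top-k precision measure that the reported results use, with k = 10.

```python
from typing import Dict, List, Set


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one ranker's scores to [0, 1] so different models are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All documents scored equally: treat them as equally (fully) relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_scores(runs: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> List[str]:
    """Weighted CombSUM (assumed fusion rule): sum weighted, normalized scores."""
    fused: Dict[str, float] = {}
    for name, scores in runs.items():
        w = weights.get(name, 0.0)  # hypothetical per-representation weight
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant (divides by k)."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k
```

For example, with `runs = {"text": {"p1": 2.0, "p2": 1.0, "p3": 0.5}, "anchor": {"p2": 3.0, "p3": 1.0}, "log": {"p3": 5.0, "p1": 1.0}}` and weights `0.5/0.2/0.3`, the fused ranking is `["p1", "p2", "p3"]`. Normalizing first keeps a model with large raw scores (such as Okapi) from dominating one with scores in [0, 1] (such as Cosine Similarity).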


Author(s):  
Jin Zhou ◽  
Chen Ding ◽  
Dimitrios Androutsos

2012 ◽  
Vol 3 (1) ◽  
pp. 30
Author(s):  
Mona M. Abu Al-Khair ◽  
M. Koutb ◽  
H. Kelash

Each year the number of consumers and the variety of their interests increase. As a result, providers are seeking ways to infer customers' interests and to adapt their websites so that content of interest is more easily accessible. If past navigation behavior is taken as an indicator of a user's interests, then the records of this behavior, kept in web server logs, can be mined to extract those interests. On this principle, recommendations can be generated to help both returning and new visitors to a website find information of interest faster.
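One simple way to turn logged navigation into recommendations is sketched below. Log hits are grouped into per-client sessions, pages visited together in a session are counted, and the pages most often co-visited with the current one are recommended. This is an illustrative co-visitation sketch, not the authors' method. The parsed record format `(client_id, unix_time, url)` and the 30-minute inactivity timeout are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical parsed log record: (client_id, unix_time_seconds, url).
Hit = Tuple[str, int, str]

SESSION_TIMEOUT = 30 * 60  # assumption: a 30-minute gap ends a session


def sessionize(hits: List[Hit], timeout: int = SESSION_TIMEOUT) -> List[List[str]]:
    """Split each client's time-ordered hits into sessions at inactivity gaps."""
    by_client: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for client, ts, url in hits:
        by_client[client].append((ts, url))
    sessions: List[List[str]] = []
    for visits in by_client.values():
        visits.sort()  # order by timestamp
        current: List[str] = []
        last_ts: Optional[int] = None
        for ts, url in visits:
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions.append(current)
    return sessions


def co_visit_counts(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count, for each page, how many sessions also contained each other page."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(page: str, counts: Dict[str, Counter], n: int = 3) -> List[str]:
    """Top-n co-visited pages; ties broken alphabetically for determinism."""
    ranked = sorted(counts.get(page, Counter()).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:n]]
```

Counting each page pair once per session keeps a single long session from dominating the counts.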


2021 ◽  
Author(s):  
Ramon Abilio ◽  
Cristiano Garcia ◽  
Victor Fernandes

Browsing the Internet is part of the world population's daily routine. The number of web pages is increasing, and so is the amount of content (news, tutorials, images, videos) they publish. Search engines use web robots to index web content and offer better results to their users. However, web robots have also been used to exploit vulnerabilities in web pages, so monitoring and detecting web robot accesses is important for keeping web servers as safe as possible. Data mining methods have been applied to web server logs, used as the data source, to detect web robots. The main objective of this work was therefore to identify evidence of how web robot detection is defined or used when web server-side logs are analyzed with data mining methods. To this end, we conducted a systematic literature mapping of papers published between 2013 and 2020. The 34 studies analyzed in the mapping allowed us to better understand the area of web robot detection, mapping what is being done, the data used, and the tools and algorithms reported in the literature. From those studies, we extracted 33 machine learning algorithms, 64 features, and 13 tools. This study can help researchers find machine learning algorithms, features, and tools for detecting web robots through web server log analysis.
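The snippet below shows what "features extracted from web server logs" can look like in practice. It parses Common Log Format lines and computes a few per-client counts often used for robot detection: whether `/robots.txt` was requested, the share of `HEAD` requests, image requests, and error responses. These features and the threshold rule in `looks_like_robot` are illustrative assumptions. They are not taken from the 34 mapped studies, and a real system would feed such features to a trained classifier rather than a fixed rule.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status size
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_EXT = (".png", ".jpg", ".jpeg", ".gif", ".ico")


@dataclass
class ClientStats:
    requests: int = 0
    head: int = 0          # HEAD requests
    images: int = 0        # requests for image files
    errors: int = 0        # responses with status >= 400
    robots_txt: bool = False


def extract_features(lines: Iterable[str]) -> Dict[str, ClientStats]:
    """Aggregate simple per-host counts from Common Log Format lines."""
    stats: Dict[str, ClientStats] = defaultdict(ClientStats)
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed or non-CLF lines
        s = stats[m["host"]]
        path = m["path"].split("?", 1)[0].lower()  # drop the query string
        s.requests += 1
        s.head += m["method"] == "HEAD"
        s.images += path.endswith(IMAGE_EXT)
        s.errors += int(m["status"]) >= 400
        s.robots_txt |= path == "/robots.txt"
    return dict(stats)


def looks_like_robot(s: ClientStats, head_threshold: float = 0.5) -> bool:
    """Hypothetical rule: asked for robots.txt, or mostly HEAD requests."""
    return s.robots_txt or s.head / s.requests > head_threshold
```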


Author(s):  
Yijun Gao

This study analyzed the Web server logs of People's Daily Online and revealed some interesting findings: pageview counts for news items that editors considered important and placed in the most prominent sections of the homepage are not significantly different from those for "common" news items placed in less prominent sections.