Handbook of Research on Web Log Analysis
Latest Publications


Total documents: 25 (five years: 0)
H-index: 2 (five years: 0)

Published By IGI Global

9781599049748, 9781599049755

Author(s): W. David Penniman

This historical review of the birth and evolution of transaction log analysis applied to information retrieval systems provides two perspectives: first, a detailed discussion of the early work in this area, and second, an account of how this work has migrated into the evaluation of World Wide Web usage. The author describes the techniques and studies of the early years and suggests how that knowledge can be applied to current and future studies. The chapter also presents a discussion of privacy issues, with a framework for addressing them, and an overview of the historical “eras” of transaction log analysis. The author concludes that combining transaction log analysis of the type used in its early applications with additional, more qualitative approaches will be essential for a deep understanding of user behavior (and needs) with respect to current and future retrieval systems and their design.


Author(s): Michael Chau, Yan Lu, Xiao Fang, Christopher C. Yang

More non-English content is now available on the World Wide Web, and the number of non-English-speaking users on the Web is increasing. While it is important to understand the Web searching behavior of these users, many previous studies of Web query logs have focused on English search logs, and their results may not apply directly to other languages. In this chapter we discuss some methods and techniques that can be used to analyze search queries in Chinese. We also show an example of applying our methods to a Chinese Web search engine and report some interesting findings.


Author(s): Elmer V. Bernstam, Jorge R. Herskovic, William R. Hersh

Clinicians, researchers and members of the general public are increasingly using information technology to cope with the explosion in biomedical knowledge. This chapter describes the purpose of query log analysis in the biomedical domain as well as features of the biomedical domain such as controlled vocabularies (ontologies) and existing infrastructure useful for query log analysis. We focus specifically on MEDLINE, which is the most comprehensive bibliographic database of the world’s biomedical literature, the PubMed interface to MEDLINE, the Medical Subject Headings vocabulary and the Unified Medical Language System. However, the approaches discussed here can also be applied to other query logs. We conclude with a look toward the future of biomedical query log analysis.


Author(s): Seda Ozmutlu, Huseyin C. Ozmutlu, Amanda Spink

This chapter addresses topic analysis and identification of search engine user queries. Topic analysis and identification of queries is an important task in information retrieval and a key element in the development of successful personalized search engines. Topic identification of text is no simple task and remains an unsolved problem. The problem is even harder for search engine user queries because of real-time requirements and the limited number of terms in each query. The chapter includes a detailed literature review on topic analysis and identification, with an emphasis on search engine user queries; a survey of the analytical methods that have been and can be used; and a discussion of the challenges and research opportunities related to topic analysis and identification.


Author(s): Seda Ozmutlu, Huseyin C. Ozmutlu, Amanda Spink

This chapter summarizes the progress of search engine user behavior analysis, from search engine transaction log analysis to estimation of user behavior. Correct estimation of user information searching behavior paves the way to more successful and even personalized search engines. However, estimation of user behavior is not a simple task. It is closely related to natural language processing and human-computer interaction, and it requires preliminary analysis of user behavior and careful user profiling. This chapter details the studies performed on analysis and estimation of search engine user behavior, surveys the analytical methods that have been and can be used, and discusses the challenges and research opportunities related to search engine user behavior or transaction log query analysis and estimation.


Author(s): Adriana Andrade Braga

This chapter explores the possibilities and limitations of nethnography, an ethnographic approach applied to the study of online interactions, particularly computer-mediated communication (CMC). The chapter first gives a brief history of ethnography, including its relation to anthropological theories and its key methodological assumptions. Next, it examines one of the most frequent methodologies applied to Internet settings, namely treating logfiles as the only or main source of data, and analyzes its consequences. In addition, some strategies related to a naturalistic perspective for data analysis are examined. Finally, an example of an ethnographic study involving participants of a Weblog is presented to illustrate the potential of nethnography to enhance the study of CMC.


Author(s): Isak Taksa, Sarah Zelikovitz, Amanda Spink

Search query classification is a necessary step for a number of information retrieval tasks. This chapter presents an approach to non-hierarchical classification of search queries that focuses on two specific areas of machine learning: short text classification and limited manual labeling. Typically, search queries are short, carry little class-specific information per query, and are therefore a weak source for traditional machine learning. To improve the effectiveness of the classification process, the chapter introduces background knowledge discovery using information retrieval techniques. The proposed approach is applied to the task of age classification of a corpus of queries from a commercial search engine. In the process, various classification scenarios are generated and executed, providing insight into the choice, significance, and range of tuning parameters.


Author(s): Lee Rainie, Bernard J. Jansen

Every research methodology for data collection has both strengths and limitations, and this is certainly true for transaction log analysis. Therefore, researchers often need to use other data collection methods with transaction logs. In this chapter, we discuss surveys as a viable alternate method for transaction log analysis and then present a brief review of survey research literature, with a focus on the use of surveys for Web-related research. The chapter then identifies the steps in implementing survey research and designing a survey instrument. We conclude with a case study of a large electronic survey to illustrate what surveys in conjunction with transaction logs can bring to a research study.


Author(s): Mimi Zhang

In this chapter, we present the action-object pair approach as a conceptual framework for conducting transaction log analysis. We argue that the interaction between the user and the system recorded in a transaction log has two basic components: action and object. An action is a specific expression of the user. An object is a self-contained information object, the recipient of the action. Together, these two components form one interaction set, or action-object pair. A series of action-object pairs represents the interaction session. The action-object pair approach provides a conceptual framework for the collection, analysis, and understanding of data from transaction logs. We believe that this approach can benefit system design by providing the organizing principle for implicit feedback and other interactions concerning the user and by delivering, for example, personalized service to the user based on this feedback. Action-object pairs also provide a worthwhile approach to advancing our theoretical and conceptual understanding of transaction log analysis as a research method.
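One way to picture the action-object pair idea is as a small data structure. The Python sketch below is illustrative only and is not taken from the chapter: the class name `ActionObjectPair`, the helper `pair_counts`, and the action and object labels are all hypothetical. It represents a session as an ordered series of pairs and counts how often each pair occurs.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionObjectPair:
    """One interaction: a user action and the object that receives it."""
    action: str  # hypothetical label, e.g. "submit" or "click"
    obj: str     # hypothetical label, e.g. "query" or "result_link"


def pair_counts(session: List[ActionObjectPair]) -> Counter:
    """Count how often each action-object pair occurs in one session."""
    return Counter(session)


# A session is an ordered series of action-object pairs (made-up example data).
session = [
    ActionObjectPair("submit", "query"),
    ActionObjectPair("view", "results_page"),
    ActionObjectPair("click", "result_link"),
    ActionObjectPair("submit", "query"),
]

counts = pair_counts(session)
```

Freezing the dataclass makes each pair hashable, so pairs can be counted or used as dictionary keys when aggregating across sessions.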


Author(s): Udo Kruschwitz, Nick Webb, Richard Sutcliffe

The theme of this chapter is the improvement of Information Retrieval and Question Answering systems by the analysis of query logs. Two case studies are discussed. The first describes an intranet search engine working on a university campus which can present sophisticated query modifications to the user. It does this via a hierarchical domain model built using multi-word term co-occurrence data. The usage log was analysed using mutual information scores between a query and its refinement, between a query and its replacement, and between two queries occurring in the same session. The results can be used to validate refinements in the domain model, and to suggest replacements such as domain-dependent spelling corrections. The second case study describes a dialogue-based question answering system working over a closed document collection largely derived from the Web. Logs here are based around explicit sessions in which an analyst interacts with the system. Analysis of the logs has shown that certain types of interaction lead to increased precision of the results. Future versions of the system will encourage these forms of interaction. The conclusions of this chapter are firstly that there is a growing literature on query log analysis, much of it reviewed here, secondly that logs provide many forms of useful information for improving a system, and thirdly that mutual information measures taken with automatic term recognition algorithms and hierarchy construction techniques comprise one approach for enhancing system performance.
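As a rough illustration of scoring a query against its refinement, the sketch below computes pointwise mutual information (PMI) over observed (query, refinement) pairs. This is an assumption made for illustration: the abstract does not give the exact mutual information formula used, and the function name `pmi_scores` and the sample pairs are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def pmi_scores(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Return log2 PMI for every observed (query, refinement) pair.

    Probabilities are simple relative frequencies over the list of pairs.
    """
    pairs = list(pairs)
    total = len(pairs)
    pair_counts = Counter(pairs)
    query_counts = Counter(q for q, _ in pairs)
    refinement_counts = Counter(r for _, r in pairs)

    scores = {}
    for (q, r), n in pair_counts.items():
        p_qr = n / total
        p_q = query_counts[q] / total
        p_r = refinement_counts[r] / total
        scores[(q, r)] = math.log2(p_qr / (p_q * p_r))
    return scores


# Made-up (query, refinement) pairs, as might be extracted from session logs.
observed = [
    ("library", "library opening hours"),
    ("library", "library opening hours"),
    ("library", "library catalogue"),
    ("timetable", "exam timetable"),
]

scores = pmi_scores(observed)
```

A positive score means the refinement follows the query more often than chance would predict under these frequency estimates. With very small counts the estimates are unreliable, so a real analysis would likely apply a minimum-frequency threshold.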

