Evaluating the effectiveness of information retrieval systems using effort-based relevance judgment

2019 ◽  
Vol 71 (1) ◽  
pp. 2-17
Author(s):  
Prabha Rajagopal ◽  
Sri Devi Ravana ◽  
Yun Sing Koh ◽  
Vimala Balakrishnan

Purpose Effort, in addition to relevance, is a major factor in the satisfaction of the actual user and the utility of a document to that user. The purpose of this paper is to propose a method for generating relevance judgments that incorporate effort without the involvement of human judges. The study then determines the variation in system rankings caused by low-effort relevance judgments when retrieval systems are evaluated at different depths of evaluation. Design/methodology/approach Effort-based relevance judgments are generated using a proposed boxplot approach for simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable way of classifying documents’ effort while ensuring that outlier scores do not skew the grading of the entire set of documents. Findings Evaluating retrieval systems with low-effort relevance judgments has a stronger influence at shallow depths of evaluation than at deeper depths. The difference in system rankings is shown to be due to low-effort documents and not to the number of relevant documents. Originality/value Hence, it is crucial to evaluate retrieval systems at shallow depth using low-effort relevance judgments.
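The boxplot idea described in the abstract above can be illustrated with a minimal Python sketch. It assumes (the abstract does not say this) that a document feature such as word count is graded into four effort bands at the quartile cut points a boxplot draws. The function names, the band numbering and the sample word counts are all hypothetical. A min-max grading is included only for contrast, to show how a single extreme score distorts range-based bands but not quartile-based ones.

```python
from statistics import quantiles


def boxplot_effort_grades(scores):
    """Grade each score 0-3 by the quartile band it falls in (hypothetical scheme)."""
    # Q1, median and Q3 are the cut points a boxplot draws.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    grades = []
    for s in scores:
        if s <= q1:
            grades.append(0)
        elif s <= q2:
            grades.append(1)
        elif s <= q3:
            grades.append(2)
        else:
            grades.append(3)
    return grades


def minmax_grades(scores, bands=4):
    """Grade by equal-width bands over the full range, for contrast only."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0] * len(scores)
    width = (hi - lo) / bands
    # The maximum score would land in band `bands`, so clamp it to the top band.
    return [min(int((s - lo) / width), bands - 1) for s in scores]


# Hypothetical word counts; the last document is an extreme outlier.
word_counts = [100, 120, 150, 180, 200, 220, 250, 5000]
quartile_based = boxplot_effort_grades(word_counts)  # [0, 0, 1, 1, 2, 2, 3, 3]
range_based = minmax_grades(word_counts)             # [0, 0, 0, 0, 0, 0, 0, 3]
```

With the quartile cut points, the outlier simply joins the top band and the other documents are still spread across all four grades. With equal-width bands, the same outlier pushes every other document into the lowest grade.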

2020 ◽  
Vol 28 (3) ◽  
pp. 148-168
Author(s):  
Jin Zhang ◽  
Yuehua Zhao ◽  
Xin Cai ◽  
Taowen Le ◽  
Wei Fei ◽  
...  

Relevance judgment plays an extremely significant role in information retrieval. This study investigates the differences between American users and Chinese users in relevance judgment during the information retrieval process. 384 sets of relevance scores with 50 scores in each set were collected from 16 American users and 16 Chinese users as they judged retrieval records from two major search engines based on 24 predefined search tasks from 4 domain categories. Statistical analyses reveal that there are significant differences between American assessors and Chinese assessors in relevance judgments. Significant gender differences also appear within both the American and the Chinese assessor groups. The study also revealed significant interactions among cultures, genders, and subject categories. These findings can enhance the understanding of cultural impact on information retrieval and can assist in the design of effective cross-language information retrieval systems.


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Parnia Samimi ◽  
Sri Devi Ravana

Test collections are used to evaluate information retrieval systems in laboratory-based evaluation experiments. In a classic setting, generating relevance judgments involves human assessors and is a costly and time-consuming task. Researchers and practitioners are still challenged to perform reliable and low-cost evaluation of retrieval systems. Crowdsourcing, as a novel method of data acquisition, is broadly used in many research fields. It has been shown that crowdsourcing is an inexpensive and quick solution as well as a reliable alternative for creating relevance judgments. One application of crowdsourcing in IR is judging the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks should be designed precisely, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments made by workers and how to improve the reliability of judgments in crowdsourcing experiments.


2018 ◽  
Vol 36 (3) ◽  
pp. 445-456 ◽  
Author(s):  
Shahram Sedghi ◽  
Zeinab Shormeij ◽  
Iman Tahamtan

Purpose Information seeking is an interactive behaviour of the end users with information systems, which occurs in a real environment known as context. Context affects information-seeking behaviour in many different ways. The purpose of this paper is to investigate the factors that potentially constitute the context of visual information seeking. Design/methodology/approach The authors used a Straussian version of grounded theory, a qualitative approach, to conduct the study. Using a purposive sampling method, 28 subjects participated in the study. The data were analysed using open, axial and selective coding in MAXQDA software. Findings The contextual factors influencing visual information seeking were classified into seven categories, including: “user characteristics”, “general search features”, “visual search features”, “display of results”, “accessibility of results”, “task type” and “environmental factors”. Practical implications This study contributes to a better understanding of how people conduct searches in and interact with visual search interfaces. Results have important implications for the designers of information retrieval systems. Originality/value This paper is among the pioneer studies investigating contextual factors influencing information seeking in visual information retrieval systems.


2015 ◽  
Vol 67 (6) ◽  
pp. 700-714 ◽  
Author(s):  
Sri Devi Ravana ◽  
Prabha Rajagopal ◽  
Vimala Balakrishnan

Purpose – In a system-based approach, replicating the web would require large test collections, and having human assessors judge the relevance of all documents per topic to create relevance judgments is infeasible. Because of the large number of documents that require judgment, human assessors may introduce errors through disagreement. The paper aims to discuss these issues. Design/methodology/approach – This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of large numbers of documents requiring judgment while avoiding human disagreement errors during the judgment process. This study uses two key factors to generate the alternative methods: the number of occurrences of each document per topic across all the system runs, and document rankings. Findings – The effectiveness of the proposed method is evaluated using the correlation coefficient between system rankings, based on mean average precision scores, obtained with the original Text REtrieval Conference (TREC) relevance judgments and with the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value – The simple methods proposed in this study improve the correlation coefficient when generating alternative relevance judgments without human assessors, while contributing to information retrieval evaluation.
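One of the two factors named in the abstract above, the number of occurrences of each document across the system runs, can be illustrated with a short Python sketch. This is not the paper's exponential variation or document ranking method, whose details are not given here. It is a simplified stand-in that marks a document as pseudo-relevant when enough runs retrieve it within the pool depth. The function name, the `min_fraction` threshold and the sample runs are hypothetical.

```python
from collections import Counter


def pseudo_qrels(runs, pool_depth=100, min_fraction=0.5):
    """Return the documents retrieved by at least `min_fraction` of the runs.

    `runs` maps a system name to its ranked list of document ids (best first)
    for a single topic. Only the top `pool_depth` documents of each run count.
    """
    counts = Counter()
    for ranking in runs.values():
        # A set ensures each run votes at most once per document.
        counts.update(set(ranking[:pool_depth]))
    threshold = min_fraction * len(runs)
    return {doc for doc, n in counts.items() if n >= threshold}


# Hypothetical runs for one topic.
runs = {
    "system_a": ["d1", "d2", "d3"],
    "system_b": ["d2", "d1", "d4"],
    "system_c": ["d2", "d5", "d6"],
}
majority = pseudo_qrels(runs, pool_depth=2)                      # {"d1", "d2"}
unanimous = pseudo_qrels(runs, pool_depth=2, min_fraction=1.0)   # {"d2"}
```

Judgments produced this way would then be compared with the human judgments by ranking the systems under each set and correlating the two rankings, as the abstract describes.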


2020 ◽  
Vol 38 (3) ◽  
pp. 477-492
Author(s):  
Mahdi Zeynali Tazehkandi ◽  
Mohsen Nowkarizi

Purpose The purpose of this paper is to present a review on the use of the recall metric for evaluating information retrieval systems, especially search engines. Design/methodology/approach This paper investigates different researchers’ views about recall metrics. Findings Five different definitions of recall were identified. For the first group, recall refers to completeness, but it does not specify where all the relevant documents are located. For the second group, recall refers to retrieving all the relevant documents from the collection. However, the term “collection” is ambiguous. For the third group (first approach), collection means the index of search engines and, for the fourth group (second approach), collection refers to the Web. For the fifth group (third approach), ranking of the retrieved documents should also be accounted for in calculating recall. Practical implications In the first, second and third approaches, what is evaluated is, respectively, the retrieval algorithm; the retrieval algorithm and crawler; and the retrieval algorithm, crawler and ranker. To determine the effectiveness of search engines for their users, it is better to use the third approach in recall measurement. Originality/value The value of this paper is to collect, identify and analyse the literature on recall. In addition, different views of researchers about recall are identified.
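The ambiguity of “collection” noted in the abstract above can be made concrete with a small Python sketch. It uses the standard set-based definition of recall (relevant documents retrieved divided by all relevant documents) and changes only which set of relevant documents serves as the denominator. The document ids and the two relevant sets are hypothetical, and the rank-aware variant from the fifth group is not modelled.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical result list and two readings of "the collection".
retrieved = ["d1", "d2", "d3"]
relevant_in_index = {"d1", "d2", "d4"}                   # relevant documents the engine has indexed
relevant_on_web = {"d1", "d2", "d4", "d7", "d8", "d9"}   # relevant documents anywhere on the Web

recall_vs_index = recall(retrieved, relevant_in_index)   # 2 of 3 relevant documents found
recall_vs_web = recall(retrieved, relevant_on_web)       # 2 of 6 relevant documents found
```

The same result list scores lower against the Web than against the index, because relevant documents the crawler never collected count against the engine only under the Web reading.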


2015 ◽  
Vol 67 (4) ◽  
pp. 408-421
Author(s):  
Sri Devi Ravana ◽  
Masumeh Sadat Taheri ◽  
Prabha Rajagopal

Purpose – The purpose of this paper is to propose a method that gives more accurate results when comparing the performance of paired information retrieval (IR) systems than the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach – In the proposed approach, instead of the classic method of using a set of topic scores, document-level scores are considered as the evaluation unit. These document scores are the defined document weights, which play the role of the mean average precision (MAP) score of the systems as the significance test’s statistic. The experiments were conducted using the TREC 9 Web track collection. Findings – The p-values generated through two types of significance tests, namely the Student’s t-test and the Mann-Whitney test, show that by using document-level scores as the evaluation unit, the difference between IR systems is more significant than when topic scores are used. Originality/value – Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
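For orientation, the classic topic-level comparison that the abstract above contrasts against can be sketched in Python as a paired Student's t statistic over per-topic scores. This is the baseline, not the paper's document-level method, whose document weighting is not specified here. The function name and the per-topic average precision values are hypothetical, and the sketch stops at the test statistic instead of looking up a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two systems scored on the same topics."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same topics")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation of the per-topic differences
    if spread == 0:
        raise ValueError("t is undefined when all per-topic differences are equal")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Hypothetical average precision per topic for two systems.
ap_system_a = [0.5, 0.6, 0.7, 0.4]
ap_system_b = [0.4, 0.5, 0.5, 0.4]
t = paired_t_statistic(ap_system_a, ap_system_b)  # about 2.449, with 3 degrees of freedom
```

With only one score per topic, the number of observations equals the number of topics. Using document-level scores as the evaluation unit changes what counts as an observation, which is the design choice the abstract reports on.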


Author(s):  
Theresa Dirndorfer Anderson

This chapter uses a study of human assessments of relevance to demonstrate how individual relevance judgments and retrieval practices embody collaborative elements that contribute to the overall progress of that person’s individual work. After discussing key themes of the conceptual framework, the chapter will discuss two case studies that serve as powerful illustrations of these themes for researchers and practitioners alike. These case studies, outcomes of a two-year ethnographic exploration of research practices, illustrate the theoretical position presented in part one of the chapter, providing lessons for the ways that people work with information systems to generate knowledge and the conditions that will support these practices. The chapter shows that collaboration does not have to be explicit to influence searcher behavior. It seeks to present both a theoretical framework and case studies that can be applied to the design, development and evaluation of collaborative information retrieval systems.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sanaz Manouchehri ◽  
Mahdieh Mirzabeigi ◽  
Tahere Jowkar

Purpose This paper aims to discover the effectiveness of Farsi-English query using ontology. Design/methodology/approach The present study is quasi-experimental. The sample consisted of 60 students and graduate and doctoral staff from Shiraz University and the Regional Center for Science and Technology. A researcher-made questionnaire was used to assess the level of English language proficiency of users, background knowledge and their level of satisfaction with search results before and after using ontology. Each user also evaluated the relevance of the top ten results on the Google search engine results page before and after using ontology. Findings The findings showed that the level of complexity of the task, the use of ontology, the interactive effect of the level of complexity of the task with the domain knowledge of the users, and the interactive effect of the level of complexity of the task with ontology, influence the effectiveness of retrieval results from the users' point of view. The results of the present study also showed that the level of complexity of the task, the use of ontology, and the interactive effect of the level of complexity of the task and the use of ontology, affect the level of user satisfaction. Originality/value The results of this research are significant in both theoretical and practical aspects. Theoretically, given the lack of research examining the interactive effect of the use of ontology with the level of complexity of tasks and the domain knowledge of users, the present study can be considered as an attempt to improve information retrieval systems. From a practical point of view, the results of this research will help researchers and designers of information retrieval systems to understand that ontologies can be used to retrieve information, improve queries and assess the needs of users and their satisfaction in this field, ultimately making the information retrieval process more effective.


2018 ◽  
Vol 36 (1) ◽  
pp. 55-70 ◽  
Author(s):  
Sanjeev K. Sunny ◽  
Mallikarjun Angadi

Purpose The purpose of this study is to carry out a systematic literature review for evidence-based assessment of the effectiveness of thesauri in digital information retrieval systems. It also aims to identify the evaluation methods, evaluation measures and data collection tools which may be used in evaluating digital information retrieval systems. Design/methodology/approach A systematic literature review (SLR) of 344 publications from LISA and 238 from Scopus has been carried out to identify the evaluation studies for analysis, and 15 evaluation studies have been analyzed. Findings This study presents evidence for the effectiveness of thesauri in digital information retrieval systems. Various methods for evaluating digital information systems have been identified. Also, a wide range of evaluation measures and data collection tools have been identified. Research limitations/implications The study was limited to the literature published in the English language and indexed in LISA and Scopus. The evaluation methods, evaluation measures and data collection tools identified in this study may be used to design more cognizant evaluation studies for digital information retrieval systems. Practical implications The findings have significant implications for the administrators of any type of digital information retrieval system in making more informed decisions toward implementing thesauri in resource description and access to digital collections. Originality/value This study extends our knowledge on the potential of thesauri in digital information retrieval systems. It also provides cues for designing more cognizant evaluation studies for digital information systems.


2018 ◽  
Vol 36 (3) ◽  
pp. 430-444
Author(s):  
Sholeh Arastoopoor

Purpose The degree to which a text is considered readable depends on the capability of the reader. This assumption puts different information retrieval systems at risk of retrieving unreadable or hard-to-read yet relevant documents for their users. This paper aims to examine the potential use of concept-based readability measures along with classic measures for re-ranking search results in information retrieval systems, specifically in the Persian language. Design/methodology/approach Flesch–Dayani as a classic readability measure along with document scope (DS) and document cohesion (DC) as domain-specific measures have been applied for scoring the retrieved documents from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked result has been compared with the ranking of potential users regarding their readability. Findings The results show that there is a difference among subcategories of the computer science and IT field according to their readability and understandability. This study also shows that it is possible to develop a hybrid score based on DS and DC measures and, among all four applied scores in re-ranking the documents, the re-ranked list of documents based on the DSDC score shows correlation with the re-ranking of the participants in both groups. Practical implications The findings of this study would foster a new option in re-ranking search results based on their difficulty for experts and non-experts in different fields. Originality/value The findings and the two-mode re-ranking model proposed in this paper along with its primary focus on domain-specific readability in the Persian language would help Web search engines and online databases in further refining the search results in pursuit of retrieving useful texts for users with differing expertise.

