Crowd Worker Strategies in Relevance Judgment Tasks

Author(s):  
Lei Han ◽  
Eddy Maddalena ◽  
Alessandro Checco ◽  
Cristina Sarasua ◽  
Ujwal Gadiraju ◽  
...  
2020 ◽  
Vol 28 (3) ◽  
pp. 148-168
Author(s):  
Jin Zhang ◽  
Yuehua Zhao ◽  
Xin Cai ◽  
Taowen Le ◽  
Wei Fei ◽  
...  

Relevance judgment plays a highly significant role in information retrieval. This study investigates the differences between American and Chinese users in relevance judgment during the information retrieval process. A total of 384 sets of relevance scores, with 50 scores in each set, were collected from 16 American users and 16 Chinese users as they judged retrieval records from two major search engines, based on 24 predefined search tasks from 4 domain categories. Statistical analyses reveal significant differences between American and Chinese assessors in relevance judgments. Significant gender differences also appear within both the American and the Chinese assessor groups. The study also reveals significant interactions among cultures, genders, and subject categories. These findings can enhance the understanding of cultural impact on information retrieval and can assist in the design of effective cross-language information retrieval systems.


2010 ◽  
Vol 36 (6) ◽  
pp. 780-797 ◽  
Author(s):  
Panos Balatsoukas ◽  
Ann O'Brien ◽  
Anne Morris

2015 ◽  
Vol 67 (6) ◽  
pp. 700-714 ◽  
Author(s):  
Sri Devi Ravana ◽  
Prabha Rajagopal ◽  
Vimala Balakrishnan

Purpose – In a system-based approach, replicating the web would require large test collections, and having human assessors judge the relevance of every document per topic to create relevance judgments is infeasible. Because of the large number of documents that require judgment, human assessors may introduce errors through disagreement. The paper aims to discuss these issues. Design/methodology/approach – This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of judging large numbers of documents while avoiding human disagreement errors during the judgment process. The study uses two key factors to generate the alternate methods: the number of occurrences of each document per topic across all system runs, and the document rankings. Findings – The effectiveness of the proposed method is evaluated using the correlation coefficient between the system rankings, based on mean average precision scores, obtained with the original Text REtrieval Conference (TREC) relevance judgments and with the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value – The simple methods proposed in this study improve the correlation coefficient when generating alternate relevance judgments without human assessors, while contributing to information retrieval evaluation.
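
The occurrence-counting idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: the function names `pseudo_qrels` and `average_precision`, the `min_fraction` threshold rule, and the toy runs are all assumptions made for illustration. The sketch only shows how documents that appear in many systems' top-ranked results for a topic could be treated as pseudo-relevant, and how a run could then be scored against them.

```python
from collections import Counter


def pseudo_qrels(runs, pool_depth=100, min_fraction=0.5):
    """Build pseudo relevance judgments for one topic.

    runs: dict mapping system name -> ranked list of document ids.
    A document is treated as pseudo-relevant when it appears within
    `pool_depth` of at least `min_fraction` of the systems.
    The threshold rule is an illustrative assumption, not the paper's.
    """
    counts = Counter()
    for ranking in runs.values():
        # Count each document at most once per system.
        counts.update(set(ranking[:pool_depth]))
    threshold = min_fraction * len(runs)
    return {doc for doc, n in counts.items() if n >= threshold}


def average_precision(ranking, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy data (hypothetical): three systems, one topic.
runs = {
    "A": ["d1", "d2", "d3"],
    "B": ["d2", "d1", "d4"],
    "C": ["d2", "d5", "d6"],
}
qrels = pseudo_qrels(runs, pool_depth=3, min_fraction=0.5)
ap_by_system = {name: average_precision(r, qrels) for name, r in runs.items()}
```

Averaging such per-topic scores over all topics would give a mean average precision per system, and the resulting system ranking could then be correlated with the ranking obtained from human judgments.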


Author(s):  
Shaiful Bakhtiar bin Rodzman ◽  
Normaly Kamal Ismail ◽  
Nurazzah Abd Rahman ◽  
Syed Ahmad Aljunid ◽  
Zulhilmi Mohamed Nor ◽  
...  

<p>In this article, the researchers' main contribution is to investigate three factors that may correlate in the implementation of Expert Judgment Z-Numbers as a new Fuzzy Logic Ranking Indicator: the expert relevance judgment or score, the expert confidence, and the level of expertise. The Expert Judgment Z-Numbers then serve as an input to the Hierarchical Fuzzy Logic System of Domain Specific Text Retrieval, along with other indicators such as the Ontology BM25 Score, Fabrication Rate, Shia Rate and Positive Rate of the hadith document. The results showed that the proposed system, with the additional new indicator of Expert Judgment Z-Numbers, may improve on the original BM25 ranking function: it yielded better results on 26 queries across all evaluation metrics measured in this research (P@10, %no measures and MAP) and better results on 28 queries on P@10 alone, whereas the original BM25 score yielded better results on only 2 queries across all evaluation metrics and on 4 queries on MAP alone. The results show that the proposed system can use the expert confidence and relevance judgments represented as Z-Numbers as an indicator to optimize the existing ranking function, and that there is potential for further research in these domains. In future work, the researchers would like to extend this research by including a variety of expert confidence levels and judgments, as well as a new calculation to represent the value of Z-Numbers.</p>
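
The per-query comparison reported above (how many queries each ranking function wins on a metric such as P@10) can be sketched as follows. The helper names `precision_at_k` and `count_wins` and the toy scores are hypothetical; the sketch does not reproduce the authors' system or data.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k positions occupied by relevant documents."""
    top = ranking[:k]
    return sum(1 for doc in top if doc in relevant) / k


def count_wins(scores_a, scores_b):
    """Compare two systems query by query.

    scores_a, scores_b: dicts mapping query id -> metric value.
    Returns (wins for a, wins for b, ties) over the shared queries.
    """
    a_wins = b_wins = ties = 0
    for query in scores_a.keys() & scores_b.keys():
        if scores_a[query] > scores_b[query]:
            a_wins += 1
        elif scores_a[query] < scores_b[query]:
            b_wins += 1
        else:
            ties += 1
    return a_wins, b_wins, ties


# Toy per-query P@10 values (hypothetical).
proposed = {"q1": 0.5, "q2": 0.2, "q3": 0.1}
baseline = {"q1": 0.3, "q2": 0.2, "q3": 0.4}
summary = count_wins(proposed, baseline)
```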

