Building High-Quality Datasets for Information Retrieval Evaluation at a Reduced Cost

Proceedings ◽  
2019 ◽  
Vol 21 (1) ◽  
pp. 33
Author(s):  
David Otero ◽  
Daniel Valcarce ◽  
Javier Parapar ◽  
Álvaro Barreiro

Information Retrieval is no longer exclusively about document ranking. New tasks are continuously proposed in this and sibling fields. With this proliferation of tasks, it becomes crucial to have a cheap way of constructing test collections to evaluate new developments. Building test collections is time- and resource-consuming: it takes time to obtain the documents and to define the user needs, and it requires assessors to judge a large number of documents. To reduce the latter cost, pooling strategies decrease the assessment effort by presenting to the assessors a sample of the corpus that contains as many relevant documents as possible. In this paper, we propose the preliminary design of different techniques to easily and cheaply build high-quality test collections without the need for participant systems.
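The pooling idea described above can be sketched in a few lines. This is an illustrative depth-k pooling sketch, not the techniques proposed in the paper; the system names and documents are made up.

```python
# Depth-k pooling: the set of documents to be judged is the union of the
# top-k documents from each participating system's ranking.

def depth_k_pool(runs, k=10):
    """runs: dict mapping system name -> ranked list of doc ids."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])  # only the top-k of each run enters the pool
    return pool

runs = {
    "bm25":   ["d1", "d2", "d3", "d4"],
    "lm_dir": ["d2", "d5", "d1", "d6"],
}
print(sorted(depth_k_pool(runs, k=2)))  # ['d1', 'd2', 'd5']
```

Assessors then judge only the pool instead of the whole corpus, which is where the cost saving comes from.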

2020 ◽  
Vol 54 (2) ◽  
pp. 1-2
Author(s):  
Dan Li

The availability of test collections in the Cranfield paradigm has significantly benefited the development of models, methods and tools in information retrieval. Such test collections typically consist of a set of topics, a document collection and a set of relevance assessments. Constructing these test collections requires effort on various fronts, such as topic selection, document selection, relevance assessment, and relevance label aggregation. The work in this thesis provides a fundamental way of constructing and utilizing test collections in information retrieval in an effective, efficient and reliable manner. To that end, we have focused on four aspects. We first study the document selection issue when building test collections. We devise an active sampling method for efficient large-scale evaluation [Li and Kanoulas, 2017]. Different from past sampling-based approaches, we account for the fact that some systems are of higher quality than others, and we design the sampling distribution to over-sample documents from these systems. At the same time, the estimated evaluation measures are unbiased, and the assessments can be used to evaluate new, novel systems without introducing any systematic error. A natural further step is determining when to stop the document selection and assessment procedure. This is an important but understudied problem in the construction of test collections. We consider both the gain of identifying relevant documents and the cost of assessing documents as the optimization goals. We handle the problem under the continuous active learning framework by jointly training a ranking model to rank documents and estimating the total number of relevant documents in the collection using a "greedy" sampling method [Li and Kanoulas, 2020]. The next stage of constructing a test collection is assessing relevance.
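The over-sampling idea can be sketched as follows. This is an illustrative sketch only; the exact form of the distribution, the system weights, and the names are assumptions, not the thesis's algorithm.

```python
# Build a sampling distribution over pooled documents that gives more mass
# to documents retrieved at high ranks by systems estimated to be of
# higher quality.

def sampling_distribution(runs, system_weights):
    """runs: system -> ranked doc ids; system_weights: estimated system quality."""
    scores = {}
    for system, ranking in runs.items():
        weight = system_weights[system]
        for rank, doc in enumerate(ranking, start=1):
            # higher-quality systems and earlier ranks contribute more mass
            scores[doc] = scores.get(doc, 0.0) + weight / rank
    total = sum(scores.values())
    return {doc: score / total for doc, score in scores.items()}

runs = {"strong": ["d1", "d2"], "weak": ["d2", "d3"]}
dist = sampling_distribution(runs, {"strong": 0.9, "weak": 0.1})
# Documents are then sampled from `dist` for assessment; evaluation
# measures are reweighted by inverse inclusion probabilities
# (Horvitz-Thompson style) so the estimates remain unbiased.
```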
We study how to denoise relevance assessments by aggregating multiple crowd annotation sources to obtain high-quality relevance assessments. This helps to boost the quality of relevance assessments acquired through crowdsourcing. We assume a Gaussian process prior on query-document pairs to model their correlation. The proposed model, CrowdGP, shows good performance in terms of inferring true relevance labels. In addition, it can predict relevance labels for new tasks that have no crowd annotations, which is a new functionality of CrowdGP. Ablation studies demonstrate that its effectiveness is attributable to the modelling of task correlation based on the auxiliary information of tasks and the prior relevance information of documents to queries. After a test collection is constructed, it can be used either to evaluate retrieval systems or to train a ranking model. We propose to use it to optimize the configuration of retrieval systems. We use a Bayesian optimization approach to model the effect of a δ-step in the configuration space on the effectiveness of the retrieval system, suggesting different similarity functions (covariance functions) for continuous and categorical values, and examine their ability to effectively and efficiently guide the search in the configuration space [Li and Kanoulas, 2018]. Beyond the algorithmic and empirical contributions, work done as part of this thesis also contributed to the research community through the CLEF Technology Assisted Reviews in Empirical Medicine tracks in 2017, 2018, and 2019 [Kanoulas et al., 2017, 2018, 2019]. Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Evangelos Kanoulas. Available at: https://dare.uva.nl/search?identifier=3438a2b6-9271-4f2c-add5-3c811cc48d42.


2016 ◽  
Vol 19 (3) ◽  
pp. 225-229 ◽  
Author(s):  
Falk Scholer ◽  
Diane Kelly ◽  
Ben Carterette

2014 ◽  
Vol 48 (1) ◽  
pp. 21-28 ◽  
Author(s):  
Krisztian Balog ◽  
David Elsweiler ◽  
Evangelos Kanoulas ◽  
Liadh Kelly ◽  
Mark D. Smucker

2021 ◽  
Vol 11 (10) ◽  
pp. 4617
Author(s):  
Daehee Park ◽  
Cheoljun Lee

Because smartphones support various functions, they are carried by users everywhere. Whenever a user believes that a moment is interesting, important, or meaningful to them, they can record a video to preserve such memories. The main problem with video recording an important moment is the fact that the user needs to look at the scene through the mobile phone screen rather than seeing the actual real-world event. This occurs owing to the uncertainty the user might feel when recording the video. For example, the user might not be sure whether the recording is of high quality and might worry about missing the target object. To overcome this, we developed a new camera application that utilizes two main algorithms, the minimum output sum of squared error (MOSSE) and the histogram of oriented gradients (HOG) algorithms, to track the target object and recognize the direction of the user's head. We assumed that the functions of the new camera application can resolve the user's anxiety while recording a video. To test the effectiveness of the proposed application, we conducted a case study and measured the emotional responses of users and the error rates in comparison with the use of a regular camera application. The results indicate that the new camera application induces greater feelings of pleasure, excitement, and independence than a regular camera application. Furthermore, it effectively reduces the error rates during video recording.
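The MOSSE component can be illustrated with a minimal single-frame correlation filter in NumPy. This is an illustrative sketch, not the application's implementation (real MOSSE trackers train on many augmented patches and update online; the HOG head-direction part is omitted).

```python
import numpy as np

def gaussian_response(h, w, cy, cx, sigma=2.0):
    """Desired filter output: a Gaussian peak at the target centre."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma**2))

def train_mosse(patch, response, eps=1e-3):
    """Learn the filter in the Fourier domain: H* = G F* / (F F* + eps)."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(response)
    return (G * np.conj(F)) / (F * np.conj(F) + eps)

def locate(frame, H_conj):
    """The tracked object sits at the peak of the correlation response."""
    resp = np.real(np.fft.ifft2(np.fft.fft2(frame) * H_conj))
    return np.unravel_index(np.argmax(resp), resp.shape)

rng = np.random.default_rng(0)
patch = rng.random((32, 32))                     # stand-in for a video frame
g = gaussian_response(32, 32, 10, 12)            # target centred at (10, 12)
H = train_mosse(patch, g)
peak = locate(np.roll(patch, (3, 5), axis=(0, 1)), H)  # object shifted by (3, 5)
```

Because correlation is shift-invariant, the response peak follows the object from frame to frame, which is what lets the application keep the target in view while the user looks away from the screen.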

