2AIRTC: The Amharic Adhoc Information Retrieval Test Collection

This paper presents a new possibilistic information retrieval system using semantic query expansion. The work is involved in query expansion strategies based on external linguistic resources. In this case, the authors exploited the French dictionary “Le Grand Robert”. First, they model the dictionary as a graph and compute similarities between query terms by exploiting the circuits in the graph. Second, the possibility theory is used by taking advantage of a double relevance measure (possibility and necessity) between the articles of the dictionary and query terms. Third, these two approaches are combined by using two different aggregation methods. The authors also benefit from an existing approach for reweighting query terms in the possibilistic matching model to improve the expansion process. In order to assess and compare the approaches, the authors performed experiments on the standard ‘LeMonde94’ test collection.

Download Full-text

TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa091 ◽

2020 ◽

Vol 27 (9) ◽

pp. 1431-1436 ◽

Cited By ~ 11

Author(s):

Kirk Roberts ◽

Tasmeer Alam ◽

Steven Bedrick ◽

Dina Demner-Fushman ◽

Kyle Lo ◽

...

Keyword(s):

Information Retrieval ◽

Clinical Research ◽

Data Use ◽

Iteration Process ◽

Assessment Process ◽

Evaluation Metrics ◽

Test Collection ◽

Shared Task ◽

Relevance Criteria ◽

Research Questions

Abstract TREC-COVID is an information retrieval (IR) shared task initiated to support clinicians and clinical research during the COVID-19 pandemic. IR for pandemics breaks many normal assumptions, which can be seen by examining 9 important basic IR research questions related to pandemic situations. TREC-COVID differs from traditional IR shared task evaluations with special considerations for the expected users, IR modality considerations, topic development, participant requirements, assessment process, relevance criteria, evaluation metrics, iteration process, projected timeline, and the implications of data use as a post-task test collection. This article describes how all these were addressed for the particular requirements of developing IR systems under a pandemic situation. Finally, initial participation numbers are also provided, which demonstrate the tremendous interest the IR community has in this effort.

Download Full-text

Mahak: A Test Collection for Evaluation of Farsi Information Retrieval Systems

2007 IEEE/ACS International Conference on Computer Systems and Applications ◽

10.1109/aiccsa.2007.370697 ◽

2007 ◽

Cited By ~ 8

Author(s):

Kyumars Sheykh Esmaili ◽

Hassan Abolhassani ◽

Mahmood Neshati ◽

Ehsan Behrangi ◽

Asreen Rostami ◽

...

Keyword(s):

Information Retrieval ◽

Test Collection ◽

Retrieval Systems ◽

Information Retrieval Systems

Download Full-text

Deriving a test collection for clinical information retrieval from systematic reviews

Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics - DTMBIO '10 ◽

10.1145/1871871.1871882 ◽

2010 ◽

Cited By ~ 1

Author(s):

Florian Boudin ◽

Jian-Yun Nie ◽

Martin Dawes

Keyword(s):

Information Retrieval ◽

Systematic Reviews ◽

Clinical Information ◽

Test Collection

Download Full-text

A Combined Method of Naïve-Bayes and Pooling Strategy for Building Test Collection for Arabic/English Information Retrieval

International Journal of Computing and Digital Systems ◽

10.12785/ijcds/100161 ◽

2021 ◽

Vol 10 (1) ◽

pp. 629-638

Author(s):

Ahmed Cherif Mazari ◽

Abdelhamid Djeffal

Keyword(s):

Information Retrieval ◽

Naive Bayes ◽

Naïve Bayes ◽

Test Collection ◽

Combined Method ◽

Pooling Strategy

Download Full-text

Effective collection construction for information retrieval evaluation and optimization

ACM SIGIR Forum ◽

10.1145/3483382.3483401 ◽

2020 ◽

Vol 54 (2) ◽

pp. 1-2

Author(s):

Dan Li

Keyword(s):

Information Retrieval ◽

Configuration Space ◽

Sampling Method ◽

Bayesian Optimization ◽

Test Collection ◽

Test Collections ◽

Information Retrieval Evaluation ◽

Retrieval Systems ◽

Ranking Model ◽

Relevance Assessments

The availability of test collections in Cranfield paradigm has significantly benefited the development of models, methods and tools in information retrieval. Such test collections typically consist of a set of topics, a document collection and a set of relevance assessments. Constructing these test collections requires effort of various perspectives such as topic selection, document selection, relevance assessment, and relevance label aggregation etc. The work in the thesis provides a fundamental way of constructing and utilizing test collections in information retrieval in an effective, efficient and reliable manner. To that end, we have focused on four aspects. We first study the document selection issue when building test collections. We devise an active sampling method for efficient large-scale evaluation [Li and Kanoulas, 2017]. Different from past sampling-based approaches, we account for the fact that some systems are of higher quality than others, and we design the sampling distribution to over-sample documents from these systems. At the same time, the estimated evaluation measures are unbiased, and assessments can be used to evaluate new, novel systems without introducing any systematic error. Then a natural further step is determining when to stop the document selection and assessment procedure. This is an important but understudied problem in the construction of test collections. We consider both the gain of identifying relevant documents and the cost of assessing documents as the optimization goals. We handle the problem under the continuous active learning framework by jointly training a ranking model to rank documents, and estimating the total number of relevant documents in the collection using a "greedy" sampling method [Li and Kanoulas, 2020]. The next stage of constructing a test collection is assessing relevance. We study how to denoise relevance assessments by aggregating from multiple crowd annotation sources to obtain high-quality relevance assessments. This helps to boost the quality of relevance assessments acquired in a crowdsourcing manner. We assume a Gaussian process prior on query-document pairs to model their correlation. The proposed model shows good performance in terms of interring true relevance labels. Besides, it allows predicting relevance labels for new tasks that has no crowd annotations, which is a new functionality of CrowdGP. Ablation studies demonstrate that the effectiveness is attributed to the modelling of task correlation based on the axillary information of tasks and the prior relevance information of documents to queries. After a test collection is constructed, it can be used to either evaluate retrieval systems or train a ranking model. We propose to use it to optimize the configuration of retrieval systems. We use Bayesian optimization approach to model the effect of a δ -step in the configuration space to the effectiveness of the retrieval system, by suggesting to use different similarity functions (covariance functions) for continuous and categorical values, and examine their ability to effectively and efficiently guide the search in the configuration space [Li and Kanoulas, 2018]. Beyond the algorithmic and empirical contributions, work done as part of this thesis also contributed to the research community as the CLEF Technology Assisted Reviews in Empirical Medicine Tracks in 2017, 2018, and 2019 [Kanoulas et al., 2017, 2018, 2019]. Awarded by: University of Amsterdam, Amsterdam, The Netherlands. Supervised by: Evangelos Kanoulas. Available at: https://dare.uva.nl/search?identifier=3438a2b6-9271-4f2c-add5-3c811cc48d42.

Download Full-text

Towards a New Standard Arabic Test Collection for Mono- and Cross-Language Information Retrieval

Natural Language Processing and Information Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-07983-7_23 ◽

2014 ◽

pp. 168-171 ◽

Cited By ~ 3

Author(s):

Oussama Ben Khiroun ◽

Raja Ayed ◽

Bilel Elayeb ◽

Ibrahim Bounhas ◽

Narjès Bellamine Ben Saoud ◽

...

Keyword(s):

Information Retrieval ◽

Test Collection ◽

Standard Arabic ◽

Cross Language Information Retrieval ◽

Cross Language

Download Full-text

Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems

Aslib Journal of Information Management ◽

10.1108/ajim-12-2014-0171 ◽

2015 ◽

Vol 67 (4) ◽

pp. 408-421

Author(s):

Sri Devi Ravana ◽

MASUMEH SADAT TAHERI ◽

Prabha Rajagopal

Keyword(s):

Information Retrieval ◽

Pairwise Comparison ◽

Current Method ◽

Statistical Testing ◽

Test Collection ◽

Content Type ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

The Mean ◽

The Difference

Purpose – The purpose of this paper is to propose a method to have more accurate results in comparing performance of the paired information retrieval (IR) systems with reference to the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach – Based on the proposed approach, instead of the classic method of using a set of topic scores, the documents level scores are considered as the evaluation unit. These document scores are the defined document’s weight, which play the role of the mean average precision (MAP) score of the systems as a significance test’s statics. The experiments were conducted using the TREC 9 Web track collection. Findings – The p-values generated through the two types of significance tests, namely the Student’s t-test and Mann-Whitney show that by using the document level scores as an evaluation unit, the difference between IR systems is more significant compared with utilizing topic scores. Originality/value – Utilizing a suitable test collection is a primary prerequisite for IR systems comparative evaluation. However, in addition to reusable test collections, having an accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers to evaluate their retrieval systems and algorithms more accurately.

Download Full-text

VBS Stemmer: A vocabulary-based stemmer

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.9192 ◽

2018 ◽

Vol 7 (2.14) ◽

pp. 551

Author(s):

Hamed Zakeri Rad ◽

Sabrina Tiun ◽

Saidah Saad

Keyword(s):

Information Retrieval ◽

Computational Linguistics ◽

Semantic Relation ◽

Alternative Solution ◽

Test Collection ◽

Morphological Variants

Stemming is referred to a procedure of reducing all words appearing in different morphological variants to a common form. As a matter of fact, it is considered as a functional way in various areas of information-retrieval work and computational linguistics. In this paper, we introduced the Vocabulary Based Stemmer (VBS) as the alternative solution to the stemming problem for the applications which are based on the semantic relation between words or dictionary based and need valid words. The Vocabulary part of VBS stemmer is generated based on WordNet. To validate the VBS Stemmer, part of “Cranfield 1400” test collection being used, and the result shows significant improvements over the previous stemmers.

Download Full-text

MeSH-Based Semantic Indexing Approach to Enhance Biomedical Information Retrieval

The Computer Journal ◽

10.1093/comjnl/bxaa073 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hager Kammoun ◽

Imen Gabsi ◽

Ikram Amous

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Research Paper ◽

The State ◽

Test Collection ◽

Semantic Indexing ◽

Effective Strategy ◽

Biomedical Information Retrieval ◽

Content Representation

Abstract Owing to the tremendous size of electronic biomedical documents, users encounter difficulties in seeking useful biomedical information. An efficient and smart access to the relevant biomedical information has become a fundamental need. In this research paper, we set forward a novel biomedical MeSH-based semantic indexing approach to enhance biomedical information retrieval. The proposed semantic indexing approach attempts to strengthen the content representation of both documents and queries by incorporating unambiguous MeSH concepts as well as the adequate senses of ambiguous MeSH concepts. For this purpose, our proposed approach relies on a disambiguation method to identify the adequate senses of ambiguous MeSH concepts and introduces four representation enrichment strategies so as to identify the best appropriate representatives of the adequate sense in the textual entities representation. To prove its effectiveness, the proposed semantic indexing approach was evaluated by intensive experiments. These experiments were carried out on OHSUMED test collection. The results reveal that our proposal outperforms the state-of-the-art approaches and allow us to highlight the most effective strategy.

Download Full-text

2AIRTC: The Amharic Adhoc Information Retrieval Test Collection

Towards a Possibilistic Information Retrieval System Using Semantic Query Expansion

TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19

Mahak: A Test Collection for Evaluation of Farsi Information Retrieval Systems

Deriving a test collection for clinical information retrieval from systematic reviews

A Combined Method of Naïve-Bayes and Pooling Strategy for Building Test Collection for Arabic/English Information Retrieval

Effective collection construction for information retrieval evaluation and optimization

Towards a New Standard Arabic Test Collection for Mono- and Cross-Language Information Retrieval

Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems

VBS Stemmer: A vocabulary-based stemmer

MeSH-Based Semantic Indexing Approach to Enhance Biomedical Information Retrieval

Export Citation Format