Does relevance feedback improve document retrieval performance?

Abstract. This paper introduces a novel ranking refinement approach based on relevance feedback for the task of document retrieval. We focus on ranking refinement because recent evaluation results from Information Retrieval (IR) systems indicate that current methods are effective at retrieving most of the relevant documents for different sets of queries but have severe difficulty ranking them appropriately. Motivated by these results, we propose a method to re-rank the list of documents returned by an IR system. The proposed method is based on a Markov Random Field (MRF) model that classifies the retrieved documents as relevant or irrelevant. The MRF combines: (i) information provided by the base IR system, (ii) similarities among documents in the retrieved list, and (iii) relevance feedback information. Thus, the problem of ranking refinement is reduced to that of minimising an energy function that represents a trade-off between document relevance and inter-document similarity. Experiments were conducted using resources from four different tasks of the Cross Language Evaluation Forum (CLEF) as well as from one task of the Text Retrieval Conference (TREC). The results show the feasibility of the method for re-ranking documents in IR and an improvement in mean average precision over a state-of-the-art retrieval system.
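The energy-minimisation idea can be illustrated with a small sketch. The abstract does not give the energy function or the optimiser, so everything here is an assumption made for illustration: the unary term (disagreement with the base IR score), the pairwise term (penalty for labelling similar documents differently), the weight `lam`, and the use of Iterated Conditional Modes (ICM) as the minimiser. The function name `rerank_icm` is hypothetical.

```python
def rerank_icm(base_scores, similarity, feedback, lam=0.5, max_iters=20):
    """Label documents relevant (1) or irrelevant (0) by greedily lowering a toy energy.

    base_scores: list of scores in [0, 1] from the base IR system (assumed normalised).
    similarity:  symmetric matrix (list of lists) of inter-document similarities.
    feedback:    dict {doc_index: 0 or 1} of user judgements; these labels are clamped.
    lam:         assumed trade-off weight between relevance and similarity terms.
    """
    n = len(base_scores)
    # Start from the feedback label if there is one, else threshold the base score.
    labels = [feedback.get(i, 1 if base_scores[i] >= 0.5 else 0) for i in range(n)]

    def local_energy(i, label):
        # Unary term: cost of contradicting the base system's score.
        unary = (1.0 - base_scores[i]) if label == 1 else base_scores[i]
        # Pairwise term: cost of disagreeing with similar documents.
        pairwise = sum(similarity[i][j] for j in range(n)
                       if j != i and labels[j] != label)
        return lam * unary + (1.0 - lam) * pairwise

    for _ in range(max_iters):
        changed = False
        for i in range(n):
            if i in feedback:
                continue  # user-judged documents keep their label
            best = min((0, 1), key=lambda lab: local_energy(i, lab))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # local minimum of the energy reached

    # Documents labelled relevant come first; ties are broken by base score.
    order = sorted(range(n), key=lambda i: (-labels[i], -base_scores[i]))
    return labels, order


# Toy data: doc 3 scores low but resembles doc 0 (judged relevant);
# doc 1 scores moderately but resembles doc 2 (judged irrelevant).
scores = [0.9, 0.6, 0.55, 0.2]
sim = [[0.0, 0.1, 0.1, 0.9],
       [0.1, 0.0, 0.9, 0.1],
       [0.1, 0.9, 0.0, 0.1],
       [0.9, 0.1, 0.1, 0.0]]
labels, order = rerank_icm(scores, sim, feedback={0: 1, 2: 0})
print(labels, order)  # [1, 0, 0, 1] [0, 3, 1, 2]
```

In this toy run, feedback on two documents propagates through the similarity term and moves document 3 ahead of document 1. ICM only finds a local minimum; other optimisers could be substituted.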

A Graph-based approach for text query expansion using pseudo relevance feedback and association rules mining

International Journal of Electrical and Computer Engineering (IJECE) ◽ 10.11591/ijece.v9i6.pp5016-5023 ◽ 2019 ◽ Vol 9 (6) ◽ pp. 5016

Author(s): Siham Jabri ◽ Azzeddine Dahbi ◽ Taoufiq Gadi

Keyword(s): Association Rules ◽ Relevance Feedback ◽ Query Expansion ◽ Weighted Graph ◽ Strong Correlations ◽ Retrieval Performance ◽ Dominance Relations ◽ Baseline System ◽ Text Query ◽ Pseudo Relevance Feedback

Pseudo-relevance feedback is a query expansion approach in which expansion terms are selected from the top-ranked documents retrieved in response to the original query. However, if the top retrieved documents are irrelevant, the selected terms will not be related to the query, and the expanded query will not improve retrieval performance over the original one. This paper proposes using the documents selected by pseudo-relevance feedback to generate association rules, to which an algorithm based on dominance relations is applied. Strong correlations between query terms and other terms are then detected, and a directed, weighted graph called Pseudo-Graph Feedback is constructed. This graph is used to expand original queries with semantically related terms selected by the user. In experiments on a Text Retrieval Conference (TREC) collection, the proposed approach achieves the best results compared to both the baseline system and an existing technique.
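A minimal sketch of the general pipeline follows: take the top-ranked documents as pseudo-feedback, score rules of the form "query term → candidate term" by confidence, and keep the strong ones as weighted directed edges. It is not the authors' algorithm. The dominance-relation step and the user's term selection are replaced by a plain confidence threshold, and the names `expand_query`, `top_k`, `min_conf` and `max_new_terms` are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_k=3, min_conf=0.6, max_new_terms=3):
    """Expand a query with terms that co-occur strongly in the top-k documents.

    ranked_docs: list of tokenised documents (lists of terms), best first.
    Returns the expanded term list and the directed, weighted edges
    {(query_term, candidate_term): confidence}.
    """
    feedback_docs = [set(doc) for doc in ranked_docs[:top_k]]  # pseudo-relevant set
    query = set(query_terms)
    edges = {}
    for q in query:
        docs_with_q = [d for d in feedback_docs if q in d]
        if not docs_with_q:
            continue
        # Count, per candidate term, the feedback documents containing both q and it.
        co_occurrence = Counter(t for d in docs_with_q for t in d if t not in query)
        for term, count in co_occurrence.items():
            confidence = count / len(docs_with_q)  # confidence of rule q -> term
            if confidence >= min_conf:
                edges[(q, term)] = confidence

    # Rank candidates by the total weight of their incoming edges.
    scores = Counter()
    for (_, term), confidence in edges.items():
        scores[term] += confidence
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:max_new_terms], edges


docs = [
    ["relevance", "feedback", "query", "expansion"],
    ["relevance", "feedback", "ranking"],
    ["query", "expansion", "graph"],
    ["cooking", "recipes"],  # outside the top-k, so ignored
]
expanded, edges = expand_query(["relevance"], docs)
print(expanded)  # ['relevance', 'feedback']
```

Here "feedback" appears in every pseudo-relevant document that contains "relevance" (confidence 1.0), while the other candidates reach only 0.5 and are dropped by the threshold.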

Optimasi Pembobotan pada Query Expansion dengan Term Relatedness to Query-Entropy based (TRQE) [Weighting Optimization in Query Expansion with Term Relatedness to Query-Entropy based (TRQE)]

Jurnal Buana Informatika ◽ 10.24002/jbi.v6i3.433 ◽ 2015 ◽ Vol 6 (3)

Author(s): Resti Ludviani ◽ Khadijah F. Hayati ◽ Agus Zainal Arifin ◽ Diana Purwitasari

Keyword(s): Query Expansion ◽ Retrieval System ◽ Document Retrieval ◽ Retrieval Performance ◽ Term Weighting ◽ New Approach ◽ Term Selection ◽ Relevance Evaluation ◽ Feedback Module ◽ Pseudo Feedback

Abstract. Selecting appropriate terms for expanding a query is very important in query expansion. Therefore, term selection optimization is added to improve query expansion performance in a document retrieval system. This study proposes a new approach named Term Relatedness to Query-Entropy based (TRQE) to optimize weighting in query expansion by considering semantic and statistical aspects of the relevance evaluation of pseudo feedback, in order to improve document retrieval performance. The proposed method has three main modules: relevance feedback, pseudo feedback, and document retrieval. TRQE is implemented in the pseudo feedback module to optimize term weighting in query expansion. The evaluation shows that TRQE retrieves documents with a best result of 100% precision and 22.22% recall. TRQE for weighting optimization of query expansion is shown to improve document retrieval.

Keywords: TRQE, query expansion, term weighting, term relatedness to query, relevance feedback
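To make the reported figures concrete, the sketch below computes precision and recall for a set of retrieved documents. The abstract does not report the underlying counts. The example data (2 documents retrieved, both relevant, out of 9 relevant documents in total) is purely hypothetical and was chosen only because it reproduces a precision of 100% together with a recall of 22.22%.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical counts, not taken from the paper.
relevant_docs = [f"d{i}" for i in range(1, 10)]  # 9 relevant documents
retrieved_docs = ["d1", "d2"]                    # 2 retrieved, both relevant
p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=100.00% recall=22.22%
```

A result of this shape means every returned document was relevant but most relevant documents were not returned, so perfect precision alone does not imply good overall retrieval.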

Figure Based Biomedical Document Retrieval System using Structural Image Features

International Journal of Knowledge Discovery in Bioinformatics ◽ 10.4018/jkdb.2012010103 ◽ 2012 ◽ Vol 3 (1) ◽ pp. 39-58

Author(s): Harikrishna G. N. Rai ◽ K Sai Deepak ◽ P. Radha Krishna

Keyword(s): Structural Properties ◽ Retrieval System ◽ Document Retrieval ◽ Image Features ◽ Biomedical Literature ◽ Feature Descriptor ◽ Retrieval Performance ◽ Retrieval Task ◽ Edge Information ◽ Structural Image

The multi-modal and unstructured nature of documents makes their retrieval from healthcare document repositories a challenging task. Text-based retrieval is the conventional approach to this problem. In this paper, the authors explore an alternative avenue: using embedded figures for the retrieval task. Usually, the context of a document is directly reflected in its associated figures; therefore, text embedded within these figures, along with image features, has been used for similarity-based retrieval of figures. The present work demonstrates that image features describing the structural properties of figures are sufficient for the figure retrieval task. First, the authors analyze the problem of figure retrieval from biomedical literature and identify significant classes of figures. Second, they use edge information to discriminate between the structural properties of each figure category. Finally, the authors present a methodology that uses a novel feature descriptor, the Fourier Edge Orientation Autocorrelogram (FEOAC), to describe structural properties of figures and build an effective biomedical document retrieval system. The experimental results demonstrate that FEOAC improves retrieval performance on the figure retrieval task, especially when most of the edge information is retained. In addition to being invariant to scale, rotation, and non-uniform illumination, the proposed feature descriptor is shown to be relatively robust to noisy edges.
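The abstract does not describe how FEOAC is computed, so the sketch below is not the authors' descriptor. It only illustrates one general property that Fourier-based orientation features can exploit: rotating an image by a multiple of the bin width circularly shifts its edge-orientation histogram, and the magnitudes of the discrete Fourier transform are unchanged by circular shifts. The histogram values and the helper names `dft_magnitudes` and `rotate` are hypothetical.

```python
import cmath


def dft_magnitudes(hist):
    """Magnitudes of the discrete Fourier transform of a 1-D histogram."""
    n = len(hist)
    return [
        abs(sum(h * cmath.exp(-2j * cmath.pi * k * m / n) for m, h in enumerate(hist)))
        for k in range(n)
    ]


def rotate(hist, shift):
    """Circularly shift a histogram, mimicking a rotation by `shift` bins."""
    shift %= len(hist)
    return hist[-shift:] + hist[:-shift]


# Hypothetical 8-bin edge-orientation histogram (edge-pixel counts per angle bin).
hist = [5, 1, 0, 2, 7, 3, 0, 1]
rotated = rotate(hist, 3)

original_features = dft_magnitudes(hist)
rotated_features = dft_magnitudes(rotated)

# The raw histograms differ, but their Fourier magnitudes agree.
same = all(abs(a - b) < 1e-9 for a, b in zip(original_features, rotated_features))
print(rotated != hist, same)  # True True
```

Comparing figures on such magnitudes rather than on raw orientation bins is one way a descriptor can tolerate rotation. Whether FEOAC achieves its invariance this way is not stated in the abstract.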