Optimasi Pembobotan pada Query Expansion dengan Term Relatedness to Query-Entropy based (TRQE) [Weighting Optimization in Query Expansion Using Term Relatedness to Query-Entropy based (TRQE)]

Resti Ludviani; Khadijah F. Hayati; Agus Zainal Arifin; Diana Purwitasari

doi:10.24002/jbi.v6i3.433

Jurnal Buana Informatika ◽

10.24002/jbi.v6i3.433 ◽

2015 ◽

Vol 6 (3) ◽

Author(s):

Resti Ludviani ◽

Khadijah F. Hayati ◽

Agus Zainal Arifin ◽

Diana Purwitasari

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Document Retrieval ◽

Retrieval Performance ◽

Term Weighting ◽

New Approach ◽

Term Selection ◽

Relevance Evaluation ◽

Feedback Module ◽

Pseudo Feedback

Abstract. Selecting appropriate terms to expand a query is crucial in query expansion, so term selection must be optimized to improve query expansion performance in a document retrieval system. This study proposes a new approach, Term Relatedness to Query-Entropy based (TRQE), which optimizes term weighting in query expansion by considering both the semantic and the statistical aspects of the relevance assessment of pseudo feedback, with the aim of improving document retrieval performance. The proposed method has three main modules: relevance feedback, pseudo feedback, and document retrieval. TRQE is implemented in the pseudo feedback module to optimize the weighting of expansion terms. The evaluation shows that TRQE's best retrieval result reaches a precision of 100% at a recall of 22.22%. Using TRQE to optimize weighting in query expansion is shown to improve the relevance of document retrieval.

Keywords: TRQE, query expansion, term weighting, term relatedness to query, relevance feedback
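
The abstract does not give the TRQE formula. As a rough illustration of the two ingredients it names, a statistical (entropy-based) signal and a term-to-query relatedness signal computed over pseudo-feedback documents, the sketch below scores candidate expansion terms. The entropy definition, the co-occurrence proxy for relatedness, the product used to combine them, and the toy documents are all assumptions for illustration, not the paper's method.

```python
import math

# Hypothetical scoring for illustration only; NOT the published TRQE formula.


def entropy_weight(term, feedback_docs):
    """Normalized entropy of the term's frequency spread over the feedback
    documents: 1.0 = spread evenly, 0.0 = concentrated in one document."""
    counts = [doc.count(term) for doc in feedback_docs]
    total = sum(counts)
    if total == 0 or len(feedback_docs) < 2:
        return 0.0
    h = sum((c / total) * math.log(total / c) for c in counts if c > 0)
    return h / math.log(len(feedback_docs))


def query_relatedness(term, query, feedback_docs):
    """Fraction of feedback documents in which the term co-occurs with at
    least one query term (a simple co-occurrence proxy)."""
    hits = sum(1 for doc in feedback_docs
               if term in doc and any(q in doc for q in query))
    return hits / len(feedback_docs)


def rank_expansion_terms(query, feedback_docs, top_n=3):
    """Score each non-query term as relatedness * entropy weight and return
    the top_n (term, score) pairs, breaking ties alphabetically."""
    candidates = {t for doc in feedback_docs for t in doc} - set(query)
    scored = [(t, query_relatedness(t, query, feedback_docs)
               * entropy_weight(t, feedback_docs)) for t in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy pseudo-feedback documents (already tokenized).
query = ["query", "expansion"]
feedback = [
    ["query", "expansion", "term", "weighting"],
    ["query", "term", "retrieval"],
    ["expansion", "term", "feedback"],
]
print(rank_expansion_terms(query, feedback, top_n=2))
```

Multiplying the two signals means a term must both co-occur with the query and be spread across several feedback documents to rank highly; a weighted sum would be an equally plausible design choice.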

Indonesian-Translated Hadith Content Weighting in Pseudo-Relevance Feedback Query Expansion

Kursor ◽

10.21107/kursor.v11i1.249 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ivanda Zevi Amalia ◽

Akbar Noto Ponco Bimantoro ◽

Agus Zainal Arifin ◽

Maryamah Faisol ◽

Rarasmaya Indraswari ◽

...

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Named Entity Recognition ◽

Entity Recognition ◽

Retrieval Process ◽

Retrieval Performance ◽

Additional Information ◽

Named Entity ◽

Story Content ◽

Test Scenarios

In general, a hadith consists of an isnad (chain of narration) and a matan (content). The matan can be further separated into several components, for example a story, the main content, and additional information. Text other than the main content, such as the isnad and the story, can interfere with retrieving relevant documents because most users typically issue simple queries. In this paper, we therefore propose a Named Entity Recognition (NER) component weighting model to improve an Indonesian hadith retrieval system. We ran three test scenarios: the first (S1) does not separate the hadith into components, the second (S2) separates it into two components (isnad and matan), and the third (S3) separates it into four components (isnad, background story, content, and additional information). The experimental results show that TF-IDF with the Rocchio algorithm for query expansion outperforms DocVec. Separating and weighting the hadith components also affects retrieval performance, because the isnad can act as noise with respect to a query. Separation into two components gave the best overall results, although separation into four components performed better in some cases, reaching up to 100% precision and 70% recall.
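
The abstract reports that TF-IDF vectors with the Rocchio algorithm worked best for query expansion. The sketch below shows Rocchio-style pseudo-relevance feedback over sparse TF-IDF vectors. The alpha/beta values, the absence of negative feedback (gamma = 0), the binary query weights, and the toy corpus are assumptions, not settings taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One sparse TF-IDF dict per tokenized document, with idf = ln(N / df)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
            for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rocchio(query_vec, feedback_vecs, alpha=1.0, beta=0.75):
    """q' = alpha * q + beta * centroid(feedback); gamma = 0, so no
    non-relevant documents are subtracted."""
    expanded = Counter({t: alpha * w for t, w in query_vec.items()})
    for vec in feedback_vecs:
        for t, w in vec.items():
            expanded[t] += beta * w / len(feedback_vecs)
    return dict(expanded)


def prf_rocchio(query_tokens, docs, k=2, alpha=1.0, beta=0.75):
    """Rank docs, treat the top k as pseudo-relevant, expand the query with
    Rocchio, and re-rank. Returns (expanded_query, new_ranking)."""
    doc_vecs = tfidf_vectors(docs)
    q = {t: 1.0 for t in query_tokens}  # assumption: binary query weights
    first = sorted(range(len(docs)), key=lambda i: -cosine(q, doc_vecs[i]))
    expanded = rocchio(q, [doc_vecs[i] for i in first[:k]], alpha, beta)
    second = sorted(range(len(docs)),
                    key=lambda i: -cosine(expanded, doc_vecs[i]))
    return expanded, second


# Toy corpus of already-tokenized documents (illustrative only).
docs = [
    ["prayer", "friday", "mosque"],
    ["prayer", "fasting", "ramadan"],
    ["trade", "market", "honesty"],
]
expanded, ranking = prf_rocchio(["prayer"], docs, k=2)
print(sorted(expanded), ranking)
```

Component weighting could be layered on top by computing one similarity per hadith component (isnad, story, content, and so on) and summing them with per-component weights; the weights the paper uses are not stated here.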

A hybrid evolutionary algorithm based automatic query expansion for enhancing document retrieval system

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-019-01247-9 ◽

2019 ◽

Cited By ~ 7

Author(s):

Dilip Kumar Sharma ◽

Rajendra Pamula ◽

D. S. Chauhan

Keyword(s):

Evolutionary Algorithm ◽

Query Expansion ◽

Retrieval System ◽

Document Retrieval ◽

Hybrid Evolutionary Algorithm

Figure Based Biomedical Document Retrieval System using Structural Image Features

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2012010103 ◽

2012 ◽

Vol 3 (1) ◽

pp. 39-58

Author(s):

Harikrishna G. N. Rai ◽

K Sai Deepak ◽

P. Radha Krishna

Keyword(s):

Structural Properties ◽

Retrieval System ◽

Document Retrieval ◽

Image Features ◽

Biomedical Literature ◽

Feature Descriptor ◽

Retrieval Performance ◽

Retrieval Task ◽

Edge Information ◽

Structural Image

The multimodal and unstructured nature of documents makes retrieving them from healthcare document repositories a challenging task. Text-based retrieval is the conventional approach to this problem. In this paper, the authors explore an alternative avenue: using embedded figures for the retrieval task. Because the context of a document is usually reflected directly in its figures, text embedded within these figures, along with image features, has been used for similarity-based figure retrieval. The present work demonstrates that image features describing the structural properties of figures are sufficient for the figure retrieval task. First, the authors analyze the problem of figure retrieval from biomedical literature and identify significant classes of figures. Second, they use edge information to discriminate between the structural properties of each figure category. Finally, they present a methodology using a novel feature descriptor, the Fourier Edge Orientation Autocorrelogram (FEOAC), to describe the structural properties of figures and build an effective biomedical document retrieval system. The experimental results show better retrieval performance and an overall improvement with FEOAC on the figure retrieval task, especially when most of the edge information is retained. Besides being invariant to scale, rotation, and non-uniform illumination, the proposed descriptor is shown to be relatively robust to noisy edges.
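
The abstract names FEOAC without defining it. The sketch below is an assumption about how a descriptor of this kind could be built: it follows the general edge-orientation-autocorrelogram idea (count pairs of edge pixels at distance d that share a quantized gradient orientation), normalizes the counts, and takes the Fourier magnitude along the orientation axis, which makes the result insensitive to cyclic shifts of the orientation bins, such as those caused by 90-degree rotations. The bin count, distances, and edge threshold are illustrative, and the sketch does not attempt the paper's scale or illumination invariance.

```python
import numpy as np

# Illustrative EOAC-style descriptor; an assumption, not the published FEOAC.


def _pair_views(a, dr, dc):
    """Aligned views (a[p], a[p + (dr, dc)]) over all valid p, for dr >= 0."""
    h, w = a.shape
    if dc >= 0:
        return a[: h - dr, : w - dc], a[dr:, dc:]
    return a[: h - dr, -dc:], a[dr:, : w + dc]


def feoac_like(image, n_bins=8, distances=(1, 3, 5), edge_thresh=0.25):
    """Edge-orientation autocorrelogram followed by the Fourier magnitude
    along the orientation axis. Returns shape (n_bins, len(distances))."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    theta = np.arctan2(gy, gx) % (2 * np.pi)
    # Quantize orientation to the nearest of n_bins bin centers.
    bins = np.rint(theta / (2 * np.pi / n_bins)).astype(int) % n_bins
    labels = np.where(magnitude > edge_thresh, bins, -1)  # -1 = not an edge

    hist = np.zeros((n_bins, len(distances)))
    for j, d in enumerate(distances):
        # Four directions; each unordered pixel pair is visited once.
        for dr, dc in ((0, d), (d, 0), (d, d), (d, -d)):
            a, b = _pair_views(labels, dr, dc)
            same = (a == b) & (a >= 0)  # both edges, same orientation bin
            hist[:, j] += np.bincount(a[same], minlength=n_bins)
    total = hist.sum()
    if total > 0:
        hist /= total  # remove dependence on the number of edge pairs
    # |FFT| over orientation is unchanged by cyclic shifts of the bins.
    return np.abs(np.fft.fft(hist, axis=0))


img = np.zeros((32, 32))
img[8:20, 10:26] = 1.0  # a simple rectangle standing in for a figure
f_orig = feoac_like(img)
f_rot = feoac_like(np.rot90(img))
print(f_orig.shape, np.allclose(f_orig, f_rot))
```

Summing the four pair directions per distance is what keeps the histogram consistent under a 90-degree rotation; keeping the directions separate would instead permute them.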

The Retrieval Effects of Query Expansion on a Feedback Document Retrieval System

The Computer Journal ◽

10.1093/comjnl/26.3.239 ◽

1983 ◽

Vol 26 (3) ◽

pp. 239-246 ◽

Cited By ~ 62

Author(s):

A. F. Smeaton

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Document Retrieval

Lexical Co-Occurrence and Contextual Window-Based Approach with Semantic Similarity for Query Expansion

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2017070104 ◽

2017 ◽

Vol 13 (3) ◽

pp. 57-78 ◽

Cited By ~ 5

Author(s):

Jagendra Singh ◽

Rakesh Kumar

Keyword(s):

Semantic Similarity ◽

Query Expansion ◽

Ad Hoc ◽

Retrieval System ◽

Hybrid Approach ◽

Information Retrieval System ◽

Query Reformulation ◽

Baseline Method ◽

Benchmark Datasets ◽

Pseudo Feedback

Query expansion (QE) is an effective method for improving the performance of information retrieval systems. In this work, we address the limitations of pseudo-feedback-based QE and propose a hybrid approach that improves feedback-based QE by combining corpus-based and contextual information about query terms with semantic knowledge about them. First, the paper explores different corpus-based lexical co-occurrence approaches for selecting an optimal combination of query terms from a pool of terms obtained by pseudo-feedback-based QE. Next, it explores a word2vec-based semantic similarity approach for ranking the QE terms obtained from the top pseudo-feedback documents. Finally, it combines co-occurrence statistics, contextual window statistics, and semantic similarity to select the best expansion terms for query reformulation. Experiments were performed on the FIRE ad hoc and TREC-3 benchmark datasets, and the results show a significant improvement over the baseline method.
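
The selection step described above combines three signals per candidate term. A minimal sketch, assuming documents are token lists, that embeddings are supplied as plain vectors (toy 2-D values standing in for word2vec), and that the three scores are mixed linearly with hypothetical weights; the paper's actual combination may differ.

```python
import math

# Hypothetical combination of the three signals; weights and the toy
# embeddings below are illustrative assumptions, not the paper's values.


def doc_cooccurrence(term, query, docs):
    """Fraction of documents containing the term and at least one query term."""
    hits = sum(1 for d in docs if term in d and any(q in d for q in query))
    return hits / len(docs)


def window_cooccurrence(term, query, docs, window=3):
    """Occurrences of `term` within `window` tokens of some query term."""
    count = 0
    for d in docs:
        q_pos = [i for i, tok in enumerate(d) if tok in query]
        for i, tok in enumerate(d):
            if tok == term and any(abs(i - j) <= window for j in q_pos):
                count += 1
    return count


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_similarity(term, query, embeddings):
    """Mean cosine similarity between the term and the query terms."""
    if term not in embeddings:
        return 0.0
    sims = [cosine(embeddings[term], embeddings[q]) for q in query if q in embeddings]
    return sum(sims) / len(sims) if sims else 0.0


def select_expansion_terms(query, feedback_docs, embeddings,
                           weights=(0.4, 0.3, 0.3), top_n=2):
    """Linear mix of document co-occurrence, window co-occurrence (scaled to
    [0, 1] by the maximum count) and semantic similarity."""
    candidates = sorted({t for d in feedback_docs for t in d} - set(query))
    windows = {t: window_cooccurrence(t, query, feedback_docs) for t in candidates}
    max_win = max(windows.values(), default=0) or 1
    scores = {
        t: weights[0] * doc_cooccurrence(t, query, feedback_docs)
        + weights[1] * windows[t] / max_win
        + weights[2] * semantic_similarity(t, query, embeddings)
        for t in candidates
    }
    return sorted(candidates, key=lambda t: -scores[t])[:top_n]


query = ["heart"]
feedback = [
    ["heart", "chest", "pain"],
    ["chest", "heart", "surgery"],
    ["stock", "market", "news"],
]
embeddings = {  # toy 2-D vectors standing in for word2vec embeddings
    "heart": [1.0, 0.0], "chest": [0.9, 0.1], "pain": [0.7, 0.3],
    "surgery": [0.6, 0.4], "stock": [0.0, 1.0], "market": [0.1, 0.9],
    "news": [0.0, 1.0],
}
print(select_expansion_terms(query, feedback, embeddings))  # ['chest', 'pain']
```

The off-topic third feedback document contributes no co-occurrence evidence, so its terms can only score through embedding similarity, which is one way the combined score resists drift.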

An evaluation of query expansion by the addition of clustered terms for a document retrieval system

Information Storage and Retrieval ◽

10.1016/0020-0271(72)90021-6 ◽

1972 ◽

Vol 8 (6) ◽

pp. 329-348 ◽

Cited By ~ 45

Author(s):

Jack Minker ◽

Gerald A. Wilson ◽

Barbara H. Zimmerman

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Document Retrieval

A Semantic Retrieval System for Case Law

Applied Computer Systems ◽

10.2478/acss-2019-0006 ◽

2019 ◽

Vol 24 (1) ◽

pp. 38-48

Author(s):

Esingbemi Princewill Ebietomere ◽

Godspower Osaretin Ekuobase

Keyword(s):

Evaluation Method ◽

Retrieval System ◽

Case Law ◽

Document Retrieval ◽

Semantic Retrieval ◽

Retrieval Performance ◽

Stare Decisis ◽

Analysis And Design ◽

Relevant Case ◽

Search And Retrieval

Abstract. In many countries, legal reasoning, the core of legal practice, rests on “stare decisis”, and its soundness is usually strengthened by the relevant case law consulted. However, accessing and retrieving relevant case law is tiring for legal practitioners and a serious drain on their productivity. Existing efforts to address this problem are conceptual, restrictive, or unreliable. In particular, existing semantic retrieval (SR) systems for case law still need much higher retrieval precision. Introducing an ontology into an SR system promises to meet this need. Accordingly, an ontology-based SR system for case law has been built using the systems analysis and design methodology; in particular, component-based software engineering and agile methodologies are used to implement it. Finally, the search and retrieval performance of the resulting SR system has been evaluated using the heuristic evaluation method. The system achieved a search and retrieval performance of about 94% precision, 80% recall, and 84% F-measure. Overall, the paper implements an SR system for case law with excellent precision and affirms the superiority of the ontology approach over other semantic approaches to SR systems for document retrieval in the legal domain.
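
Precision, recall, and F-measure are related by F1 = 2PR / (P + R). Plugging in 0.94 and 0.80 gives about 0.864 rather than 0.84. One possible explanation is that the F-measure was averaged per query instead of being computed from the averaged precision and recall; whether the paper aggregates this way is an assumption. The sketch below computes both aggregates on made-up per-query results (hypothetical data, not from the paper).

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(retrieved, relevant):
    """Precision, recall and F1 for one query, given sets of document ids."""
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    return p, r, f1(p, r)


def averages(runs):
    """Mean P, mean R, mean per-query F1, and F1 of the mean P and R."""
    scores = [prf(ret, rel) for ret, rel in runs]
    n = len(scores)
    mean_p = sum(s[0] for s in scores) / n
    mean_r = sum(s[1] for s in scores) / n
    mean_f = sum(s[2] for s in scores) / n
    return mean_p, mean_r, mean_f, f1(mean_p, mean_r)


# Hypothetical per-query results: (retrieved ids, relevant ids).
runs = [({1}, {1, 2, 3, 4}), ({5, 6, 7, 8}, {5})]
print(averages(runs))            # prints (0.625, 0.625, 0.4, 0.625)
print(round(f1(0.94, 0.80), 3))  # prints 0.864
```

The two queries are mirror images, so their mean precision and mean recall are equal, yet the mean of the per-query F1 values is much lower than the F1 of the means.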

Comment on “an evaluation of query expansion by the addition of clustered terms for a document retrieval system”

Information Storage and Retrieval ◽

10.1016/0020-0271(72)90022-8 ◽

1972 ◽

Vol 8 (6) ◽

pp. 349

Author(s):

G. Salton

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Document Retrieval

A New Hybrid Document Clustering for PRF-Based Automatic Query Expansion Approach for Effective IR

International Journal of e-Collaboration ◽

10.4018/ijec.2020070105 ◽

2020 ◽

Vol 16 (3) ◽

pp. 73-95

Author(s):

Yogesh Gupta ◽

Ashish Saini

Keyword(s):

Fuzzy Logic ◽

Query Expansion ◽

Document Clustering ◽

Superior Performance ◽

Data Sets ◽

Retrieval Performance ◽

Term Weighting ◽

Effective Measure ◽

User Query ◽

Weighted Score

Automatic query expansion (AQE) is an effective way to improve information retrieval performance by adding terms to a user query. However, the pseudo relevance feedback (PRF) methods used for AQE so far suffer from a major problem: query drift. With this in mind, the present article proposes a new hybrid document clustering approach for PRF-based AQE, in which fuzzy logic and Particle Swarm Optimization (PSO) are used to construct document clusters. A new hybrid PSO and fuzzy logic based term weighting approach then finds more suitable additional query terms using a weighted score of four IR evidences, which the approach seeks to maximize. In addition, a combined semantic filtering method and query term re-weighting algorithms are used to remove noisy or semantically irrelevant terms. The presented approaches are tested and compared with other approaches on three benchmark data sets, and the comparative analysis shows the superior performance of the proposed approach.

Combined techniques based query expansion approach for document retrieval system

2019 International Conference on Contemporary Computing and Informatics (IC3I) ◽

10.1109/ic3i46837.2019.9055709 ◽

2019 ◽

Author(s):

Dilip Kumar Sharma ◽

Rajendra Pamula ◽

D.S. Chauhan

Keyword(s):

Query Expansion ◽

Retrieval System ◽

Document Retrieval
