Optimasi Pembobotan pada Query Expansion dengan Term Relatedness to Query-Entropy based (TRQE) [Weighting Optimization in Query Expansion with Term Relatedness to Query-Entropy based (TRQE)]

2015 ◽  
Vol 6 (3) ◽  
Author(s):  
Resti Ludviani ◽  
Khadijah F. Hayati ◽  
Agus Zainal Arifin ◽  
Diana Purwitasari

Abstract. Selecting appropriate terms to expand a query is very important in query expansion. Term selection optimization is therefore added to improve query expansion performance in a document retrieval system. This study proposes a new approach named Term Relatedness to Query-Entropy based (TRQE), which optimizes term weights in query expansion by considering semantic and statistical aspects of the relevance evaluation of pseudo feedback, in order to improve document retrieval performance. The proposed method has three main modules: relevance feedback, pseudo feedback, and document retrieval. TRQE is implemented in the pseudo feedback module to optimize term weighting in query expansion. The evaluation results show that TRQE retrieves documents with a best result of 100% precision and 22.22% recall. TRQE weighting optimization for query expansion is shown to improve the relevance of document retrieval.

Keywords: TRQE, query expansion, term weighting, term relatedness to query, relevance feedback
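To make the reported figures concrete, the sketch below computes precision and recall for a single query. The document counts are hypothetical and chosen only because they reproduce 100% precision with 22.22% recall; the abstract does not state the actual counts.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved ids measured against relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical case: 9 relevant documents exist, the system returns 2, both relevant.
relevant_docs = {f"d{i}" for i in range(1, 10)}
retrieved_docs = ["d1", "d2"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=100.00% recall=22.22%
```

A result like this means every returned document was relevant, but most relevant documents were never returned.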

Kursor ◽  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ivanda Zevi Amalia ◽  
Akbar Noto Ponco Bimantoro ◽  
Agus Zainal Arifin ◽  
Maryamah Faisol ◽  
Rarasmaya Indraswari ◽  
...  

In general, a hadith consists of an isnad and a matan (content). The matan can be separated into several components, for example a story, the main content, and some additional information. Text other than the main content, such as the isnad and the story, can interfere with the retrieval of relevant documents because most users type simple queries. In this paper, we therefore propose a Named Entity Recognition (NER) component weighting model to improve an Indonesian hadith retrieval system. We ran three test scenarios: the first scenario (S1) did not separate the hadith into components; the second scenario (S2) separated the hadith into two components, isnad and matan; and the third scenario (S3) separated the hadith into four components, isnad, background story, content, and additional information. The experimental results show that TF-IDF with the Rocchio algorithm for query expansion outperforms DocVec. Separating and weighting the hadith components also affects retrieval performance, because the isnad can be considered noise with respect to a query. Separation into two components gave the best overall results, although separation into four components gave better results in some cases, with precision up to 100% and recall up to 70%.
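A minimal sketch of Rocchio query expansion over TF-IDF vectors is given below. It is not the authors' implementation: the toy documents, the smoothed IDF formula, and the `alpha`, `beta`, `gamma` values are illustrative assumptions, and the component weighting described in the abstract is not modeled.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: +1 keeps every IDF value positive.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def centroid(vectors):
    """Average a list of sparse vectors; an empty list gives an empty vector."""
    total = Counter()
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}

def rocchio(query_vec, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(non_relevant)."""
    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    terms = set(query_vec) | set(rel_c) | set(non_c)
    new_q = {
        t: alpha * query_vec.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        for t in terms
    }
    # Terms pushed to zero or below are dropped, a common convention.
    return {t: w for t, w in new_q.items() if w > 0}

# Toy corpus: the first two documents are treated as relevant feedback.
docs = [
    ["prayer", "night", "reward"],
    ["prayer", "fasting", "reward"],
    ["trade", "market", "price"],
]
vectors, idf = tfidf_vectors(docs)
query = {"prayer": idf["prayer"]}
expanded = rocchio(query, relevant=vectors[:2], non_relevant=vectors[2:])
print(sorted(expanded, key=expanded.get, reverse=True))
```

The expanded query gains terms that co-occur with the query term in the relevant documents, while terms that appear only in the non-relevant document are dropped.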


Author(s):  
Harikrishna G. N. Rai ◽  
K Sai Deepak ◽  
P. Radha Krishna

The multi-modal and unstructured nature of documents makes their retrieval from healthcare document repositories a challenging task. Text-based retrieval is the conventional approach to this problem. In this paper, the authors explore an alternative avenue: using embedded figures for the retrieval task. The context of a document is usually reflected directly in its associated figures, so text embedded within these figures has been used together with image features for similarity-based retrieval of figures. The present work demonstrates that image features describing the structural properties of figures are sufficient for the figure retrieval task. First, the authors analyze the problem of figure retrieval from biomedical literature and identify significant classes of figures. Second, they use edge information to discriminate between the structural properties of each figure category. Finally, the authors present a methodology that uses a novel feature descriptor, the Fourier Edge Orientation Autocorrelogram (FEOAC), to describe the structural properties of figures and build an effective biomedical document retrieval system. The experimental results demonstrate better retrieval performance and an overall improvement with FEOAC on the figure retrieval task, especially when most of the edge information is retained. Apart from being invariant to scale, rotation, and non-uniform illumination, the proposed feature descriptor is shown to be relatively robust to noisy edges.


2017 ◽  
Vol 13 (3) ◽  
pp. 57-78 ◽  
Author(s):  
Jagendra Singh ◽  
Rakesh Kumar

Query expansion (QE) is an effective method for improving the performance of information retrieval systems. In this work, we examine the limitations of the pseudo-feedback based QE approach and propose a hybrid approach that enhances feedback-based QE by combining corpus-based information, contextual information about query terms, and semantic knowledge of query terms. First, the paper explores different corpus-based lexical co-occurrence approaches for selecting an optimal combination of query terms from a pool of terms obtained using pseudo-feedback based QE. Next, we explore a semantic similarity approach based on word2vec for ranking the QE terms obtained from the top pseudo-feedback documents. We then combine co-occurrence statistics, contextual window statistics, and semantic similarity to select the best expansion terms for query reformulation. The experiments were performed on the FIRE ad-hoc and TREC-3 benchmark datasets. The experimental results show a significant improvement over the baseline method.
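The semantic ranking step can be sketched as follows, assuming candidate terms are scored by cosine similarity to the mean query embedding. The three-dimensional vectors are made-up stand-ins for trained word2vec embeddings, and the function names are hypothetical; the co-occurrence and contextual window statistics that the paper also combines are not modeled here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_expansion_terms(query_terms, candidates, embeddings, top_k=2):
    """Rank candidate terms by cosine similarity to the mean query embedding."""
    q_vecs = [embeddings[t] for t in query_terms if t in embeddings]
    if not q_vecs:
        return []
    dims = len(q_vecs[0])
    q_mean = [sum(vec[i] for vec in q_vecs) / len(q_vecs) for i in range(dims)]
    scored = [(t, cosine(q_mean, embeddings[t])) for t in candidates if t in embeddings]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Made-up embeddings for illustration only.
toy_embeddings = {
    "car": [0.9, 0.1, 0.0],
    "engine": [0.8, 0.2, 0.1],
    "vehicle": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
    "repair": [0.6, 0.6, 0.1],
}
ranked = rank_expansion_terms(["car", "engine"], ["vehicle", "banana", "repair"], toy_embeddings)
print([term for term, _ in ranked])  # ['vehicle', 'repair']
```

In a hybrid scheme like the one described, a score of this kind would be one of several signals combined before the final expansion terms are chosen.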


2019 ◽  
Vol 24 (1) ◽  
pp. 38-48
Author(s):  
Esingbemi Princewill Ebietomere ◽  
Godspower Osaretin Ekuobase

Abstract. Legal reasoning, the core of legal practice in many countries, rests on “stare decisis”, and its soundness is usually strengthened by the relevant case law consulted. However, accessing and retrieving relevant case law is tiring for legal practitioners and constitutes a serious drain on their productivity. Existing efforts to address this problem are conceptual, restrictive, or unreliable. Specifically, existing semantic retrieval (SR) systems for case law are in need of exceptional retrieval precision. Ontology promises to meet this need if introduced into the SR system. Consequently, an ontology-based SR system for case law has been built using the systems analysis and design methodology. In particular, component-based software engineering and agile methodologies were employed to implement the system. Finally, the search and retrieval performance of the resulting SR system was evaluated using the heuristic evaluation method. The retrieval system has been shown to achieve about 94% precision, 80% recall, and 84% F-measure. Overall, the paper implements an SR system for case law with excellent precision and affirms the superiority of the ontology approach over other semantic approaches to SR systems for document retrieval in the legal domain.
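For reference, the F-measure is the weighted harmonic mean of precision and recall. The sketch below uses the standard F-beta formula with `beta=1`, which is an assumption since the abstract does not say which variant was used. Applied to the rounded aggregate figures it gives roughly 86%, slightly above the reported 84%; the difference could come from per-query averaging or a different weighting, which the abstract does not specify.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score: weighted harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Rounded aggregate figures taken from the abstract.
f1 = f_measure(0.94, 0.80)
print(f"{f1:.3f}")  # 0.864
```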


2020 ◽  
Vol 16 (3) ◽  
pp. 73-95
Author(s):  
Yogesh Gupta ◽  
Ashish Saini

Automatic query expansion (AQE) is an effective way to improve information retrieval performance by adding terms to a user query. The pseudo relevance feedback (PRF) method used for AQE so far suffers from a major problem: query drift. To address this, the present article proposes a new hybrid document clustering approach for PRF-based AQE, in which fuzzy logic and Particle Swarm Optimization (PSO) are used to construct document clusters. A new hybrid PSO and fuzzy logic based term weighting approach then finds more suitable additional query terms using a weighted score of four IR evidences, which is maximized. In addition, a combined semantic filtering method and query term re-weighting algorithms are used to remove noisy or semantically irrelevant terms. The presented approaches are tested and compared with other approaches on three benchmark data sets. The comparative analysis of all tested approaches shows the superior performance of the proposed approach.
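The baseline that this work improves on can be sketched as plain frequency-based PRF: the top-ranked documents are assumed relevant without any user judgment, and their most frequent new terms are appended to the query. The function name, parameters, and toy ranking below are hypothetical, and the fuzzy logic, PSO clustering, and semantic filtering steps are not modeled.

```python
from collections import Counter

def pseudo_relevance_expand(query, ranked_docs, top_docs=2, top_terms=2):
    """Expand a query with the most frequent new terms from the top-ranked documents."""
    query_terms = list(query)
    counts = Counter()
    for doc in ranked_docs[:top_docs]:  # assumed relevant, which is the source of drift
        counts.update(term for term in doc if term not in query_terms)
    additions = [term for term, _ in counts.most_common(top_terms)]
    return query_terms + additions

# Toy ranking for the ambiguous query "jaguar" (animal vs. car brand).
ranked_docs = [
    ["jaguar", "cat", "habitat", "cat"],
    ["jaguar", "cat", "prey", "habitat"],
    ["jaguar", "car", "dealer"],
]
expanded = pseudo_relevance_expand(["jaguar"], ranked_docs)
print(expanded)  # ['jaguar', 'cat', 'habitat']
```

If an off-topic document such as the third one ranked among the top documents, its terms would be added as well. That is the query drift problem that clustering and filtering of feedback documents aim to reduce.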

