Research of Inverted Index Method Based on Block Organizing Technology

2012 ◽  
Vol 468-471 ◽  
pp. 2836-2841
Author(s):  
Xiao Bo Yang

In order to further improve the overall efficiency of retrieval system, it proposes a method of inverted index based on block organizing technology. The specific studying process is as follows. Firstly, retrieval performance model of inverted index is generated based on data statistics, and then analyze the organizational strategy of inverted file block index, finally, retrieval performance model is verified through simulation experiment. The result shows that the method of inverted file block organization can get higher algorithm efficiency under the condition of less cycle numbers in the search algorithm, and also reduce the execution time of search algorithm significantly, which can verify the feasibility of inverted file block index method.

Author(s):  
Alex Kohn ◽  
François Bry ◽  
Alexander Manta

Studies agree that searchers are often not satisfied with the performance of current enterprise search engines. As a consequence, more scientists worldwide are actively investigating new avenues for searching to improve retrieval performance. This paper contributes to YASA (Your Adaptive Search Agent), a fully implemented and thoroughly evaluated ontology-based information retrieval system for the enterprise. A salient particularity of YASA is that large parts of the ontology are automatically filled with facts by recycling and transforming existing data. YASA offers context-based personalization, faceted navigation, as well as semantic search capabilities. YASA has been deployed and evaluated in the pharmaceutical research department of Roche, Penzberg, and results show that already semantically simple ontologies suffice to considerably improve search performance.


Kursor ◽  
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ivanda Zevi Amalia ◽  
Akbar Noto Ponco Bimantoro ◽  
Agus Zainal Arifin ◽  
Maryamah Faisol ◽  
Rarasmaya Indraswari ◽  
...  

In general, hadith consists of isnad and matan (content). Matan can be separated into several components for example a story, main content, and some additional information. Other texts besides main content, such as isnad and story can interfere the retrieval process of relevant documents because most users typically use simple queries. Thus, in this paper, we proposed a Named Entity Recognition (NER) component weighting model in improving the Indonesian hadith retrieval system. We did 3 test scenarios, the first scenario (S1) did not separate the hadith into several components, the second scenario (S2) separated the hadith into 2 components, isnad and matan, and the third scenario separated the hadith into 4 components, isnad, background story, content, and additional information. From the experimental results, it is found that the TF-IDF with rocchio algorithm in query expansion outperforms DocVec. Also, separation and weighting of the hadith components affect the retrieval performance because isnad can be considered as noise in a query. Separation of 2 separate components had the best overall results in general although 4 separate components showed better results in some cases with precision up to 100% and 70% recall.


With an advent of technologya huge collection of digital images is formed as repositories on world wide web (WWW). The task of searching for similar images in the repository is difficult. In this paper, retrieval of similar images from www is demonstrated with the help of combination of image features as color and shape and then using Siamese neural network which is constructed to the requirement as a novel approach. Here, one-shot learning technique is used to test the Siamese Neural Network model for retrieval performance. Various experiments are conducted with both the methods and results obtained are tabulated. The performance of the system is evaluated with precision parameter and which is found to be high.Also, relative study is made with existing works.


Author(s):  
Kanji Tanaka ◽  
◽  
Kensuke Kondo

Retrieving a large collection of environment maps built by mapper robots is a key problem in mobile robot self-localization. The map retrieval problem is studied from the novel perspective of the multi-scale Bag-Of-Features (BOF) approach in this paper. In general, the multi-scale approach is advantageous in capturing both the global structure and the local details of a given map. BOF map retrieval is advantageous in its compact map representation as well as the efficient map retrieval using an inverted file system. The main contribution of this paper is combining the advantages of both approaches. Our approach is based on multi cue BOF as well as packing BOF, and achieves the efficiency and compactness of the map retrieval system. Experiments evaluate the effectiveness of the techniques presented using a large collection of environment maps.


2015 ◽  
Vol 6 (3) ◽  
Author(s):  
Resti Ludviani ◽  
Khadijah F. Hayati ◽  
Agus Zainal Arifin ◽  
Diana Purwitasari

Abstract. An appropriate selection term for expanding a query is very important in query expansion. Therefore, term selection optimization is added to improve query expansion performance on document retrieval system. This study proposes a new approach named Term Relatedness to Query-Entropy based (TRQE) to optimize weight in query expansion by considering semantic and statistic aspects from relevance evaluation of pseudo feedback to improve document retrieval performance. The proposed method has 3 main modules, they are relevace feedback, pseudo feedback, and document retrieval. TRQE is implemented in pseudo feedback module to optimize weighting term in query expansion. The evaluation result shows that TRQE can retrieve document with the highest result at precission of 100% and recall of 22,22%. TRQE for weighting optimization of query expansion is proven to improve retrieval document.     Keywords: TRQE, query expansion, term weighting, term relatedness to query, relevance feedback Abstrak..Pemilihan term yang tepat untuk memperluas queri merupakan hal yang penting pada query expansion. Oleh karena itu, perlu dilakukan optimasi penentuan term yang sesuai sehingga mampu meningkatkan performa query expansion pada system temu kembali dokumen. Penelitian ini mengajukan metode Term Relatedness to Query-Entropy based (TRQE), sebuah metode untuk mengoptimasi pembobotan pada query expansion dengan memperhatikan aspek semantic dan statistic dari penilaian relevansi suatu pseudo feedback sehingga mampu meningkatkan performa temukembali dokumen. Metode yang diusulkan memiliki 3 modul utama yaitu relevan feedback, pseudo feedback, dan document retrieval. TRQE diimplementasikan pada modul pseudo feedback untuk optimasi pembobotan term pada ekspansi query. Evaluasi hasil uji coba menunjukkan bahwa metode TRQE dapat melakukan temukembali dokumen dengan hasil terbaik pada precision  100% dan recall sebesar 22,22%.Metode TRQE untuk optimasi pembobotan pada query expansion terbukti memberikan pengaruh untuk meningkatkan relevansi pencarian dokumen.Kata Kunci: TRQE, ekspansi query, pembobotan term, term relatedness to query, relevance feedback


Author(s):  
Harikrishna G. N. Rai ◽  
K Sai Deepak ◽  
P. Radha Krishna

Multi-modal and Unstructured nature of documents make their retrieval from healthcare document repositories a challenging task. Text based retrieval is the conventional approach used for solving this problem. In this paper, the authors explore an alternate avenue of using embedded figures for the retrieval task. Usually, context of a document is directly reflected in the associated figures, therefore embedded text within these figures along with image features have been used for similarity based retrieval of figures. The present work demonstrates that image features describing the structural properties of figures are sufficient for the figure retrieval task. First, the authors analyze the problem of figure retrieval from biomedical literature and identify significant classes of figures. Second, they use edge information as a means to discriminate between structural properties of each figure category. Finally, the authors present a methodology using a novel feature descriptor namely Fourier Edge Orientation Autocorrelogram (FEOAC) to describe structural properties of figures and build an effective Biomedical document retrieval system. The experimental results demonstrate the better retrieval performance and overall improvement of FEOAC for figure retrieval task, especially when most of the edge information is retained. Apart from invariance to scale, rotation and non-uniform illumination, the proposed feature descriptor is shown to be relatively robust to noisy edges.


Sign in / Sign up

Export Citation Format

Share Document