indexing technique
Recently Published Documents

TOTAL DOCUMENTS: 157 (FIVE YEARS: 34)
H-INDEX: 10 (FIVE YEARS: 1)

2022 ◽  
Vol 31 (1) ◽  
pp. 1-37
Author(s):  
Chao Liu ◽  
Xin Xia ◽  
David Lo ◽  
Zhiwei Liu ◽  
Ahmed E. Hassan ◽  
...  

To accelerate software development, developers frequently search and reuse existing code snippets from a large-scale codebase, e.g., GitHub. Over the years, researchers have proposed many information retrieval (IR)-based models for code search, but they fail to bridge the semantic gap between query and code. An early successful deep learning (DL)-based model, DeepCS, addressed this issue by learning the relationship between pairs of code methods and their corresponding natural language descriptions. Two major advantages of DeepCS are its ability to understand irrelevant/noisy keywords and to capture sequential relationships between words in query and code. In this article, we propose an IR-based model, CodeMatcher, that inherits the advantages of DeepCS (i.e., the capability of understanding the sequential semantics in important query words), while leveraging the indexing technique of IR-based models to substantially accelerate search response time. CodeMatcher first collects metadata for query words to identify irrelevant/noisy ones, then iteratively performs fuzzy searches with the important query words on a codebase indexed by the Elasticsearch tool, and finally reranks the returned candidate code snippets according to how their tokens sequentially match the important words in the query. We verified its effectiveness on a large-scale codebase with ~41K repositories. Experimental results showed that CodeMatcher achieves an MRR (a widely used accuracy measure for code search) of 0.60, outperforming DeepCS, CodeHow, and UNIF by 82%, 62%, and 46%, respectively. Our proposed model is over 1.2K times faster than DeepCS. Moreover, CodeMatcher outperforms two existing online search engines (GitHub and Google search) by 46% and 33%, respectively, in terms of MRR. We also observed that fusing the advantages of IR-based and DL-based models is promising, and that improving the quality of method naming helps code search, since method names play an important role in connecting queries and code.
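
A minimal sketch of the reranking idea described above, assuming a simplified tokenizer and scoring rule: candidates returned by the index are reordered by how well their tokens sequentially match the important query words. This is an illustration in Python, not the authors' implementation; the function names and candidate snippets are hypothetical, and the fuzzy Elasticsearch retrieval step is not reproduced.

    import re

    def tokenize(code):
        # Split identifiers on non-alphanumerics, then on camelCase boundaries.
        parts = re.split(r"[^A-Za-z0-9]+", code)
        tokens = []
        for part in parts:
            tokens.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
        return [t.lower() for t in tokens if t]

    def sequential_match_score(query_words, code_tokens):
        # Fraction of important query words found in order within the code tokens.
        matched, pos = 0, 0
        for word in query_words:
            try:
                pos = code_tokens.index(word, pos) + 1
                matched += 1
            except ValueError:
                continue
        return matched / max(len(query_words), 1)

    def rerank(query_words, candidates):
        # Reorder index-returned candidates by descending sequential match score.
        return sorted(candidates,
                      key=lambda c: sequential_match_score(query_words, tokenize(c)),
                      reverse=True)

    # Hypothetical candidates for the important query words ["read", "file"].
    snippets = ["void closeStream(InputStream in)", "String readFileToString(File f)"]
    print(rerank(["read", "file"], snippets))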


2022 ◽  
Vol 14 (1) ◽  
pp. 0-0

In the data age, a huge amount of geo-tagged data is available, and researchers have recognized the importance of spatial skyline queries. Spatial or geographic location, in conjunction with textual relevance, plays a key role in searching for a user's points of interest (POIs). Efficient indexing techniques such as the R-tree, quadtree, Z-order curve, and their variants are widely available in the spatial context, while the inverted file is the popular indexing technique for textual data. As spatial skyline queries must analyze both spatial proximity and skyline dominance, a hybrid indexing technique is needed. This article reviews the evaluation of spatial skyline queries across a range of indexing techniques, concentrating on disk accesses, I/O time, and CPU time. Studies related to skyline queries are investigated and analyzed with respect to the indexing model used, and research gaps are identified.
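
A minimal sketch of one spatial indexing technique named above, the Z-order (Morton) curve: interleaving the bits of the grid coordinates yields a single sortable key, so nearby points tend to receive nearby keys and can be handled by a one-dimensional index. The grid resolution and point data below are illustrative assumptions.

    def interleave_bits(x: int, y: int, bits: int = 16) -> int:
        # Compute the Morton code of an (x, y) grid cell with `bits` bits per axis.
        code = 0
        for i in range(bits):
            code |= ((x >> i) & 1) << (2 * i)       # even bit positions take x
            code |= ((y >> i) & 1) << (2 * i + 1)   # odd bit positions take y
        return code

    # Sort points of interest by Morton code to obtain a simple 1-D spatial index.
    pois = [(3, 5, "cafe"), (3, 6, "museum"), (12, 1, "fuel station")]
    pois.sort(key=lambda p: interleave_bits(p[0], p[1]))
    print(pois)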


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8013
Author(s):  
Muhammad Habibur Rahman ◽  
Bonghee Hong ◽  
Hari Setiawan ◽  
Sanghyun Lee ◽  
Dongjun Lim ◽  
...  

Real-time performance is important in rule-based continuous spatiotemporal query processing for risk analysis and decision making on target objects collected by the sensors of combat vessels. The existing Rete algorithm, which compiles rules into a node-link structure for execution, is known to be the best approach. However, when a large number of rules must be processed over a large volume of stream data, the Rete technique incurs an overhead in searching for the rules to be bound. This paper proposes a hash indexing technique for Rete nodes to reduce the overhead of searching for the spatiotemporal condition rules that must be bound when rules are expressed in a node-link structure. A performance comparison was conducted between Drools, which implements the Rete method, and an implementation of the hash index method presented in this paper. Processing time was measured while varying the number of rules, the number of objects, and the distribution of objects. The hash index method presented in this paper improved performance by at least 18% compared to Drools.
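
A hedged sketch of the general idea behind hash-indexing rule conditions, assuming a hypothetical rule model in which each spatiotemporal condition is discretized to a (grid cell, time slot) key; the paper's actual node-link structure and binding logic are not reproduced here.

    from collections import defaultdict
    from dataclasses import dataclass

    GRID, SLOT = 10.0, 60.0  # assumed grid-cell size and time-slot size

    @dataclass
    class Rule:
        name: str
        cell: tuple      # (grid_x, grid_y) the condition refers to
        time_slot: int   # discretized time window the condition refers to

    def key(x: float, y: float, t: float):
        return (int(x // GRID), int(y // GRID), int(t // SLOT))

    # Build the hash index once, when the rules are compiled.
    rule_index = defaultdict(list)
    rules = [Rule("approach-alert", (3, 4), 12), Rule("speed-check", (0, 0), 12)]
    for r in rules:
        rule_index[(r.cell[0], r.cell[1], r.time_slot)].append(r)

    def candidate_rules(x: float, y: float, t: float):
        # O(1) lookup of the rules whose condition could bind this observation,
        # instead of scanning every condition node.
        return rule_index.get(key(x, y, t), [])

    print([r.name for r in candidate_rules(35.2, 41.8, 730.0)])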


2021 ◽  
Vol 25 (6) ◽  
pp. 1629-1666
Author(s):  
Ali Asghar Safaei ◽  
Saeede Habibi-Asl

Retrieving required medical images from a huge collection is one of the most widely used features of medical information systems, including medical imaging search engines. For example, diagnostic decision making has traditionally been supported by patient data (image or non-image) and previous medical experience from similar cases. Indexing, as part of a search engine (or retrieval system), increases search speed. The goal of this study is to provide an effective and efficient indexing technique for medical image search engines. To achieve this goal, a multidimensional indexing technique for medical images is designed using the normalization technique employed to reduce redundancy in relational database design. The data structure of the proposed multidimensional index and the operations required to create and maintain it are designed. The time complexity of each operation is analyzed, and the average memory space required to store a medical image (along with its related metadata) is calculated as the space complexity analysis of the proposed indexing technique. The results show that the proposed indexing technique performs well in terms of memory usage as well as execution time for the usual operations. Moreover, and perhaps more importantly, the proposed indexing technique improves the precision and recall of the information retrieval system (i.e., search engine) that uses it to index medical images. In addition, a user of such a search engine can retrieve medical images by specifying their attributes along several different aspects (dimensions), e.g., tissue, image modality and format, sickness and trauma, etc. Thus, the proposed multidimensional indexing technique can improve the effectiveness of a medical image information retrieval system (in terms of precision and recall) while maintaining proper efficiency (in terms of execution time and memory usage), and can improve the information retrieval process for healthcare search engines.
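
A hedged sketch of what a normalized, multidimensional metadata schema for medical images might look like, using the attribute examples from the abstract (tissue, modality); the table and column names are assumptions for illustration, not the paper's actual design.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tissue   (tissue_id   INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE modality (modality_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE image (
            image_id    INTEGER PRIMARY KEY,
            file_path   TEXT NOT NULL,
            tissue_id   INTEGER REFERENCES tissue(tissue_id),
            modality_id INTEGER REFERENCES modality(modality_id)
        );
        -- One index per dimension supports queries over different attribute aspects.
        CREATE INDEX idx_image_tissue   ON image(tissue_id);
        CREATE INDEX idx_image_modality ON image(modality_id);
    """)

    conn.execute("INSERT INTO tissue (name) VALUES ('lung')")
    conn.execute("INSERT INTO modality (name) VALUES ('CT')")
    conn.execute("INSERT INTO image (file_path, tissue_id, modality_id) "
                 "VALUES ('scan_001.dcm', 1, 1)")

    # Retrieve images by attributes specified along different dimensions.
    rows = conn.execute("""
        SELECT i.file_path FROM image i
        JOIN tissue t   ON t.tissue_id = i.tissue_id
        JOIN modality m ON m.modality_id = i.modality_id
        WHERE t.name = 'lung' AND m.name = 'CT'
    """).fetchall()
    print(rows)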


2021 ◽  
Vol 15 (1) ◽  
pp. 98-111
Author(s):  
Dong He ◽  
Maureen Daum ◽  
Walter Cai ◽  
Magdalena Balazinska

We design, implement, and evaluate DeepEverest, a system for the efficient execution of interpretation-by-example queries over the activation values of a deep neural network. DeepEverest consists of an efficient indexing technique and a query execution algorithm with various optimizations. We prove that the proposed query execution algorithm is instance optimal. Experiments with our prototype show that DeepEverest, using less than 20% of the storage of full materialization, significantly accelerates individual queries by up to 63× and consistently outperforms other methods on multi-query workloads that simulate DNN interpretation processes.
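
For context, a naive fully materialized baseline for an interpretation-by-example query, i.e., "find the k inputs whose activations for a chosen neuron group are most similar to those of a target input"; DeepEverest's index and instance-optimal algorithm are designed precisely to avoid this full scan over fully materialized activations. The array shapes and the distance measure below are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical, fully materialized layer activations: (num_inputs, num_neurons).
    activations = rng.standard_normal((1000, 256))

    def top_k_similar(target_idx: int, neuron_group, k: int = 5):
        # Rank all inputs by L2 distance to the target over the selected neurons.
        sub = activations[:, neuron_group]
        dists = np.linalg.norm(sub - sub[target_idx], axis=1)
        order = np.argsort(dists)
        return [i for i in order if i != target_idx][:k]

    print(top_k_similar(target_idx=42, neuron_group=[3, 17, 90, 128], k=5))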


Author(s):  
Sugamya Katta

A new approach is proposed to index images in a database using features generated from the HODBTC compressed data stream. This indexing technique can be extended to content-based image retrieval (CBIR). HODBTC compresses an image into a set of color quantizers and a bitmap image. The proposed image retrieval system generates two image features, namely the color co-occurrence feature (CCF) and the bit pattern feature (BPF), from the minimum quantizer, maximum quantizer, and bitmap image, respectively, by involving a visual codebook.
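
A simplified sketch of how a block truncation coder yields the per-block minimum quantizer, maximum quantizer, and bitmap from which the CCF and BPF are derived. HODBTC itself applies ordered dithering to color images, so this grayscale mean-thresholding variant is only an assumed illustration of what the compressed data stream contains.

    import numpy as np

    def btc_block(block: np.ndarray):
        # Compress one grayscale block into (min quantizer, max quantizer, bitmap).
        bitmap = block >= block.mean()  # 1 where the pixel is above the block mean
        q_min = block[~bitmap].mean() if (~bitmap).any() else block.min()
        q_max = block[bitmap].mean() if bitmap.any() else block.max()
        return q_min, q_max, bitmap

    block = np.array([[12, 200], [190, 15]], dtype=float)
    q_min, q_max, bitmap = btc_block(block)
    print(q_min, q_max, bitmap.astype(int))
    # The CCF would be derived from the quantizer pair and the BPF from the bitmap,
    # each matched against a visual codebook (omitted here).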

