inverted file
Recently Published Documents


TOTAL DOCUMENTS: 79 (five years: 4)
H-INDEX: 9 (five years: 0)

2022 ◽  
Vol 14 (1) ◽  
pp. 0-0

In the current data age, a huge amount of geo-tagged data is available, which has highlighted the importance of spatial skyline queries. Spatial (geographic) location in conjunction with textual relevance plays a key role in searching for a user's Point of Interest (POI). Efficient indexing techniques such as the R-tree, Quad-tree, Z-order curve, and their variants are widely available for the spatial context, while the inverted file is the most popular indexing technique for textual data. Since a spatial skyline query must analyze both spatial proximity and skyline dominance, a hybrid indexing technique is required. This article reviews the evaluation of spatial skyline queries across a range of indexing techniques, focusing on disk accesses, I/O time, and CPU time. The review also investigates and analyzes studies of skyline queries with respect to the indexing model used and identifies the remaining research gaps.
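The skyline dominance this review builds on can be made concrete with a short sketch (illustrative only; the POI tuples and the smaller-is-better convention are assumptions, not from the article):

```python
# Minimal sketch of skyline computation via pairwise dominance checks.
# A point p dominates q if p is no worse in every dimension and strictly
# better in at least one (here: smaller is better in both dimensions).

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    # A point belongs to the skyline iff no other point dominates it.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical POIs as (distance_km, price) pairs.
pois = [(1.0, 50), (2.0, 30), (0.5, 80), (3.0, 40), (1.5, 25)]
print(skyline(pois))  # [(1.0, 50), (0.5, 80), (1.5, 25)]
```

The quadratic check above is only for illustration; the indexing techniques surveyed in the review exist precisely to avoid comparing every pair of points.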


2021 ◽  
Vol 15 (1) ◽  
pp. 1-10
Author(s):  
Kang Zhao ◽  
Liuyihan Song ◽  
Yingya Zhang ◽  
Pan Pan ◽  
Yinghui Xu ◽  
...  

Thanks to the popularity of GPUs and the growth of their computational power, more and more deep learning tasks, such as face recognition, image retrieval, and word embedding, can take advantage of extreme classification to improve accuracy. However, efficiently training a deep model with millions of classes remains a big challenge due to the huge memory and computation consumed by the last layer. By sampling a small set of classes to avoid computing over all classes, sampling-based approaches have proven to be an effective solution. But most of them suffer from two issues: i) important classes are ignored or only partly sampled, as in methods using a random sampling scheme or retrieval techniques with low recall (e.g., locality-sensitive hashing), resulting in degraded accuracy; ii) inefficient implementation owing to incompatibility with the GPU, as in selective softmax, which uses a hashing forest to help select classes but implements the search process on the CPU. To address these issues, we propose a new sampling-based softmax called ANN Softmax. Specifically, we employ binary quantization with an inverted file system to improve the recall of important classes. With the help of a dedicated kernel design, it can be fully parallelized in mainstream training frameworks. We further find that the number of important classes recalled for each training sample has a great impact on final accuracy, so we introduce a sample grouping optimization to closely approximate full-class training. Experimental evaluations on two tasks (embedding learning and classification) and ten datasets (e.g., MegaFace, ImageNet, and SKU datasets) demonstrate that our proposed method maintains the same precision as full softmax for different loss objectives, including cross-entropy, ArcFace, CosFace, and D-Softmax losses, with only 1/10 of the classes sampled, outperforming the state-of-the-art techniques.
Moreover, we implement ANN Softmax in a complete GPU pipeline that accelerates training by more than 4.3x. Equipped with a cluster of 256 GPUs, our method reduces the time to train a classifier with 300 million classes on our SKU-300M dataset to ten days.
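A minimal sketch of the sampled-softmax idea the paper builds on, with plain random negative sampling standing in for the ANN recall step (the function name and sizes here are illustrative; ANN Softmax itself recalls important classes via binary quantization over an inverted file, which is omitted in this sketch):

```python
import math
import random

# Sampled softmax: compute cross-entropy over the true class plus a
# small sampled subset of negatives, instead of over all classes.
def sampled_softmax_loss(logits, target, num_sampled, rng):
    negatives = rng.sample(
        [c for c in range(len(logits)) if c != target], num_sampled)
    idx = [target] + negatives
    m = max(logits[i] for i in idx)           # subtract max for stability
    denom = sum(math.exp(logits[i] - m) for i in idx)
    return -(logits[target] - m - math.log(denom))

rng = random.Random(0)
logits = [rng.gauss(0, 1) for _ in range(1000)]  # one sample, 1000 classes
loss = sampled_softmax_loss(logits, target=42, num_sampled=100, rng=rng)
print(loss)  # non-negative cross-entropy over the sampled subset
```

The quality of the approximation hinges on which negatives get sampled, which is exactly why the paper replaces random sampling with high-recall ANN search over the important classes.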


2019 ◽  
Author(s):  
Tiago Silveira ◽  
Felipe Soares ◽  
Wladmir Brandão ◽  
Henrique Cota Freitas

The amount of data generated on the Web has increased dramatically, as has the computational power needed to process this information. In particular, indexers process these data to extract terms and their occurrences, storing them in an inverted file, a compact data structure that provides fast search. However, this task involves processing a large amount of data and thus requires high computational power. In this article, we present a heterogeneous parallel architecture that uses CPUs and GPUs in a cluster to accelerate inverted index generation. Experimental results show that the proposed architecture provides faster execution times: up to 60 times faster in classification and 23 times faster in the compression of 1 million elements.
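A plain sequential sketch of the inverted index generation the article parallelizes (illustrative only; the whitespace tokenization and postings layout are assumptions, not the authors' implementation):

```python
from collections import defaultdict

# Build an inverted file: each term maps to a postings list of
# (doc_id, term_frequency) pairs, sorted by document id.
def build_inverted_index(docs):
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(p.items()) for term, p in index.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox and the dog"]
index = build_inverted_index(docs)
print(index["fox"])  # [(0, 1), (2, 1)]
print(index["the"])  # [(0, 1), (1, 1), (2, 2)]
```

Every step here (tokenize, accumulate, sort) is independent per term or per document, which is what makes the task amenable to the CPU+GPU cluster parallelization the article describes.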


2018 ◽  
Vol 7 (2.19) ◽  
pp. 17
Author(s):  
B.A. Vishnupriya ◽  
N. Senthamarai ◽  
S. Bharathi

"Spatial information mining", or learning revelation in spatial database, alludes to the illustration out of concealed information, spatial relations, or different examples that are not unequivocally put away in spatial databases. To get to the spatial database alongside the catchphrase another kind of inquiry called spatial watchword question is utilized. A spatial watchword inquiry get client area and client given catchphrases as contentions and gives web protests that are spatially and literarily material to these information. The current answers for such inquiries depend on IR2-tree that has a couple of inadequacies as space utilization and event of false hit is extremely huge when the question of the last outcome is far from the inquiry point .To beat this issue a novel file structure called Spatial Inverted file is proposed. Presently a-days use of portable is expanding enormously .In the versatile system an intermediary is set between base station and Location Based Server (LBS).This intermediary utilizes the Spatial modified file procedure to answer the SK inquiry by utilizing spatial data from the base station and printed data from the client question. The outcome from the SI record is given to two file structure in the intermediary called EVR Tree and Grid list. The Estimated Valid Region (EVR) for the present area of the client and required spatial articles are produced and come back to the client. On the off chance that the EVR is absent in the two file structure of intermediary it offer question to LBS. In the event that the client given inquiry is miss written or miss spelled it can be oversee by SI record utilizing n gram/2L Approximation file.


Author(s):  
V. Glory ◽  
S. Domnic

The inverted index is used in most Information Retrieval Systems (IRS) to achieve fast query response times. Within the inverted index, compression schemes are used to improve the efficiency of the IRS. In this chapter, the authors study and analyze various compression techniques used for indexing. They also present a new compression technique, based on FastPFOR, called New FastPFOR. The storage structure and integer representation of the proposed method improve its performance in both compression and decompression. A study of existing work shows that recent research achieves good results either in compression or in decoding, but not in both; hence, decompression performance remains unsatisfactory. To achieve better decompression performance, the authors propose New FastPFOR in this chapter. To evaluate the proposed method, they experiment with TREC collections. The results show that the proposed method achieves better decompression performance than the existing techniques.


Author(s):  
Mohammed Erritali

The centuries-long growth in the volume of text data, such as books and articles in libraries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstracting, indexing, and the use of classification categories marked the birth of a new field of research called Information Retrieval (IR). IR can be defined as the task of designing models and systems whose purpose is to facilitate access to a set of documents in electronic form (a corpus), allowing a user to find the documents relevant to him, that is to say, the content that matches the user's information needs. Most information retrieval models use a specific data structure to index a corpus, called the "inverted file" or "inverted index". This inverted file collects information on all terms across the corpus documents, specifying the identifiers of the documents that contain each term, the frequency of the term in each document, and the positions of its occurrences. In this paper we use an object-oriented database (db4o) instead of the inverted file; that is to say, instead of searching for a term in the inverted file, we search for it in the db4o database. The purpose of this work is a comparative study to see whether object-oriented databases can compete with the inverted index in terms of access speed and resource consumption on a large volume of data.
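The posting structure described above (document identifiers, per-document frequency, occurrence positions) can be sketched as follows (illustrative only; db4o itself is not modeled here):

```python
from collections import defaultdict

# Positional inverted file sketch: for each term, record which documents
# contain it, the term frequency per document, and the word positions of
# every occurrence.
def build_positional_index(docs):
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            entry = index[term].setdefault(doc_id, {"freq": 0, "positions": []})
            entry["freq"] += 1
            entry["positions"].append(pos)
    return dict(index)

docs = ["to be or not to be", "to do is to be"]
index = build_positional_index(docs)
print(index["to"][0])  # {'freq': 2, 'positions': [0, 4]}
print(index["be"][1])  # {'freq': 1, 'positions': [4]}
```

In the paper's setup, records like these would be stored as objects in db4o and retrieved by term query, rather than looked up in an on-disk inverted file.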


Author(s):  
Kanji Tanaka ◽  
Masatoshi Ando ◽  
Yousuke Inagaki

We propose a novel bag-of-words (BoW) framework for building and retrieving a compact database of view images for use in robotic localization, mapping, and SLAM applications. Unlike most previous methods, our method does not describe an image by its many small local features (e.g., bag-of-SIFT-features). Instead, the proposed bag-of-bounding-boxes (BoBB) approach describes an image by fewer, larger object patterns, which leads to a semantic and compact image descriptor. To make the view retrieval system more practical and autonomous, the object pattern discovery process is unsupervised, using common pattern discovery (CPD) between the input and known reference images without requiring a pre-trained object detector. Moreover, our CPD subtask does not rely on good image segmentation techniques and is able to handle scale variations by exploiting a recently developed CPD technique, i.e., spatial random partition. Following traditional bounding-box-based object annotation and knowledge transfer, we compactly describe an image in BoBB form. Using a slightly modified inverted file system, we efficiently index and search the BoBB descriptors. Experiments on the publicly available "Robot-Car" dataset show that the proposed method achieves accurate object-level view image retrieval using significantly more compact image descriptors, e.g., 20 words per image.
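The inverted-file lookup over BoW descriptors can be sketched as follows (a toy illustration; the visual-word ids and shared-word voting score are assumptions, and the paper's BoBB descriptors come from discovered object patterns rather than local features):

```python
from collections import Counter, defaultdict

# Inverted file over bag-of-words image descriptors: each visual word
# maps to the images containing it, and a query image is scored by how
# many words it shares with each database image.
def build_index(db_images):
    index = defaultdict(set)
    for image_id, words in db_images.items():
        for w in words:
            index[w].add(image_id)
    return index

def retrieve(index, query_words):
    votes = Counter()
    for w in query_words:
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return [img for img, _ in votes.most_common()]

db = {"viewA": {3, 17, 42}, "viewB": {3, 99}, "viewC": {42, 17, 7}}
index = build_index(db)
print(retrieve(index, {3, 42, 17}))  # ['viewA', 'viewC', 'viewB']
```

With only ~20 words per image, as in the BoBB descriptors, both the postings lists and the vote accumulation stay very small, which is where the claimed compactness pays off.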


2014 ◽  
Vol 631-632 ◽  
pp. 171-174
Author(s):  
Zhen Quan Wu ◽  
Bing Pan

Combining the Map/Reduce programming model, the Hadoop distributed file system, Lucene inverted file indexing technology, and ICTCLAS Chinese word segmentation technology, we designed and implemented a distributed search engine system based on Hadoop. Testing the system in a four-node Hadoop cluster environment, experimental results show that the Hadoop platform can be used in search engines to improve system performance, reliability, and scalability.
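The Map/Reduce-style inverted indexing at the core of such a system can be sketched in miniature (illustrative only; Hadoop, Lucene, and ICTCLAS are not modeled, and the map/shuffle/reduce phases are simulated sequentially):

```python
from itertools import groupby

# Map: emit (term, doc_id) pairs from each document.
def map_phase(doc_id, text):
    for term in text.lower().split():
        yield term, doc_id

# Reduce: collapse one term's pairs into a sorted postings list.
def reduce_phase(term, doc_ids):
    return term, sorted(set(doc_ids))

def mapreduce_index(docs):
    # Shuffle/sort: sorting the pairs groups equal terms together.
    pairs = sorted(kv for doc_id, text in enumerate(docs)
                   for kv in map_phase(doc_id, text))
    return dict(reduce_phase(term, [d for _, d in group])
                for term, group in groupby(pairs, key=lambda p: p[0]))

docs = ["hadoop distributed search", "distributed index", "search index"]
index = mapreduce_index(docs)
print(index["distributed"])  # [0, 1]
print(index["index"])        # [1, 2]
```

In the real system, the map tasks run on the cluster nodes against document splits stored in HDFS, and the framework performs the shuffle that this sketch imitates with a single sort.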

