inverted index
Recently Published Documents


TOTAL DOCUMENTS

198
(FIVE YEARS 35)

H-INDEX

14
(FIVE YEARS 2)

2021 ◽  
Vol 2022 (1) ◽  
pp. 28-48
Author(s):  
Jiafan Wang ◽  
Sherman S. M. Chow

Abstract Dynamic searchable symmetric encryption (DSSE) allows a client to query or update an outsourced encrypted database. Range queries are commonly needed. Previous range-searchable schemes either do not support updates natively (SIGMOD’16) or use file indexes of many long bit-vectors for distinct keywords, which only support toggling updates via homomorphically flipping the presence bit. (ESORICS’18). We propose a generic upgrade of any (inverted-index) DSSE to support range queries (a.k.a. range DSSE), without homomorphic encryption, and a specific instantiation with a new trade-off reducing client-side storage. Our schemes achieve forward security, an important property that mitigates file injection attacks. Moreover, we identify a variant of injection attacks against the first somewhat dynamic scheme (ESORICS’18). We also extend the definition of backward security to range DSSE and show that our schemes are compatible with a generic upgrade of backward security (CCS’17). We comprehensively analyze the computation and communication overheads, including implementation details of client-side index-related operations omitted by prior schemes. We show high empirical efficiency for million-scale databases over a million-scale keyword space.


2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Johnny Marcos S. Soares ◽  
Luciano Barbosa ◽  
Paulo Antonio Leal Rego ◽  
Regis Pires Magalhães ◽  
Jose Antônio F. de Macêdo

Fingerprints are the most used biometric information for identifying people. With the increase in fingerprint data, indexing techniques are essential to perform an efficient search. In this work, we devise a solution that applies traditional inverted index, widely used in textual information retrieval, for fingerprint search. For that, it first converts fingerprints to text documents using techniques, such as Minutia Cylinder-Code and Locality-Sensitive Hashing, and then indexes them in inverted files. In the experimental evaluation, our approach obtained 0.42% of error rate with 10% of penetration rate in the FVC2002 DB1a data set, surpassing some established methods.


2021 ◽  
pp. 102255
Author(s):  
Yanrong Liang ◽  
Yanping Li ◽  
Kai Zhang ◽  
Lina Ma

2021 ◽  
Author(s):  
kanji tanaka

Landmark-based robot self-localization has attracted recent research interest as an efficient maintenance-free approach to visual place recognition (VPR) across domains (e.g., times of the day, weathers, seasons). However, landmark-based self-localization can be an ill-posed problem for a passive observer (e.g., manual robot control), as many viewpoints may not provide effective landmark view. Here, we consider active self-localization task by an active observer, and present a novel reinforcement-learning (RL) -based next-best-view (NBV) planner. Our contributions are summarized as follows. (1) SIMBAD-based VPR: We present a landmark ranking -based compact scene descriptor by introducing a deep-learning extension of similarity-based pattern recognition (SIMBAD). (2) VPR-to-NBV knowledge transfer: We tackle the challenge of RL under uncertainty (i.e., active self-localization) by transferring the VPR's state recognition ability to NBV. (3) NNQL-based NBV: We view the available VPR as the experience database by adapting a nearest-neighbor -based approximation of Q-learning (NNQL). The result is an extremely compact data structure that compresses both the VPR and NBV modules into a single incremental inverted index. Experiments using public NCLT dataset validate the effectiveness of the proposed approach.


2021 ◽  
Vol 297 (3) ◽  
pp. 39-45
Author(s):  
N. PRAVORSKA ◽  
О. BARMAC ◽  
D. MEDZATIY ◽  
T. SHESTAKEVYCH ◽  
◽  
...  

To avoid malfunctions of the developed software caused by errors, even when developed by professionals, a number of automated tools are used, which allow to evaluate the software code. A variety of detectors are commonly used to detect errors that occur due to duplicate blocks of executable code. The importance of developing such detectors is that the product is not dependent on the programming language and has a simple algorithm for finding cloned blocks of code. The approach of the language-independent repetition detector is based on a method based on the use of the clone index. It is a global data structure that resembles a typical inverted index. This approach is based on the text, ie the method becomes the basis for research independent of language. In recent years, additional methods have become increasingly popular, which analyze the source and executable code at a smaller level, and there are attempts to avoid unnecessary recalculations, by transferring information between versions. Reviewing the research presented in the works of scientists dealing with this problem, it was decided to propose an approach to improve methods for detecting repetitions and redundancy of program code based on language-independent incremental repetition detector (MNIDP). Most additional research is based on tree-like and graphical methods, ie they are strictly dependent on the programming language. The solution in the MNIDP campaign is to take the text as a basis, ie the method becomes the basis for research independent of language. This technique is not strictly language-independent, but due to the fact that the tokenization stage will be included, with the help of minor adjustments the desired result has been achieved. This provides a detailed analysis of the internal composition (namely, elements) of the detector and explanations of the work at different stages of the detection process.


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017] which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.


Entropy ◽  
2021 ◽  
Vol 23 (3) ◽  
pp. 296
Author(s):  
Andrzej Chmielowiec ◽  
Paweł Litwin

This article deals with compression of binary sequences with a given number of ones, which can also be considered as a list of indexes of a given length. The first part of the article shows that the entropy H of random n-element binary sequences with exactly k elements equal one satisfies the inequalities klog2(0.48·n/k)<H<klog2(2.72·n/k). Based on this result, we propose a simple coding using fixed length words. Its main application is the compression of random binary sequences with a large disproportion between the number of zeros and the number of ones. Importantly, the proposed solution allows for a much faster decompression compared with the Golomb-Rice coding with a relatively small decrease in the efficiency of compression. The proposed algorithm can be particularly useful for database applications for which the speed of decompression is much more important than the degree of index list compression.


Sign in / Sign up

Export Citation Format

Share Document