Distributed similarity search in high dimensions using locality sensitive hashing

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee.
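
The fairness requirement stated in this abstract can be illustrated with a brute-force baseline. The sketch below is not the paper's data structure: it assumes Euclidean distance and a linear scan, and the names r_near_neighbors and fair_r_nn are hypothetical. It only shows what "same probability to be returned" means.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def r_near_neighbors(points: Sequence[Point], q: Point, r: float) -> List[Point]:
    """Return every point within Euclidean distance r of q (linear scan)."""
    return [p for p in points if math.dist(p, q) <= r]


def fair_r_nn(
    points: Sequence[Point],
    q: Point,
    r: float,
    rng: Optional[random.Random] = None,
) -> Optional[Point]:
    """Return one r-near neighbor of q chosen uniformly at random.

    Every point within distance r has the same probability of being
    returned. Returns None when no point lies within distance r.
    """
    if rng is None:
        rng = random.Random()
    candidates = r_near_neighbors(points, q, r)
    return rng.choice(candidates) if candidates else None
```

A linear scan is fair by construction but costs time proportional to the size of S on every query, which is the cost that index structures for the r-NN problem aim to avoid.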

Fast Graph Similarity Search via Locality Sensitive Hashing

Lecture Notes in Computer Science - Advances in Multimedia Information Processing -- PCM 2015 ◽

10.1007/978-3-319-24075-6_60 ◽

2015 ◽

pp. 623-633 ◽

Cited By ~ 6

Author(s):

Boyu Zhang ◽

Xianglong Liu ◽

Bo Lang

Keyword(s):

Similarity Search ◽

Locality Sensitive Hashing ◽

Graph Similarity

GLDH: Toward more efficient global low-density locality-sensitive hashing for high dimensions

Information Sciences ◽

10.1016/j.ins.2020.04.046 ◽

2020 ◽

Vol 533 ◽

pp. 43-59

Author(s):

Yiqi Li ◽

Ruliang Xiao ◽

Xin Wei ◽

Huakun Liu ◽

Shi Zhang ◽

...

Keyword(s):

Locality Sensitive Hashing ◽

Low Density ◽

High Dimensions

Bayesian locality sensitive hashing for fast similarity search

Proceedings of the VLDB Endowment ◽

10.14778/2140436.2140440 ◽

2012 ◽

Vol 5 (5) ◽

pp. 430-441 ◽

Cited By ~ 65

Author(s):

Venu Satuluri ◽

Srinivasan Parthasarathy

Keyword(s):

Similarity Search ◽

Locality Sensitive Hashing

Lower Bounds on Performance of Metric Tree Indexing Schemes for Exact Similarity Search in High Dimensions

Algorithmica ◽

10.1007/s00453-012-9638-2 ◽

2012 ◽

Vol 66 (2) ◽

pp. 310-328 ◽

Cited By ~ 6

Author(s):

Vladimir Pestov

Keyword(s):

Lower Bounds ◽

Similarity Search ◽

High Dimensions ◽

Metric Tree

Secure Multi-keyword Similarity Search Over Encrypted Data With Security Improvement

Iraqi Journal for Electrical And Electronic Engineering ◽

10.37917/ijeee.17.2.1 ◽

2021 ◽

Vol 17 (2) ◽

pp. 1-10

Author(s):

Hussein Mohammed ◽

Ayad Abdulsada

Keyword(s):

Real World ◽

Similarity Search ◽

Keyword Search ◽

Bloom Filter ◽

Locality Sensitive Hashing ◽

Real World Data ◽

Encrypted Data ◽

Search Results ◽

Security Properties ◽

Cloud Servers

Searchable encryption (SE) enables clients to outsource their encrypted data to external cloud servers with unlimited storage and computing power while retaining the ability to search that data without decrypting it. Current SE solutions support only single-keyword search, which makes them impractical in real-world scenarios. In this paper, we design and implement a multi-keyword similarity search scheme over encrypted data using locality-sensitive hashing functions and a Bloom filter. The proposed scheme tolerates common spelling mistakes and offers enhanced security properties, such as hiding the access and search patterns, at the cost of higher latency. To support similarity search, we use an efficient bi-gram-based method for keyword transformation, which improves the accuracy of the search results. Our scheme employs two non-colluding servers to break the correlation between search queries and search results. Experiments using real-world data illustrate that our scheme is practically efficient, secure, and retains high accuracy.
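
The bi-gram transformation mentioned in this abstract can be sketched in a few lines. This is an illustration only, with no encryption and no Bloom filter: it assumes MinHash as the locality-sensitive hash family, which the abstract does not specify, and every function name here is hypothetical.

```python
import hashlib
from typing import Iterable, List, Set


def bigrams(word: str) -> Set[str]:
    """Split a keyword into its set of adjacent character pairs."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed: int, token: str) -> int:
    """Seeded 64-bit hash built from SHA-256 (an illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens: Iterable[str], num_hashes: int = 64) -> List[int]:
    """MinHash signature: the minimum hash value under each seed."""
    token_list = list(tokens)
    if not token_list:
        raise ValueError("minhash needs at least one token")
    return [min(_hash(seed, t) for t in token_list) for seed in range(num_hashes)]


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For example, "search" and the misspelling "serach" share the bi-grams "se" and "ch" out of eight distinct bi-grams, so their Jaccard similarity is 0.25 even though they never match exactly. Two keywords with overlapping bi-gram sets agree on a signature position with probability equal to that similarity, which is what allows hash-based lookup to tolerate small spelling errors.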

A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters

Computational Intelligence and Neuroscience ◽

10.1155/2016/4075257 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12

Author(s):

Chunyan Shuai ◽

Hengcheng Yang ◽

Xin Ouyang ◽

Siqi Li ◽

Zheng Chen

Keyword(s):

Similarity Search ◽

Hamming Distance ◽

Bloom Filters ◽

Hash Tables ◽

High Dimensions ◽

Data Formats ◽

Overall Ratio ◽

Search Structure ◽

And Storage ◽

False Positive Probability

In high-dimensional spaces, accurate search and similarity search at low computing and storage cost remain difficult research topics, and efficiency must be balanced against accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and to retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bitmatrix. Experiments show that the Similar-PBF-PHT is effective for membership queries and K-nearest neighbors (K-NN) search. For accurate querying, the Similar-PBF-PHT has a low hit false positive probability (FPP) and acceptable memory costs. For K-NN querying, the average overall ratio and rank-i ratio under the Hamming distance are accurate, and the ratios under the Euclidean distance are acceptable. Retrieving accurate and similar items costs CPU time rather than I/O time, and the structure can handle different data formats, not only numerical values.
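
For readers unfamiliar with the building block named in this abstract, a single Bloom filter can be sketched as follows. This is not the Similar-PBF-PHT structure: it is a plain Bloom filter with SHA-256-derived hash positions (an illustrative choice), and expected_fpp uses the standard textbook approximation for the false positive probability.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Plain Bloom filter: set membership with possible false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # One bit position per seed, derived from a seeded SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if every position is set; an added item is never reported absent.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def expected_fpp(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Approximate false positive probability: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes
```

The approximation shows the trade-off the abstract alludes to: for a fixed number of stored items, more bits lower the false positive probability at the price of more memory.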

A neural algorithm for a fundamental computing problem

10.1101/180471 ◽

2017 ◽

Author(s):

Sanjoy Dasgupta ◽

Charles F. Stevens ◽

Saket Navlakha

Keyword(s):

Similarity Search ◽

Large Scale ◽

Activity Patterns ◽

Locality Sensitive Hashing ◽

Sensory Function ◽

Information Retrieval Systems ◽

Novel Variant ◽

Benchmark Datasets ◽

Similar Images ◽

Traditional Approaches

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly’s olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly’s circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly’s algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.
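
The abstract does not name the three ingredients. The sketch below assumes they are sparse binary random projections, expansion to a higher dimension, and winner-take-all sparsification, with mean-centering of the input as a preprocessing step. The function names and parameters are hypothetical, and this is an illustration rather than the authors' implementation.

```python
import random
from typing import FrozenSet, List, Sequence


def make_projection(
    input_dim: int, output_dim: int, samples_per_unit: int, seed: int = 0
) -> List[List[int]]:
    """Sparse binary projection: each output unit sums a few random inputs.

    Choosing output_dim larger than input_dim expands the dimensionality.
    """
    rng = random.Random(seed)
    return [rng.sample(range(input_dim), samples_per_unit) for _ in range(output_dim)]


def fly_hash(
    x: Sequence[float], projection: List[List[int]], top_k: int
) -> FrozenSet[int]:
    """Return the indices of the top_k most active output units."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]  # assumed preprocessing step
    activity = [sum(centered[i] for i in idxs) for idxs in projection]
    # Winner-take-all: keep only the strongest units as the hash tag.
    winners = sorted(range(len(activity)), key=lambda j: activity[j], reverse=True)
    return frozenset(winners[:top_k])


def tag_overlap(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Fraction of shared winners between two equally sized tags."""
    return len(a & b) / max(len(a), 1)
```

Under these assumptions, similar inputs tend to activate overlapping sets of winning units, so tag_overlap can serve as a rough similarity score between two hashed inputs.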
