Distributed similarity search in high dimensions using locality sensitive hashing

Author(s):  
Parisa Haghani ◽  
Sebastian Michel ◽  
Karl Aberer
2021 ◽  
Vol 50 (1) ◽  
pp. 42-49
Author(s):  
Martin Aumuller ◽  
Sariel Har-Peled ◽  
Sepideh Mahabadi ◽  
Rasmus Pagh ◽  
Francesco Silvestri

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points S and a radius parameter r > 0, the r-near neighbor (r-NN) problem asks for a data structure that, given any query point q, returns a point p within distance at most r from q. In this paper, we study the r-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance r from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee.
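The fairness requirement stated above can be illustrated with a brute-force baseline: collect every point within distance r of the query and return one uniformly at random. This is only a minimal sketch of the target behavior, assuming Euclidean distance and a linear scan; it is not the data structure studied in the paper, and the name `fair_near_neighbor` is hypothetical.

```python
import math
import random


def fair_near_neighbor(points, q, r, rng=random):
    """Return a uniformly random point within distance r of q, or None.

    Linear scan (assumption: Euclidean distance). Every r-near neighbor
    of q is returned with the same probability, which is the fairness
    property described in the abstract.
    """
    near = [p for p in points if math.dist(p, q) <= r]
    return rng.choice(near) if near else None


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    print(fair_near_neighbor(S, (0.0, 0.0), 1.5))
```

The scan costs time linear in the size of S per query, which is what an index-based approach would aim to avoid while keeping the output distribution uniform.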


2020 ◽  
Vol 533 ◽  
pp. 43-59
Author(s):  
Yiqi Li ◽  
Ruliang Xiao ◽  
Xin Wei ◽  
Huakun Liu ◽  
Shi Zhang ◽  
...  

2012 ◽  
Vol 5 (5) ◽  
pp. 430-441
Author(s):  
Venu Satuluri ◽  
Srinivasan Parthasarathy

2021 ◽  
Vol 17 (2) ◽  
pp. 1-10
Author(s):  
Hussein Mohammed ◽  
Ayad Abdulsada

Searchable encryption (SE) is a tool that enables clients to outsource their encrypted data to external cloud servers with unlimited storage and computing power, and to search that data without decrypting it. Current SE solutions support only single-keyword search, which makes them impractical in real-world scenarios. In this paper, we design and implement a multi-keyword similarity search scheme over encrypted data using locality-sensitive hashing functions and Bloom filters. The proposed scheme can recover from common spelling mistakes and offers enhanced security properties, such as hiding the access and search patterns, at the cost of higher latency. To support similarity search, we use an efficient bi-gram-based method for keyword transformation, which improves the accuracy of the search results. Our scheme employs two non-colluding servers to break the correlation between search queries and search results. Experiments using real-world data illustrate that our scheme is practically efficient, secure, and retains high accuracy.
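To see why a bi-gram transformation tolerates spelling mistakes, consider the following minimal sketch. It splits a keyword into its set of adjacent character pairs and compares two keywords by Jaccard similarity. The function names are hypothetical, and the sketch deliberately leaves out the parts of the scheme the abstract does not detail: the LSH family, the Bloom filter encoding, and the encryption.

```python
def bigrams(word):
    """Return the set of adjacent character pairs of a lowercased keyword."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    target = bigrams("network")
    # A transposition typo still shares several bi-grams with the target.
    print(jaccard(target, bigrams("netwrok")))
    # An unrelated keyword shares far fewer.
    print(jaccard(target, bigrams("storage")))
```

A misspelled keyword keeps most of its bi-grams, so a hash family that is sensitive to set similarity would tend to map it to the same bucket as the correct spelling.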


2016 ◽  
Vol 2016 ◽  
pp. 1-12
Author(s):  
Chunyan Shuai ◽  
Hengcheng Yang ◽  
Xin Ouyang ◽  
Siqi Li ◽  
Zheng Chen

In high-dimensional spaces, supporting both exact and similarity search at low computing and storage cost remains a difficult research problem, and efficiency must be traded off against accuracy. In this paper, we propose a new structure, Similar-PBF-PHT, to represent the items of a high-dimensional set and to retrieve exact and similar items. The Similar-PBF-PHT contains three parts: parallel Bloom filters (PBFs), parallel hash tables (PHTs), and a bit matrix. Experiments show that the Similar-PBF-PHT is effective for membership queries and K-nearest neighbors (K-NN) search. For exact queries, the Similar-PBF-PHT has a low false positive probability (FPP) and acceptable memory costs. For K-NN queries, the average overall ratio and rank-i ratio under the Hamming distance are accurate, and the ratios under the Euclidean distance are acceptable. Retrieving exact and similar items costs CPU time rather than I/O, and the structure can handle data formats other than numerical values.


2017 ◽  
Author(s):  
Sanjoy Dasgupta ◽  
Charles F. Stevens ◽  
Saket Navlakha

Similarity search, such as identifying similar images in a database or similar documents on the Web, is a fundamental computing problem faced by many large-scale information retrieval systems. We discovered that the fly’s olfactory circuit solves this problem using a novel variant of a traditional computer science algorithm (called locality-sensitive hashing). The fly’s circuit assigns similar neural activity patterns to similar input stimuli (odors), so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly’s algorithm, however, uses three new computational ingredients that depart from traditional approaches. We show that these ingredients can be translated to improve the performance of similarity search compared to traditional algorithms when evaluated on several benchmark datasets. Overall, this perspective helps illuminate the logic supporting an important sensory function (olfaction), and it provides a conceptually new algorithm for solving a fundamental computational problem.
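For readers unfamiliar with the traditional algorithm the abstract refers to, here is a minimal sketch of one classical locality-sensitive hashing scheme, random-hyperplane hashing. It is the conventional baseline, not the fly-inspired variant, whose three ingredients the abstract does not name. The function names and the seeded Gaussian hyperplanes are assumptions made for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec, planes):
    """Hash a vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16)
    v = [1.0, 2.0, 0.5, -1.0]
    near = [1.1, 1.9, 0.6, -0.9]    # small perturbation of v
    far = [-x for x in v]           # opposite direction
    print(hamming(simhash(v, planes), simhash(near, planes)))
    print(hamming(simhash(v, planes), simhash(far, planes)))
```

Vectors separated by a small angle agree on most sign bits, so similar inputs tend to receive similar hashes, which is the property the abstract attributes to the fly's circuit as well.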


Author(s):  
Rahul Malik ◽  
Sangkyum Kim ◽  
Xin Jin ◽  
Chandrasekar Ramachandran ◽  
Jiawei Han ◽  
...  
