Bi-Level Locality Sensitive Hashing Index Based on Clustering

2014, Vol 556-562, pp. 3804-3808
Author(s): Peng Wang, Dong Yin, Tao Sun

Locality sensitive hashing (LSH) is the most popular algorithm for approximate nearest neighbor search. Because LSH partitions the vector space uniformly while the distribution of vectors is usually non-uniform, it fits real datasets poorly and its search performance is limited. In this paper, we propose a new bi-level locality sensitive hashing algorithm, which uses a two-level structure to perform approximate nearest neighbor search in high-dimensional spaces. In the first level, we train a number of cluster centers and use them to divide the dataset into many clusters, so that the vectors in each cluster have a nearly uniform distribution. In the second level, we construct locality sensitive hash tables for each cluster. Given a query, we determine the few clusters to which it belongs with high probability, and then perform approximate nearest neighbor search in the corresponding locality sensitive hash tables. Experimental results on a dataset of 1,000,000 vectors show that the search speed can be increased by a factor of 48 compared with Euclidean locality sensitive hashing, while keeping high search precision.
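The two-level structure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the cluster centers are trained or which hash family is used per cluster, so plain k-means and p-stable (Euclidean) LSH are assumptions here, and every name and parameter (`BiLevelLSH`, `E2LSHTable`, `n_probe`, `width`, and so on) is hypothetical.

```python
import numpy as np
from collections import defaultdict


def kmeans(data, k, iters, rng):
    """Plain Lloyd's k-means (assumed; the abstract only says centers are 'trained')."""
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    # Final assignment against the final centers.
    dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


class E2LSHTable:
    """One hash table using p-stable hashes h(v) = floor((a . v + b) / w)."""

    def __init__(self, dim, n_hashes, width, rng):
        self.a = rng.normal(size=(n_hashes, dim))
        self.b = rng.uniform(0.0, width, size=n_hashes)
        self.width = width
        self.buckets = defaultdict(list)

    def key(self, v):
        return tuple(np.floor((self.a @ v + self.b) / self.width).astype(int))

    def add(self, idx, v):
        self.buckets[self.key(v)].append(idx)

    def candidates(self, q):
        return self.buckets.get(self.key(q), [])


class BiLevelLSH:
    def __init__(self, data, n_clusters=8, n_tables=4, n_hashes=4,
                 width=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        # Level 1: partition the dataset around trained cluster centers.
        self.centers, labels = kmeans(data, n_clusters, 10, rng)
        # Level 2: an independent set of LSH tables for each cluster.
        dim = data.shape[1]
        self.tables = [
            [E2LSHTable(dim, n_hashes, width, rng) for _ in range(n_tables)]
            for _ in range(n_clusters)
        ]
        for idx, (v, c) in enumerate(zip(data, labels)):
            for table in self.tables[c]:
                table.add(idx, v)

    def query(self, q, n_probe=2):
        """Return the index of the best candidate found, or None if no bucket matched."""
        # Probe only the n_probe clusters whose centers are closest to q.
        center_dists = ((self.centers - q) ** 2).sum(axis=1)
        cand = set()
        for c in np.argsort(center_dists)[:n_probe]:
            for table in self.tables[c]:
                cand.update(table.candidates(q))
        if not cand:
            return None
        cand = np.fromiter(cand, dtype=int)
        # Rank the candidates by exact Euclidean distance.
        dists = ((self.data[cand] - q) ** 2).sum(axis=1)
        return int(cand[dists.argmin()])
```

In this sketch, `index = BiLevelLSH(data)` builds both levels and `index.query(q)` returns the row index of the nearest candidate found. Probing the `n_probe` nearest centers stands in for "a few clusters that the query belongs to with high probability"; the paper may select clusters differently.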
