Multi-View Complementary Hash Tables for Nearest Neighbor Search

Author(s):  
Xianglong Liu ◽  
Lei Huang ◽  
Cheng Deng ◽  
Jiwen Lu ◽  
Bo Lang


2016 ◽  
Vol 25 (2) ◽  
pp. 907-919 ◽  
Author(s):  
Xianglong Liu ◽  
Cheng Deng ◽  
Bo Lang ◽  
Dacheng Tao ◽  
Xuelong Li


Author(s):  
Qiang Fu ◽  
Xu Han ◽  
Xianglong Liu ◽  
Jingkuan Song ◽  
Cheng Deng

Building multiple hash tables has proven to be a successful technique for indexing massive databases, as it can guarantee a desired level of overall performance. However, existing hash-based multi-indexing methods suffer from heavy redundancy, lacking both strong table complementarity and effective hash code learning. To address these problems, this paper proposes a complementary binary quantization (CBQ) method that jointly learns multiple hash tables. It exploits the power of prototype-based incomplete binary coding to align the original space with the Hamming space, and further utilizes the nature of multi-indexing search to jointly reduce the quantization loss of the prototype-based hash functions. Our alternating optimization efficiently and adaptively discovers the complementary prototype sets and the corresponding code sets of varying size, which together robustly approximate the data relations. Our method generalizes naturally to the product space for long hash codes. Extensive experiments on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed CBQ method enjoys strong table complementarity and significantly outperforms the state of the art, with relative performance gains of up to 57.76%.
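
How prototype-based tables and multi-bucket lookup fit together can be shown with a small sketch. The code below is not the paper's joint alternating optimization: complementarity is only emulated by up-weighting points that earlier tables quantized poorly, and all names (build_complementary_tables, query) are illustrative.

```python
# A minimal sketch of the prototype-based multi-table idea, assuming
# complementarity can be approximated by re-weighting hard-to-quantize points.
import numpy as np
from sklearn.cluster import KMeans

def build_complementary_tables(X, num_tables=4, num_prototypes=256, seed=0):
    """Return a list of (prototypes, assignments) pairs, one per table."""
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.ones(n)                      # quantization-error weights
    tables = []
    for _ in range(num_tables):
        # Weighted subsample: points quantized poorly so far get more influence.
        idx = rng.choice(n, size=min(n, 10 * num_prototypes),
                         p=weights / weights.sum())
        km = KMeans(n_clusters=num_prototypes, n_init=4).fit(X[idx])
        protos = km.cluster_centers_
        # Assign every point to its nearest prototype (its bucket).
        d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        weights = d[np.arange(n), assign] + 1e-12   # residual error per point
        tables.append((protos, assign))
    return tables

def query(tables, X, q, probes=2):
    """Scan the union of the `probes` nearest buckets from every table."""
    cand = set()
    for protos, assign in tables:
        near = ((protos - q) ** 2).sum(1).argsort()[:probes]
        cand.update(np.flatnonzero(np.isin(assign, near)))
    return min(cand, key=lambda i: ((X[i] - q) ** 2).sum()) if cand else None
```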


2021 ◽  
Vol 14 (13) ◽  
pp. 3267-3280
Author(s):  
Huayi Wang ◽  
Jingfan Meng ◽  
Long Gong ◽  
Jun Xu ◽  
Mitsunori Ogihara

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem with numerous applications across computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that, since they probe only a single bucket per hash table, they need a large number of hash tables to achieve high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in each hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally unsuitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
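
To make the single-table multi-probe idea concrete, here is a minimal sketch on classic p-stable (E2LSH-style) projections for L2. MP-RW-LSH's random-walk probing sequence for L1 is more sophisticated than the crude perturbation order used here, and the class and method names below are illustrative, not the paper's API.

```python
# A minimal sketch of multi-probe LSH on one E2LSH-style table, assuming a
# simple +/-1 bucket-perturbation order rather than the paper's random walks.
import numpy as np
from collections import defaultdict
from itertools import combinations

class MultiProbeLSH:
    def __init__(self, dim, k=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, dim))    # p-stable (Gaussian) projections
        self.b = rng.uniform(0, w, size=k)
        self.w = w
        self.buckets = defaultdict(list)

    def _key(self, x):
        return tuple(np.floor((self.A @ x + self.b) / self.w).astype(int))

    def insert(self, i, x):
        self.buckets[self._key(x)].append(i)

    def probe(self, q, max_flips=2):
        """Yield the query's own bucket, then buckets differing from it by
        +/-1 in a few coordinates (a crude perturbation sequence)."""
        base = np.array(self._key(q))
        yield tuple(base)
        k = len(base)
        for r in range(1, max_flips + 1):
            for dims in combinations(range(k), r):
                for signs in np.ndindex(*(2,) * r):
                    p = base.copy()
                    for d, s in zip(dims, signs):
                        p[d] += 1 if s else -1
                    yield tuple(p)

    def query(self, q, max_flips=2):
        cand = []
        for key in self.probe(q, max_flips):
            cand.extend(self.buckets.get(key, []))
        return cand
```

Probing many buckets of one table this way substitutes for building many tables, which is exactly the memory-versus-accuracy tradeoff the abstract describes.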


2014 ◽  
Vol 556-562 ◽  
pp. 3804-3808
Author(s):  
Peng Wang ◽  
Dong Yin ◽  
Tao Sun

Locality-sensitive hashing (LSH) is the most popular algorithm for approximate nearest neighbor search. Because LSH partitions the vector space uniformly while the distribution of vectors is usually non-uniform, it fits real datasets poorly and offers limited search performance. In this paper, we propose a new bi-level locality-sensitive hashing algorithm, which uses a two-level structure to perform approximate nearest neighbor search in high-dimensional spaces. In the first level, we train a number of cluster centers and use them to divide the dataset into many clusters, so that the vectors within each cluster are nearly uniformly distributed. In the second level, we construct locality-sensitive hash tables for each cluster. Given a query, we determine the few clusters it belongs to with high probability, and then perform approximate nearest neighbor search in the corresponding hash tables. Experimental results on a dataset of 1,000,000 vectors show that search speed can be increased 48-fold compared to Euclidean locality-sensitive hashing while maintaining high search precision.
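
A minimal sketch of the two-level structure follows: a k-means layer routes a query to its few most likely clusters, each of which keeps its own hash table. For brevity the sketch hashes with random hyperplanes rather than the Euclidean LSH family used in the paper, and the names are illustrative.

```python
# A minimal sketch of a bi-level index, assuming random-hyperplane hashing
# in the second level as a stand-in for Euclidean (p-stable) LSH.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

class BiLevelLSH:
    def __init__(self, X, num_clusters=32, bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.X = X
        # Level 1: coarse partition of the dataset into clusters.
        self.km = KMeans(n_clusters=num_clusters, n_init=4).fit(X)
        # Level 2: one hash table per cluster, sharing the same hyperplanes.
        self.H = rng.normal(size=(bits, X.shape[1]))
        self.tables = [defaultdict(list) for _ in range(num_clusters)]
        for i, (x, c) in enumerate(zip(X, self.km.labels_)):
            self.tables[c][self._key(x)].append(i)

    def _key(self, x):
        return (self.H @ x > 0).tobytes()      # sign pattern as bucket key

    def query(self, q, num_probe_clusters=3):
        # Route the query to the clusters whose centers are closest.
        d = ((self.km.cluster_centers_ - q) ** 2).sum(1)
        cand = []
        for c in d.argsort()[:num_probe_clusters]:
            cand.extend(self.tables[c].get(self._key(q), []))
        if not cand:
            return None
        return min(cand, key=lambda i: ((self.X[i] - q) ** 2).sum())
```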


2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use it to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately the standard metric (the Euclidean distance) gives little to no variance in the resulting distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric that enables measuring chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of their elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resulting distances align better with chemical understanding than the Euclidean distance, as demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that, with no supervision, this metric yields a distinct partitioning of binary compounds into clear trends and families of chemical properties, with future applications in nearest neighbor search queries for chemical database retrieval systems and in supervised ML techniques.
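
Because the elements live on a one-dimensional scale, the EMD here reduces to the area between the two compositions' cumulative distributions and is cheap to compute. The sketch below uses scipy's wasserstein_distance; the Pettifor numbers in it are toy placeholders, not the real modified Pettifor scale.

```python
# A minimal sketch of the composition EMD, assuming toy placeholder values
# in PETTIFOR instead of the actual modified Pettifor numbers.
from scipy.stats import wasserstein_distance

PETTIFOR = {"Li": 12, "Na": 11, "K": 10, "O": 87, "S": 84, "Cl": 94}  # toy values

def composition_emd(comp_a, comp_b):
    """comp_* : dict mapping element symbol -> atomic fraction (sums to 1)."""
    xa, wa = zip(*((PETTIFOR[e], f) for e, f in comp_a.items()))
    xb, wb = zip(*((PETTIFOR[e], f) for e, f in comp_b.items()))
    # 1-D EMD: mass = atomic fractions, ground distance = scale separation.
    return wasserstein_distance(xa, xb, wa, wb)

# Example: NaCl comes out much closer to KCl than to Li2O on this toy scale.
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"K": 0.5, "Cl": 0.5}))   # 0.5
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"Li": 2/3, "O": 1/3}))   # 16.5
```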


2021 ◽  
Vol 7 (2) ◽  
pp. 187-199
Author(s):  
Meng-Hao Guo ◽  
Jun-Xiong Cai ◽  
Zheng-Ning Liu ◽  
Tai-Jiang Mu ◽  
Ralph R. Martin ◽  
...  

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and shows great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
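
The two neighborhood operations the input embedding relies on are easy to sketch: farthest point sampling to pick well-spread anchor points, followed by a k-nearest-neighbor search around each anchor. This is a generic numpy version under those assumptions, not the paper's implementation.

```python
# A minimal sketch of farthest point sampling (FPS) and kNN grouping for a
# point cloud; function names and parameters are illustrative.
import numpy as np

def farthest_point_sampling(points, num_samples, seed=0):
    """Greedily pick points that maximize distance to those already chosen."""
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(points))]
    dist = ((points - points[chosen[0]]) ** 2).sum(1)
    for _ in range(num_samples - 1):
        nxt = int(dist.argmax())           # farthest from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, ((points - points[nxt]) ** 2).sum(1))
    return np.array(chosen)

def knn_groups(points, anchors_idx, k=16):
    """Indices of the k nearest points around each sampled anchor."""
    anchors = points[anchors_idx]
    d = ((anchors[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return d.argsort(1)[:, :k]             # shape: (num_anchors, k)

pts = np.random.default_rng(1).normal(size=(1024, 3))   # a toy point cloud
centers = farthest_point_sampling(pts, 128)
neighbors = knn_groups(pts, centers, k=16)
```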

