Multi-View Complementary Hash Tables for Nearest Neighbor Search

Author(s):  
Xianglong Liu ◽  
Lei Huang ◽  
Cheng Deng ◽  
Jiwen Lu ◽  
Bo Lang


2016 ◽  
Vol 25 (2) ◽  
pp. 907-919 ◽  
Author(s):  
Xianglong Liu ◽  
Cheng Deng ◽  
Bo Lang ◽  
Dacheng Tao ◽  
Xuelong Li


Author(s):  
Qiang Fu ◽  
Xu Han ◽  
Xianglong Liu ◽  
Jingkuan Song ◽  
Cheng Deng

Building multiple hash tables has proven to be a successful technique for indexing massive databases, as it can guarantee a desired level of overall performance. However, existing hash-based multi-indexing methods suffer from heavy redundancy, lacking both strong table complementarity and effective hash code learning. To address these problems, this paper proposes a complementary binary quantization (CBQ) method that jointly learns multiple hash tables. It exploits the power of prototype-based incomplete binary coding to align the original space with the Hamming space, and further utilizes the nature of multi-indexing search to jointly reduce the quantization loss of the prototype-based hash functions. Our alternating optimization efficiently and adaptively discovers the complementary prototype sets and the corresponding code sets of varying size, which together robustly approximate the data relations. Our method generalizes naturally to the product space for long hash codes. Extensive experiments on two popular large-scale tasks, Euclidean and semantic nearest neighbor search, demonstrate that the proposed CBQ method enjoys strong table complementarity and significantly outperforms the state of the art, with relative performance gains of up to 57.76%.
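
How prototype-based tables and multi-bucket lookup fit together can be shown with a small sketch. The code below is not the paper's joint alternating optimization: complementarity is only emulated by up-weighting points that earlier tables quantized poorly, and all names (build_complementary_tables, query) are illustrative.

```python
# A minimal sketch of the prototype-based multi-table idea, assuming
# complementarity can be approximated by re-weighting hard-to-quantize points.
import numpy as np
from sklearn.cluster import KMeans

def build_complementary_tables(X, num_tables=4, num_prototypes=256, seed=0):
    """Return a list of (prototypes, assignments) pairs, one per table."""
    rng = np.random.default_rng(seed)
    n = len(X)
    weights = np.ones(n)                      # quantization-error weights
    tables = []
    for _ in range(num_tables):
        # Weighted subsample: points quantized poorly so far get more influence.
        idx = rng.choice(n, size=min(n, 10 * num_prototypes),
                         p=weights / weights.sum())
        km = KMeans(n_clusters=num_prototypes, n_init=4).fit(X[idx])
        protos = km.cluster_centers_
        # Assign every point to its nearest prototype (its bucket).
        d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        weights = d[np.arange(n), assign] + 1e-12   # residual error per point
        tables.append((protos, assign))
    return tables

def query(tables, X, q, probes=2):
    """Scan the union of the `probes` nearest buckets from every table."""
    cand = set()
    for protos, assign in tables:
        near = ((protos - q) ** 2).sum(1).argsort()[:probes]
        cand.update(np.flatnonzero(np.isin(assign, near)))
    return min(cand, key=lambda i: ((X[i] - q) ** 2).sum()) if cand else None
```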


2021 ◽  
Vol 14 (13) ◽  
pp. 3267-3280
Author(s):  
Huayi Wang ◽  
Jingfan Meng ◽  
Long Gong ◽  
Jun Xu ◽  
Mitsunori Ogihara

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem with numerous applications across computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that, since they probe only a single bucket per hash table, they need a large number of hash tables to achieve high query accuracy. For ANNS-L2, a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in each hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS-L1 solution called Cauchy projection LSH (CP-LSH) is fundamentally unsuitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS-L1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.
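
To make the single-table multi-probe idea concrete, here is a minimal sketch on classic p-stable (E2LSH-style) projections for L2. MP-RW-LSH's random-walk probing sequence for L1 is more sophisticated than the crude perturbation order used here, and the class and method names below are illustrative, not the paper's API.

```python
# A minimal sketch of multi-probe LSH on one E2LSH-style table, assuming a
# simple +/-1 bucket-perturbation order rather than the paper's random walks.
import numpy as np
from collections import defaultdict
from itertools import combinations

class MultiProbeLSH:
    def __init__(self, dim, k=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(k, dim))    # p-stable (Gaussian) projections
        self.b = rng.uniform(0, w, size=k)
        self.w = w
        self.buckets = defaultdict(list)

    def _key(self, x):
        return tuple(np.floor((self.A @ x + self.b) / self.w).astype(int))

    def insert(self, i, x):
        self.buckets[self._key(x)].append(i)

    def probe(self, q, max_flips=2):
        """Yield the query's own bucket, then buckets differing from it by
        +/-1 in a few coordinates (a crude perturbation sequence)."""
        base = np.array(self._key(q))
        yield tuple(base)
        k = len(base)
        for r in range(1, max_flips + 1):
            for dims in combinations(range(k), r):
                for signs in np.ndindex(*(2,) * r):
                    p = base.copy()
                    for d, s in zip(dims, signs):
                        p[d] += 1 if s else -1
                    yield tuple(p)

    def query(self, q, max_flips=2):
        cand = []
        for key in self.probe(q, max_flips):
            cand.extend(self.buckets.get(key, []))
        return cand
```

Probing many buckets of one table this way substitutes for building many tables, which is exactly the memory-versus-accuracy tradeoff the abstract describes.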


2014 ◽  
Vol 556-562 ◽  
pp. 3804-3808
Author(s):  
Peng Wang ◽  
Dong Yin ◽  
Tao Sun

Locality-sensitive hashing (LSH) is the most popular algorithm for approximate nearest neighbor search. Because LSH partitions the vector space uniformly while the distribution of vectors is usually non-uniform, it fits real datasets poorly and offers limited search performance. In this paper, we propose a new bi-level locality-sensitive hashing algorithm, which uses a two-level structure to perform approximate nearest neighbor search in high-dimensional spaces. In the first level, we train a number of cluster centers and use them to divide the dataset into many clusters, so that the vectors within each cluster are nearly uniformly distributed. In the second level, we construct locality-sensitive hash tables for each cluster. Given a query, we determine the few clusters it belongs to with high probability, and then perform approximate nearest neighbor search in the corresponding hash tables. Experimental results on a dataset of 1,000,000 vectors show that search speed can be increased 48-fold compared to Euclidean locality-sensitive hashing while maintaining high search precision.
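
A minimal sketch of the two-level structure follows: a k-means layer routes a query to its few most likely clusters, each of which keeps its own hash table. For brevity the sketch hashes with random hyperplanes rather than the Euclidean LSH family used in the paper, and the names are illustrative.

```python
# A minimal sketch of a bi-level index, assuming random-hyperplane hashing
# in the second level as a stand-in for Euclidean (p-stable) LSH.
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

class BiLevelLSH:
    def __init__(self, X, num_clusters=32, bits=12, seed=0):
        rng = np.random.default_rng(seed)
        self.X = X
        # Level 1: coarse partition of the dataset into clusters.
        self.km = KMeans(n_clusters=num_clusters, n_init=4).fit(X)
        # Level 2: one hash table per cluster, sharing the same hyperplanes.
        self.H = rng.normal(size=(bits, X.shape[1]))
        self.tables = [defaultdict(list) for _ in range(num_clusters)]
        for i, (x, c) in enumerate(zip(X, self.km.labels_)):
            self.tables[c][self._key(x)].append(i)

    def _key(self, x):
        return (self.H @ x > 0).tobytes()      # sign pattern as bucket key

    def query(self, q, num_probe_clusters=3):
        # Route the query to the clusters whose centers are closest.
        d = ((self.km.cluster_centers_ - q) ** 2).sum(1)
        cand = []
        for c in d.argsort()[:num_probe_clusters]:
            cand.extend(self.tables[c].get(self._key(q), []))
        if not cand:
            return None
        return min(cand, key=lambda i: ((self.X[i] - q) ** 2).sum())
```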


2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use it to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately the standard metric (the Euclidean distance) gives little to no variance in the resulting distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric that enables measuring chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of their elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resulting distances align better with chemical understanding than the Euclidean distance, as demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that, with no supervision, this metric yields a distinct partitioning of binary compounds into clear trends and families of chemical properties, with future applications in nearest neighbor search queries for chemical database retrieval systems and in supervised ML techniques.
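
Because the elements live on a one-dimensional scale, the EMD here reduces to the area between the two compositions' cumulative distributions and is cheap to compute. The sketch below uses scipy's wasserstein_distance; the Pettifor numbers in it are toy placeholders, not the real modified Pettifor scale.

```python
# A minimal sketch of the composition EMD, assuming toy placeholder values
# in PETTIFOR instead of the actual modified Pettifor numbers.
from scipy.stats import wasserstein_distance

PETTIFOR = {"Li": 12, "Na": 11, "K": 10, "O": 87, "S": 84, "Cl": 94}  # toy values

def composition_emd(comp_a, comp_b):
    """comp_* : dict mapping element symbol -> atomic fraction (sums to 1)."""
    xa, wa = zip(*((PETTIFOR[e], f) for e, f in comp_a.items()))
    xb, wb = zip(*((PETTIFOR[e], f) for e, f in comp_b.items()))
    # 1-D EMD: mass = atomic fractions, ground distance = scale separation.
    return wasserstein_distance(xa, xb, wa, wb)

# Example: NaCl comes out much closer to KCl than to Li2O on this toy scale.
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"K": 0.5, "Cl": 0.5}))   # 0.5
print(composition_emd({"Na": 0.5, "Cl": 0.5}, {"Li": 2/3, "O": 1/3}))   # 16.5
```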


2021 ◽  
Vol 7 (2) ◽  
pp. 187-199
Author(s):  
Meng-Hao Guo ◽  
Jun-Xiong Cai ◽  
Zheng-Ning Liu ◽  
Tai-Jiang Mu ◽  
Ralph R. Martin ◽  
...  

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and shows great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
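
The two neighborhood operations the input embedding relies on are easy to sketch: farthest point sampling to pick well-spread anchor points, followed by a k-nearest-neighbor search around each anchor. This is a generic numpy version under those assumptions, not the paper's implementation.

```python
# A minimal sketch of farthest point sampling (FPS) and kNN grouping for a
# point cloud; function names and parameters are illustrative.
import numpy as np

def farthest_point_sampling(points, num_samples, seed=0):
    """Greedily pick points that maximize distance to those already chosen."""
    rng = np.random.default_rng(seed)
    chosen = [rng.integers(len(points))]
    dist = ((points - points[chosen[0]]) ** 2).sum(1)
    for _ in range(num_samples - 1):
        nxt = int(dist.argmax())           # farthest from the current set
        chosen.append(nxt)
        dist = np.minimum(dist, ((points - points[nxt]) ** 2).sum(1))
    return np.array(chosen)

def knn_groups(points, anchors_idx, k=16):
    """Indices of the k nearest points around each sampled anchor."""
    anchors = points[anchors_idx]
    d = ((anchors[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return d.argsort(1)[:, :k]             # shape: (num_anchors, k)

pts = np.random.default_rng(1).normal(size=(1024, 3))   # a toy point cloud
centers = farthest_point_sampling(pts, 128)
neighbors = knn_groups(pts, centers, k=16)
```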

