Fast Nearest Neighbor Search in the Hamming Space

Author(s):  
Zhansheng Jiang ◽  
Lingxi Xie ◽  
Xiaotie Deng ◽  
Weiwei Xu ◽  
Jingdong Wang
2014 ◽  
Vol 651-653 ◽  
pp. 2168-2171
Author(s):  
Qin Zhen Guo ◽  
Zhi Zeng ◽  
Shu Wu Zhang ◽  
Yuan Zhang ◽  
Gui Xuan Zhang

Hashing which maps data into binary codes in Hamming space has attracted more and more attentions for approximate nearest neighbor search due to its high efficiency and reduced storage cost. K-means hashing (KH) is a novel hashing method which firstly quantizes the data by codewords and then uses the indices of codewords to encode the data. However, in KH, only the codewords are updated to minimize the quantization error and affinity error while the indices of codewords remain the same after they are initialized. In this paper, we propose an optimized k-means hashing (OKH) method to encode data by binary codes. In our method, we simultaneously optimize the codewords and the indices of them to minimize the quantization error and the affinity error. Our OKH method can find both the optimal codewords and the optiaml indices, and the resulting binary codes in Hamming space can better preserve the original neighborhood structure of the data. Besides, OKH can further be generalized to a product space. Extensive experiments have verified the superiority of OKH over KH and other state-of-the-art hashing methods.


2020 ◽  
Vol 99 ◽  
pp. 107082 ◽  
Author(s):  
Bin Fan ◽  
Qingqun Kong ◽  
Baoqian Zhang ◽  
Hongmin Liu ◽  
Chunhong Pan ◽  
...  

2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that with no supervision the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.


2021 ◽  
Vol 7 (2) ◽  
pp. 187-199
Author(s):  
Meng-Hao Guo ◽  
Jun-Xiong Cai ◽  
Zheng-Ning Liu ◽  
Tai-Jiang Mu ◽  
Ralph R. Martin ◽  
...  

AbstractThe irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on Transformer, which achieves huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that the PCT achieves the state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.


Sign in / Sign up

Export Citation Format

Share Document