Speeding up probabilistic roadmap planners with locality-sensitive hashing

Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques to avoid the prohibitive cost of scanning through all data points. For recommender systems, LSH achieves efficient recommendation retrieval by encoding user and item vectors into binary hash codes, reducing the cost of exhaustively examining all the item vectors to identify the top-k items. However, conventional matrix factorization models may suffer from performance degeneration caused by randomly-drawn LSH hash functions, directly affecting the ultimate quality of the recommendations. In this paper, we propose a framework named øurmodel, which factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, eventually improving the recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only effectively learns user's preferences for prediction, but also achieves high compatibility with LSH stochasticity, producing superior post-LSH indexing performances as compared to state-of-the-art baselines.

Download Full-text

A Hybrid Spatial Indexing Structure of Massive Point Cloud Based on Octree and 3D R*-Tree

Applied Sciences ◽

10.3390/app11209581 ◽

2021 ◽

Vol 11 (20) ◽

pp. 9581

Author(s):

Wei Wang ◽

Yi Zhang ◽

Genyu Ge ◽

Qin Jiang ◽

Yang Wang ◽

...

Keyword(s):

Point Cloud ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Index Structure ◽

Spatial Indexing ◽

Cloud Data ◽

Neighbor Search ◽

Nearest Neighbor Searching ◽

Indexing Structure ◽

Balanced Tree

The spatial index structure is one of the most important research topics for organizing and managing massive 3D Point Cloud. As a point in Point Cloud consists of Cartesian coordinates (x,y,z), the common method to explore geometric information and features is nearest neighbor searching. An efficient spatial indexing structure directly affects the speed of the nearest neighbor search. Octree and kd-tree are the most used for Point Cloud data. However, Octree or KD-tree do not perform best in nearest neighbor searching. A highly balanced tree, 3D R*-tree is considered the most effective method so far. So, a hybrid spatial indexing structure is proposed based on Octree and 3D R*-tree. In this paper, we discussed how thresholds influence the performance of nearest neighbor searching and constructing the tree. Finally, an adaptive way method adopted to set thresholds. Furthermore, we obtained a better performance in tree construction and nearest neighbor searching than Octree and 3D R*-tree.

Download Full-text

Fast document summarization using locality sensitive hashing and memory access efficient node ranking

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i3.9030 ◽

2016 ◽

Vol 6 (3) ◽

pp. 945

Author(s):

Ercan Canhasi

Keyword(s):

Time Complexity ◽

Nearest Neighbor ◽

Linear Time ◽

Nearest Neighbor Search ◽

Memory Access ◽

Locality Sensitive Hashing ◽

Document Summarization ◽

Neighbor Search ◽

Node Ranking ◽

Similarity Graph

Text modeling and sentence selection are the fundamental steps of a typical extractive document summarization algorithm. The common text modeling method connects a pair of sentences based on their similarities. Even thought it can effectively represent the sentence similarity graph of given document(s) its big drawback is a large time complexity of $O(n^2)$, where n represents the number of sentences. The quadratic time complexity makes it impractical for large documents. In this paper we propose the fast approximation algorithms for the text modeling and the sentence selection. Our text modeling algorithm reduces the time complexity to near-linear time by rapidly finding the most similar sentences to form the sentences similarity graph. In doing so we utilized Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. For the sentence selection step we propose a simple memory-access-efficient node ranking method based on the idea of scanning sequentially only the neighborhood arrays. Experimentally, we show that sacrificing a rather small percentage of recall and precision in the quality of the produced summary can reduce the quadratic to sub-linear time complexity. We see the big potential of proposed method in text summarization for mobile devices and big text data summarization for internet of things on cloud. In our experiments, beside evaluating the presented method on the standard general and query multi-document summarization tasks, we also tested it on few alternative summarization tasks including general and query, timeline, and comparative summarization.

Download Full-text

A Probabilistic Molecular Fingerprint for Big Data Settings

10.26434/chemrxiv.7176350.v1 ◽

2018 ◽

Author(s):

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Molecular Fingerprint ◽

Molecular Fingerprints ◽

Approximate Nearest Neighbor ◽

Neighbor Search ◽

Large Databases ◽

Nearest Neighbor Searches ◽

Extended Connectivity

Background: Among the various molecular fingerprints available to describe small organic molecules, ECFP4 (extended connectivity fingerprint, up to four bonds) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥1,024D) to perform well, resulting in ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC to perform very slowly due to the curse of dimensionality. <a></a><a></a> Results: Herein we report a new fingerprint, called MHFP6 (MinHash fingerprint, up to six bonds), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. Furthermore, MHFP6 outperforms ECFP4 in approximate nearest neighbor searches by two orders of magnitude in terms of speed, while decreasing the error rate. Conclusion<a></a><a>: MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub (</a><a href="https://github.com/reymond-group/mhfp">https://github.com/reymond-group/mhfp</a>).<a></a>

Download Full-text

An Improved Hierarchical Datastructure for Nearest Neighbor Search

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.330110001 ◽

2019 ◽

Vol 33 ◽

pp. 10001-10002

Author(s):

Mengdie Nie ◽

Zhi-Jie Wang ◽

Chunjing Gan ◽

Zhe Quan ◽

Bin Yao ◽

...

Keyword(s):

High Dimension ◽

Nearest Neighbor ◽

State Of The Art ◽

Nearest Neighbor Search ◽

Computational Tool ◽

Neighbor Search ◽

Art Methods ◽

Speed Up ◽

Run Time

Nearest neighbor search is a fundamental computational tool and has wide applications. In past decades, many datastructures have been developed to speed up this operation. In this paper, we propose a novel hierarchical datastructure for nearest neighbor search in moderately high dimension. Our proposed method maintains good run time guarantees, and it outperforms several state-of-the-art methods in practice.

Download Full-text

A Probabilistic Molecular Fingerprint for Big Data Settings

10.26434/chemrxiv.7176350 ◽

2018 ◽

Author(s):

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Molecular Fingerprint ◽

Molecular Fingerprints ◽

Approximate Nearest Neighbor ◽

Neighbor Search ◽

Large Databases ◽

Nearest Neighbor Searches ◽

Extended Connectivity

Background: Among the various molecular fingerprints available to describe small organic molecules, ECFP4 (extended connectivity fingerprint, up to four bonds) performs best in benchmarking drug analog recovery studies as it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high dimensional representations (≥1,024D) to perform well, resulting in ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC to perform very slowly due to the curse of dimensionality. <a></a><a></a> Results: Herein we report a new fingerprint, called MHFP6 (MinHash fingerprint, up to six bonds), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. Furthermore, MHFP6 outperforms ECFP4 in approximate nearest neighbor searches by two orders of magnitude in terms of speed, while decreasing the error rate. Conclusion<a></a><a>: MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub (</a><a href="https://github.com/reymond-group/mhfp">https://github.com/reymond-group/mhfp</a>).<a></a>

Download Full-text

Fast document summarization using locality sensitive hashing and memory access efficient node ranking

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i3.pp945-954 ◽

2016 ◽

Vol 6 (3) ◽

pp. 945

Author(s):

Ercan Canhasi

Keyword(s):

Time Complexity ◽

Nearest Neighbor ◽

Linear Time ◽

Nearest Neighbor Search ◽

Memory Access ◽

Locality Sensitive Hashing ◽

Document Summarization ◽

Neighbor Search ◽

Node Ranking ◽

Similarity Graph

Text modeling and sentence selection are the fundamental steps of a typical extractive document summarization algorithm. The common text modeling method connects a pair of sentences based on their similarities. Even thought it can effectively represent the sentence similarity graph of given document(s) its big drawback is a large time complexity of $O(n^2)$, where n represents the number of sentences. The quadratic time complexity makes it impractical for large documents. In this paper we propose the fast approximation algorithms for the text modeling and the sentence selection. Our text modeling algorithm reduces the time complexity to near-linear time by rapidly finding the most similar sentences to form the sentences similarity graph. In doing so we utilized Locality-Sensitive Hashing, a fast algorithm for the approximate nearest neighbor search. For the sentence selection step we propose a simple memory-access-efficient node ranking method based on the idea of scanning sequentially only the neighborhood arrays. Experimentally, we show that sacrificing a rather small percentage of recall and precision in the quality of the produced summary can reduce the quadratic to sub-linear time complexity. We see the big potential of proposed method in text summarization for mobile devices and big text data summarization for internet of things on cloud. In our experiments, beside evaluating the presented method on the standard general and query multi-document summarization tasks, we also tested it on few alternative summarization tasks including general and query, timeline, and comparative summarization.

Download Full-text

Locality-Sensitive Hashing Techniques for Nearest Neighbor Search

International Journal of Fuzzy Logic and Intelligent Systems ◽

10.5391/ijfis.2012.12.4.300 ◽

2012 ◽

Vol 12 (4) ◽

pp. 300-307 ◽

Cited By ~ 11

Author(s):

Keon Myung Lee

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Neighbor Search

Download Full-text

MP-RW-LSH

Proceedings of the VLDB Endowment ◽

10.14778/3484224.3484226 ◽

2021 ◽

Vol 14 (13) ◽

pp. 3267-3280

Author(s):

Huayi Wang ◽

Jingfan Meng ◽

Long Gong ◽

Jun Xu ◽

Mitsunori Ogihara

Keyword(s):

Nearest Neighbor ◽

Edit Distance ◽

State Of The Art ◽

Hash Table ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Algorithmic Problem ◽

Use Case ◽

Hash Tables ◽

Neighbor Search

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic problem, with numerous applications in many areas of computer science. Locality-Sensitive Hashing (LSH) is one of the most popular solution approaches for ANNS. A common shortcoming of many LSH schemes is that since they probe only a single bucket in a hash table, they need to use a large number of hash tables to achieve a high query accuracy. For ANNS- L 2 , a multi-probe scheme was proposed to overcome this drawback by strategically probing multiple buckets in a hash table. In this work, we propose MP-RW-LSH, the first and so far only multi-probe LSH solution to ANNS in L 1 distance, and show that it achieves a better tradeoff between scalability and query efficiency than all existing LSH-based solutions. We also explain why a state-of-the-art ANNS -L 1 solution called Cauchy projection LSH (CP-LSH) is fundamentally not suitable for multi-probe extension. Finally, as a use case, we construct, using MP-RW-LSH as the underlying "ANNS- L 1 engine", a new ANNS-E (E for edit distance) solution that beats the state of the art.

Download Full-text