Nearest Neighbor Searches: Latest Research Papers

Hybrid KNN-join: Parallel nearest neighbor searches exploiting CPU and GPU architectural features

Journal of Parallel and Distributed Computing ◽

10.1016/j.jpdc.2020.11.004 ◽

2021 ◽

Vol 149 ◽

pp. 119-137

Author(s):

Michael Gowanlock

Keyword(s):

Nearest Neighbor ◽

Architectural Features ◽

Nearest Neighbor Searches

Experimental Analysis of Locality Sensitive Hashing Techniques for High-Dimensional Approximate Nearest Neighbor Searches

Lecture Notes in Computer Science - Databases Theory and Applications ◽

10.1007/978-3-030-69377-0_6 ◽

2021 ◽

pp. 62-73

Author(s):

Omid Jafari ◽

Parth Nagarkar

Keyword(s):

Experimental Analysis ◽

Nearest Neighbor ◽

Locality Sensitive Hashing ◽

High Dimensional ◽

Approximate Nearest Neighbor ◽

Nearest Neighbor Searches

Tree-Based Algorithm for Stable and Efficient Data Clustering

Informatics ◽

10.3390/informatics7040038 ◽

2020 ◽

Vol 7 (4) ◽

pp. 38

Author(s):

Hasan Aljabbouli ◽

Abdullah Albizri ◽

Antoine Harfouche

Keyword(s):

Data Structure ◽

Data Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Convergence Properties ◽

Insertion Technique ◽

Efficient Data ◽

Tree Data ◽

Tree Data Structure ◽

Nearest Neighbor Searches

The K-means algorithm is a well-known and widely used clustering algorithm due to its simplicity and convergence properties. However, one of the drawbacks of the algorithm is its instability. This paper presents improvements to the K-means algorithm using a K-dimensional tree (Kd-tree) data structure. The Kd-tree is used to improve the choice of initial cluster centers and to reduce the number of nearest neighbor searches required by the algorithm. The framework also includes an efficient center insertion technique, leading to an incremental operation that overcomes the instability problem of the K-means algorithm. The results of the proposed algorithm were compared with those of K-means, K-medoids, and K-means++ on six different datasets. The results demonstrated that the proposed algorithm provides superior and more stable clustering solutions.
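The abstract does not spell out the algorithm, so as a rough illustration of how a Kd-tree reduces the cost of the nearest-center searches inside K-means, here is a minimal Python sketch that builds a small Kd-tree over the current centers at each iteration and queries it for every point. It is a generic Lloyd iteration with a textbook Kd-tree, not the authors' center-initialization or center-insertion technique; the names `build_kdtree`, `nearest`, and `kmeans` and the toy data are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KdNode:
    """One node of a textbook Kd-tree: a stored point and its splitting axis."""

    def __init__(self, point: Point, index: int, axis: int,
                 left: "Optional[KdNode]", right: "Optional[KdNode]") -> None:
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def build_kdtree(points: Sequence[Point], indices: List[int],
                 depth: int = 0) -> Optional[KdNode]:
    """Recursively split on the median, cycling through the coordinate axes."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KdNode(points[indices[mid]], indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def nearest(node: Optional[KdNode], query: Point,
            best: Tuple[float, int] = (math.inf, -1)) -> Tuple[float, int]:
    """Return (squared distance, index) of the stored point closest to query."""
    if node is None:
        return best
    d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
    if d2 < best[0]:
        best = (d2, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match.
    if diff * diff < best[0]:
        best = nearest(far, query, best)
    return best


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd iterations; the nearest-center step uses the Kd-tree."""
    current = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(iterations):
        tree = build_kdtree(current, list(range(len(current))))
        labels = [nearest(tree, p)[1] for p in points]
        updated = []
        for k, c in enumerate(current):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # An empty cluster keeps its previous center.
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else c)
        if updated == current:
            break
        current = updated
    return current, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, [(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Building the tree over the k centers makes each assignment query cheaper than scanning all centers when k is large; the paper's use of the tree for choosing initial centers is not reproduced here.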

Proximity Preserving Binary Code Using Signed Graph-Cut

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5882 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4535-4544

Author(s):

Inbal Lavi ◽

Shai Avidan ◽

Yoram Singer ◽

Yacov Hel-Or

Keyword(s):

Graph Partitioning ◽

Nearest Neighbor ◽

Binary Code ◽

Graph Cut ◽

Signed Graph ◽

Data Points ◽

Repulsive Forces ◽

Nearest Neighbor Searches ◽

Efficient Approximation ◽

Memory Efficient

We introduce a binary embedding framework, called Proximity Preserving Code (PPC), which learns similarity and dissimilarity between data points to create a compact and affinity-preserving binary code. This code can be used to perform fast and memory-efficient approximate nearest-neighbor searches. Our framework is flexible, enabling different proximity definitions between data points. In contrast to previous methods that extract binary codes based on unsigned graph partitioning, our system models the attractive and repulsive forces in the data by incorporating positive and negative graph weights. The proposed framework is shown to reduce to finding the minimum cut of a signed graph, a problem known to be NP-hard. We offer an efficient approximation and achieve superior results by constructing the code bit after bit. We show that the proposed approximation is superior to the commonly used spectral methods with respect to both accuracy and complexity. It is therefore also useful for many other problems that can be cast as a signed graph cut.
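Once data points are mapped to compact binary codes, a nearest-neighbor query reduces to comparing bit strings by Hamming distance. A minimal sketch of that search step, assuming the codes are already packed into Python integers; the 8-bit codes below are made up for illustration and are not produced by PPC, whose signed graph-cut learning step is not shown. The helper names `hamming` and `knn_hamming` are hypothetical.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def knn_hamming(query: int, codes: List[int], k: int) -> List[Tuple[int, int]]:
    """Brute-force k nearest codes as (distance, index), ties broken by index."""
    return sorted((hamming(query, c), i) for i, c in enumerate(codes))[:k]


codes = [0b10110010, 0b10110011, 0b01001101, 0b10100010]  # made-up 8-bit codes
print(knn_hamming(0b10110010, codes, k=2))  # [(0, 0), (1, 1)]
```

Each comparison is a single XOR and a bit count, which is why a scan over many short codes stays cheap; how well the Hamming neighbors match the true neighbors depends on the quality of the learned code.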

PubChem and ChEMBL Beyond Lipinski

10.26434/chemrxiv.7650071 ◽

2019 ◽

Author(s):

Alice Capecchi ◽

Mahendra Awale ◽

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Nearest Neighbor ◽

Molecular Shape ◽

Biological Properties ◽

Web Based ◽

Large Molecules ◽

Interactive Tools ◽

Small Molecule Drugs ◽

Pubchem Database ◽

Nearest Neighbor Searches ◽

Insight Into

Seven million of the currently 94 million entries in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, 183,185 of which are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to explore these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-D fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications, the first one to visualize NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), the second one to perform MXFP nearest neighbor searches in NLP (http://similaritysearch.gdb.tools/). We show that these tools provide a meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
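The second web application performs nearest neighbor searches over MXFP vectors. The abstract does not name the distance function used, so the sketch below assumes a city-block (Manhattan) distance between 217-dimensional vectors and a brute-force scan; the function names `city_block` and `knn_search` and the toy vectors are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

MXFP_DIM = 217  # fingerprint dimensionality stated in the abstract


def city_block(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (city-block) distance; assumed here, not taken from the paper."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_search(query: Sequence[float], library: List[Sequence[float]],
               k: int) -> List[Tuple[float, int]]:
    """Brute-force k nearest fingerprints as (distance, index) pairs."""
    if len(query) != MXFP_DIM or any(len(fp) != MXFP_DIM for fp in library):
        raise ValueError(f"expected {MXFP_DIM}-dimensional fingerprints")
    return heapq.nsmallest(k, ((city_block(query, fp), i) for i, fp in enumerate(library)))


library = [[0] * MXFP_DIM, [1] * MXFP_DIM, [0] * (MXFP_DIM - 1) + [5]]  # toy vectors
print(knn_search([0] * MXFP_DIM, library, k=2))  # [(0, 0), (5, 2)]
```

A linear scan is the simplest correct baseline; a service over millions of molecules would likely need an index or precomputation, which this sketch does not attempt.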

PubChem and ChEMBL Beyond Lipinski

10.26434/chemrxiv.7650071.v2 ◽

2019 ◽

Author(s):

Alice Capecchi ◽

Mahendra Awale ◽

Daniel Probst ◽

Jean-Louis Reymond

PubChem and ChEMBL Beyond Lipinski

10.26434/chemrxiv.7650071.v1 ◽

2019 ◽

Author(s):

Jean-Louis Reymond ◽

Mahendra Awale ◽

Daniel Probst ◽

Alice Capecchi

Spatial air index with neighbor information for processing k-nearest neighbor searches in IoT mobile computing

The Journal of Supercomputing ◽

10.1007/s11227-019-02753-5 ◽

2019 ◽

Vol 76 (8) ◽

pp. 6177-6194

Author(s):

Jun-Hong Shen ◽

Cheng-Jung Yu ◽

Ching-Ta Lu ◽

WenYen Lin ◽

Neil Y. Yen ◽

...

Keyword(s):

Mobile Computing ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Air Index ◽

Nearest Neighbor Searches

On approximate k-nearest neighbor searches based on the earth mover’s distance for efficient content-based multimedia information retrieval

Computer Science and Information Systems ◽

10.2298/csis181010012j ◽

2019 ◽

pp. 12-12

Author(s):

Min-Hee Jang ◽

Sang-Wook Kim ◽

Woong-Kee Loh ◽

Jung-Im Won

Keyword(s):

Information Retrieval ◽

Multimedia Information ◽

Nearest Neighbor ◽

Multimedia Information Retrieval ◽

K Nearest Neighbor ◽

Earth Mover's Distance ◽

The Earth ◽

Nearest Neighbor Searches

A Probabilistic Molecular Fingerprint for Big Data Settings

10.26434/chemrxiv.7176350.v1 ◽

2018 ◽

Author(s):

Daniel Probst ◽

Jean-Louis Reymond

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Molecular Fingerprint ◽

Molecular Fingerprints ◽

Approximate Nearest Neighbor ◽

Neighbor Search ◽

Large Databases ◽

Nearest Neighbor Searches ◽

Extended Connectivity

Background: Among the various molecular fingerprints available to describe small organic molecules, ECFP4 (extended connectivity fingerprint, up to four bonds) performs best in benchmarking drug analog recovery studies because it encodes substructures with a high level of detail. Unfortunately, ECFP4 requires high-dimensional representations (≥1,024D) to perform well, so ECFP4 nearest neighbor searches in very large databases such as GDB, PubChem or ZINC are very slow due to the curse of dimensionality.

Results: Herein we report a new fingerprint, called MHFP6 (MinHash fingerprint, up to six bonds), which encodes detailed substructures using the extended connectivity principle of ECFP in a fundamentally different manner, increasing the performance of exact nearest neighbor searches in benchmarking studies and enabling the application of locality sensitive hashing (LSH) approximate nearest neighbor search algorithms. To describe a molecule, MHFP6 extracts the SMILES of all circular substructures around each atom up to a diameter of six bonds and applies the MinHash method to the resulting set. MHFP6 outperforms ECFP4 in benchmarking analog recovery studies. Furthermore, MHFP6 outperforms ECFP4 in approximate nearest neighbor searches by two orders of magnitude in terms of speed, while decreasing the error rate.

Conclusion: MHFP6 is a new molecular fingerprint, encoding circular substructures, which outperforms ECFP4 for analog searches while allowing the direct application of locality sensitive hashing algorithms. It should be well suited for the analysis of large databases. The source code for MHFP6 is available on GitHub (https://github.com/reymond-group/mhfp).
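The abstract describes MHFP6 as applying MinHash to the set of circular-substructure SMILES strings. A minimal, generic MinHash sketch follows: each string is hashed to 32 bits, passed through n random affine hash functions modulo a prime, and the per-function minimum forms the signature; the fraction of matching positions estimates the Jaccard similarity of the two sets. The fragment strings, seed, and parameter values are hypothetical, the helper names are illustrative, and this is not the mhfp package's API; extracting substructures from real molecules is omitted.

```python
import hashlib
import random
from typing import Iterable, List, Tuple

PRIME = (1 << 61) - 1       # Mersenne prime used by the affine hash functions
MAX_HASH = (1 << 32) - 1    # keep signature entries within 32 bits


def make_hash_functions(n_perm: int, seed: int = 42) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randint(1, PRIME - 1), rng.randint(0, PRIME - 1)) for _ in range(n_perm)]


def minhash(shingles: Iterable[str], funcs: List[Tuple[int, int]]) -> List[int]:
    """MinHash signature of a set of strings (e.g. substructure SMILES)."""
    base = [int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "little")
            for s in set(shingles)]
    if not base:
        raise ValueError("cannot MinHash an empty set")
    return [min(((a * x + b) % PRIME) & MAX_HASH for x in base) for a, b in funcs]


def estimate_jaccard(sig1: List[int], sig2: List[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


funcs = make_hash_functions(256)
frags_a = {f"frag{i}" for i in range(0, 60)}    # hypothetical substructure strings
frags_b = {f"frag{i}" for i in range(20, 80)}   # true Jaccard = 40 / 80 = 0.5
est = estimate_jaccard(minhash(frags_a, funcs), minhash(frags_b, funcs))
print(est)  # an estimate of the true Jaccard similarity, 0.5
```

Because signatures are fixed-length integer vectors whose positions collide with probability equal to the Jaccard similarity, they can be fed directly into LSH banding schemes, which is the property the abstract credits for MHFP6's speed in approximate searches.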
