Fast algorithm for anchor graph hashing

Anchor graph hashing is used in many applications such as cancer detection, web page classification, and drug discovery. It computes hash codes from the eigenvectors of the matrix representing the similarities between data points and anchor points; anchors are points that represent the data distribution. In an approximate nearest neighbor search, the hash codes of a query data point are determined by identifying its closest anchor points. Anchor graph hashing, however, incurs a high computation cost since (1) the cost of obtaining the eigenvectors is quadratic in the number of anchor points, and (2) the similarities of the query data point to all the anchor points must be computed. Our proposal, Tridiagonal hashing, increases the efficiency of anchor graph hashing through two advances: (1) we apply a graph clustering algorithm to compute the eigenvectors from the tridiagonal matrix obtained from the similarities between data points and anchor points, and (2) we detect the anchor points closest to the query data point by using a dimensionality reduction approach. Experiments show that our approach is several orders of magnitude faster than the previous approaches. Besides, it yields higher search accuracy than the original anchor graph hashing approach.
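To make the query-side cost concrete, the following is a minimal sketch of how the similarities between one query point and its closest anchors might be computed by a full scan over all anchors. The abstract gives no formula, so the truncated Gaussian kernel, the `bandwidth` parameter, and the name `anchor_weights` are assumptions for illustration. This is the baseline step that the abstract describes as expensive, not the Tridiagonal hashing method itself.

```python
import math

def anchor_weights(x, anchors, s=2, bandwidth=1.0):
    """Return similarities of x to all anchors, non-zero only for the s closest.

    Hypothetical helper: the kernel and normalization are assumed, not taken
    from the abstract.
    """
    # Squared Euclidean distance from x to every anchor (the full scan).
    sq_dists = [sum((a - b) ** 2 for a, b in zip(x, u)) for u in anchors]
    # Indices of the s closest anchors.
    nearest = sorted(range(len(anchors)), key=lambda j: sq_dists[j])[:s]
    # Gaussian kernel weights on the closest anchors only.
    kernel = {j: math.exp(-sq_dists[j] / bandwidth) for j in nearest}
    total = sum(kernel.values())
    # Normalize so the row sums to 1; every other anchor gets weight 0.
    return [kernel[j] / total if j in kernel else 0.0 for j in range(len(anchors))]

anchors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
row = anchor_weights((0.2, 0.0), anchors, s=2)
```

Here `row` is non-zero only for the two anchors nearest the query, and the distant third anchor contributes nothing even though its distance still had to be computed.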

Approximate Nearest Neighbor Search using a Single Space-filling Curve and Multiple Representations of the Data Points

18th International Conference on Pattern Recognition (ICPR'06) ◽

10.1109/icpr.2006.275 ◽

2006 ◽

Cited By ~ 20

Author(s):

G. Mainar-Ruiz ◽

J. Perez-Cortes

Keyword(s):

Nearest Neighbor ◽

Multiple Representations ◽

Nearest Neighbor Search ◽

Space Filling ◽

Approximate Nearest Neighbor Search ◽

Space Filling Curve ◽

Approximate Nearest Neighbor ◽

Neighbor Search ◽

Data Points ◽

Filling Curve

Symmetry Based Automatic Evolution of Clusters: A New Approach to Data Clustering

Computational Intelligence and Neuroscience ◽

10.1155/2015/796276 ◽

2015 ◽

Vol 2015 ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

Singh Vijendra ◽

Sahoo Laxman

Keyword(s):

Clustering Algorithm ◽

Nearest Neighbor ◽

A Priori ◽

Clustering Algorithms ◽

Real Data ◽

Nearest Neighbor Search ◽

Objective Functions ◽

Neighbor Search ◽

Genetic Clustering ◽

Symmetric Points

We present a multiobjective genetic clustering approach in which data points are assigned to clusters based on a new line symmetry distance. The proposed algorithm is called multiobjective line symmetry based genetic clustering (MOLGC). It uses two objective functions: the Davies-Bouldin (DB) index and an objective function based on the line symmetry distance. The proposed algorithm evolves near-optimal clustering solutions using multiple clustering criteria, without a priori knowledge of the actual number of clusters. A nearest neighbor search based on multiple randomized K-dimensional (Kd) trees is used to reduce the complexity of finding the closest symmetric points. Experimental results on several artificial and real data sets show that the proposed clustering algorithm can obtain optimal clustering solutions in terms of different cluster quality measures in comparison to the existing SBKM and MOCK clustering algorithms.
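As an illustration of the first objective, here is a minimal sketch of the standard Davies-Bouldin index for a fixed partition, where lower values indicate compact, well-separated clusters. It assumes Euclidean distance and at least two clusters with distinct centroids. The function names are hypothetical, and the sketch covers neither the line symmetry distance nor the genetic search.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as a list of point lists."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = []
    for i in range(k):
        # Worst-case similarity ratio of cluster i against any other cluster.
        worst.append(max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                         for j in range(k) if j != i))
    return sum(worst) / k

tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(3, 0), (3, 4)]]
db_tight = davies_bouldin(tight)
db_loose = davies_bouldin(loose)
```

The well-separated partition `tight` scores lower than `loose`, which is the direction a minimizing objective would favor.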

Stochastically Robust Personalized Ranking for LSH Recommendation Retrieval

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5889 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4594-4601

Author(s):

Dung D. Le ◽

Hady W. Lauw

Keyword(s):

Nearest Neighbor ◽

Hash Functions ◽

Nearest Neighbor Search ◽

Locality Sensitive Hashing ◽

Neighbor Search ◽

Data Points ◽

Recommendation Accuracy ◽

Prohibitive Cost ◽

The Cost

Locality Sensitive Hashing (LSH) has become one of the most commonly used approximate nearest neighbor search techniques to avoid the prohibitive cost of scanning through all data points. For recommender systems, LSH achieves efficient recommendation retrieval by encoding user and item vectors into binary hash codes, reducing the cost of exhaustively examining all the item vectors to identify the top-k items. However, conventional matrix factorization models may suffer from performance degeneration caused by randomly drawn LSH hash functions, directly affecting the ultimate quality of the recommendations. In this paper, we propose a framework named Stochastically Robust Personalized Ranking, which factors in the stochasticity of LSH hash functions when learning real-valued user and item latent vectors, eventually improving the recommendation accuracy after LSH indexing. Experiments on publicly available datasets show that the proposed framework not only effectively learns users' preferences for prediction, but also achieves high compatibility with LSH stochasticity, producing superior post-LSH indexing performance compared to state-of-the-art baselines.
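The encoding step can be sketched with random-hyperplane (sign random projection) hashing, one common LSH family. The abstract does not say which family is used, so this choice is an assumption, and all function names are hypothetical. The sketch shows only how randomly drawn hash functions turn real-valued vectors into binary codes, not the proposed learning framework.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes (the random hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(1 if sum(w * v for w, v in zip(plane, vector)) >= 0 else 0
                 for plane in hyperplanes)

def hamming(a, b):
    """Number of differing bits between two codes of equal length."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(num_bits=16, dim=3, seed=42)
user = [0.5, -1.0, 2.0]
code_user = hash_code(user, planes)
code_scaled = hash_code([2 * v for v in user], planes)
code_opposite = hash_code([-v for v in user], planes)
```

Because each bit depends only on the sign of a projection, rescaling a vector leaves its code unchanged, while a different random draw of hyperplanes can change which items collide with a user. That dependence on the draw is the stochasticity the abstract refers to.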

Automated Positioning of Anchors for Personal Fall Arrest Systems for Steep-Sloped Roofs

Buildings ◽

10.3390/buildings11010010 ◽

2020 ◽

Vol 11 (1) ◽

pp. 10

Author(s):

Azin Heidari ◽

Svetlana Olbina ◽

Scott Glick

Keyword(s):

Optimization Algorithm ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

K Nearest Neighbor ◽

Neighbor Search ◽

Anchor Points ◽

Fall Protection ◽

Fall Arrest ◽

Python Programming ◽

Machine Readable

Falls account for about one-third of all construction fatalities, with most fatalities occurring in the roofing trade. Even though a personal fall arrest system (PFAS) is required for fall protection, proper placement of PFAS anchor points is an issue, as evidenced by the high number of fatalities caused by incorrect anchor positioning. The research goal was to prove the concept of optimizing the location of PFAS anchor points on steep-sloped roofs. This goal was achieved by: (1) developing an algorithm for converting the required local jurisdiction construction regulations and standards for PFAS anchor positioning into machine-readable rules; and (2) developing and validating an algorithm for optimizing the location of PFAS anchor points. The K-Nearest Neighbor Search (KNNS) optimization algorithm was selected in this research and was implemented in a standalone computer tool using the Python programming language. The tool calculates the potential anchor locations that satisfy the fall clearance and swing hazard requirements and then displays the anchor locations both graphically and numerically. The optimization algorithm was validated using the K-fold Cross-Validation method, which showed that the algorithm was adequately accurate and consistent. The research contribution is the proof of concept that developing an optimization algorithm and an automated field-level tool for optimal selection of PFAS anchor points is possible; further research and refinement could help steep-sloped roofing companies improve their safety practices.

CIMON: Towards High-quality Hash Codes

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/125 ◽

2021 ◽

Author(s):

Xiao Luo ◽

Daqing Wu ◽

Zeyu Ma ◽

Chong Chen ◽

Minghua Deng ◽

...

Keyword(s):

Semantic Similarity ◽

Nearest Neighbor ◽

Feature Space ◽

Nearest Neighbor Search ◽

Point Pair ◽

Neighbor Search ◽

Wide Range ◽

Benchmark Datasets ◽

Similarity Structure ◽

Hash Codes

Recently, hashing has been widely used in approximate nearest neighbor search for its storage and computational efficiency. Most unsupervised hashing methods learn to map images into semantic similarity-preserving hash codes by constructing a local semantic similarity structure from a pre-trained model as guiding information, i.e., treating each point pair as similar if their distance is small in feature space. However, due to the inefficient representation ability of the pre-trained model, many false positives and negatives are introduced into the local semantic similarity and lead to error propagation during hash code learning. Moreover, few methods consider the robustness of models, which makes the hash codes unstable under disturbance. In this paper, we propose a new method named Comprehensive sImilarity Mining and cOnsistency learNing (CIMON). First, we use global refinement and similarity statistical distribution to obtain reliable and smooth guidance. Second, both semantic and contrastive consistency learning are introduced to derive hash codes that are both disturb-invariant and discriminative. Extensive experiments on several benchmark datasets show that the proposed method outperforms a wide range of state-of-the-art methods in both retrieval performance and robustness.

NBC: An Efficient Hierarchical Clustering Algorithm for Large Datasets

International Journal of Semantic Computing ◽

10.1142/s1793351x15400085 ◽

2015 ◽

Vol 09 (03) ◽

pp. 307-331 ◽

Cited By ~ 1

Author(s):

Wei Zhang ◽

Gongxuan Zhang ◽

Yongli Wang ◽

Zhaomeng Zhu ◽

Tao Li

Keyword(s):

Hierarchical Clustering ◽

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Large Datasets ◽

Nearest Neighbor Search ◽

Large Dataset ◽

Neighbor Search ◽

Hierarchical Clustering Algorithm

Nearest neighbor search is a key technique in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n^3), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain, SLINK and CLINK) is O(n^2). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log^2 n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC), which can be adapted to parallel and distributed computing frameworks. The experimental results demonstrate that our algorithm is practical for large datasets.

The Earth Mover’s Distance as a Metric for the Space of Inorganic Compositions

10.26434/chemrxiv.12777566.v1 ◽

2020 ◽

Author(s):

Cameron Hargreaves ◽

Matthew Dyer ◽

Michael Gaultois ◽

Vitaliy Kurlin ◽

Matthew J Rosseinsky

Keyword(s):

Euclidean Distance ◽

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Inorganic Crystal Structure Database ◽

Chemical Similarity ◽

Earth Mover's Distance ◽

Neighbor Search ◽

The Earth ◽

Binary Compounds

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that with no supervision the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.
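Because the elements sit on a one-dimensional scale, the EMD between two compositions can be sketched as the accumulated difference in element ratios, weighted by the gaps between scale positions. The sketch below assumes each composition's ratios sum to 1. The function name `emd_1d` is hypothetical, and the positions in `toy_scale` are made-up placeholders, not real modified Pettifor numbers.

```python
def emd_1d(comp_a, comp_b, scale):
    """Earth Mover's Distance between two compositions on a 1-D element scale.

    comp_a, comp_b: dict mapping element symbol -> ratio (each sums to 1).
    scale: dict mapping element symbol -> position on the scale.
    """
    # Visit every element present in either composition, ordered by position.
    elements = sorted(set(comp_a) | set(comp_b), key=lambda e: scale[e])
    total = 0.0
    carried = 0.0  # surplus ratio that still has to move further along the scale
    for current, nxt in zip(elements, elements[1:]):
        carried += comp_a.get(current, 0.0) - comp_b.get(current, 0.0)
        # Moving the surplus to the next position costs |surplus| * gap.
        total += abs(carried) * (scale[nxt] - scale[current])
    return total

# Placeholder positions for illustration only (NOT the modified Pettifor scale).
toy_scale = {"Na": 1, "K": 2, "Cl": 10}
nacl = {"Na": 0.5, "Cl": 0.5}
kcl = {"K": 0.5, "Cl": 0.5}
distance = emd_1d(nacl, kcl, toy_scale)
```

With these placeholder positions, the only work needed is moving half of the composition one step along the scale, from Na to K. Compositions that differ in neighboring elements therefore stay close, whereas the Euclidean distance between the ratio vectors would ignore how near the two elements are on the scale.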

Adaptive bit allocation hashing for approximate nearest neighbor search

Neurocomputing ◽

10.1016/j.neucom.2014.10.042 ◽

2015 ◽

Vol 151 ◽

pp. 719-728 ◽

Cited By ~ 4

Author(s):

Qin-Zhen Guo ◽

Zhi Zeng ◽

Shuwu Zhang

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Bit Allocation ◽

Approximate Nearest Neighbor Search ◽

Approximate Nearest Neighbor ◽

Neighbor Search

PCT: Point cloud transformer

Computational Visual Media ◽

10.1007/s41095-021-0229-5 ◽

2021 ◽

Vol 7 (2) ◽

pp. 187-199

Author(s):

Meng-Hao Guo ◽

Jun-Xiong Cai ◽

Zheng-Ning Liu ◽

Tai-Jiang Mu ◽

Ralph R. Martin ◽

...

Keyword(s):

Language Processing ◽

Point Cloud ◽

Nearest Neighbor ◽

Semantic Segmentation ◽

Nearest Neighbor Search ◽

Local Context ◽

Irregular Domain ◽

Cloud Processing ◽

Neighbor Search ◽

Farthest Point

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited for point cloud learning. To better capture local context within the point cloud, we enhance the input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
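Farthest point sampling, mentioned above as one ingredient of the input embedding, can be sketched as a greedy loop that repeatedly picks the point farthest from everything chosen so far. This is the generic textbook procedure, assuming Euclidean distance. The function name and the `start` parameter are hypothetical, and nothing here reflects how PCT implements or batches this step.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices that spread out over the point set."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next sample is the point farthest from all chosen points.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        # Update nearest-chosen distances with the newly added point.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

cloud = [(0, 0), (1, 0), (10, 0), (5, 0)]
sampled = farthest_point_sampling(cloud, k=3)
```

Each sampled index could then serve as the center of a local neighborhood gathered by nearest neighbor search, which is one way the two operations named in the abstract can be combined.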

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data ◽

10.1145/3318464.3380600 ◽

2020 ◽

Author(s):

Conglong Li ◽

Minjia Zhang ◽

David G. Andersen ◽

Yuxiong He

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbor Search ◽

Early Termination ◽

Approximate Nearest Neighbor Search ◽

Approximate Nearest Neighbor ◽

Neighbor Search
