A Large-Scale k-Nearest Neighbor Classification Algorithm Based on Neighbor Relationship Preservation

2022, Vol 2022, pp. 1-11
Author(s): Yunsheng Song, Xiaohan Kong, Chao Zhang

Owing to the absence of assumptions about the underlying data distribution and its strong generalization ability, the k-nearest neighbor (kNN) classification algorithm is widely applied to face recognition, text classification, sentiment analysis, and other fields. However, kNN must compute the similarity between the unlabeled instance and all training instances during prediction, which makes it difficult to handle large-scale data. To overcome this difficulty, a growing number of acceleration algorithms based on data partition have been proposed, but they lack a theoretical analysis of how data partition affects classification performance. This paper analyzes that effect theoretically using empirical risk minimization and proposes a large-scale k-nearest neighbor classification algorithm based on neighbor relationship preservation. The search for the nearest neighbors is cast as a constrained optimization problem, and an estimate is given of the difference in the objective-function value between the optimal solutions with and without data partition. According to this estimate, minimizing the similarity between instances assigned to different subsets largely reduces the effect of data partition, so the minibatch k-means clustering algorithm is chosen to perform the partition for its effectiveness and efficiency. Finally, the nearest neighbors of a test instance are searched repeatedly in the set formed by successively merging candidate subsets until the neighbors no longer change, where the candidate subsets are selected by the similarity between the test instance and the cluster centers. Experimental results on public datasets show that the proposed algorithm largely preserves the nearest neighbors found by the original kNN classification algorithm with no significant difference in classification accuracy, and that it outperforms two state-of-the-art algorithms.
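A minimal sketch of the partition-based search described above, not the authors' exact implementation: the training set is split with minibatch k-means, candidate clusters are ranked by center proximity and merged one at a time, and the search stops once the k nearest neighbors no longer change. The function name `partitioned_knn_predict` and all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def partitioned_knn_predict(X_train, y_train, x_test, k=5, n_clusters=20):
    # Partition the training data with minibatch k-means.
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=0).fit(X_train)
    # Rank clusters by the distance between the test instance and each center.
    order = np.argsort(np.linalg.norm(km.cluster_centers_ - x_test, axis=1))

    candidate_idx = np.array([], dtype=int)
    prev_neighbors = None
    for c in order:
        # Merge the next candidate subset into the search pool.
        candidate_idx = np.concatenate([candidate_idx, np.where(km.labels_ == c)[0]])
        dists = np.linalg.norm(X_train[candidate_idx] - x_test, axis=1)
        neighbors = candidate_idx[np.argsort(dists)[:k]]
        # Stop once the k nearest neighbors no longer change.
        if prev_neighbors is not None and set(neighbors) == set(prev_neighbors):
            break
        prev_neighbors = neighbors

    # Majority vote among the retained neighbors.
    labels, counts = np.unique(y_train[prev_neighbors], return_counts=True)
    return labels[np.argmax(counts)]
```

In this sketch the stopping rule mirrors the abstract: once adding a further candidate subset leaves the neighbor set unchanged, more distant clusters cannot improve the result and the search ends early.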

2017, Vol 2017, pp. 1-10
Author(s): Wenjuan Shao, Qingguo Shen, Xianli Jin, Liaoruo Huang, Jingjing Chen

Social interest detection is a new computing paradigm that processes a great variety of large-scale resources, and effective classification of these resources is necessary for social interest detection. In this paper, we describe concepts and principles of classification and present a novel classification algorithm based on nonuniform granularity. A clustering algorithm is used to generate a clustering pedigree chart, and cutting the chart at suitable classification cutting values yields branches that are used as categories. The size of the cutting value is vital to performance and is adapted dynamically in the proposed algorithm. Experimental results on blog posts illustrate the effectiveness of the proposed algorithm, and comparisons with Naive Bayes, k-nearest neighbor, and other methods validate its better classification performance for large-scale resources.
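A minimal sketch of the cutting step, assuming the "clustering pedigree chart" corresponds to a hierarchical-clustering dendrogram: cutting the tree at a classification cutting value yields branches that serve as categories. The feature matrix `doc_vectors`, the linkage method, and the fixed threshold are illustrative placeholders; the paper adapts the cutting value dynamically.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

doc_vectors = np.random.rand(100, 50)          # e.g. feature vectors of blog posts
tree = linkage(doc_vectors, method="average")  # build the clustering pedigree chart

# A larger cutting value merges more branches (coarser categories);
# a smaller one keeps fine-grained categories. Fixed here for illustration.
cutting_value = 2.5
categories = fcluster(tree, t=cutting_value, criterion="distance")
print("number of categories:", len(np.unique(categories)))
```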


Biomolecules, 2020, Vol 10 (12), pp. 1605
Author(s): Christian Feldmann, Dimitar Yonchev, Jürgen Bajorath

Predicting compounds with single- and multi-target activity and exploring the origins of compound specificity and promiscuity are of high interest for chemical biology and drug discovery. We present a large-scale analysis of compound promiscuity with two major components. First, high-confidence datasets of compounds with multi-target activity and corresponding single-target activity were extracted from biological screening data; positive and negative assay results were taken into account and data completeness was ensured. Second, these datasets were investigated using diagnostic machine learning to systematically distinguish between compounds with multi- and single-target activity. Models built on the basis of chemical structure consistently produced meaningful predictions, providing evidence for structural features that differentiate promiscuous and non-promiscuous compounds. Machine learning under varying conditions with modified datasets revealed a strong influence of nearest-neighbor relationships on the predictions: many multi-target compounds were more similar to other multi-target compounds than to single-target compounds, and vice versa, which resulted in consistently accurate predictions. The results of our study confirm the presence of structural relationships that differentiate promiscuous and non-promiscuous compounds.
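A minimal sketch of the kind of nearest-neighbor diagnostic the abstract points to: for each compound, check whether its nearest structural neighbor carries the same multi-/single-target label. The abstract does not specify the descriptors or models used, so the binary fingerprint matrix, the Jaccard metric, and the agreement statistic below are all illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(200, 512)).astype(bool)  # placeholder binary fingerprints
labels = rng.integers(0, 2, size=200)                            # 1 = multi-target, 0 = single-target

# Find each compound's nearest neighbor (index 0 is the compound itself).
nn = NearestNeighbors(n_neighbors=2, metric="jaccard").fit(fingerprints)
_, idx = nn.kneighbors(fingerprints)
nearest = idx[:, 1]

# Fraction of compounds whose nearest neighbor shares their label; a high value
# reflects the neighbor structure that drives consistently accurate predictions.
agreement = np.mean(labels == labels[nearest])
print(f"nearest-neighbor label agreement: {agreement:.2f}")
```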

