Pattern and Feature Selection by Genetic Algorithms in Nearest Neighbor Classification

Author(s):
Hisao Ishibuchi, Tomoharu Nakashima

This paper proposes a genetic-algorithm-based approach for finding a compact reference set in nearest neighbor classification. The reference set is designed by selecting a small number of reference patterns from a large number of training patterns using a genetic algorithm. The genetic algorithm also removes unnecessary features. The reference set in our nearest neighbor classification consists of selected patterns with selected features. A binary string is used for representing the inclusion (or exclusion) of each pattern and feature in the reference set. Our goal is to minimize the number of selected patterns, to minimize the number of selected features, and to maximize the classification performance of the reference set. Computer simulations on commonly used data sets examine the effectiveness of our approach.
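As a rough illustration of the encoding described above, the sketch below evolves a single binary string covering both patterns and features, scoring each string by 1-NN accuracy on the training data (excluding self-matches) minus penalties on the numbers of selected patterns and features. It is a minimal sketch, not the authors' algorithm: the fitness weights, tournament selection, uniform crossover, and mutation rate are all illustrative assumptions.

```python
# Minimal GA sketch for joint pattern/feature selection in 1-NN classification.
# Illustrative only: weights, operators, and parameters are assumptions, not
# the exact settings of the paper.
import numpy as np

rng = np.random.default_rng(0)

def fitness(bits, X, y, w_acc=10.0, w_pat=1.0, w_feat=1.0):
    """Weighted fitness: accuracy reward minus penalties for the sizes of
    the selected pattern and feature subsets (weights are assumptions)."""
    n, d = X.shape
    pat = bits[:n].astype(bool)           # which patterns join the reference set
    feat = bits[n:].astype(bool)          # which features are retained
    if pat.sum() == 0 or feat.sum() == 0:
        return -np.inf                    # degenerate reference set
    R, yR = X[pat][:, feat], y[pat]
    correct = 0
    for i in range(n):                    # classify every training pattern
        dists = np.sum((R - X[i, feat]) ** 2, axis=1)
        if pat[i]:                        # a reference pattern must not
            dists[np.flatnonzero(pat) == i] = np.inf  # match itself
        correct += (yR[np.argmin(dists)] == y[i])
    return w_acc * correct / n - w_pat * pat.mean() - w_feat * feat.mean()

def ga_select(X, y, pop_size=30, gens=50, p_mut=0.02):
    """Evolve binary strings of length n_patterns + n_features."""
    n, d = X.shape
    L = n + d
    pop = rng.integers(0, 2, size=(pop_size, L))
    for _ in range(gens):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        new_pop = [pop[fits.argmax()].copy()]          # elitism
        while len(new_pop) < pop_size:
            a = rng.integers(0, pop_size, 2)           # binary tournament
            b = rng.integers(0, pop_size, 2)
            p1, p2 = pop[a[np.argmax(fits[a])]], pop[b[np.argmax(fits[b])]]
            mask = rng.integers(0, 2, L).astype(bool)  # uniform crossover
            child = np.where(mask, p1, p2)
            flip = rng.random(L) < p_mut               # bit-flip mutation
            new_pop.append(np.where(flip, 1 - child, child))
        pop = np.array(new_pop)
    fits = np.array([fitness(ind, X, y) for ind in pop])
    return pop[fits.argmax()]
```

Calling ga_select(X, y) returns the best string found; its first n bits index the selected reference patterns and the remaining d bits the retained features.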

Hierarchical Distance Metric Learning for Large Margin Nearest Neighbor Classification

Author(s):
Shiliang Sun, Qiaona Chen

Distance metric learning is a powerful tool for improving performance in classification, clustering and regression tasks. Many techniques have been proposed for distance metric learning based on convex programming, kernel learning, dimension reduction and large margins. The recently proposed large margin nearest neighbor classification (LMNN) improves the performance of k-nearest neighbor (k-NN) classification with a learned global distance metric. However, it does not consider the locality of data distributions. We propose a novel local distance metric learning method, hierarchical distance metric learning (HDM), which first builds a hierarchical structure by grouping data points according to overlapping ratios that we define, and then learns distance metrics sequentially. In this paper, we combine HDM with LMNN and further propose a new method, hierarchical distance metric learning for large margin nearest neighbor classification (HLMNN). Experiments are performed on many artificial and real-world data sets. Comparisons with the traditional k-NN and the state-of-the-art LMNN show the effectiveness of the proposed HLMNN.
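To make the group-then-learn structure concrete, here is a minimal Python sketch under simplifying assumptions: k-means stands in for the paper's overlap-ratio grouping, and whitening each group's covariance stands in for a learned LMNN metric. It illustrates the two-stage idea (partition the data, then classify with a local Mahalanobis metric), not HLMNN itself.

```python
# Illustrative sketch of local (per-group) metric learning for nearest-neighbor
# classification. Stand-ins, not the paper's method: k-means replaces the
# overlap-ratio hierarchy, covariance whitening replaces LMNN.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def whitening_metric(X, eps=1e-3):
    """Mahalanobis transform L with d(x, y) = ||L (x - y)||, built from the
    group's covariance (a simple stand-in for a learned metric)."""
    cov = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Stage 1: group the training data (here: plain k-means).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(Xtr)
groups = km.labels_

# Stage 2: learn one metric per group, sequentially.
metrics = {g: whitening_metric(Xtr[groups == g]) for g in np.unique(groups)}

def predict(x):
    g = km.predict(x[None, :])[0]                 # route to the nearest group
    Xg, yg, L = Xtr[groups == g], ytr[groups == g], metrics[g]
    d = np.linalg.norm((Xg - x) @ L.T, axis=1)    # local Mahalanobis distance
    return yg[np.argmin(d)]                       # 1-NN under the local metric

acc = np.mean([predict(x) == t for x, t in zip(Xte, yte)])
print(f"local-metric 1-NN accuracy: {acc:.3f}")
```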


Entropy, 2021, Vol. 23(2), 149
Author(s):  
Stephen Whitelam

A conceptually simple way to classify images is to directly compare test-set data and training-set data. The accuracy of this approach is limited by the method of comparison used, and by the extent to which the training-set data cover configuration space. Here we show that this coverage can be substantially increased using coarse-graining (replacing groups of images by their centroids) and stochastic sampling (using distinct sets of centroids in combination). We use the MNIST and Fashion-MNIST data sets to show that a principled coarse-graining algorithm can convert training images into fewer image centroids without loss of accuracy of classification of test-set images by nearest-neighbor classification. Distinct batches of centroids can be used in combination as a means of stochastically sampling configuration space, and can classify test-set data more accurately than can the unaltered training set. On the MNIST and Fashion-MNIST data sets this approach converts nearest-neighbor classification from a mid-ranking to an upper-ranking member of the set of classical machine-learning techniques.
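A minimal sketch of the two ingredients follows, with per-class k-means as a stand-in for the paper's principled coarse-graining algorithm and scikit-learn's digits data standing in for MNIST: each batch of class centroids yields a 1-NN classifier, and distinct batches (different random seeds) are combined by majority vote.

```python
# Illustrative sketch: centroid coarse-graining plus stochastic sampling for
# nearest-neighbor classification. Per-class k-means is a stand-in for the
# paper's coarse-graining rule; digits stands in for MNIST.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

def centroid_batch(seed, per_class=10):
    """Coarse-grain: replace each class's images by k-means centroids."""
    C, labels = [], []
    for c in np.unique(ytr):
        km = KMeans(n_clusters=per_class, n_init=5, random_state=seed)
        km.fit(Xtr[ytr == c])
        C.append(km.cluster_centers_)
        labels.append(np.full(per_class, c))
    return np.vstack(C), np.concatenate(labels)

# Distinct centroid batches stochastically sample configuration space;
# combine their 1-NN predictions by majority vote.
votes = []
for seed in range(5):
    C, yC = centroid_batch(seed)
    knn = KNeighborsClassifier(n_neighbors=1).fit(C, yC)
    votes.append(knn.predict(Xte))
votes = np.array(votes)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", np.mean(majority == yte))
```

Note that each batch here is far smaller than the training set (100 centroids versus roughly 1,250 images), so the ensemble trades memory for a modest number of extra distance computations.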

