An integrated approach for different attribute types in nearest neighbour classification

1996 ◽  
Vol 11 (3) ◽  
pp. 245-252
Author(s):  
W. Z. Liu

Abstract
The basic nearest neighbour algorithm works by storing the training instances and classifying a new case by predicting that it has the same class as its nearest stored instance. To measure the distance between instances, a distance metric is needed. When all attributes have numeric values, the conventional nearest neighbour method treats examples as points in feature space and uses Euclidean distance as the metric. In tasks with only nominal attributes, the simple “overlap” metric is usually used. To handle classification tasks with mixed attribute types, the two different metrics are simply combined. Work by researchers in the machine learning field has shown that this approach performs poorly. This paper studies a more recently developed distance metric and shows that it is capable of measuring the importance of different attributes. With the use of discretisation for numeric-valued attributes, this method provides an integrated way of dealing with problem domains containing mixtures of attribute types. Through detailed analyses, the paper aims to provide further insight into nearest neighbour classification techniques and to promote wider use of this type of classification algorithm.
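The combined metric the paper criticises can be sketched as a 1-NN classifier that mixes squared Euclidean distance over numeric attributes with the overlap metric (0 if equal, 1 otherwise) over nominal ones. The data and attribute names below are invented for illustration:

```python
import math

def heterogeneous_distance(x, y, numeric_idx, nominal_idx):
    """Naive combined metric: squared Euclidean over numeric attributes,
    overlap (0 if equal, 1 if different) over nominal attributes."""
    num = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    nom = sum(0 if x[i] == y[i] else 1 for i in nominal_idx)
    return math.sqrt(num + nom)

def nearest_neighbour(query, training, labels, numeric_idx, nominal_idx):
    """1-NN: predict the class of the closest stored instance."""
    dists = [heterogeneous_distance(query, t, numeric_idx, nominal_idx)
             for t in training]
    return labels[dists.index(min(dists))]

# Mixed-type instances: (normalised size, colour, shape) -- made-up data.
train = [(0.2, "red", "round"), (0.9, "blue", "square"), (0.25, "red", "square")]
y = ["A", "B", "A"]
print(nearest_neighbour((0.22, "red", "round"), train, y, [0], [1, 2]))  # A
```

Note that the overlap term treats every nominal mismatch as equally important, which is one reason this simple combination performs poorly compared with metrics that weight attributes by relevance.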

Author(s):  
Parag Jain

Most popular machine learning algorithms, such as k-nearest neighbour, k-means and SVM, use a metric to measure the distance (or similarity) between data instances, and the performance of these algorithms depends heavily on the metric being used. In the absence of prior knowledge about the data, we can only use general-purpose metrics such as Euclidean distance, cosine similarity or Manhattan distance, but these metrics often fail to capture the true behaviour of the data, which directly affects the performance of the learning algorithm. The solution is to tune the metric to the data and the problem; however, manually deriving a metric for high-dimensional data, which is often difficult even to visualise, is not just tedious but extremely difficult. This motivates the effort devoted to metric learning, which seeks a metric that respects the geometry of the data. The goal of a metric learning algorithm is to learn a metric that assigns small distances to similar points and relatively large distances to dissimilar points.
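The goal can be illustrated with the simplest learnable metric: a diagonal (per-attribute weight) variant of the Mahalanobis distance. The weights below are hand-set stand-ins for values a metric learning algorithm would fit from similar/dissimilar pairs:

```python
import math

def weighted_distance(x, y, w):
    """Diagonal Mahalanobis-style distance: sqrt(sum_i w_i * (x_i - y_i)^2).
    Larger w_i makes attribute i matter more; all-ones weights recover
    plain Euclidean distance."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)))

a, b = (1.0, 0.0), (0.0, 2.0)
print(weighted_distance(a, b, (1.0, 1.0)))  # Euclidean: sqrt(5)
print(weighted_distance(a, b, (4.0, 0.0)))  # learned metric ignoring dim 2: 2.0
```

A full Mahalanobis metric learned over all attribute pairs corresponds to replacing the weight vector with a positive semidefinite matrix, which is what most metric learning algorithms optimise.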


2019 ◽  
Vol 11 (1) ◽  
pp. 16-19
Author(s):  
Felix Indra Kurniadi ◽  
Vinnia Kemala Putri

White blood cells protect the human body from viruses, bacteria and other harmful substances. In this research, the Local Binary Pattern was proposed for feature extraction, with a nearest neighbour classifier using Euclidean distance, Chebyshev distance and Minkowski distance.
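The three distance measures can be sketched in a few lines; Minkowski distance with parameter p generalises the other two (p = 2 gives Euclidean, p → ∞ gives Chebyshev):

```python
def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def chebyshev(x, y):
    # Largest coordinate-wise difference.
    return max(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    # p = 1 gives Manhattan, p = 2 gives Euclidean, p -> inf gives Chebyshev.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

u, v = (0, 0), (3, 4)
print(euclidean(u, v))     # 5.0
print(chebyshev(u, v))     # 4
print(minkowski(u, v, 1))  # 7.0
```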


Author(s):  
Hongxing He ◽  
Simon Hawkins ◽  
Warwick Graco ◽  
Xin Yao ◽  
...  

In the k-Nearest Neighbour (kNN) algorithm, the classification of a new sample is determined by the class of its k nearest neighbours. The performance of the kNN algorithm is influenced by three main factors: (1) the distance metric used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. Using k = 1, 3, or 5 nearest neighbours, this study uses a Genetic Algorithm (GA) to find the optimal non-Euclidean distance metric in the kNN algorithm and examines two alternative methods (Majority Rule and Bayes Rule) to derive a classification from the k nearest neighbours. This modified algorithm was evaluated on two real-world medical fraud problems. The General Practitioner (GP) database is a 2-class problem in which GPs are classified as either practising appropriately or inappropriately. The 'Doctor-Shoppers' database is a 5-class problem in which patients are classified according to the likelihood that they are 'doctor-shoppers'. Doctor-shoppers are patients who consult many physicians in order to obtain multiple prescriptions of drugs of addiction in excess of their own therapeutic need. In both applications, classification accuracy was improved by optimising the distance metric in the kNN algorithm. The agreement rate on the GP dataset improved from around 70% (using Euclidean distance) to 78% (using an optimised distance metric), and from about 55% to 82% on the Doctor-Shoppers dataset. Differences in either the decision rule or the number of nearest neighbours had little or no impact on the classification performance of the kNN algorithm. The excellent performance of the kNN algorithm when the distance metric is optimised using a genetic algorithm paves the way for its application in the real-world fraud detection problems faced by the Health Insurance Commission (HIC).
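The idea of GA-optimising a distance metric for kNN can be sketched as follows. This is a toy reconstruction, not the authors' implementation: the dataset, GA operators and hyperparameters are invented for illustration, with the GA evolving per-feature weights scored by leave-one-out kNN accuracy:

```python
import random

random.seed(0)

def dist(x, y, w):
    """Weighted squared Euclidean distance (the candidate metric)."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y))

def knn_accuracy(w, X, y, k=1):
    """Leave-one-out accuracy of kNN under the weighted metric w."""
    hits = 0
    for i, xi in enumerate(X):
        ds = sorted((dist(xi, xj, w), yj)
                    for j, (xj, yj) in enumerate(zip(X, y)) if j != i)
        votes = [lab for _, lab in ds[:k]]
        hits += max(set(votes), key=votes.count) == y[i]
    return hits / len(X)

def ga_optimise(X, y, n_features, pop=20, gens=15):
    """Tiny GA: rank selection with elitism, uniform crossover, Gaussian mutation."""
    popn = [[random.random() for _ in range(n_features)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(popn, key=lambda w: -knn_accuracy(w, X, y))
        popn = scored[:2]  # elitism: keep the two fittest metrics
        while len(popn) < pop:
            p1, p2 = random.sample(scored[:10], 2)
            child = [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [max(0.0, g + random.gauss(0, 0.1)) for g in child]
            popn.append(child)
    return max(popn, key=lambda w: knn_accuracy(w, X, y))

# Made-up data: feature 0 separates the classes; feature 1 is pure noise.
X = [(0.1, 0.9), (0.2, 0.1), (0.8, 0.8), (0.9, 0.2)]
y = ["low", "low", "high", "high"]
best = ga_optimise(X, y, n_features=2)
print(knn_accuracy(best, X, y))
```

With unit weights the noisy second feature misleads the 1-NN rule on this data; a good evolved metric down-weights it, mirroring the paper's finding that metric optimisation, rather than the decision rule or k, drives the accuracy gain.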


Foods ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 763
Author(s):  
Ran Yang ◽  
Zhenbo Wang ◽  
Jiajia Chen

Mechanistic-modeling has been a useful tool to help food scientists in understanding complicated microwave-food interactions, but it cannot be directly used by food developers for food design due to its resource-intensive nature. This study developed and validated an integrated approach that coupled mechanistic-modeling and machine-learning to achieve efficient food product design (thickness optimization) with better heating uniformity. The mechanistic-modeling that incorporated electromagnetics and heat transfer was previously developed and validated extensively and was used directly in this study. A Bayesian optimization machine-learning algorithm was developed and integrated with the mechanistic-modeling. The integrated approach was validated by comparing the optimization performance with a parametric sweep approach, which is solely based on mechanistic-modeling. The results showed that the integrated approach had the capability and robustness to optimize the thickness of different-shape products using different initial training datasets with higher efficiency (45.9% to 62.1% improvement) than the parametric sweep approach. Three rectangular-shape trays with one optimized thickness (1.56 cm) and two non-optimized thicknesses (1.20 and 2.00 cm) were 3-D printed and used in microwave heating experiments, which confirmed the feasibility of the integrated approach in thickness optimization. The integrated approach can be further developed and extended as a platform to efficiently design complicated microwavable foods with multiple-parameter optimization.
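The structure of such an integrated loop can be caricatured as sequential model-based optimization. The sketch below substitutes a toy analytic response for the mechanistic model and a quadratic surrogate for the Gaussian-process machinery of real Bayesian optimization, keeping only the loop's shape: simulate, fit a surrogate to recent observations, then evaluate the surrogate's predicted minimizer next.

```python
def heating_nonuniformity(t):
    """Stand-in for the mechanistic model: a hypothetical response surface
    with the most uniform heating at a thickness of 1.56 cm."""
    return (t - 1.56) ** 2

def fit_quadratic(pts):
    """Interpolating quadratic a*t^2 + b*t + c through three (t, y) points,
    solved by Cramer's rule on the 3x3 Vandermonde system; returns (a, b)."""
    (t1, y1), (t2, y2), (t3, y3) = pts
    det = lambda m: (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                     - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                     + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det([[t1 * t1, t1, 1], [t2 * t2, t2, 1], [t3 * t3, t3, 1]])
    a = det([[y1, t1, 1], [y2, t2, 1], [y3, t3, 1]]) / d
    b = det([[t1 * t1, y1, 1], [t2 * t2, y2, 1], [t3 * t3, y3, 1]]) / d
    return a, b

# Initial "simulations" at three candidate thicknesses (cm).
obs = [(t, heating_nonuniformity(t)) for t in (1.0, 1.5, 2.0)]
for _ in range(3):
    a, b = fit_quadratic(obs[-3:])
    t_next = min(max(-b / (2 * a), 1.0), 2.0)  # clamp to feasible thicknesses
    if any(abs(t_next - t) < 1e-9 for t, _ in obs):
        break  # surrogate minimum already simulated: converged
    obs.append((t_next, heating_nonuniformity(t_next)))

best_t = min(obs, key=lambda p: p[1])[0]
print(round(best_t, 2))  # 1.56
```

The efficiency gain the study reports comes from exactly this substitution: each surrogate-guided evaluation replaces many of the exhaustive mechanistic-model runs a parametric sweep would need.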


2020 ◽  
Vol 15 (S359) ◽  
pp. 40-41
Author(s):  
L. M. Izuti Nakazono ◽  
C. Mendes de Oliveira ◽  
N. S. T. Hirata ◽  
S. Jeram ◽  
A. Gonzalez ◽  
...  

Abstract
We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. In terms of quasar classification, we achieved 95.49% for precision and 95.26% for recall using a Random Forest algorithm. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
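The reported scores follow the usual definitions of precision (fraction of predicted quasars that really are quasars) and recall (fraction of true quasars recovered); a minimal sketch with an invented five-object sample:

```python
def precision_recall(y_true, y_pred, positive="quasar"):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp), tp / (tp + fn)

# Made-up labels for five objects.
y_true = ["quasar", "quasar", "star", "galaxy", "quasar"]
y_pred = ["quasar", "star",   "star", "quasar", "quasar"]
p, r = precision_recall(y_true, y_pred)
print(p, r)  # both 2/3 on this toy sample
```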


Procedia CIRP ◽  
2021 ◽  
Vol 96 ◽  
pp. 272-277
Author(s):  
Hannah Lickert ◽  
Aleksandra Wewer ◽  
Sören Dittmann ◽  
Pinar Bilge ◽  
Franz Dietrich

Animals ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 50
Author(s):  
Jennifer Salau ◽  
Jan Henning Haas ◽  
Wolfgang Junge ◽  
Georg Thaller

Machine learning methods have become increasingly important in animal science, and the success of an automated application using machine learning often depends on the right choice of method for the respective problem and data set. The recognition of objects in 3D data is still a widely studied topic and especially challenging when it comes to the partition of objects into predefined segments. In this study, two machine learning approaches were utilized for the recognition of body parts of dairy cows from 3D point clouds, i.e., sets of data points in space. The low-cost, off-the-shelf depth sensor Microsoft Kinect V1 has been used in various studies related to dairy cows. The 3D data were gathered from a multi-Kinect recording unit which was designed to record Holstein Friesian cows from both sides in free walking from three different camera positions. For the determination of the body parts head, rump, back, legs and udder, five properties of the pixels in the depth maps (row index, column index, depth value, variance, mean curvature) were used as features in the training data set. For each camera position, a k nearest neighbour classifier and a neural network were trained and compared afterwards. Both methods showed small Hamming losses (between 0.007 and 0.027 for k nearest neighbour (kNN) classification and between 0.045 and 0.079 for neural networks) and could be considered successful regarding the classification of pixels to body parts. However, the kNN classifier was superior, reaching overall accuracies of 0.888 to 0.976, varying with the camera position. Precision and recall values associated with individual body parts ranged from 0.84 to 1 and from 0.83 to 1, respectively. Once trained, however, kNN classification incurs higher runtime costs in terms of computational time and memory compared to the neural networks. The cost vs. accuracy ratio for each methodology needs to be taken into account in the decision of which method should be implemented in the application.
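The Hamming loss used to compare the two methods can be sketched as follows, with invented per-pixel indicator vectors over the five body parts:

```python
def hamming_loss(true_labels, pred_labels):
    """Fraction of individual label assignments that disagree,
    averaged over all samples and all label positions."""
    total = sum(len(t) for t in true_labels)
    wrong = sum(ti != pi for t, p in zip(true_labels, pred_labels)
                for ti, pi in zip(t, p))
    return wrong / total

# Indicator vectors over (head, rump, back, legs, udder) for three pixels.
truth = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 0, 1, 0)]
preds = [(1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0)]
print(hamming_loss(truth, preds))  # 2 wrong of 15 label slots
```

A perfect classifier scores 0; the study's kNN losses of 0.007 to 0.027 mean fewer than 3% of the pixel-to-body-part label assignments were wrong.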


Author(s):  
Brian Murphy ◽  
Geraldine B. Boylan ◽  
Gordon Lightbody ◽  
William P. Marnane
