An integrated approach for different attribute types in nearest neighbour classification

W. Z. Liu

doi:10.1017/s0269888900007906

The Knowledge Engineering Review ◽

1996 ◽

Vol 11 (3) ◽

pp. 245-252

Keyword(s):

Machine Learning ◽

Euclidean Distance ◽

Integrated Approach ◽

Nearest Neighbour ◽

Distance Metric ◽

Classification Techniques ◽

Feature Spaces ◽

Classification Tasks ◽

Combined Work ◽

Mixed Types

The basic nearest neighbour algorithm works by storing the training instances and classifying a new case by predicting that it has the same class as its nearest stored instance. Measuring the distance between instances requires a distance metric. When all attributes are numeric, the conventional nearest neighbour method treats examples as points in a feature space and uses Euclidean distance as the metric. In tasks with only nominal attributes, the simple "overlap" metric is usually used. For classification tasks with mixed attribute types, the two metrics are simply combined, and work by researchers in the machine learning field has shown that this approach performs poorly. This paper studies a more recently developed distance metric and shows that it is capable of measuring the importance of different attributes. When numeric attributes are discretised, this metric provides an integrated way of dealing with problem domains that mix attribute types. Through detailed analyses, the paper aims to provide further insight into nearest neighbour classification techniques and to promote wider use of this type of classification algorithm.
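
To make the contrast in this abstract concrete, here is a minimal Python sketch of basic 1-NN with two kinds of metric: the simple combination of squared numeric differences and the overlap metric, and a value-difference style metric computed from class-conditional value frequencies, with numeric attributes first mapped to intervals by equal-width discretisation. The abstract does not name the "more recently developed" metric; using a metric in the spirit of Stanfill and Waltz's Value Difference Metric is an assumption here, as are the exponent of 1, the equal-width binning scheme and every function name.

```python
import math
from collections import Counter, defaultdict


def overlap(a, b):
    """Overlap metric for one nominal attribute: 0 if the values match, else 1."""
    return 0.0 if a == b else 1.0


def combined_distance(u, v, numeric):
    """Simple mixed-type combination: squared differences on the attribute
    indices in `numeric` (assumed pre-scaled), overlap on all other attributes."""
    total = 0.0
    for i, (a, b) in enumerate(zip(u, v)):
        total += (a - b) ** 2 if i in numeric else overlap(a, b)
    return math.sqrt(total)


def equal_width_discretiser(values, n_bins):
    """ASSUMED scheme: map a numeric attribute onto n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    return lambda x: max(0, min(int((x - lo) / width), n_bins - 1))


def fit_value_difference(X, labels):
    """Estimate P(class | value) for every attribute of nominal training data."""
    classes = sorted(set(labels))
    tables = []
    for i in range(len(X[0])):
        counts = defaultdict(Counter)
        for row, label in zip(X, labels):
            counts[row[i]][label] += 1
        tables.append({value: {c: cnt[c] / sum(cnt.values()) for c in classes}
                       for value, cnt in counts.items()})
    return classes, tables


def value_difference(u, v, classes, tables):
    """Sum over attributes of sum_c |P(c|u_i) - P(c|v_i)|.
    ASSUMPTION: exponent 1; unseen values get all-zero class probabilities."""
    total = 0.0
    for table, a, b in zip(tables, u, v):
        pa, pb = table.get(a, {}), table.get(b, {})
        total += sum(abs(pa.get(c, 0.0) - pb.get(c, 0.0)) for c in classes)
    return total


def nearest_neighbour(train_X, train_y, query, dist):
    """Basic 1-NN: predict the class of the closest stored instance."""
    best = min(range(len(train_X)), key=lambda j: dist(train_X[j], query))
    return train_y[best]


# Toy nominal data: attribute 0 predicts the class, attribute 1 carries no class information.
X = [["red", "x"], ["red", "y"], ["blue", "x"], ["blue", "y"]]
y = ["a", "a", "b", "b"]
classes, tables = fit_value_difference(X, y)
print(value_difference(["red", "x"], ["red", "y"], classes, tables))   # 0.0
print(value_difference(["red", "x"], ["blue", "x"], classes, tables))  # 2.0
```

In this toy data the second attribute is distributed identically across both classes, so a mismatch on it adds nothing to the value-difference distance, whereas the overlap metric would count it as a full unit. This is the sense in which such a metric reflects attribute importance.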

Metric Learning Tutorial

10.20944/preprints201809.0131.v1 ◽

2018 ◽

Author(s):

Parag Jain

Keyword(s):

Machine Learning ◽

Euclidean Distance ◽

Learning Algorithm ◽

Metric Learning ◽

General Purpose ◽

Small Distance ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Manhattan Distance ◽

Nearest Neighbour

Many popular machine learning algorithms, such as k-nearest neighbour, k-means and SVM, use a metric to measure the distance (or similarity) between data instances, and their performance depends heavily on the metric used. Without prior knowledge about the data, we can only use general-purpose metrics such as Euclidean distance, cosine similarity or Manhattan distance, but these metrics often fail to capture the structure of the data, which directly degrades the performance of the learning algorithm. One solution is to tune the metric to the data and the problem, but deriving a metric by hand for high-dimensional data, which is often difficult even to visualize, is both tedious and extremely difficult. This motivates metric learning: learning a metric that respects the geometry of the data. The goal of a metric learning algorithm is to learn a metric that assigns small distances to similar points and relatively large distances to dissimilar points.
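
As a minimal illustration of that goal, the Python sketch below parameterises a Mahalanobis-style distance d_M(x, y) = sqrt((x - y)^T M (x - y)) with a positive semi-definite matrix M, and "learns" a diagonal M with a deliberately crude heuristic: each feature is weighted by the inverse of its average within-class variance. The heuristic and the function names are assumptions for illustration; the sketch is not presented as any specific published metric learning algorithm.

```python
import numpy as np


def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); M must be positive semi-definite."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ M @ diff))


def learn_diagonal_metric(X, labels, eps=1e-9):
    """ASSUMED illustrative heuristic, not a published method: weight each
    feature by 1 / (mean within-class variance), so features that vary a lot
    inside a class count less. Returns a diagonal PSD matrix."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    within = np.mean([X[labels == c].var(axis=0) for c in np.unique(labels)], axis=0)
    return np.diag(1.0 / (within + eps))


def loo_1nn_accuracy(X, labels, M):
    """Leave-one-out 1-NN accuracy under the metric defined by M."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    correct = 0
    for i in range(len(X)):
        d = [np.inf if j == i else mahalanobis(X[i], X[j], M) for j in range(len(X))]
        correct += int(labels[int(np.argmin(d))] == labels[i])
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
X = [[0.0, 0.0], [0.1, 10.0], [1.0, 0.0], [1.1, 10.0]]
labels = [0, 0, 1, 1]
print(loo_1nn_accuracy(X, labels, np.eye(2)))                         # 0.0 (Euclidean)
print(loo_1nn_accuracy(X, labels, learn_diagonal_metric(X, labels)))  # 1.0
```

Under plain Euclidean distance the noisy second feature dominates, so every point's nearest neighbour belongs to the other class; the learned diagonal metric shrinks that feature and the same-class points become nearest.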

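Comparison of Distance Metrics in Nearest Neighbour for White Blood Cell Classification (original title: Perbandingan Distance Metric pada Nearest Neighbour untuk Klasifikasi Sel Darah Putih)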

Jurnal ULTIMATICS ◽

10.31937/ti.v11i1.932 ◽

2019 ◽

Vol 11 (1) ◽

pp. 16-19

Author(s):

Felix Indra Kurniadi ◽

Vinnia Kemala Putri

Keyword(s):

Feature Extraction ◽

Human Body ◽

Blood Cells ◽

Euclidean Distance ◽

Local Binary Pattern ◽

White Blood Cells ◽

Nearest Neighbour ◽

Harmful Substance ◽

Distance Metric ◽

Minkowski Distance

White blood cells protect the human body from viruses, bacteria and other harmful substances. In this research, Local Binary Pattern (LBP) features are used for feature extraction, and nearest neighbour classification using Euclidean, Chebyshev and Minkowski distances is compared.
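
The two ingredients of this study can be sketched in Python as follows: a basic 3x3, 8-neighbour LBP histogram as the feature vector, and 1-NN classification with Minkowski distances, where order p = 2 gives Euclidean distance and p = infinity gives Chebyshev distance. The LBP variant (no interpolation, no uniform-pattern mapping), the neighbour ordering and the function names are assumptions; the abstract does not say which Minkowski order p was used.

```python
import numpy as np

# ASSUMED neighbour ordering: (row, col) offsets clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Basic 3x3 LBP: bit k of a pixel's code is 1 when neighbour k >= centre.
    Returns the normalised 256-bin histogram of codes over interior pixels."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def minkowski(a, b, p):
    """Minkowski distance of order p; p = 2 is Euclidean, p = inf is Chebyshev."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float(np.sum(diff ** p) ** (1.0 / p))


def nn_predict(train_feats, train_labels, query, p):
    """1-NN: label of the training histogram closest to `query` under order p."""
    dists = [minkowski(f, query, p) for f in train_feats]
    return train_labels[int(np.argmin(dists))]
```

For example, `nn_predict(train_hists, train_labels, lbp_histogram(cell_image), 3)` would classify a hypothetical grey-level array `cell_image` with Minkowski order 3; swapping `3` for `2` or `np.inf` gives the Euclidean and Chebyshev variants.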

Application of Genetic Algorithm and K-Nearest Neighbour Method in Real World Medical Fraud Detection Problem

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2000.p0130 ◽

2000 ◽

Vol 4 (2) ◽

pp. 130-137 ◽

Cited By ~ 11

Author(s):

Hongxing He ◽

Simon Hawkins ◽

Warwick Graco ◽

Xin Yao ◽

...

Keyword(s):

Genetic Algorithm ◽

Decision Rule ◽

Real World ◽

Euclidean Distance ◽

Fraud Detection ◽

Classification Performance ◽

Alternative Methods ◽

Nearest Neighbour ◽

Distance Metric ◽

Nearest Neighbours

In the k-Nearest Neighbour (kNN) algorithm, the classification of a new sample is determined by the class of its k nearest neighbours. The performance of the kNN algorithm is influenced by three main factors: (1) the distance metric used to locate the nearest neighbours; (2) the decision rule used to derive a classification from the k nearest neighbours; and (3) the number of neighbours used to classify the new sample. Using k = 1, 3 or 5 nearest neighbours, this study uses a Genetic Algorithm (GA) to find the optimal non-Euclidean distance metric in the kNN algorithm and examines two alternative methods (Majority Rule and Bayes Rule) for deriving a classification from the k nearest neighbours. The modified algorithm was evaluated on two real-world medical fraud problems. The General Practitioner (GP) database is a 2-class problem in which GPs are classified as practising either appropriately or inappropriately. The ‘Doctor-Shoppers’ database is a 5-class problem in which patients are classified according to the likelihood that they are ‘doctor-shoppers’. Doctor-shoppers are patients who consult many physicians in order to obtain multiple prescriptions for drugs of addiction in excess of their own therapeutic need. In both applications, optimising the distance metric in the kNN algorithm improved classification accuracy. The agreement rate improved from around 70% (using Euclidean distance) to 78% (using an optimised distance metric) on the GP dataset, and from about 55% to 82% on the Doctor-Shoppers dataset. Changing the decision rule or the number of nearest neighbours had little or no impact on the classification performance of the kNN algorithm. The excellent performance of the kNN algorithm when its distance metric is optimised by a genetic algorithm paves the way for its application to the real-world fraud detection problems faced by the Health Insurance Commission (HIC).
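
The three factors listed above map onto the parameters of the Python sketch below: a feature-weighted Euclidean distance (the metric being optimised), a Majority Rule decision, and k. A small genetic algorithm with uniform crossover, Gaussian mutation and elitism searches for weights that maximise leave-one-out accuracy. The weighted-Euclidean metric family, the GA operators and settings, and all function names are assumptions; the abstract does not describe the study's metric parameterisation or GA design, and the Bayes Rule is not sketched.

```python
import random
from collections import Counter


def weighted_distance(a, b, w):
    """Feature-weighted Euclidean distance: the ASSUMED metric family the GA tunes."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)) ** 0.5


def knn_majority(X, y, query, w, k, skip=None):
    """Majority Rule over the k nearest neighbours; `skip` leaves one index out."""
    ranked = sorted((weighted_distance(X[j], query, w), j) for j in range(len(X)) if j != skip)
    votes = Counter(y[j] for _, j in ranked[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, w, k):
    """Leave-one-out accuracy, used here as the GA's fitness function."""
    hits = sum(knn_majority(X, y, X[i], w, k, skip=i) == y[i] for i in range(len(X)))
    return hits / len(X)


def ga_optimise_weights(X, y, k=3, pop_size=20, generations=30, sigma=0.2, seed=0):
    """Evolve non-negative feature weights that maximise LOO kNN accuracy."""
    rng = random.Random(seed)
    d = len(X[0])

    def fitness(w):
        return loo_accuracy(X, y, w, k)

    # Seed with plain Euclidean distance (all weights 1) plus random individuals.
    population = [[1.0] * d] + [[rng.random() for _ in range(d)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size - 1:
            p1, p2 = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(p1, p2)]          # uniform crossover
            child = [max(0.0, g + rng.gauss(0.0, sigma)) for g in child]  # mutation, clipped at 0
            children.append(child)
        population = [ranked[0]] + children  # elitism keeps the best weights found so far
    return max(population, key=fitness)
```

Seeding the population with all-ones weights (plain Euclidean distance) and always carrying over the best individual guarantees that the returned weights score at least as well as Euclidean distance on the leave-one-out criterion; whether that gain carries over to unseen data is a separate question.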

An Integrated Approach of Mechanistic-Modeling and Machine-Learning for Thickness Optimization of Frozen Microwaveable Foods

Foods ◽

10.3390/foods10040763 ◽

2021 ◽

Vol 10 (4) ◽

pp. 763

Author(s):

Ran Yang ◽

Zhenbo Wang ◽

Jiajia Chen

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Food Product ◽

Integrated Approach ◽

Mechanistic Modeling ◽

Bayesian Optimization ◽

Initial Training ◽

Thickness Optimization ◽

Heating Uniformity ◽

Food Design

Mechanistic-modeling has been a useful tool for helping food scientists understand complicated microwave-food interactions, but because it is resource-intensive, it cannot be used directly by food developers for food design. This study developed and validated an integrated approach that couples mechanistic-modeling and machine-learning to achieve efficient food product design (thickness optimization) with better heating uniformity. The mechanistic model, which incorporates electromagnetics and heat transfer, had previously been developed and extensively validated, and was used directly in this study. A Bayesian optimization machine-learning algorithm was developed and integrated with the mechanistic model. The integrated approach was validated by comparing its optimization performance with that of a parametric sweep approach, which is based solely on mechanistic-modeling. The results showed that the integrated approach could robustly optimize the thickness of products with different shapes, using different initial training datasets, with higher efficiency (45.9% to 62.1% improvement) than the parametric sweep approach. Three rectangular trays, one with the optimized thickness (1.56 cm) and two with non-optimized thicknesses (1.20 and 2.00 cm), were 3-D printed and used in microwave heating experiments, which confirmed the feasibility of the integrated approach for thickness optimization. The integrated approach can be further developed and extended into a platform for efficiently designing complicated microwaveable foods with multiple-parameter optimization.
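
The coupling described above can be sketched as a Bayesian optimization loop in which each objective evaluation stands in for one mechanistic simulation run. The Python sketch below uses a zero-mean Gaussian-process surrogate with a squared-exponential kernel and the expected-improvement acquisition function over a grid of candidate thicknesses. The surrogate, kernel, length scale, acquisition function, grid and the toy objective `toy_nonuniformity` are all hypothetical; the abstract does not state which choices the study made, and the toy objective is not the study's electromagnetics and heat-transfer model.

```python
import math

import numpy as np


def toy_nonuniformity(thickness_cm):
    """HYPOTHETICAL stand-in for one mechanistic simulation run: a smooth score
    (lower is better) with its minimum at 1.5 cm. Not taken from the study."""
    return (thickness_cm - 1.5) ** 2


def rbf_kernel(a, b, length=0.2):
    """Squared-exponential covariance (unit prior variance) between 1-D arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Expected improvement for minimisation, E[max(best - f, 0)]."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def bayes_opt(objective, lo, hi, n_init=3, n_iter=10, seed=0):
    """Minimise `objective` on [lo, hi] with a GP surrogate and EI on a grid."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, 201)
    x = rng.uniform(lo, hi, n_init)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        z = (y - y.mean()) / (y.std() + 1e-12)  # standardise observations
        mean, std = gp_posterior(x, z, grid)
        nxt = grid[int(np.argmax(expected_improvement(mean, std, z.min())))]
        x, y = np.append(x, nxt), np.append(y, objective(nxt))  # one more "simulation"
    best = int(np.argmin(y))
    return float(x[best]), float(y[best])


best_t, best_score = bayes_opt(toy_nonuniformity, 1.0, 2.0)
print(f"best thickness ~ {best_t:.3f} cm, score {best_score:.5f}")
```

The efficiency argument is visible in the loop: every iteration spends exactly one objective call, and the surrogate decides where that call is most informative, instead of sweeping the whole grid.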

Classification and photometric redshift estimation of quasars in photometric surveys

Proceedings of the International Astronomical Union ◽

10.1017/s1743921320001829 ◽

2020 ◽

Vol 15 (S359) ◽

pp. 40-41

Author(s):

L. M. Izuti Nakazono ◽

C. Mendes de Oliveira ◽

N. S. T. Hirata ◽

S. Jeram ◽

A. Gonzalez ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbour ◽

Random Forest Algorithm ◽

Photometric Redshift ◽

Using Data

We present a machine learning methodology to separate quasars from galaxies and stars using data from S-PLUS in the Stripe-82 region. For quasar classification, a Random Forest algorithm achieved a precision of 95.49% and a recall of 95.26%. For photometric redshift estimation, we obtained a precision of 6% using k-Nearest Neighbour.
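
Two quantities in this abstract are easy to pin down in code. The Python sketch below shows k-nearest-neighbour regression, one common way to use kNN for photometric redshift estimation (the estimate is the mean known redshift of the k nearest training objects in feature space), together with the standard precision and recall definitions for the quasar class. The feature space, k = 5, the unweighted mean and the label string "QSO" are assumptions; the abstract does not give the study's kNN configuration or define its 6% photometric-redshift figure.

```python
import numpy as np


def knn_regress(train_X, train_z, query, k=5):
    """Estimate a redshift as the mean redshift of the k nearest training
    objects (Euclidean distance in an ASSUMED photometric feature space)."""
    d = np.linalg.norm(np.asarray(train_X, float) - np.asarray(query, float), axis=1)
    nearest = np.argsort(d)[:k]
    return float(np.mean(np.asarray(train_z, float)[nearest]))


def precision_recall(y_true, y_pred, positive="QSO"):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN) for one class.
    The label "QSO" for quasars is an ASSUMED naming convention."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return float(precision), float(recall)
```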

Selection of Suitable Machine Learning Algorithms for Classification Tasks in Reverse Logistics

Procedia CIRP ◽

10.1016/j.procir.2021.01.086 ◽

2021 ◽

Vol 96 ◽

pp. 272-277

Author(s):

Hannah Lickert ◽

Aleksandra Wewer ◽

Sören Dittmann ◽

Pinar Bilge ◽

Franz Dietrich

Keyword(s):

Machine Learning ◽

Reverse Logistics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Classification Tasks ◽

Selection Of

Determination of Body Parts in Holstein Friesian Cows Comparing Neural Networks and k Nearest Neighbour Classification

Animals ◽

10.3390/ani11010050 ◽

2020 ◽

Vol 11 (1) ◽

pp. 50

Author(s):

Jennifer Salau ◽

Jan Henning Haas ◽

Wolfgang Junge ◽

Georg Thaller

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Body Parts ◽

Nearest Neighbour ◽

Data Set ◽

3D Data ◽

Holstein Friesian ◽

Knn Classification ◽

Friesian Cows

Machine learning methods have become increasingly important in animal science, and the success of an automated application using machine learning often depends on choosing the right method for the problem and data set at hand. The recognition of objects in 3D data is still a widely studied topic, and it is especially challenging when objects must be partitioned into predefined segments. In this study, two machine learning approaches were used to recognise the body parts of dairy cows from 3D point clouds, i.e., sets of data points in space. The low-cost, off-the-shelf Microsoft Kinect V1 depth sensor has been used in various studies related to dairy cows. The 3D data were gathered with a multi-Kinect recording unit designed to record freely walking Holstein Friesian cows from both sides, from three different camera positions. To determine the body parts head, rump, back, legs and udder, five properties of the pixels in the depth maps (row index, column index, depth value, variance, mean curvature) were used as features in the training data set. For each camera position, a k nearest neighbour classifier and a neural network were trained and then compared. Both methods showed small Hamming losses (between 0.007 and 0.027 for k nearest neighbour (kNN) classification and between 0.045 and 0.079 for neural networks) and could be considered successful at classifying pixels into body parts. However, the kNN classifier was superior, reaching overall accuracies of 0.888 to 0.976 depending on the camera position. Precision and recall values for individual body parts ranged from 0.84 to 1 and from 0.83 to 1, respectively. On the other hand, at runtime kNN classification incurs higher costs in computational time and memory than the trained neural networks. The trade-off between cost and accuracy of each method needs to be taken into account when deciding which method to implement in the application.
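
The evaluation measures and the runtime trade-off in this abstract can be sketched in Python as follows. The code standardises the five per-pixel features, classifies pixels with a majority-vote kNN that compares every query with every stored training pixel (the source of kNN's runtime and memory cost), and computes Hamming loss over one-hot label matrices together with per-class precision and recall. The feature standardisation, k, the one-hot definition of Hamming loss and all function names are assumptions; the abstract does not specify the study's preprocessing or exact Hamming-loss definition.

```python
import numpy as np

BODY_PARTS = ["head", "rump", "back", "legs", "udder"]  # label set from the abstract


def standardise(train, test):
    """ASSUMED preprocessing: z-score features using training statistics only."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-12
    return (train - mu) / sd, (test - mu) / sd


def knn_classify(train_X, train_y, test_X, k=5):
    """Majority-vote kNN; each query scans all stored pixels (O(n) time per query)."""
    train_X, test_X = standardise(np.asarray(train_X, float), np.asarray(test_X, float))
    train_y = np.asarray(train_y)
    preds = []
    for q in test_X:
        nearest = np.argsort(np.linalg.norm(train_X - q, axis=1))[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


def hamming_loss(y_true, y_pred, classes):
    """ASSUMED definition: fraction of wrong entries in one-hot (pixel x label) matrices."""
    T = np.array([[t == c for c in classes] for t in y_true])
    P = np.array([[p == c for c in classes] for p in y_pred])
    return float(np.mean(T != P))


def per_class_precision_recall(y_true, y_pred, classes):
    """Map each class to (precision, recall); 0.0 when a denominator is empty."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        pred_c, true_c = np.sum(y_pred == c), np.sum(y_true == c)
        out[c] = (float(tp / pred_c) if pred_c else 0.0,
                  float(tp / true_c) if true_c else 0.0)
    return out
```

With a single label per pixel, each misclassified pixel flips exactly two entries of its one-hot row, so under this definition the Hamming loss equals 2 x error rate / number of labels.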
