An integrated approach for different attribute types in nearest neighbour classification
AbstractThe basic nearest neighbour algorithm works by storing the training instances and classifying a new case by predicting that it has the same class as its nearest stored instance. To measure the distance between instances, some distance metric needs to be used. In situations when all attributes have numeric values, the conventional nearest neighbour method treats examples as points in feature spaces and uses Euclidean distance as the distance metric. In tasks with only nominal attributes, the simple “over-lap” metric is usually used. To handle classification tasks that have mixed types of attributes, the two different metrics are simply combined. Work by researchers in the machine learning field has shown that this approach performs poorly. This paper attempts to study a more recently developed distance metric and show that this metric is capable of measuring the importance of different attributes. With the use of discretisation for numeric-valued attributes, this method provides an integrated way in dealing with problem domains with mixtures of attribute types. Through detailed analyses, this paper tries to provide further insights into the understanding of nearest neighbour classification techniques and promote further use of this type of classification algorithm.