Data classification combining Self-Organizing Maps and Informative Nearest Neighbor

2017 ◽  
Vol 2017 ◽  
pp. 1-15 ◽  
Author(s):  
Leandro Juvêncio Moreira ◽  
Leandro A. Silva

The k nearest neighbor (kNN) algorithm is one of the simplest and most important procedures for the data classification task, requiring only two parameters: the value of k and a similarity measure. However, the algorithm has weaknesses that limit its use in real problems. Since kNN builds no model, classifying a new object requires an exhaustive comparison against the entire training dataset. Another weakness is the choice of an optimal k when the analyzed object lies in a region where classes overlap. To mitigate these negative aspects, this work proposes a hybrid algorithm that combines the Self-Organizing Map (SOM) artificial neural network with a classifier based on an information-based similarity measure. Because SOM performs vector quantization, it is used as a Prototype Generation approach to select a reduced training dataset for a classifier based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was evaluated extensively, and the results show that the proposed approach achieves notable accuracy on databases whose border regions do not have well-defined object classes.
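A minimal sketch of the SOM-as-Prototype-Generation idea described in this abstract is shown below. The informativeness-based similarity of iNN is not detailed in the abstract, so a plain nearest-prototype vote stands in for it; the grid size, learning schedule, and helper names are illustrative assumptions, not the authors' settings.

```python
# Sketch: SOM used as Prototype Generation, then nearest-prototype classification.
# Grid size, learning rate schedule, and the nearest-prototype vote (in place of
# the paper's informativeness measure) are assumptions for illustration.
import numpy as np

def train_som(X, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows * cols, X.shape[1]))
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2 * sigma ** 2))                # neighborhood kernel
        weights += lr * h[:, None] * (x - weights)           # pull units toward x
    return weights

def label_prototypes(weights, X, y):
    # Each prototype takes the majority class of the training samples mapped to it.
    bmus = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2), axis=1)
    labels = {}
    for unit in np.unique(bmus):
        classes, counts = np.unique(y[bmus == unit], return_counts=True)
        labels[unit] = classes[np.argmax(counts)]
    return labels

def classify(x, weights, proto_labels):
    # Compare only against the labeled prototypes instead of the whole training set.
    units = np.array(sorted(proto_labels))
    d = ((weights[units] - x) ** 2).sum(axis=1)
    return proto_labels[units[np.argmin(d)]]
```

The key point is the reduced comparison cost: classification scans the prototype set (here at most rows * cols vectors) rather than every training example.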


Forests ◽  
2014 ◽  
Vol 5 (7) ◽  
pp. 1635-1652 ◽  
Author(s):  
Leonhard Suchenwirth ◽  
Wolfgang Stümer ◽  
Tobias Schmidt ◽  
Michael Förster ◽  
Birgit Kleinschmit

2021 ◽  
Vol 25 (2) ◽  
pp. 321-338
Author(s):  
Leandro A. Silva ◽  
Bruno P. de Vasconcelos ◽  
Emilio Del-Moral-Hernandez

Due to the high accuracy of the K nearest neighbor (KNN) algorithm in different problems, KNN is one of the most important classifiers used in data mining applications and is recognized in the literature as a benchmark algorithm. Despite its high accuracy, KNN has some weaknesses, such as the time taken by the classification process, which is a disadvantage in many problems, particularly those involving a large dataset. The literature presents approaches that reduce the classification time of KNN by selecting only the most important dataset examples. One of these methods is called Prototype Generation (PG); the idea is to represent the dataset examples by prototypes. Classification then occurs in two steps: the first based on the prototypes and the second on the examples represented by the nearest prototypes. The main problem with this approach is the lack of a definition of the ideal number of prototypes. This study proposes a model that estimates the best grid dimension of Self-Organizing Maps, and thus the ideal number of prototypes, using the number of dataset examples as a parameter. The approach is contrasted with other artificial-intelligence-based PG methods from the literature that automatically define the number of prototypes. The main advantage of the proposed method, tested here on eighteen public datasets, is that it provides a better trade-off between a reduced number of prototypes and accuracy, yielding a number of prototypes sufficient not to degrade KNN classification performance.
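The abstract does not give the formula that maps dataset size to grid dimension, so the sketch below uses the widely cited heuristic of roughly 5*sqrt(N) map units arranged as a square grid purely as a stand-in, not as the authors' model.

```python
# Illustrative stand-in only: derive a SOM grid dimension (and hence a prototype
# count) from the number of training examples, using the common ~5*sqrt(N) rule.
import math

def som_grid_from_dataset_size(n_examples: int) -> tuple[int, int]:
    n_units = 5 * math.sqrt(n_examples)       # heuristic target number of prototypes
    side = max(2, round(math.sqrt(n_units)))  # square map: side x side units
    return side, side

# e.g. 10,000 examples -> roughly a 22 x 22 map, i.e. about 484 prototypes
print(som_grid_from_dataset_size(10_000))
```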


2020 ◽  
Vol 6 (1) ◽  
pp. 53
Author(s):  
Fhatiah Adiba ◽  
Nurul Mukhlisah Abdal ◽  
Andi Akram Nur Risal

This study compares the accuracy and speed of a system for diagnosing skin diseases using the case-based reasoning (CBR) method with and without an indexing method. A self-organizing map (SOM) is used as the indexing method, and similarity values are computed with the nearest neighbor method. Testing is done in two scenarios: the first uses CBR without SOM indexing, and the second uses CBR with SOM indexing. At a similarity threshold ≥ 80, the diagnostic accuracy of CBR without SOM indexing is 93.46% with an average retrieval time of 0.469 seconds, while CBR with SOM indexing reaches 92.52% with an average retrieval time of 0.155 seconds. Thus, CBR without indexing yields slightly higher accuracy than CBR with SOM indexing, but retrieval with SOM indexing is considerably faster than retrieval without it.
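The sketch below contrasts the two retrieval scenarios described in this abstract. The case representation, the percentage similarity function, and the bucket structure are assumptions for illustration; only the idea of SOM indexing versus a full linear scan is taken from the abstract.

```python
# Sketch of CBR retrieval with and without SOM indexing.
# similarity() is a hypothetical percentage measure for numeric feature vectors.
import numpy as np

def similarity(a, b):
    # Hypothetical similarity in [0, 100] based on normalized Euclidean distance.
    return 100.0 * (1.0 - np.linalg.norm(a - b) /
                    (np.linalg.norm(a) + np.linalg.norm(b) + 1e-9))

def retrieve_linear(query, cases, diagnoses, threshold=80.0):
    # Scenario 1: no indexing, the query is compared with every stored case.
    scores = [similarity(query, c) for c in cases]
    best = int(np.argmax(scores))
    return (diagnoses[best], scores[best]) if scores[best] >= threshold else (None, scores[best])

def retrieve_with_som_index(query, som_weights, buckets, cases, diagnoses, threshold=80.0):
    # Scenario 2: map the query to its best-matching SOM unit, then compare only
    # the cases indexed under that unit, which shortens retrieval time.
    bmu = int(np.argmin(((som_weights - query) ** 2).sum(axis=1)))
    candidates = buckets.get(bmu, range(len(cases)))  # fall back to full scan if empty
    scores = {i: similarity(query, cases[i]) for i in candidates}
    best = max(scores, key=scores.get)
    return (diagnoses[best], scores[best]) if scores[best] >= threshold else (None, scores[best])
```

In this setup, buckets would be a dictionary mapping each SOM unit to the indices of the cases whose best-matching unit it is, built once when the case base is indexed.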


2019 ◽  
Vol 24 (1) ◽  
pp. 87-92 ◽  
Author(s):  
Yvette Reisinger ◽  
Mohamed M. Mostafa ◽  
John P. Hayes
