Data classification combining Self-Organizing Maps and Informative Nearest Neighbor

2017 ◽  
Vol 2017 ◽  
pp. 1-15 ◽  
Author(s):  
Leandro Juvêncio Moreira ◽  
Leandro A. Silva

The k nearest neighbor (kNN) algorithm is one of the simplest and most important procedures for the data classification task, requiring only two parameters: the value of k and a similarity measure. However, the algorithm has weaknesses that limit its use in real problems. Since kNN builds no model, classifying a new object requires an exhaustive comparison against the entire training dataset. Another weakness is the choice of an optimal k when the analyzed object lies in a region where classes overlap. To mitigate these negative aspects, this work proposes a hybrid algorithm that combines the Self-Organizing Map (SOM) artificial neural network with a classifier based on an information-based similarity measure. Because SOM performs vector quantization, it is used as a Prototype Generation approach to select a reduced training dataset for a classifier based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was evaluated extensively, and the results show that the proposed approach achieves notable accuracy on databases whose border regions do not have well-defined object classes.
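A minimal sketch of the SOM-as-Prototype-Generation idea described in this abstract is shown below. The informativeness-based similarity of iNN is not detailed in the abstract, so a plain nearest-prototype vote stands in for it; the grid size, learning schedule, and helper names are illustrative assumptions, not the authors' settings.

```python
# Sketch: SOM used as Prototype Generation, then nearest-prototype classification.
# Grid size, learning rate schedule, and the nearest-prototype vote (in place of
# the paper's informativeness measure) are assumptions for illustration.
import numpy as np

def train_som(X, rows=10, cols=10, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(rows * cols, X.shape[1]))
    # Grid coordinates of each unit, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching unit
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2 * sigma ** 2))                # neighborhood kernel
        weights += lr * h[:, None] * (x - weights)           # pull units toward x
    return weights

def label_prototypes(weights, X, y):
    # Each prototype takes the majority class of the training samples mapped to it.
    bmus = np.argmin(((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2), axis=1)
    labels = {}
    for unit in np.unique(bmus):
        classes, counts = np.unique(y[bmus == unit], return_counts=True)
        labels[unit] = classes[np.argmax(counts)]
    return labels

def classify(x, weights, proto_labels):
    # Compare only against the labeled prototypes instead of the whole training set.
    units = np.array(sorted(proto_labels))
    d = ((weights[units] - x) ** 2).sum(axis=1)
    return proto_labels[units[np.argmin(d)]]
```

The key point is the reduced comparison cost: classification scans the prototype set (here at most rows * cols vectors) rather than every training example.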


Forests ◽  
2014 ◽  
Vol 5 (7) ◽  
pp. 1635-1652 ◽  
Author(s):  
Leonhard Suchenwirth ◽  
Wolfgang Stümer ◽  
Tobias Schmidt ◽  
Michael Förster ◽  
Birgit Kleinschmit

2021 ◽  
Vol 25 (2) ◽  
pp. 321-338
Author(s):  
Leandro A. Silva ◽  
Bruno P. de Vasconcelos ◽  
Emilio Del-Moral-Hernandez

Due to the high accuracy of the K nearest neighbor (KNN) algorithm in different problems, KNN is one of the most important classifiers used in data mining applications and is recognized in the literature as a benchmark algorithm. Despite its high accuracy, KNN has some weaknesses, such as the time taken by the classification process, which is a disadvantage in many problems, particularly those involving a large dataset. The literature presents approaches that reduce the classification time of KNN by selecting only the most important dataset examples. One of these methods is called Prototype Generation (PG); the idea is to represent the dataset examples by prototypes. Classification then occurs in two steps: the first based on the prototypes and the second on the examples represented by the nearest prototypes. The main problem with this approach is the lack of a definition of the ideal number of prototypes. This study proposes a model that estimates the best grid dimension of Self-Organizing Maps, and thus the ideal number of prototypes, using the number of dataset examples as a parameter. The approach is contrasted with other artificial-intelligence-based PG methods from the literature that automatically define the number of prototypes. The main advantage of the proposed method, tested here on eighteen public datasets, is that it provides a better trade-off between a reduced number of prototypes and accuracy, yielding a number of prototypes sufficient not to degrade KNN classification performance.
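The abstract does not give the formula that maps dataset size to grid dimension, so the sketch below uses the widely cited heuristic of roughly 5*sqrt(N) map units arranged as a square grid purely as a stand-in, not as the authors' model.

```python
# Illustrative stand-in only: derive a SOM grid dimension (and hence a prototype
# count) from the number of training examples, using the common ~5*sqrt(N) rule.
import math

def som_grid_from_dataset_size(n_examples: int) -> tuple[int, int]:
    n_units = 5 * math.sqrt(n_examples)       # heuristic target number of prototypes
    side = max(2, round(math.sqrt(n_units)))  # square map: side x side units
    return side, side

# e.g. 10,000 examples -> roughly a 22 x 22 map, i.e. about 484 prototypes
print(som_grid_from_dataset_size(10_000))
```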


2020 ◽  
Vol 6 (1) ◽  
pp. 53
Author(s):  
Fhatiah Adiba ◽  
Nurul Mukhlisah Abdal ◽  
Andi Akram Nur Risal

This study compares the accuracy and speed of a system for diagnosing skin diseases using the case-based reasoning (CBR) method with and without an indexing method. A self-organizing map (SOM) is used as the indexing method, and similarity values are computed with the nearest neighbor method. Testing is done in two scenarios: the first uses CBR without SOM indexing, and the second uses CBR with SOM indexing. At a similarity threshold ≥ 80, the diagnostic accuracy of CBR without SOM indexing is 93.46% with an average retrieval time of 0.469 seconds, while CBR with SOM indexing reaches 92.52% with an average retrieval time of 0.155 seconds. Thus, CBR without indexing yields slightly higher accuracy than CBR with SOM indexing, but retrieval with SOM indexing is considerably faster than retrieval without it.
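The sketch below contrasts the two retrieval scenarios described in this abstract. The case representation, the percentage similarity function, and the bucket structure are assumptions for illustration; only the idea of SOM indexing versus a full linear scan is taken from the abstract.

```python
# Sketch of CBR retrieval with and without SOM indexing.
# similarity() is a hypothetical percentage measure for numeric feature vectors.
import numpy as np

def similarity(a, b):
    # Hypothetical similarity in [0, 100] based on normalized Euclidean distance.
    return 100.0 * (1.0 - np.linalg.norm(a - b) /
                    (np.linalg.norm(a) + np.linalg.norm(b) + 1e-9))

def retrieve_linear(query, cases, diagnoses, threshold=80.0):
    # Scenario 1: no indexing, the query is compared with every stored case.
    scores = [similarity(query, c) for c in cases]
    best = int(np.argmax(scores))
    return (diagnoses[best], scores[best]) if scores[best] >= threshold else (None, scores[best])

def retrieve_with_som_index(query, som_weights, buckets, cases, diagnoses, threshold=80.0):
    # Scenario 2: map the query to its best-matching SOM unit, then compare only
    # the cases indexed under that unit, which shortens retrieval time.
    bmu = int(np.argmin(((som_weights - query) ** 2).sum(axis=1)))
    candidates = buckets.get(bmu, range(len(cases)))  # fall back to full scan if empty
    scores = {i: similarity(query, cases[i]) for i in candidates}
    best = max(scores, key=scores.get)
    return (diagnoses[best], scores[best]) if scores[best] >= threshold else (None, scores[best])
```

In this setup, buckets would be a dictionary mapping each SOM unit to the indices of the cases whose best-matching unit it is, built once when the case base is indexed.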


2019 ◽  
Vol 24 (1) ◽  
pp. 87-92 ◽  
Author(s):  
Yvette Reisinger ◽  
Mohamed M. Mostafa ◽  
John P. Hayes
