Learning the Number of Clusters in Self Organizing Map

2021 ◽

Vol 7 (2) ◽

Author(s):

Arif Fajar Solikin ◽

Kusrini Kusrini ◽

Ferry Wahyu Wibowo

Keyword(s):

Data Mining ◽

Statistical Test ◽

Optimum Number ◽

Data Normalization ◽

Self Organizing Map ◽

Self Organizing Maps ◽

Number Of Clusters ◽

Map Algorithm ◽

Cluster Data ◽

Self Organizing

An intercomparison was conducted to determine the ability and performance of a laboratory. Intercomparison results are usually expressed as En ratio values (|En| ≤ 1), which express the equivalence of one laboratory with other laboratories. If a laboratory is declared not equivalent, it needs to identify the source of the problem itself. Clustering, one of the data mining techniques, can make this easier. Clustering was done by applying a self organizing map algorithm in the KNIME (Konstanz Information Miner) analytics tool. Several experiments were carried out, varying the layer size and the data normalization status from one experiment to the next. The results were analyzed with the pseudo F statistic test and the icdrate test. The largest pseudo F statistic value was obtained from the 8th experiment (layer size 2x2, without data normalization), with a pseudo F statistic of 167.53 for 1 kg artifacts and 104.86 for 200 g artifacts, where the optimum number of clusters is 4. The smallest icdrate value was obtained from the 5th experiment (layer size 2x3, without data normalization), with an icdrate of 0.0713 for 1 kg artifacts and 0.2889 for 200 g artifacts, where the best number of clusters is 6. The 12 laboratories can be grouped into 6 groups, where each group shares the same identification. Groups 1, 3 and 6 have 1 member each, while groups 2, 4 and 5 have 3 members each.
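The two criteria named in this abstract can be illustrated with a small sketch. It assumes the common definitions: pseudo F as the Calinski-Harabasz ratio (SSB / (k - 1)) / (SSW / (n - k)), and icdrate as 1 - SSB / SST. The abstract does not show how the authors computed them in KNIME, so the function name `cluster_quality` and the sample points are hypothetical.

```python
from collections import defaultdict


def _centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_quality(points, labels):
    """Return (pseudo_F, icdrate) for a hard clustering (hypothetical helper).

    SSW is the within-cluster sum of squares, SSB the between-cluster sum
    of squares, and SST = SSW + SSB. Requires 1 < k < n.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)

    n, k = len(points), len(groups)
    if not 1 < k < n:
        raise ValueError("need more than one cluster and fewer clusters than points")

    grand = _centroid(points)
    ssw = 0.0
    ssb = 0.0
    for members in groups.values():
        c = _centroid(members)
        ssw += sum(_sq_dist(p, c) for p in members)
        ssb += len(members) * _sq_dist(c, grand)

    # Assumed definitions; a perfectly tight clustering gives an infinite pseudo F.
    pseudo_f = float("inf") if ssw == 0 else (ssb / (k - 1)) / (ssw / (n - k))
    icdrate = 1.0 - ssb / (ssb + ssw)
    return pseudo_f, icdrate


# Made-up data: two tight, well-separated groups.
sample_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
sample_labels = [0, 0, 1, 1]
pseudo_f, icdrate = cluster_quality(sample_points, sample_labels)
```

Under these definitions a larger pseudo F and a smaller icdrate both indicate more compact, better separated clusters. The two criteria need not agree on the number of clusters, as the 4-cluster and 6-cluster results above show.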

Statistical analysis of rainfall event features using the Self Organizing Map with application to Northern Tunisia

10.5194/egusphere-egu21-9090 ◽

2021 ◽

Author(s):

Sabrine Derouiche ◽

Cécile Mallet ◽

Zoubeida Bargaoui ◽

Abdelwahab Hannachi

Keyword(s):

Learning Algorithm ◽

Global Climate ◽

Winter Precipitation ◽

Rainfall Event ◽

Multidimensional Data ◽

Rain Event ◽

Self Organizing Map ◽

Number Of Clusters ◽

Northern Tunisia ◽

Self Organizing

The use of artificial neural networks in problems related to water resources, hydrology and meteorology has received steadily increasing interest over the last decade or so. In this study, the methodology proposed to analyse rainfall features and to investigate their relationships with global climate change is based on the Self-Organizing Map (SOM) and is generic in character. As a first step, daily winter precipitation of northern Tunisia, collected between 1960 and 2009 at 70 rain gauge stations, is transformed into separate events. This separation is based on the determination of the minimum inter-event time (dry interval) between two independent and consecutive rain events. Six rainfall event features (e.g., average rain event accumulation, average event duration, seasonal accumulation, number of rainy days…) are thus extracted for each of the 70 stations × 50 winter seasons. In the second step, the SOM is applied to analyse the six rainfall features. The SOM is an unsupervised learning algorithm, used as a vector quantization technique, that allows the modeling of probability density functions. It divides the set of multidimensional data (vectors of six features in our case) into clusters. As in k-means, rainfall stations and years with similar characteristics are grouped in a cluster represented by its centroid point, named the referent. Moreover, the SOM enables the projection of high-dimensional data onto a low-dimensional (usually two-dimensional) discrete lattice of neurons as an output layer (map space). The structure of the neurons in the map and the cost function used for its training ensure that neighboring neurons in the map space are associated with neighboring referents in the initial space.
This conservation of the topology allows the analysis of multidimensional nonlinear relationships between the six selected descriptors by visualizing their projection in the map space. For a better representation of the input dataset, a 16×20 neuron map is used. But such a large number may complicate the synthesis of some spatial or temporal specificities, so this large number of neurons is aggregated into a smaller number of clusters. For that, a hierarchical agglomerative clustering (HAC) is applied in the third step. This hierarchical process is initiated by accepting each neuron as a separate cluster. Then, at each stage of the algorithm, similar clusters, using the Ward distance, are combined in pairs. The fourth step determines the final number of clusters by using a visually based method known as the data image. This consists of mapping the dissimilarity matrix of the referents into an image framework where each pixel reflects the magnitude of each value. Here rows and columns can be reordered based on hierarchical clustering of the referents. The blocks observed along the diagonal of each image represent the clusters. Finally, northern Tunisia winter precipitation is classified into four rainfall situations, from the driest to the wettest, while also taking into account the rainfall day frequency during the season and the rainfall event types. The projection of external climatic variables on the map will make it possible to analyse the links between the four observed rain regimes and the global climate.
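The event separation in the first step of this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not give the minimum inter-event time, so `min_dry_days`, the wet-day threshold of 0 mm, the function names, and the sample series are all assumptions. Only the features the abstract names explicitly are derived.

```python
def split_rain_events(daily_mm, min_dry_days=1, wet_threshold=0.0):
    """Split a daily rainfall series into independent events.

    Wet days belong to the same event unless at least `min_dry_days`
    consecutive dry days separate them. Returns a list of (start, end)
    index pairs, both inclusive.
    """
    events = []
    start = None     # index of the first wet day of the open event
    last_wet = None  # index of the most recent wet day
    for i, rain in enumerate(daily_mm):
        if rain <= wet_threshold:
            continue  # dry day: it only matters through the gap it creates
        if start is None:
            start = i
        elif i - last_wet - 1 >= min_dry_days:
            events.append((start, last_wet))  # gap long enough: close the event
            start = i
        last_wet = i
    if start is not None:
        events.append((start, last_wet))
    return events


def season_features(daily_mm, min_dry_days=1):
    """Per-season summary of the events (hypothetical feature names)."""
    events = split_rain_events(daily_mm, min_dry_days)
    totals = [sum(daily_mm[s:e + 1]) for s, e in events]
    durations = [e - s + 1 for s, e in events]
    n = len(events)
    return {
        "n_events": n,
        "mean_event_accumulation": sum(totals) / n if n else 0.0,
        "mean_event_duration": sum(durations) / n if n else 0.0,
        "seasonal_accumulation": sum(totals),
        "n_rainy_days": sum(1 for r in daily_mm if r > 0.0),
    }


# Made-up ten-day series in millimetres.
series = [0, 5, 3, 0, 2, 0, 0, 0, 4, 0]
events = split_rain_events(series, min_dry_days=2)
features = season_features(series, min_dry_days=2)
```

With a two-day minimum dry interval, the single dry day at index 3 does not split the first event, while the three dry days before index 8 do. One such feature vector per station and season would form the input to the SOM.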

Validity Test of Self-Organizing Map (SOM) and K-Means Algorithm for Employee Grouping

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i6.2492 ◽

2020 ◽

Vol 4 (6) ◽

Author(s):

Titik Susilowati ◽

Dedy Sugiarto ◽

Is Mardianto

Keyword(s):

Clustering Algorithm ◽

The Self ◽

Self Organizing Map ◽

Number Of Clusters ◽

Internal Validation ◽

Validity Test ◽

Validation Test ◽

Silhouette Index ◽

Employee Attendance ◽

Self Organizing

Managing employee work discipline is necessary to support the development of an organization. One way to make this easier is to group employees based on their level of discipline. This study aims to group employees by level of discipline using the Self Organizing Map (SOM) and K-Means algorithms. The grouping begins with collecting employee attendance data, continues with processing that data, which includes determining the parameters to be used, and ends with applying the SOM and K-Means clustering algorithms. The groupings obtained from the SOM and K-Means algorithms are then validated using internal validation tests, namely the Dunn Index, the Silhouette Index and the Connectivity Index, to obtain the best number of clusters and the best algorithm. The validation tests yielded 3 as the best number of clusters for the level of discipline: the disciplined cluster, the moderate cluster and the undisciplined cluster.
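One of the internal validation tests mentioned here, the Dunn Index, can be sketched in a few lines. The version below uses the classic definition (smallest distance between points of different clusters divided by the largest cluster diameter) with Euclidean distance. The abstract does not say which variant or tool the authors used, and the helper names and attendance-like sample data are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: smallest between-cluster distance / largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points from different clusters.
    min_separation = min(
        dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point or duplicates
    return min_separation / max_diameter


def best_labeling(points, candidates):
    """Return the key of the candidate labeling with the highest Dunn index."""
    return max(candidates, key=lambda name: dunn_index(points, candidates[name]))


# Made-up (late arrivals, absences) pairs for six employees.
employees = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
candidates = {
    "k=2": [0, 0, 0, 0, 1, 1],
    "k=3": [0, 0, 1, 1, 2, 2],
}
chosen = best_labeling(employees, candidates)
```

A higher Dunn index means compact, well-separated clusters, so comparing it across candidate groupings (from SOM or K-Means, at different cluster counts) is one way to pick a number of clusters.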

Clustering gene expression data using adaptive double self-organizing map

Physiological Genomics ◽

10.1152/physiolgenomics.00138.2002 ◽

2003 ◽

Vol 14 (1) ◽

pp. 35-46 ◽

Cited By ~ 15

Author(s):

Habtom Ressom ◽

Dali Wang ◽

Padma Natarajan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Human Error ◽

A Priori ◽

Self Organizing Map ◽

Expression Data ◽

Number Of Clusters ◽

Model Based Clustering ◽

Free Parameters ◽

Self Organizing

This paper presents a novel clustering technique known as adaptive double self-organizing map (ADSOM). ADSOM has a flexible topology and performs clustering and cluster visualization simultaneously, thereby requiring no a priori knowledge about the number of clusters. ADSOM is developed based on a recently introduced technique known as double self-organizing map (DSOM). DSOM combines features of the popular self-organizing map (SOM) with two-dimensional position vectors, which serve as a visualization tool to decide how many clusters are needed. Although DSOM addresses the problem of identifying an unknown number of clusters, its free parameters are difficult to control to guarantee correct results and convergence. ADSOM updates its free parameters during training, and it allows convergence of its position vectors to a fairly consistent number of clusters, provided that its initial number of nodes is greater than the expected number of clusters. The number of clusters can be identified by visually counting the clusters formed by the position vectors after training. A novel index is introduced based on hierarchical clustering of the final locations of the position vectors. The index allows automated detection of the number of clusters, thereby reducing the human error that could be incurred from counting clusters visually. The reliability of ADSOM in identifying the number of clusters is demonstrated by applying it to publicly available gene expression data from multiple biological systems such as yeast, human, and mouse. ADSOM’s performance in detecting the number of clusters is compared with a model-based clustering method.
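To illustrate the idea of counting clusters among final position vectors automatically rather than by eye, here is a rough stand-in. It is not the index introduced in the paper, which the abstract does not define. It groups 2-D points with single-linkage merging cut at a distance threshold, which is the same as counting connected components of the "closer than threshold" graph. The function name, the threshold value, and the sample positions are all hypothetical.

```python
from math import dist


def count_position_clusters(positions, threshold):
    """Count groups of points linked by chains of distances <= threshold.

    Equivalent to cutting a single-linkage dendrogram at `threshold`.
    This is a substitute technique, not the ADSOM index itself.
    """
    parent = list(range(len(positions)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if dist(positions[i], positions[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(positions))})


# Made-up final positions: two tight groups and one isolated node.
final_positions = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 0)]
n_clusters = count_position_clusters(final_positions, threshold=0.5)
```

The result depends strongly on the threshold, which is one reason a purpose-built index such as the one the paper proposes may be preferable to a fixed cut.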
