Effective Clustering Analysis Based on New Designed Clustering Validity Index and Revised K-Means Algorithm for Big Data

To overcome the shortcoming that the fuzzy c-means algorithm (FCM) needs to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. According to the characteristics of the dataset, the algorithm automatically determined the possible maximum number of clusters instead of using the empirical rule √n, and obtained the optimal initial cluster centroids, thereby addressing the limitation of FCM that randomly selected cluster centroids can lead the result to converge to a local minimum. Secondly, by introducing a penalty function, this paper proposed a new fuzzy clustering validity index based on fuzzy compactness and separation. The penalty ensured that, when the number of clusters verged on the number of objects in the dataset, the value of the index did not monotonically decrease toward zero, a behavior that would otherwise cost the index its robustness and its decision function for the optimal number of clusters. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by an iterative trial-and-error process. Finally, experiments on the UCI, KDD Cup 1999, and synthetic datasets showed that the method not only effectively determined the optimal number of clusters but also reduced the number of FCM iterations while producing a stable clustering result.
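
The trial-and-error search described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's method: the classic Xie-Beni index (compactness over separation, without the paper's penalty function) stands in for the proposed validity index, and a deterministic farthest-first seeding stands in for the density-based initialization. The names `sqdist`, `farthest_first`, `memberships`, `fcm`, `xie_beni`, and `best_cluster_count` are hypothetical.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, c):
    """Deterministic seeding (an assumption, not the paper's density-based step)."""
    centers = [points[0]]
    while len(centers) < c:
        centers.append(max(points, key=lambda p: min(sqdist(p, v) for v in centers)))
    return centers

def memberships(points, centers, m):
    """Standard FCM membership update; distances are floored to avoid division by zero."""
    u = []
    for p in points:
        inv = [max(sqdist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u

def fcm(points, c, m=2.0, max_iter=200, tol=1e-8):
    """Plain fuzzy c-means: alternate membership and centroid updates until centers settle."""
    centers = farthest_first(points, c)
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            wsum = sum(w)
            new_centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / wsum for d in range(dim)))
        shift = max(sqdist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

def xie_beni(points, centers, u, m=2.0):
    """Xie-Beni index: fuzzy compactness divided by n times the minimum center separation."""
    compact = sum(u[i][j] ** m * sqdist(p, v)
                  for i, p in enumerate(points) for j, v in enumerate(centers))
    sep = min(sqdist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(points) * max(sep, 1e-12))

def best_cluster_count(points, c_max):
    """Try every c in 2..c_max and keep the one with the smallest index value."""
    scores = {}
    for c in range(2, c_max + 1):
        centers, u = fcm(points, c)
        scores[c] = xie_beni(points, centers, u)
    return min(scores, key=scores.get), scores

# Toy data (illustrative only): three tight 3x3 grids of points, far apart.
offsets = (-0.3, 0.0, 0.3)
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
          for dx in offsets for dy in offsets]
best_c, scores = best_cluster_count(points, c_max=int(math.sqrt(len(points))))
print(best_c, scores)
```

Here the upper bound uses the empirical rule √n only for convenience; the abstract's point is precisely that the bound can instead be derived from the data, and that a penalty term is needed once the candidate number of clusters grows toward the number of objects.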

Ensemble of HMM classifiers based on the clustering validity index for a handwritten numeral recognizer

Pattern Analysis and Applications ◽

10.1007/s10044-007-0094-6 ◽

2007 ◽

Vol 12 (1) ◽

pp. 21-35 ◽

Cited By ~ 6

Author(s):

Albert Hung-Ren Ko ◽

Robert Sabourin ◽

Alceu de Souza Britto

Keyword(s):

Validity Index ◽

Clustering Validity Index ◽

Clustering Validity

Research on a New Clustering Validity Index Based on Data Mining

Lecture Notes in Electrical Engineering - Frontier Computing ◽

10.1007/978-981-13-3648-5_220 ◽

2019 ◽

pp. 1700-1704

Author(s):

Chaobo Zhang

Keyword(s):

Data Mining ◽

Validity Index ◽

Clustering Validity Index ◽

Clustering Validity

Color image segmentation using genetic algorithm with aggregation-based clustering validity index (CVI)

Signal Image and Video Processing ◽

10.1007/s11760-019-01419-2 ◽

2019 ◽

Vol 13 (5) ◽

pp. 833-841 ◽

Cited By ~ 6

Author(s):

Ahmad Khan ◽

Zia ur Rehman ◽

Muhammad Arfan Jaffar ◽

Javid Ullah ◽

Ahmad Din ◽

...

Keyword(s):

Genetic Algorithm ◽

Image Segmentation ◽

Color Image ◽

Color Image Segmentation ◽

Validity Index ◽

Clustering Validity Index ◽

Clustering Validity

Automatic Optimization Algorithm of Clusters Number Based on Maximum Distance

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.951.231 ◽

2014 ◽

Vol 951 ◽

pp. 231-234

Author(s):

Hong Bo Zhou ◽

Jun Tao Gao

Keyword(s):

Optimization Algorithm ◽

Clustering Algorithm ◽

New Method ◽

Maximum Distance ◽

Validity Index ◽

Clustering Validity Index ◽

Automatic Optimization ◽

Clustering Validity

The K-means clustering algorithm partitions a dataset according to a given number of clusters k. However, k cannot be determined beforehand. A new clustering validity index was designed from the standpoint of sample geometry. Based on this index, a new method for determining the optimal number of clusters in the K-means clustering algorithm was proposed.
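
A minimal sketch of choosing k by scanning a validity index over K-means results is given below. The abstract does not define its index, so the mean silhouette coefficient, a well-known geometry-based measure, is used here purely as a stand-in; the farthest-first seeding and the names `kmeans`, `mean_silhouette`, and `best_k` are likewise assumptions made for illustration.

```python
import math

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding; returns cluster labels."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(points, labels):
    """Mean silhouette coefficient; stand-in for the paper's unspecified index."""
    k = max(labels) + 1
    total = 0.0
    for i, p in enumerate(points):
        sums, counts = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if j != i:
                sums[labels[j]] += math.sqrt(sqdist(p, q))
                counts[labels[j]] += 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in range(k) if c != own and counts[c]]
        if counts[own] == 0 or not others:
            continue  # singleton or single-cluster case contributes 0
        a, b = sums[own] / counts[own], min(others)
        total += (b - a) / max(a, b, 1e-12)
    return total / len(points)

def best_k(points, k_max):
    """Run K-means for k = 2..k_max and keep the k with the highest mean silhouette."""
    scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Toy data (illustrative only): three tight 3x3 grids of points, far apart.
offsets = (-0.3, 0.0, 0.3)
points = [(cx + dx, cy + dy)
          for cx, cy in [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
          for dx in offsets for dy in offsets]
k_star, scores = best_k(points, k_max=5)
print(k_star, scores)
```

Any other index that rewards compact, well-separated clusters could be swapped in for `mean_silhouette` without changing the scanning loop.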
