A Hybrid Clustering Algorithm Based on Dimensional Reduction and K-Harmonic Means

Author(s):  
Chonghui Guo ◽  
Li Peng
2021 ◽  
pp. 1-14
Author(s):  
Feng Xue ◽  
Yongbo Liu ◽  
Xiaochen Ma ◽  
Bharat Pathak ◽  
Peng Liang

To solve the problem that the K-means algorithm is sensitive to the initial clustering centers and easily falls into local optima, we propose a new hybrid clustering algorithm called the IGWOKHM algorithm. In this paper, we first propose an improved strategy based on a nonlinear convergence factor, an inertial step size, and a dynamic weight to improve the search ability of the traditional grey wolf optimization (GWO) algorithm. Then, the improved GWO (IGWO) algorithm and the K-harmonic means (KHM) algorithm are fused to solve the clustering problem. This fusion clustering algorithm is called IGWOKHM, and it combines the global search ability of IGWO with the local fast optimization ability of KHM to both solve the problem of the K-means algorithm’s sensitivity to the initial clustering centers and address the shortcomings of KHM. The experimental results on 8 test functions and 4 University of California Irvine (UCI) datasets show that the IGWO algorithm greatly improves the efficiency of the model while ensuring the stability of the algorithm. The fusion clustering algorithm can effectively overcome the inadequacies of the K-means algorithm and has a good global optimization ability.


2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper aims to conduct an analysis of the SARS-CoV-2 genome variation was carried out by comparing the results of genome clustering using several clustering algorithms and distribution of sequence in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, the clustering algorithm has a weakness in grouping data that has very high dimensions such as genome data, so that a dimensional reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and autoencoder method with three models that produce 2, 10, and 50 features. The main contributions achieved were the dimensional reduction and clustering scheme of SARS-CoV-2 sequence data and the performance analysis of each experiment on each scheme and hyper parameters for each method. Based on the results of experiments conducted, PCA and DBSCAN algorithm achieve the highest silhouette score of 0.8770 with three clusters when using two features. However, dimensionality reduction using autoencoder need more iterations to converge. On the testing process with Indonesian sequence data, more than half of them enter one cluster and the rest are distributed in the other two clusters.


Sign in / Sign up

Export Citation Format

Share Document