Research on Density-Based K-means Clustering Algorithm

2021 ◽  
Vol 2137 (1) ◽  
pp. 012071
Author(s):  
Shuxin Liu ◽  
Xiangdong Liu

Abstract: Cluster analysis is an unsupervised learning process, and its most classic algorithm, K-means, has the advantages of a simple principle and easy implementation. However, K-means handles the number of clusters k, the initial cluster centers, and outlier points arbitrarily. This paper discusses improvements to the traditional K-means algorithm and puts forward an improved algorithm that incorporates density-based clustering. First, it describes the basic principles and process of the K-means and DBSCAN algorithms. It then summarizes improvement methods for these three aspects, together with their advantages and disadvantages, and proposes a new density-based improved K-means algorithm. Finally, it discusses the development direction and trends of density-based K-means clustering.
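The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of one way density information can be combined with K-means: points that fail DBSCAN's core-point test are dropped as outliers, and a plain Lloyd iteration is then run on what remains. The function names and the parameters `eps`, `min_pts`, and `max_iter` are assumptions made for illustration, not the authors' method.

```python
import math

def neighbours_within(points, i, eps):
    """Indices of all points within eps of points[i] (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def core_points(points, eps, min_pts):
    """Keep points with at least min_pts neighbours within eps (DBSCAN's core-point test)."""
    return [p for i, p in enumerate(points)
            if len(neighbours_within(points, i, eps)) >= min_pts]

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iteration starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[idx].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two tight groups plus one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
dense = core_points(data, eps=1.5, min_pts=3)
centers, clusters = kmeans(dense, centers=[(0, 0), (10, 10)])
print(centers)
```

In this toy data the isolated point (50, 50) has no neighbours within `eps`, so it is removed before clustering and cannot pull a center away from the two dense groups.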

The proposed research work aims to perform cluster analysis in the field of precision agriculture. The k-means technique is used to cluster the agricultural data. Selecting the value of k plays a major role in the k-means algorithm, and different techniques are used to identify the number of clusters (the k value). Identifying suitable initial centroids also plays an important role; in general, they are selected randomly. In the proposed work, Hybrid K-Means clustering is used to identify the initial centroids so that the results are stable. Because the initial cluster centers are well defined, Hybrid K-Means acts as a stable clustering technique.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Ziqi Jia ◽  
Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of selecting initial cluster centers was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was developed. WKPCA not only improves the selection of initial cluster centers but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. A real dataset from UCI was used to test the WKPCA algorithm. Experimental results show that WKPCA is more efficient and robust than other k-prototypes algorithms.
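For context, the dissimilarity used by the standard k-prototypes algorithm can be sketched as below: squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical ones. This is the conventional baseline, not the hybrid dissimilarity coefficient proposed in the paper (which the abstract does not define); the function name and the weight `gamma` are illustrative assumptions.

```python
def kprototypes_dissimilarity(x_num, x_cat, c_num, c_cat, gamma=1.0):
    """Mixed-type dissimilarity between an object and a cluster prototype.

    x_num / c_num: numerical attributes of the object / prototype.
    x_cat / c_cat: categorical attributes of the object / prototype.
    gamma: weight balancing the categorical part against the numerical part.
    """
    # Numerical part: squared Euclidean distance.
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    # Categorical part: number of attributes whose values differ.
    categorical = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return numeric + gamma * categorical

# Hypothetical object and prototype: one numerical gap of 2, one categorical mismatch.
d = kprototypes_dissimilarity((1.0, 2.0), ("red", "small"),
                              (1.0, 4.0), ("red", "large"), gamma=0.5)
print(d)  # 4.5
```

The choice of `gamma` controls how much a categorical mismatch counts relative to numerical distance, which is one reason alternative dissimilarity measures are studied.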


2014 ◽  
Vol 472 ◽  
pp. 427-431
Author(s):  
Zong Lin Ye ◽  
Hui Cao ◽  
Li Xin Jia ◽  
Yan Bin Zhang ◽  
Gang Quan Si

This paper proposes a novel multi-radius density clustering algorithm based on an outlier factor. The algorithm first calculates the density-similar-neighbor-based outlier factor (DSNOF) for each point in the dataset according to the relationship between the density of the point and that of its neighbors, and then treats each point whose DSNOF is smaller than 1 as a core point. Second, the core points are clustered by a process similar to density-based spatial clustering of applications with noise (DBSCAN) to obtain sub-clusters. Third, the proposed algorithm merges the obtained sub-clusters into clusters. Finally, the points whose DSNOF is larger than 1 are assigned to these clusters. Experiments on real datasets from the UCI Machine Learning Repository verify that the proposed model is more effective than the DBSCAN and k-means algorithms and is not greatly affected by its parameter.


2013 ◽  
Vol 380-384 ◽  
pp. 1290-1293
Author(s):  
Qing Ju Guo ◽  
Wen Tian Ji ◽  
Sheng Zhong

Many research findings on clustering algorithms have been reported at home and abroad in recent years. After analyzing the advantages and disadvantages of the traditional partition clustering method, the K-means algorithm, this paper combines it with an ontology-based data set to establish a semantic web model. It improves the existing clustering algorithm under various constraint conditions, with the aim of demonstrating that the improved algorithm has better efficiency and accuracy in the semantic web setting.


2016 ◽  
Vol 2016 ◽  
pp. 1-10
Author(s):  
Ning Li ◽  
Yunxia Gu ◽  
Zhongliang Deng

A small amount of prior knowledge and randomly chosen initial cluster centers have a direct impact on the accuracy of iterative clustering algorithms. In this paper we propose a new algorithm that, with little prior knowledge, computes the initial cluster centers for k-means clustering and the best number of clusters, and optimizes the clustering result. It constructs a Euclidean distance control factor based on aggregation density sparse degree to select the initial cluster centers of nonuniform sparse data, and obtains initial data clusters by multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is adopted to optimize the initial data clusters and the best number of clusters. The experimental results show that the proposed algorithm performs well in obtaining initial cluster centers for the k-means algorithm and improves the clustering accuracy of nonuniform sparse data by about 5%.


2010 ◽  
Vol 29-32 ◽  
pp. 802-808
Author(s):  
Min Min

After analyzing the common problems in fuzzy clustering algorithms, we put forward a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and initial cluster centers. The algorithm has been tested on real evaluation data of teaching designs. The results show that the combined fuzzy clustering based on the F-statistic is more effective.


2019 ◽  
Vol 13 (4) ◽  
pp. 403-409
Author(s):  
Hui Qi ◽  
Jinqing Li ◽  
Xiaoqiang Di ◽  
Weiwu Ren ◽  
Fengrong Zhang

Background: The K-means algorithm is implemented in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers, while the subsequent iterations continuously update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers selected during initialization that choosing different initial cluster centers influences the algorithm's performance. Therefore, improving the initialization process has become an important means of improving K-means performance. Methods: This paper uses a new strategy to select the initial cluster centers. It first calculates the minimum and maximum values of the data along a certain index (for lower-dimensional data, such as two-dimensional data, the feature with the larger variance or the distance to the origin can be selected; for higher-dimensional data, PCA can be used to select the principal component with the largest variance), and then divides the range into equally sized sub-ranges. Next, the sub-ranges are adjusted based on the data distribution so that each sub-range contains as much data as possible. Finally, the mean value of the data in each sub-range is calculated and used as an initial cluster center. Results: The theoretical analysis shows that although the time complexity of the initialization process is linear, the algorithm has the characteristics of a superlinear initialization method. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and clustering speed. Conclusion: This paper reduces the subsequent iterations of the K-means algorithm without compromising clustering performance, which makes it suitable for large-scale data clustering. The algorithm can be applied not only to low-dimensional data clustering but also to high-dimensional data.
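The initialization described in the Methods can be sketched roughly as follows, assuming the chosen index is simply one feature column. The abstract does not say how the sub-ranges are adjusted to the data distribution, so that step is omitted here; the function name and the `feature` parameter are hypothetical.

```python
def range_based_centers(points, k, feature=0):
    """Initial centers from k equal-width sub-ranges of one feature.

    Simplified sketch: the paper's sub-range adjustment step is NOT implemented,
    so empty sub-ranges are skipped and fewer than k centers may be returned.
    """
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    # Width of each sub-range; fall back to 1.0 when all values are identical.
    width = (hi - lo) / k or 1.0
    buckets = [[] for _ in range(k)]
    for p in points:
        # The maximum value would land in bucket k, so clamp it into the last one.
        idx = min(int((p[feature] - lo) / width), k - 1)
        buckets[idx].append(p)
    # The mean of the points in each non-empty sub-range becomes an initial center.
    return [tuple(sum(col) / len(b) for col in zip(*b)) for b in buckets if b]

# Hypothetical 2-D data split along feature 0.
pts = [(0, 5), (1, 7), (10, 1), (11, 3)]
print(range_based_centers(pts, k=2))  # [(0.5, 6.0), (10.5, 2.0)]
```

A single pass over the data suffices for the bucketing, which is consistent with the linear-time initialization the abstract reports.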


2014 ◽  
Vol 998-999 ◽  
pp. 873-877
Author(s):  
Zhen Bo Wang ◽  
Bao Zhi Qiu

To reduce the impact of irrelevant attributes on clustering results and to raise the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to assign a different weight to each attribute in the data set, and the magnitude of the weight expresses the importance of that attribute to the clusters. In addition, because fuzzy C-means clustering is sensitive to the initial cluster centers, a maximum-distance method for selecting the initial cluster centers is introduced on the basis of the weighted coefficient of variation. Experiments on real data sets show that the algorithm selects cluster centers effectively and yields clustering results superior to those of general fuzzy C-means clustering algorithms.
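As a rough illustration of attribute weighting by coefficient of variation, the sketch below computes the ratio of standard deviation to mean for each attribute and normalizes the results so the weights sum to 1. The abstract does not give the exact weighting formula, so the normalization and the weighted Euclidean distance are assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def cv_weights(rows):
    """One weight per attribute, proportional to its coefficient of variation."""
    cvs = []
    for column in zip(*rows):
        m = mean(column)
        if m == 0:
            raise ValueError("coefficient of variation is undefined for a zero mean")
        # Coefficient of variation: population standard deviation over |mean|.
        cvs.append(pstdev(column) / abs(m))
    total = sum(cvs)
    if total == 0:
        # Every attribute is constant: fall back to uniform weights.
        return [1.0 / len(cvs)] * len(cvs)
    return [cv / total for cv in cvs]

def weighted_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by its attribute weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Hypothetical data: attribute 0 is constant, attribute 1 varies.
rows = [(10, 1), (10, 3)]
w = cv_weights(rows)
print(w)  # [0.0, 1.0]
```

Under this assumed scheme a constant attribute receives zero weight and stops contributing to distances, which matches the stated goal of reducing the influence of irrelevant attributes.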


J ◽  
2019 ◽  
Vol 2 (2) ◽  
pp. 226-235 ◽  
Author(s):  
Chunhui Yuan ◽  
Haitao Yang

Among the many clustering algorithms, the K-means clustering algorithm is widely used because of its simplicity and fast convergence. However, the K-value must be given in advance, and the choice of K-value directly affects the convergence result. To address this problem, we analyze four K-value selection algorithms, namely the Elbow Method, Gap Statistic, Silhouette Coefficient, and Canopy; give the pseudocode of each algorithm; and use the standard Iris data set for experimental verification. Finally, the verification results are evaluated, the advantages and disadvantages of the four algorithms for K-value selection are given, and the clustering range of the data set is pointed out.
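Of the four methods listed, the Silhouette Coefficient is the most compact to illustrate. The sketch below computes the mean silhouette for a given labelling; in practice one would run K-means for several candidate K-values and prefer the one with the highest score. This is a generic textbook formulation rather than the paper's pseudocode, and scoring points in singleton clusters as 0 is a convention assumed here.

```python
import math

def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a labelling (higher means better separated)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by cluster label.
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            # Singleton cluster, or only one cluster overall: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                           # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 1-D data with two obvious groups.
pts = [(0,), (1,), (10,), (11,)]
good = mean_silhouette(pts, [0, 0, 1, 1])  # labels follow the natural groups
bad = mean_silhouette(pts, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)  # True
```

The natural grouping scores close to 1 while the mismatched labelling scores below 0, which is the signal used to compare candidate K-values.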

