Research on Density-Based K-means Clustering Algorithm

Abstract: Cluster analysis is an unsupervised learning process, and its most classic algorithm, K-means, has the advantages of a simple principle and easy implementation. K-means also has shortcomings: the number of clusters k, the initial cluster centers, and outlier points are all handled arbitrarily. This paper discusses improvements to the traditional K-means algorithm and puts forward an improved algorithm that incorporates density clustering. First, it describes the basic principles and processes of the K-means and DBSCAN algorithms. It then summarizes improvement methods for these three aspects, together with their advantages and disadvantages, and proposes a new density-based improved K-means algorithm. Finally, it looks ahead to the development direction and trends of density-based K-means clustering algorithms.
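
As a point of reference for the baseline this abstract discusses, the following is a minimal sketch of the traditional K-means loop (random initial centers, then alternating assignment and update steps). It is not the paper's improved algorithm. The function names, the `seed` parameter, and the choice to keep an empty cluster's previous center are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of numeric tuples (illustrative sketch only)."""
    rng = random.Random(seed)
    # Arbitrary (random) initial centers: the weakness the abstract refers to.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because every point, including an outlier, pulls on the mean of its cluster, this baseline has no notion of noise, which is the gap that density-based methods such as DBSCAN are meant to fill.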

Hybrid K Mean Clustering Algorithm for Crop Production Analysis in Agriculture

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b1002.1292s19 ◽

2019 ◽

Vol 9 (2S) ◽

pp. 9-13

Keyword(s):

Cluster Analysis ◽

Precision Agriculture ◽

Crop Production ◽

Clustering Algorithm ◽

Research Work ◽

K Value ◽

Clustering Technique ◽

Production Analysis ◽

Initial Cluster ◽

The Stability

The proposed research work aims to perform cluster analysis in the field of precision agriculture. The k-means technique is used to cluster the agricultural data. Selecting the value of k plays a major role in the k-means algorithm, and different techniques are used to identify the number of clusters. Identifying suitable initial centroids is also important; in general, they are selected randomly. In the proposed work, Hybrid K-Means clustering is used to identify the initial centroids so that the results are stable. Because the initial cluster centers are well defined, Hybrid K-Means acts as a stable clustering technique.

Weighted k-Prototypes Clustering Algorithm Based on the Hybrid Dissimilarity Coefficient

Mathematical Problems in Engineering ◽

10.1155/2020/5143797 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Ziqi Jia ◽

Ling Song

Keyword(s):

Categorical Data ◽

Clustering Algorithm ◽

Numerical Data ◽

Experimental Results ◽

Cluster Center ◽

Real Dataset ◽

Dissimilarity Coefficient ◽

Initial Cluster ◽

Data Objects ◽

Selection Of

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of selecting initial cluster centers was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. The WKPCA algorithm not only improves the selection of initial cluster centers, but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. A real dataset from UCI was used to test the WKPCA algorithm. Experimental results show that WKPCA is more efficient and robust than other k-prototypes algorithms.
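
For orientation, the classical k-prototypes dissimilarity adds a squared Euclidean term over the numerical attributes to a weighted count of mismatches over the categorical attributes. The sketch below shows that classical form only. It is not the hybrid dissimilarity coefficient proposed for WKPCA, whose exact definition is not given in this abstract. The function name and the default `gamma` are assumptions for illustration.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Classical k-prototypes style dissimilarity (illustrative sketch).

    x_num / proto_num: numerical attributes of the object and the prototype.
    x_cat / proto_cat: categorical attributes of the object and the prototype.
    gamma: assumed weight balancing the categorical part against the numeric part.
    """
    # Numerical part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: simple matching, counting attributes that differ.
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


d = mixed_dissimilarity(
    [1.0, 2.0], ["red", "small"],
    [1.0, 4.0], ["red", "large"],
    gamma=0.5,
)
```

In this example the numeric part is 4.0 and one categorical attribute differs, so the result is 4.0 + 0.5 * 1 = 4.5.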

Multi-Radius Density Clustering Algorithm Based on Outlier Factor

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.472.427 ◽

2014 ◽

Vol 472 ◽

pp. 427-431

Author(s):

Zong Lin Ye ◽

Hui Cao ◽

Li Xin Jia ◽

Yan Bin Zhang ◽

Gang Quan Si

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Similar Process ◽

The Core ◽

Dbscan Algorithm ◽

Proposed Model ◽

Density Clustering ◽

Relationship Of ◽

Core Points ◽

The Relationship

This paper proposes a novel multi-radius density clustering algorithm based on an outlier factor. The algorithm first calculates the density-similar-neighbor-based outlier factor (DSNOF) for each point in the dataset according to the relationship between the density of the point and that of its neighbors, and treats each point whose DSNOF is smaller than 1 as a core point. Second, the core points are clustered by a process similar to density-based spatial clustering of applications with noise (DBSCAN) to obtain sub-clusters. Third, the proposed algorithm merges the obtained sub-clusters into clusters. Finally, the points whose DSNOF is larger than 1 are assigned to these clusters. Experiments on real datasets from the UCI Machine Learning Repository verify that the proposed model is more effective than the DBSCAN and k-means algorithms and is not greatly affected by its parameter.

Ontology-Based K-Means Clustering Algorithm Analysis

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.1290 ◽

2013 ◽

Vol 380-384 ◽

pp. 1290-1293

Author(s):

Qing Ju Guo ◽

Wen Tian Ji ◽

Sheng Zhong

Keyword(s):

Semantic Web ◽

Clustering Algorithm ◽

Algorithm Analysis ◽

Clustering Method ◽

Data Set ◽

Advantages And Disadvantages ◽

Research Findings ◽

Partition Clustering ◽

Improved Algorithm

Many research findings on clustering algorithms have been reported at home and abroad in recent years. After analyzing the advantages and disadvantages of the traditional partition clustering method, the K-means algorithm, this paper combines it with an ontology-based data set to establish a semantic web model. It improves the existing clustering algorithm under various constraint conditions, with the aim of demonstrating that the improved algorithm has better efficiency and accuracy under the semantic web.

Density Clustering Algorithm Based on the Dynamic Selection of Cluster Center

2019 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) ◽

10.1109/cyberc.2019.00050 ◽

2019 ◽

Author(s):

Lulu Sun ◽

Ruilin Zhang

Keyword(s):

Clustering Algorithm ◽

Cluster Center ◽

Dynamic Selection ◽

Density Clustering ◽

Selection Of

Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

Mathematical Problems in Engineering ◽

10.1155/2016/5707692 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10

Author(s):

Ning Li ◽

Yunxia Gu ◽

Zhongliang Deng

Keyword(s):

Initial Data ◽

Prior Knowledge ◽

Clustering Algorithm ◽

Sparse Data ◽

Cluster Center ◽

Control Factor ◽

Cascade Algorithm ◽

Initial Cluster ◽

Data Clusters ◽

Cumulative Entropy

Limited prior knowledge and randomly chosen initial cluster centers have a direct impact on the accuracy of iterative clustering algorithms. In this paper we propose a new algorithm that, with little prior knowledge, computes the initial cluster centers for k-means clustering and the best number of clusters, and optimizes the clustering result. It constructs a Euclidean distance control factor based on aggregation density sparse degree to select the initial cluster centers of nonuniform sparse data, and obtains initial data clusters by multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is adopted to optimize the initial data clusters and the best number of clusters. The experimental results show that the proposed algorithm performs well in obtaining the initial cluster centers for the k-means algorithm and improves the clustering accuracy of nonuniform sparse data by about 5%.

Study of Combined Fuzzy Clustering Algorithm Based on F-Statistics Hierarchy Clustering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.29-32.802 ◽

2010 ◽

Vol 29-32 ◽

pp. 802-808

Author(s):

Min Min

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cluster Center ◽

Evaluation Data ◽

Fuzzy Clustering Algorithm ◽

Initial Cluster ◽

The Common ◽

Common Problems ◽

F Statistics

After analyzing the common problems in fuzzy clustering algorithms, we put forward a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and initial cluster centers. The algorithm has been tested on real evaluation data of teaching designs. The results show that combined fuzzy clustering based on the F-statistic is more effective.

Improved K-means Clustering Algorithm and its Applications

Recent Patents on Engineering ◽

10.2174/1872212113666181203110611 ◽

2019 ◽

Vol 13 (4) ◽

pp. 403-409

Author(s):

Hui Qi ◽

Jinqing Li ◽

Xiaoqiang Di ◽

Weiwu Ren ◽

Fengrong Zhang

Keyword(s):

Data Clustering ◽

Large Scale ◽

Clustering Algorithm ◽

Principal Component ◽

Mean Value ◽

Attack Detection ◽

Cluster Center ◽

Network Attack ◽

Initial Cluster ◽

Low Dimensional

Background: The K-means algorithm is implemented in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers, while the subsequent iterations repeatedly update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers selected during initialization that a different choice of initial centers will influence the algorithm's performance. Improving the initialization process has therefore become an important means of improving K-means performance. Methods: This paper uses a new strategy to select the initial cluster centers. It first calculates the minimum and maximum values of the data along a certain index (for lower-dimensional data, such as two-dimensional data, a feature with larger variance or the distance to the origin can be selected; for higher-dimensional data, PCA can be used to select the principal component with the largest variance), and then divides the range into equally sized sub-ranges. Next, it adjusts the sub-ranges based on the data distribution so that each sub-range contains as much data as possible. Finally, the mean value of the data in each sub-range is calculated and used as an initial cluster center. Results: The theoretical analysis shows that although the time complexity of the initialization process is linear, the algorithm has the characteristics of a superlinear initialization method. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection. Experimental results show that it achieves high clustering performance and clustering speed. Conclusion: This paper reduces the number of subsequent K-means iterations without compromising clustering performance, which makes the algorithm suitable for large-scale data clustering. The algorithm can be applied not only to low-dimensional data clustering but also to high-dimensional data.
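
The range-splitting idea in the Methods paragraph can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the index is taken to be a single feature column chosen by the caller, the step that adjusts sub-ranges to the data distribution is omitted because the abstract does not specify it, and empty sub-ranges are simply skipped, so fewer than `k` centers may be returned. The function name `range_based_centers` is hypothetical.

```python
def range_based_centers(points, k, feature=0):
    """Pick initial centers as the means of k equal-width sub-ranges.

    points: list of numeric tuples.
    feature: assumed index of the column used to split the data.
    """
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        width = 1.0  # all values identical: every point falls in the first bin
    bins = [[] for _ in range(k)]
    for p in points:
        # Clamp so that the maximum value lands in the last sub-range.
        idx = min(int((p[feature] - lo) / width), k - 1)
        bins[idx].append(p)
    centers = []
    for members in bins:
        if members:  # assumption: empty sub-ranges are skipped
            centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centers


data = [(0.0, 5.0), (1.0, 7.0), (9.0, 1.0), (10.0, 3.0)]
initial_centers = range_based_centers(data, k=2, feature=0)
```

The pass over the data is linear in the number of points, which matches the linear initialization cost the abstract mentions.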

Fuzzy C-Means Clustering Algorithm Based on Coefficient of Variation

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.998-999.873 ◽

2014 ◽

Vol 998-999 ◽

pp. 873-877

Author(s):

Zhen Bo Wang ◽

Bao Zhi Qiu

Keyword(s):

Coefficient Of Variation ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Data ◽

Cluster Center ◽

Data Set ◽

Fuzzy C Means ◽

Initial Cluster ◽

Fuzzy C Means Clustering ◽

The Impact

To reduce the impact of irrelevant attributes on clustering results and to increase the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to assign a different weight to each attribute in the data set, and the magnitude of the weight expresses the importance of that attribute to the clusters. In addition, because fuzzy C-means clustering is sensitive to the initial cluster centers, a method for selecting initial cluster centers based on maximum distance is introduced on top of the coefficient-of-variation weighting. Experiments on real data sets show that the algorithm selects cluster centers effectively, with clustering results superior to those of general fuzzy C-means clustering algorithms.
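
The coefficient of variation of an attribute is its standard deviation divided by its mean. One plausible way to turn it into attribute weights is sketched below. The abstract does not give the exact weighting formula used by CV-FCM, so normalizing the coefficients to sum to 1 is an assumption, as are the function name and the handling of zero-mean attributes.

```python
import statistics


def cv_weights(rows):
    """Attribute weights proportional to each column's coefficient of variation.

    rows: list of equal-length numeric tuples (one tuple per data object).
    Assumption: weights are the CVs normalized to sum to 1.
    """
    columns = list(zip(*rows))
    cvs = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)  # population standard deviation
        # Assumption: a zero-mean attribute gets CV 0 to avoid division by zero.
        cvs.append(std / abs(mean) if mean != 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        # No attribute varies: fall back to equal weights.
        return [1.0 / len(cvs)] * len(cvs)
    return [cv / total for cv in cvs]


weights = cv_weights([(1.0, 10.0), (3.0, 10.0)])
```

Here the second attribute is constant, so it receives weight 0 and the first attribute receives the full weight.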

Research on K-Value Selection Method of K-Means Clustering Algorithm

J ◽

10.3390/j2020016 ◽

2019 ◽

Vol 2 (2) ◽

pp. 226-235 ◽

Cited By ~ 18

Author(s):

Chunhui Yuan ◽

Haitao Yang

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Simple Algorithm ◽

Convergence Result ◽

Data Set ◽

K Value ◽

Standard Data ◽

Advantages And Disadvantages ◽

Gap Statistic ◽

Selection Algorithms

Among the many clustering algorithms, K-means is widely used because of its simplicity and fast convergence. However, the K-value must be given in advance, and the choice of K-value directly affects the convergence result. To address this problem, we analyze four K-value selection algorithms, namely the Elbow Method, Gap Statistic, Silhouette Coefficient, and Canopy; give pseudocode for each; and use the standard data set Iris for experimental verification. Finally, the verification results are evaluated, the advantages and disadvantages of the four algorithms for K-value selection are given, and the clustering range of the data set is pointed out.
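
Of the four selection methods named above, the Silhouette Coefficient is the simplest to state compactly. For each point it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below uses the standard definition and is not taken from the paper's pseudocode; the function name and the convention of scoring singleton clusters as 0 are assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient for a labelled set of numeric tuples."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by len - 1 works).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(pts, [0, 0, 1, 1])
bad = silhouette_score(pts, [0, 1, 0, 1])
```

To select K, one would typically run the clustering for several candidate values and keep the one with the highest mean silhouette. In the toy data above, the labelling that matches the two visible groups scores close to 1 and the mismatched labelling scores below 0.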
