Density Peak Clustering Algorithm Considering Topological Features

Shuyi Lu; Yuanjie Zheng; Rong Luo; Weikuan Jia; Jian Lian; Chengjiang Li

doi:10.3390/electronics9030459

Improving Density Peak Clustering by Automatic Peak Selection and Single Linkage Clustering

Symmetry ◽

10.3390/sym12071168 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1168

Author(s):

Jun-Lin Lin ◽

Jen-Chieh Kuo ◽

Hsing-Wang Chuang

Keyword(s):

Clustering Algorithm ◽

Academic Community ◽

Performance Study ◽

Potential Density ◽

Cluster Assignment ◽

Density Peak ◽

Single Linkage ◽

Density Peaks ◽

Assignment Strategy ◽

Density Peak Clustering

Density peak clustering (DPC) is a density-based clustering method that has attracted much attention in the academic community. DPC works by first searching for density peaks in the dataset and then assigning each data point to the same cluster as its nearest higher-density point. One problem with DPC is the determination of the density peaks: a poor selection can yield poor clustering results. Another problem is its cluster assignment strategy, which often makes incorrect cluster assignments for data points that are far from their nearest higher-density points. This study modifies DPC and proposes a new clustering algorithm to resolve these problems. The proposed algorithm uses the radius of the neighborhood to automatically select a set of potential density peaks, which are points far from their nearest higher-density points. Using the potential density peaks as the density peaks, it then applies DPC to yield preliminary clustering results. Finally, it applies single-linkage clustering to the preliminary results to reduce the number of clusters, if necessary. The proposed algorithm avoids the cluster assignment problem in DPC because the cluster assignments for the potential density peaks are based on single-linkage clustering rather than on DPC. Our performance study shows that the proposed algorithm outperforms DPC for datasets with irregularly shaped clusters.
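The two DPC steps named in this abstract (find density peaks, then give each remaining point the label of its nearest higher-density point) can be sketched roughly as follows. This is a minimal illustration of baseline DPC, not the algorithm proposed in the paper. The cutoff-count density, the `d_c` and `n_clusters` parameters, and the choice of peaks by the largest density-times-distance score are assumptions taken from the commonly described form of DPC rather than from the abstract.

```python
import numpy as np

def dpc(points, d_c, n_clusters):
    """Illustrative O(n^2) density peak clustering; names are hypothetical."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from highest to lowest density (ties broken by index).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)            # distance to nearest higher-density point
    parent = np.full(n, -1)        # index of that nearest higher-density point
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]      # points visited earlier have higher density
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    # Assumed peak rule: largest rho * delta; the densest point is always a peak.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    peaks = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[peaks] = np.arange(len(peaks))
    # Assignment step: inherit the label of the nearest higher-density point.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Because each non-peak point simply copies its parent's label, a single parent that lies far away across a gap drags the point into the wrong cluster, which is the assignment problem the abstract describes.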

VDPC: Variational Density Peak Clustering Algorithm

10.36227/techrxiv.17597669.v1 ◽

2021 ◽

Author(s):

Yizhang Wang ◽

Di Wang ◽

You Zhou ◽

Chai Quek ◽

Xiaofeng Zhang

Keyword(s):

Clustering Algorithm ◽

Cluster Formation ◽

Clustering Algorithms ◽

Data Distribution ◽

Distribution Patterns ◽

Clustering Methods ◽

Density Peak ◽

Global Parameter ◽

Density Peak Clustering ◽

Parameter Values

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups [atilgan2021efficient, d2021automatic]. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well [bai2017fast], and DBSCAN is able to find clusters with similar densities [DBSCAN].

Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in the paper's figure (label figconnect). The density of these connecting points differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign them and treat all the data points in that figure as one big cluster.

HaloDPC: An Improved Recognition Method on Halo Node for Density Peak Clustering Algorithm

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419500125 ◽

2019 ◽

Vol 33 (08) ◽

pp. 1950012 ◽

Cited By ~ 4

Author(s):

Jianhua Jiang ◽

Wei Zhou ◽

Limin Wang ◽

Xin Tao ◽

Keqin Li

Keyword(s):

Clustering Algorithm ◽

High Dimensionality ◽

Test Cases ◽

Low Density ◽

Recognition Method ◽

Density Peak ◽

Irregular Shapes ◽

Density Peaks ◽

Density Peaks Clustering ◽

Density Peak Clustering

Density peaks clustering (DPC) is known as an excellent approach for detecting complicated-shaped clusters in high-dimensional data. However, it cannot detect outliers, hub nodes, or boundary nodes, nor can it form low-density clusters. Therefore, the halo is adopted to improve the performance of DPC in processing low-density nodes. This paper explores the potential reasons for adopting halos instead of low-density nodes and proposes an improved recognition method on Halo node for Density Peak Clustering algorithm (HaloDPC). The proposed HaloDPC improves the ability to deal with varying densities, irregular shapes, the number of clusters, and outlier and hub node detection. The paper presents the advantages of the HaloDPC algorithm on several test cases.

Clustering Mixed Data Based on Density Peaks and Stacked Denoising Autoencoders

Symmetry ◽

10.3390/sym11020163 ◽

2019 ◽

Vol 11 (2) ◽

pp. 163

Author(s):

Baobin Duan ◽

Lixin Han ◽

Zhinan Gou ◽

Yi Yang ◽

Shuangshuang Chen

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Feature Space ◽

Original Data ◽

Mixed Data ◽

Feature Representations ◽

Density Peaks ◽

Categorical Attributes ◽

Data Objects

Because mixed data with numerical and categorical attributes is ubiquitous in the real world, a variety of clustering algorithms have been developed to discover the information hidden in such data. Most existing clustering algorithms compute the distances or similarities between data objects from the original data, which may make the clustering results unstable because of noise. In this paper, a clustering framework is proposed to explore the grouping structure of mixed data. First, the categorical attributes, transformed by one-hot encoding, and the normalized numerical attributes are input to a stacked denoising autoencoder to learn internal feature representations. Second, based on these feature representations, the distances between all data objects in the feature space are calculated, and the local density and relative distance of each data object are computed. Third, the density peaks clustering algorithm is improved and employed to allocate all the data objects to different clusters. Finally, experiments conducted on several UCI datasets demonstrate that the proposed algorithm for clustering mixed data outperforms three baseline algorithms in terms of clustering accuracy and the Rand index.
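The first step of this pipeline, preparing mixed data before it reaches the autoencoder, might look like the sketch below. The abstract does not say which normalization is used, so min-max scaling is an assumption here, and `encode_mixed` is a hypothetical helper name. The autoencoder itself is not shown.

```python
import numpy as np

def encode_mixed(numeric, categorical):
    """Min-max scale numeric columns and one-hot encode categorical columns.

    numeric: list of rows of numbers; categorical: list of rows of labels.
    Min-max scaling is an assumed choice of normalization.
    """
    numeric = np.asarray(numeric, dtype=float)
    lo, hi = numeric.min(axis=0), numeric.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero
    blocks = [(numeric - lo) / span]
    for col in zip(*categorical):            # iterate over categorical columns
        values = sorted(set(col))            # fixed, reproducible category order
        onehot = np.array([[1.0 if v == u else 0.0 for u in values] for v in col])
        blocks.append(onehot)
    return np.hstack(blocks)
```

For example, `encode_mixed([[1], [3], [5]], [("a",), ("b",), ("a",)])` produces one scaled numeric column followed by two indicator columns for the categories `a` and `b`.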

Density Peak Clustering Based on Relative Density Optimization

Mathematical Problems in Engineering ◽

10.1155/2020/2816102 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Chunzhong Li ◽

Yunong Zhang

Keyword(s):

Relative Density ◽

Clustering Algorithms ◽

Real Data ◽

Classification Problem ◽

Data Sets ◽

Density Peak ◽

Data Set ◽

Density Peaks ◽

Assignment Strategy ◽

Density Peak Clustering

Among numerous clustering algorithms, clustering by fast search and find of density peaks (DPC) is favoured because it is less affected by the shapes and density structures of the data set. However, DPC still shows limitations when clustering data sets with heterogeneous clusters and easily makes mistakes when assigning the remaining points. A new algorithm, density peak clustering based on relative density optimization (RDO-DPC), is proposed to address these problems and obtain better results. Using the neighborhood information of sample points, the proposed algorithm defines the relative density of the sample data and then searches for and recognizes density peaks of the nonhomogeneous distribution as cluster centers. A new assignment strategy is proposed to solve the abundance classification problem. Experiments on synthetic and real data sets show good performance of the proposed algorithm.

Accelerating Density Peak Clustering Algorithm

Symmetry ◽

10.3390/sym11070859 ◽

2019 ◽

Vol 11 (7) ◽

pp. 859 ◽

Cited By ~ 1

Author(s):

Lin

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Early Stage ◽

Separation Distance ◽

Density Peak ◽

Density Peaks ◽

Density Based Clustering ◽

Data Points ◽

Data Point ◽

Density Peak Clustering

The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time on calculating the local density and the separation distance for each data point in a dataset. The purpose of this study is to accelerate its computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach to calculate the separation distance of a data point by scanning only the neighbors of the data point. Additionally, the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance. We propose an approach to identify non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches to accelerate the DPC algorithm.
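The "half of the dataset" figure can be checked with a little arithmetic. The sketch below rests on one reading that the abstract does not spell out: points are ranked by strictly decreasing density, and the naive search for a point's separation distance compares it only against the points ranked ahead of it. The function name is hypothetical.

```python
def average_scan_fraction(n):
    """Average fraction of the dataset scanned per point in a naive search.

    Assumption (not from the abstract): the point at 0-based density rank r
    is compared against exactly the r points with higher density.
    """
    total_comparisons = sum(range(n))  # 0 + 1 + ... + (n - 1) = n(n - 1) / 2
    return total_comparisons / (n * n)
```

Under this assumption the fraction is (n - 1) / (2n), which approaches one half as n grows. That is the cost the neighbor-only scan proposed in the paper aims to reduce.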

A Novel Hierarchical Clustering Algorithm Based on Density Peaks for Complex Datasets

Complexity ◽

10.1155/2018/2032461 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 5

Author(s):

Rong Zhou ◽

Yong Zhang ◽

Shengzhong Feng ◽

Nurbol Luktarhan

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Local Density ◽

Clustering Algorithms ◽

Complex Structure ◽

Density Peak ◽

Global Parameter ◽

Density Peaks ◽

Complex Datasets

Clustering aims to differentiate objects from different groups (clusters) by the similarities or distances between pairs of objects. Numerous clustering algorithms have been proposed to investigate what factors constitute a cluster and how to find clusters efficiently. The clustering by fast search and find of density peaks algorithm was proposed to determine cluster centers intuitively and assign points to the corresponding partitions for complex datasets. This method has a simple structure because of its noniterative logic and few parameters; however, its guidelines for parameter selection and center determination are not explicit. To tackle these problems, we propose an improved hierarchical clustering method, HCDP, which aims to represent the complex structure of the dataset. A k-nearest neighbor strategy is integrated to compute the local density of each point, which avoids selecting the unnecessary global parameter dc and enables cluster smoothing and condensing. In addition, a new clustering evaluation approach is introduced to extract a “flat” and “optimal” partition solution from the structure by adaptively computing the clustering stability. The proposed approach is applied to several applications with complex datasets, and the results demonstrate that the novel method outperforms its counterparts to a large extent.
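To illustrate how a k-nearest neighbor rule can replace the cutoff distance dc, here is a minimal sketch. The abstract does not give HCDP's density formula, so the form used here (the exponential of the negative mean distance to the k nearest neighbors) is an assumption chosen for illustration, and `knn_density` is a hypothetical name.

```python
import numpy as np

def knn_density(points, k):
    """Local density from the k nearest neighbors, with no cutoff distance.

    Assumed formula: exp(-mean distance to the k nearest neighbors).
    """
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # After sorting each row, column 0 is the zero distance to the point itself.
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]
    return np.exp(-nearest.mean(axis=1))
```

The only parameter is the neighbor count `k`, so the density adapts to the local scale of the data instead of depending on one global distance threshold.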

VDPC: Variational Density Peak Clustering Algorithm

10.36227/techrxiv.17597669 ◽

2021 ◽

Author(s):

Yizhang Wang ◽

Di Wang ◽

You Zhou ◽

Chai Quek ◽

Xiaofeng Zhang

Keyword(s):

Clustering Algorithm ◽

Cluster Formation ◽

Clustering Algorithms ◽

Data Distribution ◽

Distribution Patterns ◽

Clustering Methods ◽

Density Peak ◽

Global Parameter ◽

Density Peak Clustering ◽

Parameter Values

Clustering is an important unsupervised knowledge acquisition method that divides unlabeled data into different groups [atilgan2021efficient, d2021automatic]. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well [bai2017fast], and DBSCAN is able to find clusters with similar densities [DBSCAN].

Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in the paper's figure (label figconnect). The density of these connecting points differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign them and treat all the data points in that figure as one big cluster.

Clustering with Missing Features: A Density-Based Approach

Symmetry ◽

10.3390/sym14010060 ◽

2022 ◽

Vol 14 (1) ◽

pp. 60

Author(s):

Kun Gao ◽

Hassan Ali Khan ◽

Wenwen Qu

Keyword(s):

Real World ◽

Incomplete Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Distance Matrix ◽

Density Peak ◽

Density Clustering ◽

Real World Datasets ◽

Density Peak Clustering ◽

Feature Values

Density clustering has been widely used in many research disciplines to determine the structure of real-world datasets. Existing density clustering algorithms work well only on complete datasets. In real-world datasets, however, feature values may be missing because of technical limitations. Many imputation methods used for density clustering cause the aggregation phenomenon. To solve this problem, a novel two-stage density peak clustering approach for data with missing features is proposed. First, the density peak clustering algorithm is applied to the data with complete features, and the labeled core points that can represent the whole data distribution are used to train a classifier. Second, a symmetrical FWPD distance matrix is calculated for the incomplete data points; the incomplete data are then imputed using this matrix and classified by the classifier. The experimental results show that the proposed approach performs well on both synthetic and real datasets.

Density Peak Clustering algorithm using knowledge learning-based fruit fly optimization

International Journal of Computers and Applications ◽

10.1080/1206212x.2018.1440340 ◽

2018 ◽

Vol 40 (3) ◽

pp. 1-10

Author(s):

Ruihong Zhou ◽

Qiaoming Liu ◽

Xuming Han ◽

Limin Wang

Keyword(s):

Clustering Algorithm ◽

Fruit Fly ◽

Density Peak ◽

Fruit Fly Optimization ◽

Density Peak Clustering ◽

Knowledge Learning
