scholarly journals Improving Density Peak Clustering by Automatic Peak Selection and Single Linkage Clustering

Symmetry ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 1168
Author(s):  
Jun-Lin Lin ◽  
Jen-Chieh Kuo ◽  
Hsing-Wang Chuang

Density peak clustering (DPC) is a density-based clustering method that has attracted much attention in the academic community. DPC works by first searching density peaks in the dataset, and then assigning each data point to the same cluster as its nearest higher-density point. One problem with DPC is the determination of the density peaks, where poor selection of the density peaks could yield poor clustering results. Another problem with DPC is its cluster assignment strategy, which often makes incorrect cluster assignments for data points that are far from their nearest higher-density points. This study modifies DPC and proposes a new clustering algorithm to resolve the above problems. The proposed algorithm uses the radius of the neighborhood to automatically select a set of the likely density peaks, which are far from their nearest higher-density points. Using the potential density peaks as the density peaks, it then applies DPC to yield the preliminary clustering results. Finally, it uses single-linkage clustering on the preliminary clustering results to reduce the number of clusters, if necessary. The proposed algorithm avoids the cluster assignment problem in DPC because the cluster assignments for the potential density peaks are based on single-linkage clustering, not based on DPC. Our performance study shows that the proposed algorithm outperforms DPC for datasets with irregularly shaped clusters.

Author(s):  
Jianhua Jiang ◽  
Wei Zhou ◽  
Limin Wang ◽  
Xin Tao ◽  
Keqin Li

The density peaks clustering (DPC) is known as an excellent approach to detect some complicated-shaped clusters with high-dimensionality. However, it is not able to detect outliers, hub nodes and boundary nodes, or form low-density clusters. Therefore, halo is adopted to improve the performance of DPC in processing low-density nodes. This paper explores the potential reasons for adopting halos instead of low-density nodes, and proposes an improved recognition method on Halo node for Density Peak Clustering algorithm (HaloDPC). The proposed HaloDPC has improved the ability to deal with varying densities, irregular shapes, the number of clusters, outlier and hub node detection. This paper presents the advantages of the HaloDPC algorithm on several test cases.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Chunzhong Li ◽  
Yunong Zhang

Among numerous clustering algorithms, clustering by fast search and find of density peaks (DPC) is favoured because it is less affected by shapes and density structures of the data set. However, DPC still shows some limitations in clustering of data set with heterogeneity clusters and easily makes mistakes in assignment of remaining points. The new algorithm, density peak clustering based on relative density optimization (RDO-DPC), is proposed to settle these problems and try obtaining better results. With the help of neighborhood information of sample points, the proposed algorithm defines relative density of the sample data and searches and recognizes density peaks of the nonhomogeneous distribution as cluster centers. A new assignment strategy is proposed to solve the abundance classification problem. The experiments on synthetic and real data sets show good performance of the proposed algorithm.


Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 859 ◽  
Author(s):  
Lin

The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time on calculating the local density and the separation distance for each data point in a dataset. The purpose of this study is to accelerate its computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach to calculate the separation distance of a data point by scanning only the neighbors of the data point. Additionally, the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance. We propose an approach to identify non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches to accelerate the DPC algorithm.


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 459
Author(s):  
Shuyi Lu ◽  
Yuanjie Zheng ◽  
Rong Luo ◽  
Weikuan Jia ◽  
Jian Lian ◽  
...  

The clustering algorithm plays an important role in data mining and image processing. The breakthrough of algorithm precision and method directly affects the direction and progress of the following research. At present, types of clustering algorithms are mainly divided into hierarchical, density-based, grid-based and model-based ones. This paper mainly studies the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, which is a new clustering method based on density. The algorithm has the characteristics of no iterative process, few parameters and high precision. However, we found that the clustering algorithm did not consider the original topological characteristics of the data. We also found that the clustering data is similar to the social network nodes mentioned in DeepWalk, which satisfied power-law distribution. In this study, we tried to consider the topological characteristics of the graph in the clustering algorithm. Based on previous studies, we propose a clustering algorithm that adds the topological characteristics of original data on the basis of the CFSFDP algorithm. Our experimental results show that the clustering algorithm with topological features significantly improves the clustering effect and proves that the addition of topological features is effective and feasible.


2019 ◽  
Vol 1229 ◽  
pp. 012024 ◽  
Author(s):  
Fan Hong ◽  
Yang Jing ◽  
Hou Cun-cun ◽  
Zhang Ke-zhen ◽  
Yao Ruo-xia

Author(s):  
Xiaoyu Qin ◽  
Kai Ming Ting ◽  
Ye Zhu ◽  
Vincent CS Lee

A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity; and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on densitybased clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.


Author(s):  
Liping Sun ◽  
Shang Ci ◽  
Xiaoqing Liu ◽  
Xiaoyao Zheng ◽  
Qingying Yu ◽  
...  

2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides the unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on the cluster formation, thus, most clustering algorithms are able to well handle at least one particular type of data distribution but may not well handle the other types of distributions. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that are different from the assumptions being made and on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} is different from the natural clusters on both ends, however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>


Sign in / Sign up

Export Citation Format

Share Document