Accelerating Density Peak Clustering Algorithm

The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time on calculating the local density and the separation distance for each data point in a dataset. The purpose of this study is to accelerate its computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach to calculate the separation distance of a data point by scanning only the neighbors of the data point. Additionally, the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance. We propose an approach to identify non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches to accelerate the DPC algorithm.

Download Full-text

Nearest-Neighbour-Induced Isolation Similarity and Its Impact on Density-Based Clustering

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014755 ◽

2019 ◽

Vol 33 ◽

pp. 4755-4762 ◽

Cited By ~ 3

Author(s):

Xiaoyu Qin ◽

Kai Ming Ting ◽

Ye Zhu ◽

Vincent CS Lee

Keyword(s):

Clustering Algorithm ◽

Distance Measure ◽

Nearest Neighbour ◽

Density Peak ◽

Density Based Clustering ◽

New Type ◽

Density Peak Clustering ◽

The Impact ◽

First Time ◽

Tree Method

A recent proposal of data dependent similarity called Isolation Kernel/Similarity has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity; and propose a nearest neighbour method instead. We formally prove the characteristic of Isolation Similarity with the use of the proposed method. The impact of Isolation Similarity on densitybased clustering is studied here. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure with the proposed nearest-neighbour-induced Isolation Similarity in DBSCAN, leaving the rest of the procedure unchanged. A new type of clusters called mass-connected clusters is formally defined. We show that DBSCAN, which detects density-connected clusters, becomes one which detects mass-connected clusters, when the distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected, while density-connected clusters cannot.

Download Full-text

Improving Density Peak Clustering by Automatic Peak Selection and Single Linkage Clustering

Symmetry ◽

10.3390/sym12071168 ◽

2020 ◽

Vol 12 (7) ◽

pp. 1168

Author(s):

Jun-Lin Lin ◽

Jen-Chieh Kuo ◽

Hsing-Wang Chuang

Keyword(s):

Clustering Algorithm ◽

Academic Community ◽

Performance Study ◽

Potential Density ◽

Cluster Assignment ◽

Density Peak ◽

Single Linkage ◽

Density Peaks ◽

Assignment Strategy ◽

Density Peak Clustering

Density peak clustering (DPC) is a density-based clustering method that has attracted much attention in the academic community. DPC works by first searching density peaks in the dataset, and then assigning each data point to the same cluster as its nearest higher-density point. One problem with DPC is the determination of the density peaks, where poor selection of the density peaks could yield poor clustering results. Another problem with DPC is its cluster assignment strategy, which often makes incorrect cluster assignments for data points that are far from their nearest higher-density points. This study modifies DPC and proposes a new clustering algorithm to resolve the above problems. The proposed algorithm uses the radius of the neighborhood to automatically select a set of the likely density peaks, which are far from their nearest higher-density points. Using the potential density peaks as the density peaks, it then applies DPC to yield the preliminary clustering results. Finally, it uses single-linkage clustering on the preliminary clustering results to reduce the number of clusters, if necessary. The proposed algorithm avoids the cluster assignment problem in DPC because the cluster assignments for the potential density peaks are based on single-linkage clustering, not based on DPC. Our performance study shows that the proposed algorithm outperforms DPC for datasets with irregularly shaped clusters.

Download Full-text

HaloDPC: An Improved Recognition Method on Halo Node for Density Peak Clustering Algorithm

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001419500125 ◽

2019 ◽

Vol 33 (08) ◽

pp. 1950012 ◽

Cited By ~ 4

Author(s):

Jianhua Jiang ◽

Wei Zhou ◽

Limin Wang ◽

Xin Tao ◽

Keqin Li

Keyword(s):

Clustering Algorithm ◽

High Dimensionality ◽

Test Cases ◽

Low Density ◽

Recognition Method ◽

Density Peak ◽

Irregular Shapes ◽

Density Peaks ◽

Density Peaks Clustering ◽

Density Peak Clustering

The density peaks clustering (DPC) is known as an excellent approach to detect some complicated-shaped clusters with high-dimensionality. However, it is not able to detect outliers, hub nodes and boundary nodes, or form low-density clusters. Therefore, halo is adopted to improve the performance of DPC in processing low-density nodes. This paper explores the potential reasons for adopting halos instead of low-density nodes, and proposes an improved recognition method on Halo node for Density Peak Clustering algorithm (HaloDPC). The proposed HaloDPC has improved the ability to deal with varying densities, irregular shapes, the number of clusters, outlier and hub node detection. This paper presents the advantages of the HaloDPC algorithm on several test cases.

Download Full-text

Fast Searching Density Peak Clustering Algorithm Based on Shared Nearest Neighbor and Adaptive Clustering Center

Symmetry ◽

10.3390/sym12122014 ◽

2020 ◽

Vol 12 (12) ◽

pp. 2014

Author(s):

Yi Lv ◽

Mandan Liu ◽

Yue Xiang

Keyword(s):

Prior Knowledge ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Local Density ◽

Density Peak ◽

Adaptive Clustering ◽

Clustering Center ◽

Density Peak Clustering ◽

Shared Nearest Neighbor ◽

Fast Searching

The clustering analysis algorithm is used to reveal the internal relationships among the data without prior knowledge and to further gather some data with common attributes into a group. In order to solve the problem that the existing algorithms always need prior knowledge, we proposed a fast searching density peak clustering algorithm based on the shared nearest neighbor and adaptive clustering center (DPC-SNNACC) algorithm. It can automatically ascertain the number of knee points in the decision graph according to the characteristics of different datasets, and further determine the number of clustering centers without human intervention. First, an improved calculation method of local density based on the symmetric distance matrix was proposed. Then, the position of knee point was obtained by calculating the change in the difference between decision values. Finally, the experimental and comparative evaluation of several datasets from diverse domains established the viability of the DPC-SNNACC algorithm.

Download Full-text

A Novel Hierarchical Clustering Algorithm Based on Density Peaks for Complex Datasets

Complexity ◽

10.1155/2018/2032461 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 5

Author(s):

Rong Zhou ◽

Yong Zhang ◽

Shengzhong Feng ◽

Nurbol Luktarhan

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Local Density ◽

Clustering Algorithms ◽

Complex Structure ◽

Density Peak ◽

Global Parameter ◽

Density Peaks ◽

Complex Datasets

Clustering aims to differentiate objects from different groups (clusters) by similarities or distances between pairs of objects. Numerous clustering algorithms have been proposed to investigate what factors constitute a cluster and how to efficiently find them. The clustering by fast search and find of density peak algorithm is proposed to intuitively determine cluster centers and assign points to corresponding partitions for complex datasets. This method incorporates simple structure due to the noniterative logic and less few parameters; however, the guidelines for parameter selection and center determination are not explicit. To tackle these problems, we propose an improved hierarchical clustering method HCDP aiming to represent the complex structure of the dataset. A k-nearest neighbor strategy is integrated to compute the local density of each point, avoiding to select the nonnecessary global parameter dc and enables cluster smoothing and condensing. In addition, a new clustering evaluation approach is also introduced to extract a “flat” and “optimal” partition solution from the structure by adaptively computing the clustering stability. The proposed approach is conducted on some applications with complex datasets, where the results demonstrate that the novel method outperforms its counterparts to a large extent.

Download Full-text

Adaptive density peak clustering based on dimensional-free and reverse k-nearest neighbors

Information Technology And Control ◽

10.5755/j01.itc.49.3.23405 ◽

2020 ◽

Vol 49 (3) ◽

pp. 395-411

Author(s):

Qiannan Wu ◽

Qianqian Zhang ◽

Ruizhi Sun ◽

Li Li ◽

Huiyu Mu ◽

...

Keyword(s):

High Dimension ◽

Clustering Algorithm ◽

Local Density ◽

Nearest Neighbors ◽

Allocation Strategy ◽

K Nearest Neighbor ◽

K Nearest Neighbors ◽

Density Peak ◽

Real World Datasets ◽

Density Peak Clustering

Cluster analysis plays a crucial component in consumer behavior segment. The density peak clustering algorithm (DPC) is a novel density-based clustering method. However, it performs poorly in high-dimension datasets and the local density for boundary points. In addition, its fault tolerance is affected by one-step allocation strategy. To overcome these disadvantages, an adaptive density peak clustering algorithm based on dimensional-free and reverse k-nearest neighbors (ERK-DPC) is proposed in this paper. First, we compute Euler cosine distance to obtain the similarity of sample points in high-dimension datasets. Then, the adaptive local density formula is used to measure the local density of each point. Finally, the reverse k-nearest neighbor idea is added on two-step allocation strategy, which assigns the remaining points accurately and effectively. The proposed clustering algorithm is experiments on several benchmark datasets and real-world datasets. By comparing the benchmarks, the results demonstrate that the ERK-DPC algorithm superior to some state-of- the-art methods.

Download Full-text

Density Peak Clustering Algorithm Considering Topological Features

Electronics ◽

10.3390/electronics9030459 ◽

2020 ◽

Vol 9 (3) ◽

pp. 459

Author(s):

Shuyi Lu ◽

Yuanjie Zheng ◽

Rong Luo ◽

Weikuan Jia ◽

Jian Lian ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Original Data ◽

Power Law Distribution ◽

Density Peak ◽

Topological Features ◽

Density Peaks ◽

Topological Characteristics ◽

Density Peak Clustering ◽

Clustering Data

The clustering algorithm plays an important role in data mining and image processing. The breakthrough of algorithm precision and method directly affects the direction and progress of the following research. At present, types of clustering algorithms are mainly divided into hierarchical, density-based, grid-based and model-based ones. This paper mainly studies the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, which is a new clustering method based on density. The algorithm has the characteristics of no iterative process, few parameters and high precision. However, we found that the clustering algorithm did not consider the original topological characteristics of the data. We also found that the clustering data is similar to the social network nodes mentioned in DeepWalk, which satisfied power-law distribution. In this study, we tried to consider the topological characteristics of the graph in the clustering algorithm. Based on previous studies, we propose a clustering algorithm that adds the topological characteristics of original data on the basis of the CFSFDP algorithm. Our experimental results show that the clustering algorithm with topological features significantly improves the clustering effect and proves that the addition of topological features is effective and feasible.

Download Full-text

Adaptive partitioning by local density-peaks: An efficient density-based clustering algorithm for analyzing molecular dynamics trajectories

Journal of Computational Chemistry ◽

10.1002/jcc.24664 ◽

2016 ◽

Vol 38 (3) ◽

pp. 152-160 ◽

Cited By ~ 13

Author(s):

Song Liu ◽

Lizhe Zhu ◽

Fu Kit Sheong ◽

Wei Wang ◽

Xuhui Huang

Keyword(s):

Molecular Dynamics ◽

Clustering Algorithm ◽

Local Density ◽

Density Peaks ◽

Adaptive Partitioning ◽

Density Based Clustering

Download Full-text

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

AbstractWith the widely application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. Clustering ability of BCALoD is verified by randomly generated datasets and city light satellite images.

Download Full-text