Density Peaks Clustering Based on Natural Nearest Neighbor and Multi-cluster Mergers

Author(s):  
Hui Ma ◽  
Ruiqin Wang ◽  
Shuai Yang

Abstract Clustering by fast search and find of Density Peaks (DPC) has the advantages of being simple, efficient, and capable of detecting clusters of arbitrary shape. However, it still has some shortcomings: 1) the cutoff distance must be specified in advance, and the choice of local density formula affects the final clustering result; 2) after the cluster centers are found, the assignment strategy for the remaining points may produce a “domino effect”: once a point is misallocated, more points may be misallocated subsequently. To overcome these shortcomings, we propose a density peaks clustering algorithm based on natural nearest neighbors and multi-cluster mergers. In this algorithm, a weighted local density calculation method is designed using the natural nearest neighbor, which avoids both the selection of the cutoff distance and the selection of the local density formula. The algorithm uses a new two-stage assignment strategy to assign the remaining points to the most suitable clusters, thus reducing assignment errors. Experiments were carried out on artificial and real-world datasets. The results show that the clustering effect of this algorithm is better than that of other related algorithms.
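For readers unfamiliar with the baseline these papers build on, the following is a minimal sketch of the standard DPC decision-graph computation (the cutoff-kernel local density and the distance-to-higher-density quantity), not the weighted variant proposed above; the function name and dataset are illustrative only.

```python
import numpy as np

def dpc_decision_values(X, dc):
    """Compute the two DPC decision-graph quantities for each point:
    local density rho_i (cutoff kernel: number of neighbors within dc)
    and delta_i (distance to the nearest point of higher density).
    Points with large rho AND large delta are candidate cluster centers."""
    n = len(X)
    # pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # cutoff-kernel density; subtract 1 to exclude the point itself
    rho = (d < dc).sum(axis=1) - 1
    delta = np.empty(n)
    order = np.argsort(-rho)  # indices from highest to lowest density
    # convention: the global density peak gets the maximum distance
    delta[order[0]] = d[order[0]].max()
    for rank, i in enumerate(order[1:], start=1):
        # nearest distance to any point ranked denser than i
        delta[i] = d[i, order[:rank]].min()
    return rho, delta
```

On a toy set of three nearby points plus one outlier, the three dense points get high rho, the densest of them gets the largest delta, and the outlier gets low rho but large delta, which is exactly the pattern the decision graph is read for.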

2020 ◽  
Author(s):  
Xiaoning Yuan ◽  
Hang Yu ◽  
Jun Liang ◽  
Bing Xu

Abstract Recently the density peaks clustering algorithm (dubbed DPC) has attracted a lot of attention. DPC is able to quickly find cluster centers and complete clustering tasks, and it is suitable for many kinds of clustering tasks. However, the cutoff distance dc depends on human experience, which greatly affects the clustering results. In addition, the selection of cluster centers requires manual participation, which affects clustering efficiency. In order to solve these problems, we propose a density peaks clustering algorithm based on K nearest neighbors with an adaptive merging strategy (dubbed KNN-ADPC). We propose a cluster merging strategy to automatically aggregate over-segmented clusters. Additionally, K nearest neighbors are adopted to divide points more reasonably. KNN-ADPC has only one parameter, and the clustering task can be conducted automatically without human involvement. The experimental results on artificial and real-world datasets demonstrate the higher accuracy of KNN-ADPC compared with DBSCAN, K-means++, DPC, and DPC-KNN.
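The KNN-based variants above replace the cutoff-distance density with one computed from the k nearest neighbors, leaving k as the single parameter. A common formulation of this idea (as in DPC-KNN; KNN-ADPC's exact definition may differ, and the function name here is illustrative) is sketched below.

```python
import numpy as np

def knn_local_density(X, k):
    """KNN-based local density for DPC-style clustering:
    rho_i = exp(-mean squared distance to the k nearest neighbors).
    This removes the cutoff distance dc entirely; k is the only
    parameter. (One common formulation among KNN-based DPC variants.)"""
    # pairwise Euclidean distance matrix
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # sort each row; column 0 is the zero self-distance, take the next k
    knn_d = np.sort(d, axis=1)[:, 1:k + 1]
    return np.exp(-(knn_d ** 2).mean(axis=1))
```

Points in dense regions have small neighbor distances and thus density near 1, while isolated points decay toward 0, so the resulting rho can be plugged directly into the decision-graph step of DPC.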


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 34301-34317 ◽  
Author(s):  
Donghua Yu ◽  
Guojun Liu ◽  
Maozu Guo ◽  
Xiaoyan Liu ◽  
Shuang Yao

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11 ◽  
Author(s):  
Lin Ding ◽  
Weihong Xu ◽  
Yuantao Chen

Density peaks clustering (DPC) is an advanced clustering technique with multiple advantages: it determines cluster centers efficiently, has few parameters, requires no iterations, and produces no border noise. However, it suffers from the following defects: (1) it is difficult to determine a suitable value for its crucial cutoff distance parameter, (2) the local density metric is too simple to find the proper center(s) of sparse cluster(s), and (3) it is not robust, in that some points belonging to prominent density peaks are assigned to remote clusters. This paper proposes an improved density peaks clustering based on natural neighbor expanded group (DPC-NNEG). The core of the proposed algorithm has two parts: (1) define the natural neighbor expanded (NNE) and the natural neighbor expanded group (NNEG) and (2) divide all NNEGs into a target number of sets as the final clustering result, according to the closeness degree of the NNEGs. The paper also provides the measurement of this closeness degree. We compared our proposal with the state of the art on public datasets, including several complex and real datasets. Experiments show the effectiveness and robustness of the proposed algorithm.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Rong Zhou ◽  
Yong Zhang ◽  
Shengzhong Feng ◽  
Nurbol Luktarhan

Clustering aims to differentiate objects from different groups (clusters) by the similarities or distances between pairs of objects. Numerous clustering algorithms have been proposed to investigate what factors constitute a cluster and how to find clusters efficiently. The clustering by fast search and find of density peaks algorithm was proposed to intuitively determine cluster centers and assign points to corresponding partitions for complex datasets. This method has a simple structure due to its noniterative logic and few parameters; however, the guidelines for parameter selection and center determination are not explicit. To tackle these problems, we propose an improved hierarchical clustering method, HCDP, aiming to represent the complex structure of the dataset. A k-nearest-neighbor strategy is integrated to compute the local density of each point, avoiding selection of the unnecessary global parameter dc and enabling cluster smoothing and condensing. In addition, a new clustering evaluation approach is introduced to extract a “flat” and “optimal” partition solution from the structure by adaptively computing the clustering stability. The proposed approach is evaluated on applications with complex datasets, where the results demonstrate that the novel method outperforms its counterparts to a large extent.


2020 ◽  
Vol 2020 ◽  
pp. 1-15 ◽  
Author(s):  
Lin Ding ◽  
Weihong Xu ◽  
Yuantao Chen

The density peaks clustering algorithm (DPC) has attracted the attention of many scholars because of its multiple advantages, including efficient determination of cluster centers, a small number of parameters, no iterations, and no border noise. However, DPC does not provide a reliable method for selecting the threshold (cutoff distance) or an automatic strategy for selecting cluster centers. In this paper, we propose density peaks clustering by zero-pointed samples (DPC-ZPS) of regional group borders. DPC-ZPS finds the subclusters and the cluster borders by zero-pointed samples (ZPSs). Subclusters are then merged by comparing the density of edge samples; through iterative merging, a suitable dc and the cluster centers are determined. Finally, we compared state-of-the-art methods with our proposal on public datasets. Experiments show that our algorithm determines the cutoff distance and centers automatically and accurately.


Author(s):  
Wenke Zang ◽  
Liyan Ren ◽  
Wenqian Zhang ◽  
Xiyu Liu

Clustering by fast search and find of Density Peaks (DPC), introduced by Alex Rodríguez and Alessandro Laio, attracted much attention in the fields of pattern recognition and artificial intelligence. However, DPC still has unresolved defects. Firstly, the local density ρi of point i is affected by the cutoff distance dc, which can influence the clustering result, especially for small real-world cases. Secondly, the number of clusters is still determined intuitively, by using the decision graph to select the cluster centers. In order to overcome these defects, this paper proposes an automatic density peaks clustering approach using a DNA genetic algorithm optimized data field and a Gaussian process (referred to as ADPC-DNAGA). ADPC-DNAGA can extract the optimal value of the threshold from the potential entropy of the data field and automatically determine the cluster centers by the Gaussian method. For any dataset to be clustered, the threshold can be calculated objectively from the data itself rather than estimated empirically. The proposed clustering algorithm is benchmarked on publicly available synthetic and real-world datasets commonly used for testing the performance of clustering algorithms. The clustering results are compared not only with those of DPC but also with those of several well-known clustering algorithms such as Affinity Propagation, DBSCAN, and Spectral Clustering. The experimental results demonstrate that our proposed algorithm can find the optimal cutoff distance dc, automatically identify clusters regardless of their shape and the dimension of the embedded space, and often outperform the compared methods.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial for discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of tunable parameters to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city-light satellite images.

