Fast clustering using adaptive density peak detection

2015 ◽  
Vol 26 (6) ◽  
pp. 2800-2811 ◽  
Author(s):  
Xiao-Feng Wang ◽  
Yifan Xu

Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the “optimal” parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without iteration and thus is fast, giving it great potential for big data analysis. A user-friendly R package, ADPclust, is developed for public use.
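The two quantities a density peak search ranks can be sketched as follows. This is a minimal illustration, not the ADPclust implementation: a Gaussian kernel stands in for the paper's multivariate kernel estimator, and all names and the bandwidth value are ours.

```python
import numpy as np

def kde_density(X, h):
    # Gaussian-kernel density at every sample point: a smooth stand-in
    # for the truncated counting measure; h is the bandwidth.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h * h)).sum(axis=1)

def delta_dist(X, rho):
    # delta_i: distance to the nearest point of strictly higher density;
    # the globally densest point gets the largest pairwise distance instead.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return delta

# two well-separated toy blobs of 30 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
rho = kde_density(X, h=0.5)
delta = delta_dist(X, rho)
centers = np.argsort(rho * delta)[-2:]    # top-2 by the product rho * delta
```

Points that are both dense and far from any denser point score highest, so the two selected indices land one in each blob.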

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the last 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially centroid-based methods of a nonspherical type. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, extended core regions are identified as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or to noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
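The reverse nearest neighbor graph that seeds the core regions can be sketched in a few lines; this is our illustrative construction, not the RNN-LDH code. A point's reverse-neighbor count acts as a density signal: dense core points are chosen by many others, while outliers are chosen by none.

```python
import numpy as np

def reverse_nearest_neighbors(X, k):
    # RNN(j) = set of points that count j among their own k nearest neighbors.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    knn = np.argsort(d, axis=1)[:, 1:k + 1]       # exclude self (column 0)
    rnn = [set() for _ in range(len(X))]
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            rnn[int(j)].add(i)
    return knn, rnn

# a dense blob of 20 points plus one far outlier at index 20
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), [[3.0, 3.0]]])
knn, rnn = reverse_nearest_neighbors(X, k=3)
rnn_counts = [len(s) for s in rnn]
```

The outlier still has three nearest neighbors in the blob, but no blob point lists the outlier among its own three, so its reverse-neighbor count is zero.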


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Qi Diao ◽  
Yaping Dai ◽  
Qichao An ◽  
Weixing Li ◽  
Xiaoxue Feng ◽  
...  

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most conventional clustering approaches work only with round-shaped clusters. Clustering by fast search and find of density peaks (DPC) can handle arbitrary shapes, but in some cases it is limited by its density peak selection and allocation strategy. To overcome these limitations, two improvements are proposed in this paper. To describe the cluster center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared nearest neighbors (SNN). A similarity-first search algorithm is designed to find the best-matching cluster centers for non-center points in a weighted KNN graph. Extensive comparisons with several existing DPC methods, e.g., the traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means, have been carried out. Experiments on synthetic and real data show that the proposed clustering algorithm outperforms DPC, DBSCAN, AP, and K-means in terms of clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI).
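The shared-nearest-neighbor similarity the paper fuses into its density definition can be sketched as follows; this is a generic SNN construction under our own names and toy data, not the authors' exact weighting.

```python
import numpy as np

def snn_similarity(X, k):
    # SNN similarity: number of neighbors shared by the k-nearest-neighbor
    # lists of each pair of points; high overlap means same dense region.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    knn = [set(np.argsort(row)[1:k + 1]) for row in d]   # exclude self
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:
                S[i, j] = len(knn[i] & knn[j])
    return S

# two separated blobs of 10 points each
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(5, 0.2, (10, 2))])
S = snn_similarity(X, k=5)
```

Because the kNN lists never cross between well-separated blobs, cross-blob SNN similarity is zero while intra-blob similarity is positive.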


Author(s):  
Maulida Ayu Fitriani ◽  
Aina Musdholifah ◽  
Sri Hartati

Various clustering methods for obtaining optimal information continue to evolve; one such development is the Evolutionary Algorithm (EA). Adaptive Unified Differential Evolution (AuDE) is a development of Differential Evolution (DE), one of the EA techniques. AuDE has self-adaptive control of the scale factor (F) and crossover rate (Cr) parameters. It also has a single mutation strategy that represents the most commonly used standard mutation strategies from previous studies. The AuDE clustering method was tested using 4 datasets. The Silhouette Index and CS Measure are the fitness functions used as measures of the quality of the clustering results. The quality of the AuDE clustering results was then compared against the quality of clustering results produced by the DE method. The results show that the AuDE mutation strategy can expand the cluster center search produced by DE so that better clustering quality can be obtained. The quality ratio of AuDE to DE using the Silhouette Index is 1:0.816, whereas using the CS Measure the ratio is 0.565:1. The execution time of AuDE is better but not significantly so, with a ratio of 0.99:1 when using the Silhouette Index and 0.184:1 when using the CS Measure.
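A single generation of the classic DE/rand/1/bin scheme over a population of flattened centroid vectors can be sketched as below. This is plain DE, not AuDE's self-adaptive variant; the fixed F and Cr values, the SSE fitness, and the toy data are our assumptions.

```python
import numpy as np

def de_step(pop, F, Cr, fitness, rng):
    # One DE/rand/1/bin generation over flattened centroid vectors
    # (k centroids x d dims each); greedy selection, minimizing fitness.
    n, dim = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        cross = rng.random(dim) < Cr
        cross[rng.integers(dim)] = True        # guarantee at least one mutant gene
        trial = np.where(cross, mutant, pop[i])
        if fitness(trial) < fitness(pop[i]):
            new_pop[i] = trial
    return new_pop

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (25, 2)), rng.normal(6, 0.5, (25, 2))])
k, d = 2, 2

def sse(flat):
    # within-cluster sum of squared errors for the encoded centroids
    C = flat.reshape(k, d)
    return np.min(((X[:, None] - C[None]) ** 2).sum(-1), axis=1).sum()

pop = rng.uniform(-1, 7, (12, k * d))
best_before = min(sse(p) for p in pop)
for _ in range(30):
    pop = de_step(pop, F=0.7, Cr=0.9, fitness=sse, rng=rng)
best_after = min(sse(p) for p in pop)
```

Greedy selection means the best fitness in the population can never get worse from one generation to the next.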


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4920
Author(s):  
Lin Cao ◽  
Xinyi Zhang ◽  
Tao Wang ◽  
Kangning Du ◽  
Chong Fu

In the multi-target traffic radar scene, the clustering accuracy between vehicles driving at close distance is relatively low. In response to this problem, this paper proposes a new clustering algorithm, namely the adaptive ellipse distance density peak fuzzy (AEDDPF) clustering algorithm. Firstly, the Euclidean distance is replaced by an adaptive ellipse distance, which more accurately describes the structure of data obtained from radar measurements of vehicles. Secondly, an adaptive exponential function curve is introduced into the decision graph of the fast density peak search algorithm to accurately select density peak points, completing the initialization of the AEDDPF algorithm. Finally, the membership matrix and the cluster centers are calculated through successive iterations to obtain the clustering result. The time complexity of the AEDDPF algorithm is analyzed. Compared with the density-based spatial clustering of applications with noise (DBSCAN), k-means, fuzzy c-means (FCM), Gustafson-Kessel (GK), and adaptive Euclidean distance density peak fuzzy (Euclid-ADDPF) algorithms, the AEDDPF algorithm achieves higher clustering accuracy on real measurement data sets in certain scenarios. The experimental results also show that the proposed algorithm has a better clustering effect in some close-range vehicle scene applications. The generalization ability of the proposed AEDDPF algorithm applied to other types of data is also analyzed.
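The intuition behind an ellipse-shaped distance can be illustrated with the Mahalanobis distance, which we use here as a stand-in for the paper's adaptive ellipse distance (an assumption, not the authors' formula): directions with large spread count for less, so elongated radar returns are not forced into circles.

```python
import numpy as np

def ellipse_dist(x, center, cov):
    # Mahalanobis distance: scales each direction by the cluster's
    # covariance, producing elliptical rather than circular level sets.
    diff = np.asarray(x, float) - np.asarray(center, float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# an elongated "vehicle" cluster: variance 4.0 along x, 0.25 along y
cov = np.diag([4.0, 0.25])
d_along = ellipse_dist([2.0, 0.0], [0.0, 0.0], cov)    # along the major axis
d_across = ellipse_dist([0.0, 2.0], [0.0, 0.0], cov)   # across the minor axis
```

Both test points are Euclidean distance 2 from the center, yet the point along the major axis is four times closer in the ellipse metric (1.0 versus 4.0).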


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Peijie Lin ◽  
Yaohai Lin ◽  
Zhicong Chen ◽  
Lijun Wu ◽  
Lingchen Chen ◽  
...  

Fault diagnosis of photovoltaic (PV) arrays plays a significant role in the safe and reliable operation of PV systems. In this paper, the distribution of PV systems’ daily operating data under different operating conditions is analyzed. The results show that the data distribution features significant nonspherical clustering, the cluster center has a relatively large distance from any points with a higher local density, and the cluster number cannot be predetermined. Based on these features, a density peak-based clustering approach is then proposed to automatically cluster the PV data. Then, a set of labeled data covering various conditions is employed to compute the minimum distance vector between each cluster and the reference data. According to the distance vector, the clusters can be identified and categorized into various conditions and/or faults. Simulation results demonstrate the feasibility of the proposed method in diagnosing certain faults occurring in a PV array. Moreover, a 1.8 kW grid-connected PV system with a 6 × 3 PV array is established and experimentally tested to investigate the performance of the developed method.
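The identification step, matching each detected cluster to the closest labeled reference condition, can be sketched as minimum-distance prototype matching. The condition names, feature meanings, and numbers below are purely illustrative, not from the paper.

```python
import numpy as np

def label_clusters(cluster_means, reference):
    # Assign each detected cluster the label of the labeled reference
    # prototype at minimum distance (the "minimum distance vector" idea).
    labels = {}
    for cid, mean in cluster_means.items():
        labels[cid] = min(reference,
                          key=lambda cond: np.linalg.norm(mean - reference[cond]))
    return labels

# hypothetical cluster means and reference prototypes, e.g. (current, voltage)
clusters = {0: np.array([0.9, 30.0]), 1: np.array([0.4, 30.0])}
reference = {"normal": np.array([1.0, 30.0]),
             "partial_shading": np.array([0.45, 30.0])}
ids = label_clusters(clusters, reference)
```

Cluster 0 lies nearest the "normal" prototype and cluster 1 nearest "partial_shading", so the mapping follows directly from the distances.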


2019 ◽  
Vol 35 (20) ◽  
pp. 4029-4037 ◽  
Author(s):  
Yun Yu ◽  
Lei-Hong Zhang ◽  
Shuqin Zhang

Abstract Motivation Multiview clustering has attracted much attention in recent years. Several models and algorithms have been proposed for finding the clusters. However, these methods are developed either to find the consistent/common clusters across different views, or to identify the differential clusters among different views. In reality, both consistent and differential clusters may exist in multiview datasets. Thus, the development of simultaneous clustering methods that can identify both the consistent and the differential clusters is of great importance. Results In this paper, we propose a method for simultaneous clustering of multiview data based on manifold optimization. The binary optimization model for finding the clusters is relaxed to a real-valued optimization problem on the Stiefel manifold, which is solved by a line-search algorithm on the manifold. We applied the proposed method to both simulated data and four real datasets from TCGA. Both studies show that when the underlying clusters are consistent, our method performs competitively with the state-of-the-art algorithms. When there are differential clusters, our method performs much better. In the real data study, we performed experiments on cancer stratification and differential cluster (module) identification across multiple cancer subtypes. For patients of different subtypes, both consistent clusters and differential clusters are identified at the same time. The proposed method identifies more clusters that are enriched by gene ontology and KEGG pathways. The differential clusters could be used to explain the different mechanisms of cancer development in patients of different subtypes. Availability and implementation Codes can be downloaded from: http://homepage.fudan.edu.cn/sqzhang/files/2018/12/MVCMOcode.zip. Supplementary information Supplementary data are available at Bioinformatics online.
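Optimization over the Stiefel manifold keeps the relaxed cluster indicator matrix orthonormal after every update. A common way to do this is a QR retraction; the sketch below uses a simple fixed-step gradient ascent on a Rayleigh-trace objective as a generic illustration, not the paper's line-search algorithm.

```python
import numpy as np

def retract(Y):
    # QR retraction onto the Stiefel manifold St(n, k) = {Y : Y^T Y = I_k};
    # the sign fix makes the decomposition's choice of Q deterministic.
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
A = A + A.T                                # symmetric affinity matrix
Y = retract(rng.normal(size=(6, 2)))       # random feasible starting point
for _ in range(50):
    # ascend trace(Y^T A Y), then snap back onto the manifold
    Y = retract(Y + 0.1 * (A @ Y))
```

Whatever the step does, the retraction guarantees the iterate stays a valid relaxed indicator with orthonormal columns.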


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Limin Wang ◽  
Wenjing Sun ◽  
Xuming Han ◽  
Zhiyuan Hao ◽  
Ruihong Zhou ◽  
...  

To better reflect precise clustering results for data samples with different shapes and densities in the affinity propagation (AP) clustering algorithm, an improved integrated clustering learning strategy based on a three-stage affinity propagation algorithm with density peak optimization theory (DPKT-AP) is proposed in this paper. DPKT-AP combines the ideology of integrated clustering with the AP algorithm, introducing density peak theory and the k-means algorithm to carry out a three-stage clustering process. In the first stage, the cluster center points are selected by density peak clustering. Because a cluster center is surrounded by nearest-neighbor points of lower local density and lies at a relatively large distance from points of higher density, this helps the k-means algorithm in the second stage avoid local optima. In the second stage, the k-means algorithm clusters the data samples into several relatively small spherical subgroups, each of which has a local density maximum point, called the center point of the subgroup. In the third stage, DPKT-AP uses the AP algorithm to merge and cluster the spherical subgroups. Experiments on UCI and synthetic data sets show that DPKT-AP improves the clustering performance and accuracy of the algorithm.
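The second and third stages can be sketched as "over-cluster into small subgroups, then merge their centers". The sketch below uses a greedy distance-threshold merge as a simple stand-in for the AP merge stage (an assumption; the paper merges with affinity propagation), and the seeds and data are toy values.

```python
import numpy as np

def kmeans(X, centers, iters=20):
    # Stage 2: Lloyd's algorithm from given seed points, producing
    # small spherical subgroups; empty clusters keep their old center.
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers, labels

def merge_close(centers, tol):
    # Stage-3 stand-in: greedily group subgroup centers closer than tol.
    groups = []
    for i, c in enumerate(centers):
        for g in groups:
            if np.linalg.norm(centers[g[0]] - c) < tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(10, 0.3, (40, 2))])
seeds = X[[0, 5, 40, 45]]                  # two seeds inside each blob
centers, labels = kmeans(X, seeds)
groups = merge_close(centers, tol=2.0)
```

Four subgroups are fitted, but the merge stage recovers the two true clusters because subgroup centers within a blob sit far closer than 2.0 to each other and about 10 apart across blobs.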


2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on cluster formation; thus, most clustering algorithms can handle at least one particular type of data distribution well but may not handle the other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>
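The bridge-point failure mode described above can be reproduced with a minimal DBSCAN on synthetic data (our own toy construction, not the figure's data): a sparse chain of points between two dense blobs makes a single global eps merge everything into one cluster.

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    # Minimal DBSCAN with one global eps; -1 marks unassigned points.
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    nbrs = [np.flatnonzero(d[i] <= eps) for i in range(n)]   # includes self
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue                     # skip assigned points and non-cores
        labels[i] = cid
        q = deque(nbrs[i])
        while q:                         # expand density-reachable points
            j = int(q.popleft())
            if labels[j] == -1:
                labels[j] = cid
                if len(nbrs[j]) >= min_pts:
                    q.extend(nbrs[j])
        cid += 1
    return labels

blob_a = np.array([[0, 0], [0.3, 0], [0, 0.3], [0.3, 0.3], [0.15, 0.15]], float)
blob_b = blob_a + [5.0, 0.0]
bridge = np.stack([np.arange(0.75, 4.76, 0.5), np.full(9, 0.15)], axis=1)
with_bridge = dbscan(np.vstack([blob_a, blob_b, bridge]), eps=0.6, min_pts=3)
without = dbscan(np.vstack([blob_a, blob_b]), eps=0.6, min_pts=3)
```

Without the bridge, the fixed eps cleanly separates two clusters; with the nine loosely spaced bridge points added, every point becomes density-reachable from every other and DBSCAN reports one big cluster.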


2021 ◽  
pp. 5-20
Author(s):  
Ivan Murenin ◽  
Natalia Ampilova

The computational analysis of wheat images to identify wheat varieties and quality has wide applications in agriculture and production. This paper presents an approach to the analysis and classification of images of wheat samples obtained by the method of crystallization with additives. In the tests, 3 concentrations and 4 time points for each concentration were used, so that each type of wheat was characterized by 12 images. We used the images obtained for 5 classes. All the images have similar visual characteristics, which makes it difficult to use statistical methods of analysis. The multifractal spectrum obtained by calculating the local density function was used as the classifying feature. The classification was performed on a set of 60 wheat images corresponding to 5 different samples (classes) by various machine learning methods, such as linear regression, the naive Bayes classifier, support vector machines, and random forests. In some cases, principal component analysis was applied to reduce the dimension of the feature space. To identify the relationships between wheat samples obtained at different concentrations, 3 different clustering methods were used. The classification results showed that the multifractal spectrum as a classifying feature, and the random forest method in combination with principal component analysis, allow identifying wheat samples obtained by crystallization with additives, with the highest average classification accuracy being 74%.
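The dimension-reduction step can be sketched as PCA via the singular value decomposition; the 60 × 12 feature matrix below is random stand-in data with the same shape as the paper's setting (60 images, 12-value spectra), not the real multifractal features.

```python
import numpy as np

def pca_reduce(F, n_components):
    # Project feature vectors (rows of F) onto their top principal
    # components: center, SVD, then keep the leading right singular vectors.
    Fc = F - F.mean(axis=0)
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:n_components].T

# stand-in for 60 samples x 12-dimensional multifractal-spectrum features
rng = np.random.default_rng(6)
F = rng.normal(size=(60, 12)) * np.linspace(3.0, 0.1, 12)  # decaying variance
Z = pca_reduce(F, n_components=2)
```

The reduced matrix Z would then feed a classifier such as a random forest; by construction its first component carries at least as much variance as the second.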

