Fast clustering using adaptive density peak detection

2015 ◽  
Vol 26 (6) ◽  
pp. 2800-2811 ◽  
Author(s):  
Xiao-Feng Wang ◽  
Yifan Xu

Common limitations of clustering methods include slow algorithm convergence, instability with respect to the pre-specification of a number of intrinsic parameters, and a lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the “optimal” parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method runs in a single step without iteration and thus is fast, giving it great potential for big data analysis. A user-friendly R package, ADPclust, is developed for public use.
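The two quantities a density peak search ranks can be sketched as follows. This is a minimal illustration, not the ADPclust implementation: a Gaussian kernel stands in for the paper's multivariate kernel estimator, and all names and the bandwidth value are ours.

```python
import numpy as np

def kde_density(X, h):
    # Gaussian-kernel density at every sample point: a smooth stand-in
    # for the truncated counting measure; h is the bandwidth.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h * h)).sum(axis=1)

def delta_dist(X, rho):
    # delta_i: distance to the nearest point of strictly higher density;
    # the globally densest point gets the largest pairwise distance instead.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return delta

# two well-separated toy blobs of 30 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
rho = kde_density(X, h=0.5)
delta = delta_dist(X, rho)
centers = np.argsort(rho * delta)[-2:]    # top-2 by the product rho * delta
```

Points that are both dense and far from any denser point score highest, so the two selected indices land one in each blob.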

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the last 10 years. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially centroid-based methods of a nonspherical type. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, extended core regions are identified as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or to noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
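The reverse nearest neighbor graph that seeds the core regions can be sketched in a few lines; this is our illustrative construction, not the RNN-LDH code. A point's reverse-neighbor count acts as a density signal: dense core points are chosen by many others, while outliers are chosen by none.

```python
import numpy as np

def reverse_nearest_neighbors(X, k):
    # RNN(j) = set of points that count j among their own k nearest neighbors.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    knn = np.argsort(d, axis=1)[:, 1:k + 1]       # exclude self (column 0)
    rnn = [set() for _ in range(len(X))]
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            rnn[int(j)].add(i)
    return knn, rnn

# a dense blob of 20 points plus one far outlier at index 20
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), [[3.0, 3.0]]])
knn, rnn = reverse_nearest_neighbors(X, k=3)
rnn_counts = [len(s) for s in rnn]
```

The outlier still has three nearest neighbors in the blob, but no blob point lists the outlier among its own three, so its reverse-neighbor count is zero.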


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Qi Diao ◽  
Yaping Dai ◽  
Qichao An ◽  
Weixing Li ◽  
Xiaoxue Feng ◽  
...  

This paper presents an improved clustering algorithm for categorizing data with arbitrary shapes. Most conventional clustering approaches work only with round-shaped clusters. Clustering by fast search and find of density peaks (DPC) can handle arbitrary shapes, but in some cases it is limited by its density peak selection and allocation strategy. To overcome these limitations, two improvements are proposed in this paper. To describe the cluster center more comprehensively, the definitions of local density and relative distance are fused with multiple distances, including K-nearest neighbors (KNN) and shared nearest neighbors (SNN). A similarity-first search algorithm is designed to find the best-matching cluster centers for non-center points in a weighted KNN graph. Extensive comparisons with several existing DPC methods, e.g., the traditional DPC algorithm, density-based spatial clustering of applications with noise (DBSCAN), affinity propagation (AP), FKNN-DPC, and K-means, have been carried out. Experiments on synthetic and real data show that the proposed clustering algorithm outperforms DPC, DBSCAN, AP, and K-means in terms of clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI).
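The shared-nearest-neighbor similarity the paper fuses into its density definition can be sketched as follows; this is a generic SNN construction under our own names and toy data, not the authors' exact weighting.

```python
import numpy as np

def snn_similarity(X, k):
    # SNN similarity: number of neighbors shared by the k-nearest-neighbor
    # lists of each pair of points; high overlap means same dense region.
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    knn = [set(np.argsort(row)[1:k + 1]) for row in d]   # exclude self
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if i != j:
                S[i, j] = len(knn[i] & knn[j])
    return S

# two separated blobs of 10 points each
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(5, 0.2, (10, 2))])
S = snn_similarity(X, k=5)
```

Because the kNN lists never cross between well-separated blobs, cross-blob SNN similarity is zero while intra-blob similarity is positive.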


Author(s):  
Maulida Ayu Fitriani ◽  
Aina Musdholifah ◽  
Sri Hartati

Various clustering methods for obtaining optimal information continue to evolve; one such development is the Evolutionary Algorithm (EA). Adaptive Unified Differential Evolution (AuDE) is a development of Differential Evolution (DE), one of the EA techniques. AuDE has self-adaptive control of the scale factor (F) and crossover rate (Cr) parameters. It also has a single mutation strategy that represents the most commonly used standard mutation strategies from previous studies. The AuDE clustering method was tested using 4 datasets. The Silhouette Index and CS Measure are the fitness functions used as measures of the quality of the clustering results. The quality of the AuDE clustering results was then compared against the quality of clustering results produced by the DE method. The results show that the AuDE mutation strategy can expand the cluster center search produced by DE so that better clustering quality can be obtained. The quality ratio of AuDE to DE using the Silhouette Index is 1:0.816, whereas using the CS Measure the ratio is 0.565:1. The execution time of AuDE is better but not significantly so, with a ratio of 0.99:1 when using the Silhouette Index and 0.184:1 when using the CS Measure.
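A single generation of the classic DE/rand/1/bin scheme over a population of flattened centroid vectors can be sketched as below. This is plain DE, not AuDE's self-adaptive variant; the fixed F and Cr values, the SSE fitness, and the toy data are our assumptions.

```python
import numpy as np

def de_step(pop, F, Cr, fitness, rng):
    # One DE/rand/1/bin generation over flattened centroid vectors
    # (k centroids x d dims each); greedy selection, minimizing fitness.
    n, dim = pop.shape
    new_pop = pop.copy()
    for i in range(n):
        a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = pop[a] + F * (pop[b] - pop[c])
        cross = rng.random(dim) < Cr
        cross[rng.integers(dim)] = True        # guarantee at least one mutant gene
        trial = np.where(cross, mutant, pop[i])
        if fitness(trial) < fitness(pop[i]):
            new_pop[i] = trial
    return new_pop

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.5, (25, 2)), rng.normal(6, 0.5, (25, 2))])
k, d = 2, 2

def sse(flat):
    # within-cluster sum of squared errors for the encoded centroids
    C = flat.reshape(k, d)
    return np.min(((X[:, None] - C[None]) ** 2).sum(-1), axis=1).sum()

pop = rng.uniform(-1, 7, (12, k * d))
best_before = min(sse(p) for p in pop)
for _ in range(30):
    pop = de_step(pop, F=0.7, Cr=0.9, fitness=sse, rng=rng)
best_after = min(sse(p) for p in pop)
```

Greedy selection means the best fitness in the population can never get worse from one generation to the next.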


Sensors ◽  
2020 ◽  
Vol 20 (17) ◽  
pp. 4920
Author(s):  
Lin Cao ◽  
Xinyi Zhang ◽  
Tao Wang ◽  
Kangning Du ◽  
Chong Fu

In the multi-target traffic radar scene, the clustering accuracy between vehicles driving at close distance is relatively low. In response to this problem, this paper proposes a new clustering algorithm, namely the adaptive ellipse distance density peak fuzzy (AEDDPF) clustering algorithm. Firstly, the Euclidean distance is replaced by an adaptive ellipse distance, which more accurately describes the structure of data obtained from radar measurements of vehicles. Secondly, an adaptive exponential function curve is introduced into the decision graph of the fast density peak search algorithm to accurately select density peak points, completing the initialization of the AEDDPF algorithm. Finally, the membership matrix and the cluster centers are calculated through successive iterations to obtain the clustering result. The time complexity of the AEDDPF algorithm is analyzed. Compared with the density-based spatial clustering of applications with noise (DBSCAN), k-means, fuzzy c-means (FCM), Gustafson-Kessel (GK), and adaptive Euclidean distance density peak fuzzy (Euclid-ADDPF) algorithms, the AEDDPF algorithm achieves higher clustering accuracy on real measurement data sets in certain scenarios. The experimental results also show that the proposed algorithm has a better clustering effect in some close-range vehicle scene applications. The generalization ability of the proposed AEDDPF algorithm applied to other types of data is also analyzed.
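The intuition behind an ellipse-shaped distance can be illustrated with the Mahalanobis distance, which we use here as a stand-in for the paper's adaptive ellipse distance (an assumption, not the authors' formula): directions with large spread count for less, so elongated radar returns are not forced into circles.

```python
import numpy as np

def ellipse_dist(x, center, cov):
    # Mahalanobis distance: scales each direction by the cluster's
    # covariance, producing elliptical rather than circular level sets.
    diff = np.asarray(x, float) - np.asarray(center, float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# an elongated "vehicle" cluster: variance 4.0 along x, 0.25 along y
cov = np.diag([4.0, 0.25])
d_along = ellipse_dist([2.0, 0.0], [0.0, 0.0], cov)    # along the major axis
d_across = ellipse_dist([0.0, 2.0], [0.0, 0.0], cov)   # across the minor axis
```

Both test points are Euclidean distance 2 from the center, yet the point along the major axis is four times closer in the ellipse metric (1.0 versus 4.0).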


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Peijie Lin ◽  
Yaohai Lin ◽  
Zhicong Chen ◽  
Lijun Wu ◽  
Lingchen Chen ◽  
...  

Fault diagnosis of photovoltaic (PV) arrays plays a significant role in the safe and reliable operation of PV systems. In this paper, the distribution of PV systems’ daily operating data under different operating conditions is analyzed. The results show that the data distribution features significant nonspherical clustering, the cluster center has a relatively large distance from any points with a higher local density, and the cluster number cannot be predetermined. Based on these features, a density peak-based clustering approach is then proposed to automatically cluster the PV data. Then, a set of labeled data covering various conditions is employed to compute the minimum distance vector between each cluster and the reference data. According to the distance vector, the clusters can be identified and categorized into various conditions and/or faults. Simulation results demonstrate the feasibility of the proposed method in diagnosing certain faults occurring in a PV array. Moreover, a 1.8 kW grid-connected PV system with a 6 × 3 PV array is established and experimentally tested to investigate the performance of the developed method.
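The identification step, matching each detected cluster to the closest labeled reference condition, can be sketched as minimum-distance prototype matching. The condition names, feature meanings, and numbers below are purely illustrative, not from the paper.

```python
import numpy as np

def label_clusters(cluster_means, reference):
    # Assign each detected cluster the label of the labeled reference
    # prototype at minimum distance (the "minimum distance vector" idea).
    labels = {}
    for cid, mean in cluster_means.items():
        labels[cid] = min(reference,
                          key=lambda cond: np.linalg.norm(mean - reference[cond]))
    return labels

# hypothetical cluster means and reference prototypes, e.g. (current, voltage)
clusters = {0: np.array([0.9, 30.0]), 1: np.array([0.4, 30.0])}
reference = {"normal": np.array([1.0, 30.0]),
             "partial_shading": np.array([0.45, 30.0])}
ids = label_clusters(clusters, reference)
```

Cluster 0 lies nearest the "normal" prototype and cluster 1 nearest "partial_shading", so the mapping follows directly from the distances.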


2019 ◽  
Vol 35 (20) ◽  
pp. 4029-4037 ◽  
Author(s):  
Yun Yu ◽  
Lei-Hong Zhang ◽  
Shuqin Zhang

Abstract Motivation Multiview clustering has attracted much attention in recent years. Several models and algorithms have been proposed for finding the clusters. However, these methods are developed either to find the consistent/common clusters across different views, or to identify the differential clusters among different views. In reality, both consistent and differential clusters may exist in multiview datasets. Thus, the development of simultaneous clustering methods that can identify both the consistent and the differential clusters is of great importance. Results In this paper, we propose a method for simultaneous clustering of multiview data based on manifold optimization. The binary optimization model for finding the clusters is relaxed to a real-valued optimization problem on the Stiefel manifold, which is solved by a line-search algorithm on the manifold. We applied the proposed method to both simulated data and four real datasets from TCGA. Both studies show that when the underlying clusters are consistent, our method performs competitively with the state-of-the-art algorithms. When there are differential clusters, our method performs much better. In the real data study, we performed experiments on cancer stratification and differential cluster (module) identification across multiple cancer subtypes. For patients of different subtypes, both consistent clusters and differential clusters are identified at the same time. The proposed method identifies more clusters that are enriched by gene ontology and KEGG pathways. The differential clusters could be used to explain the different mechanisms of cancer development in patients of different subtypes. Availability and implementation Codes can be downloaded from: http://homepage.fudan.edu.cn/sqzhang/files/2018/12/MVCMOcode.zip. Supplementary information Supplementary data are available at Bioinformatics online.
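Optimization over the Stiefel manifold keeps the relaxed cluster indicator matrix orthonormal after every update. A common way to do this is a QR retraction; the sketch below uses a simple fixed-step gradient ascent on a Rayleigh-trace objective as a generic illustration, not the paper's line-search algorithm.

```python
import numpy as np

def retract(Y):
    # QR retraction onto the Stiefel manifold St(n, k) = {Y : Y^T Y = I_k};
    # the sign fix makes the decomposition's choice of Q deterministic.
    Q, R = np.linalg.qr(Y)
    return Q * np.sign(np.diag(R))

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 6))
A = A + A.T                                # symmetric affinity matrix
Y = retract(rng.normal(size=(6, 2)))       # random feasible starting point
for _ in range(50):
    # ascend trace(Y^T A Y), then snap back onto the manifold
    Y = retract(Y + 0.1 * (A @ Y))
```

Whatever the step does, the retraction guarantees the iterate stays a valid relaxed indicator with orthonormal columns.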


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Limin Wang ◽  
Wenjing Sun ◽  
Xuming Han ◽  
Zhiyuan Hao ◽  
Ruihong Zhou ◽  
...  

To better reflect precise clustering results for data samples with different shapes and densities in the affinity propagation (AP) clustering algorithm, an improved integrated clustering learning strategy based on a three-stage affinity propagation algorithm with density peak optimization theory (DPKT-AP) is proposed in this paper. DPKT-AP combines the ideology of integrated clustering with the AP algorithm, introducing density peak theory and the k-means algorithm to carry out a three-stage clustering process. In the first stage, the cluster center points are selected by density peak clustering. Because a cluster center is surrounded by nearest-neighbor points of lower local density and lies at a relatively large distance from points of higher density, this helps the k-means algorithm in the second stage avoid local optima. In the second stage, the k-means algorithm clusters the data samples into several relatively small spherical subgroups, each of which has a local density maximum point, called the center point of the subgroup. In the third stage, DPKT-AP uses the AP algorithm to merge and cluster the spherical subgroups. Experiments on UCI and synthetic data sets show that DPKT-AP improves the clustering performance and accuracy of the algorithm.
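The second and third stages can be sketched as "over-cluster into small subgroups, then merge their centers". The sketch below uses a greedy distance-threshold merge as a simple stand-in for the AP merge stage (an assumption; the paper merges with affinity propagation), and the seeds and data are toy values.

```python
import numpy as np

def kmeans(X, centers, iters=20):
    # Stage 2: Lloyd's algorithm from given seed points, producing
    # small spherical subgroups; empty clusters keep their old center.
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers, labels

def merge_close(centers, tol):
    # Stage-3 stand-in: greedily group subgroup centers closer than tol.
    groups = []
    for i, c in enumerate(centers):
        for g in groups:
            if np.linalg.norm(centers[g[0]] - c) < tol:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(10, 0.3, (40, 2))])
seeds = X[[0, 5, 40, 45]]                  # two seeds inside each blob
centers, labels = kmeans(X, seeds)
groups = merge_close(centers, tol=2.0)
```

Four subgroups are fitted, but the merge stage recovers the two true clusters because subgroup centers within a blob sit far closer than 2.0 to each other and about 10 apart across blobs.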


2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

<div>Clustering is an important unsupervised knowledge acquisition method, which divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions on cluster formation; thus, most clustering algorithms can handle at least one particular type of data distribution well but may not handle the other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. </div><div>Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.</div>
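The bridge-point failure mode described above can be reproduced with a minimal DBSCAN on synthetic data (our own toy construction, not the figure's data): a sparse chain of points between two dense blobs makes a single global eps merge everything into one cluster.

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    # Minimal DBSCAN with one global eps; -1 marks unassigned points.
    n = len(X)
    d = np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))
    nbrs = [np.flatnonzero(d[i] <= eps) for i in range(n)]   # includes self
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or len(nbrs[i]) < min_pts:
            continue                     # skip assigned points and non-cores
        labels[i] = cid
        q = deque(nbrs[i])
        while q:                         # expand density-reachable points
            j = int(q.popleft())
            if labels[j] == -1:
                labels[j] = cid
                if len(nbrs[j]) >= min_pts:
                    q.extend(nbrs[j])
        cid += 1
    return labels

blob_a = np.array([[0, 0], [0.3, 0], [0, 0.3], [0.3, 0.3], [0.15, 0.15]], float)
blob_b = blob_a + [5.0, 0.0]
bridge = np.stack([np.arange(0.75, 4.76, 0.5), np.full(9, 0.15)], axis=1)
with_bridge = dbscan(np.vstack([blob_a, blob_b, bridge]), eps=0.6, min_pts=3)
without = dbscan(np.vstack([blob_a, blob_b]), eps=0.6, min_pts=3)
```

Without the bridge, the fixed eps cleanly separates two clusters; with the nine loosely spaced bridge points added, every point becomes density-reachable from every other and DBSCAN reports one big cluster.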


2021 ◽  
pp. 5-20
Author(s):  
Ivan Murenin ◽  
Natalia Ampilova

The computational analysis of wheat images to identify wheat varieties and quality has wide applications in agriculture and production. This paper presents an approach to the analysis and classification of images of wheat samples obtained by the method of crystallization with additives. In the tests, 3 concentrations and 4 time points for each concentration were used, so that each type of wheat was characterized by 12 images. We used the images obtained for 5 classes. All the images have similar visual characteristics, which makes it difficult to use statistical methods of analysis. The multifractal spectrum obtained by calculating the local density function was used as the classifying feature. The classification was performed on a set of 60 wheat images corresponding to 5 different samples (classes) by various machine learning methods, such as linear regression, the naive Bayes classifier, support vector machines, and random forests. In some cases, principal component analysis was applied to reduce the dimension of the feature space. To identify the relationships between wheat samples obtained at different concentrations, 3 different clustering methods were used. The classification results showed that the multifractal spectrum as a classifying feature, and the random forest method in combination with principal component analysis, allow identifying wheat samples obtained by crystallization with additives, with the highest average classification accuracy being 74%.
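The dimension-reduction step can be sketched as PCA via the singular value decomposition; the 60 × 12 feature matrix below is random stand-in data with the same shape as the paper's setting (60 images, 12-value spectra), not the real multifractal features.

```python
import numpy as np

def pca_reduce(F, n_components):
    # Project feature vectors (rows of F) onto their top principal
    # components: center, SVD, then keep the leading right singular vectors.
    Fc = F - F.mean(axis=0)
    U, s, Vt = np.linalg.svd(Fc, full_matrices=False)
    return Fc @ Vt[:n_components].T

# stand-in for 60 samples x 12-dimensional multifractal-spectrum features
rng = np.random.default_rng(6)
F = rng.normal(size=(60, 12)) * np.linspace(3.0, 0.1, 12)  # decaying variance
Z = pca_reduce(F, n_components=2)
```

The reduced matrix Z would then feed a classifier such as a random forest; by construction its first component carries at least as much variance as the second.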

