A Novel Hierarchical Clustering Algorithm Based on Density Peaks for Complex Datasets

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Rong Zhou ◽  
Yong Zhang ◽  
Shengzhong Feng ◽  
Nurbol Luktarhan

Clustering aims to differentiate objects from different groups (clusters) by similarities or distances between pairs of objects. Numerous clustering algorithms have been proposed to investigate what factors constitute a cluster and how to find clusters efficiently. The clustering by fast search and find of density peaks algorithm was proposed to intuitively determine cluster centers and assign points to corresponding partitions for complex datasets. This method has a simple structure owing to its noniterative logic and few parameters; however, the guidelines for parameter selection and center determination are not explicit. To tackle these problems, we propose an improved hierarchical clustering method, HCDP, that aims to represent the complex structure of a dataset. A k-nearest neighbor strategy is integrated to compute the local density of each point, which avoids selecting the unnecessary global parameter dc and enables cluster smoothing and condensing. In addition, a new clustering evaluation approach is introduced to extract a "flat" and "optimal" partition solution from the structure by adaptively computing the clustering stability. The proposed approach is evaluated on several applications with complex datasets, and the results demonstrate that it outperforms its counterparts to a large extent.
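A minimal sketch of a k-nearest-neighbor local density of the kind described above, which replaces the global cutoff dc. The abstract does not give HCDP's exact formula, so the Gaussian-kernel sum below is an assumption for illustration only.

```python
import numpy as np

def knn_local_density(X, k=8):
    """Estimate each point's local density from its k nearest neighbors."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    knn_d = np.sort(dists, axis=1)[:, 1:k + 1]                     # skip the self-distance 0
    return np.exp(-knn_d ** 2).sum(axis=1)                         # Gaussian-kernel sum (assumed form)

X = np.random.default_rng(0).normal(size=(100, 2))
rho = knn_local_density(X, k=8)
```

Because the density depends only on each point's k neighbors, no dataset-wide cutoff distance has to be chosen.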

2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Yaohui Liu ◽  
Dong Liu ◽  
Fang Yu ◽  
Zhengming Ma

Clustering is widely used in data analysis, and density-based methods have developed rapidly over the past decade. Although state-of-the-art density peak clustering algorithms are efficient and can detect clusters of arbitrary shape, they are essentially centroid-based methods of a non-spherical type. In this paper, a novel local density hierarchical clustering algorithm based on reverse nearest neighbors, RNN-LDH, is proposed. By constructing and using a reverse nearest neighbor graph, extended core regions are identified as initial clusters. Then, a new local density metric is defined to calculate the density of each object; meanwhile, density hierarchical relationships among the objects are built according to their densities and neighbor relations. Finally, each unclustered object is assigned to one of the initial clusters or labeled as noise. Results of experiments on synthetic and real data sets show that RNN-LDH outperforms current clustering methods based on density peaks or reverse nearest neighbors.
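A minimal sketch of the reverse k-nearest-neighbor (RNN) relation that RNN-LDH builds on: y is a reverse neighbor of x whenever x appears among y's k nearest neighbors. Everything beyond this standard definition (core-region extraction, the density metric) is omitted here.

```python
import numpy as np

def reverse_knn(X, k=5):
    """Return, for each point x, the list of points whose kNN sets contain x."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]   # k nearest neighbors of each point
    rnn = [[] for _ in range(len(X))]
    for y, neighbors in enumerate(knn):
        for x in neighbors:
            rnn[x].append(y)                      # x is in kNN(y), so y is in RNN(x)
    return rnn

X = np.random.default_rng(1).normal(size=(50, 2))
rnn = reverse_knn(X, k=5)
# Points with many reverse neighbors lie in dense "core" regions.
```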


2021 ◽  
Author(s):  
Yizhang Wang ◽  
Di Wang ◽  
You Zhou ◽  
Chai Quek ◽  
Xiaofeng Zhang

Clustering is an important unsupervised knowledge acquisition method, which divides unlabeled data into different groups \cite{atilgan2021efficient,d2021automatic}. Different clustering algorithms make different assumptions about cluster formation; thus, most clustering algorithms handle at least one particular type of data distribution well but may not handle other types well. For example, K-means identifies convex clusters well \cite{bai2017fast}, and DBSCAN is able to find clusters with similar densities \cite{DBSCAN}. Therefore, most clustering methods may not work well on data distribution patterns that differ from the assumptions being made, or on a mixture of different distribution patterns. Taking DBSCAN as an example, it is sensitive to the loosely connected points between dense natural clusters, as illustrated in Figure~\ref{figconnect}. The density of the connected points shown in Figure~\ref{figconnect} differs from that of the natural clusters on both ends; however, DBSCAN with fixed global parameter values may wrongly assign these connected points and consider all the data points in Figure~\ref{figconnect} as one big cluster.
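A small synthetic illustration of the bridging problem described above: two dense blobs joined by a sparse chain of points. With a single global eps, DBSCAN can report one merged cluster instead of two. The data and parameter values are assumptions chosen to reproduce the effect, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0, 0), scale=0.2, size=(100, 2))
blob_b = rng.normal(loc=(6, 0), scale=0.2, size=(100, 2))
bridge = np.column_stack([np.linspace(0.5, 5.5, 12), np.zeros(12)])  # loosely connected points
X = np.vstack([blob_a, blob_b, bridge])

labels = DBSCAN(eps=0.6, min_samples=3).fit_predict(X)
print(len(set(labels) - {-1}))  # typically 1: the bridge merges both blobs
```

Shrinking eps separates the blobs but starts labeling legitimate cluster members as noise, which is exactly the fixed-global-parameter dilemma the text points out.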


2021 ◽  
Author(s):  
Hui Ma ◽  
Ruiqin Wang ◽  
Shuai Yang

Clustering by fast search and find of density peaks (DPC) has the advantages of being simple, efficient, and capable of detecting clusters of arbitrary shape. However, it still has some shortcomings: (1) the cutoff distance must be specified in advance, and the choice of local density formula affects the final clustering result; (2) after the cluster centers are found, the assignment strategy for the remaining points may produce a "domino effect", that is, once a point is misallocated, more points may be misallocated subsequently. To overcome these shortcomings, we propose a density peaks clustering algorithm based on the natural nearest neighbor and multi-cluster merging. In this algorithm, a weighted local density calculation method is designed using the natural nearest neighbor, which avoids both the choice of cutoff distance and the choice of local density formula. The algorithm uses a new two-stage assignment strategy to assign the remaining points to the most suitable clusters, thus reducing assignment errors. Experiments were carried out on artificial and real-world datasets, and the results show that the clustering performance of this algorithm is better than that of other related algorithms.
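A minimal sketch of the natural-nearest-neighbor search that underlies the weighted density: k grows until every point is some other point's neighbor, so no neighborhood size has to be specified by the user. The stopping rule below is the common "every point has a reverse neighbor" form; the details of this paper's variant are assumptions.

```python
import numpy as np

def natural_neighbor_eigenvalue(X, k_max=30):
    """Grow k until every point appears in someone's k-nearest-neighbor set."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)         # neighbors by distance; column 0 is the point itself
    in_degree = np.zeros(len(X), dtype=int)
    for k in range(1, k_max + 1):
        np.add.at(in_degree, order[:, k], 1)  # each point "votes" for its k-th neighbor
        if np.all(in_degree > 0):             # every point is now somebody's neighbor
            return k                          # the natural neighborhood size
    return k_max

X = np.random.default_rng(2).normal(size=(80, 2))
print(natural_neighbor_eigenvalue(X))
```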


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 163
Author(s):  
Baobin Duan ◽  
Lixin Han ◽  
Zhinan Gou ◽  
Yi Yang ◽  
Shuangshuang Chen

Given the universal existence of mixed data with numerical and categorical attributes in the real world, a variety of clustering algorithms have been developed to discover the potential information hidden in mixed data. Most existing clustering algorithms compute the distances or similarities between data objects on the original data, which may cause instability in the clustering results because of noise. In this paper, a clustering framework is proposed to explore the grouping structure of mixed data. First, the categorical attributes transformed by one-hot encoding and the normalized numerical attributes are input to a stacked denoising autoencoder to learn internal feature representations. Second, based on these feature representations, the distances between data objects in feature space are calculated, along with the local density and relative distance of each data object. Third, the density peaks clustering algorithm is improved and employed to allocate all data objects into different clusters. Finally, experiments conducted on several UCI datasets demonstrate that the proposed algorithm for clustering mixed data outperforms three baseline algorithms in terms of clustering accuracy and the Rand index.
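A minimal sketch of the preprocessing step described above: one-hot encode the categorical attributes and normalize the numerical ones before passing the result to an autoencoder. The column names are hypothetical, and the denoising autoencoder itself is omitted.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

# Hypothetical mixed data: two numerical attributes and one categorical attribute.
df = pd.DataFrame({
    "age":    [23, 45, 31, 52],
    "income": [40e3, 85e3, 52e3, 90e3],
    "job":    ["tech", "admin", "tech", "mgmt"],
})

prep = ColumnTransformer([
    ("num", MinMaxScaler(), ["age", "income"]),   # normalized numerical attributes
    ("cat", OneHotEncoder(), ["job"]),            # one-hot encoded categorical attribute
])
X = prep.fit_transform(df)  # feature matrix to feed the stacked denoising autoencoder
print(X.shape)              # (4, 5): 2 numerical columns + 3 one-hot columns
```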


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 2014
Author(s):  
Yi Lv ◽  
Mandan Liu ◽  
Yue Xiang

The clustering analysis algorithm is used to reveal internal relationships among data without prior knowledge and to gather data with common attributes into groups. To address the problem that existing algorithms require prior knowledge, we propose a fast-searching density peak clustering algorithm based on the shared nearest neighbor and adaptive clustering centers (DPC-SNNACC). It can automatically ascertain the number of knee points in the decision graph according to the characteristics of different datasets, and thereby determine the number of clustering centers without human intervention. First, an improved calculation method of local density based on the symmetric distance matrix is proposed. Then, the position of the knee point is obtained by calculating the change in the difference between decision values. Finally, experimental and comparative evaluation on several datasets from diverse domains establishes the viability of the DPC-SNNACC algorithm.
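A minimal sketch of the shared-nearest-neighbor (SNN) similarity that this family of methods builds its local density on: the more k-nearest neighbors two points share, the more similar they are. The paper's exact density formula is not reproduced here.

```python
import numpy as np

def snn_similarity(X, k=6):
    """Count shared k-nearest neighbors for every pair of points."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]
    sets = [set(row) for row in knn]
    n = len(X)
    snn = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            snn[i, j] = snn[j, i] = len(sets[i] & sets[j])  # shared neighbors
    return snn  # symmetric, like the distance matrix it is derived from

X = np.random.default_rng(3).normal(size=(40, 2))
S = snn_similarity(X)
```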


Symmetry ◽  
2019 ◽  
Vol 11 (7) ◽  
pp. 859 ◽  
Author(s):  
Lin

The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time calculating the local density and the separation distance of each data point in a dataset; the purpose of this study is to accelerate that computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach that calculates the separation distance of a data point by scanning only that point's neighbors. Additionally, since the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance, we propose an approach that identifies non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches, accelerating the DPC algorithm.
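For reference, a minimal sketch of the baseline separation-distance (delta) computation that the paper accelerates: for each point, scan all points of higher density and keep the nearest one. This is the quadratic step the proposed neighbor-only scan avoids; the cutoff-based density below is a simple stand-in, not the paper's choice.

```python
import numpy as np

def separation_distance(dists, rho):
    """DPC delta: distance to the nearest point of higher local density."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]    # points with higher local density
        if higher.size == 0:                  # the global density peak
            delta[i] = dists[i].max()
        else:
            delta[i] = dists[i, higher].min() # nearest higher-density point
    return delta

X = np.random.default_rng(5).normal(size=(60, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
rho = (D < 0.5).sum(axis=1)                   # simple cutoff-based local density (assumed)
delta = separation_distance(D, rho)
# Density peaks are the points where both rho and delta are large.
```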


2015 ◽  
Vol 09 (03) ◽  
pp. 307-331 ◽  
Author(s):  
Wei Zhang ◽  
Gongxuan Zhang ◽  
Yongli Wang ◽  
Zhaomeng Zhu ◽  
Tao Li

Nearest neighbor search is a key technique used in hierarchical clustering, and its computational complexity determines the performance of the hierarchical clustering algorithm. The time complexity of standard agglomerative hierarchical clustering is O(n³), while the time complexity of more advanced hierarchical clustering algorithms (such as nearest neighbor chain, SLINK and CLINK) is O(n²). This paper presents a new nearest neighbor search method called nearest neighbor boundary (NNB), which first divides a large dataset into independent subsets and then finds the nearest neighbor of each point within its subset. When NNB is used, the time complexity of hierarchical clustering can be reduced to O(n log₂ n). Based on NNB, we propose a fast hierarchical clustering algorithm called nearest-neighbor boundary clustering (NBC); the proposed algorithm can also be adapted to parallel and distributed computing frameworks. The experimental results demonstrate that our algorithm is practical for large datasets.
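A minimal sketch of the divide step described above: partition the dataset into subsets (here, a crude one-dimensional quantile grid) and search for each point's nearest neighbor inside its own subset only. The boundary handling that makes NNB exact is omitted, so this sketch is approximate near subset borders.

```python
import numpy as np

def subset_nearest_neighbors(X, cells=4):
    """Approximate NN search: brute force, but only within each partition."""
    edges = np.quantile(X[:, 0], np.linspace(0, 1, cells + 1))
    cell_id = np.clip(np.searchsorted(edges, X[:, 0]) - 1, 0, cells - 1)
    nn = np.full(len(X), -1)
    for c in range(cells):
        idx = np.where(cell_id == c)[0]
        if len(idx) < 2:
            continue
        sub = X[idx]
        d = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)               # a point is not its own neighbor
        nn[idx] = idx[np.argmin(d, axis=1)]       # nearest neighbor within the subset
    return nn

X = np.random.default_rng(6).normal(size=(200, 2))
print(subset_nearest_neighbors(X)[:10])
```

Each brute-force scan now touches only n/cells points rather than n, which is the source of the complexity reduction the paper formalizes.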


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 459
Author(s):  
Shuyi Lu ◽  
Yuanjie Zheng ◽  
Rong Luo ◽  
Weikuan Jia ◽  
Jian Lian ◽  
...  

The clustering algorithm plays an important role in data mining and image processing. Breakthroughs in algorithmic precision and methodology directly affect the direction and progress of subsequent research. At present, clustering algorithms are mainly divided into hierarchical, density-based, grid-based and model-based ones. This paper mainly studies the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm, a new density-based clustering method. The algorithm has no iterative process, few parameters and high precision. However, we found that this clustering algorithm does not consider the original topological characteristics of the data. We also found that clustering data are similar to the social network nodes discussed in DeepWalk, in that they satisfy a power-law distribution. In this study, we consider the topological characteristics of the graph in the clustering algorithm. Building on previous studies, we propose a clustering algorithm that adds the topological characteristics of the original data to the CFSFDP algorithm. Our experimental results show that the clustering algorithm with topological features significantly improves the clustering effect, demonstrating that the addition of topological features is effective and feasible.
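A minimal sketch of how topological features could be derived from clustering data, following the DeepWalk analogy drawn above: build a k-NN graph over the points and generate short random walks on it. Feeding the walks to a skip-gram model, as DeepWalk does, is omitted, and all parameter values are assumptions.

```python
import numpy as np

def knn_graph_walks(X, k=5, walk_len=10, walks_per_node=2, seed=0):
    """Generate random-walk 'sentences' over a k-nearest-neighbor graph."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.argsort(dists, axis=1)[:, 1:k + 1]   # adjacency: each node's k neighbors
    walks = []
    for start in range(len(X)):
        for _ in range(walks_per_node):
            walk, node = [start], start
            for _ in range(walk_len - 1):
                node = rng.choice(knn[node])      # step to a random graph neighbor
                walk.append(int(node))
            walks.append(walk)
    return walks  # corpus of node sequences, analogous to sentences in DeepWalk

X = np.random.default_rng(4).normal(size=(30, 2))
walks = knn_graph_walks(X)
```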


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes, and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used for different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is utilized to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify disambiguated words, sense identification is performed for the adjectives and a comparison is carried out. SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work improves considerably, achieving 91.2%.
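A minimal sketch of the word-similarity step described above, using NLTK's WordNet interface: Jiang-Conrath similarity requires an information-content corpus (Brown here). It assumes `nltk.download('wordnet')` and `nltk.download('wordnet_ic')` have been run; the word pair is a hypothetical example.

```python
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content from the Brown corpus

dog = wn.synset('dog.n.01')
cat = wn.synset('cat.n.01')
print(dog.jcn_similarity(cat, brown_ic))   # higher value = more similar senses
```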

