An Empirical Analysis of Similarity Matrix for Spectral Clustering

2013 ◽  
Vol 433-435 ◽  
pp. 725-730
Author(s):  
Sheng Zhang ◽  
Xiao Qi He ◽  
Yang Guang Liu ◽  
Qi Chun Huang

Constructing the similarity matrix is the key step in spectral clustering; its goal is to model the local neighborhood relationships between data points. To evaluate how the similarity matrix influences the performance of different spectral clustering algorithms, and to find rules for constructing an appropriate similarity matrix, a systematic empirical study was carried out. Six recently proposed spectral clustering algorithms were selected as evaluation objects, and normalized mutual information, F-measure and Rand Index were used as evaluation metrics. Experiments were carried out on eight synthetic datasets and eleven real-world datasets. The experimental results show that using multiple metrics makes the results more comprehensive and reliable, and that the overall performance of the locality spectral clustering algorithm is better than that of the other five algorithms on both synthetic and real-world datasets.
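
As an illustration of one of the metrics named in this abstract, the Rand Index can be sketched as the fraction of point pairs on which two labelings agree (both place the pair in the same cluster, or both separate it). The function name `rand_index` is a hypothetical helper for this sketch and is not taken from the paper.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which two labelings agree.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        # A pair counts as an agreement when both labelings group it
        # together or both labelings keep it apart.
        if same_true == same_pred:
            agree += 1
        total += 1
    if total == 0:
        raise ValueError("need at least two points")
    return agree / total
```

Because only pair co-membership is compared, the score does not change when cluster labels are renamed, which is why it suits clustering, where label identities are arbitrary.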

Author(s):  
Subhanshu Goyal ◽  
Sushil Kumar ◽  
M. A. Zaveri ◽  
A. K. Shukla

In recent times, graph-based spectral clustering algorithms have received immense attention in many areas, such as data mining, object recognition, and image analysis and processing. The similarity measure commonly used in these algorithms is the Gaussian kernel function, which depends on a sensitive scaling parameter and, when applied to the segmentation of noise-contaminated images, leads to unsatisfactory performance because it neglects spatial pixel information. The present work introduces a novel framework for spectral clustering that incorporates local spatial information and a fuzzy-based similarity measure to tackle these issues. In our approach, we first filter the noise components from the original image using spatial and gray-level information. The similarity matrix is then constructed with a similarity measure that takes into account the fuzzy c-partition matrix and the cluster-center vectors obtained by the fuzzy c-means clustering algorithm. In the last step, spectral clustering is applied to the derived similarity matrix to obtain the desired segmentation result. Experimental results on the segmentation of synthetic and Berkeley benchmark images with noise demonstrate the effectiveness and robustness of the proposed method, giving it an edge over the clustering-based segmentation methods reported in the literature.
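
To make the role of the scaling parameter concrete, here is a minimal sketch of the standard Gaussian kernel similarity, w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)). It shows the baseline measure this abstract criticizes, not the fuzzy measure it proposes. The function name `gaussian_similarity` and the zero-diagonal convention are assumptions made for this example.

```python
import math


def gaussian_similarity(points, sigma):
    """Dense Gaussian-kernel similarity matrix as a list of lists.

    Hypothetical helper. The diagonal is left at zero, which is a common
    convention and an assumption here, not something the abstract states.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between points i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


# The same three points under a narrow and a wide kernel.
pixels = [(0.0,), (1.0,), (5.0,)]
narrow = gaussian_similarity(pixels, sigma=0.5)
wide = gaussian_similarity(pixels, sigma=5.0)
```

With a small `sigma` almost every pair looks dissimilar, and with a large one almost every pair looks similar, which is the sensitivity the abstract refers to.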


2013 ◽  
Vol 765-767 ◽  
pp. 580-584
Author(s):  
Yu Yang ◽  
Cheng Gui Zhao

Spectral clustering algorithms inevitably run into computation time and memory problems on large-scale data, because they are both compute-intensive and data-intensive. We analyse the time complexity of constructing the similarity matrix, performing the eigendecomposition and running k-means, and exploit the SPMD parallel structure supported by the MATLAB Parallel Computing Toolbox (PCT) to decrease the eigendecomposition time. We propose using MATLAB Distributed Computing Server to construct the similarity matrix in parallel, while using a t-nearest-neighbors approach to reduce memory use. Finally, we report clustering time, clustering quality and clustering accuracy in the experiments.
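
The memory saving from a t-nearest-neighbors approach can be sketched as follows: for each point only the t largest similarities are kept, and rows are stored as dictionaries so that dropped entries take no space. This is a Python illustration under assumptions, not the paper's MATLAB implementation. The helper name `knn_sparsify` is hypothetical, and the symmetric "keep the entry if either point selects the other" rule is one common choice that the abstract does not specify.

```python
def knn_sparsify(w, t):
    """Keep w[i][j] only if j is among the t most similar points to i,
    or i is among the t most similar points to j.

    Hypothetical helper. Returns a list of dicts (one sparse row per point).
    """
    n = len(w)
    if not 0 < t < n:
        raise ValueError("t must satisfy 0 < t < number of points")
    sparse = [dict() for _ in range(n)]
    for i in range(n):
        # Candidate neighbors of i, most similar first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: w[i][j], reverse=True)
        for j in others[:t]:
            # Store both directions so the sparse matrix stays symmetric.
            sparse[i][j] = w[i][j]
            sparse[j][i] = w[j][i]
    return sparse
```

Each point selects exactly t neighbors, so the result holds at most 2tN stored entries instead of N^2, which is the source of the memory reduction when t is much smaller than N. Note that this sketch still takes a dense matrix as input; a memory-bounded implementation would compute the neighbors of each point without materializing the full matrix.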


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays an imperative role in clustering for predicting churn with better accuracy from industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd), which is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also run on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing clustering algorithms in terms of Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results with Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and the sign test. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and reduces the number of parameters to be tuned to a minimum. Based on the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

The spectral clustering algorithm is a kind of clustering algorithm based on spectral graph theory. Because spectral clustering has a deep theoretical foundation and an advantage in dealing with non-convex distributions, it has received much attention in machine learning and data mining. The algorithm is easy to implement and outperforms traditional clustering algorithms such as the K-means algorithm. This paper aims to give some intuition about spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and the clustering steps. Finally, some improvements that address the disadvantages of spectral clustering are introduced briefly.
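
For orientation, the graph partition criteria most often discussed in this context can be written as follows. These are the standard textbook definitions, given here as an assumption about what the paper covers, since the abstract does not list its criteria. Here W = (w_ij) is the similarity matrix, d_i = sum_j w_ij is the degree of node i, D is the diagonal degree matrix, and A_1, ..., A_k is a partition of the nodes with complements written with a bar.

```latex
\begin{align*}
\operatorname{cut}(A, \bar{A}) &= \sum_{i \in A,\; j \in \bar{A}} w_{ij}, \\
\operatorname{RatioCut}(A_1, \dots, A_k) &= \frac{1}{2} \sum_{m=1}^{k} \frac{\operatorname{cut}(A_m, \bar{A}_m)}{|A_m|}, \\
\operatorname{Ncut}(A_1, \dots, A_k) &= \frac{1}{2} \sum_{m=1}^{k} \frac{\operatorname{cut}(A_m, \bar{A}_m)}{\operatorname{vol}(A_m)}, \qquad \operatorname{vol}(A_m) = \sum_{i \in A_m} d_i, \\
L &= D - W, \qquad L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}.
\end{align*}
```

Minimizing RatioCut or Ncut exactly is NP-hard. Spectral clustering relaxes the problem by embedding the points with the eigenvectors of L (for RatioCut) or of a normalized Laplacian such as L_sym (for Ncut) that belong to the k smallest eigenvalues, and then running k-means on the embedded points.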


2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a new clustering algorithm based on the similarity matrix between pairs of data points; messages are exchanged between data points until a clustering result emerges. It is efficient and fast, and it can handle clustering on large data sets. However, traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the development of Affinity Propagation.
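
The message exchange mentioned in this abstract is usually written in terms of a responsibility r(i,k) and an availability a(i,k), as in the original Affinity Propagation formulation. The notation below is the standard one and is assumed here rather than taken from this paper. s(i,k) is the similarity between points i and k, the diagonal entry s(k,k) is the preference, and lambda is the damping factor.

```latex
\begin{align*}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\}, \\
a(i,k) &\leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0,\, r(i',k)\} \Bigr\}, \qquad i \neq k, \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\{0,\, r(i',k)\}, \\
m_{\mathrm{new}} &= \lambda\, m_{\mathrm{old}} + (1 - \lambda)\, m_{\mathrm{update}}, \qquad 0 \le \lambda < 1.
\end{align*}
```

The last line applies to every message m (each r and each a). Larger preferences s(k,k) tend to produce more exemplars and therefore more clusters, while damping suppresses oscillations between iterations, which is why both parameters appear among the adjustments listed in the abstract.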


Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

The similarity matrix has a significant effect on the performance of spectral clustering, and determining the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task by adopting a multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases: Phase I aims to determine the nonzero entries of the similarity matrix, and Phase II aims to determine the values of those nonzero entries. In Phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes diversity and similarity at the same time. Each individual determines one nonzero entry for each sample, so the encoding length decreases to O(N) in contrast with non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and a diversity preservation strategy are proposed for this phase. In Phase II, three ensemble strategies are designed to determine the values of the nonzero entries of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information into constraints. In contrast with previous multiobjective evolutionary spectral clustering algorithms, the proposed Pareto ensemble-based framework strikes a balance between time cost and clustering accuracy, as demonstrated in the experiments section.


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Shuxia Ren ◽  
Shubo Zhang ◽  
Tao Wu

The similarity graphs of most spectral clustering algorithms carry a lot of incorrect community information. In this paper, we propose a probability matrix and a novel improved spectral clustering algorithm based on the probability matrix for community detection. First, a Markov chain is used to calculate the transition probabilities between nodes, and the probability matrix is constructed from these transition probabilities. Then, the similarity graph is constructed from the mean probability matrix. Finally, community detection is achieved by optimizing the NCut objective function. The proposed algorithm is compared with SC, WT, FG, FluidC, and SCRW on artificial networks and real networks. Experimental results show that the proposed algorithm can detect communities more accurately and has better clustering performance.
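
As a rough sketch of the first step, the one-step transition probabilities of a random walk on a graph can be obtained by dividing each row of the adjacency matrix by the node degree. The abstract does not define its "mean probability matrix", so the averaging over the first few powers of the transition matrix shown in `mean_of_powers` is only one possible reading and should be treated as an assumption. All function names here are hypothetical.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / degree(i)."""
    p = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            # Isolated node: no outgoing transitions.
            p.append([0.0] * len(row))
        else:
            p.append([a / degree for a in row])
    return p


def mat_mul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)] for i in range(len(a))]


def mean_of_powers(p, steps):
    """Average of P, P^2, ..., P^steps.

    ASSUMPTION: an illustrative interpretation of a 'mean probability
    matrix'; the paper's exact definition is not given in the abstract.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    n = len(p)
    total = [[0.0] * n for _ in range(n)]
    power = p
    for step in range(steps):
        if step > 0:
            power = mat_mul(power, p)
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
    return [[value / steps for value in row] for row in total]
```

Entry (i, j) of the s-th power of P is the probability that a walk starting at node i is at node j after s steps, so averaging several powers mixes short-range and longer-range connectivity between nodes.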


2016 ◽  
Vol 10 (04) ◽  
pp. 527-555
Author(s):  
Lubomir Stanchev

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes, and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped in 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using a similarity metric that is based on keyword matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, f-score, entropy, and purity.
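
Of the evaluation metrics listed in this abstract, purity is the simplest to state: each cluster is credited with its most frequent human-assigned category, and the credited documents are divided by the total number of documents. The sketch below uses the standard definition; the function name `purity` and the toy labels are assumptions for illustration, not data from the article.

```python
from collections import Counter


def purity(labels_true, labels_pred):
    """Share of items that belong to the majority category of their cluster.

    Hypothetical helper using the standard definition of purity.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if not labels_true:
        raise ValueError("need at least one item")
    # Count the true categories that occur inside each predicted cluster.
    clusters = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        clusters.setdefault(cluster_id, Counter())[true_label] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


# Toy example: six documents, three categories, two clusters.
categories = ["a", "a", "b", "b", "b", "c"]
assignment = [0, 0, 0, 1, 1, 1]
score = purity(categories, assignment)
```

Purity rises trivially as the number of clusters grows (one document per cluster gives a purity of 1), which is one reason to report it together with the other metrics the abstract names.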


Author(s):  
Hui Du ◽  
Yuping Wang ◽  
Xiaopan Dong

Clustering is a popular and effective method for image segmentation. However, existing clustering methods often suffer from the following problems: (1) they need a huge amount of space and computation when the input data are large; (2) they need some parameters (e.g. the number of clusters) to be assigned in advance, which greatly affects the clustering results. To save space and computation, reduce the sensitivity to parameters, and improve the effectiveness and efficiency of clustering, we construct a new clustering algorithm for image segmentation. The new algorithm consists of two phases: coarsening clustering and exact clustering. First, we use the Affinity Propagation (AP) algorithm for coarsening. Specifically, to save space and computational cost, we compute the similarity only between each point and its t nearest neighbors, and obtain a condensed similarity matrix (with only t columns, where t << N and N is the number of data points). Second, to further improve the efficiency and effectiveness of the proposed algorithm, Self-tuning Spectral Clustering (SSC) is applied to the resulting points (the representative points obtained in the first phase) to do the exact clustering. As a result, the proposed algorithm can quickly and precisely cluster texture images for segmentation. The experimental results show that the proposed algorithm is more efficient than the compared algorithms FCM, K-means and SOM.

