Automatic Scale Parameters in Affinity Matrix Construction for Improved Spectral Clustering

Author(s):  
S. Mohanavalli
S. M. Jaisakthi
Chandrabose Aravindan

Spectral clustering partitions data into similar groups in the eigenspace of the affinity matrix. Its accuracy depends on the affine equivariance achieved when translating distances into similarities. The similarity value, computed as a Gaussian of the distance between data objects, is sensitive to the scale factor σ, which controls how quickly the affinity drops with distance and is generally either a fixed constant or set by manual tuning. In this work, σ is determined automatically from the distance values, i.e., from the similarity relationships present in the real data space. The affinity of a data pair is determined as a location estimate of the spread of each point's distances to the other points: the scale factor σ_i for a data point x_i is computed as the trimean of its distance vector and used to fix the scale when computing the affinity matrix. The proposed automatic scale parameter yields a robust similarity matrix that is affine equivariant with the distance distribution and eliminates the overhead of manually tuning σ. The performance of spectral clustering with such affinity matrices was analyzed on UCI data sets and image databases; the NMI, ARI, Purity, and F-score values obtained were equivalent to those of existing works and better on most data sets. The proposed scale factor was also used in several state-of-the-art spectral clustering algorithms and performed well irrespective of the normalization operations those algorithms apply. A comparison of clustering error rates across data sets and algorithms shows that the automatic scale factor clusters the data as well as a manually tuned best σ, eliminating the need for an exhaustive grid search over the scale parameter.
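
As a concrete illustration of the construction, the sketch below computes each σ_i as the trimean of point i's distance vector and builds the Gaussian affinity matrix from it. The pairwise combination exp(-d_ij^2 / (σ_i σ_j)) is an assumption about how the per-point scales are combined, not necessarily the paper's exact formula.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def trimean_affinity(X):
    """Gaussian affinity with a per-point scale: sigma_i is the trimean
    of point i's distance vector, as in the abstract. Combining the
    scales as sigma_i * sigma_j is an illustrative assumption."""
    D = squareform(pdist(X))                           # pairwise distances
    n = len(D)
    off = D[~np.eye(n, dtype=bool)].reshape(n, n - 1)  # drop self-distances
    q1, q2, q3 = np.percentile(off, [25, 50, 75], axis=1)
    sigma = (q1 + 2.0 * q2 + q3) / 4.0                 # trimean per point
    A = np.exp(-D**2 / np.outer(sigma, sigma))         # scaled Gaussian affinity
    np.fill_diagonal(A, 0.0)                           # no self-affinity
    return A
```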

2020
Vol 2020
pp. 1-9
Author(s):  
Zhe Zhang
Xiyu Liu
Lin Wang

The traditional spectral clustering algorithm has two problems. First, when the Gaussian kernel function is used to construct the similarity matrix, different scale parameters lead to different clustering results. Second, the K-means algorithm commonly used in the clustering stage initializes its cluster centers randomly, which makes the results unstable. In this paper, an improved spectral clustering algorithm is proposed to solve both problems. For the similarity matrix, we propose an improved Gaussian kernel function that uses the distance information of several nearest neighbors to adaptively select the scale parameter. For the clustering stage, a beetle antennae search algorithm with a damping factor is proposed to overcome the instability of the clustering results. In the experiments, four artificial data sets and seven UCI data sets are used to verify the performance of our algorithm. In addition, four images from the BSDS500 data set are segmented, and the results show that our algorithm outperforms the comparison algorithms in image segmentation.
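
The sketch below shows one common form of such an adaptive kernel, setting σ_i to the mean distance from x_i to its k nearest neighbors; the choice of k and the σ_i σ_j combination are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import NearestNeighbors

def adaptive_scale_affinity(X, k=7):
    """Sketch of an adaptive Gaussian kernel: sigma_i is the mean
    distance from x_i to its k nearest neighbors (k = 7 and the
    sigma_i * sigma_j combination are illustrative assumptions)."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: the point itself
    dist, _ = nbrs.kneighbors(X)
    sigma = dist[:, 1:].mean(axis=1)                   # skip the self-distance
    D = squareform(pdist(X))
    A = np.exp(-D**2 / np.outer(sigma, sigma))
    np.fill_diagonal(A, 0.0)
    return A
```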


Author(s):  
AMIRA ABDELWAHAB
HIROO SEKIYA
IKUO MATSUBA
YASUO HORIUCHI
SHINGO KUROIWA

Collaborative filtering (CF) is one of the most prevalent recommendation techniques, providing personalized recommendations to users based on their previously expressed preferences and those of similar users. Although CF has been widely applied, its applicability is restricted by data sparsity, the lack of data for new users and new items (the cold-start problem), and the growth of both the number of users and items in the database (the scalability problem). In this paper, we propose an efficient iterative clustered prediction technique that transforms the sparse user-item matrix into a dense one and overcomes the scalability problem. In this technique, spectral clustering is used to optimize neighborhood selection and group the data into user and item clusters; the clustered user-based and clustered item-based predictions are then aggregated to efficiently estimate the unknown ratings. Our experiments on the MovieLens and Book-Crossing data sets show substantial and consistent improvements in recommendation accuracy compared to a hybrid user-based and item-based approach without clustering, a hybrid approach with k-means, and singular value decomposition (SVD)-based CF. Furthermore, we demonstrate the effectiveness of the proposed iterative technique across a varying number of iterations.
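
The following hypothetical sketch illustrates the aggregation step only: given precomputed user and item cluster labels (e.g., from spectral clustering), each unknown rating is filled with a blend of the user-cluster and item-cluster means. The blending weight alpha and the cluster-mean predictors are assumptions, not the paper's exact formulation.

```python
import numpy as np

def clustered_prediction(R, user_labels, item_labels, alpha=0.5):
    """Fill unknown entries (0) of rating matrix R by blending the
    clustered user-based and clustered item-based mean predictions.
    alpha = 0.5 and the mean predictors are illustrative assumptions."""
    R = R.astype(float)
    pred = R.copy()
    known = R > 0
    for u in range(R.shape[0]):
        for i in range(R.shape[1]):
            if known[u, i]:
                continue
            same_users = user_labels == user_labels[u]   # u's user cluster
            u_ratings = R[same_users, i][known[same_users, i]]
            same_items = item_labels == item_labels[i]   # i's item cluster
            i_ratings = R[u, same_items][known[u, same_items]]
            ub = u_ratings.mean() if u_ratings.size else 0.0
            ib = i_ratings.mean() if i_ratings.size else 0.0
            pred[u, i] = alpha * ub + (1.0 - alpha) * ib
    return pred
```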


Processes
2021
Vol 9 (3)
pp. 439
Author(s):  
Xiaoling Zhang
Xiyu Liu

Clustering analysis, a key step in many data mining problems, is applied across diverse fields. Regardless of the clustering method used, noise points remain an important factor degrading clustering quality. In spectral clustering specifically, the construction of the affinity matrix shapes the spectral embedding and, in turn, the final clustering results. This study therefore proposes a noise-cutting and natural-neighbors spectral clustering method based on a coupled P system (NCNNSC-CP) to address these problems; the whole algorithm runs inside the coupled P system. We propose a parameter-free natural-neighbors search that quickly determines the natural neighbors and the natural characteristic value of the data points. Based on these, the critical density and reverse density are obtained, and noise points are identified and cut. The affinity matrix constructed from core natural neighbors greatly improves the similarity estimates between data points. Experimental results on nine synthetic data sets and six UCI data sets demonstrate that the proposed algorithm outperforms the comparison algorithms.
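
The sketch below illustrates a typical parameter-free natural-neighbors search of the kind the abstract describes: the neighborhood size r grows until the number of points with no reverse neighbor stabilizes, and the final r serves as the natural characteristic value. The exact stopping rule is an assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbors(X, max_r=None):
    """Parameter-free natural-neighbor search (sketch): grow r until
    the count of points with no reverse neighbor stops changing."""
    n = len(X)
    max_r = max_r or n - 1
    _, idx = NearestNeighbors(n_neighbors=max_r + 1).fit(X).kneighbors(X)
    reverse_count = np.zeros(n, dtype=int)             # idx[:, 0] is the point itself
    prev_orphans = -1
    for r in range(1, max_r + 1):
        for i in range(n):
            reverse_count[idx[i, r]] += 1              # i names idx[i, r] a neighbor
        orphans = int((reverse_count == 0).sum())
        if orphans == 0 or orphans == prev_orphans:
            break                                      # search has stabilized
        prev_orphans = orphans
    knn = [set(idx[i, 1:r + 1]) for i in range(n)]
    nan_sets = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return nan_sets, r                                 # natural neighbors, lambda = r
```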


2021
Vol 7
pp. e692
Author(s):  
Muhammad Jamal Ahmed
Faisal Saeed
Anand Paul
Sadeeq Jan
Hyuncheol Seo

Researchers have explored clustering approaches that combine traditional clustering methods with deep learning techniques, which typically boosts clustering performance. Extracting knowledge from large data sets calls for dimensionality reduction and clustering techniques, and spectral clustering has recently gained popularity because of its performance. Numerous techniques have been introduced to improve spectral clustering, and one of their most significant components is the construction of the similarity graph. We introduce a weighted k-nearest-neighbors technique for constructing the similarity graph. Using this metric to build the affinity matrix, we achieved good results on both real and artificial data sets.
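
A minimal sketch of a weighted k-nearest-neighbor similarity graph of this kind follows; the Gaussian edge weighting and the choices of k and sigma are illustrative assumptions, since the abstract does not specify the exact weighting metric.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def weighted_knn_graph(X, k=10, sigma=1.0):
    """Connect each point to its k nearest neighbors and weight edges
    by a Gaussian of the distance (k and sigma are illustrative)."""
    W = kneighbors_graph(X, n_neighbors=k, mode='distance').toarray()
    edges = W > 0
    W[edges] = np.exp(-W[edges]**2 / (2.0 * sigma**2))  # distance -> similarity
    W = np.maximum(W, W.T)                              # symmetrize the graph
    return W
```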


2011
Vol 8 (4)
pp. 1143-1157
Author(s):  
Xinyue Liu
Xing Yong
Hongfei Lin

The similarity matrix is critical to the performance of spectral clustering. Mercer kernels have become popular largely due to their success in kernel methods such as kernel PCA. A novel spectral clustering method based on local neighborhoods in kernel space (SC-LNK) is proposed, which assumes that each data point can be linearly reconstructed from its neighbors. SC-LNK projects the data into a feature space via a Mercer kernel and then learns a sparse matrix through linear reconstruction, which serves as the similarity graph for spectral clustering. Experiments on synthetic and real-world data sets show that spectral clustering based on linear reconstruction in kernel space outperforms conventional spectral clustering and two other algorithms, especially on real-world data sets.
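
The sketch below shows one way to realize linear reconstruction in kernel space: LLE-style reconstruction weights are computed for each point from its neighbors, with the local Gram matrix expressed purely through kernel evaluations (the kernel trick). The RBF kernel and the regularization strength are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import NearestNeighbors

def lnk_similarity(X, k=10, gamma=0.5, reg=1e-3):
    """Sparse similarity from linear reconstruction in an RBF kernel
    feature space (sketch; k, gamma, reg are illustrative)."""
    n = len(X)
    K = rbf_kernel(X, gamma=gamma)                     # implicit feature space
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    W = np.zeros((n, n))
    for i in range(n):
        nb = idx[i, 1:]                                # k neighbors of x_i
        # local Gram of (phi(x_i) - phi(x_j)) via the kernel trick:
        # C_jk = K_ii - K_ij - K_ik + K_jk
        C = K[i, i] - K[i, nb][:, None] - K[i, nb][None, :] + K[np.ix_(nb, nb)]
        C += reg * np.trace(C) * np.eye(k)             # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nb] = w / w.sum()                         # weights sum to one
    return np.maximum(np.abs(W), np.abs(W.T))          # symmetric similarity graph
```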


2021
Author(s):  
Guangliang Chen

Chen (2018) proposed a scalable spectral clustering algorithm for cosine similarity to handle the clustering of large data sets. It runs extremely fast, with complexity linear in the size of the data, and achieves state-of-the-art accuracy. This paper conducts a perturbation analysis of the algorithm to understand the effect of discarding a perturbation term in an eigendecomposition step. Our results show that the accuracy of the scalable algorithm's approximation depends on the connectivity, separation, and sizes of the clusters, and that the approximation is especially accurate for large data sets.
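
For context, a compact sketch of the scalable idea: with cosine similarity the affinity matrix is A = Xn Xn^T for the row-normalized data Xn, so a spectral embedding can be read off an SVD of the degree-normalized data matrix without ever forming the n x n affinity; the diagonal term dropped along the way is essentially the perturbation the paper analyzes. This is a sketch under those assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def scalable_cosine_spectral(X, n_clusters):
    """Spectral clustering for cosine similarity via SVD of the
    normalized data matrix (sketch; avoids forming A = Xn Xn^T)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm rows
    d = Xn @ (Xn.T @ np.ones(len(Xn)))                 # degrees of A = Xn Xn^T
    Y = Xn / np.sqrt(d)[:, None]                       # D^{-1/2} Xn
    U, _, _ = svds(Y, k=n_clusters)                    # top singular vectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-normalized embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
```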


Author(s):  
Yuancheng Li
Yaqi Cui
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of its users.
Objective: Existing clustering algorithms for processing such data generally suffer from insufficient data utilization, high computational complexity, and low accuracy of behavior recognition.
Methods: To improve clustering accuracy, this paper proposes a new clustering method based on users' electrical behavior. Starting from an analysis of user load characteristics, user electricity data samples were constructed. Daily load characteristic curves were extracted through an improved extreme learning machine clustering algorithm and effective index criteria, and clustering analysis was carried out for users from industrial, commercial, and residential areas. The improved algorithm, called Unsupervised Extreme Learning Machine (US-ELM), extends the original Extreme Learning Machine (ELM) to perform unsupervised clustering.
Results: Four data sets were processed in MATLAB and compared against other commonly used clustering algorithms. The experimental results show that US-ELM achieves higher accuracy on power data.
Conclusion: The unsupervised ELM algorithm greatly reduces time consumption and improves clustering effectiveness.
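
A compact, hedged sketch of US-ELM-style clustering follows: random sigmoid features, a graph-Laplacian smoothness penalty, a generalized eigenproblem for the embedding, and k-means on the result. All hyperparameters shown are illustrative rather than the settings used in the paper.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def us_elm_cluster(X, n_clusters, L_hidden=200, lam=0.1, k=10, seed=0):
    """US-ELM-style clustering (sketch): ELM random features plus a
    Laplacian penalty yield an embedding, clustered with k-means."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, L_hidden))
    b = rng.standard_normal(L_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))             # random sigmoid features
    A = kneighbors_graph(X, k, mode='connectivity').toarray()
    A = np.maximum(A, A.T)
    Lap = np.diag(A.sum(1)) - A                        # unnormalized Laplacian
    # generalized eigenproblem: (I + lam * H^T Lap H) v = gamma * (H^T H) v
    lhs = np.eye(L_hidden) + lam * H.T @ Lap @ H
    rhs = H.T @ H + 1e-6 * np.eye(L_hidden)
    _, vecs = eigh(lhs, rhs)
    B = vecs[:, 1:n_clusters + 1]                      # skip the trivial solution
    E = H @ B                                          # low-dimensional embedding
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(E)
```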

