A Weighted Kernel PCA Formulation with Out-of-Sample Extensions for Spectral Clustering Methods

Author(s):  
C. Alzate ◽  
J.A.K. Suykens
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yonghua Tang ◽  
Qiang Fan ◽  
Peng Liu

The traditional teaching model cannot adapt to the teaching needs of the era of smart teaching. Based on this, this paper combines data mining technology to carry out teaching reforms, constructs a computer-aided system based on data mining, and constructs teaching system functions based on actual conditions. The constructed system can carry out multisubject teaching. Moreover, this paper uses a data mining system to mine teaching resources and uses spectral clustering methods to integrate multiple teaching resources to improve the practicability of data mining algorithms. In addition, this paper combines digital technology to deal with teaching resources. Finally, after building the system, this paper designs experiments to verify the performance of the system. From the research results, it can be seen that the system constructed in this paper has certain teaching and practical effects, and it can be applied to a larger teaching scope in subsequent research.


Data clustering is an active topic of research as it has applications in various fields such as biology, management, statistics, pattern recognition, etc. Spectral Clustering (SC) has gained popularity in recent times due to its ability to handle complex data and ease of implementation. A crucial step in spectral clustering is the construction of the affinity matrix, which is based on a pairwise similarity measure. The varied characteristics of datasets affect the performance of a spectral clustering technique. In this paper, we have proposed an affinity measure based on Topological Node Features (TNFs) viz., Clustering Coefficient (CC) and Summation index (SI) to define the notion of density and local structure. It has been shown that these features improve the performance of SC in clustering the data. The experiments were conducted on synthetic datasets, UCI datasets, and the MNIST handwritten datasets. The results show that the proposed affinity metric outperforms several recent spectral clustering methods in terms of accuracy.
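
The role of the affinity matrix described above can be illustrated with a minimal sketch. It assumes a plain Gaussian (RBF) affinity rather than the TNF-based measure proposed in the abstract, and it handles only the two-cluster case by splitting on the sign of the second eigenvector of the normalized Laplacian. The function names and the `sigma` parameter are hypothetical, chosen for illustration.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian affinity matrix with a zero diagonal (assumed similarity measure)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def spectral_bipartition(X, sigma=1.0):
    """Split the data into two clusters using the normalized graph Laplacian."""
    W = gaussian_affinity(X, sigma)
    d = W.sum(axis=1)                      # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)        # eigenvalues in ascending order
    # Rescale the second eigenvector so it is roughly constant within each cluster
    fiedler = vecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)
```

For more than two clusters, the usual approach is to run k-means on the rows of the first k eigenvectors instead of thresholding a single one.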


2015 ◽  
Vol 76 (1) ◽  
Author(s):  
Ang Jun Chin ◽  
Andri Mirzal ◽  
Habibollah Haron

Gene expression profiling is prominent for its broad applications and achievements in disease discovery and analysis, especially in cancer research. Spectral clustering is robust to irrelevant features, which makes it appropriate for gene expression analysis. However, previous works show that performance comparison with other clustering methods is limited and only a few microarray data sets were analyzed in each study. In this study, we demonstrate the use of spectral clustering in identifying cancer types or subtypes from microarray gene expression profiling. Spectral clustering was applied to eleven microarray data sets and its clustering performance was compared with the results in the literature. Overall, spectral clustering slightly outperformed the corresponding results in the literature. It also offers more stable clustering performance, as shown by its smaller standard deviation. Moreover, spectral clustering outperformed the corresponding methods in the literature on six of the eleven data sets. Spectral clustering is therefore a promising method for identifying cancer types or subtypes from microarray gene expression data sets.


2014 ◽  
Vol 496-500 ◽  
pp. 1817-1820
Author(s):  
Wang Ming Xu ◽  
Hang Yang ◽  
Kang Ling Fang ◽  
Xin Hai Liu

BoVW (Bag of Visual Words) Model has attracted much attention for many computer vision applications in which an image is represented by a histogram of visual words. Two of its critical steps are to construct a visual dictionary and to quantize each local feature to its nearest visual word in the dictionary. In this paper, we present the framework of a generalized BoVW (GBoVW) Model in which feature quantization can be replaced by sparse coding based feature encoding. We also propose to use spectral clustering to construct a visual dictionary to overcome the shortcomings of K-Means based clustering algorithms. Image retrieval experiments on ZuBud database indicate that GBoVW Model improves BoVW Model and the visual dictionary generated by spectral clustering achieves better performance than that by K-Means based clustering methods.
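
The hard-quantization step of the basic BoVW model described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary is taken as given (however it was built, by K-Means or spectral clustering), nearest-word assignment uses Euclidean distance, and the function name `bovw_histogram` is hypothetical. The sparse-coding variant of GBoVW is not shown.

```python
import numpy as np

def bovw_histogram(descriptors, dictionary):
    """Represent an image as a normalized histogram of visual words.

    descriptors: (n, d) array of local features from one image.
    dictionary:  (k, d) array of visual words (assumed precomputed).
    """
    # Squared Euclidean distance from every descriptor to every visual word
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()   # normalize so the bins sum to 1
```

Two images can then be compared for retrieval through any histogram distance applied to these vectors.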


Author(s):  
Meagan Carney ◽  
Holger Kantz

Abstract. We use sophisticated machine-learning techniques on a network of summer temperature and precipitation time series taken from stations throughout Germany for the years from 1960 to 2018. In particular, we consider (normalized) maximized mutual information as the measure of similarity and expand on recent clustering methods for climate modeling by applying a weighted kernel-based k-means algorithm. We find robust regional clusters that are both time invariant and shared by networks defined separately by precipitation and temperature time series. Finally, we use the resulting clusters to create a nonstationary model of regional summer temperature extremes throughout Germany and are thereby able to quantify the increase in the probability of observing high extreme summer temperature values (>35 °C) compared with the last 30 years.
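
A weighted kernel k-means iteration of the kind mentioned above can be sketched as follows. This is a generic textbook form, not the authors' implementation: it assumes a precomputed symmetric similarity (kernel) matrix `K`, nonnegative point weights `w`, and an initial labeling, all of which are hypothetical inputs here. In the abstract's setting, `K` would hold the mutual-information similarities between stations.

```python
import numpy as np

def weighted_kernel_kmeans(K, w, labels, n_clusters, n_iter=50):
    """Reassign points to the nearest weighted cluster mean in kernel feature space."""
    labels = labels.copy()
    n = K.shape[0]
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            mask = labels == c
            s = w[mask].sum()
            if s == 0:
                continue  # empty cluster: its distances stay at infinity
            wc = w[mask]
            # ||phi(x_i) - m_c||^2 = K_ii - 2 * sum_j w_j K_ij / s + sum_jl w_j w_l K_jl / s^2
            cross = K[:, mask] @ wc / s
            within = wc @ K[mask][:, mask] @ wc / s ** 2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: assignments no longer change
        labels = new_labels
    return labels
```

The result depends on the initial labeling, so in practice several initializations are usually compared.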


2018 ◽  
Vol 8 (11) ◽  
pp. 2175 ◽  
Author(s):  
Ye Yang ◽  
Yongli Hu ◽  
Fei Wu

Data clustering is an important research topic in data mining and signal processing communications. Among data clustering methods, the subspace spectral clustering methods based on the self-expression model, e.g., the Sparse Subspace Clustering (SSC) and the Low Rank Representation (LRR) methods, have attracted a lot of attention and shown good performance. The key step of SSC and LRR is to construct a proper affinity or similarity matrix of data for spectral clustering. Recently, a Laplacian graph constraint was introduced into the basic SSC and LRR and obtained considerable improvement. However, the current graph construction methods do not fully exploit and reveal the non-linear properties of the clustering data, which are common for high dimensional data. In this paper, we introduce the classic manifold learning method, the Local Linear Embedding (LLE), to learn the non-linear structure underlying the data and use the learned local geometry of the manifold as a regularization for SSC and LRR, which results in the proposed LLE-SSC and LLE-LRR clustering methods. Additionally, to solve the complex optimization problem involved in the proposed models, an efficient algorithm is also proposed. We test the proposed data clustering methods on several types of public databases. The experimental results show that our methods outperform typical subspace clustering methods with Laplacian graph constraint.

