scholarly journals Pseudo Supervised Matrix Factorization in Discriminative Subspace

Author(s):  
Jiaqi Ma ◽  
Yipeng Zhang ◽  
Lefei Zhang ◽  
Bo Du ◽  
Dapeng Tao

Non-negative Matrix Factorization (NMF) and spectral clustering have been proved to be efficient and effective for data clustering tasks and have been applied to various real-world scenes. However, there are still some drawbacks in traditional methods: (1) most existing algorithms only consider high-dimensional data directly while neglect the intrinsic data structure in the low-dimensional subspace; (2) the pseudo-information got in the optimization process is not relevant to most spectral clustering and manifold regularization methods. In this paper, a novel unsupervised matrix factorization method, Pseudo Supervised Matrix Factorization (PSMF), is proposed for data clustering. The main contributions are threefold: (1) to cluster in the discriminant subspace, Linear Discriminant Analysis (LDA) combines with NMF to become a unified framework; (2) we propose a pseudo supervised manifold regularization term which utilizes the pseudo-information to instruct the regularization term in order to find subspace that discriminates different classes; (3) an efficient optimization algorithm is designed to solve the proposed problem with proved convergence. Extensive experiments on multiple benchmark datasets illustrate that the proposed model outperforms other state-of-the-art clustering algorithms.

Author(s):  
Xianjin Shi ◽  
Wanwan Wang ◽  
Chongsheng Zhang

Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering and Spectral clustering, etc. In recent years, two new data  clustering algorithms have been proposed, which are affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with state-of-the-art, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that, the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, all of AP, DP and DBSCAN should be considered.  Moreover, we find that the comparison of different clustering algorithms is closely related to the clustering evaluation metrics adopted. For instance, when using the Silhouette clustering validation metric, the overall performance of K-Means is as good as AP and DP. This work has important reference values for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Wei Jiang ◽  
Qian Lv ◽  
Chenggang Yan ◽  
Kewei Tang ◽  
Jie Zhang

Obtaining an optimum data representation is a challenging issue that arises in many intellectual data processing techniques such as data mining, pattern recognition, and gene clustering. Many existing methods formulate this problem as a nonnegative matrix factorization (NMF) approximation problem. The standard NMF uses the least square loss function, which is not robust to outlier points and noises and fails to utilize prior label information to enhance the discriminability of representations. In this study, we develop a novel matrix factorization method called robust semisupervised nonnegative local coordinate factorization by integrating robust NMF, a robust local coordinate constraint, and local spline regression into a unified framework. We use the l2,1 norm for the loss function of the NMF and a local coordinate constraint term to make our method insensitive to outlier points and noises. In addition, we exploit the local and global consistencies of sample labels to guarantee that data representation is compact and discriminative. An efficient multiplicative updating algorithm is deduced to solve the novel loss function, followed by a strict proof of the convergence. Several experiments conducted in this study on face and gene datasets clearly indicate that the proposed method is more effective and robust compared to the state-of-the-art methods.


Author(s):  
Lefei Zhang ◽  
Qian Zhang ◽  
Bo Du ◽  
Jane You ◽  
Dacheng Tao

Data clustering is the task to group the data samples into certain clusters based on the relationships of samples and structures hidden in data, and it is a fundamental and important topic in data mining and machine learning areas. In the literature, the spectral clustering is one of the most popular approaches and has many variants in recent years. However, the performance of spectral clustering is determined by the affinity matrix, which is always computed by a predefined model (e.g., Gaussian kernel function) with carefully tuned parameters combination, and may far from optimal in practice. In this paper, we propose to consider the observed data clustering as a robust matrix factorization point of view, and learn an affinity matrix simultaneously to regularize the proposed matrix factorization. The solution of the proposed adaptive manifold regularized matrix factorization (AMRMF) is reached by a novel Augmented Lagrangian Multiplier (ALM) based algorithm. The experimental results on standard clustering datasets demonstrate the superior performance over the exist alternatives.


2018 ◽  
Vol 306 ◽  
pp. 182-188 ◽  
Author(s):  
Mulin Chen ◽  
Qi Wang ◽  
Xuelong Li

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has appeared as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain the adequate values of the parameters for these three modified algorithms, with the purpose of applying them in the clustering task. We also provide an unbiased comparison among several metaheuristics based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit in business. In this study, a churn prediction framework is developed by modified spectral clustering (SC). However, the similarity measure plays an imperative role in clustering for predicting churn with better accuracy by analyzing industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd). The Sd is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Assays are conducted to endorse the proposed clustering algorithm on four synthetics, eight UCI, two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms—k-means, density-based spatial clustering of applications with noise and conventional SC—are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats three existing clustering algorithms in terms of its Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results by the Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and sign tests. The relative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.


Author(s):  
Yinlei Hu ◽  
Bin Li ◽  
Falai Chen ◽  
Kun Qu

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.


2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering algorithm is a kind of clustering algorithm based on spectral graph theory. As spectral clustering has deep theoretical foundation as well as the advantage in dealing with non-convex distribution, it has received much attention in machine learning and data mining areas. The algorithm is easy to implement, and outperforms traditional clustering algorithms such as K-means algorithm. This paper aims to give some intuitions on spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and clustering steps, etc. Finally, in order to solve the disadvantage of spectral clustering, some improvements are introduced briefly.


Sign in / Sign up

Export Citation Format

Share Document