Pseudo Supervised Matrix Factorization in Discriminative Subspace

Non-negative Matrix Factorization (NMF) and spectral clustering have proven efficient and effective for data clustering and have been applied to various real-world scenarios. However, traditional methods still have drawbacks: (1) most existing algorithms operate on the high-dimensional data directly while neglecting the intrinsic data structure in the low-dimensional subspace; (2) the pseudo-information obtained during optimization is not used by most spectral clustering and manifold regularization methods. In this paper, a novel unsupervised matrix factorization method, Pseudo Supervised Matrix Factorization (PSMF), is proposed for data clustering. The main contributions are threefold: (1) to cluster in the discriminant subspace, Linear Discriminant Analysis (LDA) is combined with NMF in a unified framework; (2) a pseudo supervised manifold regularization term is proposed, which uses the pseudo-information to guide the regularization term toward a subspace that discriminates between classes; (3) an efficient optimization algorithm with proven convergence is designed to solve the proposed problem. Extensive experiments on multiple benchmark datasets show that the proposed model outperforms other state-of-the-art clustering algorithms.
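
As background for the NMF component only, the following is a minimal sketch of plain NMF with the standard Lee-Seung multiplicative updates for the squared Frobenius loss. It is not the PSMF method: the LDA coupling and the pseudo supervised regularization term are not reproduced, and the function names, the toy matrix `X`, and the choice of columns as samples are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of X - WH."""
    wh = matmul(w, h)
    return sum((p - q) ** 2 for xr, ar in zip(x, wh) for p, q in zip(xr, ar))


def nmf(x, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF (not PSMF): X ~ WH with W, H >= 0, via multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(x, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: 4 features x 6 samples, samples stored as columns.
X = [
    [5.0, 4.8, 5.2, 0.1, 0.2, 0.0],
    [4.9, 5.1, 5.0, 0.0, 0.1, 0.2],
    [0.1, 0.0, 0.2, 3.9, 4.1, 4.0],
    [0.2, 0.1, 0.0, 4.0, 3.8, 4.2],
]
W, H = nmf(X, rank=2)
# One common heuristic: read the largest entry in each column of H as a cluster label.
labels = [max(range(2), key=lambda k: H[k][j]) for j in range(len(X[0]))]
print(frobenius_error(X, W, H), labels)
```

Because the updates only multiply by non-negative ratios, `W` and `H` stay non-negative without an explicit projection step, which is why this scheme is a common baseline for NMF-based clustering.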

An Empirical Comparison of Latest Data Clustering Algorithms with State-of-the-Art

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i2.pp410-415 ◽

2017 ◽

Vol 5 (2) ◽

pp. 410 ◽

Cited By ~ 4

Author(s):

Xianjin Shi ◽

Wanwan Wang ◽

Chongsheng Zhang

Keyword(s):

Data Clustering ◽

Spectral Clustering ◽

Clustering Algorithm ◽

State Of The Art ◽

Clustering Algorithms ◽

Density Peak ◽

The Past ◽

Overall Performance ◽

Public Datasets ◽

Clustering Validation

Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering, and spectral clustering. In recent years, two new data clustering algorithms have been proposed: affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with state-of-the-art methods, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, AP, DP, and DBSCAN should all be considered. Moreover, we find that the comparison of different clustering algorithms is closely related to the clustering evaluation metrics adopted. For instance, under the Silhouette clustering validation metric, the overall performance of K-Means is as good as that of AP and DP. This work has important reference value for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.
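
To make the Silhouette metric concrete, here is a small sketch of the mean silhouette coefficient, an internal validation metric that needs no ground-truth labels. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to any other cluster, and the score is (b - a) / max(a, b). The function name and the toy points are assumptions; the paper's own evaluation code is not shown in the abstract.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Convention: a point in a singleton cluster contributes 0.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0.0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: two tight pairs of points that are far apart.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(pts, [0, 0, 1, 1])  # labels follow the two pairs
bad = silhouette_score(pts, [0, 1, 0, 1])   # labels cut across the pairs
print(good, bad)
```

Scores near 1 indicate compact, well-separated clusters, and negative scores indicate points that sit closer to another cluster than to their own. This is one reason a centroid-based method such as K-Means can look strong under this particular metric.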

Robust Semisupervised Nonnegative Local Coordinate Factorization for Data Representation

Complexity ◽

10.1155/2018/7963210 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16

Author(s):

Wei Jiang ◽

Qian Lv ◽

Chenggang Yan ◽

Kewei Tang ◽

Jie Zhang

Keyword(s):

Loss Function ◽

Matrix Factorization ◽

Nonnegative Matrix ◽

Approximation Problem ◽

Factorization Method ◽

Data Representation ◽

Least Square ◽

Gene Clustering ◽

Unified Framework ◽

Label Information

Obtaining an optimum data representation is a challenging issue that arises in many intellectual data processing techniques such as data mining, pattern recognition, and gene clustering. Many existing methods formulate this problem as a nonnegative matrix factorization (NMF) approximation problem. The standard NMF uses the least-squares loss function, which is not robust to outliers and noise and fails to use prior label information to enhance the discriminability of representations. In this study, we develop a novel matrix factorization method called robust semisupervised nonnegative local coordinate factorization by integrating robust NMF, a robust local coordinate constraint, and local spline regression into a unified framework. We use the l2,1 norm for the loss function of the NMF and a local coordinate constraint term to make our method insensitive to outliers and noise. In addition, we exploit the local and global consistencies of sample labels to guarantee that the data representation is compact and discriminative. An efficient multiplicative updating algorithm is derived to optimize the new loss function, followed by a rigorous proof of convergence. Several experiments on face and gene datasets indicate that the proposed method is more effective and robust than state-of-the-art methods.
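
For reference, the l2,1 norm can be written out next to the least-squares loss it replaces. This is the common textbook form, not an equation taken from the paper. The symbols X (data matrix with one sample per column), U and V (nonnegative factors), and E (residual) are assumptions, and some authors sum over rows instead of columns.

```latex
% Residual of the factorization, with one column e_j per sample.
E = X - UV, \qquad E = [\, e_1, \dots, e_n \,]

% Least-squares (squared Frobenius) loss: each sample's error is squared,
% so a single outlier column can dominate the objective.
\|E\|_F^2 = \sum_{j=1}^{n} \|e_j\|_2^2

% l2,1 loss: each sample's error enters unsquared,
% so outliers are penalized linearly rather than quadratically.
\|E\|_{2,1} = \sum_{j=1}^{n} \|e_j\|_2
            = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{d} E_{ij}^2}
```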

Multi-view data clustering via non-negative matrix factorization with manifold regularization

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-021-01307-7 ◽

2021 ◽

Author(s):

Ghufran Ahmad Khan ◽

Jie Hu ◽

Tianrui Li ◽

Bassoma Diallo ◽

Hongjun Wang

Keyword(s):

Matrix Factorization ◽

Data Clustering ◽

Manifold Regularization ◽

Non Negative Matrix Factorization

Adaptive Manifold Regularized Matrix Factorization for Data Clustering

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/475 ◽

2017 ◽

Cited By ~ 12

Author(s):

Lefei Zhang ◽

Qian Zhang ◽

Bo Du ◽

Jane You ◽

Dacheng Tao

Keyword(s):

Matrix Factorization ◽

Data Clustering ◽

Spectral Clustering ◽

Augmented Lagrangian ◽

Point Of View ◽

Superior Performance ◽

Gaussian Kernel ◽

Lagrangian Multiplier ◽

Affinity Matrix ◽

Gaussian Kernel Function

Data clustering is the task of grouping data samples into clusters based on the relationships among samples and the structures hidden in the data, and it is a fundamental and important topic in data mining and machine learning. In the literature, spectral clustering is one of the most popular approaches and has produced many variants in recent years. However, the performance of spectral clustering is determined by the affinity matrix, which is usually computed by a predefined model (e.g., a Gaussian kernel function) with carefully tuned parameter combinations and may be far from optimal in practice. In this paper, we propose to treat data clustering from a robust matrix factorization point of view and to learn an affinity matrix simultaneously to regularize the proposed matrix factorization. The solution of the proposed adaptive manifold regularized matrix factorization (AMRMF) is reached by a novel Augmented Lagrangian Multiplier (ALM) based algorithm. The experimental results on standard clustering datasets demonstrate superior performance over the existing alternatives.
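
The predefined affinity model mentioned above can be sketched briefly. The example below builds a Gaussian kernel affinity matrix, w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), and shows how strongly the result depends on the bandwidth sigma. It illustrates the fixed-kernel baseline only, not AMRMF's learned affinity, and the function name and toy points are assumptions.

```python
import math


def gaussian_affinity(points, sigma):
    """Symmetric affinity matrix with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return w


# Hypothetical toy data: two nearby points and one distant point.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
narrow = gaussian_affinity(pts, sigma=0.5)  # the distant point is almost disconnected
wide = gaussian_affinity(pts, sigma=5.0)    # every pair looks fairly similar
print(narrow[0][2], wide[0][2])
```

With a narrow bandwidth the graph nearly falls apart into isolated points, and with a wide one all points look alike. This sensitivity is the tuning burden that motivates learning the affinity matrix jointly with the factorization.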

Adaptive Projected Matrix Factorization method for data clustering

Neurocomputing ◽

10.1016/j.neucom.2018.04.031 ◽

2018 ◽

Vol 306 ◽

pp. 182-188 ◽

Cited By ~ 5

Author(s):

Mulin Chen ◽

Qi Wang ◽

Xuelong Li

Keyword(s):

Matrix Factorization ◽

Data Clustering ◽

Factorization Method

A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine adequate parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.

An Enhanced Spectral Clustering Algorithm with S-Distance

Symmetry ◽

10.3390/sym13040596 ◽

2021 ◽

Vol 13 (4) ◽

pp. 596

Author(s):

Krishna Kumar Sharma ◽

Ayan Seal ◽

Enrique Herrera-Viedma ◽

Ondrej Krejcar

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Rank Test ◽

Customer Churn ◽

Signed Rank ◽

Signed Rank Test ◽

Spectral Clustering Algorithm ◽

Industrial Databases

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). Because the similarity measure plays a key role in clustering when predicting churn accurately from industrial data, the linear Euclidean distance in traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, and two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm outperforms the three existing clustering algorithms in terms of Jaccard index, f-score, recall, precision, and accuracy. Finally, we also test the significance of the clustering results using Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab098 ◽

2021 ◽

Author(s):

Yinlei Hu ◽

Bin Li ◽

Falai Chen ◽

Kun Qu

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Matrix Factorization ◽

Data Clustering ◽

Cell Types ◽

Low Rank ◽

Sequencing Data ◽

Rank Matrix ◽

Single Cell RNA Sequencing ◽

Low Rank Matrix

Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This need has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

Research on Spectral Clustering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1350 ◽

2014 ◽

Vol 687-691 ◽

pp. 1350-1353

Author(s):

Li Li Fu ◽

Yong Li Liu ◽

Li Jing Hao

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Theoretical Foundation ◽

Clustering Algorithms ◽

Spectral Graph Theory ◽

Graph Partition ◽

Mining Areas ◽

Spectral Graph ◽

Definition Of ◽

Spectral Clustering Algorithm

Spectral clustering is a clustering algorithm based on spectral graph theory. Because it has a deep theoretical foundation and an advantage in dealing with non-convex distributions, it has received much attention in machine learning and data mining. The algorithm is easy to implement and outperforms traditional clustering algorithms such as K-means. This paper aims to give some intuition about spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and the clustering steps. Finally, some improvements that address the disadvantages of spectral clustering are introduced briefly.
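
The clustering steps can be illustrated with a minimal two-cluster sketch, assuming a Gaussian affinity and the normalized-cut relaxation: build the affinity matrix, normalize it by the node degrees, approximate the second eigenvector, and split the points by its sign. The abstract does not specify the paper's own formulation, so the function name, the power-iteration shortcut, and the toy points are all assumptions. A general implementation would use a proper eigensolver and run k-means on several eigenvectors.

```python
import math


def spectral_bipartition(points, sigma=1.0, iterations=500):
    """Split points into two groups with a normalized spectral relaxation."""
    n = len(points)
    # Step 1: Gaussian affinity matrix W (diagonal kept, so W is positive semidefinite).
    w = [[math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
         for p in points]
    # Step 2: degrees and the normalized affinity N = D^(-1/2) W D^(-1/2).
    deg = [sum(row) for row in w]
    norm = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # The top eigenvector of N is known in closed form: sqrt(deg), scaled to unit length.
    scale = math.sqrt(sum(deg))
    top = [math.sqrt(d) / scale for d in deg]
    # Step 3: power iteration with the top eigenvector projected out,
    # which converges toward the second eigenvector of N.
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iterations):
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        v = [sum(norm[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        if length == 0.0:
            break
        v = [a / length for a in v]
    # Step 4: undo the degree scaling and split by sign.
    return [1 if v[i] / math.sqrt(deg[i]) > 0.0 else 0 for i in range(n)]


# Hypothetical toy data: two well-separated groups of three points.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
parts = spectral_bipartition(pts)
print(parts)
```

Which group receives label 1 depends on the sign of the eigenvector and is arbitrary. Only the grouping itself is meaningful.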

Separation of singing voice from music accompaniment using matrix factorization method

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) ◽

10.1109/icatcct.2015.7456876 ◽

2015 ◽

Author(s):

Harshada Burute ◽

P. B. Mane

Keyword(s):

Matrix Factorization ◽

Factorization Method ◽

Singing Voice
