Multiple Independent Subspace Clusterings

Multiple clustering aims to discover diverse ways of organizing data into clusters. Despite the progress made, it is still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph-regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into matrix factorization to handle clusters that are not linearly separable. Experimental results on synthetic datasets show that MISC can find different interesting clusterings from the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.

Factor-Bounded Nonnegative Matrix Factorization

ACM Transactions on Knowledge Discovery from Data, 2021, Vol 15 (6), pp. 1-18. DOI: 10.1145/3451395

Author(s): Kai Liu, Xiangyu Li, Zhihui Zhu, Lodewijk Brand, Hua Wang

Keyword(s): Matrix Factorization, Clustering Algorithm, Nonnegative Matrix Factorization, Nonnegative Matrix, Optimization Methods, Auxiliary Function, Image Clustering, Real World Datasets, The Relationship, Matrix Factors

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendation and image clustering to visual feature extraction, NMF is applied to a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by using an auxiliary function to ensure that the objective decreases monotonically. Although the objective in MUA converges, there is no proof that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article we study the factor-bounded NMF problem and provide a solution algorithm whose convergence is proven by rigorous mathematical analysis, which ensures that both the objective and the matrix factors converge. In addition, we show the relationship between MUA and our solution, followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
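For readers unfamiliar with multiplicative updates, the following is a minimal sketch of the classical Lee-Seung updates for the squared Frobenius objective, which is one common form of MUA. It is not the factor-bounded algorithm proposed in the article. The function name `nmf_multiplicative`, the small constant `eps`, and the random test matrix are assumptions made for illustration.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate X (nonnegative, n x m) as W @ H with W, H >= 0.

    Uses the classical multiplicative updates for ||X - W H||_F^2.
    `eps` (an assumption of this sketch) guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    history = []  # objective value after each full iteration
    for _ in range(n_iter):
        # Each factor is rescaled elementwise, so nonnegativity is preserved.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        history.append(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, history

# Hypothetical toy data: a random nonnegative 20 x 15 matrix.
X = np.random.default_rng(1).random((20, 15))
W, H, history = nmf_multiplicative(X, rank=4)
print(history[0], history[-1])
```

The recorded `history` illustrates the monotone decrease of the objective that the abstract mentions. It says nothing about whether `W` and `H` themselves converge, which is the gap the article addresses.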

Multi-Component Nonnegative Matrix Factorization

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017. DOI: 10.24963/ijcai.2017/407

Cited By ~ 3

Author(s): Jing Wang, Feng Tian, Xiao Wang, Hongchuan Yu, Chang Hong Liu, ...

Keyword(s): Matrix Factorization, Nonnegative Matrix Factorization, Multiple Representations, Nonnegative Matrix, Real Data, Multiple Components, Face Images, The Arts, Accurate Performance, Real World Datasets

Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information the others do not have. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit for understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with its correctness and convergence guarantees. Extensive experimental results on real-world datasets show that MCNMF not only achieves more accurate performance than state-of-the-art methods when using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMF methods can offer.
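To make the diversity term concrete, here is a minimal sketch of the standard biased empirical HSIC estimator, tr(K C L C) / (n - 1)^2, where C is the centering matrix. The choice of linear kernels, the function name `hsic`, and the toy vectors are assumptions of this sketch. The abstract does not state which kernel or estimator MCNMF uses.

```python
import numpy as np

def hsic(A, B):
    """Biased empirical HSIC between two representations.

    Rows of A and B are the same n samples. Linear kernels are an
    assumption of this sketch; other kernels can be substituted for K and L.
    """
    n = A.shape[0]
    K = A @ A.T                           # kernel matrix of the first representation
    L = B @ B.T                           # kernel matrix of the second representation
    C = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return float(np.trace(K @ C @ L @ C)) / (n - 1) ** 2

# Hypothetical toy representations of four samples.
a = np.array([[1.0], [-1.0], [1.0], [-1.0]])
b = np.array([[1.0], [1.0], [-1.0], [-1.0]])  # uncorrelated with a
print(hsic(a, a), hsic(a, b))
```

A smaller value indicates less dependence between the two representations, which is why a term of this kind can be minimized to encourage each representation to capture a different component.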

Rank Selection in Nonnegative Matrix Factorization using Minimum Description Length

Neural Computation, 2017, Vol 29 (8), pp. 2164-2176. DOI: 10.1162/neco_a_00980

Cited By ~ 9

Author(s): Steven Squires, Adam Prügel-Bennett, Mahesan Niranjan

Keyword(s): Matrix Factorization, Nonnegative Matrix Factorization, Minimum Description Length, Synthetic Data, Nonnegative Matrix, Real Data, Data Matrix, Constraint Forces, Data Points, Linear Dimensionality Reduction

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and a second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace determines how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data and accurately predicts the known size of synthetic data. We provide a MATLAB implementation of our code.

Multiview Clustering via Robust Neighboring Constraint Nonnegative Matrix Factorization

Mathematical Problems in Engineering, 2019, Vol 2019, pp. 1-10. DOI: 10.1155/2019/6084382

Author(s): Feiqiong Chen, Guopeng Li, Shuaihui Wang, Zhisong Pan

Keyword(s): Real World, Matrix Factorization, Nonnegative Matrix Factorization, Nonnegative Matrix, Data Representation, Structure Representation, Cut Method, Accurate Performance, Real World Datasets, Multiview Clustering

Many real-world datasets are described by multiple views, which can provide complementary information to each other. Synthesizing multiview features for data representation can lead to a more comprehensive data description for clustering tasks. However, it is often difficult to preserve the real local structure in each view and to reconcile the noise and outliers among views. In this paper, instead of seeking a common representation among views, a novel robust neighboring constraint nonnegative matrix factorization (rNNMF) is proposed to learn the neighbor structure representation in each view, and an L2,1-norm-based loss function is designed to improve its robustness against noise and outliers. Then, a final comprehensive representation of the data is integrated from the per-view representations. Finally, a neighboring similarity graph is learned and the graph cut method is used to partition the data into its underlying clusters. Experimental results on several real-world datasets show that our model achieves more accurate performance in multiview clustering than existing state-of-the-art methods.
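The robustness argument for an L2,1-norm loss can be illustrated with a small sketch. It assumes the common convention that each column of the residual matrix is one sample, so the L2,1 norm is the sum of the Euclidean norms of the columns. The abstract does not specify the orientation, and the function name `l21_norm` and the residual values are hypothetical.

```python
import numpy as np

def l21_norm(R):
    """Sum of the Euclidean norms of the columns of R.

    Assumes columns are samples (a convention chosen for this sketch).
    """
    return float(np.sum(np.linalg.norm(R, axis=0)))

# Hypothetical residual matrix: the second column acts as an outlier sample.
R = np.array([[3.0, 0.0],
              [4.0, 12.0]])

l21 = l21_norm(R)                         # column norms are 5 and 12
frob_sq = float(np.linalg.norm(R, "fro") ** 2)
print(l21, frob_sq)
```

Because each sample's error enters the L2,1 norm unsquared, an outlier column contributes in proportion to its norm rather than its squared norm, so it dominates the total less than under a squared Frobenius loss.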
