Rank Selection in Nonnegative Matrix Factorization using Minimum Description Length

2017 · Vol 29 (8) · pp. 2164-2176
Author(s):  
Steven Squires ◽  
Adam Prügel-Bennett ◽  
Mahesan Niranjan

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one representing the basis of the new subspace and the other holding the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace determines how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data and accurately predicts the known size of synthetic data. We provide a MATLAB implementation of our method.
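
A rough sketch of the idea in Python (not the authors' MATLAB implementation): score each candidate rank with a two-part description length, a data-fit term for coding the residuals plus a model-cost term for the factor entries, and pick the rank minimizing the total. The Gaussian residual model and the fixed bits-per-parameter cost are our own simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def mdl_score(V, rank, bits_per_param=8.0):
    """Two-part description length of an NMF at a given rank (a sketch).

    Data cost: bits to encode residuals under a Gaussian model.
    Model cost: a fixed number of bits per factor entry (an assumption).
    """
    n, m = V.shape
    model = NMF(n_components=rank, init="nndsvda", max_iter=500)
    W = model.fit_transform(V)
    H = model.components_
    residual = V - W @ H
    sigma2 = max(residual.var(), 1e-12)
    data_bits = 0.5 * n * m * np.log2(2 * np.pi * np.e * sigma2)
    model_bits = bits_per_param * rank * (n + m)
    return data_bits + model_bits

V = np.abs(np.random.default_rng(0).normal(size=(100, 60)))
best = min(range(2, 11), key=lambda r: mdl_score(V, r))
print("rank minimizing description length:", best)
```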

Author(s):  
H. Fang ◽  
A. H. Li ◽  
H. X. Xu ◽  
T. Wang ◽  
K. Jiang ◽  
...  

Due to the limited spatial resolution of remote hyperspectral sensors, pixels are usually highly mixed in hyperspectral images. Endmember extraction refers to the process of identifying the pure endmember signatures from the mixture, which is an important step towards the utilization of hyperspectral data. Nonnegative matrix factorization (NMF) is a widely used method for endmember extraction due to its effectiveness and convenience. However, most NMF-based methods have single-layer structures, which may have difficulty effectively learning the structure of highly mixed and complex data. On the other hand, multilayer algorithms have shown great advantages in learning data features and have been widely studied in many fields. In this paper, we present an L1 sparsity-constrained multilayer NMF method for endmember extraction from highly mixed data. First, the multilayer NMF structure is obtained by unfolding NMF into a certain number of layers; in each layer, the abundance matrix is decomposed into the endmember matrix and the abundance matrix of the next layer. In addition, to improve the performance of NMF, we incorporate sparsity constraints into the multilayer NMF model by adding an L1 regularizer on the abundance matrix of each layer. Finally, a layer-wise optimization method based on NeNMF is proposed to train the multilayer NMF structure. Experiments were conducted on both synthetic and real data. The results demonstrate that our proposed algorithm achieves better results than several state-of-the-art approaches.
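
A minimal sketch of the unfolding scheme, assuming plain multiplicative updates with an L1 penalty in place of the paper's NeNMF-based layer-wise solver:

```python
import numpy as np

def nmf_l1(V, r, lam=0.1, iters=200, eps=1e-9):
    """Multiplicative-update NMF with an L1 penalty on H (a sketch,
    not the NeNMF solver used in the paper)."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        # the L1 term on H simply adds lam to the denominator
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def multilayer_nmf(V, ranks, lam=0.1):
    """Unfold NMF layer by layer: the abundance matrix of each layer
    is factorized again at the next layer."""
    A = V
    endmembers = []
    for r in ranks:
        W, A = nmf_l1(A, r, lam=lam)
        endmembers.append(W)
    return endmembers, A

V = np.abs(np.random.default_rng(1).normal(size=(50, 200)))
Ws, A = multilayer_nmf(V, ranks=[10, 6, 4])
print([W.shape for W in Ws], A.shape)
```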


Author(s):  
Wen-Sheng Chen ◽  
Jingmin Liu ◽  
Binbin Pan ◽  
Yugao Li

Nonnegative matrix factorization (NMF) is a linear approach for extracting localized features from facial images. However, NMF may fail to process data points that are nonlinearly separable. The kernel extension of NMF, named kernel NMF (KNMF), can model the nonlinear relationships among data points and extract nonlinear features of facial images. KNMF is an unsupervised method, however, and thus does not utilize supervision information. Moreover, the features extracted by KNMF are not sparse enough. To overcome these limitations, this paper proposes a supervised KNMF called block kernel NMF (BKNMF). A novel objective function is established by incorporating intra-class information. The algorithm is derived by making use of a block strategy and kernel theory. Our BKNMF has several merits for face recognition, such as highly sparse features and orthogonal features across different classes. We theoretically analyze the convergence of the proposed BKNMF. Compared with some state-of-the-art methods, our BKNMF achieves superior performance in face recognition.
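
For illustration, the sketch below implements a generic unsupervised kernel NMF with multiplicative updates, which are valid when the kernel matrix is elementwise nonnegative (e.g., a Gaussian kernel); BKNMF's supervised block structure and orthogonality guarantees are not reproduced here.

```python
import numpy as np

def gaussian_kernel(X, gamma=0.1):
    """Gaussian (RBF) kernel matrix; entries are always nonnegative."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_nmf(K, r, iters=300, eps=1e-9):
    """Generic kernel NMF: phi(X) ~ phi(X) @ W @ H, so the basis lives
    in feature space as nonnegative combinations of mapped samples.
    Plain unsupervised multiplicative updates, a sketch only."""
    rng = np.random.default_rng(0)
    n = K.shape[0]
    W = rng.random((n, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ K) / (W.T @ K @ W @ H + eps)
        W *= (K @ H.T) / (K @ W @ H @ H.T + eps)
    return W, H

X = np.random.default_rng(2).normal(size=(80, 10))
W, H = kernel_nmf(gaussian_kernel(X), r=5)
print(H.shape)  # nonlinear features: one 5-dim code per sample
```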


Author(s):  
Akhand Rai ◽  
Sanjay H Upadhyay

Bearing faults are a major reason for the catastrophic breakdown of rotating machinery. Therefore, the early detection of bearing faults becomes a necessity to attain uninterrupted and safe operation. This paper proposes a novel approach based on semi-nonnegative matrix factorization for the detection of incipient faults in bearings. The semi-nonnegative matrix factorization algorithm creates a sparse, localized, parts-based representation of the original data and helps capture the fault information in bearing signals more effectively. Through semi-nonnegative matrix factorization, two bearing health indicators are derived to fulfill the desired purpose. In doing so, the paper tries to address two critical issues: (i) how to reduce the dimensionality of the feature space and (ii) how to obtain a definite range of the indicator between 0 and 1. First, a set of time-domain, frequency-domain, and time-frequency-domain features are extracted from the bearing vibration signals. Second, the feature dataset is utilized to train the semi-nonnegative matrix factorization algorithm, which decomposes the training data matrix into two new matrices of lower rank. Third, the test feature vectors are projected onto these lower-dimensional matrices to obtain two statistics, the square prediction error and Q2. Finally, the Bayesian inference approach is exploited to convert the two statistics into health indicators with a fixed range of [0, 1]. The application of the advocated technique on experimental bearing signals demonstrates that it can effectively predict weak defects in bearings and performs better than earlier methods such as principal component analysis and locality preserving projections.
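
A minimal sketch of the monitoring step, with two stated simplifications: the coefficient vector is obtained by ordinary least squares rather than semi-NMF updates, and the Bayesian conversion follows a form common in the process-monitoring literature, so the paper's exact priors and control limits may differ.

```python
import numpy as np

def spe_health_indicator(F, x, spe_limit, prior_fault=0.01):
    """Square prediction error plus a Bayesian health indicator in [0, 1].

    F: basis learned from healthy data; x: test feature vector.
    Least-squares projection and the exponential likelihoods below are
    our simplifying assumptions, not the paper's exact formulation.
    """
    g, *_ = np.linalg.lstsq(F, x, rcond=None)
    spe = float(((x - F @ g) ** 2).sum())
    p_normal = np.exp(-spe / spe_limit) * (1 - prior_fault)
    p_fault = np.exp(-spe_limit / max(spe, 1e-12)) * prior_fault
    return spe, p_fault / (p_normal + p_fault)

rng = np.random.default_rng(3)
F = rng.normal(size=(20, 4))            # basis from healthy training data
healthy = F @ rng.random(4)             # lies in the span of F
faulty = healthy + rng.normal(size=20)  # adds an off-subspace component
for x in (healthy, faulty):
    print(spe_health_indicator(F, x, spe_limit=1.0))
```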


2021 · Vol 2021 · pp. 1-13
Author(s):  
Jing Wu ◽  
Bin Chen ◽  
Tao Han

Nonnegative matrix factorization (NMF) is a popular method for the multivariate analysis of nonnegative data. It involves decomposing a data matrix into a product of two factor matrices with all entries restricted to being nonnegative. Orthogonal nonnegative matrix factorization (ONMF) has been introduced more recently and has demonstrated remarkable performance in clustering tasks, such as gene expression classification. In this study, we introduce two convergent methods for solving ONMF. First, we design a convergent orthogonal algorithm based on the Lagrange multiplier method. Second, we propose an approach based on the alternating direction method. Finally, we demonstrate that the two proposed approaches tend to deliver higher-quality solutions and perform better in clustering tasks compared with a state-of-the-art ONMF method.
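
For orientation, here is a sketch using the classical Ding-style multiplicative updates that push H toward row-orthogonality; the paper's two provably convergent solvers (Lagrange-multiplier and alternating-direction based) are more elaborate than this.

```python
import numpy as np

def onmf(V, r, iters=500, eps=1e-9):
    """Orthogonal NMF sketch: V ~ W @ H with H driven toward
    row-orthogonality (H @ H.T ~ diagonal) by Ding-style updates.
    Not the paper's convergent algorithms, just the standard baseline."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # orthogonality-enforcing update for H
        H *= (W.T @ V) / (W.T @ V @ H.T @ H + eps)
    return W, H

V = np.abs(np.random.default_rng(4).normal(size=(60, 40)))
W, H = onmf(V, r=4)
print(np.round(H @ H.T, 2))  # close to diagonal: rows near-orthogonal
```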


2014 · Vol 2014 · pp. 1-8
Author(s):  
Jiang Wei ◽  
Li Min ◽  
Zhang Yongqing

Convex nonnegative matrix factorization (CNMF) is a variation of nonnegative matrix factorization (NMF) in which each cluster is expressed by a linear combination of the data points and each data point is represented by a linear combination of the cluster centers. When there is nonlinearity in the manifold structure, both NMF and CNMF are incapable of characterizing the geometric structure of the data. This paper introduces a neighborhood preserving convex nonnegative matrix factorization (NPCNMF), which imposes an additional constraint on CNMF: that each data point can be represented as a linear combination of its neighbors. Our method is thus able to reap the benefits of both nonnegative data factorization and manifold structure preservation. An efficient multiplicative updating procedure is derived, and its convergence is guaranteed theoretically. The feasibility and effectiveness of NPCNMF are verified on several standard datasets with promising results.
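
The neighborhood constraint is essentially an LLE-style reconstruction: each point is expressed as an affine combination of its k nearest neighbors. Below is a sketch of computing such weights; the neighbor count and regularization constant are illustrative choices, not the paper's.

```python
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    """Weights expressing each row of X as a combination of its k nearest
    neighbors (weights sum to one). A sketch of the neighborhood-preserving
    ingredient only, not the full NPCNMF solver."""
    n = X.shape[0]
    W = np.zeros((n, n))
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]       # skip the point itself
        Z = X[nbrs] - X[i]                       # neighbors centered at x_i
        G = Z @ Z.T
        G += reg * np.trace(G) * np.eye(k)       # regularize the Gram matrix
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                 # normalize to sum to one
    return W

X = np.random.default_rng(5).normal(size=(30, 3))
W = lle_weights(X)
print(np.abs(X - W @ X).mean())  # small: each point ~ combo of neighbors
```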


Author(s):  
Jing Wang ◽  
Feng Tian ◽  
Xiao Wang ◽  
Hongchuan Yu ◽  
Chang Hong Liu ◽  
...  

Real data are usually complex and contain various components. For example, face images have expressions and genders. Each component mainly reflects one aspect of the data and provides information the others do not. Therefore, exploring the semantic information of multiple components, as well as the diversity among them, is of great benefit for understanding data comprehensively and in depth. However, this cannot be achieved by current nonnegative matrix factorization (NMF)-based methods, even though NMF has shown remarkable competitiveness in learning parts-based representations of data. To overcome this limitation, we propose a novel multi-component nonnegative matrix factorization (MCNMF). Instead of seeking only one representation of the data, MCNMF learns multiple representations simultaneously, with the help of the Hilbert-Schmidt Independence Criterion (HSIC) as a diversity term. HSIC explores the diverse information among the representations, where each representation corresponds to a component. By integrating the multiple representations, a more comprehensive representation is then established. A new iterative updating optimization scheme is derived to solve the objective function of MCNMF, along with correctness and convergence guarantees. Extensive experimental results on real-world datasets show that MCNMF not only achieves more accurate performance than the state of the art using the aggregated representation, but also interprets data from different aspects with the multiple representations, which is beyond what current NMF methods can offer.
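
The diversity term is concrete enough to sketch: with linear kernels, the empirical HSIC between two representations reduces to a trace of centered Gram matrices, and MCNMF penalizes it so that components carry non-redundant information. A minimal version:

```python
import numpy as np

def hsic(A, B):
    """Empirical HSIC between two sample representations A, B (n x d),
    using linear kernels. Low values mean the representations carry
    different (non-redundant) information."""
    n = A.shape[0]
    K, L = A @ A.T, B @ B.T
    C = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ C @ L @ C) / (n - 1) ** 2

rng = np.random.default_rng(6)
A = rng.normal(size=(100, 5))
print(hsic(A, A + 0.01 * rng.normal(size=A.shape)))  # large: redundant
print(hsic(A, rng.normal(size=(100, 5))))            # near zero: diverse
```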


2014 · Vol 2014 · pp. 1-11
Author(s):  
Shu-Zhen Lai ◽  
Hou-Biao Li ◽  
Zu-Tao Zhang

As is well known, nonnegative matrix factorization (NMF) is a dimensionality reduction method that has been widely used in image processing, text compression, signal processing, and so forth. In this paper, an algorithm for nonnegative matrix approximation is proposed. The method is mainly based on a relaxed active set and a quasi-Newton type algorithm, using symmetric rank-one updates and negative curvature directions to approximate the Hessian matrix. The method improves on some recent results. In addition, numerical experiments are presented on synthetic data, image processing, and text clustering. Compared with six other nonnegative matrix approximation methods, this method is more robust in almost all cases.
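
The symmetric rank-one (SR1) update mentioned in the abstract has a compact closed form, sketched below in isolation from the NMF solver; the denominator safeguard is the standard one, and the quadratic test problem is our own illustration.

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric rank-one (SR1) Hessian approximation update.

    B: current approximation, s: step taken, y: gradient difference.
    The update is skipped when the standard denominator safeguard fails.
    """
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B  # safeguard: keep the old approximation
    return B + np.outer(r, r) / denom

# On a quadratic f(x) = 0.5 x^T A x, SR1 recovers A after enough
# linearly independent steps (y = A s for a quadratic).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
B = np.eye(2)
rng = np.random.default_rng(7)
for _ in range(5):
    s = rng.normal(size=2)
    B = sr1_update(B, s, A @ s)
print(np.round(B, 3))  # close to A
```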


2016 · Vol 2016 · pp. 1-14
Author(s):  
Bingfeng Li ◽  
Yandong Tang ◽  
Zhi Han

As a linear dimensionality reduction method, nonnegative matrix factorization (NMF) has been widely used in many fields, such as machine learning and data mining. However, there are still two major drawbacks of NMF: (a) NMF can only perform semantic factorization in Euclidean space, and it fails to discover the intrinsic geometrical structure of a high-dimensional data distribution; (b) NMF suffers from noisy data, which are commonly encountered in real-world applications. To address these issues, in this paper we present a new robust structure preserving nonnegative matrix factorization (RSPNMF) framework. In RSPNMF, a local affinity graph and a distant repulsion graph are constructed to encode the geometrical information, and the influence of noisy data is alleviated by characterizing the data reconstruction term of NMF with the l2,1-norm instead of the l2-norm. By incorporating the local and distant structure preservation regularization terms into the robust NMF framework, our algorithm can discover a low-dimensional embedding subspace that preserves structure. RSPNMF is formulated as an optimization problem and solved by an effective iterative multiplicative update algorithm. Experimental results on clustering of several facial image datasets show significant performance improvement of RSPNMF in comparison with state-of-the-art algorithms.
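
The robustness mechanism is easy to illustrate: the l2,1-norm sums the l2-norms of the residual columns, so a corrupted sample enters the loss linearly rather than quadratically. A small demonstration:

```python
import numpy as np

def l21_norm(E):
    """l2,1-norm of a residual matrix: sum of the l2-norms of its columns.
    Used in the data term so that a few noisy samples (columns) cannot
    dominate the fit the way a squared Frobenius loss lets them."""
    return np.linalg.norm(E, axis=0).sum()

rng = np.random.default_rng(8)
E = 0.1 * rng.normal(size=(10, 100))
E[:, 0] += 50.0  # one grossly corrupted sample
print(np.linalg.norm(E) ** 2)  # Frobenius^2: dominated by the outlier
print(l21_norm(E))             # l2,1: the outlier enters only linearly
```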


Author(s):  
Xing Wang ◽  
Jun Wang ◽  
Carlotta Domeniconi ◽  
Guoxian Yu ◽  
Guoqiang Xiao ◽  
...  

Multiple clustering aims at discovering diverse ways of organizing data into clusters. Despite the progress made, it is still a challenge for users to analyze and understand the distinctive structure of each output clustering. To ease this process, we consider diverse clusterings embedded in different subspaces, and analyze the embedding subspaces to shed light on the structure of each clustering. To this end, we provide a two-stage approach called MISC (Multiple Independent Subspace Clusterings). In the first stage, MISC uses independent subspace analysis to seek multiple statistically independent (i.e., non-redundant) subspaces, and determines the number of subspaces via the minimum description length principle. In the second stage, to account for the intrinsic geometric structure of samples embedded in each subspace, MISC performs graph regularized semi-nonnegative matrix factorization to explore clusters. It additionally integrates the kernel trick into the matrix factorization to handle non-linearly separable clusters. Experimental results on synthetic datasets show that MISC can find different interesting clusterings in the sought independent subspaces, and it also outperforms other related and competitive approaches on real-world datasets.
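
As a sketch of the graph regularization used in the second stage, the snippet below builds a k-NN graph Laplacian L and evaluates the smoothness penalty tr(H^T L H) that ties nearby samples to nearby cluster indicators; the 0/1 edge weights are an assumption, as the paper may weight edges differently.

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Graph Laplacian L = D - W from a symmetrized k-NN affinity graph,
    with 0/1 edge weights (an illustrative choice)."""
    n = X.shape[0]
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d2[i])[1:k + 1]] = 1.0  # skip the point itself
    W = np.maximum(W, W.T)                       # symmetrize
    return np.diag(W.sum(1)) - W

X = np.random.default_rng(9).normal(size=(40, 6))
L = knn_laplacian(X)
H = np.random.default_rng(10).random((40, 3))    # cluster indicators
print(np.trace(H.T @ L @ H))  # smoothness penalty, always >= 0
```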

