Microbiome Data Analysis by Symmetric Non-negative Matrix Factorization With Local and Global Regularization

A network is an efficient tool for organizing complicated data. The Laplacian graph has attracted increasing attention for its good properties and has been applied to many tasks, including clustering and feature selection. Recent studies have indicated that although the Laplacian graph can capture the global information of data, it lacks the power to capture the fine-grained structure inherent in the network. In contrast, a Vicus matrix can make full use of local topological information from the data. Given this consideration, in this paper we simultaneously introduce Laplacian and Vicus graphs into a symmetric non-negative matrix factorization framework (LVSNMF) to seek and exploit the global and local structure patterns inherent in the original data. Extensive experiments are conducted on three real datasets (cancer, cell populations, and microbiome data). The experimental results show that the proposed LVSNMF algorithm significantly outperforms other competing algorithms, suggesting its potential in biological data analysis.
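To make the symmetric factorization idea concrete, here is a minimal Python sketch of plain symmetric NMF on a toy similarity matrix. It is not the LVSNMF algorithm: the abstract does not give its objective or update rule, so the sketch uses a commonly cited damped multiplicative update for the unregularized problem, and only shows how a Laplacian smoothness term of the kind the abstract mentions could be evaluated. The function names `graph_laplacian` and `snmf`, the toy matrix, and all parameter values are assumptions for illustration; the Vicus matrix is not modelled.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized graph Laplacian L = D - A of a symmetric similarity matrix."""
    return np.diag(A.sum(axis=1)) - A

def snmf(A, rank, n_iter=300, seed=0, eps=1e-10):
    """Plain symmetric NMF: find H >= 0 with A ~ H @ H.T (no regularizers)."""
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], rank))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps
        # Damped multiplicative update; every factor is non-negative,
        # so H stays non-negative throughout.
        H *= 0.5 + 0.5 * numer / denom
    return H

# Toy similarity matrix (assumption): two blocks of three samples each.
A = np.kron(np.eye(2), np.ones((3, 3)))
H = snmf(A, rank=2)
L = graph_laplacian(A)

residual = np.linalg.norm(A - H @ H.T)   # reconstruction error
smoothness = np.trace(H.T @ L @ H)       # term a Laplacian regularizer would penalize
```

A regularized variant would add weighted smoothness terms to the reconstruction error and derive a matching update; that derivation is left to the paper itself.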

A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data

The International Journal of Biostatistics ◽ 10.1515/ijb-2020-0039 ◽ 2021 ◽ Vol 0 (0)

Author(s): Yixin Kong ◽ Ariangela Kozik ◽ Cindy H. Nakatsu ◽ Yava L. Jones-Hall ◽ Hyonho Chun

Keyword(s): Matrix Factorization ◽ Factor Model ◽ R Package ◽ Biological Data ◽ Superior Performance ◽ Sequencing Data ◽ Fecal Microbiome ◽ Brain Gene Expression ◽ Cell Transcriptome ◽ Non Negative Matrix Factorization

Abstract A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
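For readers unfamiliar with multiplicative updates for count data, the following Python sketch shows the standard Lee-Seung updates for NMF under the generalized Kullback-Leibler (Poisson-type) loss. It is a baseline only and does not include the zero-inflation component the abstract proposes, nor does it reproduce the iNMF package; the function names `kl_divergence` and `kl_nmf`, the simulated data, and the parameter values are assumptions for illustration.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-10):
    """Generalized KL divergence D(V || WH), the Poisson NMF loss up to constants."""
    ratio = np.where(V > 0, V / (WH + eps), 1.0)  # treats 0 * log(0) as 0
    return float(np.sum(V * np.log(ratio) - V + WH))

def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Standard multiplicative updates for V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1) + eps)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Simulated counts (assumption): Poisson draws around a rank-2 signal.
rng = np.random.default_rng(1)
V = rng.poisson(rng.random((20, 2)) @ (10 * rng.random((2, 15))))
W, H = kl_nmf(V, rank=2)
loss = kl_divergence(V, W @ H)
```

A zero-inflated model would additionally estimate, for each zero entry, how likely it is to be a structural zero rather than a Poisson zero, which changes both the loss and the updates.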

Application of non-negative matrix factorization to multispectral FLIM data analysis

Biomedical Optics Express ◽ 10.1364/boe.3.002244 ◽ 2012 ◽ Vol 3 (9) ◽ pp. 2244 ◽ Cited By ~ 15

Author(s): Paritosh Pande ◽ Brian E. Applegate ◽ Javier A. Jo

Keyword(s): Data Analysis ◽ Matrix Factorization ◽ Non Negative Matrix Factorization

A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

10.21203/rs.3.rs-537545/v1 ◽ 2021

Author(s): Zhihong Zhang ◽ Sai Hu ◽ Wei Yan ◽ Bihai Zhao ◽ Lei Wang

Keyword(s): Protein Interaction ◽ Matrix Factorization ◽ Biological Data ◽ Protein Domain ◽ Biological Information ◽ Ppi Network ◽ Essential Proteins ◽ Protein Protein Interaction ◽ Ppi Networks ◽ Non Negative Matrix Factorization

Abstract Background Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain a large number of false negatives and false positives. Therefore, it is necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results In this paper, we propose a model based on non-negative matrix factorization and multiple sources of biological information (NDM) for identifying essential proteins. The first stage of this process was to construct a weighted PPI network by combining information on protein domains, protein complexes, and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique was used to reconstruct an optimized PPI network with sufficiently weighted edges. In the final stage, the ranking score of each protein was computed by the PageRank algorithm, in which the initial scores were calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared NDM with other state-of-the-art essential protein prediction methods. The comparison indicated that our NDM model performs better in predicting essential proteins. Conclusion Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which led to improved performance in essential protein identification. This also provides a new perspective for other predictions based on protein-protein interaction networks.
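The final ranking stage described above can be illustrated with a small Python sketch of PageRank on a weighted network with a non-uniform prior. This is a generic power iteration, not the NDM implementation: the function name `weighted_pagerank`, the damping value, the toy star network, and the uniform prior are all assumptions, and in the paper the prior would come from homology and subcellular localization information that is not available here.

```python
import numpy as np

def weighted_pagerank(W, prior, damping=0.85, n_iter=100):
    """Power iteration for PageRank on a weighted network with a prior score vector."""
    W = np.asarray(W, dtype=float)
    # Column-normalize so each node distributes its score along its edges.
    # Assumes every node has at least one edge (no zero columns).
    M = W / W.sum(axis=0)
    p = np.asarray(prior, dtype=float)
    p = p / p.sum()
    r = p.copy()
    for _ in range(n_iter):
        r = damping * (M @ r) + (1.0 - damping) * p
    return r

# Toy weighted network (assumption): node 0 is a hub linked to nodes 1, 2, 3.
W = np.array([
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])
prior = np.ones(4)  # placeholder for biologically derived initial scores
scores = weighted_pagerank(W, prior)
ranking = np.argsort(-scores)  # node indices from highest to lowest score
```

Changing `prior` shifts scores toward the nodes it favours, which is how external biological evidence can influence the ranking.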

MHSNMF: multi-view hessian regularization based symmetric nonnegative matrix factorization for microbiome data analysis

BMC Bioinformatics ◽ 10.1186/s12859-020-03555-w ◽ 2020 ◽ Vol 21 (S6)

Author(s): Yuanyuan Ma ◽ Junmin Zhao ◽ Yingjun Ma

Keyword(s): Data Analysis ◽ Matrix Factorization ◽ Nonnegative Matrix Factorization ◽ Rapid Development ◽ Nonnegative Matrix ◽ Metabolomics Data ◽ Normalized Mutual Information ◽ Microbiome Data ◽ Symmetric Nonnegative Matrix Factorization ◽ Microbiome Data Analysis

Abstract Background With the rapid development of high-throughput techniques, large amounts of heterogeneous omics data (e.g., genomics, proteomics and metabolomics data) have accumulated. Integrating information from multiple sources or views to gain profound insight into the complicated relations among micro-organisms, nutrients and host environment is challenging. In this paper we propose a multi-view Hessian regularization based symmetric nonnegative matrix factorization algorithm (MHSNMF) for clustering heterogeneous microbiome data. Compared with many existing approaches, the advantages of MHSNMF are: (1) MHSNMF combines multiple Hessian regularizations to leverage the high-order information from the same cohort of instances with multiple representations; (2) MHSNMF utilizes the advantages of SNMF and naturally handles the complex relationships among microbiome samples; (3) using the consensus matrix obtained by MHSNMF, we also design a novel approach to predict the classification of new microbiome samples. Results We conduct extensive experiments on two real-world datasets (Three-source dataset and Human Microbiome Plan dataset); the experimental results show that the proposed MHSNMF algorithm outperforms other baseline and state-of-the-art methods. Compared with other methods, MHSNMF achieves the best performance (accuracy: 95.28%, normalized mutual information: 91.79%) on microbiome data, which suggests the potential application of MHSNMF in microbiome data analysis. Conclusions Results show that the proposed MHSNMF algorithm can effectively combine the phylogenetic, transporter, and metabolic profiles into a unified paradigm to analyze the relationships among different microbiome samples. Furthermore, the proposed prediction method based on MHSNMF has been shown to be effective in judging the types of new microbiome samples.

Robust hypergraph regularized non-negative matrix factorization for sample clustering and feature selection in multi-view gene expression data

Human Genomics ◽ 10.1186/s40246-019-0222-6 ◽ 2019 ◽ Vol 13 (S1) ◽ Cited By ~ 1

Author(s): Na Yu ◽ Ying-Lian Gao ◽ Jin-Xing Liu ◽ Juan Wang ◽ Junliang Shang

Keyword(s): Feature Selection ◽ Matrix Factorization ◽ Gene Selection ◽ Matrix Decomposition ◽ Original Data ◽ Data Representation ◽ Abnormal Expression ◽ Sample Points ◽ Laplacian Regularization ◽ Non Negative Matrix Factorization

Abstract Background As one of the most popular data representation methods, non-negative matrix decomposition (NMF) has received wide attention in clustering and feature selection tasks. However, most previously proposed NMF-based methods do not adequately explore the hidden geometrical structure in the data. At the same time, noise and outliers are inevitably present in the data. Results To alleviate these problems, we present a novel NMF framework named robust hypergraph regularized non-negative matrix factorization (RHNMF). In particular, hypergraph Laplacian regularization is imposed to capture the geometric information of the original data. Unlike graph Laplacian regularization, which captures the relationship between pairs of sample points, it captures the high-order relationship among more than two sample points. Moreover, the robustness of RHNMF is enhanced by using the L2,1-norm constraint when estimating the residual, because the L2,1-norm is insensitive to noise and outliers. Conclusions Clustering and common abnormal expression gene (com-abnormal expression gene) selection are conducted to test the validity of the RHNMF model. Extensive experimental results on multi-view datasets reveal that our proposed model outperforms other state-of-the-art methods.
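The robustness claim about the L2,1-norm can be illustrated with a short Python sketch. It assumes the common convention that samples are stored as columns, so the norm is the sum of the Euclidean norms of the residual columns; some papers define it over rows instead, and the abstract does not say which is used. The function name `l21_norm` and the toy residual matrix are assumptions for illustration.

```python
import numpy as np

def l21_norm(E):
    """L2,1-norm: sum of the Euclidean norms of the columns of E."""
    return float(np.linalg.norm(E, axis=0).sum())

# Toy residual (assumption): two well-fitted samples and one outlier column.
E = np.array([
    [0.1, -0.1, 3.0],
    [0.1,  0.1, 4.0],
])

l21 = l21_norm(E)
frob_sq = float(np.sum(E ** 2))  # squared Frobenius loss used by standard NMF

# Share of each loss contributed by the outlier column alone.
outlier_share_l21 = np.linalg.norm(E[:, 2]) / l21
outlier_share_frob = np.sum(E[:, 2] ** 2) / frob_sq
```

Because column errors enter the L2,1-norm linearly rather than squared, the outlier accounts for a smaller share of the total loss than it does under the squared Frobenius norm, which is the sense in which the norm is less sensitive to outliers.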

Group Sparse Joint Non-negative Matrix Factorization on Orthogonal Subspace for Multi-modal Imaging Genetics Data Analysis

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽ 10.1109/tcbb.2020.2999397 ◽ 2020 ◽ pp. 1-1

Author(s): Peng Peng ◽ Yipu Zhang ◽ Yongfeng Ju ◽ Kaiming Wang ◽ Gang Li ◽ ...

Keyword(s): Data Analysis ◽ Matrix Factorization ◽ Imaging Genetics ◽ Orthogonal Subspace ◽ Non Negative Matrix Factorization

Part-Based Data Analysis with Masked Non-negative Matrix Factorization

Computational Science and Its Applications – ICCSA 2014 - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-09153-2_33 ◽ 2014 ◽ pp. 440-454 ◽ Cited By ~ 1

Author(s): Gabriella Casalino ◽ Nicoletta Del Buono ◽ Corrado Mencar

Keyword(s): Data Analysis ◽ Matrix Factorization ◽ Non Negative Matrix Factorization

THz spectral data analysis and components unmixing based on non-negative matrix factorization methods

Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy ◽ 10.1016/j.saa.2017.01.009 ◽ 2017 ◽ Vol 177 ◽ pp. 49-57 ◽ Cited By ~ 1

Author(s): Yehao Ma ◽ Xian Li ◽ Pingjie Huang ◽ Dibo Hou ◽ Qiang Wang ◽ ...

Keyword(s): Data Analysis ◽ Spectral Data ◽ Matrix Factorization ◽ Factorization Methods ◽ Spectral Data Analysis ◽ Non Negative Matrix Factorization

Optimal Recovery of Missing Values for Non-negative Matrix Factorization

10.1101/647560 ◽ 2019

Author(s): Rebecca Chen ◽ Lav R. Varshney

Keyword(s): Matrix Factorization ◽ Missing Values ◽ Image Data ◽ Optimal Recovery ◽ Upper Bounds ◽ Biological Data ◽ Geometric Conditions ◽ Theoretic Technique ◽ Better Than ◽ Non Negative Matrix Factorization

Abstract We extend the approximation-theoretic technique of optimal recovery to the setting of imputing missing values in clustered data, specifically for non-negative matrix factorization (NMF), and develop an implementable algorithm. Under certain geometric conditions, we prove tight upper bounds on NMF relative error, which are the first bounds of this type for missing values. We also give probabilistic bounds for the same geometric assumptions. Experiments on image data and biological data show that this theoretically grounded technique performs as well as or better than other imputation techniques that account for local structure.