Optimal Recovery of Missing Values for Non-negative Matrix Factorization


10.1101/647560 ◽

2019 ◽

Author(s):

Rebecca Chen ◽

Lav R. Varshney

Keyword(s):

Matrix Factorization ◽

Missing Values ◽

Image Data ◽

Optimal Recovery ◽

Upper Bounds ◽

Biological Data ◽

Geometric Conditions ◽

Theoretic Technique ◽

Better Than ◽

Non Negative Matrix Factorization

Abstract: We extend the approximation-theoretic technique of optimal recovery to the setting of imputing missing values in clustered data, specifically for non-negative matrix factorization (NMF), and develop an implementable algorithm. Under certain geometric conditions, we prove tight upper bounds on NMF relative error, the first bounds of this type for missing values. We also give probabilistic bounds under the same geometric assumptions. Experiments on image data and biological data show that this theoretically grounded technique performs as well as or better than other imputation techniques that account for local structure.
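For orientation, the setting in this abstract (NMF on a matrix with missing entries, judged by relative error) can be illustrated with a generic baseline. The sketch below is not the authors' optimal-recovery algorithm; it is the standard mask-weighted multiplicative update, and the names `masked_nmf`, `impute`, and `relative_error`, as well as the Frobenius-norm definition of relative error, are assumptions made for illustration.

```python
import numpy as np

def masked_nmf(V, mask, rank, n_iter=300, eps=1e-9, seed=0):
    """Fit V ~ W @ H using only entries where mask is True (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    Vm = np.where(mask, V, 0.0)  # missing entries (even NaN) contribute nothing
    for _ in range(n_iter):
        # Multiplicative updates for the objective ||mask * (V - W H)||_F^2
        H *= (W.T @ Vm) / (W.T @ (mask * (W @ H)) + eps)
        W *= (Vm @ H.T) / ((mask * (W @ H)) @ H.T + eps)
    return W, H

def impute(V, mask, W, H):
    """Keep observed entries, fill missing ones from the low-rank fit."""
    return np.where(mask, V, W @ H)

def relative_error(V_true, V_hat):
    """Frobenius-norm relative error (one common definition, assumed here)."""
    return np.linalg.norm(V_true - V_hat) / np.linalg.norm(V_true)
```

A typical use would be `W, H = masked_nmf(V, mask, rank=2)` followed by `impute(V, mask, W, H)`; other imputation schemes can then be compared through `relative_error` against held-out ground truth.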


Optimal Recovery of Missing Values for Non-negative Matrix Factorization

IEEE Open Journal of Signal Processing ◽

10.1109/ojsp.2021.3069373 ◽

2021 ◽

pp. 1-1

Author(s):

Rebecca Chen Dean ◽

Lav Varshney

Keyword(s):

Matrix Factorization ◽

Missing Values ◽

Optimal Recovery ◽

Non Negative Matrix Factorization


A zero-inflated non-negative matrix factorization for the deconvolution of mixed signals of biological data

The International Journal of Biostatistics ◽

10.1515/ijb-2020-0039 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Yixin Kong ◽

Ariangela Kozik ◽

Cindy H. Nakatsu ◽

Yava L. Jones-Hall ◽

Hyonho Chun

Keyword(s):

Matrix Factorization ◽

Factor Model ◽

R Package ◽

Biological Data ◽

Superior Performance ◽

Sequencing Data ◽

Fecal Microbiome ◽

Brain Gene Expression ◽

Cell Transcriptome ◽

Non Negative Matrix Factorization

Abstract: Latent factor models for count data are widely used to deconvolve mixed signals in biological data, as exemplified by sequencing data for transcriptome or microbiome studies. With the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates can be much improved. However, this advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
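The abstract does not spell out its update rule. As background only, the sketch below shows the classical multiplicative update for NMF under the generalized Kullback-Leibler (Poisson-type) objective, a plain baseline without any zero-inflation component. It is not the authors' rule and not the interface of their R package; `kl_nmf` and `generalized_kl` are hypothetical names.

```python
import numpy as np

def generalized_kl(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), with the convention 0 * log 0 = 0."""
    ratio = np.where(V > 0, V / (WH + eps), 1.0)
    return float(np.sum(V * np.log(ratio) - V + WH))

def kl_nmf(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Baseline count-data NMF via multiplicative updates (no zero inflation)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    for _ in range(n_iter):
        # H update: numerator W^T (V / WH), denominator column sums of W
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        # W update: numerator (V / WH) H^T, denominator row sums of H
        W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

A zero-inflated model would add a mixing component for structural zeros on top of such a count likelihood; that part is deliberately left out here because the abstract gives no details.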


A Non-negative Matrix Factorization Based Method for Identifying Essential Proteins

10.21203/rs.3.rs-537545/v1 ◽

2021 ◽

Author(s):

Zhihong Zhang ◽

Sai Hu ◽

Wei Yan ◽

Bihai Zhao ◽

Lei Wang

Keyword(s):

Protein Interaction ◽

Matrix Factorization ◽

Biological Data ◽

Protein Domain ◽

Biological Information ◽

Ppi Network ◽

Essential Proteins ◽

Protein Protein Interaction ◽

Ppi Networks ◽

Non Negative Matrix Factorization

Abstract: Background: Identification of essential proteins is very important for understanding the basic requirements to sustain a living organism. In recent years, various computational methods have been proposed to identify essential proteins based on protein-protein interaction (PPI) networks. However, there is reliable evidence that PPI data contain large numbers of false negatives and false positives. It is therefore necessary to reduce the influence of false data on the accuracy of essential protein prediction by integrating multi-source biological information with PPI networks. Results: In this paper, we propose a non-negative matrix factorization and multiple biological information based model (NDM) for identifying essential proteins. The first stage of this process constructs a weighted PPI network by combining information on protein domains, protein complexes, and the topological characteristics of the original PPI network. Then, the non-negative matrix factorization technique is used to reconstruct an optimized PPI network with sufficiently weighted edges. In the final stage, the ranking score of each protein is computed by the PageRank algorithm, with initial scores calculated from homology and subcellular localization information. To verify the effectiveness of the NDM method, we compared it with other state-of-the-art essential protein prediction methods. The comparison indicates that the NDM model performs better in predicting essential proteins. Conclusion: Employing non-negative matrix factorization and integrating multi-source biological data can effectively improve the quality of the PPI network, which in turn improves the performance of essential protein identification. This also provides a new perspective for other predictions based on protein-protein interaction networks.
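The final ranking stage described in this abstract can be pictured with a small power-iteration sketch of PageRank on a weighted network. This is an illustration under assumptions, not the NDM implementation: the abstract does not say how the initial scores enter the iteration, so here they serve as both the starting vector and the teleport distribution, and `weighted_pagerank` is a hypothetical name.

```python
import numpy as np

def weighted_pagerank(A, init_scores, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank on a non-negative weighted adjacency matrix A (n x n).

    A[i, j] is the weight of the edge from node j to node i. init_scores is a
    non-negative prior, used as start and teleport vector (an assumption).
    """
    A = np.asarray(A, dtype=float)
    p = np.asarray(init_scores, dtype=float)
    p = p / p.sum()
    col_sums = A.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; nodes without edges jump according to p
    P = np.where(col_sums > 0, A / safe, p[:, None])
    r = p.copy()
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) * p
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

On a star-shaped toy network with a uniform prior, the hub receives the highest score, which matches the intuition that highly connected proteins rank first when the prior carries no information.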


Non-Negative Matrix Factorization of Clustered Data with Missing Values

2019 IEEE Data Science Workshop (DSW) ◽

10.1109/dsw.2019.8755555 ◽

2019 ◽

Cited By ~ 1

Author(s):

Rebecca Chen ◽

Lav R. Varshney

Keyword(s):

Matrix Factorization ◽

Missing Values ◽

Clustered Data ◽

Non Negative Matrix Factorization


Low-complexity privacy preserving scheme based on compressed sensing and non-negative matrix factorization for image data

Optics and Lasers in Engineering ◽

10.1016/j.optlaseng.2020.106056 ◽

2020 ◽

Vol 129 ◽

pp. 106056

Author(s):

Jia Liang ◽

Di Xiao ◽

Mengdi Wang ◽

Min Li ◽

Ran Liu

Keyword(s):

Compressed Sensing ◽

Matrix Factorization ◽

Image Data ◽

Low Complexity ◽

Privacy Preserving ◽

Non Negative Matrix Factorization


Using non-negative matrix factorization toward finding an informative basis in spin-image data

10.1117/12.776964 ◽

2008 ◽

Author(s):

Andrew J. Patterson ◽

Nitesh N. Shah ◽

Donald E. Waagen

Keyword(s):

Matrix Factorization ◽

Image Data ◽

Spin Image ◽

Non Negative Matrix Factorization


Sampled and discretized of short-time Fourier transform and non-negative matrix factorization: the single-channel source separation case

Jurnal Teknologi dan Sistem Komputer ◽

10.14710/jtsiskom.2020.13858 ◽

2020 ◽

Vol 9 (1) ◽

pp. 41-48

Author(s):

Jans Hendry ◽

Isnan Nur Rifai ◽

Yoga Mileniandi

Keyword(s):

Fourier Transform ◽

Matrix Factorization ◽

Single Channel ◽

Source Separation ◽

Short Time Fourier Transform ◽

Time Frequency ◽

Frequency Representation ◽

Short Time ◽

Better Than ◽

Non Negative Matrix Factorization

The short-time Fourier transform (STFT) is a popular time-frequency representation in many source separation problems. In this work, a sampled and discretized version of the Discrete Gabor Transform (DGT) is proposed to replace the STFT in single-channel source separation within the non-negative matrix factorization (NMF) framework. The results show that NMF-DGT is better than NMF-STFT according to Signal-to-Interference Ratio (SIR), Signal-to-Artifact Ratio (SAR), and Signal-to-Distortion Ratio (SDR). In the supervised scheme, NMF-DGT has a SIR of 18.60 dB compared to 16.24 dB for NMF-STFT, a SAR of 13.77 dB compared to 13.69 dB, and an SDR of 12.45 dB compared to 11.16 dB. In the unsupervised scheme, NMF-DGT has a SIR of 0.40 dB compared to 0.27 dB for NMF-STFT, a SAR of -10.21 dB compared to -10.36 dB, and an SDR of -15.01 dB compared to -15.23 dB.


Microbiome Data Analysis by Symmetric Non-negative Matrix Factorization With Local and Global Regularization

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.643014 ◽

2021 ◽

Vol 8 ◽

Author(s):

Junmin Zhao ◽

Yuanyuan Ma ◽

Lifang Liu

Keyword(s):

Data Analysis ◽

Matrix Factorization ◽

Original Data ◽

Biological Data ◽

Fine Grained ◽

Laplacian Graph ◽

Fine Grained Structure ◽

Microbiome Data ◽

Global And Local ◽

Non Negative Matrix Factorization

A network is an efficient tool for organizing complicated data. The Laplacian graph has attracted increasing attention for its good properties and has been applied to many tasks, including clustering and feature selection. Recent studies indicate that although the Laplacian graph can capture the global information of data, it lacks the power to capture the fine-grained structure inherent in a network. In contrast, a Vicus matrix can make full use of local topological information from the data. Given this consideration, in this paper we simultaneously introduce Laplacian and Vicus graphs into a symmetric non-negative matrix factorization framework (LVSNMF) to seek and exploit the global and local structure patterns that are inherent in the original data. Extensive experiments are conducted on three real datasets (cancer, cell populations, and microbiome data). The experimental results show that the proposed LVSNMF algorithm significantly outperforms competing algorithms, suggesting its potential in biological data analysis.
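To make the Laplacian part of this abstract concrete, the sketch below builds an unnormalized graph Laplacian from a symmetric similarity matrix and evaluates the usual smoothness penalty tr(HᵀLH), which equals half the similarity-weighted sum of squared distances between rows of H. The objective in `regularized_symnmf_objective` is one common form of graph-regularized symmetric NMF and is an assumption, not the LVSNMF objective; the Vicus term is omitted because the abstract does not define it. All function names are hypothetical.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - S for a symmetric similarity matrix S."""
    S = np.asarray(S, dtype=float)
    return np.diag(S.sum(axis=1)) - S

def laplacian_penalty(H, L):
    """Smoothness penalty tr(H^T L H)."""
    return float(np.trace(H.T @ L @ H))

def pairwise_penalty(H, S):
    """Equivalent form: 0.5 * sum_ij S_ij * ||h_i - h_j||^2."""
    diff = H[:, None, :] - H[None, :, :]
    return 0.5 * float(np.sum(S * np.sum(diff ** 2, axis=2)))

def regularized_symnmf_objective(A, H, L, lam):
    """||A - H H^T||_F^2 + lam * tr(H^T L H), an assumed illustrative objective."""
    return float(np.linalg.norm(A - H @ H.T) ** 2 + lam * laplacian_penalty(H, L))
```

The two penalty forms agree for any symmetric S, which is why minimizing tr(HᵀLH) pulls the factor rows of strongly connected samples toward each other.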
