Low-Rank Matrix Factorization and Co-clustering Algorithms for Analyzing Large Data Sets

Low-rank matrix factorization (LRMF) has attracted much attention due to its wide range of applications in computer vision, such as image inpainting and video denoising. Most existing methods assume that the loss between an observed measurement matrix and its bilinear factorization follows a symmetric distribution, such as the Gaussian or gamma families. However, in real-world situations this assumption is often too idealized, because pictures taken under various illuminations and angles may suffer from multi-peaked, asymmetric, and irregular noise. To address these problems, this paper assumes that the loss follows a mixture of asymmetric Laplace distributions and proposes a robust Asymmetric Laplace Adaptive Matrix Factorization (ALAMF) model under the Bayesian matrix factorization framework. The Laplace assumption makes the model more robust, and the asymmetry makes it more flexible and adaptable to real-world noise. A variational method is then devised for model inference. We compare ALAMF with other state-of-the-art matrix factorization methods on both synthetic and real-world data sets. The experimental results demonstrate the effectiveness of the proposed approach.
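
To make the noise model in this abstract concrete, here is a minimal sketch of a mixture-of-asymmetric-Laplace negative log-likelihood on residuals. It assumes the common "check loss" parametrization with scale `lam` and skew `kappa` in (0, 1); the abstract does not state the paper's exact parametrization, and the function names are hypothetical. It is not the ALAMF inference procedure.

```python
import math

def check_loss(e, kappa):
    # Tilted absolute loss: positive residuals are weighted by kappa,
    # negative residuals by (1 - kappa). kappa = 0.5 is the symmetric case.
    return e * (kappa - (1.0 if e < 0 else 0.0))

def al_logpdf(e, lam, kappa):
    # Log-density of an asymmetric Laplace distribution centred at 0
    # (assumed parametrization: lam > 0 is the scale, 0 < kappa < 1 the skew).
    return math.log(kappa * (1.0 - kappa) / lam) - check_loss(e, kappa) / lam

def mixture_nll(residuals, weights, lams, kappas):
    # Negative log-likelihood of the residuals under a mixture of asymmetric
    # Laplace components; weights must be positive and sum to 1.
    total = 0.0
    for e in residuals:
        logs = [math.log(w) + al_logpdf(e, l, k)
                for w, l, k in zip(weights, lams, kappas)]
        m = max(logs)  # log-sum-exp for numerical stability
        total -= m + math.log(sum(math.exp(v - m) for v in logs))
    return total
```

With `kappa = 0.25`, a residual of +2 costs 0.5 while a residual of -2 costs 1.5, so a single component can already penalize the two tails differently; a mixture of such components can additionally represent multi-peaked noise.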

Robust MST-Based Clustering Algorithm

Neural Computation ◽

10.1162/neco_a_01081 ◽

2018 ◽

Vol 30 (6) ◽

pp. 1624-1646 ◽

Cited By ~ 1

Author(s):

Qidong Liu ◽

Ruisheng Zhang ◽

Zhili Zhao ◽

Zhenghai Wang ◽

Mengyao Jiao ◽

...

Keyword(s):

Clustering Algorithm ◽

Minimum Spanning Tree ◽

Clustering Algorithms ◽

Low Rank ◽

Data Sets ◽

Real World Data ◽

Data Set ◽

Rank Matrix ◽

Data Points ◽

Low Rank Matrix

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. This grouping principle yields superior clustering results when mining arbitrarily shaped clusters in data. However, it is not robust against noise and outliers. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters are regarded as two parts of one cluster. To solve these problems, we propose a robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which each element denotes a supernode formed by combining a set of nodes. Then a greedy method is presented to partition those supernodes by working on the low-rank matrix. Instead of removing the longest edges from the MST, our algorithm groups the data set based on the minimax similarity. Finally, all data points are assigned through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms the clustering algorithms it is compared against.
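
As an illustration of the minimax principle described in this abstract, the sketch below computes pairwise minimax path distances: the smallest possible "largest hop" over all paths between two points, which equals the largest edge on their MST path. It uses a Kruskal-style merge over Euclidean distances; the function name is hypothetical, and this is only the generic minimax distance, not the authors' coarsening or greedy partitioning steps.

```python
import math
from itertools import combinations

def minimax_distances(points):
    # Process edges in ascending order (Kruskal). When an edge of weight w
    # first joins two components, w is the minimax distance between every
    # pair of points drawn from the two components.
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    root = list(range(n))                  # component label of each point
    members = {i: [i] for i in range(n)}   # component label -> its points
    mm = [[0.0] * n for _ in range(n)]
    for w, i, j in edges:
        ri, rj = root[i], root[j]
        if ri == rj:
            continue  # edge lies inside one component, not an MST edge
        for a in members[ri]:
            for b in members[rj]:
                mm[a][b] = mm[b][a] = w
        if len(members[ri]) < len(members[rj]):
            ri, rj = rj, ri  # merge the smaller component into the larger
        for b in members[rj]:
            root[b] = ri
        members[ri].extend(members.pop(rj))
    return mm

pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
mm = minimax_distances(pts)
```

In this example the points at x = 0 and x = 2 are 2 apart in Euclidean distance but only 1 apart in minimax distance, because the point at x = 1 mediates between them. The isolated point at x = 10 is at minimax distance 8 from every other point, which is the outlier sensitivity the abstract mentions.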

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

10.21437/interspeech.2018-1417 ◽

2018 ◽

Cited By ~ 39

Author(s):

Daniel Povey ◽

Gaofeng Cheng ◽

Yiming Wang ◽

Ke Li ◽

Hainan Xu ◽

...

Keyword(s):

Neural Networks ◽

Matrix Factorization ◽

Deep Neural Networks ◽

Low Rank ◽

Rank Matrix ◽

Low Rank Matrix

Single-cell data clustering based on sparse optimization and low-rank matrix factorization

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab098 ◽

2021 ◽

Author(s):

Yinlei Hu ◽

Bin Li ◽

Falai Chen ◽

Kun Qu

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Matrix Factorization ◽

Data Clustering ◽

Cell Types ◽

Low Rank ◽

Sequencing Data ◽

Rank Matrix ◽

Single Cell RNA Sequencing ◽

Low Rank Matrix

Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.

Structural Identifiability in Low-Rank Matrix Factorization

Algorithmica ◽

10.1007/s00453-009-9331-2 ◽

2009 ◽

Vol 56 (3) ◽

pp. 313-332 ◽

Cited By ~ 4

Author(s):

Epameinondas Fritzilas ◽

Martin Milanič ◽

Sven Rahmann ◽

Yasmin A. Rios-Solis

Keyword(s):

Matrix Factorization ◽

Low Rank ◽

Structural Identifiability ◽

Rank Matrix ◽

Low Rank Matrix

Efficient global optimization for exponential family PCA and low-rank matrix factorization

2008 46th Annual Allerton Conference on Communication, Control, and Computing ◽

10.1109/allerton.2008.4797683 ◽

2008 ◽

Cited By ~ 2

Author(s):

Yuhong Guo ◽

Dale Schuurmans

Keyword(s):

Global Optimization ◽

Matrix Factorization ◽

Exponential Family ◽

Low Rank ◽

Efficient Global Optimization ◽

Rank Matrix ◽

Low Rank Matrix

A non-convex optimization framework for large-scale low-rank matrix factorization

10.36227/techrxiv.12199026 ◽

2020 ◽

Author(s):

Sajad Fathi Hafshejani ◽

Saeed Vahidian ◽

Zahra Moaberfard ◽

Reza Alikhani ◽

Bill Lin

Keyword(s):

Matrix Factorization ◽

Large Scale ◽

Significant Loss ◽

Low Rank ◽

Function Evaluation ◽

Dimensional Manifold ◽

Step Size ◽

Rank Matrix ◽

Real World Datasets ◽

Low Rank Matrix

Low-rank matrix factorization problems such as nonnegative matrix factorization (NMF) can be categorized as clustering or dimension reduction techniques. The latter denotes techniques designed to find representations of a high-dimensional dataset in a lower-dimensional manifold without a significant loss of information. If such a representation exists, its features ought to capture the most relevant features of the dataset. Many linear dimensionality reduction techniques can be formulated as a matrix factorization. In this paper, we combine the conjugate gradient (CG) method with the Barzilai and Borwein (BB) gradient method and propose a BB scaling CG method for NMF problems. The new method does not require computing or storing matrices associated with the Hessian of the objective function. Moreover, adopting a suitable BB step size along with a proper nonmonotone strategy, determined by the convex parameter $\eta_k$, results in a new algorithm that can significantly improve CPU time, efficiency, and the number of function evaluations. A convergence result is established, and numerical comparisons on both synthetic and real-world datasets show that the proposed method compares favorably in efficiency with existing methods.
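
The Barzilai-Borwein step size mentioned in this abstract can be sketched on a small nonnegative least-squares problem, which is the kind of subproblem that arises in NMF when one factor is held fixed. This is a plain projected gradient iteration with the BB step $\alpha_k = s^\top s / s^\top y$; it is not the authors' BB-scaled CG method, it includes no nonmonotone safeguard or line search, and the function names and starting values are hypothetical.

```python
def matvec(A, x):
    # Dense matrix-vector product on lists of rows.
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def grad(A, b, x):
    # Gradient of 0.5 * ||A x - b||^2, namely A^T (A x - b).
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return matvec(list(zip(*A)), r)

def projected_bb(A, b, x, alpha=0.1, iters=50):
    # Projected gradient descent on x >= 0 with a Barzilai-Borwein step.
    # Only gradients are used; no Hessian is formed or stored.
    g = grad(A, b, x)
    for _ in range(iters):
        x_new = [max(0.0, xi - alpha * gi) for xi, gi in zip(x, g)]
        g_new = grad(A, b, x_new)
        s = [a - c for a, c in zip(x_new, x)]   # change in iterate
        y = [a - c for a, c in zip(g_new, g)]   # change in gradient
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-18:                          # keep the old step if s^T y vanishes
            alpha = sum(si * si for si in s) / sy
        x, g = x_new, g_new
    return x

A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, -1.0]
x_star = projected_bb(A, b, [0.5, 0.5])
```

For this toy problem the unconstrained minimizer would have a negative second coordinate, so the constrained solution sits on the boundary with the second coordinate clamped to zero. In general, an unsafeguarded BB iteration is nonmonotone and may need a globalization strategy to converge.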

DLSLA 3-D SAR Imaging via Sparse Recovery Through Combination of Nuclear Norm and Low-Rank Matrix Factorization

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2021.3100715 ◽

2021 ◽

pp. 1-13

Author(s):

Tong Gu ◽

Guisheng Liao ◽

Yachao Li ◽

Yifan Guo ◽

Yongjun Liu

Keyword(s):

Matrix Factorization ◽

Sparse Recovery ◽

Nuclear Norm ◽

Low Rank ◽

Rank Matrix ◽

SAR Imaging ◽

Low Rank Matrix

Uncertainty-Based Clustering Algorithms for Large Data Sets

Modern Technologies for Big Data Classification and Clustering - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2805-0.ch001 ◽

2018 ◽

pp. 1-33 ◽

Cited By ~ 1

Author(s):

B. K. Tripathy ◽

Hari Seetha ◽

M. N. Murty

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithms ◽

Large Data ◽

Large Data Sets ◽

Mining Machine ◽

Data Sets ◽

Fuzzy C Means ◽

Intuitionistic Fuzzy ◽

New Algorithms

Data clustering plays a very important role in data mining, machine learning, and image processing. As modern databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed. These include fuzzy c-means, rough c-means, and intuitionistic fuzzy c-means, as well as algorithms based on hybrid models, such as rough fuzzy c-means and rough intuitionistic fuzzy c-means. There are also many variants that improve these algorithms in different directions, such as their kernelised, possibilistic, and possibilistic kernelised versions. However, none of the above algorithms is effective on big data, for various reasons. Researchers have therefore been trying for the past few years to improve these algorithms so that they can be applied to cluster big data. Such algorithms are relatively few in comparison with those for datasets of reasonable size. Our aim in this chapter is to present the uncertainty-based clustering algorithms developed so far and to propose a few new algorithms that can be developed further.
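
For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of standard fuzzy c-means on one-dimensional data, alternating the usual membership and center updates with fuzzifier `m`. It is the textbook algorithm only, not any of the big-data variants the chapter surveys; the function names and the sample data are hypothetical.

```python
def memberships(xs, centers, m=2.0):
    # Membership of each point in each cluster:
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)); each row sums to 1.
    p = 2.0 / (m - 1.0)
    U = []
    for x in xs:
        d = [abs(x - c) for c in centers]
        if 0.0 in d:
            # Point coincides with one or more centers: split full
            # membership among those centers.
            hits = [1.0 if di == 0.0 else 0.0 for di in d]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((di / dk) ** p for dk in d) for di in d]
        U.append(row)
    return U

def fuzzy_c_means(xs, centers, m=2.0, iters=100):
    # Alternate membership and center updates for a fixed number of rounds.
    for _ in range(iters):
        U = memberships(xs, centers, m)
        centers = [
            sum(u[i] ** m * x for u, x in zip(U, xs)) / sum(u[i] ** m for u in U)
            for i in range(len(centers))
        ]
    return centers, memberships(xs, centers, m)

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, U = fuzzy_c_means(data, [0.0, 5.0])
```

Each pass touches every point for every cluster, and the membership matrix `U` holds one row per point, which gives a sense of why such methods become costly on very large data sets.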

Multi-view low-rank matrix factorization using multiple manifold regularization

Neurocomputing ◽

10.1016/j.neucom.2019.01.004 ◽

2019 ◽

Vol 335 ◽

pp. 143-152 ◽

Cited By ~ 4

Author(s):

Shengxiang Gao ◽

Zhengtao Yu ◽

Taisong Jin ◽

Ming Yin

Keyword(s):

Matrix Factorization ◽

Low Rank ◽

Manifold Regularization ◽

Rank Matrix ◽

Low Rank Matrix
