Low-Rank Matrix Factorization and Co-clustering Algorithms for Analyzing Large Data Sets

Author(s):  
Archana Donavalli ◽  
Manjeet Rege ◽  
Xumin Liu ◽  
Kourosh Jafari-Khouzani
Author(s):  
Xin Guo ◽  
Boyuan Pan ◽  
Deng Cai ◽  
Xiaofei He

Low rank matrix factorizations(LRMF) have attracted much attention due to its wide range of applications in computer vision, such as image impainting and video denoising. Most of the existing methods assume that the loss between an observed measurement matrix and its bilinear factorization follows symmetric distribution, like gaussian or gamma families. However, in real-world situations, this assumption is often found too idealized, because pictures under various illumination and angles may suffer from multi-peaks, asymmetric and irregular noises. To address these problems, this paper assumes that the loss follows a mixture of Asymmetric Laplace distributions and proposes robust Asymmetric Laplace Adaptive Matrix Factorization model(ALAMF) under bayesian matrix factorization framework. The assumption of Laplace distribution makes our model more robust and the asymmetric attribute makes our model more flexible and adaptable to real-world noise. A variational method is then devised for model inference. We compare ALAMF with other state-of-the-art matrix factorization methods both on data sets ranging from synthetic and real-world application. The experimental results demonstrate the effectiveness of our proposed approach.


2018 ◽  
Vol 30 (6) ◽  
pp. 1624-1646 ◽  
Author(s):  
Qidong Liu ◽  
Ruisheng Zhang ◽  
Zhili Zhao ◽  
Zhenghai Wang ◽  
Mengyao Jiao ◽  
...  

Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.


Author(s):  
Daniel Povey ◽  
Gaofeng Cheng ◽  
Yiming Wang ◽  
Ke Li ◽  
Hainan Xu ◽  
...  

Author(s):  
Yinlei Hu ◽  
Bin Li ◽  
Falai Chen ◽  
Kun Qu

Abstract Unsupervised clustering is a fundamental step of single-cell RNA sequencing data analysis. This issue has inspired several clustering methods to classify cells in single-cell RNA sequencing data. However, accurate prediction of the cell clusters remains a substantial challenge. In this study, we propose a new algorithm for single-cell RNA sequencing data clustering based on Sparse Optimization and low-rank matrix factorization (scSO). We applied our scSO algorithm to analyze multiple benchmark datasets and showed that the cluster number predicted by scSO was close to the number of reference cell types and that most cells were correctly classified. Our scSO algorithm is available at https://github.com/QuKunLab/scSO. Overall, this study demonstrates a potent cell clustering approach that can help researchers distinguish cell types in single-cell RNA sequencing data.


Algorithmica ◽  
2009 ◽  
Vol 56 (3) ◽  
pp. 313-332 ◽  
Author(s):  
Epameinondas Fritzilas ◽  
Martin Milanič ◽  
Sven Rahmann ◽  
Yasmin A. Rios-Solis

2020 ◽  
Author(s):  
Sajad Fathi Hafshejani ◽  
Saeed Vahidian ◽  
Zahra Moaberfard ◽  
Reza Alikhani ◽  
Bill Lin

Low-rank matrix factorization problems such as non negative matrix factorization (NMF) can be categorized as a clustering or dimension reduction technique. The latter denotes techniques designed to find representations of some high dimensional dataset in a lower dimensional manifold without a significant loss of information. If such a representation exists, the features ought to contain the most relevant features of the dataset. Many linear dimensionality reduction techniques can be formulated as a matrix factorization. In this paper, we combine the conjugate gradient (CG) method with the Barzilai and Borwein (BB) gradient method, and propose a BB scaling CG method for NMF problems. The new method does not require to compute and store matrices associated with Hessian of the objective functions. Moreover, adopting a suitable BB step size along with a proper nonmonotone strategy which comes by the size convex parameter $\eta_k$, results in a new algorithm that can significantly improve the CPU time, efficiency, the number of function evaluation. Convergence result is established and numerical comparisons of methods on both synthetic and real-world datasets show that the proposed method is efficient in comparison with existing methods and demonstrate the superiority of our algorithms.


Author(s):  
B. K. Tripathy ◽  
Hari Seetha ◽  
M. N. Murty

Data clustering plays a very important role in Data mining, machine learning and Image processing areas. As modern day databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed in this direction. These algorithms are fuzzy c-means, rough c-means, intuitionistic fuzzy c-means and the means like rough fuzzy c-means, rough intuitionistic fuzzy c-means which base on hybrid models. Also, we find many variants of these algorithms which improve them in different directions like their Kernelised versions, possibilistic versions, and possibilistic Kernelised versions. However, all the above algorithms are not effective on big data for various reasons. So, researchers have been trying for the past few years to improve these algorithms in order they can be applied to cluster big data. The algorithms are relatively few in comparison to those for datasets of reasonable size. It is our aim in this chapter to present the uncertainty based clustering algorithms developed so far and proposes a few new algorithms which can be developed further.


2019 ◽  
Vol 335 ◽  
pp. 143-152 ◽  
Author(s):  
Shengxiang Gao ◽  
Zhengtao Yu ◽  
Taisong Jin ◽  
Ming Yin

Sign in / Sign up

Export Citation Format

Share Document