Scalable Probabilistic Matrix Factorization with Graph-Based Priors

Jonathan Strahl; Jaakko Peltonen; Hiroshi Mamitsuka; Samuel Kaski

doi:10.1609/aaai.v34i04.6043

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6043 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5851-5858

Author(s):

Jonathan Strahl ◽

Jaakko Peltonen ◽

Hiroshi Mamitsuka ◽

Samuel Kaski

Keyword(s):

Matrix Factorization ◽

Prediction Accuracy ◽

Side Information ◽

Matrix Completion ◽

Real Data ◽

Data Matrix ◽

Laptop Computer ◽

Completion Problem ◽

Graphical Lasso ◽

The Matrix

In matrix factorization, available graph side-information may not be well suited for the matrix completion problem, having edges that disagree with the latent-feature relations learnt from the incomplete data matrix. We show that removing these contested edges improves prediction accuracy and scalability. We identify the contested edges through a highly efficient graphical lasso approximation. The identification and removal of contested edges adds no computational complexity to state-of-the-art graph-regularized matrix factorization, remaining linear with respect to the number of non-zeros. Computational load even decreases in proportion to the number of edges removed. Formulating a probabilistic generative model and using expectation maximization to extend graph-regularized alternating least squares (GRALS) guarantees convergence. Rich simulated experiments illustrate the desired properties of the resulting algorithm. In experiments on real data, we demonstrate improved prediction accuracy with fewer graph edges (empirical evidence that graph side-information is often inaccurate). A 300-thousand-dimensional graph with three million edges (Yahoo music side-information) can be analyzed in under ten minutes on a standard laptop computer, demonstrating the efficiency of our graph update.
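
The idea of removing edges that disagree with the learnt latent features can be illustrated with a small sketch. It assumes a common form of graph regularizer, a weighted sum of squared distances between the latent features of connected nodes, and it uses a plain distance threshold as a stand-in for the graphical lasso approximation described in the abstract, which is not reproduced here. The names `graph_penalty`, `prune_edges` and `threshold` are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two latent-feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def graph_penalty(features, edges):
    """Sum over edges (i, j, w) of w * ||u_i - u_j||^2.

    The cost is linear in the number of edges, so every removed edge
    also removes work from this computation.
    """
    return sum(w * squared_distance(features[i], features[j])
               for i, j, w in edges)


def prune_edges(features, edges, threshold):
    """Keep only edges whose endpoints have similar latent features.

    Assumption: a simple threshold, NOT the paper's graphical lasso step.
    """
    return [(i, j, w) for i, j, w in edges
            if squared_distance(features[i], features[j]) <= threshold]


# Nodes 0 and 1 have similar features; node 2 disagrees with node 0.
features = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]
edges = [(0, 1, 1.0), (0, 2, 1.0)]

kept = prune_edges(features, edges, threshold=1.0)
print("penalty with all edges:", graph_penalty(features, edges))
print("penalty after pruning: ", graph_penalty(features, kept))
```

In this toy example the edge between nodes 0 and 2 dominates the penalty, so dropping it stops the regularizer from pulling two dissimilar feature vectors together.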

Three-way formal concept clustering technique for matrix completion in recommender system

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-07-2019-0055 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Chemmalar Selvi G. ◽

Lakshmi Priya G.G.

Keyword(s):

Recommender Systems ◽

Error Rate ◽

Prediction Accuracy ◽

Matrix Completion ◽

Sparse Data ◽

Data Matrix ◽

Formal Concept ◽

Data Loss ◽

Content Type ◽

The Matrix

Purpose: Recommender systems are valuable to online users because the World Wide Web holds so much information that users spend more time and money finding what they need. Recommender systems suggest relevant items by applying recommendation filtering techniques to the available information. These techniques take as input a matrix that is generally very sparse and high dimensional. The sparse data matrix is therefore completed by filling in the unknown or missing entries with a matrix completion technique. One of the most popular techniques is matrix factorization (MF), which decomposes the sparse data matrix into two smaller matrices whose product fills in the missing values. However, MF fails to retain the original information when it decomposes the matrix, and its relatively high error rate reflects this loss of valuable information.

Design/methodology/approach: To alleviate the problems of data loss and data sparsity, a new matrix completion algorithm based on formal concept analysis (FCA), a mathematical model, is proposed. It aims to fill the unknown or missing entries while largely avoiding the loss of valuable information. The algorithm uses a clustering technique in which users who have commonly rated the items and users who have not commonly rated the items are captured in two classes. It fills each unknown entry with the mean value of its cluster, which completes the matrix without actually decomposing it.

Findings: The experiment was conducted on the public MovieLens data set. The results show a minimal prediction error rate, and a comparison with existing algorithms is also reported. The application of FCA in recommender systems thus yields minimal or no data loss and improves the prediction accuracy of rating scores.

Social implications: The proposed FCA-based matrix completion algorithm produces good recommendations, which will help today’s online users decide on online purchases of products.

Originality/value: This paper presents a new matrix completion technique that adopts the vital properties of FCA and applies them to recommender systems. The proposed algorithm performs well compared with other existing algorithms in terms of prediction accuracy.
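
The cluster-mean filling step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes the user clusters are already given (the FCA-based clustering itself is not shown), it marks missing ratings with `None`, and the function name `fill_with_cluster_means` is hypothetical.

```python
def fill_with_cluster_means(ratings, clusters):
    """Fill missing ratings with the mean rating of the user's cluster.

    ratings:  list of rows (one per user); None marks a missing entry.
    clusters: list of lists of user indices (assumed to be given).
    Entries stay None when no user in the cluster rated the item.
    """
    completed = [row[:] for row in ratings]  # do not modify the input
    n_items = len(ratings[0])
    for members in clusters:
        for item in range(n_items):
            observed = [ratings[u][item] for u in members
                        if ratings[u][item] is not None]
            if not observed:
                continue  # nothing to average for this item
            mean = sum(observed) / len(observed)
            for u in members:
                if completed[u][item] is None:
                    completed[u][item] = mean
    return completed


ratings = [
    [5, None, 1],
    [4, 2, None],
    [None, 5, 4],
]
clusters = [[0, 1], [2]]  # users 0 and 1 share a cluster (assumption)

completed = fill_with_cluster_means(ratings, clusters)
for row in completed:
    print(row)
```

Unlike matrix factorization, this fills entries directly from observed ratings and never decomposes the matrix, which is the property the abstract emphasizes.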

Differentiating isoform functions with collaborative matrix factorization

Bioinformatics ◽

10.1093/bioinformatics/btz847 ◽

2019 ◽

Author(s):

Keyao Wang ◽

Jun Wang ◽

Carlotta Domeniconi ◽

Xiangliang Zhang ◽

Guoxian Yu

Keyword(s):

Matrix Factorization ◽

Characteristic Curve ◽

Function Prediction ◽

Low Rank ◽

Data Matrix ◽

Supplementary Information ◽

Genomic Databases ◽

Gene Level ◽

The Matrix ◽

Level Function

Motivation: Isoforms are alternatively spliced mRNAs of genes. They can be translated into different functional proteoforms, and thus greatly increase the functional diversity of protein variants (or proteoforms). Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a deeper granularity. Since existing functional genomic databases uniformly record the annotations at the gene level, and rarely record the annotations at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction.

Results: Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm by viewing each gene as a bag and the spliced isoforms as its instances, and push functions of bags onto instances. These approaches implicitly assume the collected annotations of genes are complete and only integrate multiple RNA-seq datasets. As such, they have compromised performance. We propose a data integrative solution (called DisoFun) to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes the functional annotations of genes are aggregated from those of key isoforms. It collaboratively factorizes the isoform data matrix and gene-term data matrix (storing Gene Ontology (GO) annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms, and achieves function prediction by aggregating predictions to their originating genes. In addition, it leverages the PPI network and GO structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the AUROC (area under the receiver-operating characteristic curve) and AUPRC (area under the precision-recall curve) of existing solutions by at least 7.7% and 28.9%, respectively. We further investigate DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1, and CFLAR) with known functions at the isoform level, and observe that DisoFun can differentiate functions of their isoforms with 90.5% accuracy.

Availability: The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun.

Supplementary information: Supplementary data are available at Bioinformatics online.

On the Parameterized Complexity of Clustering Incomplete Data into Subspaces of Small Rank

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5804 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3906-3913

Author(s):

Robert Ganian ◽

Iyad Kanj ◽

Sebastian Ordyniak ◽

Stefan Szeider

Keyword(s):

Machine Learning ◽

Parameterized Complexity ◽

Linear Codes ◽

Matrix Completion ◽

Low Rank ◽

Input Constraints ◽

Binary Linear Codes ◽

Completion Problem ◽

The Matrix ◽

Complete Matrix

We consider a fundamental matrix completion problem where we are given an incomplete matrix and a set of constraints modeled as a CSP instance. The goal is to complete the matrix subject to the input constraints and in such a way that the complete matrix can be clustered into few subspaces with low rank. This problem generalizes several problems in data mining and machine learning, including the problem of completing a matrix into one with minimum rank. In addition to its ubiquitous applications in machine learning, the problem has strong connections to information theory, related to binary linear codes, and variants of it have been extensively studied from that perspective. We formalize the problem mentioned above and study its classical and parameterized complexity. We draw a detailed landscape of the complexity and parameterized complexity of the problem with respect to several natural parameters that are desirably small and with respect to several well-studied CSP fragments.

Modeling Implicit Trust in Matrix Factorization-Based Collaborative Filtering

Applied Sciences ◽

10.3390/app9204378 ◽

2019 ◽

Vol 9 (20) ◽

pp. 4378 ◽

Cited By ~ 2

Author(s):

Yuan ◽

Zahir ◽

Yang

Keyword(s):

Collaborative Filtering ◽

Matrix Factorization ◽

Prediction Accuracy ◽

State Of The Art ◽

Side Information ◽

Initial Trust ◽

The Social ◽

Implicit Trust ◽

Value Decomposition ◽

Better Than

Recommendation systems often use side information to both alleviate problems, such as the cold start problem and data sparsity, and increase prediction accuracy. One such piece of side information, which has been widely investigated in addressing such challenges, is trust. However, the difficulty in obtaining explicit relationship data has led researchers to infer trust values from other means such as the user-to-item relationship. This paper proposes a model to improve prediction accuracy by applying the trust relationship between the user and item ratings. Two approaches to implement trust into prediction are proposed: one involves the use of estimated trust, and the other involves the initial trust. The efficiency of the proposed method is verified by comparing the obtained results with four well-known methods, including the state-of-the-art deep learning-based method of neural graph collaborative filtering (NGCF). The experimental results demonstrate that the proposed method performs significantly better than the NGCF, and the three other matrix factorization methods, namely, the singular value decomposition (SVD), SVD++, and the social matrix factorization (SocialMF).

Coupled matrix–matrix and coupled tensor–matrix completion methods for predicting drug–target interactions

Briefings in Bioinformatics ◽

10.1093/bib/bbaa025 ◽

2020 ◽

Cited By ~ 1

Author(s):

Maryam Bagherian ◽

Renaid B Kim ◽

Cheng Jiang ◽

Maureen A Sartor ◽

Harm Derksen ◽

...

Keyword(s):

Matrix Factorization ◽

Drug Target ◽

Drug Repositioning ◽

Matrix Completion ◽

Area Under The Curve ◽

Drug Repurposing ◽

Computational Prediction ◽

Comprehensive Information ◽

The Matrix ◽

Benchmark Datasets

Predicting the interactions between drugs and targets plays an important role in new drug discovery and drug repurposing (also known as drug repositioning). There is a need to develop novel and efficient prediction approaches in order to avoid the costly and laborious process of determining drug–target interactions (DTIs) based on experiments alone. These computational prediction approaches should be capable of identifying the potential DTIs in a timely manner. Matrix factorization methods have been proven to be the most reliable group of methods. Here, we first propose a matrix factorization-based method termed ‘Coupled Matrix–Matrix Completion’ (CMMC). Next, in order to utilize more comprehensive information provided in different databases and incorporate multiple types of scores for drug–drug similarities and target–target relationships, we extend CMMC to ‘Coupled Tensor–Matrix Completion’ (CTMC) by considering drug–drug and target–target similarity/interaction tensors. Results: Evaluation on two benchmark datasets, DrugBank and TTD, shows that CTMC outperforms the matrix-factorization-based methods GRMF, $L_{2,1}$-GRMF, NRLMF and NRLMF$\beta$. Based on the evaluation, CMMC and CTMC outperform the above methods in terms of area under the curve, F1 score, sensitivity and specificity, with a considerably shorter run time.

State covariances and the matrix completion problem

52nd IEEE Conference on Decision and Control ◽

10.1109/cdc.2013.6760127 ◽

2013 ◽

Cited By ~ 7

Author(s):

Yongxin Chen ◽

Mihailo R. Jovanovic ◽

Tryphon T. Georgiou

Keyword(s):

Matrix Completion ◽

Completion Problem ◽

Matrix Completion Problem ◽

The Matrix

Speech Recognition: B

Probability in Electrical Engineering and Computer Science ◽

10.1007/978-3-030-49995-2_12 ◽

2021 ◽

pp. 217-242

Author(s):

Jean Walrand

Keyword(s):

Matrix Completion ◽

Gradient Projection ◽

Stochastic Gradient ◽

Projection Algorithm ◽

General Technique ◽

Gradient Projection Algorithm ◽

Completion Problem ◽

Matrix Completion Problem ◽

Online Learning Algorithms ◽

The Matrix

Online learning algorithms update their estimates as additional observations are made. Section 12.1 explains a simple example: online linear regression. The stochastic gradient projection algorithm is a general technique to update estimates based on additional observations; it is widely used in machine learning. Section 12.2 presents the theory behind that algorithm. When analyzing large amounts of data, one faces the problems of identifying the most relevant data and of using the available data efficiently. Section 12.3 explains three examples of how these questions are addressed: the LASSO algorithm, compressed sensing, and the matrix completion problem. Section 12.4 discusses deep neural networks, for which the stochastic gradient projection algorithm is easy to implement.
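
As an illustration of the first example mentioned above, the following sketch runs online linear regression with a stochastic gradient step followed by a projection. It is not taken from the book: the squared-error loss, the box-shaped feasible set, the step size and the names `online_linear_regression` and `project_box` are all assumptions chosen for the example.

```python
def project_box(weights, bound):
    """Project onto the box [-bound, bound]^n by clipping each coordinate."""
    return [max(-bound, min(bound, w)) for w in weights]


def online_linear_regression(stream, n_features, step=0.1, bound=10.0):
    """Update the weights after every observation (x, y) in the stream.

    Each update is a gradient step on the loss 0.5 * (y - w.x)^2 for the
    newest observation only, followed by a projection onto the box.
    """
    weights = [0.0] * n_features
    for x, y in stream:
        error = y - sum(w * xi for w, xi in zip(weights, x))
        weights = [w + step * error * xi for w, xi in zip(weights, x)]
        weights = project_box(weights, bound)
    return weights


# Noiseless observations of y = 1 + 2 * x; the first feature is a bias term.
inputs = [0.0, 0.5, 1.0, -0.5, -1.0]
stream = [([1.0, x], 1.0 + 2.0 * x) for x in inputs] * 200

weights = online_linear_regression(stream, n_features=2)
print(weights)  # close to [1.0, 2.0]
```

Because the estimate is refined one observation at a time, the full data set never needs to be held in memory, which is the appeal of online methods for large data.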

Constructing Confidence Sets for the Matrix Completion Problem

Springer Proceedings in Mathematics & Statistics - Nonparametric Statistics ◽

10.1007/978-3-319-96941-1_7 ◽

2018 ◽

pp. 103-118 ◽

Cited By ~ 1

Author(s):

A. Carpentier ◽

O. Klopp ◽

M. Löffler

Keyword(s):

Matrix Completion ◽

Confidence Sets ◽

Completion Problem ◽

Matrix Completion Problem ◽

The Matrix

Rank Selection in Nonnegative Matrix Factorization using Minimum Description Length

Neural Computation ◽

10.1162/neco_a_00980 ◽

2017 ◽

Vol 29 (8) ◽

pp. 2164-2176 ◽

Cited By ~ 9

Author(s):

Steven Squires ◽

Adam Prügel-Bennett ◽

Mahesan Niranjan

Keyword(s):

Matrix Factorization ◽

Nonnegative Matrix Factorization ◽

Minimum Description Length ◽

Synthetic Data ◽

Nonnegative Matrix ◽

Real Data ◽

Data Matrix ◽

Constraint Forces ◽

Data Points ◽

Linear Dimensionality Reduction

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and the second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace selects how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide an implementation of our code in a Matlab format.
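
The factorization into a nonnegative basis matrix and a nonnegative coefficient matrix can be sketched with the widely used multiplicative update rules for the squared-error objective. This is a generic illustration in plain Python, not the authors' Matlab code, and it does not include their minimum description length criterion; the subspace size `rank` is simply passed in. The function names and the small constant `eps` are assumptions.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps)
              for j in range(m)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps)
              for k in range(rank)] for i in range(n)]
    return w, h


v = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [1.0, 3.0, 3.0]]
w, h = nmf(v, rank=2)
print("squared reconstruction error:", frobenius_error(v, w, h))
```

The updates only ever multiply nonnegative numbers, so both factors stay nonnegative without an explicit constraint. Rerunning with different values of `rank` shows the trade-off that a rank selection criterion has to resolve.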

Enhancing Matrix Completion Using a Modified Second-Order Total Variation

Discrete Dynamics in Nature and Society ◽

10.1155/2018/2598160 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Wendong Wang ◽

Jianjun Wang

Keyword(s):

Total Variation ◽

Optimization Problem ◽

State Of The Art ◽

Matrix Completion ◽

Second Order ◽

Low Rank ◽

Superior Performance ◽

Completion Problem ◽

Matrix Completion Problem ◽

The Matrix

In this paper, we propose a new method to deal with the matrix completion problem. Unlike most existing matrix completion methods, which only pursue the low rank of the underlying matrices, the proposed method simultaneously optimizes their low rank and smoothness so that the two properties reinforce each other and yield better performance. In particular, the proposed method becomes very competitive with the introduction of a modified second-order total variation, even when compared with recently proposed matrix completion methods that also combine low-rank and smoothness priors. An efficient algorithm is developed to solve the induced optimization problem. Extensive experiments further confirm the superior performance of the proposed method over many state-of-the-art methods.
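
To give a sense of what a second-order smoothness term measures, the sketch below sums absolute second differences along the rows and columns of a matrix. It shows only a plain second-order variation under an assumed anisotropic definition; the modification proposed in the paper and its combination with a low-rank term are not specified in the abstract and are not reproduced. The function name `second_order_variation` is hypothetical.

```python
def second_order_variation(matrix):
    """Sum of |x[k-1] - 2*x[k] + x[k+1]| along every row and column.

    A matrix whose entries change linearly has zero penalty, while
    isolated spikes are penalized.
    """
    def along(line):
        return sum(abs(line[k - 1] - 2 * line[k] + line[k + 1])
                   for k in range(1, len(line) - 1))

    rows = sum(along(row) for row in matrix)
    cols = sum(along(col) for col in zip(*matrix))
    return rows + cols


ramp = [[0, 1, 2],
        [1, 2, 3],
        [2, 3, 4]]
spike = [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 0]]

print(second_order_variation(ramp))   # smooth linear ramp
print(second_order_variation(spike))  # isolated spike
```

A penalty of this kind favors completions that vary smoothly between neighboring entries, which complements a low-rank prior when the underlying matrix is, for example, an image.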
