Scalable Probabilistic Matrix Factorization with Graph-Based Priors

2020 ◽  
Vol 34 (04) ◽  
pp. 5851-5858
Author(s):  
Jonathan Strahl ◽  
Jaakko Peltonen ◽  
Hiroshi Mamitsuka ◽  
Samuel Kaski

In matrix factorization, available graph side-information may not be well suited for the matrix completion problem, having edges that disagree with the latent-feature relations learnt from the incomplete data matrix. We show that removing these contested edges improves prediction accuracy and scalability. We identify the contested edges through a highly efficient graphical lasso approximation. The identification and removal of contested edges adds no computational complexity to state-of-the-art graph-regularized matrix factorization, remaining linear with respect to the number of non-zeros. The computational load even decreases in proportion to the number of edges removed. Formulating a probabilistic generative model and using expectation maximization to extend graph-regularized alternating least squares (GRALS) guarantees convergence. Rich simulated experiments illustrate the desired properties of the resulting algorithm. In experiments on real data, we demonstrate improved prediction accuracy with fewer graph edges (empirical evidence that graph side-information is often inaccurate). A 300,000-dimensional graph with three million edges (Yahoo music side-information) can be analyzed in under ten minutes on a standard laptop computer, demonstrating the efficiency of our graph update.
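
As a rough illustration of the graph regularization this abstract refers to, the sketch below computes the standard Laplacian smoothness penalty (the sum over edges of w_ij * ||u_i - u_j||^2) and drops edges whose endpoints have dissimilar latent features. The distance threshold rule, the function names, and the toy data are assumptions made for illustration only; the paper identifies contested edges with a graphical lasso approximation, which is not reproduced here.

```python
def graph_penalty(features, edges):
    """Sum of w * ||u_i - u_j||^2 over weighted edges (i, j, w)."""
    total = 0.0
    for i, j, w in edges:
        total += w * sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    return total


def drop_contested_edges(features, edges, threshold):
    """Keep edges whose endpoint features lie within `threshold` (squared distance).

    Hypothetical stand-in rule, not the paper's graphical lasso step.
    """
    kept = []
    for i, j, w in edges:
        dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        if dist <= threshold:
            kept.append((i, j, w))
    return kept


# Toy latent features for three rows; the second edge links dissimilar rows.
features = [[1.0, 0.0], [1.0, 0.5], [-2.0, 3.0]]
edges = [(0, 1, 1.0), (1, 2, 1.0)]

penalty_before = graph_penalty(features, edges)
kept_edges = drop_contested_edges(features, edges, threshold=1.0)
penalty_after = graph_penalty(features, kept_edges)
```

Because the penalty is a sum over edges, its cost scales with the number of edges, which is consistent with the abstract's remark that the load decreases as edges are removed.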

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Chemmalar Selvi G. ◽  
Lakshmi Priya G.G.

Purpose Recommender systems are valuable to online users because the World Wide Web holds so much information that finding relevant items costs users time and money. A recommender system suggests relevant items by applying recommendation filtering techniques to the available information. These techniques take as input a matrix that is generally very sparse and high-dimensional. The sparse matrix is therefore completed by filling in its unknown or missing entries using matrix completion techniques. One of the most popular is matrix factorization (MF), which decomposes the sparse data matrix into two smaller matrices whose product fills in the missing entries. However, MF loses some of the original information when it decomposes the matrix, and its relatively high error rate reflects that loss. Design/methodology/approach To alleviate the problems of data loss and data sparsity, a new matrix completion algorithm based on formal concept analysis (FCA), a mathematical model, is proposed; it aims to fill in the unknown or missing entries while largely avoiding the loss of valuable information. The proposed algorithm uses a clustering technique in which users who have rated items in common and users who have not are captured in two classes. It fills each unknown entry with the mean value of its cluster, completing the matrix without actually decomposing it. 
Findings The experiment was conducted on the publicly available MovieLens data set; the results show a minimal prediction error rate, and the algorithm is also compared with existing algorithms. Applying FCA in recommender systems thus yields minimal or no data loss and improves the prediction accuracy of rating scores. Social implications The proposed FCA-based matrix completion algorithm produces good recommendations, which will help today’s online users make decisions about purchasing products online. Originality/value This paper presents a new matrix completion technique that adopts vital properties of FCA and applies it to recommender systems. The proposed algorithm performs well compared with existing algorithms in terms of prediction accuracy.
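
The idea of filling unknown entries with a cluster mean, without factorizing the matrix, can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the cluster labels are assumed to be given (the FCA-based clustering is not reproduced), and the function name, the per-item averaging, and the fallback to the global mean are assumptions.

```python
def fill_with_cluster_means(ratings, clusters):
    """Fill None entries with the mean observed rating of the same item
    among users in the same cluster; fall back to the global mean."""
    observed = [r for row in ratings for r in row if r is not None]
    global_mean = sum(observed) / len(observed)
    completed = [row[:] for row in ratings]
    for u, row in enumerate(ratings):
        for item, value in enumerate(row):
            if value is not None:
                continue
            # Ratings of this item by users sharing the cluster of user u.
            peers = [
                ratings[v][item]
                for v in range(len(ratings))
                if clusters[v] == clusters[u] and ratings[v][item] is not None
            ]
            completed[u][item] = sum(peers) / len(peers) if peers else global_mean
    return completed


# Toy user-by-item rating matrix; None marks a missing rating.
ratings = [
    [5, None, 1],
    [4, 2, None],
    [None, 5, 4],
]
clusters = [0, 0, 1]  # assumed cluster label per user
completed = fill_with_cluster_means(ratings, clusters)
```

Here users 0 and 1 borrow each other's ratings, while user 2 is alone in its cluster and receives the global mean for its missing entry.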


2019 ◽  
Author(s):  
Keyao Wang ◽  
Jun Wang ◽  
Carlotta Domeniconi ◽  
Xiangliang Zhang ◽  
Guoxian Yu

Abstract Motivation Isoforms are alternatively spliced mRNAs of genes. They can be translated into different functional proteoforms, and thus greatly increase the functional diversity of protein variants (or proteoforms). Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a finer granularity. Since existing functional genomic databases uniformly record annotations at the gene level, and rarely at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction. Results Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm by viewing each gene as a bag and its spliced isoforms as instances, and push the functions of bags onto instances. These approaches implicitly assume that the collected annotations of genes are complete and only integrate multiple RNA-seq datasets. As such, they have compromised performance. We propose a data integrative solution (called DisoFun) to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes the functional annotations of genes are aggregated from those of key isoforms. It collaboratively factorizes the isoform data matrix and gene-term data matrix (storing Gene Ontology (GO) annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms, and achieves function prediction by aggregating predictions to their originating genes. In addition, it leverages the PPI network and GO structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the AUROC (area under the receiver-operating characteristic curve) and AUPRC (area under the precision-recall curve) of existing solutions by at least 7.7% and 28.9%, respectively. 
We further investigate DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1, and CFLAR) with known functions at the isoform level, and observe that DisoFun can differentiate the functions of their isoforms with 90.5% accuracy. Availability The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 34 (04) ◽  
pp. 3906-3913
Author(s):  
Robert Ganian ◽  
Iyad Kanj ◽  
Sebastian Ordyniak ◽  
Stefan Szeider

We consider a fundamental matrix completion problem where we are given an incomplete matrix and a set of constraints modeled as a CSP instance. The goal is to complete the matrix subject to the input constraints and in such a way that the complete matrix can be clustered into few subspaces with low rank. This problem generalizes several problems in data mining and machine learning, including the problem of completing a matrix into one with minimum rank. In addition to its ubiquitous applications in machine learning, the problem has strong connections to information theory, related to binary linear codes, and variants of it have been extensively studied from that perspective. We formalize the problem mentioned above and study its classical and parameterized complexity. We draw a detailed landscape of the complexity and parameterized complexity of the problem with respect to several natural parameters that are desirably small and with respect to several well-studied CSP fragments.


2019 ◽  
Vol 9 (20) ◽  
pp. 4378 ◽  
Author(s):  
Yuan ◽  
Zahir ◽  
Yang

Recommendation systems often use side information to both alleviate problems, such as the cold start problem and data sparsity, and increase prediction accuracy. One such piece of side information, which has been widely investigated in addressing such challenges, is trust. However, the difficulty in obtaining explicit relationship data has led researchers to infer trust values from other means such as the user-to-item relationship. This paper proposes a model to improve prediction accuracy by applying the trust relationship between the user and item ratings. Two approaches to implement trust into prediction are proposed: one involves the use of estimated trust, and the other involves the initial trust. The efficiency of the proposed method is verified by comparing the obtained results with four well-known methods, including the state-of-the-art deep learning-based method of neural graph collaborative filtering (NGCF). The experimental results demonstrate that the proposed method performs significantly better than the NGCF, and the three other matrix factorization methods, namely, the singular value decomposition (SVD), SVD++, and the social matrix factorization (SocialMF).


Author(s):  
Maryam Bagherian ◽  
Renaid B Kim ◽  
Cheng Jiang ◽  
Maureen A Sartor ◽  
Harm Derksen ◽  
...  

Abstract Predicting the interactions between drugs and targets plays an important role in new drug discovery and drug repurposing (also known as drug repositioning). There is a need to develop novel and efficient prediction approaches in order to avoid the costly and laborious process of determining drug–target interactions (DTIs) based on experiments alone. These computational prediction approaches should be capable of identifying the potential DTIs in a timely manner. Matrix factorization methods have been proven to be the most reliable group of methods. Here, we first propose a matrix factorization-based method termed ‘Coupled Matrix–Matrix Completion’ (CMMC). Next, to utilize the more comprehensive information provided in different databases and to incorporate multiple types of scores for drug–drug similarities and target–target relationships, we extend CMMC to ‘Coupled Tensor–Matrix Completion’ (CTMC) by considering drug–drug and target–target similarity/interaction tensors. Results: Evaluation on two benchmark datasets, DrugBank and TTD, shows that CTMC outperforms the matrix-factorization-based methods GRMF, $L_{2,1}$-GRMF, NRLMF and NRLMF$\beta$. Based on the evaluation, CMMC and CTMC outperform the above methods in terms of area under the curve, F1 score, sensitivity and specificity, with a considerably shorter run time.


Author(s):  
Jean Walrand

Abstract Online learning algorithms update their estimates as additional observations are made. Section 12.1 explains a simple example: online linear regression. The stochastic gradient projection algorithm is a general technique to update estimates based on additional observations; it is widely used in machine learning. Section 12.2 presents the theory behind that algorithm. When analyzing large amounts of data, one faces the problems of identifying the most relevant data and of how to use the available data efficiently. Section 12.3 explains three examples of how these questions are addressed: the LASSO algorithm, compressed sensing, and the matrix completion problem. Section 12.4 discusses deep neural networks for which the stochastic gradient projection algorithm is easy to implement.
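
A minimal sketch of online linear regression with a stochastic gradient projection step may help make the idea concrete. It assumes a squared-error loss, a constant step size, and projection onto a Euclidean ball; these choices, the function names, and the synthetic noiseless data stream are illustrative assumptions and are not taken from the chapter.

```python
import random


def project_onto_ball(theta, radius):
    """Project theta onto the Euclidean ball of the given radius."""
    norm = sum(t * t for t in theta) ** 0.5
    if norm <= radius:
        return theta
    return [t * radius / norm for t in theta]


def online_linear_regression(stream, dim, step=0.05, radius=10.0):
    """Update the estimate once per observation (x, y)."""
    theta = [0.0] * dim
    for x, y in stream:
        # Gradient of 0.5 * (theta . x - y)^2 with respect to theta.
        error = sum(t * xi for t, xi in zip(theta, x)) - y
        theta = [t - step * error * xi for t, xi in zip(theta, x)]
        theta = project_onto_ball(theta, radius)
    return theta


# Synthetic stream generated from a known parameter vector (no noise).
rng = random.Random(0)
true_theta = [2.0, -1.0]
stream = []
for _ in range(5000):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    y = sum(t * xi for t, xi in zip(true_theta, x))
    stream.append((x, y))

estimate = online_linear_regression(stream, dim=2)
```

Each observation is used once and then discarded, which is what distinguishes the online update from a batch least-squares fit.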


2017 ◽  
Vol 29 (8) ◽  
pp. 2164-2176 ◽  
Author(s):  
Steven Squires ◽  
Adam Prügel-Bennett ◽  
Mahesan Niranjan

Nonnegative matrix factorization (NMF) is primarily a linear dimensionality reduction technique that factorizes a nonnegative data matrix into two smaller nonnegative matrices: one that represents the basis of the new subspace and the second that holds the coefficients of all the data points in that new space. In principle, the nonnegativity constraint forces the representation to be sparse and parts based. Instead of extracting holistic features from the data, real parts are extracted that should be significantly easier to interpret and analyze. The size of the new subspace selects how many features will be extracted from the data. An effective choice should minimize the noise while extracting the key features. We propose a mechanism for selecting the subspace size by using a minimum description length technique. We demonstrate that our technique provides plausible estimates for real data as well as accurately predicting the known size of synthetic data. We provide an implementation of our code in a Matlab format.
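
The factorization itself can be sketched in a few lines. The code below uses the standard multiplicative update rules for the squared-error objective, which keep both factors nonnegative; it is an illustrative assumption about the solver and does not reproduce the authors' Matlab code or their minimum description length procedure for choosing the subspace size. The helper names and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize nonnegative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # h <- h * (w^T v) / (w^T w h), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy nonnegative matrix of rank one; `rank` plays the role of the subspace size.
v = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
w, h = nmf(v, rank=1)
error = frobenius_error(v, w, h)
```

Running the same routine for several values of `rank` and comparing the results is the kind of model selection problem the abstract addresses.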


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Wendong Wang ◽  
Jianjun Wang

In this paper, we propose a new method to deal with the matrix completion problem. Different from most existing matrix completion methods that only pursue the low rank of underlying matrices, the proposed method simultaneously optimizes their low rank and smoothness such that they mutually help each other and hence yield a better performance. In particular, the proposed method becomes very competitive with the introduction of a modified second-order total variation, even when it is compared with some recently emerged matrix completion methods that also combine the low rank and smoothness priors of matrices together. An efficient algorithm is developed to solve the induced optimization problem. The extensive experiments further confirm the superior performance of the proposed method over many state-of-the-art methods.
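
To give a feel for the smoothness prior, the sketch below evaluates a plain second-order total variation, taken here as the sum of absolute second differences along rows and columns. The abstract does not specify the authors' modified form, so this anisotropic definition, the function name, and the toy matrices are assumptions for illustration only.

```python
def second_order_tv(matrix):
    """Sum of absolute second differences along rows and columns."""
    rows, cols = len(matrix), len(matrix[0])
    total = 0.0
    # Second differences within each row.
    for i in range(rows):
        for j in range(1, cols - 1):
            total += abs(matrix[i][j - 1] - 2 * matrix[i][j] + matrix[i][j + 1])
    # Second differences within each column.
    for j in range(cols):
        for i in range(1, rows - 1):
            total += abs(matrix[i - 1][j] - 2 * matrix[i][j] + matrix[i + 1][j])
    return total


# A linear ramp is perfectly smooth; a checkerboard is maximally rough.
smooth = [[i + j for j in range(4)] for i in range(4)]
rough = [[(-1) ** (i + j) for j in range(4)] for i in range(4)]

tv_smooth = second_order_tv(smooth)
tv_rough = second_order_tv(rough)
```

A completion method that adds such a term to a low-rank objective favors candidates like the ramp over candidates like the checkerboard.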

