Polynomial Matrix Completion for Missing Data Imputation and Transductive Learning

2020 · Vol 34 (04) · pp. 3842-3849
Author(s): Jicong Fan, Yuqian Zhang, Madeleine Udell

This paper develops new methods to recover the missing entries of a high-rank or even full-rank matrix when the intrinsic dimension of the data is low compared to the ambient dimension. Specifically, we assume that the columns of a matrix are generated by polynomials acting on a low-dimensional intrinsic variable, and wish to recover the missing entries under this assumption. We show that we can identify the complete matrix of minimum intrinsic dimension by minimizing the rank of the matrix in a high dimensional feature space. We develop a new formulation of the resulting problem using the kernel trick together with a new relaxation of the rank objective, and propose an efficient optimization method. We also show how to use our methods to complete data drawn from multiple nonlinear manifolds. Comparative studies on synthetic data, subspace clustering with missing data, motion capture data recovery, and transductive learning verify the superiority of our methods over the state-of-the-art.
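
A minimal numerical illustration of the premise above, not the authors' algorithm: when columns are polynomial functions of a low-dimensional latent variable, the data matrix can have full row rank while its polynomial feature lift is rank-deficient. The sketch assumes a 2-dimensional latent variable z, a hypothetical random mixing matrix U applied to the degree-2 monomials of z (giving 5 × 200 data), and an explicit degree-2 monomial lift; the helper name `monomials` is illustrative.

```python
from itertools import combinations_with_replacement

import numpy as np


def monomials(X, degree):
    """Stack all monomials of the rows of X with total degree <= `degree`, including the constant 1."""
    d, n = X.shape
    feats = [np.ones(n)]
    for k in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), k):
            # product of the selected rows, e.g. idx=(0, 0) gives x0 * x0
            feats.append(np.prod(X[list(idx), :], axis=0))
    return np.vstack(feats)


rng = np.random.default_rng(0)
n = 200                                  # number of columns (samples)
Z = rng.uniform(-1, 1, size=(2, n))      # latent variable: intrinsic dimension 2
U = rng.standard_normal((5, 6))          # hypothetical mixing of the 6 monomials of z with degree <= 2
X = U @ monomials(Z, 2)                  # 5 x 200 data matrix: ambient dimension 5

Phi = monomials(X, 2)                    # lifted features: 1 + 5 + 15 = 21 rows

print(np.linalg.matrix_rank(X))          # full row rank (5) in the ambient space
print(Phi.shape[0], np.linalg.matrix_rank(Phi))  # 21 features, but rank 15 in the feature space
```

Every lifted entry is a polynomial of degree at most 4 in the two latent coordinates, and that space has dimension 15, so the 21 features cannot have rank above 15; rank minimization in the feature space exploits this gap. The paper itself works through the kernel trick rather than forming such features explicitly.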

2016 · Vol 6 (1)
Author(s): Bo Jiang, Shiqian Ma, Jason Causey, Linbo Qiao, Matthew Price Hardin, ...

Genome-wide association studies present computational challenges for missing data imputation, while advances in genotyping technologies are generating datasets with large sample sizes, with sample sets genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low rank and on a low number of co-clusters of matrices, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec performs consistently and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has an advantage in running time. The test results show that SparRec has significant advantages over, and performance competitive with, other existing state-of-the-art statistical methods, including Beagle and fastPhase.
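
The abstract does not spell out the LRMC solver. As a stand-in, the sketch below uses one generic low-rank completion scheme, iterative truncated-SVD imputation (often called hard impute), on synthetic data. It is not SparRec's or Mendel-Impute's algorithm: the target rank is assumed known, the matrix is a hypothetical real-valued rank-3 example rather than genotype codes, the function name `hard_impute` is illustrative, and the co-clustering (MCCF) model is not shown.

```python
import numpy as np


def hard_impute(M, observed, rank, n_iters=500):
    """Fill entries where `observed` is False by repeated rank-`rank` truncated SVD (assumed known rank)."""
    Z = np.where(observed, M, 0.0)                 # initial guess: missing entries set to 0
    for _ in range(n_iters):
        filled = np.where(observed, M, Z)          # keep observed entries, use current estimate elsewhere
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]  # best rank-`rank` approximation of the filled matrix
    return np.where(observed, M, Z)                # observed entries are returned unchanged


rng = np.random.default_rng(1)
truth = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 40))  # hypothetical rank-3 matrix
observed = rng.random(truth.shape) < 0.5                              # roughly 50% of entries observed
estimate = hard_impute(truth, observed, rank=3)

missing = ~observed
rel_err = np.linalg.norm((estimate - truth)[missing]) / np.linalg.norm(truth[missing])
print(f"relative error on missing entries: {rel_err:.2e}")
```

Fixing the rank keeps each step a simple projection; nuclear-norm methods such as soft-thresholded SVD avoid choosing the rank up front at the cost of shrinking the recovered singular values.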


Author(s): Tshilidzi Marwala

In this chapter, traditional missing data imputation issues, such as missing data patterns and mechanisms, are described. Attention is paid to the models best suited to particular missing data mechanisms. Traditional missing data imputation methods, namely case deletion and prediction rules, are reviewed. For case deletion, listwise and pairwise deletion are covered; for prediction rules, imputation techniques such as mean substitution, hot-deck, regression and decision trees are reviewed. Two missing data examples are studied: the Sudoku puzzle and a mechanical system. The main conclusions drawn from these examples are that a successful missing data estimation procedure needs an accurate model describing the inter-relationships and rules that define the data, together with a good optimization method.
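
Three of the techniques named above (listwise deletion, mean substitution and hot-deck) can be sketched on a toy table with `None` marking missing entries. This is an illustrative sketch, not code from the chapter: the hot-deck variant is deterministic nearest-neighbour donor selection (classical hot-deck often draws a random donor from similar records instead), and all function names and data are hypothetical.

```python
from statistics import mean
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing entry


def listwise_delete(rows: List[Row]) -> List[Row]:
    """Case deletion: keep only rows with no missing entries."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def mean_substitute(rows: List[Row]) -> List[Row]:
    """Replace each missing entry with its column's observed mean (raises if a column is entirely None)."""
    ncols = len(rows[0])
    col_means = [mean(r[j] for r in rows if r[j] is not None) for j in range(ncols)]
    return [[col_means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def hot_deck(rows: List[Row]) -> List[Row]:
    """Nearest-neighbour hot deck: copy missing values from the closest complete row, measuring
    squared distance on the recipient's observed columns (requires at least one complete row)."""
    donors = listwise_delete(rows)
    out = []
    for r in rows:
        obs = [j for j, v in enumerate(r) if v is not None]
        donor = min(donors, key=lambda d: sum((d[j] - r[j]) ** 2 for j in obs))
        out.append([donor[j] if v is None else v for j, v in enumerate(r)])
    return out


data = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],
    [3.0, 6.0, None],
    [4.0, 8.0, 9.0],
]
print(listwise_delete(data))  # [[1.0, 2.0, 3.0], [4.0, 8.0, 9.0]]
print(mean_substitute(data))  # both gaps become 16/3, the mean of each column's observed values
print(hot_deck(data))         # [[1.0, 2.0, 3.0], [2.0, 2.0, 4.0], [3.0, 6.0, 9.0], [4.0, 8.0, 9.0]]
```

Listwise deletion discards half of this table, while both prediction rules keep every row; the hot-deck fills agree with the donors' structure, whereas mean substitution ignores relationships between columns.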


Author(s): Gerandy Brito, Ioana Dumitriu, Kameron Decker Harris

We prove an analogue of Alon’s spectral gap conjecture for random bipartite, biregular graphs. We use the Ihara–Bass formula to connect the non-backtracking spectrum to that of the adjacency matrix, and employ the moment method to show that a spectral gap exists for the non-backtracking matrix. A by-product of our main theorem is that random rectangular zero-one matrices with fixed row and column sums are full rank with high probability. Finally, we illustrate applications to community detection, coding theory, and deterministic matrix completion.
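
The Ihara–Bass formula mentioned above can be checked numerically on a small example. For a simple graph with n vertices, m edges, adjacency matrix A, degree matrix D and non-backtracking (Hashimoto) matrix B indexed by the 2m directed edges, it states det(I - tB) = (1 - t²)^(m - n) det(I - tA + t²(D - I)). The sketch below is an illustration rather than the paper's proof technique: it builds B explicitly and evaluates both sides at t = 0.3 for a small (3,2)-biregular bipartite graph. The graph choice and function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def non_backtracking_matrix(edges):
    """Hashimoto matrix on directed edges: B[(u->v), (x->y)] = 1 iff v == x and y != u."""
    directed = list(edges) + [(v, u) for u, v in edges]
    B = np.zeros((len(directed), len(directed)))
    for i, (u, v) in enumerate(directed):
        for j, (x, y) in enumerate(directed):
            if v == x and y != u:  # continue from the head of edge i without reversing it
                B[i, j] = 1.0
    return B


def ihara_bass_sides(edges, n, t):
    """Return det(I - tB) and (1 - t^2)^(m - n) det(I - tA + t^2 (D - I)) for a simple graph."""
    m = len(edges)
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    B = non_backtracking_matrix(edges)
    lhs = np.linalg.det(np.eye(2 * m) - t * B)
    rhs = (1 - t**2) ** (m - n) * np.linalg.det(np.eye(n) - t * A + t**2 * (D - np.eye(n)))
    return lhs, rhs


# (3,2)-biregular bipartite graph: left vertices 0..3 are the vertices of K4 (degree 3),
# right vertices 4..9 are its six edges (degree 2), each joined to the edge's two endpoints.
edges = []
for k, (a, b) in enumerate(combinations(range(4), 2)):
    edges += [(a, 4 + k), (b, 4 + k)]

lhs, rhs = ihara_bass_sides(edges, n=10, t=0.3)
print(lhs, rhs)  # the two sides agree up to floating-point error
```

Because the identity holds as polynomials in t, the eigenvalues of the 2m × 2m matrix B are determined by the n × n quadratic determinant together with the factor (1 - t²)^(m - n), which is how the formula links the non-backtracking and adjacency spectra.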

