Information and Inference: A Journal of the IMA
Latest Publications


TOTAL DOCUMENTS: 210 (five years: 112)

H-INDEX: 15 (five years: 3)

Published by Oxford University Press

ISSN: 2049-8772, 2049-8764

Author(s):  
John Lipor ◽  
David Hong ◽  
Yan Shuo Tan ◽  
Laura Balzano

Author(s):  
Fabian Jaensch ◽  
Peter Jung

Abstract We consider a structured estimation problem where an observed matrix is assumed to be generated as an $s$-sparse linear combination of $N$ given $n\times n$ positive semi-definite matrices. Recovering the unknown $N$-dimensional and $s$-sparse weights from noisy observations is an important problem in various fields of signal processing and also a relevant preprocessing step in covariance estimation. We present related recovery guarantees and focus on the case of non-negative weights. The problem is formulated as a convex program and can be solved without further tuning. Such robust, non-Bayesian and parameter-free approaches are important for applications where prior distributions and further model parameters are unknown. Motivated by explicit applications in wireless communication, we consider the particular rank-one case, where the known matrices are outer products of i.i.d. zero-mean sub-Gaussian $n$-dimensional complex vectors. We show that, for given $n$ and $N$, one can recover non-negative $s$-sparse weights with a parameter-free convex program once $s\leq O(n^2 / \log ^2(N/n^2))$. Our error estimate scales linearly in the instantaneous noise power, and the convex algorithm therefore does not need prior bounds on the noise. Such estimates are important if the magnitude of the additive distortion depends on the unknown itself.
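
A minimal numerical sketch of this measurement model is given below, using non-negative least squares as a stand-in for the parameter-free convex program (the paper's exact formulation may differ); the dimensions, sparsity level and noise level are illustrative choices, not values from the paper.

```python
# Hedged sketch: recover non-negative sparse weights from a rank-one measurement
# model via non-negative least squares (a stand-in for the convex program above).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, N, s = 16, 200, 5  # matrix size, number of atoms, sparsity (illustrative)

# Known rank-one matrices b_i b_i^* built from i.i.d. complex Gaussian vectors.
B = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
atoms = np.stack([np.outer(b, b.conj()) for b in B])          # shape (N, n, n)

# Ground truth: s-sparse, non-negative weight vector.
x_true = np.zeros(N)
x_true[rng.choice(N, s, replace=False)] = rng.uniform(1.0, 2.0, s)

# Noisy observation Y = sum_i x_i b_i b_i^* + noise.
Y = np.tensordot(x_true, atoms, axes=1)
Y += 0.01 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# Parameter-free convex recovery: min_x ||Y - sum_i x_i b_i b_i^*||_F s.t. x >= 0.
# Real and imaginary parts are stacked so a real NNLS solver applies.
A = np.concatenate([atoms.reshape(N, -1).real, atoms.reshape(N, -1).imag], axis=1).T
y = np.concatenate([Y.ravel().real, Y.ravel().imag])
x_hat, _ = nnls(A, y)

print("estimated support:", np.sort(np.argsort(x_hat)[-s:]))
print("true support:     ", np.flatnonzero(x_true))
```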


Author(s):  
Navin Kashyap ◽  
Manjunath Krishnapur

Abstract We show, by an explicit construction, that a mixture of univariate Gaussian densities with variance $1$ and means in $[-A,A]$ can have $\varOmega (A^2)$ modes. This disproves a recent conjecture of Dytso et al. (2020, IEEE Trans. Inf. Theory, 66, 2006–2022), who showed that such a mixture can have at most $O(A^{2})$ modes and surmised that the upper bound could be improved to $O(A)$. Our result holds even if an additional variance constraint is imposed on the mixing distribution. Extending the result to higher dimensions, we exhibit a mixture of Gaussians in ${\mathbb{R}}^{d}$, with identity covariances and means inside ${[-A,A]}^{d}$, that has $\varOmega (A^{2d})$ modes.
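
The following short sketch only illustrates how modes of such a mixture can be counted numerically; the means and weights are arbitrary choices and do not reproduce the $\varOmega(A^2)$ construction of the paper.

```python
# Minimal sketch (not the paper's construction): numerically count the modes of a
# mixture of unit-variance Gaussians with means in [-A, A].
import numpy as np

def mixture_density(x, means, weights):
    """Density of sum_k w_k N(mu_k, 1) evaluated at the points x."""
    z = x[:, None] - means[None, :]
    return (np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)) @ weights

def count_modes(means, weights, A, grid_size=200_001):
    """Count strict local maxima of the mixture density on a fine grid."""
    x = np.linspace(-A - 5, A + 5, grid_size)
    f = mixture_density(x, means, weights)
    return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

A = 10.0
means = np.linspace(-A, A, 8)                    # equally spaced means in [-A, A]
weights = np.full(means.size, 1 / means.size)    # uniform mixing weights
print("modes:", count_modes(means, weights, A))  # well-separated means: one mode each
```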


Author(s):  
Tamir Bendory ◽  
Ariel Jaffe ◽  
William Leeb ◽  
Nir Sharon ◽  
Amit Singer

Abstract We study super-resolution multi-reference alignment, the problem of estimating a signal from many circularly shifted, down-sampled and noisy observations. We focus on the low-SNR regime and show that a signal in ${\mathbb{R}}^M$ is uniquely determined when the number $L$ of samples per observation is of the order of the square root of the signal’s length ($L=O(\sqrt{M})$). Phrased more informally, one can square the resolution. This result holds if the number of observations is proportional to $1/\textrm{SNR}^3$. In contrast, with fewer observations, recovery is impossible even when the observations are not down-sampled ($L=M$). The analysis combines tools from statistical signal processing and invariant theory. We design an expectation-maximization algorithm and demonstrate that it can super-resolve the signal in challenging SNR regimes.
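
A minimal sketch of the observation model is given below, with illustrative values for the signal length, sampling rate, noise level and number of observations; the paper's expectation-maximization algorithm and invariant-based analysis are not reproduced here.

```python
# Hedged sketch of the observation model: each observation is a random circular
# shift of the signal, down-sampled to L points, with additive Gaussian noise.
import numpy as np

rng = np.random.default_rng(0)
M = 64                       # signal length (illustrative)
L = 8                        # samples per observation, on the order of sqrt(M)
K = M // L                   # down-sampling factor
sigma = 3.0                  # low-SNR regime: noise level well above signal scale
n_obs = 10_000               # illustrative; the paper's sample complexity scales like 1/SNR^3

x = rng.standard_normal(M)   # unknown signal in R^M

def observe(signal):
    shifted = np.roll(signal, rng.integers(M))               # random circular shift
    return shifted[::K] + sigma * rng.standard_normal(L)     # down-sample, add noise

observations = np.stack([observe(x) for _ in range(n_obs)])  # shape (n_obs, L)
print(observations.shape)
```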


Author(s):  
Bubacarr Bah ◽  
Holger Rauhut ◽  
Ulrich Terstiege ◽  
Michael Westdickenberg

Abstract We study the convergence of gradient flows related to learning deep linear neural networks (where the activation function is the identity map) from data. In this case, the composition of the network layers amounts to simply multiplying the weight matrices of all layers together, resulting in an overparameterized problem. The gradient flow with respect to these factors can be re-interpreted as a Riemannian gradient flow on the manifold of rank-$r$ matrices endowed with a suitable Riemannian metric. We show that the flow always converges to a critical point of the underlying functional. Moreover, we establish that, for almost all initializations, the flow converges to a global minimum on the manifold of rank-$k$ matrices for some $k\leq r$.
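
As a rough illustration, the sketch below runs a forward-Euler discretization of the gradient flow for a three-layer linear network on a least-squares objective; the architecture, initialization scale and step size are illustrative assumptions, not choices taken from the paper.

```python
# Hedged sketch: forward-Euler discretization of the gradient flow for a deep
# linear network X -> X W1 W2 W3 trained with a least-squares loss.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 10, 6, 4, 200
X = rng.standard_normal((n, d_in))
Y = X @ rng.standard_normal((d_in, d_out))       # realizable linear targets

# Overparameterized factorization of a single linear map (illustrative sizes).
Ws = [0.5 * rng.standard_normal((d_in, d_hid)),
      0.5 * rng.standard_normal((d_hid, d_hid)),
      0.5 * rng.standard_normal((d_hid, d_out))]

eta = 5e-3                                          # small step size approximating the flow
for _ in range(20_000):
    R = X.T @ (X @ Ws[0] @ Ws[1] @ Ws[2] - Y) / n   # gradient w.r.t. the product
    grads = [R @ (Ws[1] @ Ws[2]).T,
             Ws[0].T @ R @ Ws[2].T,
             (Ws[0] @ Ws[1]).T @ R]
    Ws = [W - eta * G for W, G in zip(Ws, grads)]

print("loss after training:", 0.5 * np.mean((X @ Ws[0] @ Ws[1] @ Ws[2] - Y) ** 2))
```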


Author(s):  
Samet Oymak ◽  
Mahdi Soltanolkotabi

Abstract In this paper, we study the problem of learning the weights of a deep convolutional neural network. We consider a network where convolutions are carried out over non-overlapping patches. We develop an algorithm for simultaneously learning all the kernels from the training data. Our approach, dubbed deep tensor decomposition (DeepTD), is based on a low-rank tensor decomposition. We theoretically investigate DeepTD under a realizable model for the training data where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to planted convolutional kernels. We show that DeepTD is sample efficient and provably works as soon as the sample size exceeds the total number of convolutional weights in the network.
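
The realizable data model can be illustrated as follows; the depth, patch size, nonlinearity and read-out below are illustrative assumptions, and the DeepTD algorithm itself is not reproduced.

```python
# Hedged sketch of a realizable data model with convolutions over non-overlapping
# patches; depth, nonlinearity and read-out are illustrative, and DeepTD itself is
# not implemented here.
import numpy as np

rng = np.random.default_rng(0)
d, patch, n_samples = 64, 8, 5000
n_patches = d // patch

X = rng.standard_normal((n_samples, d))      # i.i.d. Gaussian inputs

k_true = rng.standard_normal(patch)          # planted convolutional kernel
v_true = rng.standard_normal(n_patches)      # planted read-out weights

def forward(X, k, v):
    # Convolution over non-overlapping patches = reshape + inner product per patch.
    patches = X.reshape(-1, n_patches, patch)        # (n_samples, n_patches, patch)
    hidden = np.maximum(patches @ k, 0.0)            # per-patch responses with ReLU
    return hidden @ v                                # linear read-out

y = forward(X, k_true, v_true)               # planted labels
print(X.shape, y.shape)
```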


Author(s):  
Massimo Fornasier ◽  
Jan Vybíral ◽  
Ingrid Daubechies

Abstract We address the structure identification and the uniform approximation of sums of ridge functions $f(x)=\sum _{i=1}^m g_i(\langle a_i,x\rangle )$ on ${\mathbb{R}}^d$, representing a general form of a shallow feed-forward neural network, from a small number of query samples. Higher-order differentiation, as used in our constructive approximations, of sums of ridge functions or of their compositions, as in deeper neural networks, yields a natural connection between neural network weight identification and tensor product decomposition identification. In the case of the shallowest feed-forward neural network, second-order differentiation and tensors of order two (i.e., matrices) suffice, as we prove in this paper. We use two sampling schemes to perform approximate differentiation: active sampling, where the sampling points are universal, actively and randomly designed, and passive sampling, where the sampling points are preselected at random from a distribution with known density. Based on multiple gathered approximate first- and second-order differentials, our general approximation strategy is developed as a sequence of algorithms that perform individual sub-tasks. We first perform an active subspace search by approximating the span of the weight vectors $a_1,\dots ,a_m$. Then we use a straightforward substitution, which reduces the dimensionality of the problem from $d$ to $m$. The core of the construction is then the stable and efficient approximation of the weights, expressed in terms of the rank-$1$ matrices $a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program. We prove that this program successfully identifies weight vectors that are close to orthonormal, and we show how to constructively reduce to this case by a whitening procedure, without any loss of generality. We finally discuss the implementation and the performance of the proposed algorithmic pipeline with extensive numerical experiments, which illustrate and confirm the theoretical results.
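
The active subspace step can be illustrated with a small sketch: for $f(x)=\sum_i g_i(\langle a_i,x\rangle)$ every Hessian is a combination of the rank-one matrices $a_i\otimes a_i$, so aggregating a few finite-difference Hessians and taking their leading eigenvectors recovers the span of the $a_i$. The ridge profiles, dimensions and finite-difference scheme below are illustrative choices, not the paper's algorithm.

```python
# Hedged sketch of the active subspace step: finite-difference Hessians of
# f(x) = sum_i g_i(<a_i, x>) lie in span{a_i a_i^T}, so their leading eigenvectors
# recover span{a_1, ..., a_m}. Profiles g_i, dimensions and step size are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, m, eps = 20, 3, 1e-3
A = np.linalg.qr(rng.standard_normal((d, m)))[0]   # near-orthonormal weight vectors a_i
g = [np.tanh, np.sin, lambda t: t**2]              # illustrative ridge profiles g_i

def f(x):
    return sum(gi(A[:, i] @ x) for i, gi in enumerate(g))

def hessian_fd(x):
    """Central finite-difference approximation of the Hessian of f at x."""
    H = np.zeros((d, d))
    for i in range(d):
        ei = np.zeros(d); ei[i] = eps
        for j in range(d):
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

# Aggregate Hessians at a few random points and extract the leading m-dimensional subspace.
Hs = [hessian_fd(rng.standard_normal(d)) for _ in range(5)]
S = sum(H @ H for H in Hs)                         # symmetric, positive semi-definite
U = np.linalg.eigh(S)[1][:, -m:]                   # top-m eigenvectors: estimated span

# The true weight vectors should lie in the estimated subspace up to finite-difference error.
print(np.linalg.norm(A - U @ (U.T @ A)))
```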


Author(s):  
Alessandro Achille ◽  
Giovanni Paolini ◽  
Glen Mbeng ◽  
Stefano Soatto

Abstract We introduce an asymmetric distance in the space of learning tasks and a framework to compute their complexity. These concepts are foundational for the practice of transfer learning, whereby a parametric model is pre-trained for one task and then fine-tuned for another. The framework we develop is non-asymptotic, captures the finite nature of the training dataset and allows distinguishing learning from memorization. It encompasses, as special cases, classical notions from Kolmogorov complexity and Shannon and Fisher information. However, unlike some of those frameworks, it can be applied to large-scale models and real-world datasets. Our framework is the first to measure complexity in a way that accounts for the effect of the optimization scheme, which is critical in deep learning.

