High-dimensional inference on covariance structures via the extended cross-data-matrix methodology

2016, Vol 151, pp. 151-166
Author(s): Kazuyoshi Yata, Makoto Aoshima

2019, Vol 116 (12), pp. 5451-5460
Author(s): Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, Lenka Zdeborová

Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as is relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”), from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. In statistical physics, nonrigorous predictions for the optimal errors, derived with the so-called replica method, existed for special cases of GLMs such as the perceptron. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of the performance of the generalized approximate message-passing (GAMP) algorithm. Furthermore, for many learning problems we tightly characterize the regions of parameters in which this algorithm achieves optimal performance, and we locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.
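To make the setting concrete, below is a minimal sketch of the message-passing approach on the simplest GLM instance: noiseless compressed sensing, i.e., a linear output channel with a sparse signal and a random Gaussian data matrix. It uses the classical AMP iteration with a soft-threshold denoiser; the GAMP algorithm analyzed in the paper generalizes this structure to arbitrary output channels. The function names, the threshold rule, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(x, theta):
    # Componentwise (separable) denoiser: shrink toward zero.
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def amp_sparse_linear(A, y, n_iter=30, tau=1.5):
    # Classical AMP for y = A x0 with sparse x0; GAMP replaces the
    # residual step with a channel-dependent nonlinearity.
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(n_iter):
        theta = tau * np.sqrt(np.mean(z ** 2))   # heuristic effective-noise threshold
        pseudo = x + A.T @ z                     # pseudo-data: estimate plus A^T residual
        x = soft_threshold(pseudo, theta)
        # Onsager correction: (n/m) times the average derivative of the denoiser.
        onsager = (n / m) * np.mean(np.abs(pseudo) > theta)
        z = y - A @ x + onsager * z
    return x

# Toy teacher-student instance: alpha = m/n = 0.5, 10% nonzero entries.
rng = np.random.default_rng(0)
n, m = 500, 250
A = rng.normal(size=(m, n)) / np.sqrt(m)         # i.i.d. Gaussian data matrix
x0 = rng.normal(size=n) * (rng.random(n) < 0.1)
y = A @ x0                                       # noiseless linear channel
x_hat = amp_sparse_linear(A, y)
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```

The ratio alpha = m/n is the control parameter of the high-dimensional limit discussed in the abstract; sweeping it in this toy setup is one way to observe the sharp transition between regimes where the iteration succeeds and where it fails.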


2019, Vol 65 (10), pp. 6539-6560
Author(s): Tamir Hazan, Francesco Orabona, Anand D. Sarwate, Subhransu Maji, Tommi S. Jaakkola

2019, Vol 9 (1), pp. 33-79
Author(s): Raphaël Berthier, Andrea Montanari, Phan-Minh Nguyen

Given a high-dimensional data matrix $\boldsymbol{A}\in{\mathbb{R}}^{m\times n}$, approximate message passing (AMP) algorithms construct sequences of vectors $\boldsymbol{u}^{t}\in{\mathbb{R}}^{n}$, $\boldsymbol{v}^{t}\in{\mathbb{R}}^{m}$, indexed by $t\in \{0,1,2,\dots\}$, by iteratively applying $\boldsymbol{A}$ or $\boldsymbol{A}^{\textsf{T}}$ together with suitable nonlinear functions that depend on the specific application. Special instances of this approach have been developed, among other applications, for compressed sensing reconstruction, robust regression, Bayesian estimation, low-rank matrix recovery, phase retrieval, and community detection in graphs. For certain classes of random matrices $\boldsymbol{A}$, AMP admits an asymptotically exact description in the high-dimensional limit $m,n\to \infty$, which goes under the name of state evolution. Earlier work established state evolution for separable nonlinearities (under certain regularity conditions), yet empirical work demonstrated several important applications that require non-separable functions. In this paper we generalize state evolution to Lipschitz continuous non-separable nonlinearities, for Gaussian matrices $\boldsymbol{A}$. Our proof makes use of Bolthausen's conditioning technique along with several approximation arguments. In particular, we introduce a modified algorithm (called LoAMP, for Long AMP) which is of independent interest.
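As a rough illustration of how a non-separable iteration can be run in practice, the sketch below estimates the Onsager (memory) coefficient with a one-probe Monte Carlo divergence, so the denoiser may act jointly on all coordinates. The group-thresholding denoiser, the function names, and all parameter values are illustrative assumptions; this is not the LoAMP construction from the paper.

```python
import numpy as np

def mc_divergence(f, x, rng, eps=1e-3):
    # One-probe Monte Carlo estimate of div f(x) = sum_i df_i/dx_i,
    # usable even when f is non-separable.
    probe = rng.choice([-1.0, 1.0], size=x.shape)
    return probe @ (f(x + eps * probe) - f(x)) / eps

def group_soft_threshold(x, theta=0.5, k=5):
    # Non-separable denoiser: shrinks blocks of k coordinates jointly
    # by soft-thresholding each block's Euclidean norm.
    blocks = x.reshape(-1, k)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    scale = np.maximum(1.0 - theta / np.maximum(norms, 1e-12), 0.0)
    return (blocks * scale).ravel()

def amp_nonseparable(A, y, denoiser, n_iter=25, seed=0):
    # Generic AMP loop: alternately apply A^T and A with a (possibly
    # non-separable) denoiser; the Onsager term uses the estimated divergence.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x, z = np.zeros(n), y.copy()
    for _ in range(n_iter):
        pseudo = x + A.T @ z
        x = denoiser(pseudo)
        onsager = mc_divergence(denoiser, pseudo, rng) / m
        z = y - A @ x + onsager * z
    return x

# Toy block-sparse instance: 10% of the length-5 blocks are active.
rng = np.random.default_rng(1)
n, m, k = 500, 250, 5
A = rng.normal(size=(m, n)) / np.sqrt(m)
active = rng.random(n // k) < 0.1
x0 = (rng.normal(size=(n // k, k)) * active[:, None]).ravel()
y = A @ x0
x_hat = amp_nonseparable(A, y, group_soft_threshold)
print("relative error:", np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))
```

The Monte Carlo divergence probe is a standard workaround when the denoiser's derivative has no closed form; with a separable denoiser it reduces (in expectation) to the usual average-derivative Onsager coefficient.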


2020, Vol 10 (1)
Author(s): Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, ...
