Robustness in sparse high-dimensional linear models: Relative efficiency and robust approximate message passing

AbstractHigh-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the shortest least-squares solution of the recursively decimated linear models, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, adaptive LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.

Download Full-text

Optimal errors and phase transitions in high-dimensional generalized linear models

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1802705116 ◽

2019 ◽

Vol 116 (12) ◽

pp. 5451-5460 ◽

Cited By ~ 26

Author(s):

Jean Barbier ◽

Florent Krzakala ◽

Nicolas Macris ◽

Léo Miolane ◽

Lenka Zdeborová

Keyword(s):

Phase Transitions ◽

Generalized Linear Models ◽

Message Passing ◽

Statistical Physics ◽

Linear Models ◽

Optimal Estimation ◽

Error Correcting Codes ◽

Data Matrix ◽

High Dimensional ◽

Special Cases

Generalized linear models (GLMs) are used in high-dimensional machine learning, statistics, communications, and signal processing. In this paper we analyze GLMs when the data matrix is random, as relevant in problems such as compressed sensing, error-correcting codes, or benchmark models in neural networks. We evaluate the mutual information (or “free entropy”) from which we deduce the Bayes-optimal estimation and generalization errors. Our analysis applies to the high-dimensional limit where both the number of samples and the dimension are large and their ratio is fixed. Nonrigorous predictions for the optimal errors existed for special cases of GLMs, e.g., for the perceptron, in the field of statistical physics based on the so-called replica method. Our present paper rigorously establishes those decades-old conjectures and brings forward their algorithmic interpretation in terms of performance of the generalized approximate message-passing algorithm. Furthermore, we tightly characterize, for many learning problems, regions of parameters for which this algorithm achieves the optimal performance and locate the associated sharp phase transitions separating learnable and nonlearnable regions. We believe that this random version of GLMs can serve as a challenging benchmark for multipurpose algorithms.

Download Full-text

State evolution for approximate message passing with non-separable functions

Information and Inference A Journal of the IMA ◽

10.1093/imaiai/iay021 ◽

2019 ◽

Vol 9 (1) ◽

pp. 33-79 ◽

Cited By ~ 7

Author(s):

Raphaël Berthier ◽

Andrea Montanari ◽

Phan-Minh Nguyen

Keyword(s):

Message Passing ◽

Phase Retrieval ◽

Robust Regression ◽

Empirical Work ◽

Data Matrix ◽

High Dimensional ◽

Regularity Conditions ◽

Approximate Message Passing ◽

Separable Functions ◽

State Evolution

Abstract Given a high-dimensional data matrix $\boldsymbol{A}\in{{\mathbb{R}}}^{m\times n}$, approximate message passing (AMP) algorithms construct sequences of vectors $\boldsymbol{u}^{t}\in{{\mathbb{R}}}^{n}$, ${\boldsymbol v}^{t}\in{{\mathbb{R}}}^{m}$, indexed by $t\in \{0,1,2\dots \}$ by iteratively applying $\boldsymbol{A}$ or $\boldsymbol{A}^{{\textsf T}}$ and suitable nonlinear functions, which depend on the specific application. Special instances of this approach have been developed—among other applications—for compressed sensing reconstruction, robust regression, Bayesian estimation, low-rank matrix recovery, phase retrieval and community detection in graphs. For certain classes of random matrices $\boldsymbol{A}$, AMP admits an asymptotically exact description in the high-dimensional limit $m,n\to \infty $, which goes under the name of state evolution. Earlier work established state evolution for separable nonlinearities (under certain regularity conditions). Nevertheless, empirical work demonstrated several important applications that require non-separable functions. In this paper we generalize state evolution to Lipschitz continuous non-separable nonlinearities, for Gaussian matrices $\boldsymbol{A}$. Our proof makes use of Bolthausen’s conditioning technique along with several approximation arguments. In particular, we introduce a modified algorithm (called LoAMP for Long AMP), which is of independent interest.

Download Full-text

An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

10.21203/rs.3.rs-598251/v1 ◽

2021 ◽

Author(s):

Xue Yu ◽

Yifan Sun ◽

Hai-Jun Zhou

Keyword(s):

Linear Regression ◽

Message Passing ◽

Greedy Algorithms ◽

Linear Equations ◽

Regression Coefficients ◽

High Dimensional ◽

Linear Regression Models ◽

Approximate Message Passing ◽

Highly Correlated ◽

Solution Accuracy

Abstract High-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outper-forms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.

Download Full-text

High dimensional robust M-estimation: asymptotic variance via approximate message passing

Probability Theory and Related Fields ◽

10.1007/s00440-015-0675-z ◽

2015 ◽

Vol 166 (3-4) ◽

pp. 935-969 ◽

Cited By ~ 30

Author(s):

David Donoho ◽

Andrea Montanari

Keyword(s):

Message Passing ◽

Asymptotic Variance ◽

High Dimensional ◽

Approximate Message Passing ◽

M Estimation

Download Full-text

Optimal Combination of Linear and Spectral Estimators for Generalized Linear Models

Foundations of Computational Mathematics ◽

10.1007/s10208-021-09531-x ◽

2021 ◽

Author(s):

Marco Mondelli ◽

Christos Thrampoulidis ◽

Ramji Venkataramanan

Keyword(s):

Message Passing ◽

Linear Models ◽

Optimal Combination ◽

Limiting Distribution ◽

Linear Estimator ◽

Sensing Matrix ◽

Approximate Message Passing ◽

Message Passing Algorithm ◽

Spectral Estimators

AbstractWe study the problem of recovering an unknown signal $${\varvec{x}}$$ x given measurements obtained from a generalized linear model with a Gaussian sensing matrix. Two popular solutions are based on a linear estimator $$\hat{\varvec{x}}^\mathrm{L}$$ x ^ L and a spectral estimator $$\hat{\varvec{x}}^\mathrm{s}$$ x ^ s . The former is a data-dependent linear combination of the columns of the measurement matrix, and its analysis is quite simple. The latter is the principal eigenvector of a data-dependent matrix, and a recent line of work has studied its performance. In this paper, we show how to optimally combine $$\hat{\varvec{x}}^\mathrm{L}$$ x ^ L and $$\hat{\varvec{x}}^\mathrm{s}$$ x ^ s . At the heart of our analysis is the exact characterization of the empirical joint distribution of $$({\varvec{x}}, \hat{\varvec{x}}^\mathrm{L}, \hat{\varvec{x}}^\mathrm{s})$$ ( x , x ^ L , x ^ s ) in the high-dimensional limit. This allows us to compute the Bayes-optimal combination of $$\hat{\varvec{x}}^\mathrm{L}$$ x ^ L and $$\hat{\varvec{x}}^\mathrm{s}$$ x ^ s , given the limiting distribution of the signal $${\varvec{x}}$$ x . When the distribution of the signal is Gaussian, then the Bayes-optimal combination has the form $$\theta \hat{\varvec{x}}^\mathrm{L}+\hat{\varvec{x}}^\mathrm{s}$$ θ x ^ L + x ^ s and we derive the optimal combination coefficient. In order to establish the limiting distribution of $$({\varvec{x}}, \hat{\varvec{x}}^\mathrm{L}, \hat{\varvec{x}}^\mathrm{s})$$ ( x , x ^ L , x ^ s ) , we design and analyze an approximate message passing algorithm whose iterates give $$\hat{\varvec{x}}^\mathrm{L}$$ x ^ L and approach $$\hat{\varvec{x}}^\mathrm{s}$$ x ^ s . Numerical simulations demonstrate the improvement of the proposed combination with respect to the two methods considered separately.

Download Full-text