An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

Mapping Intimacies ◽

10.21203/rs.3.rs-598251/v1 ◽

2021 ◽

Author(s):

Xue Yu ◽

Yifan Sun ◽

Hai-Jun Zhou

Keyword(s):

Linear Regression ◽

Message Passing ◽

Greedy Algorithms ◽

Linear Equations ◽

Regression Coefficients ◽

High Dimensional ◽

Linear Regression Models ◽

Approximate Message Passing ◽

Highly Correlated ◽

Solution Accuracy

Abstract High-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.


An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

Scientific Reports ◽

10.1038/s41598-021-03323-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xue Yu ◽

Yifan Sun ◽

Hai-Jun Zhou

Keyword(s):

Linear Regression ◽

Message Passing ◽

Linear Models ◽

Greedy Algorithms ◽

Adaptive Lasso ◽

Regression Coefficients ◽

High Dimensional ◽

Linear Regression Models ◽

Approximate Message Passing ◽

Highly Correlated

Abstract High-dimensional linear regression model is the most popular statistical model for high-dimensional data, but it is quite a challenging task to achieve a sparse set of regression coefficients. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the shortest least-squares solution of the recursively decimated linear models, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, adaptive LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.
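The three ingredients named in the abstract (support selection guided by the shortest least-squares solution, early stopping, and a second-stage threshold) can be illustrated with a minimal Python sketch. This is a simplified stand-in, not the authors' ASSD: the function name `decimation_sketch`, the residual-refit update, the stopping tolerance `tol`, and the relative threshold `rel_thresh` are all assumptions made for illustration.

```python
import numpy as np

def decimation_sketch(A, y, max_steps=None, tol=1e-8, rel_thresh=0.05):
    """Simplified shortest-solution guided support selection (not the authors' ASSD)."""
    n, p = A.shape
    max_steps = n // 2 if max_steps is None else max_steps
    support, coef = [], np.zeros(0)
    residual = y.astype(float).copy()
    y_norm = np.linalg.norm(y)
    for _ in range(max_steps):
        # Early stopping (assumed rule): quit once the residual is negligible.
        if np.linalg.norm(residual) <= tol * y_norm:
            break
        rest = [j for j in range(p) if j not in support]
        if not rest:
            break
        # For an underdetermined system, lstsq returns the minimum-norm
        # ("shortest") least-squares solution.
        z, *_ = np.linalg.lstsq(A[:, rest], residual, rcond=None)
        # Add the column with the largest coefficient in that solution.
        support.append(rest[int(np.argmax(np.abs(z)))])
        # Refit on the current support and update the residual (assumed update).
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    beta = np.zeros(p)
    if support:
        # Second stage (assumed rule): drop relatively small coefficients, refit.
        cutoff = rel_thresh * np.max(np.abs(coef))
        keep = [j for j, c in zip(support, coef) if abs(c) >= cutoff]
        refit, *_ = np.linalg.lstsq(A[:, keep], y, rcond=None)
        beta[keep] = refit
    return beta, sorted(int(j) for j in np.flatnonzero(beta))

# Hypothetical noise-free test problem: 60 measurements, 120 unknowns, 3 nonzeros.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 120))
x_true = np.zeros(120)
x_true[[5, 40, 99]] = [3.0, -2.0, 1.5]
y = A @ x_true
beta_hat, support_hat = decimation_sketch(A, y)
```

The sketch uses an i.i.d. Gaussian matrix only because it is easy to generate; it says nothing about the highly correlated matrices the abstract highlights.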


Post-l1-penalized estimators in high-dimensional linear regression models

10.1920/wp.cem.2010.1310 ◽

2010 ◽

Cited By ~ 1

Author(s):

Victor Chernozhukov ◽

Alexandre Belloni

Keyword(s):

Linear Regression ◽

Regression Models ◽

High Dimensional ◽

Linear Regression Models


Sequential Model Averaging for High Dimensional Linear Regression Models

SSRN Electronic Journal ◽

10.2139/ssrn.2896533 ◽

2017 ◽

Author(s):

Wei Lan ◽

Yingying Ma ◽

Junlong Zhao ◽

Hansheng Wang ◽

Chih-Ling Tsai

Keyword(s):

Linear Regression ◽

Regression Models ◽

Model Averaging ◽

High Dimensional ◽

Linear Regression Models ◽

Sequential Model


Seagull: lasso, group lasso and sparse-group lasso regularization for linear regression models via proximal gradient descent

BMC Bioinformatics ◽

10.1186/s12859-020-03725-w ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Jan Klosa ◽

Noah Simon ◽

Pål Olof Westermark ◽

Volkmar Liebscher ◽

Dörte Wittenburg

Keyword(s):

Linear Regression ◽

Regression Models ◽

Gradient Descent ◽

Methylation Status ◽

R Package ◽

Group Lasso ◽

High Dimensional ◽

Linear Regression Models ◽

Sparse Group Lasso ◽

Proximal Gradient Descent

Abstract Background: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull, the R package presented here, produces complete regularization paths. Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
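The combination described in this abstract (proximal gradient descent, backtracking line search, and warm starts along a penalty grid) can be sketched for the plain lasso as follows. This is an illustrative Python sketch, not seagull's R interface or its implementation: the names `soft_threshold` and `lasso_path`, the objective scaling (1/(2n))·||y − Xb||² + λ·||b||₁, the tolerances, and the simulated data are assumptions, and the group and sparse-group penalties are not covered.

```python
import numpy as np

def soft_threshold(v, thr):
    # Proximal operator of thr * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def lasso_path(X, y, lams, tol=1e-8, max_iter=10000):
    """Lasso solutions for each penalty in lams, reusing the previous solution."""
    n, p = X.shape
    b = np.zeros(p)
    path = []
    for lam in lams:           # warm start: b carries over from the previous lam
        step = 1.0
        for _ in range(max_iter):
            r = X @ b - y
            f = 0.5 * (r @ r) / n
            grad = X.T @ r / n
            while True:        # backtracking line search on the step size
                b_new = soft_threshold(b - step * grad, step * lam)
                d = b_new - b
                r_new = X @ b_new - y
                f_new = 0.5 * (r_new @ r_new) / n
                # Accept once the quadratic upper bound on the smooth part holds.
                if f_new <= f + grad @ d + (d @ d) / (2.0 * step):
                    break
                step *= 0.5
            b = b_new
            if np.linalg.norm(d) <= tol:
                break
        path.append(b.copy())
    return np.array(path)

# Hypothetical simulated data: 40 observations, 10 features, 2 true effects.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 10))
b_true = np.zeros(10)
b_true[[1, 4]] = [2.0, -3.0]
y = X @ b_true + 0.1 * rng.standard_normal(40)
lam_max = np.max(np.abs(X.T @ y)) / len(y)   # smallest penalty giving b = 0
lams = lam_max * np.geomspace(1.0, 0.01, 20)
path = lasso_path(X, y, lams)                # one row of coefficients per penalty
```

Running the grid from the largest penalty downward means each solve starts close to its answer, which is the usual motivation for warm starts.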


An Improved Forward Regression Variable Selection Algorithm for High-Dimensional Linear Regression Models

IEEE Access ◽

10.1109/access.2020.3009377 ◽

2020 ◽

Vol 8 ◽

pp. 129032-129042

Author(s):

Yanxi Xie ◽

Yuewen Li ◽

Zhijie Xia ◽

Ruixia Yan

Keyword(s):

Linear Regression ◽

Variable Selection ◽

Regression Models ◽

High Dimensional ◽

Linear Regression Models ◽

Selection Algorithm


Empirical likelihood for high-dimensional linear regression models

Metrika ◽

10.1007/s00184-013-0479-z ◽

2013 ◽

Vol 77 (7) ◽

pp. 921-945 ◽

Cited By ~ 3

Author(s):

Hong Guo ◽

Changliang Zou ◽

Zhaojun Wang ◽

Bin Chen

Keyword(s):

Linear Regression ◽

Empirical Likelihood ◽

Regression Models ◽

High Dimensional ◽

Linear Regression Models


Emergency department and ‘Google flu trends’ data as syndromic surveillance indicators for seasonal influenza

Epidemiology and Infection ◽

10.1017/s0950268813003464 ◽

2014 ◽

Vol 142 (11) ◽

pp. 2397-2405 ◽

Cited By ~ 18

Author(s):

L. H. THOMPSON ◽

M. T. MALIK ◽

A. GUMEL ◽

T. STROME ◽

S. M. MAHMUD

Keyword(s):

Emergency Department ◽

Linear Regression ◽

Syndromic Surveillance ◽

Regression Models ◽

Seasonal Influenza ◽

Linear Regression Models ◽

Physician Visits ◽

Influenza Activity ◽

Ed Visits ◽

Highly Correlated

SUMMARY We evaluated syndromic indicators of influenza disease activity developed using emergency department (ED) data – total ED visits attributed to influenza-like illness (ILI) (‘ED ILI volume’) and percentage of visits attributed to ILI (‘ED ILI percent’) – and Google flu trends (GFT) data (ILI cases/100 000 physician visits). Congruity and correlation among these indicators and between these indicators and weekly count of laboratory-confirmed influenza in Manitoba were assessed graphically using linear regression models. Both ED and GFT data performed well as syndromic indicators of influenza activity, and were highly correlated with each other in real time. The strongest correlations between virological data and ED ILI volume and ED ILI percent, respectively, were 0·77 and 0·71. The strongest correlation of GFT was 0·74. Seasonal influenza activity may be effectively monitored using ED and GFT data.
