An adaptive shortest-solution guided decimation approach to sparse high-dimensional linear regression

Author(s):  
Xue Yu ◽  
Yifan Sun ◽  
Hai-Jun Zhou

Abstract The high-dimensional linear regression model is the most popular statistical model for high-dimensional data, but achieving a sparse set of regression coefficients is quite a challenging task. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the least-squares solution of the recursively decimated linear equations, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xue Yu ◽  
Yifan Sun ◽  
Hai-Jun Zhou

Abstract The high-dimensional linear regression model is the most popular statistical model for high-dimensional data, but achieving a sparse set of regression coefficients is quite a challenging task. In this paper, we propose a simple heuristic algorithm to construct sparse high-dimensional linear regression models, which is adapted from the shortest-solution guided decimation algorithm and is referred to as ASSD. This algorithm constructs the support of regression coefficients under the guidance of the shortest least-squares solution of the recursively decimated linear models, and it applies an early-stopping criterion and a second-stage thresholding procedure to refine this support. Our extensive numerical results demonstrate that ASSD outperforms LASSO, adaptive LASSO, vector approximate message passing, and two other representative greedy algorithms in solution accuracy and robustness. ASSD is especially suitable for linear regression problems with highly correlated measurement matrices encountered in real-world applications.
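The abstract describes the procedure only at a high level, so the following is a loose illustrative sketch of the general idea and not the authors' ASSD algorithm. It assumes a simplified greedy variant: at each step the minimum-norm (shortest) least-squares solution of the residual system over the not-yet-selected columns picks the next support index, the coefficients are refitted on the current support, a relative-residual test serves as the early-stopping rule, and a final hard threshold prunes small coefficients. The function name `ssd_style_support` and the parameters `tol` and `threshold` are hypothetical.

```python
import numpy as np


def ssd_style_support(A, y, max_size=None, tol=1e-8, threshold=1e-6):
    """Greedy support construction guided by shortest least-squares solutions.

    Simplified illustration only; the decimation step of the published
    algorithm is replaced here by a plain residual refit (an assumption).
    """
    n, p = A.shape
    max_size = min(n, p) if max_size is None else max_size
    active = np.ones(p, dtype=bool)   # columns not yet in the support
    support = []
    beta_s = np.zeros(0)
    residual = y.astype(float)
    y_norm = np.linalg.norm(y)

    # Early stopping (assumed form): quit once the residual is negligible.
    while len(support) < max_size and np.linalg.norm(residual) > tol * y_norm:
        remaining = np.flatnonzero(active)
        # lstsq returns the minimum-norm solution when the system is
        # underdetermined, i.e. the "shortest" solution.
        z, *_ = np.linalg.lstsq(A[:, remaining], residual, rcond=None)
        pick = int(remaining[np.argmax(np.abs(z))])
        support.append(pick)
        active[pick] = False
        # Refit on the current support and update the residual.
        beta_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ beta_s

    beta = np.zeros(p)
    beta[support] = beta_s
    # Second-stage thresholding (assumed form): drop tiny coefficients.
    beta[np.abs(beta) < threshold] = 0.0
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 100))
    x_true = np.zeros(100)
    x_true[[5, 40, 77]] = [3.0, -2.5, 2.0]
    beta_hat = ssd_style_support(A, A @ x_true)
    print("recovered support:", np.flatnonzero(beta_hat))
```

The demo uses a noise-free synthetic problem with an uncorrelated Gaussian matrix; it says nothing about the correlated-matrix regime the abstract highlights.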


2017 ◽  
Author(s):  
Wei Lan ◽  
Yingying Ma ◽  
Junlong Zhao ◽  
Hansheng Wang ◽  
Chih-Ling Tsai

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Jan Klosa ◽  
Noah Simon ◽  
Pål Olof Westermark ◽  
Volkmar Liebscher ◽  
Dörte Wittenburg

Abstract Background: Statistical analyses of biological problems in life sciences often lead to high-dimensional linear models. To solve the corresponding system of equations, penalization approaches are often the methods of choice. They are especially useful in case of multicollinearity, which appears if the number of explanatory variables exceeds the number of observations or for some biological reason. Then, the model goodness of fit is penalized by some suitable function of interest. Prominent examples are the lasso, group lasso and sparse-group lasso. Here, we offer a fast and numerically cheap implementation of these operators via proximal gradient descent. The grid search for the penalty parameter is realized by warm starts. The step size between consecutive iterations is determined with backtracking line search. Finally, seagull (the R package presented here) produces complete regularization paths. Results: Publicly available high-dimensional methylation data are used to compare seagull to the established R package SGL. The results of both packages enabled a precise prediction of biological age from DNA methylation status. But even though the results of seagull and SGL were very similar (R² > 0.99), seagull computed the solution in a fraction of the time needed by SGL. Additionally, seagull enables the incorporation of weights for each penalized feature. Conclusions: The following operators for linear regression models are available in seagull: lasso, group lasso, sparse-group lasso and Integrative LASSO with Penalty Factors (IPF-lasso). Thus, seagull is a convenient envelope of lasso variants.
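To make the three ingredients named in this abstract concrete (proximal gradient descent, backtracking line search, warm starts along a penalty grid), here is a minimal sketch for the plain lasso only. It is written in Python with NumPy, not R, and it is not the seagull implementation or its API. The objective is assumed to be (1/2n)·||y − Xβ||² + λ·||β||₁, and the names `lasso_prox_grad`, `lasso_path`, `step`, and `shrink` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_prox_grad(X, y, lam, beta0=None, step=1.0, shrink=0.5,
                    max_iter=1000, tol=1e-8):
    """Proximal gradient descent for the lasso with backtracking line search."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)

    def smooth(b):
        r = X @ b - y
        return 0.5 * (r @ r) / n

    for _ in range(max_iter):
        grad = X.T @ (X @ beta - y) / n
        f_old = smooth(beta)
        t = step
        while True:
            cand = soft_threshold(beta - t * grad, t * lam)
            diff = cand - beta
            # Accept the step once the quadratic upper bound holds;
            # the floor on t is only a numerical safety guard.
            bound = f_old + grad @ diff + (diff @ diff) / (2.0 * t)
            if smooth(cand) <= bound or t < 1e-12:
                break
            t *= shrink
        converged = np.linalg.norm(diff) <= tol * max(1.0, np.linalg.norm(beta))
        beta = cand
        if converged:
            break
    return beta


def lasso_path(X, y, lams):
    """Regularization path: largest penalty first, each fit warm-started."""
    beta = np.zeros(X.shape[1])
    path = []
    for lam in sorted(lams, reverse=True):
        beta = lasso_prox_grad(X, y, lam, beta0=beta)
        path.append((lam, beta.copy()))
    return path


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 10))
    y = X[:, 0] * 2.0 + 0.1 * rng.standard_normal(40)
    for lam, b in lasso_path(X, y, [1.0, 0.3, 0.1]):
        print(lam, np.count_nonzero(b))
```

Starting from the largest penalty means the first solution is typically very sparse, and each later fit begins close to its own solution, which is the usual motivation for warm starts.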


Metrika ◽  
2013 ◽  
Vol 77 (7) ◽  
pp. 921-945 ◽  
Author(s):  
Hong Guo ◽  
Changliang Zou ◽  
Zhaojun Wang ◽  
Bin Chen

2014 ◽  
Vol 142 (11) ◽  
pp. 2397-2405 ◽  
Author(s):  
L. H. THOMPSON ◽  
M. T. MALIK ◽  
A. GUMEL ◽  
T. STROME ◽  
S. M. MAHMUD

SUMMARY We evaluated syndromic indicators of influenza disease activity developed using emergency department (ED) data – total ED visits attributed to influenza-like illness (ILI) (‘ED ILI volume’) and percentage of visits attributed to ILI (‘ED ILI percent’) – and Google flu trends (GFT) data (ILI cases/100 000 physician visits). Congruity and correlation among these indicators and between these indicators and weekly count of laboratory-confirmed influenza in Manitoba were assessed graphically using linear regression models. Both ED and GFT data performed well as syndromic indicators of influenza activity, and were highly correlated with each other in real time. The strongest correlations between virological data and ED ILI volume and ED ILI percent, respectively, were 0·77 and 0·71. The strongest correlation of GFT was 0·74. Seasonal influenza activity may be effectively monitored using ED and GFT data.

