Correction for Optimisation Bias in Structured Sparse High-Dimensional Variable Selection

Author(s):  
Bastien Marquis ◽  
Maarten Jansen
Author(s):  
Kevin He ◽  
Xiang Zhou ◽  
Hui Jiang ◽  
Xiaoquan Wen ◽  
Yi Li

Abstract Modern biotechnologies have produced vast amounts of high-throughput data in which the number of predictors far exceeds the sample size. Penalized variable selection has emerged as a powerful and efficient dimension reduction tool. However, controlling false discoveries (i.e. the inclusion of irrelevant variables) in penalized high-dimensional variable selection presents serious challenges. To control the fraction of false discoveries effectively, we propose a false discovery controlling procedure. The proposed method is general and flexible and works with a broad class of variable selection algorithms, covering not only linear regression but also generalized linear models and survival analysis.
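
To make the controlled quantity concrete, the fraction of false discoveries mentioned in the abstract can be sketched as follows. This only illustrates the definition (selected variables that are in fact irrelevant, divided by all selected variables); it is not the procedure proposed in the paper. The function name `false_discovery_proportion` and the index sets are hypothetical, and in real data the set of relevant variables is unknown.

```python
def false_discovery_proportion(selected, relevant):
    """Fraction of selected variables that are irrelevant.

    Returns 0.0 when nothing is selected (a common convention, assumed here).
    """
    selected = set(selected)
    relevant = set(relevant)
    if not selected:
        return 0.0
    # A false discovery is a selected variable outside the relevant set.
    false_discoveries = selected - relevant
    return len(false_discoveries) / len(selected)


# Hypothetical example: a selection algorithm picked five predictors,
# of which three are among the truly relevant ones.
selected = [0, 3, 7, 12, 40]
relevant = [0, 3, 7, 9]
fdp = false_discovery_proportion(selected, relevant)
print(fdp)
```

In a simulation study, where the relevant set is known by construction, averaging this proportion over repeated runs gives an empirical estimate of the false discovery rate.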


2019 ◽  
Vol 38 (13) ◽  
pp. 2413-2427
Author(s):  
Thomas Welchowski ◽  
Verena Zuber ◽  
Matthias Schmid

2018 ◽  
Vol 28 (4) ◽  
pp. 1230-1246
Author(s):  
Yoonsuh Jung ◽  
Hong Zhang ◽  
Jianhua Hu

High-dimensional data are often encountered in biomedical, environmental, and other studies. For example, in biomedical studies that involve high-throughput omic data, an important problem is to search for genetic variables that are predictive of a particular phenotype. A conventional solution is to characterize such relationships through regression models in which a phenotype is treated as the response variable and the variables are treated as covariates; this approach becomes particularly challenging when the number of variables exceeds the number of samples. We propose a general framework for expressing the transformed mean of high-dimensional variables in an exponential distribution family via ANOVA models in which a low-rank interaction space captures the association between the phenotype and the variables. This alternative method transforms the variable selection problem into a well-posed problem with the number of observations larger than the number of variables. In addition, we propose a model selection criterion for the new model framework with a diverging number of parameters, and establish the consistency of the selection criterion. We demonstrate the appealing performance of the proposed method in terms of prediction and detection accuracy through simulations and real data analyses.

