VARIABLE SELECTION IN ANALYZING LIFE INFANT BIRTH IN INDONESIA USING GROUP LASSO AND GROUP SCAD

Author(s): Ita Wulandari, Khairil Anwar Notodiputro, Bagus Sartono
Biometrika, 2009, Vol 96 (2), pp. 339-355
Author(s): Jian Huang, Shuangge Ma, Huiliang Xie, Cun-Hui Zhang

Abstract: In multiple regression problems when covariates can be naturally grouped, it is important to carry out feature selection at the group and within-group individual variable levels simultaneously. The existing methods, including the lasso and group lasso, are designed for either variable selection or group selection, but not for both. We propose a group bridge approach that is capable of simultaneous selection at both the group and within-group individual variable levels. The proposed approach is a penalized regularization method that uses a specially designed group bridge penalty. It has the oracle group selection property, in that it can correctly select important groups with probability converging to one. In contrast, the group lasso and group least angle regression methods in general do not possess such an oracle property in group selection. Simulation studies indicate that the group bridge has superior performance in group and individual variable selection relative to several existing methods.
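
A minimal sketch in Python of the penalized least-squares criterion underlying the group bridge approach described above. The function names, the default bridge index gamma = 0.5, and the size-based group weights are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def group_bridge_penalty(beta, groups, lam=1.0, gamma=0.5, weights=None):
    """Group bridge penalty: lam * sum_j c_j * (sum_{k in A_j} |beta_k|)^gamma,
    with 0 < gamma < 1, applied to the L1 norm of each coefficient group."""
    groups = [np.asarray(g) for g in groups]
    if weights is None:
        # Illustrative choice: scale each group by its size raised to 1 - gamma.
        weights = [len(g) ** (1.0 - gamma) for g in groups]
    return lam * sum(c * np.sum(np.abs(beta[g])) ** gamma
                     for c, g in zip(weights, groups))

def penalized_objective(beta, X, y, groups, lam=1.0, gamma=0.5):
    """Least-squares loss plus the group bridge penalty."""
    resid = y - X @ beta
    return resid @ resid + group_bridge_penalty(beta, groups, lam, gamma)
```

Because 0 < gamma < 1, the penalty can zero out an entire group while still allowing within-group sparsity, which is the simultaneous group-level and individual-level selection the abstract refers to.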


2014, Vol 85 (13), pp. 2750-2760
Author(s): Kuangnan Fang, Xiaoyan Wang, Shengwei Zhang, Jianping Zhu, Shuangge Ma

2016, Vol 4 (5), pp. 476-488
Author(s): Xiaodong Xie, Shaozhi Zheng

Abstract: Cox's proportional hazards models with time-varying coefficients offer great flexibility for modeling the dynamics of covariate effects. Although many variable selection procedures have been developed for Cox's proportional hazards model, the study of such models with time-varying coefficients appears to be limited. Variable selection methods involving nonconvex penalty functions, such as the minimax concave penalty (MCP), introduce numerical challenges, but they have attractive theoretical properties and have been shown to be worthwhile alternatives to other competitive methods. We propose a group MCP method that uses B-spline bases to expand the coefficients and maximizes the log partial likelihood with nonconvex penalties on grouped regression coefficients. A fast, iterative group shooting algorithm is used for model selection and estimation. Under appropriate conditions, a simulated example shows that our method performs competitively with the group lasso method. In this comparison, the group MCP and group lasso methods select the same number of important covariates, but the group MCP method tends to outperform group lasso in excluding unimportant covariates.
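
A minimal Python sketch of the group MCP penalty the abstract describes applying to the B-spline coefficients of each time-varying effect. The default lambda and gamma values, the function names, and the use of the plain group L2 norm are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Minimax concave penalty (MCP) at t >= 0:
    lam*t - t^2/(2*gamma) for t <= gamma*lam, and gamma*lam^2/2 beyond that."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t ** 2 / (2.0 * gamma),
                    0.5 * gamma * lam ** 2)

def group_mcp_penalty(theta, groups, lam=0.5, gamma=3.0):
    """Group MCP: apply the MCP to the L2 norm of each coefficient group.
    Here each group would hold the B-spline basis coefficients of one
    covariate's time-varying effect, so zeroing a group removes that
    covariate from the model entirely."""
    return float(sum(mcp(np.linalg.norm(theta[np.asarray(g)]), lam, gamma)
                     for g in groups))
```

Unlike the group lasso penalty, the MCP flattens out once a group norm exceeds gamma*lam, so large groups of coefficients are not shrunk further, which is the source of the reduced bias and better exclusion of unimportant covariates claimed above.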


2020
Author(s): Yihuan Huang, Amanda Kay Montoya

Machine learning methods are being increasingly adopted in psychological research. Lasso performs variable selection and regularization, and is particularly appealing to psychology researchers because of its connection to linear regression. Researchers often assume that properties of linear regression carry over to lasso; however, we demonstrate that this does not hold for models with categorical predictors. Specifically, the coding strategy used for categorical predictors impacts lasso's performance but not that of linear regression. Group lasso is an alternative to lasso for models with categorical predictors. We demonstrate the inconsistency of lasso and group lasso models using a real data set: lasso performs different variable selection and has different prediction accuracy depending on the coding strategy, whereas group lasso performs consistent variable selection but still has different prediction accuracy across coding strategies. Additionally, group lasso may include many predictors when very few are needed, leading to overfitting. Using Monte Carlo simulation, we show that categorical variables with one group mean differing from all others (one dominant group) are more likely to be included in the model by group lasso than by lasso, leading to overfitting. This effect is strongest when the mean difference is large and there are many categories. Researchers primarily focus on the similarity between linear regression and lasso, but pay little attention to their different properties. This project demonstrates that when using lasso and group lasso, the effect of coding strategies should be considered. We conclude with recommended solutions to this issue and future directions of exploration to improve implementation of machine learning approaches in psychological science.
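
A small self-contained Python sketch of the coding-strategy effect described above, using scikit-learn. The simulated data (one dominant group), the penalty value alpha=0.05, and the particular dummy and effect codings are illustrative assumptions rather than the authors' study design.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
cat = rng.integers(0, 4, size=200)                       # a four-level categorical predictor
y = np.where(cat == 0, 1.0, 0.0) + rng.normal(scale=0.5, size=200)  # one dominant group

# Dummy (treatment) coding: level 0 is the reference category.
dummy = pd.get_dummies(pd.Series(cat), drop_first=True).to_numpy(dtype=float)

# Effect (sum-to-zero) coding: the reference level is coded -1 on every column.
effect = dummy.copy()
effect[cat == 0, :] = -1.0

for name, X in [("dummy coding", dummy), ("effect coding", effect)]:
    ols = LinearRegression().fit(X, y)
    lasso = Lasso(alpha=0.05).fit(X, y)
    print(name,
          "| OLS R^2:", round(ols.score(X, y), 3),                 # identical across codings
          "| lasso nonzero coefs:", int(np.count_nonzero(lasso.coef_)))  # can differ
```

Because both codings span the same column space, ordinary least squares gives identical fitted values either way, while the lasso penalty acts on the individual coded columns and so can select different columns and achieve different accuracy under different codings.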


2016, Vol 5 (6), pp. 57
Author(s): Kristi Mai, Qingyang Zhang

Next-generation sequencing has been routinely applied to cancer biology, making it possible for researchers to elucidate the molecular mechanisms underlying cancer initiation and progression. However, identifying oncomarkers from massive, complex genomic data poses a great challenge for both modeling and computing. In this paper, we propose a novel computational pipeline to identify genes related to the overall survival of ovarian cancer patients from the rich Cancer Genome Atlas data. Different from existing studies, we incorporate the dependence structure among genes and pathway information into the variable selection. First, the dimensionality of the ovarian cancer data is reduced by a novel stepwise feature screening which mimics the hierarchy of the underlying causal network. The second step of the pipeline divides genes into clusters with distinct cellular functions using the k-means, x-means, and PAMSAM learning algorithms. In the final step, we fit a Cox proportional hazards model with a sparse group lasso penalty for further variable selection. Of the 115 genes in the final list, many were reported to be associated with cancer initiation or progression in the literature. In addition, we find several gene families, including the NEK family and RNF family, which are closely associated with the survival of ovarian cancer patients.
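
A minimal Python sketch, using scikit-learn, of the gene-clustering step in the middle of the pipeline described above. The simulated expression matrix, the cluster count, and the use of plain k-means (rather than x-means or PAMSAM) are illustrative assumptions, and fitting the final sparse group lasso Cox model would require a specialized solver not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical expression matrix: rows are patients, columns are screened genes.
rng = np.random.default_rng(1)
expr = rng.normal(size=(300, 120))

# Cluster genes (columns) by their standardized expression profiles across
# patients, so each cluster can serve as one group in the sparse group lasso penalty.
profiles = StandardScaler().fit_transform(expr).T          # one row per gene
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(profiles)

groups = [np.flatnonzero(kmeans.labels_ == k) for k in range(8)]
for k, g in enumerate(groups):
    print(f"cluster {k}: {len(g)} genes")
# These gene groups would then define the group structure used when fitting
# the Cox proportional hazards model with a sparse group lasso penalty.
```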

