Variable selection for skew-normal mixture of joint location and scale models

2021 ◽  
Vol 36 (4) ◽  
pp. 475-491
Author(s):  
Liu-cang Wu ◽  
Song-qin Yang ◽  
Ye Tao

Although there are many papers on variable selection methods based on the mean model in finite mixture of regression models, little work has been done on how to select significant explanatory variables when modeling the variance parameter. In this paper, we propose and study a novel class of models: a skew-normal mixture of joint location and scale models for analyzing heteroscedastic skew-normal data from a heterogeneous population. The problem of variable selection for the proposed models is considered. In particular, a modified Expectation-Maximization (EM) algorithm for estimating the model parameters is developed. The consistency and the oracle property of the penalized estimators are established. Simulation studies are conducted to investigate the finite sample performance of the proposed methodologies. The proposed methodologies are illustrated with an example.
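
The abstract does not state the model's parameterization, so the following is only a minimal sketch under assumptions: each mixture component is taken to follow the Azzalini skew-normal density, with location `mu = x'beta` and log-variance `log(sigma^2) = z'gamma`. The function names and the `components` layout are hypothetical and are not taken from the paper.

```python
import math

def skew_normal_pdf(y, mu, sigma, lam):
    """Azzalini skew-normal density (assumed form): 2/sigma * phi(z) * Phi(lam * z)."""
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))    # standard normal cdf at lam * z
    return 2.0 / sigma * phi * Phi

def mixture_pdf(y, x, z, components):
    """Mixture density; components is a list of (weight, beta, gamma, lam) tuples.

    Assumption: location is x'beta and log-variance is z'gamma in every component.
    """
    total = 0.0
    for weight, beta, gamma, lam in components:
        mu = sum(b * xi for b, xi in zip(beta, x))            # location model
        log_var = sum(g * zi for g, zi in zip(gamma, z))      # scale model on the log-variance
        sigma = math.exp(0.5 * log_var)
        total += weight * skew_normal_pdf(y, mu, sigma, lam)
    return total
```

Under this assumed form, variable selection would amount to deciding which entries of each `beta` and `gamma` are zero.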

2017 ◽  
Vol 18 (1) ◽  
pp. 3-23
Author(s):  
Eva Cantoni ◽  
Marie Auda

When count data exhibit excess zeros, that is, more zero counts than a simpler parametric distribution can model, the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) models are often used. Variable selection for these models is even more challenging than for other regression situations because the availability of p covariates implies 4^p possible models. We adapt to zero-inflated models an approach for variable selection that avoids the screening of all possible models. This approach is based on a stochastic search through the space of all possible models, which generates a chain of interesting models. As an additional novelty, we propose three ways of extracting information from this rich chain and we compare them in two simulation studies, where we also contrast our approach with regularization (penalized) techniques available in the literature. The analysis of a typical dataset that has motivated our research is also presented, before concluding with some recommendations.
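
The model count can be checked by brute force. The sketch below assumes that each of the p covariates may independently enter or leave both the count part and the zero-inflation part of the model, which gives 2^p x 2^p = 4^p candidates; `count_zip_models` is a hypothetical helper written for illustration.

```python
from itertools import product

def count_zip_models(p):
    """Count candidate models by enumerating inclusion indicators.

    Assumption: one in/out indicator per covariate for the count part and
    one for the zero-inflation part, hence 2 * p binary indicators in total.
    """
    return sum(1 for _ in product([0, 1], repeat=2 * p))

for p in (1, 2, 5):
    # The brute-force count agrees with the closed form 4 ** p.
    print(p, count_zip_models(p), 4 ** p)
```

The count grows quickly with p, which is why an exhaustive screening is avoided in favor of a stochastic search.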


2021 ◽  
pp. 096228022110239
Author(s):  
Dongxiao Han ◽  
Lianqiang Qu ◽  
Liuquan Sun ◽  
Yanqing Sun

In HIV vaccine efficacy trials, mark-specific hazards models have important applications and can be used to evaluate the strain-specific vaccine efficacy. Additive hazards models have been widely used in practice, especially when continuous covariates are present. In this article, we conduct variable selection for a mark-specific additive hazards model. The proposed method is based on an estimating equation with the first derivative of the adaptive LASSO penalty function. The asymptotic properties of the resulting estimators are established. The finite sample behavior of the proposed estimators is evaluated through simulation studies, and an application to a dataset from the first HIV vaccine efficacy trial is provided.
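
The abstract does not give the penalty term explicitly. As a rough sketch, assuming the usual adaptive LASSO penalty `lam * sum_j w_j * |beta_j|` with weights `w_j = 1 / |beta_init_j| ** gamma`, its first derivative with respect to a nonzero `beta_j` is `lam * w_j * sign(beta_j)`. The function name, the `gamma` exponent, and the use of an initial estimate `beta_init` are assumptions made for illustration.

```python
def adaptive_lasso_penalty_derivative(beta, beta_init, lam, gamma=1.0):
    """Derivative of the assumed penalty lam * sum_j w_j * |beta_j| at beta.

    Assumptions: w_j = 1 / |beta_init_j| ** gamma, and every initial
    estimate is nonzero (otherwise the weight is undefined).
    """
    derivative = []
    for b, b0 in zip(beta, beta_init):
        weight = 1.0 / abs(b0) ** gamma      # data-driven weight: small initial estimates are penalized more
        sign = (b > 0) - (b < 0)             # sign of beta_j, 0 at exactly zero
        derivative.append(lam * weight * sign)
    return derivative
```

In a penalized estimating-equation approach, a term of this kind would be subtracted from the unpenalized estimating function before solving for the coefficients.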


2016 ◽  
Author(s):  
Anders Eklund ◽  
Martin A. Lindquist ◽  
Mattias Villani

We propose a voxel-wise general linear model with autoregressive noise and heteroscedastic noise innovations (GLMH) for analyzing functional magnetic resonance imaging (fMRI) data. The model is analyzed from a Bayesian perspective and has the benefit of automatically down-weighting time points close to motion spikes in a data-driven manner. We develop a highly efficient Markov Chain Monte Carlo (MCMC) algorithm that allows for Bayesian variable selection among the regressors to model both the mean (i.e., the design matrix) and variance. This makes it possible to include a broad range of explanatory variables in both the mean and variance (e.g., time trends, activation stimuli, head motion parameters and their temporal derivatives), and to compute the posterior probability of inclusion from the MCMC output. Variable selection is also applied to the lags in the autoregressive noise process, making it possible to infer the lag order from the data simultaneously with all other model parameters. We use both simulated data and real fMRI data from OpenfMRI to illustrate the importance of proper modeling of heteroscedasticity in fMRI data analysis. Our results show that the GLMH tends to detect more brain activity, compared to its homoscedastic counterpart, by allowing the variance to change over time depending on the degree of head motion.
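
Computing a posterior inclusion probability from MCMC output can be sketched as follows, assuming the sampler stores one 0/1 inclusion indicator per candidate regressor at every retained draw. The storage layout and the function name are hypothetical; the paper's own implementation is not described here.

```python
def posterior_inclusion_probabilities(indicator_draws):
    """Estimate P(variable j included | data) as the share of draws that include it.

    indicator_draws: list of MCMC draws, each a list of 0/1 indicators
    (assumed layout, one indicator per candidate regressor).
    """
    n_draws = len(indicator_draws)
    n_vars = len(indicator_draws[0])
    return [sum(draw[j] for draw in indicator_draws) / n_draws
            for j in range(n_vars)]

# Four toy draws over two candidate regressors.
draws = [[1, 0], [1, 1], [1, 0], [0, 0]]
print(posterior_inclusion_probabilities(draws))  # [0.75, 0.25]
```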


2019 ◽  
Vol 45 (2) ◽  
pp. 119-142
Author(s):  
Bryan Keller

Widespread availability of rich educational databases facilitates the use of conditioning strategies to estimate causal effects with nonexperimental data. With dozens, hundreds, or more potential predictors, variable selection can be useful for practical reasons related to communicating results and for statistical reasons related to improving the efficiency of estimators. Background knowledge should take precedence in deciding which variables to retain. However, with many potential predictors, theory may be weak, such that functional form relationships are likely to be unknown. In this article, I propose a nonparametric method for data-driven variable selection based on permutation testing with conditional random forest variable importance. The algorithm automatically handles nonlinear relationships and interactions in its naive implementation. Through a series of Monte Carlo simulation studies and a case study with Early Childhood Longitudinal Study–K data, I find that the method performs well across a variety of scenarios where other methods fail.
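
The general idea of permutation testing on a variable importance score can be sketched as below. This is not the article's algorithm: conditional random forest importance is replaced by a deliberately simple stand-in (absolute Pearson correlation), and the permutation scheme (shuffling the outcome) is an assumption. All function names are hypothetical.

```python
import random

def abs_correlation(x, y):
    """Stand-in importance score: absolute Pearson correlation (not a random forest)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def permutation_p_values(columns, y, n_perm=199, seed=0):
    """Permutation p-value per predictor column for the importance score above."""
    rng = random.Random(seed)
    observed = [abs_correlation(col, y) for col in columns]
    exceed = [0] * len(columns)
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)                  # break any link between predictors and outcome
        for j, col in enumerate(columns):
            if abs_correlation(col, y_perm) >= observed[j]:
                exceed[j] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(1 + e) / (n_perm + 1) for e in exceed]

# Toy data: x0 drives the outcome, x1 is pure noise.
rng = random.Random(1)
x0 = [rng.gauss(0, 1) for _ in range(200)]
x1 = [rng.gauss(0, 1) for _ in range(200)]
y = [2 * a + rng.gauss(0, 1) for a in x0]
print(permutation_p_values([x0, x1], y))
```

A predictor would be retained when its p-value falls below a chosen threshold; swapping in a forest-based importance measure changes only the scoring function.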

