Interactions of Genetic and Environment Scores: Alternating Lasso Regularization Avoids Overfitting and Finds Interpretable Scores

Regression models with interaction terms are commonly used to model moderating relationships. When several predictors from one group, e.g., genetic variables, are potentially moderated by several predictors from another, e.g., environmental variables, many interaction terms result. This complicates model interpretation, especially when coefficient signs point in different directions. Forming a score for each group of predictors first sharply reduces the interaction model's dimension. The hierarchical score model is an elegant one-step approach: score weights and regression model coefficients are estimated simultaneously by an alternating optimization (AO) algorithm. Especially in high-dimensional settings, scores remain an effective technique to reduce interaction model dimension, and we propose regularization to ensure sparsity and interpretability of the score weights. A non-trivial extension of the original AO algorithm is presented, which adds a lasso penalty, resulting in the alternating lasso optimization algorithm (ALOA). The hierarchical score model with ALOA is an interpretable statistical learning technique for moderation in potentially high-dimensional applications, and encompasses generalized linear models for the main interaction model. In addition to the lasso regularization, a screening procedure called regularization and residualization (RR) is proposed to avoid spurious interactions. ALOA tuning parameter choice and the RR screening procedure are investigated by simulations, and an illustrative application to lifetime depression risk and gene × environment interactions is provided.
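To make the alternating idea concrete, the following is a minimal sketch, not the authors' ALOA implementation. It assumes a Gaussian main model y = b0 + b1*G + b2*E + b3*G*E with scores G = Xg @ wg and E = Xe @ we, updates one block of quantities at a time while holding the others fixed, and applies a lasso penalty to each set of score weights. The function names, the hand-written coordinate-descent lasso, the rescaling of weights to unit L1 norm, and the default values are all illustrative assumptions; the generalized linear model case and the RR screening step are not covered.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()          # running residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]         # remove coordinate j from the fit
            w[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * w[j]         # add the updated coordinate back
    return w

def fit_main_model(G, E, y):
    """Least squares for (b0, b1, b2, b3) given fixed scores G and E."""
    D = np.column_stack([np.ones_like(G), G, E, G * E])
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    return b

def update_weights(X, other_score, b_main, b_int, target, lam, w_old):
    """Lasso step for one score's weights with everything else held fixed.

    With the other score and b fixed, the model is linear in w:
    target = (X * (b_main + b_int * other_score)) @ w.
    """
    Z = X * (b_main + b_int * other_score)[:, None]
    w = lasso_cd(Z, target, lam)
    norm = np.abs(w).sum()
    # Assumption: rescale to unit L1 norm to fix the scale shared with b;
    # if the penalty zeroes every weight, keep the previous weights.
    return w / norm if norm > 0 else w_old

def fit_score_interaction(Xg, Xe, y, lam=0.05, n_outer=20):
    """Alternate between main-model coefficients and the two weight vectors."""
    wg = np.full(Xg.shape[1], 1.0 / Xg.shape[1])
    we = np.full(Xe.shape[1], 1.0 / Xe.shape[1])
    for _ in range(n_outer):
        G, E = Xg @ wg, Xe @ we
        b = fit_main_model(G, E, y)
        wg = update_weights(Xg, E, b[1], b[3], y - b[0] - b[2] * E, lam, wg)
        G = Xg @ wg
        b = fit_main_model(G, E, y)     # refit because the scale of G changed
        we = update_weights(Xe, G, b[2], b[3], y - b[0] - b[1] * G, lam, we)
    b = fit_main_model(Xg @ wg, Xe @ we, y)
    return wg, we, b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xg, Xe = rng.standard_normal((500, 6)), rng.standard_normal((500, 5))
    G_true, E_true = Xg[:, :2] @ np.array([0.5, 0.5]), Xe[:, 0]
    y = 1 + G_true + E_true + 2 * G_true * E_true + 0.1 * rng.standard_normal(500)
    wg, we, b = fit_score_interaction(Xg, Xe, y)
    print("genetic weights:", np.round(wg, 3))
    print("environment weights:", np.round(we, 3))
    print("main model coefficients:", np.round(b, 3))
```

The point of the sketch is the dimension reduction: with 6 and 5 predictors the full interaction model would carry 30 product terms, whereas the score model estimates 11 weights and a single interaction coefficient. The penalty value `lam` is a tuning parameter and would need to be chosen for real data, for example by cross-validation.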


Ensemble Linear Subspace Analysis of High-Dimensional Data

Entropy ◽ 10.3390/e23030324 ◽ 2021 ◽ Vol 23 (3) ◽ pp. 324

Author(s): S. Ejaz Ahmed ◽ Saeid Amiri ◽ Kjell Doksum

Keyword(s): Linear Models ◽ Real Data ◽ Penalty Methods ◽ Tuning Parameter ◽ High Dimensional ◽ Prediction Errors ◽ Tuning Parameters ◽ Ensemble Approach ◽ High Dimensional Regression ◽ Multivariate Mutual Information

Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
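The ensemble subspace procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say how subset results are merged, so the sketch simply averages predictions across random covariate subsets; the tuning-parameter grid, subset size, trimming fraction, function names, and the hand-written coordinate-descent lasso are all hypothetical choices. It implements only the trimmed-average version, in which one predictor is built per fixed tuning parameter and the final predictor is their trimmed mean.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1, no intercept."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()          # running residual y - Xw
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

def trimmed_mean(P, frac):
    """Mean over axis 0 after dropping the lowest and highest `frac` of rows."""
    P = np.sort(P, axis=0)
    k = int(frac * P.shape[0])
    return P[k:P.shape[0] - k].mean(axis=0)

def ensemble_lasso_predict(X, y, X_new, lams=(0.2, 0.4, 0.6, 0.8, 1.0),
                           n_subsets=30, subset_size=20, trim=0.2, seed=0):
    """Trimmed average, over fixed tuning parameters, of subspace-ensemble lasso."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    size = min(subset_size, p)
    # Center with training statistics so the lasso needs no intercept term.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc, Xn = X - x_mean, y - y_mean, X_new - x_mean
    per_lam = []
    for lam in lams:
        preds = []
        for _ in range(n_subsets):
            idx = rng.choice(p, size=size, replace=False)  # random covariate subset
            w = lasso_cd(Xc[:, idx], yc, lam)
            preds.append(y_mean + Xn[:, idx] @ w)
        # Assumption: merge the subset-level predictors by a plain average.
        per_lam.append(np.mean(preds, axis=0))
    # One predictor per tuning parameter, combined by a trimmed mean.
    return trimmed_mean(np.array(per_lam), trim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 80, 200                      # more covariates than observations
    beta = np.zeros(p)
    beta[:5] = 3.0
    X = rng.standard_normal((n, p))
    y = X @ beta + rng.standard_normal(n)
    X_new = rng.standard_normal((50, p))
    pred = ensemble_lasso_predict(X, y, X_new, subset_size=40)
    print("test MSE:", np.mean((X_new @ beta - pred) ** 2))
```

Each lasso fit sees only `subset_size` covariates, so every individual problem is lower-dimensional than the original p > n problem. With five tuning parameters and `trim=0.2`, the trimmed mean drops the smallest and largest prediction at each new point and averages the remaining three.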
