Interactions of Genetic and Environment Scores: Alternating Lasso Regularization Avoids Overfitting and Finds Interpretable Scores
Regression models with interaction terms are commonly used to model moderation. When several predictors from one group, e.g., genetic variables, are potentially moderated by several predictors from another, e.g., environmental variables, many interaction terms result. This complicates model interpretation, especially when coefficient signs point in different directions. Forming a score for each group of predictors first greatly reduces the dimension of the interaction model. The hierarchical score model is an elegant one-step approach: score weights and regression model coefficients are estimated simultaneously by an alternating optimization (AO) algorithm. In high-dimensional settings, scores remain an effective way to reduce the dimension of the interaction model, and we propose regularization to ensure sparsity and interpretability of the score weights. We present a non-trivial extension of the original AO algorithm that adds a lasso penalty, resulting in the alternating lasso optimization algorithm (ALOA). The hierarchical score model with ALOA is an interpretable statistical learning technique for moderation in potentially high-dimensional applications, and it encompasses generalized linear models for the main interaction model. In addition to the lasso regularization, we propose a screening procedure called regularization and residualization (RR) to avoid spurious interactions. We investigate the choice of the ALOA tuning parameter and the RR screening procedure in simulations, and we provide an illustrative application to lifetime depression risk and gene × environment interactions.
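To make the alternating idea concrete, the following is a minimal sketch and not the ALOA implementation itself. It assumes a Gaussian linear main model of the form y = b0 + b1*s_G + b2*s_E + b3*s_G*s_E + error, with scores s_G = G w and s_E = E v, and it cycles through three steps: unpenalized least squares for the coefficients b, then a lasso update for w with v held fixed, then a lasso update for v with w held fixed. The function names (`soft_threshold`, `lasso_cd`, `alternating_lasso`), the use of one shared penalty `lam` for both weight vectors, the coordinate-descent solver, and the simulated data are all assumptions made for illustration. The sketch does not cover the generalized linear model case, the RR screening step, tuning parameter selection, or any scale constraint on the score weights.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, beta=None, n_sweeps=50):
    """Coordinate descent for (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1.

    With lam = 0 this is plain least squares (no penalty).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)  # warm start if given
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] > 0:
                # Covariance of column j with the partial residual.
                rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                          for i in range(n)) / n
                new = soft_threshold(rho, lam) / col_ss[j]
            else:
                new = 0.0  # an all-zero column carries no information
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def score(M, weights):
    """Weighted sum of the columns of M: one score value per observation."""
    return [sum(a * c for a, c in zip(row, weights)) for row in M]


def alternating_lasso(G, E, y, lam, n_outer=10):
    """Illustrative alternating scheme (assumed form, not the paper's ALOA)."""
    n = len(y)
    w = [1.0 / len(G[0])] * len(G[0])  # start from equal weights
    v = [1.0 / len(E[0])] * len(E[0])
    b = [0.0, 0.0, 0.0, 0.0]
    for _ in range(n_outer):
        sg, se = score(G, w), score(E, v)
        # Step 1: coefficients of the main model, scores held fixed.
        D = [[1.0, sg[i], se[i], sg[i] * se[i]] for i in range(n)]
        b = lasso_cd(D, y, 0.0, b)
        # Step 2: with v and b fixed, the model is linear in w.
        mult = [b[1] + b[3] * se[i] for i in range(n)]
        Xw = [[mult[i] * g for g in G[i]] for i in range(n)]
        yw = [y[i] - b[0] - b[2] * se[i] for i in range(n)]
        w = lasso_cd(Xw, yw, lam, w)
        sg = score(G, w)
        # Step 3: with w and b fixed, the model is linear in v.
        mult = [b[2] + b[3] * sg[i] for i in range(n)]
        Xv = [[mult[i] * e for e in E[i]] for i in range(n)]
        yv = [y[i] - b[0] - b[1] * sg[i] for i in range(n)]
        v = lasso_cd(Xv, yv, lam, v)
    return b, w, v


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, q = 60, 5, 4
    # Simulated (hypothetical) data: only the first variable per group matters.
    G = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    E = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(n)]
    y = [0.5 * G[i][0] + 0.5 * E[i][0] + G[i][0] * E[i][0] + rng.gauss(0, 0.5)
         for i in range(n)]
    b, w, v = alternating_lasso(G, E, y, lam=0.05)
    print("coefficients:", [round(x, 3) for x in b])
    print("genetic weights:", [round(x, 3) for x in w])
    print("environment weights:", [round(x, 3) for x in v])
```

The point of the sketch is that each weight update becomes an ordinary lasso problem once the other score and the coefficients are held fixed, which is what allows a standard lasso solver to be reused inside the alternating loop. Because the scale of the weights and the coefficients b is not separately identified in this simplified form, the fitted weights are best read in relative terms.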