Abstract
Introduction
This study evaluated seven quantitative methods for their predictive accuracy for intersectionally defined subgroups, via a simulation study. The methods were single-level regression with interaction terms, cross-classification, multilevel analysis of individual heterogeneity and discriminatory accuracy (MAIHDA), and four decision tree methods: classification and regression trees (CART), conditional inference trees (CTree), chi-square automatic interaction detector (CHAID), and random forest. Also evaluated was how well the methods identified variables relevant to the outcome. An example analysis will be presented using data from the U.S. National Health and Nutrition Examination Survey.
Methods
The simulated datasets varied by outcome variable type (binary and continuous), input variable types, sample size, and size and direction of the effects. Accuracy was evaluated using mean squared error or mean absolute percentage error. The secondary outcome, identification of relevant variables, was evaluated via the statistical significance and confidence interval coverage of regression terms and via the variables selected by the machine learning methods.
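As a minimal sketch of the two accuracy metrics named above, the following Python functions compute mean squared error and mean absolute percentage error from paired true and predicted subgroup values. The function names and the example values are illustrative assumptions, not taken from the study, and the study's own handling of each outcome type is not specified here.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> float:
    """Average absolute error relative to the true value, in percent.

    Undefined when any true value is zero, so that case raises an error.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted subgroup means (illustrative values only).
true_means = [100.0, 200.0]
predicted_means = [110.0, 180.0]

mse = mean_squared_error(true_means, predicted_means)
mape = mean_absolute_percentage_error(true_means, predicted_means)
print(f"MSE: {mse:.2f}, MAPE: {mape:.2f}%")
```

Because mean absolute percentage error divides by the true value, it is scale-free but breaks down when true values are at or near zero, whereas mean squared error stays on the squared scale of the outcome.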
Results
Predictive accuracy improved with increasing sample size for all methods except CART. At small sample sizes, random forest and MAIHDA generally produced the most accurate predictions. Variable selection by CTree and CHAID consistently showed a high type I error rate. Although random forest and MAIHDA performed well for prediction, variable selection by random forest and the confidence interval coverage and power of MAIHDA main effects coefficients were suboptimal.
Discussion
From this study emerge recommendations for applying methods in quantitative intersectionality. Different methods are optimal for different purposes: for example, while random forest and MAIHDA performed well for prediction, they were less reliable for variable identification. In our discussion, we will work through how to select, apply, and interpret methods to achieve analytic goals that align with intersectionality theory.