Power analysis for the Wald, LR, score, and gradient tests in a marginal maximum likelihood framework: Applications in IRT
The Wald, likelihood ratio, score, and recently proposed gradient statistics can be used to assess a broad range of hypotheses in item response theory (IRT) models, for instance, to check overall model fit or to detect differential item functioning. We introduce new methods for power analysis and sample size planning that can be applied when marginal maximum likelihood estimation is used. This makes the methods applicable to a variety of IRT models that are increasingly used in practice, e.g., in large-scale educational assessments. An analytical method uses the asymptotic distributions of the statistics under alternative hypotheses. For larger numbers of items, we also provide a sampling-based method, which is necessary because the computational load of the analytical approach increases exponentially with the number of items. We performed extensive simulation studies in two practically relevant settings, i.e., testing a Rasch model against a 2PL model and testing for differential item functioning. The observed distributions of the test statistics and the power of the tests agreed well with the predictions of the proposed methods. We provide an openly accessible R package that implements the methods for user-supplied hypotheses.
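To illustrate the final step of an analytical power analysis, the following is a minimal sketch in Python, not the interface of the R package. It assumes, as in standard asymptotic theory, that under the alternative hypothesis a test statistic approximately follows a noncentral chi-square distribution with degrees of freedom equal to the number of restricted parameters and a noncentrality parameter proportional to the sample size. All function names and the per-observation noncentrality value are hypothetical and chosen for illustration only. Obtaining the noncentrality parameter for a specific IRT hypothesis is the substantive part of the method and is not shown here.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the series for the regularized
    lower incomplete gamma function (intended for moderate x)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # P(a, z) = z^a * exp(-z) * sum_{n>=0} z^n / Gamma(a + n + 1)
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total, n = term, 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    return min(total, 1.0)


def ncx2_cdf(x, df, nc):
    """Noncentral chi-square CDF as a Poisson(nc / 2) mixture of
    central chi-square CDFs with df + 2j degrees of freedom."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    half = nc / 2.0
    total, weight_sum, j = 0.0, 0.0, 0
    while weight_sum < 1.0 - 1e-10 and j < 100000:
        w = math.exp(-half + j * math.log(half) - math.lgamma(j + 1.0))
        total += w * chi2_cdf(x, df + 2 * j)
        weight_sum += w
        j += 1
    return total


def critical_value(df, alpha=0.05):
    """Upper-alpha quantile of the central chi-square, by bisection."""
    lo, hi = 0.0, float(df) + 10.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(nc, df, alpha=0.05):
    """Approximate power: P(statistic > critical value) under H1."""
    return 1.0 - ncx2_cdf(critical_value(df, alpha), df, nc)


def required_n(nc_per_obs, df, target=0.80, alpha=0.05):
    """Smallest N with power >= target, assuming nc = N * nc_per_obs."""
    if nc_per_obs <= 0.0:
        raise ValueError("nc_per_obs must be positive")
    hi = 1
    while power(hi * nc_per_obs, df, alpha) < target:
        hi *= 2
    lo = hi // 2  # either 0 or a sample size with power below target
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power(mid * nc_per_obs, df, alpha) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical effect: noncentrality of 0.01 per observation, 1 df.
    n = required_n(nc_per_obs=0.01, df=1)
    print(n, round(power(n * 0.01, df=1), 4))
```

The sketch relies only on the Python standard library so that each step stays visible. In practice, a vetted implementation of the noncentral chi-square distribution from a statistics library would be preferable to the hand-written series above.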