Power analysis for the Wald, LR, score, and gradient tests in a marginal maximum likelihood framework: Applications in IRT

2021
Author(s): Felix Zimmer, Clemens Draxler, Rudolf Debelak

The Wald, likelihood ratio, score, and recently proposed gradient statistics can be used to assess a broad range of hypotheses in item response theory (IRT) models, for instance, to check overall model fit or to detect differential item functioning. We introduce new methods for power analysis and sample size planning that can be applied when marginal maximum likelihood estimation is used. This makes the methods applicable to a variety of IRT models, which are increasingly used in practice, e.g., in large-scale educational assessments. An analytical method uses the asymptotic distributions of the statistics under alternative hypotheses. For larger numbers of items, we also provide a sampling-based method, which is necessary because the computational load of the analytical approach increases exponentially. We performed extensive simulation studies in two practically relevant settings, i.e., testing a Rasch model against a 2PL model and testing for differential item functioning. The observed distributions of the test statistics and the power of the tests agreed well with the predictions of the proposed methods. We provide an openly accessible R package that implements the methods for user-supplied hypotheses.
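The analytical idea can be illustrated with a minimal sketch. It assumes, as is standard for these four statistics, that the test statistic is approximately noncentral chi-square under the alternative and that the noncentrality parameter grows linearly with the sample size. The function names and the per-person noncentrality value are hypothetical and are not taken from the authors' R package; computing that value for a specific IRT hypothesis is the part the paper addresses and is not shown here. The sketch is written in Python with the standard library only.

```python
import math


def chi2_cdf(x, df):
    """Central chi-square CDF via the power series for the
    regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_critical_value(alpha, df):
    """Upper-alpha quantile of the central chi-square, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(ncp, df, alpha=0.05):
    """P(statistic > critical value) when the statistic is noncentral
    chi-square with noncentrality `ncp`, written as a Poisson mixture
    of central chi-square distributions."""
    crit = chi2_critical_value(alpha, df)
    if ncp <= 0:
        return 1.0 - chi2_cdf(crit, df)
    half = ncp / 2.0
    n_terms = int(half + 10.0 * math.sqrt(half) + 50)  # covers the Poisson mass
    result = 0.0
    for j in range(n_terms):
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        result += math.exp(log_w) * (1.0 - chi2_cdf(crit, df + 2 * j))
    return result


def required_sample_size(ncp_per_person, df, target_power=0.8, alpha=0.05):
    """Smallest n with power >= target, assuming ncp = n * ncp_per_person."""
    n = 1
    while power(n * ncp_per_person, df, alpha) < target_power:
        n *= 2
    lo, hi = n // 2, n  # power at lo is below target (or lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if power(mid * ncp_per_person, df, alpha) >= target_power:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical input: a per-person noncentrality of 0.01 and 1 degree of freedom.
print(required_sample_size(0.01, df=1))
```

Because power is monotone in the noncentrality parameter, the sample size search can bracket the answer by doubling and then bisect over integers.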

2011, Vol 71 (6), pp. 1023-1046
Author(s): Insu Paek, Mark Wilson

This study elaborates the formulation of the Rasch differential item functioning (DIF) model in the context of marginal maximum likelihood estimation. The performance of the Rasch DIF model was also examined and compared with the Mantel–Haenszel (MH) procedure under small-sample and short-test-length conditions through simulations. The theoretically known relationship between the DIF estimators of the Rasch DIF model and the MH procedure was confirmed. In general, the MH method was more conservative in its DIF detection rates than the Rasch DIF model approach. When DIF was present, the z test (when the standard error of the DIF estimator is estimated properly) and the likelihood ratio test in the Rasch DIF model approach showed higher DIF detection rates than the MH chi-square test for sample sizes of 100 to 300 per group and test lengths ranging from 4 to 39. In addition, this study discusses proposed Rasch DIF classification rules that accommodate statistical inference on the direction of DIF.
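For readers unfamiliar with the MH procedure, the following is a minimal sketch of the textbook MH common odds ratio and continuity-corrected chi-square for a single item, with examinees matched on total score. It is not the authors' code, and the counts in the usage example are made up purely for illustration. The log of the common odds ratio is the quantity usually related to the between-group difference in Rasch item difficulty.

```python
import math


def mantel_haenszel(strata):
    """Compute MH statistics for one studied item.

    strata: list of (A, B, C, D) counts, one tuple per matched score level:
      A = reference group correct,  B = reference group incorrect,
      C = focal group correct,      D = focal group incorrect.
    Returns (common odds ratio, log odds ratio, MH chi-square).
    Assumes at least one stratum has B * C > 0 and nonzero variance.
    """
    num = den = 0.0
    sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        n_ref, n_foc = a + b, c + d
        m1, m0 = a + c, b + d  # total correct, total incorrect
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_expected += n_ref * m1 / n
        sum_var += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    alpha_mh = num / den
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    return alpha_mh, math.log(alpha_mh), chi2


# Made-up counts for three score levels (illustration only).
example = [(10, 10, 5, 15), (15, 5, 10, 10), (18, 2, 15, 5)]
alpha_mh, log_alpha, chi2 = mantel_haenszel(example)
print(alpha_mh, log_alpha, chi2)
```

The chi-square value is compared against a chi-square distribution with one degree of freedom.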

