Evaluating Equivalence Testing Methods for Measurement Invariance

2020 ◽  
Author(s):  
Alyssa Counsell ◽  
Rob Cribbie ◽  
David B Flora

Measurement invariance (MI) is often concluded from a nonsignificant chi-square difference test. Researchers have also proposed using changes in goodness-of-fit indices (∆GOFs) instead. Both of these commonly used methods for testing MI have important limitations. To combat these issues, Yuan and Chan (2016) proposed replacing the chi-square difference test commonly used to test MI with an equivalence test (EQ). Due to concerns about the EQ's power, Yuan and Chan also created an adjusted version (EQ-A), but provided little evaluation of either procedure. The current study evaluated the Type I error and power of both the EQ and the EQ-A, and compared their performance to that of the traditional chi-square difference test and ∆GOFs. The EQ for nested model comparisons was the only procedure that always maintained empirical error rates below the nominal alpha level. Results also highlight that, to ensure adequate power, the EQ requires either larger sample sizes than traditional difference-based approaches or equivalence bounds based on larger-than-conventional RMSEA values (e.g., > .05). We do not recommend Yuan and Chan's proposed adjustment (EQ-A) over the EQ.
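
To make the logic of the EQ concrete, here is a minimal Python sketch of an RMSEA-based equivalence test for a nested model comparison, using a noncentral chi-square reference distribution. The function name, the noncentrality convention ncp = (N − 1) · df · ε², and the example numbers are illustrative assumptions, not Yuan and Chan's exact procedure.

```python
# Minimal sketch of an RMSEA-based equivalence test for a nested model
# comparison, in the spirit of (but not identical to) Yuan and Chan (2016).
# Assumes the common noncentrality convention ncp = (N - 1) * df * rmsea**2.
from scipy.stats import ncx2

def equivalence_test(delta_chi2, df, n, rmsea_bound=0.05, alpha=0.05):
    """Test H0: misfit >= bound vs. H1: misfit < bound (equivalence).

    Rejecting H0 supports treating the constrained model (e.g., equal
    loadings across groups) as practically equivalent to the free model.
    """
    ncp = (n - 1) * df * rmsea_bound ** 2      # misfit exactly at the bound
    p_value = ncx2.cdf(delta_chi2, df, ncp)    # lower tail = small misfit
    return p_value, p_value < alpha

# Hypothetical example: chi-square difference of 18.3 on 10 df with N = 400
p, equivalent = equivalence_test(18.3, 10, 400)
print(f"p = {p:.4f}, conclude equivalence: {equivalent}")
```

Note that, unlike the traditional chi-square difference test, a small p-value here supports invariance: the roles of the null and alternative hypotheses are reversed.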


2019 ◽  
Vol 44 (4) ◽  
pp. 282-295
Author(s):  
HyeSun Lee ◽  
Weldon Z. Smith

This study examined whether fit index cutoffs suggested for traditional formats with maximum likelihood estimators can be used to assess model fit and to test measurement invariance when multiple-group confirmatory factor analysis is employed for the Thurstonian item response theory (IRT) model. The performance of the evaluation criteria was assessed via detection rates of measurement non-invariance and Type I error rates. The impact of measurement non-invariance on estimated scores in the Thurstonian IRT model was also examined through accuracy and efficiency in score estimation. The fit indices used for the evaluation of model fit performed well. Among six cutoffs for changes in model fit indices, only ΔCFI > .01 and ΔNCI > .02 detected metric non-invariance of medium magnitude, and none of the cutoffs performed well in detecting scalar non-invariance. Based on the generated sampling distributions of fit index differences, this study suggests ΔCFI > .001 and ΔNCI > .004 for scalar non-invariance and ΔCFI > .007 for metric non-invariance. Considering Type I error rate control and detection rates of measurement non-invariance, ΔCFI is recommended for measurement non-invariance tests with forced-choice format data. Challenges in testing measurement non-invariance in the Thurstonian IRT model are discussed, along with directions for future research to enhance the utility of forced-choice formats in test development for cross-cultural and international settings.
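
For readers unfamiliar with the ΔCFI decision rule, the sketch below shows how such a cutoff is typically applied. The CFI formula follows Bentler (1990); the helper names, the .01 default cutoff, and the example chi-square values are illustrative assumptions.

```python
# Hypothetical helpers illustrating a change-in-CFI decision rule.
# CFI is computed from model and baseline (null-model) chi-squares.

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_baseline - df_baseline, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

def flag_noninvariance(cfi_free, cfi_constrained, cutoff=0.01):
    # Added constraints can only worsen fit; a drop in CFI
    # larger than the cutoff flags non-invariance.
    return (cfi_free - cfi_constrained) > cutoff

# Hypothetical fit results for metric vs. scalar models
cfi_metric = cfi(312.4, 164, 2250.0, 190)
cfi_scalar = cfi(355.8, 176, 2250.0, 190)
print(flag_noninvariance(cfi_metric, cfi_scalar))  # True: drop exceeds .01
```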


2001 ◽  
Vol 26 (1) ◽  
pp. 105-132 ◽  
Author(s):  
Douglas A. Powell ◽  
William D. Schafer

The robustness literature for the structural equation model was synthesized following the method of Harwell, which employs meta-analysis as developed by Hedges and Vevea. The study focused on explaining empirical Type I error rates for six principal classes of estimators: two that assume multivariate normality (maximum likelihood and generalized least squares), elliptical estimators, two distribution-free estimators (asymptotic and others), and latent projection. Generally, the chi-square tests for overall model fit were found to be sensitive to non-normality and to the size of the model for all estimators (with the possible exception of the elliptical estimators with respect to model size and the latent projection techniques with respect to non-normality). The asymptotic distribution-free (ADF) and latent projection techniques were also found to be sensitive to sample size. Distribution-free methods other than ADF showed, in general, much less sensitivity to all factors considered.
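
Robustness studies of this kind commonly judge empirical Type I error rates against an interval criterion. The snippet below illustrates one widely used choice, Bradley's (1978) liberal criterion; it is offered as illustrative context, not as the specific criterion used in this synthesis.

```python
# Bradley's (1978) liberal robustness criterion: a test is considered
# robust if its empirical rejection rate lies within [0.5*alpha, 1.5*alpha].
def is_robust(empirical_rate, alpha=0.05):
    return 0.5 * alpha <= empirical_rate <= 1.5 * alpha

print(is_robust(0.068))  # True: within [.025, .075] at alpha = .05
print(is_robust(0.112))  # False: outside the interval (inflated Type I error)
```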


2019 ◽  
Vol 44 (3) ◽  
pp. 167-181 ◽  
Author(s):  
Wenchao Ma

Limited-information fit measures appear to be promising for assessing the goodness-of-fit of dichotomous response cognitive diagnosis models (CDMs), but their performance has not been examined for polytomous response CDMs. This study investigates the performance of the M_ord statistic and the standardized root mean square residual (SRMSR) for an ordinal response CDM: the sequential generalized deterministic inputs, noisy "and" gate model. Simulation studies showed that the M_ord statistic had well-calibrated Type I error rates, but its correct detection rates were influenced by various factors such as item quality, sample size, and the number of response categories. In addition, the SRMSR was also influenced by many factors, and the common practice of comparing the SRMSR against a prespecified cutoff (e.g., .05) may not be appropriate. A real data set was also analyzed to illustrate the use of the M_ord statistic and the SRMSR in practice.
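
As context for the SRMSR results, the following numpy sketch implements the usual definition of the statistic as the root mean squared residual inter-item correlation (cf. Maydeu-Olivares, 2013). The function name and inputs are illustrative assumptions.

```python
# Minimal sketch of the SRMSR: root mean squared difference between
# observed and model-implied inter-item correlations over unique pairs.
import numpy as np

def srmsr(obs_corr, pred_corr):
    obs_corr = np.asarray(obs_corr)
    pred_corr = np.asarray(pred_corr)
    i, j = np.triu_indices_from(obs_corr, k=1)  # unique item pairs (i < j)
    resid = obs_corr[i, j] - pred_corr[i, j]
    return np.sqrt(np.mean(resid ** 2))
```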


2017 ◽  
Vol 78 (3) ◽  
pp. 460-481 ◽  
Author(s):  
Margarita Olivera-Aguilar ◽  
Samuel H. Rikoon ◽  
Oscar Gonzalez ◽  
Yasemin Kisbu-Sakarya ◽  
David P. MacKinnon

When testing a statistical mediation model, it is assumed that factorial measurement invariance holds for the mediating construct across levels of the independent variable X. The consequences of failing to address the violations of measurement invariance in mediation models are largely unknown. The purpose of the present study was to systematically examine the impact of mediator noninvariance on the Type I error rates, statistical power, and relative bias in parameter estimates of the mediated effect in the single mediator model. The results of a large simulation study indicated that, in general, the mediated effect was robust to violations of invariance in loadings. In contrast, most conditions with violations of intercept invariance exhibited severely positively biased mediated effects, Type I error rates above acceptable levels, and statistical power larger than in the invariant conditions. The implications of these results are discussed and recommendations are offered.
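
For reference, the sketch below shows the single mediator model's product-of-coefficients estimate of the mediated effect with Sobel's (1982) first-order standard error, one common test of that effect. It is not necessarily the estimator used in this simulation, and the example values are hypothetical.

```python
# Single mediator model: the mediated effect is the product a*b of the
# X -> M path (a) and the M -> Y path controlling for X (b). The Sobel
# first-order standard error gives one common z test of a*b.
import math

def sobel_test(a, se_a, b, se_b):
    mediated = a * b
    se = math.sqrt(a**2 * se_b**2 + b**2 * se_a**2)
    return mediated, se, mediated / se

# Hypothetical path estimates and standard errors
ab, se, z = sobel_test(a=0.39, se_a=0.08, b=0.45, se_b=0.10)
print(f"ab = {ab:.3f}, SE = {se:.3f}, z = {z:.2f}")
```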


2011 ◽  
Vol 72 (3) ◽  
pp. 469-492 ◽  
Author(s):  
Eun Sook Kim ◽  
Myeongsun Yoon ◽  
Taehun Lee

Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be tested through other measurement invariance testing techniques. MIMIC modeling is also used for measurement invariance testing by allowing a direct path from a grouping covariate to each observed variable. This simulation study with both continuous and categorical variables investigated the performance of MIMIC in detecting noninvariant variables under various study conditions and showed that the likelihood ratio test of MIMIC with Oort adjustment not only controlled Type I error rates below the nominal level but also maintained high power across study conditions.
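
The likelihood ratio test at the core of this MIMIC approach compares the model that fixes the direct covariate-to-indicator path to zero against the model that frees it. A generic, unadjusted version is sketched below with hypothetical log-likelihood values; the Oort adjustment to the critical value is not shown.

```python
# Generic likelihood ratio test for nested models: twice the log-likelihood
# difference is referred to a chi-square distribution on the difference in
# free parameters.
from scipy.stats import chi2

def likelihood_ratio_test(loglik_restricted, loglik_free, df_diff):
    lr_stat = -2.0 * (loglik_restricted - loglik_free)
    p_value = chi2.sf(lr_stat, df_diff)
    return lr_stat, p_value

# Hypothetical log-likelihoods: direct path fixed to zero vs. freed
stat, p = likelihood_ratio_test(-4213.7, -4208.9, df_diff=1)
print(f"LR = {stat:.2f}, p = {p:.4f}")
```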


Psych ◽  
2021 ◽  
Vol 3 (3) ◽  
pp. 542-551
Author(s):  
Tihomir Asparouhov ◽  
Bengt Muthén

In this article we describe a modification of the robust chi-square test of fit that yields more accurate Type I error rates when the estimated model is at the boundary of the admissible space.

