Bayesian modelling of differential item functioning: type I error and power rates in the presence of non-normal ability distributions, impact, and anchor set contamination

Author(s): W. Holmes Finch, Brian F. French
2020, Vol 45 (1), pp. 37-53

Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes in different ways (e.g., conjunctive, disjunctive, and compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data was also analyzed to illustrate the use of these DIF detection procedures.
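As a rough illustration of the kind of Wald-type comparison evaluated here (a generic sketch, not the authors' multiple-group cognitive diagnosis model implementation), the code below computes a Wald DIF statistic from an item's parameter estimates and covariance matrices obtained separately in a reference and a focal group. The parameter values and covariance matrices are made-up placeholders.

```python
# Generic Wald-type DIF statistic: compares an item's parameter estimates
# between a reference and a focal group. Minimal sketch with placeholder
# numbers; not the multiple-group CDM calibration used in the study.
import numpy as np
from scipy.stats import chi2

def wald_dif(est_ref, est_foc, cov_ref, cov_foc):
    """Wald statistic for H0: the item's parameters are equal across groups."""
    diff = np.asarray(est_ref) - np.asarray(est_foc)
    pooled_cov = np.asarray(cov_ref) + np.asarray(cov_foc)
    stat = float(diff @ np.linalg.inv(pooled_cov) @ diff)
    df = diff.size
    return stat, df, chi2.sf(stat, df)

# Placeholder estimates for one item (e.g., two item parameters per group).
est_ref = [0.15, 0.10]
est_foc = [0.25, 0.08]
cov_ref = np.diag([0.002, 0.001])
cov_foc = np.diag([0.003, 0.001])

stat, df, p = wald_dif(est_ref, est_foc, cov_ref, cov_foc)
print(f"Wald = {stat:.2f}, df = {df}, p = {p:.4f}")
```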


2016, Vol 77 (3), pp. 415-428
Author(s): David R. J. Fikis, T. C. Oshima

Purification of the test is a well-accepted procedure for enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires re-estimating the ability parameters after removing DIF items and before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for item response theory analyses, with built-in DIF tests but no purification procedures. A simulation study was conducted to investigate the effect of two new purification methods. The results suggested that one of the purification procedures showed significantly improved power and Type I error control. The procedure, which can be cumbersome to carry out by hand, can be applied easily by practitioners through the web-based program developed for this study.
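Lord-style purification re-estimates the matching criterion without the flagged items before the final DIF pass. As a rough observed-score stand-in for that idea (not the IRTPRO-based procedures studied here), the sketch below iteratively flags items with a logistic-regression DIF test and recomputes the matching rest score from the currently clean items; the data are simulated for illustration.

```python
# Iterative purification sketch: flag DIF items with a logistic-regression
# test, drop them from the matching score, and repeat until the flagged set
# stabilizes. Uses an observed rest score in place of re-estimated IRT
# ability, so it only mimics the logic of Lord's purification.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(1)
n, n_items = 1000, 10
group = rng.integers(0, 2, n)                 # 0 = reference, 1 = focal
theta = rng.normal(0, 1, n)
dif_shift = np.zeros(n_items)
dif_shift[0] = 0.8                            # item 0 carries uniform DIF
prob = 1 / (1 + np.exp(-(theta[:, None] - dif_shift * group[:, None])))
resp = (rng.random((n, n_items)) < prob).astype(int)

def lr_dif_pvalue(y, match, grp):
    """LR test of adding group and group-by-match terms to a logistic model."""
    base = sm.Logit(y, sm.add_constant(match)).fit(disp=0)
    full = sm.Logit(y, sm.add_constant(np.column_stack([match, grp, match * grp]))).fit(disp=0)
    return chi2.sf(2 * (full.llf - base.llf), df=2)

flagged = set()
for _ in range(10):                           # purification iterations
    clean = [j for j in range(n_items) if j not in flagged]
    new_flags = {j for j in range(n_items)
                 if lr_dif_pvalue(resp[:, j],
                                  resp[:, [k for k in clean if k != j]].sum(axis=1),
                                  group) < 0.05}
    if new_flags == flagged:
        break
    flagged = new_flags

print("Items flagged after purification:", sorted(flagged))
```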


2020, Vol 18 (1), pp. 2-26
Author(s): Yan Liu, Chanmin Kim, Amery D. Wu, Paul Gustafson, Edward Kroc, ...

To evaluate the performance of propensity score approaches for differential item functioning (DIF) analysis, this simulation study assessed bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different covariate types and patterns of missing covariate data.
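The abstract does not spell out a specific estimator, so the sketch below shows one plausible reading (an assumption, not the authors' exact procedure): estimate a propensity score for focal-group membership from observed covariates, form inverse-probability weights, and run a weighted logistic-regression test of uniform DIF. Data and covariates are simulated, and the weighted fit is used only for point estimates.

```python
# Propensity-score-weighted DIF sketch: balance the groups on observed
# covariates via inverse-probability weighting, then test one item for
# uniform DIF with a weighted logistic regression. Simulated data; one
# plausible implementation, not the specific estimators from the study.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, n_items = 2000, 10
covs = rng.normal(size=(n, 2))                                  # observed covariates
group = rng.binomial(1, 1 / (1 + np.exp(-0.8 * covs[:, 0])))    # membership depends on covariates
theta = 0.5 * covs[:, 0] + rng.normal(size=n)                   # ability also related to covariates
resp = (rng.random((n, n_items)) < 1 / (1 + np.exp(-theta[:, None]))).astype(int)
item, rest = resp[:, 0], resp[:, 1:].sum(axis=1)                # studied item and rest score

# Step 1: propensity score for focal-group membership given the covariates.
ps = sm.Logit(group, sm.add_constant(covs)).fit(disp=0).predict(sm.add_constant(covs))
weights = np.where(group == 1, 1 / ps, 1 / (1 - ps))            # inverse-probability weights

# Step 2: weighted logistic regression; the group coefficient is the uniform-DIF effect.
X = sm.add_constant(np.column_stack([rest, group]))
fit = sm.GLM(item, X, family=sm.families.Binomial(), var_weights=weights).fit()
print("Uniform DIF estimate:", round(fit.params[2], 3), "p =", round(fit.pvalues[2], 4))
```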


2022, pp. 001316442110684
Author(s): Natalie A. Koziol, J. Marc Goodrich, HyeonJin Yoon

Differential item functioning (DIF) analysis is often used to examine validity evidence for alternate-form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study compared the new framework with traditional logistic regression with respect to the Type I error and power rates of the uniform DIF test statistics, and the bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias, but suffered from low power and a lack of precision. Implications for practice are discussed.
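The article's framework is not reproduced here. As a heavily simplified sketch of how a regression discontinuity setup can be attached to an item-level model (my assumption, not the authors' specification), the code below fits a logistic model with the centered assignment variable, an indicator for falling below the accommodation cutoff, and their interaction; the coefficient on the indicator estimates the discontinuity at the cutoff for the studied item and is read as the uniform DIF effect.

```python
# Sharp-RD-style sketch for one item: the jump in the response probability at
# the eligibility cutoff is attributed to the accommodated (alternate) form.
# Heavily simplified illustration with simulated data; not the framework
# proposed in the article.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 3000
screener = rng.normal(size=n)                    # assignment (running) variable
cutoff = -0.5
accommodated = (screener < cutoff).astype(int)   # alternate form assigned below the cutoff
theta = 0.7 * screener + rng.normal(size=n)
dif_effect = 0.4                                 # built-in uniform DIF on the studied item
p = 1 / (1 + np.exp(-(theta + dif_effect * accommodated)))
item = rng.binomial(1, p)

centered = screener - cutoff
X = sm.add_constant(np.column_stack([centered, accommodated, centered * accommodated]))
fit = sm.Logit(item, X).fit(disp=0)
print("Estimated discontinuity (uniform DIF) at the cutoff:",
      round(fit.params[2], 3), "p =", round(fit.pvalues[2], 4))
```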


Psych, 2021, Vol 3 (4), pp. 619-639
Author(s): Rudolf Debelak, Dries Debeer

Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning (DIF), can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect such problematic model violations. This study compares and evaluates three methods for detecting DIF with respect to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.
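To illustrate the first of the three methods, a logistic-regression DIF test against a continuous covariate, the sketch below compares nested logistic models for one item using an ability estimate and a continuous covariate such as age. The data are simulated single-stage responses; the routing structure of a real multistage test is ignored, and the covariate name is purely illustrative.

```python
# Logistic-regression DIF test for a continuous person covariate: compare a
# model using only the ability estimate with one adding the covariate and its
# interaction with ability. Simulated data for illustration only.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(11)
n = 1500
age = rng.uniform(20, 60, n)                 # continuous person covariate
theta = rng.normal(size=n)                   # treated as the ability estimate
dif_slope = 0.02                             # item gets easier with age (uniform DIF)
p = 1 / (1 + np.exp(-(theta + dif_slope * (age - age.mean()))))
item = rng.binomial(1, p)

base = sm.Logit(item, sm.add_constant(theta)).fit(disp=0)
full = sm.Logit(item, sm.add_constant(np.column_stack([theta, age, theta * age]))).fit(disp=0)
lr = 2 * (full.llf - base.llf)               # 2 df: covariate main effect + interaction
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4f}")
```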


Methodology, 2012, Vol 8 (4), pp. 134-145
Author(s): Fabiola González-Betanzos, Francisco J. Abad

The current research compares the effects of several strategies for establishing the anchor subtest when testing for differential item functioning (DIF) with the IRT likelihood ratio test in one- and two-stage procedures. Two one-stage strategies were examined: (1) "one item" and (2) "all other items" used as the anchor. Additionally, two two-stage strategies were tested: (3) "one anchor item with posterior anchor test augmentation" and (4) "all other items with purification." The strategies were compared in a simulation study in which sample size, DIF size, type of DIF, and software implementation (MULTILOG vs. IRTLRDIF) were manipulated. Results indicated that Procedure (1) was more efficient than Procedure (2). Purification was found to improve Type I error rates substantially with the "all other items" strategy, whereas "posterior anchor test augmentation" did not yield a significant improvement. Regarding the software used, MULTILOG generally offered better results than IRTLRDIF.
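Whatever anchor strategy is used, the IRT likelihood ratio test itself reduces to comparing the log-likelihoods of a compact model (studied item constrained equal across groups, with the anchor defining the metric) and an augmented model (studied item's parameters freed). The helper below shows only that final comparison, with made-up log-likelihood values standing in for what a calibration program such as MULTILOG or IRTLRDIF would return; the model fitting itself is not reproduced.

```python
# Final step of the IRT likelihood ratio DIF test: G^2 comparison of a compact
# model (studied item's parameters equal across groups) against an augmented
# model (those parameters freed). Log-likelihood values are placeholders.
from scipy.stats import chi2

def irt_lr_test(loglik_compact, loglik_augmented, n_freed_params):
    g2 = 2 * (loglik_augmented - loglik_compact)
    return g2, chi2.sf(g2, df=n_freed_params)

g2, p = irt_lr_test(loglik_compact=-10452.3,    # placeholder value
                    loglik_augmented=-10448.1,  # placeholder value
                    n_freed_params=2)           # e.g., a and b of a 2PL item
print(f"G^2 = {g2:.2f}, p = {p:.4f}")
```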

