An Evaluation of DIF Tests in Multistage Tests for Continuous Covariates

Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.

Download Full-text

Effect of Purification Procedures on DIF Analysis in IRTPRO

Educational and Psychological Measurement ◽

10.1177/0013164416645844 ◽

2016 ◽

Vol 77 (3) ◽

pp. 415-428 ◽

Cited By ~ 1

Author(s):

David R. J. Fikis ◽

T. C. Oshima

Keyword(s):

Item Response Theory ◽

Differential Item Functioning ◽

Item Response ◽

Simulation Study ◽

Type I Error ◽

Type I ◽

Web Based ◽

New Methods ◽

Item Functioning ◽

The Web

Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item response theory, with built-in DIF tests but not purification procedures. A simulation study was conducted to investigate the effect of two new methods of purification. The results suggested that one of the purification procedures showed significantly improved power and Type I error. The procedure, which can be cumbersome by hand, can be easily applied by practitioners by using the web-based program developed for this study.

Download Full-text

Detecting Differential Item Functioning Using Multiple-Group Cognitive Diagnosis Models

Applied Psychological Measurement ◽

10.1177/0146621620965745 ◽

2020 ◽

Vol 45 (1) ◽

pp. 37-53

Author(s):

Wenchao Ma ◽

Ragip Terzi ◽

Jimmy de la Torre

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Search Algorithm ◽

Real Data ◽

Error Rates ◽

Cognitive Diagnosis ◽

Type I ◽

Wald Tests ◽

Multiple Group ◽

Item Functioning

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.

Download Full-text

Modeling and Testing Differential Item Functioning in Unidimensional Binary Item Response Models with a Single Continuous Covariate: A Functional Data Analysis Approach

Psychometrika ◽

10.1007/s11336-015-9473-x ◽

2015 ◽

Vol 81 (2) ◽

pp. 371-398 ◽

Cited By ~ 2

Author(s):

Yang Liu ◽

Brooke E. Magnus ◽

David Thissen

Keyword(s):

Data Analysis ◽

Differential Item Functioning ◽

Item Response ◽

Functional Data Analysis ◽

Functional Data ◽

Response Models ◽

Item Response Models ◽

Item Functioning ◽

Continuous Covariate ◽

Binary Item

Download Full-text

Bayesian modelling of differential item functioning: type I error and power rates in the presence of non-normal ability distributions, impact, and anchor set contamination

International Journal of Quantitative Research in Education ◽

10.1504/ijqre.2013.058305 ◽

2013 ◽

Vol 1 (4) ◽

pp. 341

Author(s):

W. Holmes Finch ◽

Brian F. French

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Bayesian Modelling ◽

Item Functioning

Download Full-text

Effect of Unequal Variances in Proficiency Distributions on Type-I Error of the Mantel-Haenszel Chi-square Test for Differential Item Functioning

Journal of Educational Measurement ◽

10.1111/j.1745-3984.2005.00006 ◽

2005 ◽

Vol 42 (2) ◽

pp. 101-131 ◽

Cited By ~ 7

Author(s):

Patrick O. Monahan ◽

Robert D. Ankenmann

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Chi Square ◽

Unequal Variances ◽

Item Functioning ◽

Chi Square Test

Download Full-text

Item Discrimination and Type I Error in the Detection of Differential Item Functioning

Educational and Psychological Measurement ◽

10.1177/0013164411432333 ◽

2012 ◽

Vol 72 (5) ◽

pp. 847-861 ◽

Cited By ~ 9

Author(s):

Yanju Li ◽

Gordon P. Brooks ◽

George A. Johanson

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Item Discrimination ◽

Item Functioning

Download Full-text

Power and Type I Error of the Mean and Covariance Structure Analysis Model for Detecting Differential Item Functioning in Graded Response Items

Multivariate Behavioral Research ◽

10.1207/s15327906mbr4101_3 ◽

2006 ◽

Vol 41 (1) ◽

pp. 29-53 ◽

Cited By ~ 21

Author(s):

Vicente Gonzalez-Roma ◽

Ana Hernandez ◽

Juana Gomez-Benito

Keyword(s):

Differential Item Functioning ◽

Structure Analysis ◽

Type I Error ◽

Covariance Structure ◽

Covariance Structure Analysis ◽

Type I ◽

Analysis Model ◽

Item Functioning ◽

Graded Response ◽

The Mean

Download Full-text

Investigating the Performance of Propensity Score Approaches for Differential Item Functioning Analysis

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1556669280 ◽

2020 ◽

Vol 18 (1) ◽

pp. 2-26

Author(s):

Yan Liu ◽

Chanmin Kim ◽

Amery D. Wu ◽

Paul Gustafson ◽

Edward Kroc ◽

...

Keyword(s):

Propensity Score ◽

Differential Item Functioning ◽

Type I Error ◽

Model Misspecification ◽

Type I ◽

Error Type ◽

Item Functioning ◽

Different Types ◽

Differential Item Functioning Analysis ◽

Different Levels

To evaluate the performance of propensity score approaches for differential item functioning analysis, this simulation study was conducted to assess bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.

Download Full-text

A Regression Discontinuity Design Framework for Controlling Selection Bias in Evaluations of Differential Item Functioning

Educational and Psychological Measurement ◽

10.1177/00131644211068440 ◽

2022 ◽

pp. 001316442110684

Author(s):

Natalie A. Koziol ◽

J. Marc Goodrich ◽

HyeonJin Yoon

Keyword(s):

Differential Item Functioning ◽

Selection Bias ◽

Type I Error ◽

Regression Discontinuity ◽

Type I ◽

Regression Discontinuity Design ◽

Test Accommodations ◽

Item Functioning ◽

Minimal Bias ◽

New Framework

Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed.

Download Full-text