A Regression Discontinuity Design Framework for Controlling Selection Bias in Evaluations of Differential Item Functioning

Detecting Differential Item Functioning Using Multiple-Group Cognitive Diagnosis Models

Applied Psychological Measurement ◽

10.1177/0146621620965745 ◽

2020 ◽

Vol 45 (1) ◽

pp. 37-53

Author(s):

Wenchao Ma ◽

Ragip Terzi ◽

Jimmy de la Torre

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Search Algorithm ◽

Real Data ◽

Error Rates ◽

Cognitive Diagnosis ◽

Type I ◽

Wald Tests ◽

Multiple Group ◽

Item Functioning

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.

Get full-text (via PubEx)

Bayesian modelling of differential item functioning: type I error and power rates in the presence of non-normal ability distributions, impact, and anchor set contamination

International Journal of Quantitative Research in Education ◽

10.1504/ijqre.2013.058305 ◽

2013 ◽

Vol 1 (4) ◽

pp. 341

Author(s):

W. Holmes Finch ◽

Brian F. French

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Bayesian Modelling ◽

Item Functioning

Get full-text (via PubEx)

Regression Discontinuity and Differential Item Functioning

10.31234/osf.io/2jyqb ◽

2021 ◽

Author(s):

John Marc Goodrich ◽

Natalie Koziol ◽

HyeonJin Yoon

Keyword(s):

Differential Item Functioning ◽

Selection Bias ◽

English Language ◽

Regression Discontinuity ◽

Traditional Approach ◽

Item Bias ◽

Test Items ◽

Mathematics Ability ◽

Novel Approach ◽

Item Functioning

When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English (Tabaku, Carbuccia-Abbott, & Saavedra, 2018). The degree to which alternate-language test items function equivalently must be evaluated, but traditional methods of investigating measurement equivalence may be confounded by group differences on characteristics other than ability level and language form. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Results indicated a minority of items functioned differently across the Spanish and English forms, and subsequent item content scrutiny indicated no plausible evidence of item bias. Evidence of selection bias—differences between groups in SES, age, and country of birth, in addition to mathematics ability and form language—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. Fewer items exhibited DIF when controlling for selection bias (11% vs. 25%), and the type and direction of DIF differed upon controlling for selection bias.

Get full-text (via PubEx)

Effect of Unequal Variances in Proficiency Distributions on Type-I Error of the Mantel-Haenszel Chi-square Test for Differential Item Functioning

Journal of Educational Measurement ◽

10.1111/j.1745-3984.2005.00006 ◽

2005 ◽

Vol 42 (2) ◽

pp. 101-131 ◽

Cited By ~ 7

Author(s):

Patrick O. Monahan ◽

Robert D. Ankenmann

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Chi Square ◽

Unequal Variances ◽

Item Functioning ◽

Chi Square Test

Get full-text (via PubEx)

Item Discrimination and Type I Error in the Detection of Differential Item Functioning

Educational and Psychological Measurement ◽

10.1177/0013164411432333 ◽

2012 ◽

Vol 72 (5) ◽

pp. 847-861 ◽

Cited By ~ 9

Author(s):

Yanju Li ◽

Gordon P. Brooks ◽

George A. Johanson

Keyword(s):

Differential Item Functioning ◽

Type I Error ◽

Type I ◽

Item Discrimination ◽

Item Functioning

Get full-text (via PubEx)

Effect of Purification Procedures on DIF Analysis in IRTPRO

Educational and Psychological Measurement ◽

10.1177/0013164416645844 ◽

2016 ◽

Vol 77 (3) ◽

pp. 415-428 ◽

Cited By ~ 1

Author(s):

David R. J. Fikis ◽

T. C. Oshima

Keyword(s):

Item Response Theory ◽

Differential Item Functioning ◽

Item Response ◽

Simulation Study ◽

Type I Error ◽

Type I ◽

Web Based ◽

New Methods ◽

Item Functioning ◽

The Web

Purification of the test has been a well-accepted procedure in enhancing the performance of tests for differential item functioning (DIF). As defined by Lord, purification requires reestimation of ability parameters after removing DIF items before conducting the final DIF analysis. IRTPRO 3 is a recently updated program for analyses in item response theory, with built-in DIF tests but not purification procedures. A simulation study was conducted to investigate the effect of two new methods of purification. The results suggested that one of the purification procedures showed significantly improved power and Type I error. The procedure, which can be cumbersome by hand, can be easily applied by practitioners by using the web-based program developed for this study.

Get full-text (via PubEx)

Power and Type I Error of the Mean and Covariance Structure Analysis Model for Detecting Differential Item Functioning in Graded Response Items

Multivariate Behavioral Research ◽

10.1207/s15327906mbr4101_3 ◽

2006 ◽

Vol 41 (1) ◽

pp. 29-53 ◽

Cited By ~ 21

Author(s):

Vicente Gonzalez-Roma ◽

Ana Hernandez ◽

Juana Gomez-Benito

Keyword(s):

Differential Item Functioning ◽

Structure Analysis ◽

Type I Error ◽

Covariance Structure ◽

Covariance Structure Analysis ◽

Type I ◽

Analysis Model ◽

Item Functioning ◽

Graded Response ◽

The Mean

Get full-text (via PubEx)

An Evaluation of DIF Tests in Multistage Tests for Continuous Covariates

10.31234/osf.io/ufkqy ◽

2021 ◽

Author(s):

Rudolf Debelak ◽

Dries Debeer

Keyword(s):

Differential Item Functioning ◽

Item Response ◽

Type I Error ◽

Type I ◽

Item Response Models ◽

Ability Estimates ◽

Item Functioning ◽

Continuous Covariates ◽

Efficient Type ◽

Multistage Test

Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.

Get full-text (via PubEx)

Investigating the Performance of Propensity Score Approaches for Differential Item Functioning Analysis

Journal of Modern Applied Statistical Methods ◽

10.22237/jmasm/1556669280 ◽

2020 ◽

Vol 18 (1) ◽

pp. 2-26

Author(s):

Yan Liu ◽

Chanmin Kim ◽

Amery D. Wu ◽

Paul Gustafson ◽

Edward Kroc ◽

...

Keyword(s):

Propensity Score ◽

Differential Item Functioning ◽

Type I Error ◽

Model Misspecification ◽

Type I ◽

Error Type ◽

Item Functioning ◽

Different Types ◽

Differential Item Functioning Analysis ◽

Different Levels

To evaluate the performance of propensity score approaches for differential item functioning analysis, this simulation study was conducted to assess bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.

Get full-text (via PubEx)

An Evaluation of DIF Tests in Multistage Tests for Continuous Covariates

Psych ◽

10.3390/psych3040040 ◽

2021 ◽

Vol 3 (4) ◽

pp. 619-639

Author(s):

Rudolf Debelak ◽

Dries Debeer

Keyword(s):

Differential Item Functioning ◽

Item Response ◽

Type I Error ◽

Type I ◽

Item Response Models ◽

Ability Estimates ◽

Item Functioning ◽

Continuous Covariates ◽

Efficient Type ◽

Multistage Test

Multistage tests are a widely used and efficient type of test presentation that aims to provide accurate ability estimates while keeping the test relatively short. Multistage tests typically rely on the psychometric framework of item response theory. Violations of item response models and other assumptions underlying a multistage test, such as differential item functioning, can lead to inaccurate ability estimates and unfair measurements. There is a practical need for methods to detect problematic model violations to avoid these issues. This study compares and evaluates three methods for the detection of differential item functioning with regard to continuous person covariates in data from multistage tests: a linear logistic regression test and two adaptations of a recently proposed score-based DIF test. While all tests show a satisfactory Type I error rate, the score-based tests show greater power against three types of DIF effects.

Get full-text (via PubEx)