Treatments of Differential Item Functioning: A Comparison of Four Methods

2021 ◽  
pp. 001316442110120
Author(s):  
Xiaowen Liu ◽  
H. Jane Rogers

Test fairness is critical to the validity of group comparisons involving gender, ethnicity, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items identified as showing DIF: deleting them, ignoring the DIF, multiple-group modeling, and modeling DIF as a secondary dimension. Results indicate which approach can be applied to items showing DIF across a wide range of testing environments that require reliable treatment.
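Two of the four treatments, ignoring versus deleting a flagged item, can be illustrated at the raw-score level. The sketch below uses simulated Rasch data with hypothetical item parameters (it is not the study's design): with uniform DIF left in place, the groups' mean sum scores diverge even though their latent abilities are identical; dropping the flagged item removes the artificial gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(0, 1, n)            # equal ability distributions in both groups
group = rng.integers(0, 2, n)          # 0 = reference, 1 = focal

# Rasch difficulties; item 0 carries uniform DIF (harder for the focal group)
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
dif_shift = np.where(group == 1, 1.0, 0.0)

logits = theta[:, None] - b[None, :]
logits[:, 0] -= dif_shift
p = 1 / (1 + np.exp(-logits))
x = (rng.random((n, 5)) < p).astype(int)

# Treatment "ignore": score all items; treatment "delete": drop the DIF item
gap_ignore = x[group == 0].sum(1).mean() - x[group == 1].sum(1).mean()
gap_delete = x[group == 0, 1:].sum(1).mean() - x[group == 1, 1:].sum(1).mean()
print(round(gap_ignore, 2), round(gap_delete, 2))   # gap shrinks toward 0 after deletion
```

Deleting sacrifices the item's information, which is why the study also considers multiple-group modeling and a secondary DIF dimension, both of which keep the item while absorbing the group effect.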

2019 ◽  
Vol 35 (6) ◽  
pp. 823-833 ◽  
Author(s):  
Desiree Thielemann ◽  
Felicitas Richter ◽  
Bernd Strauss ◽  
Elmar Braehler ◽  
Uwe Altmann ◽  
...  

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples, yet they are often used in heterogeneous general-population samples. Brief instruments of disordered eating should therefore assess its severity equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males at the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF in the SCOFF appeared negligible. Both questionnaires are equally fair across age and SES groups. The gender DIF we found in the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed strengths and limitations concerning test fairness for different groups.
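The MIMIC idea, a direct effect of a background covariate on an item over and above the latent trait, can be sketched in miniature. In the simplified stand-in below, the latent severity is treated as observed (possible only in simulation; a real MIMIC SEM estimates it from the indicators), and a plain logistic regression recovers the gender effect on one item. All parameters are hypothetical.

```python
import numpy as np

def logit_fit(X, y, iters=50):
    """Plain Newton-Raphson logistic regression (no regularization)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(H, X.T @ (y - p))
    return w

rng = np.random.default_rng(1)
n = 3000
trait = rng.normal(0, 1, n)                 # latent severity (observed here by construction)
female = rng.integers(0, 2, n)
# Item endorsed more often by women at the same severity (uniform DIF)
p_item = 1 / (1 + np.exp(-(trait - 0.5 + 0.8 * female)))
y = (rng.random(n) < p_item).astype(int)

# MIMIC-style check (simplified): regress the item on the trait plus the group
# indicator; a sizable group coefficient flags a direct effect, i.e. DIF
X = np.column_stack([np.ones(n), trait, female])
coef = logit_fit(X, y)
print(coef.round(2))   # coefficient on `female` should recover roughly the simulated 0.8
```

A coefficient near zero would indicate that gender affects the item only through the trait itself, which is the no-DIF condition the abstract reports for age.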


2020 ◽  
Vol 45 (1) ◽  
pp. 37-53
Author(s):  
Wenchao Ma ◽  
Ragip Terzi ◽  
Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes in different manners (e.g., conjunctive, disjunctive, and compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
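The LR test logic, fitting a compact model (no group effect) and an augmented model (group effect) and referring twice the log-likelihood difference to a chi-square distribution, carries over to much simpler settings than the proposed cognitive diagnosis model. A generic sketch on simulated data (hypothetical parameters, not the authors' CDM implementation):

```python
import numpy as np
from scipy.stats import chi2

def fit_ll(X, y, iters=50):
    """Newton-Raphson logistic regression; returns the maximized log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(H, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(2)
n = 2000
theta = rng.normal(0, 1, n)                 # proficiency proxy
group = rng.integers(0, 2, n)
# Item with a uniform group effect of 0.7 logits (DIF by construction)
y = (rng.random(n) < 1 / (1 + np.exp(-(theta + 0.7 * group)))).astype(int)

X0 = np.column_stack([np.ones(n), theta])          # compact model: no group effect
X1 = np.column_stack([np.ones(n), theta, group])   # augmented model: group effect
lr = 2 * (fit_ll(X1, y) - fit_ll(X0, y))
print(f"LR = {lr:.1f}, p = {chi2.sf(lr, df=1):.2g}")
```

The Wald test in the study instead checks whether the freely estimated group-specific parameters differ significantly, using their asymptotic covariance; both tests need well-chosen anchor items to fix the scale, which is what the forward search supplies.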


Autism ◽  
2019 ◽  
Vol 23 (7) ◽  
pp. 1752-1764 ◽  
Author(s):  
Joost A Agelink van Rentergem ◽  
Anne Geeke Lever ◽  
Hilde M Geurts

The Autism Spectrum Quotient is a widely used instrument for the detection of autistic traits. However, the validity of comparisons of Autism Spectrum Quotient scores between groups may be threatened by differential item functioning: a bias in items whereby participants with equal levels of the latent trait give different answers because of their group membership. In this article, items of the Autism Spectrum Quotient were studied for differential item functioning between different groups within a single sample (N = 408). Three analyses were conducted. First, using a Rasch mixture model, two latent groups were detected that show differential item functioning. Second, using a Rasch regression tree model, four groups were found that show differential item functioning: men without autism, women without autism, people 50 years and younger with autism, and people older than 50 years with autism. Third, using traditional methods, differential item functioning was detected between groups with and without autism. Group comparisons with the Autism Spectrum Quotient are therefore at risk of being affected by bias. Eight items, generally negatively phrased, consistently showed differences in response tendencies between groups across the analyses. Two often-used short forms of the Autism Spectrum Quotient, the AQ-28 and AQ-10, may be more suitable for group comparisons.
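The "traditional methods" in the third analysis typically match respondents on a rest score (total score excluding the studied item) and compare item endorsement rates between groups within each stratum. A minimal simulated sketch of that idea, with hypothetical parameters rather than the AQ data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4000, 6
theta = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)
b = np.linspace(-1, 1, k)
logits = theta[:, None] - b[None, :]
logits[:, 2] += np.where(group == 1, -0.9, 0.0)   # item 2 biased against group 1
x = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(int)

def dif_gap(item):
    """Average between-group endorsement gap, matched on the rest score."""
    rest = x.sum(1) - x[:, item]
    gaps, wts = [], []
    for s in range(1, k):                 # skip the uninformative extreme stratum
        m = rest == s
        if m.sum() < 30 or len(np.unique(group[m])) < 2:
            continue
        g0 = x[m & (group == 0), item].mean()
        g1 = x[m & (group == 1), item].mean()
        gaps.append(g0 - g1)
        wts.append(m.sum())
    return np.average(gaps, weights=wts)

gaps = [dif_gap(i) for i in range(k)]
print(np.argmax(np.abs(gaps)))   # the biased item stands out
```

The Rasch mixture and Rasch tree analyses in the article go further: rather than comparing pre-specified groups, they search for the latent classes or covariate splits at which the item parameters stop being invariant.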


2021 ◽  
Vol 20 (1) ◽  
pp. 55-62
Author(s):  
Anthony Pius Effiom

This study used an Item Response Theory (IRT) approach to assess Differential Item Functioning (DIF) and detect item bias in a Mathematics Achievement Test (MAT). The MAT was administered to 1,751 SS2 students in public secondary schools in Cross River State. An instrumentation research design was used to develop and validate the 50-item instrument. Data were analysed using the maximum likelihood estimation technique of the BILOG-MG V3 software. The results revealed that 6% of the items exhibited differential item functioning between male and female students, indicating sex bias on some of the test items in the MAT. DIF analysis, which seeks to eliminate irrelevant factors and sources of bias of any kind so that a test yields valid results, is among the best current methods for this purpose. Test developers and policymakers are therefore advised to exercise care in fair test practice by devoting effort to unbiased test development and decision making. Examination bodies should adopt Item Response Theory in educational testing, and test developers should be mindful of test items that can produce biased response patterns between male and female students or any other subgroup of interest.
Keywords: Assessment, Differential Item Functioning, Validity, Reliability, Test Fairness, Item Bias, Item Response Theory.
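A standard non-IRT complement to a BILOG-MG analysis is the Mantel-Haenszel procedure, which pools the male/female odds ratio for an item across rest-score strata; ratios far from 1 (equivalently, large values on the ETS delta scale) flag the item. A sketch on simulated data with hypothetical parameters:

```python
import numpy as np

def mh_odds_ratio(x, group, item):
    """Mantel-Haenszel common odds ratio for one item, stratified on rest score."""
    rest = x.sum(1) - x[:, item]
    num = den = 0.0
    for s in np.unique(rest):
        m = rest == s
        a = np.sum((group[m] == 0) & (x[m, item] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (x[m, item] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (x[m, item] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (x[m, item] == 0))   # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den

rng = np.random.default_rng(4)
n, k = 3000, 10
theta = rng.normal(0, 1, n)
sex = rng.integers(0, 2, n)                      # 0 = male, 1 = female
b = np.linspace(-1.2, 1.2, k)
logits = theta[:, None] - b[None, :]
logits[:, 3] += np.where(sex == 1, -0.8, 0.0)    # item 3 disadvantages females
x = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(int)

alpha = mh_odds_ratio(x, sex, 3)
delta = -2.35 * np.log(alpha)    # ETS delta scale; |delta| >= 1.5 marks large (C-level) DIF
print(f"alpha = {alpha:.2f}, delta = {delta:.2f}")
```

In practice a flagged item is then reviewed by content experts before a deletion or revision decision, since statistical DIF alone does not establish why the groups respond differently.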


2017 ◽  
Vol 16 (2) ◽  
pp. rm2 ◽  
Author(s):  
Patrícia Martinková ◽  
Adéla Drabinová ◽  
Yuan-Ling Liaw ◽  
Elizabeth A. Sanders ◽  
Jenny L. McFarland ◽  
...  

We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments.
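The first scenario, a genuine group difference in total scores with no item bias, is easy to reproduce by simulation: under a Rasch model, matching on total score removes the group difference at the item level, so no item is flagged even though the groups' totals differ. (Simulated data with hypothetical parameters, not the Homeostasis Concept Inventory data.)

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 3000, 8
sex = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n) + 0.5 * sex    # true ability gap, items unbiased
b = np.linspace(-1.5, 1.5, k)
x = (rng.random((n, k)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

total = x.sum(1)
print("mean totals:", total[sex == 0].mean().round(2), total[sex == 1].mean().round(2))

# Matched on total score, per-item endorsement rates are nearly identical,
# so no item shows bias despite the clear total-score difference
m = total == 4
gap = x[m & (sex == 0)].mean(0) - x[m & (sex == 1)].mean(0)
print("largest matched per-item gap:", np.abs(gap).max().round(2))
```

The tutorial's second scenario is the mirror image: items can be biased in offsetting directions, so identical total-score distributions are no guarantee of fairness, which is why item-level DIF analysis is needed in both cases.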

