Treatments of Differential Item Functioning: A Comparison of Four Methods

2021 ◽  
pp. 001316442110120
Author(s):  
Xiaowen Liu ◽  
H. Jane Rogers

Test fairness is critical to the validity of group comparisons involving gender, ethnicity, culture, or treatment conditions. Detection of differential item functioning (DIF) is one component of efforts to ensure test fairness. The current study compared four treatments for items identified as showing DIF: deleting them, ignoring the DIF, multiple-group modeling, and modeling DIF as a secondary dimension. Results indicate which approach can be applied to items showing DIF across a wide range of testing environments that require reliable treatment.
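Two of the four treatments, ignoring versus deleting a flagged item, can be illustrated at the raw-score level. The sketch below uses simulated Rasch data with hypothetical item parameters (it is not the study's design): with uniform DIF left in place, the groups' mean sum scores diverge even though their latent abilities are identical; dropping the flagged item removes the artificial gap.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(0, 1, n)            # equal ability distributions in both groups
group = rng.integers(0, 2, n)          # 0 = reference, 1 = focal

# Rasch difficulties; item 0 carries uniform DIF (harder for the focal group)
b = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
dif_shift = np.where(group == 1, 1.0, 0.0)

logits = theta[:, None] - b[None, :]
logits[:, 0] -= dif_shift
p = 1 / (1 + np.exp(-logits))
x = (rng.random((n, 5)) < p).astype(int)

# Treatment "ignore": score all items; treatment "delete": drop the DIF item
gap_ignore = x[group == 0].sum(1).mean() - x[group == 1].sum(1).mean()
gap_delete = x[group == 0, 1:].sum(1).mean() - x[group == 1, 1:].sum(1).mean()
print(round(gap_ignore, 2), round(gap_delete, 2))   # gap shrinks toward 0 after deletion
```

Deleting sacrifices the item's information, which is why the study also considers multiple-group modeling and a secondary DIF dimension, both of which keep the item while absorbing the group effect.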

2019 ◽  
Vol 35 (6) ◽  
pp. 823-833 ◽  
Author(s):  
Desiree Thielemann ◽  
Felicitas Richter ◽  
Bernd Strauss ◽  
Elmar Braehler ◽  
Uwe Altmann ◽  
...  

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples, yet they are often used in heterogeneous general-population samples. Brief instruments of disordered eating should therefore assess its severity equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males at the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF in the SCOFF appeared negligible. Both questionnaires are equally fair across age and SES groups. The gender DIF we found in the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed strengths and limitations concerning test fairness for different groups.
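The MIMIC idea, a direct effect of a background covariate on an item over and above the latent trait, can be sketched in miniature. In the simplified stand-in below, the latent severity is treated as observed (possible only in simulation; a real MIMIC SEM estimates it from the indicators), and a plain logistic regression recovers the gender effect on one item. All parameters are hypothetical.

```python
import numpy as np

def logit_fit(X, y, iters=50):
    """Plain Newton-Raphson logistic regression (no regularization)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(H, X.T @ (y - p))
    return w

rng = np.random.default_rng(1)
n = 3000
trait = rng.normal(0, 1, n)                 # latent severity (observed here by construction)
female = rng.integers(0, 2, n)
# Item endorsed more often by women at the same severity (uniform DIF)
p_item = 1 / (1 + np.exp(-(trait - 0.5 + 0.8 * female)))
y = (rng.random(n) < p_item).astype(int)

# MIMIC-style check (simplified): regress the item on the trait plus the group
# indicator; a sizable group coefficient flags a direct effect, i.e. DIF
X = np.column_stack([np.ones(n), trait, female])
coef = logit_fit(X, y)
print(coef.round(2))   # coefficient on `female` should recover roughly the simulated 0.8
```

A coefficient near zero would indicate that gender affects the item only through the trait itself, which is the no-DIF condition the abstract reports for age.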


2020 ◽  
Vol 45 (1) ◽  
pp. 37-53
Author(s):  
Wenchao Ma ◽  
Ragip Terzi ◽  
Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes, or use the same attributes in different manners (e.g., conjunctive, disjunctive, and compensatory), to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and the Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure is also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better-calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A real data set was also analyzed to illustrate the use of these DIF detection procedures.
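The LR test logic, fitting a compact model (no group effect) and an augmented model (group effect) and referring twice the log-likelihood difference to a chi-square distribution, carries over to much simpler settings than the proposed cognitive diagnosis model. A generic sketch on simulated data (hypothetical parameters, not the authors' CDM implementation):

```python
import numpy as np
from scipy.stats import chi2

def fit_ll(X, y, iters=50):
    """Newton-Raphson logistic regression; returns the maximized log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        H = X.T @ (X * (p * (1 - p))[:, None])
        w += np.linalg.solve(H, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ w))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(2)
n = 2000
theta = rng.normal(0, 1, n)                 # proficiency proxy
group = rng.integers(0, 2, n)
# Item with a uniform group effect of 0.7 logits (DIF by construction)
y = (rng.random(n) < 1 / (1 + np.exp(-(theta + 0.7 * group)))).astype(int)

X0 = np.column_stack([np.ones(n), theta])          # compact model: no group effect
X1 = np.column_stack([np.ones(n), theta, group])   # augmented model: group effect
lr = 2 * (fit_ll(X1, y) - fit_ll(X0, y))
print(f"LR = {lr:.1f}, p = {chi2.sf(lr, df=1):.2g}")
```

The Wald test in the study instead checks whether the freely estimated group-specific parameters differ significantly, using their asymptotic covariance; both tests need well-chosen anchor items to fix the scale, which is what the forward search supplies.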


Autism ◽  
2019 ◽  
Vol 23 (7) ◽  
pp. 1752-1764 ◽  
Author(s):  
Joost A Agelink van Rentergem ◽  
Anne Geeke Lever ◽  
Hilde M Geurts

The Autism Spectrum Quotient is a widely used instrument for the detection of autistic traits. However, the validity of comparisons of Autism Spectrum Quotient scores between groups may be threatened by differential item functioning: a bias in items whereby participants with equal levels of the latent trait give different answers because of their group membership. In this article, items of the Autism Spectrum Quotient were studied for differential item functioning between different groups within a single sample (N = 408). Three analyses were conducted. First, using a Rasch mixture model, two latent groups were detected that show differential item functioning. Second, using a Rasch regression tree model, four groups were found that show differential item functioning: men without autism, women without autism, people 50 years and younger with autism, and people older than 50 years with autism. Third, using traditional methods, differential item functioning was detected between groups with and without autism. Group comparisons with the Autism Spectrum Quotient are therefore at risk of being affected by bias. Eight items, generally negatively phrased, consistently showed differences in response tendencies between groups across the analyses. Two often-used short forms of the Autism Spectrum Quotient, the AQ-28 and AQ-10, may be more suitable for group comparisons.
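The "traditional methods" in the third analysis typically match respondents on a rest score (total score excluding the studied item) and compare item endorsement rates between groups within each stratum. A minimal simulated sketch of that idea, with hypothetical parameters rather than the AQ data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4000, 6
theta = rng.normal(0, 1, n)
group = rng.integers(0, 2, n)
b = np.linspace(-1, 1, k)
logits = theta[:, None] - b[None, :]
logits[:, 2] += np.where(group == 1, -0.9, 0.0)   # item 2 biased against group 1
x = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(int)

def dif_gap(item):
    """Average between-group endorsement gap, matched on the rest score."""
    rest = x.sum(1) - x[:, item]
    gaps, wts = [], []
    for s in range(1, k):                 # skip the uninformative extreme stratum
        m = rest == s
        if m.sum() < 30 or len(np.unique(group[m])) < 2:
            continue
        g0 = x[m & (group == 0), item].mean()
        g1 = x[m & (group == 1), item].mean()
        gaps.append(g0 - g1)
        wts.append(m.sum())
    return np.average(gaps, weights=wts)

gaps = [dif_gap(i) for i in range(k)]
print(np.argmax(np.abs(gaps)))   # the biased item stands out
```

The Rasch mixture and Rasch tree analyses in the article go further: rather than comparing pre-specified groups, they search for the latent classes or covariate splits at which the item parameters stop being invariant.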


2021 ◽  
Vol 20 (1) ◽  
pp. 55-62
Author(s):  
Anthony Pius Effiom

This study used an Item Response Theory (IRT) approach to assess Differential Item Functioning (DIF) and detect item bias in a Mathematics Achievement Test (MAT). The MAT was administered to 1,751 SS2 students in public secondary schools in Cross River State. An instrumentation research design was used to develop and validate the 50-item instrument. Data were analysed using the maximum likelihood estimation technique of the BILOG-MG V3 software. The results revealed that 6% of the items exhibited differential item functioning between male and female students, indicating sex bias on some of the test items in the MAT. DIF analysis, which seeks to eliminate irrelevant factors and sources of bias of any kind so that a test yields valid results, is among the best current methods for this purpose. Test developers and policymakers are therefore advised to exercise care in fair test practice by devoting effort to unbiased test development and decision making. Examination bodies should adopt Item Response Theory in educational testing, and test developers should be mindful of test items that can produce biased response patterns between male and female students or any other subgroup of interest.
Keywords: Assessment, Differential Item Functioning, Validity, Reliability, Test Fairness, Item Bias, Item Response Theory.
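A standard non-IRT complement to a BILOG-MG analysis is the Mantel-Haenszel procedure, which pools the male/female odds ratio for an item across rest-score strata; ratios far from 1 (equivalently, large values on the ETS delta scale) flag the item. A sketch on simulated data with hypothetical parameters:

```python
import numpy as np

def mh_odds_ratio(x, group, item):
    """Mantel-Haenszel common odds ratio for one item, stratified on rest score."""
    rest = x.sum(1) - x[:, item]
    num = den = 0.0
    for s in np.unique(rest):
        m = rest == s
        a = np.sum((group[m] == 0) & (x[m, item] == 1))   # reference, correct
        b = np.sum((group[m] == 0) & (x[m, item] == 0))   # reference, incorrect
        c = np.sum((group[m] == 1) & (x[m, item] == 1))   # focal, correct
        d = np.sum((group[m] == 1) & (x[m, item] == 0))   # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    return num / den

rng = np.random.default_rng(4)
n, k = 3000, 10
theta = rng.normal(0, 1, n)
sex = rng.integers(0, 2, n)                      # 0 = male, 1 = female
b = np.linspace(-1.2, 1.2, k)
logits = theta[:, None] - b[None, :]
logits[:, 3] += np.where(sex == 1, -0.8, 0.0)    # item 3 disadvantages females
x = (rng.random((n, k)) < 1 / (1 + np.exp(-logits))).astype(int)

alpha = mh_odds_ratio(x, sex, 3)
delta = -2.35 * np.log(alpha)    # ETS delta scale; |delta| >= 1.5 marks large (C-level) DIF
print(f"alpha = {alpha:.2f}, delta = {delta:.2f}")
```

In practice a flagged item is then reviewed by content experts before a deletion or revision decision, since statistical DIF alone does not establish why the groups respond differently.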


2017 ◽  
Vol 16 (2) ◽  
pp. rm2 ◽  
Author(s):  
Patrícia Martinková ◽  
Adéla Drabinová ◽  
Yuan-Ling Liaw ◽  
Elizabeth A. Sanders ◽  
Jenny L. McFarland ◽  
...  

We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments.
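The first scenario, a genuine group difference in total scores with no item bias, is easy to reproduce by simulation: under a Rasch model, matching on total score removes the group difference at the item level, so no item is flagged even though the groups' totals differ. (Simulated data with hypothetical parameters, not the Homeostasis Concept Inventory data.)

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 3000, 8
sex = rng.integers(0, 2, n)
theta = rng.normal(0, 1, n) + 0.5 * sex    # true ability gap, items unbiased
b = np.linspace(-1.5, 1.5, k)
x = (rng.random((n, k)) < 1 / (1 + np.exp(-(theta[:, None] - b)))).astype(int)

total = x.sum(1)
print("mean totals:", total[sex == 0].mean().round(2), total[sex == 1].mean().round(2))

# Matched on total score, per-item endorsement rates are nearly identical,
# so no item shows bias despite the clear total-score difference
m = total == 4
gap = x[m & (sex == 0)].mean(0) - x[m & (sex == 1)].mean(0)
print("largest matched per-item gap:", np.abs(gap).max().round(2))
```

The tutorial's second scenario is the mirror image: items can be biased in offsetting directions, so identical total-score distributions are no guarantee of fairness, which is why item-level DIF analysis is needed in both cases.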

