General Statistical Techniques for Detecting Differential Item Functioning Based on Gender Subgroups: A Comparison of the Mantel-Haenszel Procedure, IRT, and Logistic Regression

Author(s):  
Lisa K. Kendhammer ◽  
Kristen L. Murphy
2018 ◽  
Vol 13 (2) ◽  
pp. 137-148
Author(s):  
Nuri Doğan ◽  
Ronald K Hambleton ◽  
Meltem Yurtcu ◽  
Sinan Yavuz

Validity is one of the psychometric properties of achievement tests. One way to examine validity is through item-bias studies, which are based on differential item functioning (DIF) analyses and the judgments of field experts. In this study, field experts were asked to estimate the DIF levels of test items so that their estimates could be compared with those obtained from different statistical techniques. First, the experts examined the questions and estimated the DIF level of each item with respect to gender, and the agreement among the experts was examined. Second, DIF levels were calculated using logistic regression and the Mantel-Haenszel (MH) statistical method. Third, the experts' estimates were compared with the statistical analysis results. In conclusion, the experts and the statistical techniques each showed internal agreement; they differed partially from each other on the Sciences test and agreed on the Social Sciences test. Keywords: item bias, differential item functioning (DIF), expert estimation.
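As an illustration of the Mantel-Haenszel (MH) procedure named above, here is a minimal sketch (the function name and counts are invented for the example, not taken from the study): it pools per-score-level 2×2 tables into the MH common odds ratio and converts it to the ETS delta scale, where |ΔMH| values below 1 are conventionally treated as negligible DIF.

```python
import math

def mantel_haenszel(strata):
    """MH common odds ratio over matched-score strata.
    Each stratum is (A, B, C, D):
      A = reference group correct, B = reference group incorrect,
      C = focal group correct,     D = focal group incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                 # alpha > 1: item favors the reference group
    delta = -2.35 * math.log(alpha)   # ETS delta scale (negative favors reference)
    return alpha, delta

# Equal odds of success in both groups at every score level -> no DIF.
alpha, delta = mantel_haenszel([(30, 10, 15, 5), (20, 20, 10, 10)])
# A stratum where reference odds (3.0) exceed focal odds (1.0) -> DIF.
alpha_dif, delta_dif = mantel_haenszel([(30, 10, 10, 10)])
```

With the balanced tables the common odds ratio is exactly 1 (delta 0); in the second call it is 3, a delta of about -2.58, which would fall in the ETS "C" (large DIF) range if it were also statistically significant.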


Psych ◽  
2020 ◽  
Vol 2 (1) ◽  
pp. 44-51
Author(s):  
Vladimir Shibaev ◽  
Andrei Grigoriev ◽  
Ekaterina Valueva ◽  
Anatoly Karlin

National IQ estimates are based on psychometric measurements carried out in a variety of cultural contexts and are often obtained from Raven’s Progressive Matrices tests. In a series of studies, J. Philippe Rushton et al. have argued that these tests are not biased with respect to ethnicity or race. Critics claimed their methods were inappropriate and suggested differential item functioning (DIF) analysis as a more suitable alternative. In the present study, we conduct a DIF analysis on Raven’s Standard Progressive Matrices Plus (SPM+) tests administered to convenience samples of Yakuts and ethnic Russians. The Yakuts scored lower than the Russians by 4.8 IQ points, a difference that can be attributed to the selectiveness of the Russian sample. Data from the Yakut (n = 518) and Russian (n = 956) samples were analyzed for DIF using logistic regression. Although items B9, B10, B11, B12, and C11 were identified as having uniform DIF, all of these DIF effects can be regarded as negligible (R² < 0.13). This is consistent with Rushton et al.’s arguments that the Raven’s Progressive Matrices tests are ethnically unbiased.
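The logistic-regression DIF screening described above amounts to a nested-model comparison: predict the item response from the matching score, then add group membership and measure the gain in Nagelkerke pseudo-R². The sketch below is a self-contained illustration on simulated data, not the study's analysis code; the fitting routine, variable names, and effect sizes are assumptions for the example, and 0.13 is the negligibility cutoff quoted in the abstract.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum-likelihood logistic regression.
    X must include an intercept column; returns (coefficients, log-likelihood)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        b += np.linalg.solve(H, X.T @ (y - p))    # Newton step
    p = np.clip(1.0 / (1.0 + np.exp(-X @ b)), 1e-12, 1 - 1e-12)
    return b, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def nagelkerke_r2(ll, ll0, n):
    cox_snell = 1.0 - np.exp(2.0 * (ll0 - ll) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll0 / n))  # Nagelkerke rescaling

# Simulated item with uniform DIF: the focal group (group = 1) is 0.6 logits
# less likely to answer correctly at the same matching score.
rng = np.random.default_rng(0)
n = 2000
score = rng.normal(size=n)                 # stand-in for the matching total score
group = rng.integers(0, 2, size=n)         # 0 = reference, 1 = focal
p_true = 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * score - 0.6 * group)))
y = (rng.uniform(size=n) < p_true).astype(float)

ones = np.ones((n, 1))
_, ll0 = fit_logit(ones, y)                                    # null model
_, ll1 = fit_logit(np.column_stack([ones, score]), y)          # matching only
b2, ll2 = fit_logit(np.column_stack([ones, score, group]), y)  # + group term
delta_r2 = nagelkerke_r2(ll2, ll0, n) - nagelkerke_r2(ll1, ll0, n)
```

Here `delta_r2` quantifies the uniform-DIF effect size; under the commonly cited Zumbo-Thomas guideline a value below 0.13 is negligible even when the group coefficient (`b2[2]`, which recovers the simulated negative shift) is statistically significant.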


2019 ◽  
Vol 27 (1) ◽  
pp. 90-96 ◽  
Author(s):  
Nynke F Kalkers ◽  
Ingrid Galan ◽  
Anne Kerbrat ◽  
Andrea Tacchino ◽  
Christian P Kamm ◽  
...  

Background: The Arm function in Multiple Sclerosis Questionnaire (AMSQ) was developed as a self-reported measure of arm and hand functioning for patients with multiple sclerosis (MS). The AMSQ was originally developed in Dutch and has to date been translated into five languages (English, German, Spanish, French, and Italian). Objective: The aim of this study was to evaluate differential item functioning (DIF) of the AMSQ across these languages. Methods: We performed DIF analyses using “language” as the polytomous group variable. To detect DIF, logistic regression and item response theory principles were applied, and multiple logistic regression models were evaluated. We used a pseudo-R² value of 0.02 or more as the DIF threshold. Results: A total of 1733 male and female patients with all subtypes of MS were included. The DIF analysis for the whole dataset showed no uniform or non-uniform DIF on any of the 31 items; all R² values were below 0.02. Conclusion: The AMSQ is validated in six languages. All items have the same meaning to MS patients in Dutch, English, German, Spanish, French, and Italian. This validation study enables use of the AMSQ in international studies and for monitoring treatment response and disease progression.


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Elahe Allahyari ◽  
Peyman Jafari ◽  
Zahra Bagheri

Objective. The present study uses simulated data to determine the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality-of-life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across the reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 increased the power of the OLR model to detect uniform DIF by approximately 8%. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was lower than might be expected.
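The OLR model at the heart of the simulation study above is the cumulative-logit (proportional-odds) model, and uniform DIF is tested by comparing the fit with and without a group term. Below is a hand-rolled sketch on simulated data: the fitting routine, the use of the latent trait in place of an observed rest score, and the effect sizes are all assumptions made for the illustration, not the study's code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import chi2

def fit_prop_odds(X, y, K):
    """ML fit of the proportional-odds model P(Y <= k | x) = expit(c_k - x @ beta),
    with y taking integer values 0..K-1. Returns (beta, cutpoints, log-likelihood)."""
    n, p = X.shape
    def unpack(params):
        beta = params[:p]
        # strictly increasing cutpoints via cumulated exponentials
        cuts = params[p] + np.concatenate(([0.0], np.cumsum(np.exp(params[p + 1:]))))
        return beta, cuts
    def nll(params):
        beta, cuts = unpack(params)
        eta = X @ beta
        upper = np.where(y < K - 1, expit(cuts[np.minimum(y, K - 2)] - eta), 1.0)
        lower = np.where(y > 0, expit(cuts[np.maximum(y - 1, 0)] - eta), 0.0)
        return -np.sum(np.log(np.clip(upper - lower, 1e-12, None)))
    res = minimize(nll, np.zeros(p + K - 1), method="BFGS")
    beta, cuts = unpack(res.x)
    return beta, cuts, -res.fun

# One simulated four-category item with uniform DIF: a constant 0.8-logit
# shift favoring the focal group at every level of the trait.
rng = np.random.default_rng(7)
n, K = 1000, 4
theta = rng.normal(size=n)                  # latent trait (proxy for a rest score)
g = rng.integers(0, 2, size=n)              # 0 = reference, 1 = focal
eta = 1.2 * theta + 0.8 * g
cum = expit(np.array([-1.0, 0.0, 1.2])[None, :] - eta[:, None])  # P(Y <= k)
y = (rng.uniform(size=n)[:, None] > cum).sum(axis=1)             # inverse-CDF draw

_, _, ll_base = fit_prop_odds(theta[:, None], y, K)              # matching only
beta_aug, _, ll_aug = fit_prop_odds(np.column_stack([theta, g]), y, K)  # + group
lr = 2.0 * (ll_aug - ll_base)               # likelihood-ratio statistic, 1 df
p_value = chi2.sf(lr, df=1)
```

A significant `lr` flags uniform DIF on this item; adding a group × trait product column to `X` would test non-uniform DIF by the same nested-model comparison.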

