General Statistical Techniques for Detecting Differential Item Functioning Based on Gender Subgroups: A Comparison of the Mantel-Haenszel Procedure, IRT, and Logistic Regression

Author(s):  
Lisa K. Kendhammer ◽  
Kristen L. Murphy
2018 ◽  
Vol 13 (2) ◽  
pp. 137-148
Author(s):  
Nuri Doğan ◽  
Ronald K Hambleton ◽  
Meltem Yurtcu ◽  
Sinan Yavuz

Validity is one of the psychometric properties of achievement tests. One way to examine validity is through item-bias studies, which are based on differential item functioning (DIF) analyses and the judgments of field experts. In this study, field experts were asked to estimate the DIF levels of test items so that their estimates could be compared with those obtained from different statistical techniques. First, the experts examined the questions and estimated the DIF level of each item with respect to gender, and the agreement among the experts was examined. Second, DIF levels were calculated using logistic regression and the Mantel-Haenszel (MH) statistical method. Third, the experts' estimates were compared with the statistical analysis results. In conclusion, the experts and the statistical techniques each showed internal agreement; they differed partially from each other on the Sciences test and agreed on the Social Sciences test. Keywords: item bias, differential item functioning (DIF), expert estimation.
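As an illustration of the Mantel-Haenszel (MH) procedure named above, here is a minimal sketch (the function name and counts are invented for the example, not taken from the study): it pools per-score-level 2×2 tables into the MH common odds ratio and converts it to the ETS delta scale, where |ΔMH| values below 1 are conventionally treated as negligible DIF.

```python
import math

def mantel_haenszel(strata):
    """MH common odds ratio over matched-score strata.
    Each stratum is (A, B, C, D):
      A = reference group correct, B = reference group incorrect,
      C = focal group correct,     D = focal group incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha = num / den                 # alpha > 1: item favors the reference group
    delta = -2.35 * math.log(alpha)   # ETS delta scale (negative favors reference)
    return alpha, delta

# Equal odds of success in both groups at every score level -> no DIF.
alpha, delta = mantel_haenszel([(30, 10, 15, 5), (20, 20, 10, 10)])
# A stratum where reference odds (3.0) exceed focal odds (1.0) -> DIF.
alpha_dif, delta_dif = mantel_haenszel([(30, 10, 10, 10)])
```

With the balanced tables the common odds ratio is exactly 1 (delta 0); in the second call it is 3, a delta of about -2.58, which would fall in the ETS "C" (large DIF) range if it were also statistically significant.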


Psych ◽  
2020 ◽  
Vol 2 (1) ◽  
pp. 44-51
Author(s):  
Vladimir Shibaev ◽  
Andrei Grigoriev ◽  
Ekaterina Valueva ◽  
Anatoly Karlin

National IQ estimates are based on psychometric measurements carried out in a variety of cultural contexts and are often obtained from Raven’s Progressive Matrices tests. In a series of studies, J. Philippe Rushton et al. have argued that these tests are not biased with respect to ethnicity or race. Critics claimed their methods were inappropriate and suggested differential item functioning (DIF) analysis as a more suitable alternative. In the present study, we conduct a DIF analysis on Raven’s Standard Progressive Matrices Plus (SPM+) tests administered to convenience samples of Yakuts and ethnic Russians. The Yakuts scored lower than the Russians by 4.8 IQ points, a difference that can be attributed to the selectiveness of the Russian sample. Data from the Yakut (n = 518) and Russian (n = 956) samples were analyzed for DIF using logistic regression. Although items B9, B10, B11, B12, and C11 were identified as having uniform DIF, all of these DIF effects can be regarded as negligible (R² < 0.13). This is consistent with Rushton et al.’s arguments that the Raven’s Progressive Matrices tests are ethnically unbiased.
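The logistic-regression DIF screening described above amounts to a nested-model comparison: predict the item response from the matching score, then add group membership and measure the gain in Nagelkerke pseudo-R². The sketch below is a self-contained illustration on simulated data, not the study's analysis code; the fitting routine, variable names, and effect sizes are assumptions for the example, and 0.13 is the negligibility cutoff quoted in the abstract.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Newton-Raphson maximum-likelihood logistic regression.
    X must include an intercept column; returns (coefficients, log-likelihood)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        H = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        b += np.linalg.solve(H, X.T @ (y - p))    # Newton step
    p = np.clip(1.0 / (1.0 + np.exp(-X @ b)), 1e-12, 1 - 1e-12)
    return b, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def nagelkerke_r2(ll, ll0, n):
    cox_snell = 1.0 - np.exp(2.0 * (ll0 - ll) / n)
    return cox_snell / (1.0 - np.exp(2.0 * ll0 / n))  # Nagelkerke rescaling

# Simulated item with uniform DIF: the focal group (group = 1) is 0.6 logits
# less likely to answer correctly at the same matching score.
rng = np.random.default_rng(0)
n = 2000
score = rng.normal(size=n)                 # stand-in for the matching total score
group = rng.integers(0, 2, size=n)         # 0 = reference, 1 = focal
p_true = 1.0 / (1.0 + np.exp(-(0.2 + 1.0 * score - 0.6 * group)))
y = (rng.uniform(size=n) < p_true).astype(float)

ones = np.ones((n, 1))
_, ll0 = fit_logit(ones, y)                                    # null model
_, ll1 = fit_logit(np.column_stack([ones, score]), y)          # matching only
b2, ll2 = fit_logit(np.column_stack([ones, score, group]), y)  # + group term
delta_r2 = nagelkerke_r2(ll2, ll0, n) - nagelkerke_r2(ll1, ll0, n)
```

Here `delta_r2` quantifies the uniform-DIF effect size; under the commonly cited Zumbo-Thomas guideline a value below 0.13 is negligible even when the group coefficient (`b2[2]`, which recovers the simulated negative shift) is statistically significant.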


2019 ◽  
Vol 27 (1) ◽  
pp. 90-96 ◽  
Author(s):  
Nynke F Kalkers ◽  
Ingrid Galan ◽  
Anne Kerbrat ◽  
Andrea Tacchino ◽  
Christian P Kamm ◽  
...  

Background: The Arm function in Multiple Sclerosis Questionnaire (AMSQ) was developed as a self-reported measure of arm and hand functioning for patients with multiple sclerosis (MS). The AMSQ was originally developed in Dutch and has to date been translated into five languages (English, German, Spanish, French, and Italian). Objective: The aim of this study was to evaluate differential item functioning (DIF) of the AMSQ across these languages. Methods: We performed DIF analyses using “language” as the polytomous group variable. To detect DIF, logistic regression and item response theory principles were applied, and multiple logistic regression models were evaluated. We used a pseudo-R² value of 0.02 or more as the DIF threshold. Results: A total of 1733 male and female patients with all subtypes of MS were included. The DIF analysis for the whole dataset showed no uniform or non-uniform DIF on any of the 31 items; all R² values were below 0.02. Conclusion: The AMSQ is validated in six languages. All items have the same meaning to MS patients in Dutch, English, German, Spanish, French, and Italian. This validation study enables use of the AMSQ in international studies and for monitoring treatment response and disease progression.


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Elahe Allahyari ◽  
Peyman Jafari ◽  
Zahra Bagheri

Objective. The present study uses simulated data to determine the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality-of-life scale with three, four, and five response categories was simulated. The power and type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across the reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 increased the power of the OLR model to detect uniform DIF by approximately 8%. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was lower than might be expected.
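The OLR model at the heart of the simulation study above is the cumulative-logit (proportional-odds) model, and uniform DIF is tested by comparing the fit with and without a group term. Below is a hand-rolled sketch on simulated data: the fitting routine, the use of the latent trait in place of an observed rest score, and the effect sizes are all assumptions made for the illustration, not the study's code.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import chi2

def fit_prop_odds(X, y, K):
    """ML fit of the proportional-odds model P(Y <= k | x) = expit(c_k - x @ beta),
    with y taking integer values 0..K-1. Returns (beta, cutpoints, log-likelihood)."""
    n, p = X.shape
    def unpack(params):
        beta = params[:p]
        # strictly increasing cutpoints via cumulated exponentials
        cuts = params[p] + np.concatenate(([0.0], np.cumsum(np.exp(params[p + 1:]))))
        return beta, cuts
    def nll(params):
        beta, cuts = unpack(params)
        eta = X @ beta
        upper = np.where(y < K - 1, expit(cuts[np.minimum(y, K - 2)] - eta), 1.0)
        lower = np.where(y > 0, expit(cuts[np.maximum(y - 1, 0)] - eta), 0.0)
        return -np.sum(np.log(np.clip(upper - lower, 1e-12, None)))
    res = minimize(nll, np.zeros(p + K - 1), method="BFGS")
    beta, cuts = unpack(res.x)
    return beta, cuts, -res.fun

# One simulated four-category item with uniform DIF: a constant 0.8-logit
# shift favoring the focal group at every level of the trait.
rng = np.random.default_rng(7)
n, K = 1000, 4
theta = rng.normal(size=n)                  # latent trait (proxy for a rest score)
g = rng.integers(0, 2, size=n)              # 0 = reference, 1 = focal
eta = 1.2 * theta + 0.8 * g
cum = expit(np.array([-1.0, 0.0, 1.2])[None, :] - eta[:, None])  # P(Y <= k)
y = (rng.uniform(size=n)[:, None] > cum).sum(axis=1)             # inverse-CDF draw

_, _, ll_base = fit_prop_odds(theta[:, None], y, K)              # matching only
beta_aug, _, ll_aug = fit_prop_odds(np.column_stack([theta, g]), y, K)  # + group
lr = 2.0 * (ll_aug - ll_base)               # likelihood-ratio statistic, 1 df
p_value = chi2.sf(lr, df=1)
```

A significant `lr` flags uniform DIF on this item; adding a group × trait product column to `X` would test non-uniform DIF by the same nested-model comparison.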

