Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

2017, Vol 2017, pp. 1-11
Author(s): Zahra Sharafi, Amin Mousavi, Seyyed Mohammad Taghi Ayatollahi, Peyman Jafari

Background. The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items. Methods. Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were used to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) were considered in a fully crossed simulation design. Furthermore, data from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0, collected from 576 healthy school children, were analyzed. Results. Overall, both methods performed equivalently in terms of Type I error control and detection power. Conclusions. The current study showed a negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations for analyzing real data are also discussed.
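As a rough illustration of the OLR approach described above, the sketch below fits a proportional-odds model with and without a group term and flags uniform DIF with a likelihood-ratio test. It is a minimal single-item sketch on simulated data (the matching variable is the simulated ability itself rather than an observed score), not the authors' implementation; all sample sizes and parameter values are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def polr_negloglik(params, X, y, n_cat):
    """Negative log-likelihood of a proportional-odds (cumulative logit) model.
    Thresholds are parametrized as a first cutpoint plus log-increments,
    which keeps them strictly ordered during optimization."""
    n_beta = X.shape[1]
    beta, a1, d = params[:n_beta], params[n_beta], params[n_beta + 1:]
    a = np.concatenate([[a1], a1 + np.cumsum(np.exp(d))])  # n_cat - 1 thresholds
    eta = X @ beta
    cum = 1.0 / (1.0 + np.exp(-(a[:, None] - eta[None, :])))  # P(Y <= k)
    cum = np.vstack([np.zeros_like(eta), cum, np.ones_like(eta)])
    p = cum[y + 1, np.arange(len(y))] - cum[y, np.arange(len(y))]
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

def max_loglik(X, y, n_cat):
    k = X.shape[1] + (n_cat - 1)
    res = minimize(polr_negloglik, np.zeros(k), args=(X, y, n_cat), method="BFGS")
    return -res.fun

# simulate one 3-category item answered by reference (0) and focal (1) groups,
# with a uniform DIF shift of 0.8 against the focal group
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)                      # matching (ability) variable
group = rng.integers(0, 2, size=n)
eta = 1.2 * theta - 0.8 * group
cum = 1 / (1 + np.exp(-(np.array([-0.5, 0.5])[:, None] - eta)))
y = (rng.uniform(size=n)[None, :] > cum).sum(axis=0)

# likelihood-ratio test: does adding the group term improve the fit?
ll_full = max_loglik(np.column_stack([theta, group]), y, 3)
ll_null = max_loglik(theta[:, None], y, 3)
lr_stat = 2 * (ll_full - ll_null)
p_value = chi2.sf(lr_stat, df=1)
```

With this magnitude of DIF and sample size, the test rejects decisively; a hierarchical variant would add cluster-level random effects to the linear predictor.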

2016, Vol 2016, pp. 1-8
Author(s): Elahe Allahyari, Peyman Jafari, Zahra Bagheri

Objective. The present study uses simulated data to determine the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality-of-life scale with three, four, and five response categories was simulated. The power and Type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across the reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 increased the power of the OLR model to detect uniform DIF by approximately 8%. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was lower than might be expected.
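The power and Type I error machinery behind studies like this can be sketched with a small Monte Carlo loop. The example below uses a binary (two-category) logistic-regression DIF test for brevity, so it is a simplified analogue of the ordinal setup studied here; all data, sample sizes, and effect sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

def logit_fit(X, y, n_iter=25):
    """Binary logistic regression via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])       # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

def rejection_rate(dif, n=400, reps=200, seed=1):
    """Share of replications in which the group term's Wald test flags DIF."""
    rng = np.random.default_rng(seed)
    crit = norm.ppf(0.975)
    hits = 0
    for _ in range(reps):
        theta = rng.normal(size=n)
        group = rng.integers(0, 2, size=n)
        prob = 1 / (1 + np.exp(-(theta - dif * group)))
        y = (rng.uniform(size=n) < prob).astype(float)
        X = np.column_stack([np.ones(n), theta, group])
        beta, se = logit_fit(X, y)
        hits += abs(beta[2] / se[2]) > crit
    return hits / reps

type1 = rejection_rate(0.0)   # no DIF: should sit near the nominal 0.05
power = rejection_rate(0.8)   # severe DIF: should be much higher
```

Repeating such a loop across numbers of response categories (with an ordinal fitter in place of the binary one) reproduces the study's design.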


2019, Vol 80 (1), pp. 145-162
Author(s): Gonca Yesiltas, Insu Paek

A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were manipulated. Also, the performance of LLM was compared with that of other observed score–based DIF methods, namely ordinal logistic regression, logistic discriminant function analysis, Mantel, and generalized Mantel-Haenszel, regarding their Type I error (rejection rates) and power (DIF detection rates). For the observed score matching stratification in LLM, 5 and 10 strata were used. Overall, generalized Mantel-Haenszel and LLM with 10 strata showed better performance than other methods, whereas ordinal logistic regression and Mantel showed poor performance in detecting balanced DIF where the DIF direction is opposite in the two pairs of categories and partial DIF where DIF exists only in some of the categories.
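For contrast with the model-based methods above, the Mantel statistic for a polytomous item can be written down compactly. The sketch below implements the stratified Mantel trend statistic from its textbook formula, matching on quintiles of a simulated ability variable rather than on an observed-score stratification; it is an illustration on made-up data, not the study's code.

```python
import numpy as np
from scipy.stats import chi2

def mantel_polytomous(item, group, strata):
    """Mantel (1963) trend statistic for one polytomous item, 1 df.
    item: integer category per examinee; group: 1 = focal; strata: stratum id."""
    cats = np.unique(item)
    y = cats.astype(float)                  # category scores
    num, var = 0.0, 0.0
    for s in np.unique(strata):
        m = strata == s
        nk = m.sum()
        if nk < 2:
            continue
        nf = int((group[m] == 1).sum())
        nr = nk - nf
        counts = np.array([(item[m] == c).sum() for c in cats], float)
        t1 = (y * counts).sum()             # total item score in the stratum
        t2 = (y ** 2 * counts).sum()
        f_obs = item[m][group[m] == 1].astype(float).sum()  # focal score sum
        num += f_obs - nf * t1 / nk
        var += nf * nr * (nk * t2 - t1 ** 2) / (nk ** 2 * (nk - 1))
    stat = num ** 2 / var
    return stat, chi2.sf(stat, df=1)

# simulated 3-category item with uniform DIF of 0.6 against the focal group,
# matched on quintiles of the simulated ability
rng = np.random.default_rng(2)
n = 3000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
eta = 1.2 * theta - 0.6 * group
cum = 1 / (1 + np.exp(-(np.array([-0.5, 0.5])[:, None] - eta)))
item = (rng.uniform(size=n)[None, :] > cum).sum(axis=0)
strata = np.digitize(theta, np.quantile(theta, [0.2, 0.4, 0.6, 0.8]))
stat, p = mantel_polytomous(item, group, strata)
```

Because the Mantel statistic collapses the categories into a single score sum, it has 1 degree of freedom, which is one reason it struggles with the balanced DIF pattern the study describes.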


2020, Vol 45 (1), pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.
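The LR and Wald tests compared above can be illustrated outside the cognitive-diagnosis setting. The sketch below computes both statistics for a group effect in a plain binary logistic regression on simulated data, which shows their asymptotic agreement but is deliberately much simpler than the proposed multiple-group CDM.

```python
import numpy as np
from scipy.stats import chi2

def fit_logit(X, y, n_iter=30):
    """Binary logistic regression: returns estimates, covariance, log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return beta, np.linalg.inv(H), ll

rng = np.random.default_rng(3)
n = 1500
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
prob = 1 / (1 + np.exp(-(theta - 0.5 * group)))
y = (rng.uniform(size=n) < prob).astype(float)

X_full = np.column_stack([np.ones(n), theta, group])
b, cov, ll_full = fit_logit(X_full, y)
_, _, ll_null = fit_logit(X_full[:, :2], y)    # model without the group effect

wald = b[2] ** 2 / cov[2, 2]        # Wald statistic for the group effect
lr = 2 * (ll_full - ll_null)        # likelihood-ratio statistic
p_wald, p_lr = chi2.sf(wald, 1), chi2.sf(lr, 1)
```

The LR test requires fitting both models while the Wald test needs only the full fit; the anchor-item search in the study addresses a separate problem, namely which parameters may be assumed invariant when building the comparison.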


2020, Vol 2020, pp. 1-12
Author(s): Marjan Faghih, Zahra Bagheri, Dejan Stevanovic, Seyyed Mohammad Taghi Ayatollahi, Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. For rare-events data, however, maximum likelihood estimation may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias-correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare-events data. The power and Type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). For the severely imbalanced condition (τ = 0.069), the power of PML and ML was approximately 30% and 24% lower than that of WLR under DIF = 0.4, and 27% and 23% lower under DIF = 0.8, respectively. The present study revealed that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare-events data.
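Firth's penalization mentioned above has a concise Newton-update form: the score is adjusted by the leverages of the weighted hat matrix, which keeps estimates finite even under the separation that rare-events data can produce. The sketch below is a from-scratch illustration on made-up separated data, not the study's implementation.

```python
import numpy as np

def firth_logit(X, y, n_iter=50):
    """Logistic regression with Firth's Jeffreys-prior penalty.
    The score is adjusted by the leverages h of the weighted hat matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        W = p * (1 - p)
        info = X.T @ (X * W[:, None])
        info_inv = np.linalg.inv(info)
        h = np.einsum("ij,jk,ik->i", X * W[:, None], info_inv, X)  # leverages
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

# a rare-events / separated design: every focal-group response is 1,
# so ordinary maximum likelihood estimates would diverge to infinity
rng = np.random.default_rng(4)
half = 30
group = np.repeat([0.0, 1.0], half)
y = np.concatenate([rng.integers(0, 2, half).astype(float), np.ones(half)])
X = np.column_stack([np.ones(2 * half), group])
beta = firth_logit(X, y)          # finite despite the separation
```

In this intercept-plus-indicator case the Firth solution reduces to the classical half-count correction of the 2×2 table, so the group coefficient stays at a finite, interpretable value.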


2007, Vol 16 (S1), pp. 69-84
Author(s): Paul K. Crane, Laura E. Gibbons, Katja Ocepek-Welikson, Karon Cook, David Cella, ...

2017, Vol 42 (3), pp. 206-220
Author(s): Cheng-Te Chen, Bo-Sien Hwu

By design, large-scale educational testing programs often have a large proportion of missing data. Recent investigations of the effect of missing data on differential item functioning (DIF) assessment have found that Type I error rates tend to be inflated, so it is important to adapt existing DIF assessment methods to this inflation. The DIF-free-then-DIF (DFTD) strategy, which originally involves a single scale-purification procedure to identify DIF-free items, is extended in this study with a second scale-purification procedure for the DIF assessment itself; the new method is called the dual-scale purification (DSP) procedure. The performance of the DSP procedure in assessing DIF in large-scale programs, such as the Program for International Student Assessment (PISA), was compared with the DFTD strategy through a series of simulation studies. Results showed the superiority of the DSP procedure over the DFTD strategy when tests consisted of many DIF items and when data were missing by design, as in large-scale programs. Moreover, an empirical study of the PISA 2009 Taiwan sample is provided to show the implications of the DSP procedure. Applications and further studies of the DSP procedure are also discussed.
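A scale-purification loop of the kind the DSP procedure builds on can be sketched generically: test each item against an anchor set, drop flagged items from the anchors, and repeat until the set stabilizes. The sketch below does this with a simple rest-score logistic-regression DIF test on simulated binary items; it illustrates single-scale purification only, not the full dual-scale procedure, and every name and parameter here is illustrative.

```python
import numpy as np
from scipy.stats import norm

def logit_fit(X, y, n_iter=30):
    """Binary logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))

def dif_z(resp, j, anchors, group):
    """Wald z for uniform DIF on item j, matching on the anchor-set rest score."""
    match = resp[:, anchors].sum(axis=1)
    X = np.column_stack([np.ones(len(group)), match, group])
    beta, se = logit_fit(X, resp[:, j])
    return beta[2] / se[2]

def purify(resp, group, alpha=0.05, max_rounds=10):
    """Drop flagged items from the anchor set and retest until it stabilizes."""
    n_items = resp.shape[1]
    anchors = list(range(n_items))
    crit = norm.ppf(1 - alpha / 2)
    flagged = []
    for _ in range(max_rounds):
        flagged = [j for j in range(n_items)
                   if abs(dif_z(resp, j, [a for a in anchors if a != j], group)) > crit]
        clean = [j for j in range(n_items) if j not in flagged]
        if clean == anchors:
            break
        anchors = clean
    return anchors, flagged

# ten simulated binary items; items 0 and 1 carry uniform DIF of 0.8
rng = np.random.default_rng(5)
n = 2000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n).astype(float)
dif = np.zeros(10)
dif[:2] = 0.8
eta = theta[:, None] - dif[None, :] * group[:, None]
resp = (rng.uniform(size=(n, 10)) < 1 / (1 + np.exp(-eta))).astype(float)
anchors, flagged = purify(resp, group)
```

The DSP extension described in the study runs a second purification of this kind within the DIF-assessment stage itself, which matters when many items carry DIF or when data are missing by design.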


2016, Vol 2 (2), pp. 290
Author(s): Ying Jin, Hershel Eason

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under a multilevel data structure. MAD is frequently observed in large-scale cross-country comparison studies, where the primary sampling units are more likely to be clusters (e.g., schools). With short tests, regular DIF methods under MAD-present conditions might suffer from an inflated Type I error rate due to the low reliability of test scores, which adversely affects the matching ability of the covariate (i.e., the total score) in DIF analysis. The current study compared the performance of three DIF methods: logistic regression (LR); hierarchical logistic regression (HLR), which takes the multilevel structure into account; and hierarchical logistic regression with a latent covariate (HLR-LC), which takes the multilevel structure into account while also accounting for low reliability and MAD. The results indicated that HLR-LC outperformed both LR and HLR under most simulated conditions, especially the MAD-present conditions with short tests. Practical implications of implementing HLR-LC are also discussed.

