Assessment of Differential Item Functioning in Health-Related Outcomes: A Simulation and Empirical Analysis with Hierarchical Polytomous Data

2017, Vol 2017, pp. 1-11
Author(s): Zahra Sharafi, Amin Mousavi, Seyyed Mohammad Taghi Ayatollahi, Peyman Jafari

Background. The purpose of this study was to evaluate the effectiveness of two methods of detecting differential item functioning (DIF) in the presence of multilevel data and polytomously scored items. The assessment of DIF with multilevel data (e.g., patients nested within hospitals, hospitals nested within districts) from large-scale assessment programs has received considerable attention, but very few studies have evaluated the effect of the hierarchical structure of the data on DIF detection for polytomously scored items. Methods. Ordinal logistic regression (OLR) and hierarchical ordinal logistic regression (HOLR) were used to assess DIF in simulated and real multilevel polytomous data. Six factors (DIF magnitude, grouping variable, intraclass correlation coefficient, number of clusters, number of participants per cluster, and item discrimination parameter) were considered in a fully crossed simulation design. Furthermore, data from the Pediatric Quality of Life Inventory™ (PedsQL™) 4.0, collected from 576 healthy school children, were analyzed. Results. Overall, both methods performed equivalently in terms of Type I error control and detection power. Conclusions. The current study showed a negligible difference between OLR and HOLR in detecting DIF with polytomously scored items in a hierarchical structure. Implications and considerations for analyzing real data are also discussed.
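As a rough illustration of the OLR approach described above, the sketch below fits a proportional-odds model with and without a group term and flags uniform DIF with a likelihood-ratio test. It is a minimal single-item sketch on simulated data (the matching variable is the simulated ability itself rather than an observed score), not the authors' implementation; all sample sizes and parameter values are made up.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def polr_negloglik(params, X, y, n_cat):
    """Negative log-likelihood of a proportional-odds (cumulative logit) model.
    Thresholds are parametrized as a first cutpoint plus log-increments,
    which keeps them strictly ordered during optimization."""
    n_beta = X.shape[1]
    beta, a1, d = params[:n_beta], params[n_beta], params[n_beta + 1:]
    a = np.concatenate([[a1], a1 + np.cumsum(np.exp(d))])  # n_cat - 1 thresholds
    eta = X @ beta
    cum = 1.0 / (1.0 + np.exp(-(a[:, None] - eta[None, :])))  # P(Y <= k)
    cum = np.vstack([np.zeros_like(eta), cum, np.ones_like(eta)])
    p = cum[y + 1, np.arange(len(y))] - cum[y, np.arange(len(y))]
    return -np.sum(np.log(np.clip(p, 1e-12, None)))

def max_loglik(X, y, n_cat):
    k = X.shape[1] + (n_cat - 1)
    res = minimize(polr_negloglik, np.zeros(k), args=(X, y, n_cat), method="BFGS")
    return -res.fun

# simulate one 3-category item answered by reference (0) and focal (1) groups,
# with a uniform DIF shift of 0.8 against the focal group
rng = np.random.default_rng(0)
n = 2000
theta = rng.normal(size=n)                      # matching (ability) variable
group = rng.integers(0, 2, size=n)
eta = 1.2 * theta - 0.8 * group
cum = 1 / (1 + np.exp(-(np.array([-0.5, 0.5])[:, None] - eta)))
y = (rng.uniform(size=n)[None, :] > cum).sum(axis=0)

# likelihood-ratio test: does adding the group term improve the fit?
ll_full = max_loglik(np.column_stack([theta, group]), y, 3)
ll_null = max_loglik(theta[:, None], y, 3)
lr_stat = 2 * (ll_full - ll_null)
p_value = chi2.sf(lr_stat, df=1)
```

With this magnitude of DIF and sample size, the test rejects decisively; a hierarchical variant would add cluster-level random effects to the linear predictor.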

2016, Vol 2016, pp. 1-8
Author(s): Elahe Allahyari, Peyman Jafari, Zahra Bagheri

Objective. The present study uses simulated data to determine the optimal number of response categories for achieving adequate power in the ordinal logistic regression (OLR) model for differential item functioning (DIF) analysis in psychometric research. Methods. A hypothetical ten-item quality-of-life scale with three, four, and five response categories was simulated. The power and Type I error rates of the OLR model for detecting uniform DIF were investigated under different combinations of ability distribution (θ), sample size, sample size ratio, and magnitude of uniform DIF across the reference and focal groups. Results. When θ was distributed identically in the reference and focal groups, increasing the number of response categories from 3 to 5 increased the power of the OLR model to detect uniform DIF by approximately 8%. The power of OLR was less than 0.36 when the ability distributions in the reference and focal groups were highly skewed to the left and right, respectively. Conclusions. The clearest conclusion from this research is that the minimum number of response categories for DIF analysis using OLR is five. However, the impact of the number of response categories on detecting DIF was lower than might be expected.
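The power and Type I error machinery behind studies like this can be sketched with a small Monte Carlo loop. The example below uses a binary (two-category) logistic-regression DIF test for brevity, so it is a simplified analogue of the ordinal setup studied here; all data, sample sizes, and effect sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

def logit_fit(X, y, n_iter=25):
    """Binary logistic regression via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])       # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

def rejection_rate(dif, n=400, reps=200, seed=1):
    """Share of replications in which the group term's Wald test flags DIF."""
    rng = np.random.default_rng(seed)
    crit = norm.ppf(0.975)
    hits = 0
    for _ in range(reps):
        theta = rng.normal(size=n)
        group = rng.integers(0, 2, size=n)
        prob = 1 / (1 + np.exp(-(theta - dif * group)))
        y = (rng.uniform(size=n) < prob).astype(float)
        X = np.column_stack([np.ones(n), theta, group])
        beta, se = logit_fit(X, y)
        hits += abs(beta[2] / se[2]) > crit
    return hits / reps

type1 = rejection_rate(0.0)   # no DIF: should sit near the nominal 0.05
power = rejection_rate(0.8)   # severe DIF: should be much higher
```

Repeating such a loop across numbers of response categories (with an ordinal fitter in place of the binary one) reproduces the study's design.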


2019, Vol 80 (1), pp. 145-162
Author(s): Gonca Yesiltas, Insu Paek

A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were manipulated. Also, the performance of LLM was compared with that of other observed score–based DIF methods, namely ordinal logistic regression, logistic discriminant function analysis, Mantel, and generalized Mantel-Haenszel, regarding their Type I error (rejection rates) and power (DIF detection rates). For the observed score matching stratification in LLM, 5 and 10 strata were used. Overall, generalized Mantel-Haenszel and LLM with 10 strata showed better performance than other methods, whereas ordinal logistic regression and Mantel showed poor performance in detecting balanced DIF where the DIF direction is opposite in the two pairs of categories and partial DIF where DIF exists only in some of the categories.
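For contrast with the model-based methods above, the Mantel statistic for a polytomous item can be written down compactly. The sketch below implements the stratified Mantel trend statistic from its textbook formula, matching on quintiles of a simulated ability variable rather than on an observed-score stratification; it is an illustration on made-up data, not the study's code.

```python
import numpy as np
from scipy.stats import chi2

def mantel_polytomous(item, group, strata):
    """Mantel (1963) trend statistic for one polytomous item, 1 df.
    item: integer category per examinee; group: 1 = focal; strata: stratum id."""
    cats = np.unique(item)
    y = cats.astype(float)                  # category scores
    num, var = 0.0, 0.0
    for s in np.unique(strata):
        m = strata == s
        nk = m.sum()
        if nk < 2:
            continue
        nf = int((group[m] == 1).sum())
        nr = nk - nf
        counts = np.array([(item[m] == c).sum() for c in cats], float)
        t1 = (y * counts).sum()             # total item score in the stratum
        t2 = (y ** 2 * counts).sum()
        f_obs = item[m][group[m] == 1].astype(float).sum()  # focal score sum
        num += f_obs - nf * t1 / nk
        var += nf * nr * (nk * t2 - t1 ** 2) / (nk ** 2 * (nk - 1))
    stat = num ** 2 / var
    return stat, chi2.sf(stat, df=1)

# simulated 3-category item with uniform DIF of 0.6 against the focal group,
# matched on quintiles of the simulated ability
rng = np.random.default_rng(2)
n = 3000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
eta = 1.2 * theta - 0.6 * group
cum = 1 / (1 + np.exp(-(np.array([-0.5, 0.5])[:, None] - eta)))
item = (rng.uniform(size=n)[None, :] > cum).sum(axis=0)
strata = np.digitize(theta, np.quantile(theta, [0.2, 0.4, 0.6, 0.8]))
stat, p = mantel_polytomous(item, group, strata)
```

Because the Mantel statistic collapses the categories into a single score sum, it has 1 degree of freedom, which is one reason it struggles with the balanced DIF pattern the study describes.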


2020, Vol 45 (1), pp. 37-53
Author(s): Wenchao Ma, Ragip Terzi, Jimmy de la Torre

This study proposes a multiple-group cognitive diagnosis model to account for the fact that students in different groups may use distinct attributes or use the same attributes but in different manners (e.g., conjunctive, disjunctive, and compensatory) to solve problems. Based on the proposed model, this study systematically investigates the performance of the likelihood ratio (LR) test and Wald test in detecting differential item functioning (DIF). A forward anchor item search procedure was also proposed to identify a set of anchor items with invariant item parameters across groups. Results showed that the LR and Wald tests with the forward anchor item search algorithm produced better calibrated Type I error rates than the ordinary LR and Wald tests, especially when items were of low quality. A set of real data were also analyzed to illustrate the use of these DIF detection procedures.
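The LR and Wald tests compared above can be illustrated outside the cognitive-diagnosis setting. The sketch below computes both statistics for a group effect in a plain binary logistic regression on simulated data, which shows their asymptotic agreement but is deliberately much simpler than the proposed multiple-group CDM.

```python
import numpy as np
from scipy.stats import chi2

def fit_logit(X, y, n_iter=30):
    """Binary logistic regression: returns estimates, covariance, log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return beta, np.linalg.inv(H), ll

rng = np.random.default_rng(3)
n = 1500
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n)
prob = 1 / (1 + np.exp(-(theta - 0.5 * group)))
y = (rng.uniform(size=n) < prob).astype(float)

X_full = np.column_stack([np.ones(n), theta, group])
b, cov, ll_full = fit_logit(X_full, y)
_, _, ll_null = fit_logit(X_full[:, :2], y)    # model without the group effect

wald = b[2] ** 2 / cov[2, 2]        # Wald statistic for the group effect
lr = 2 * (ll_full - ll_null)        # likelihood-ratio statistic
p_wald, p_lr = chi2.sf(wald, 1), chi2.sf(lr, 1)
```

The LR test requires fitting both models while the Wald test needs only the full fit; the anchor-item search in the study addresses a separate problem, namely which parameters may be assumed invariant when building the comparison.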


2020, Vol 2020, pp. 1-12
Author(s): Marjan Faghih, Zahra Bagheri, Dejan Stevanovic, Seyyed Mohammad Taghi Ayatollahi, Peyman Jafari

The logistic regression (LR) model for assessing differential item functioning (DIF) depends heavily on asymptotic sampling distributions. For rare-events data, however, maximum likelihood estimation may be biased and the asymptotic distributions may not be reliable. In this study, the performance of regular maximum likelihood (ML) estimation is compared with two bias-correction methods, weighted logistic regression (WLR) and Firth's penalized maximum likelihood (PML), for assessing DIF in imbalanced or rare-events data. The power and Type I error rate of the LR model for detecting DIF were investigated under different combinations of sample size, moderate and severe magnitudes of uniform DIF (DIF = 0.4 and 0.8), sample size ratio, number of items, and degree of imbalance (τ). For the severely imbalanced condition (τ = 0.069), the power of PML and ML was approximately 30% and 24% lower than that of WLR under DIF = 0.4, and 27% and 23% lower under DIF = 0.8, respectively. The present study revealed that WLR outperforms both the ML and PML estimation methods when logistic regression is used to evaluate DIF for imbalanced or rare-events data.
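Firth's penalization mentioned above has a concise Newton-update form: the score is adjusted by the leverages of the weighted hat matrix, which keeps estimates finite even under the separation that rare-events data can produce. The sketch below is a from-scratch illustration on made-up separated data, not the study's implementation.

```python
import numpy as np

def firth_logit(X, y, n_iter=50):
    """Logistic regression with Firth's Jeffreys-prior penalty.
    The score is adjusted by the leverages h of the weighted hat matrix."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        W = p * (1 - p)
        info = X.T @ (X * W[:, None])
        info_inv = np.linalg.inv(info)
        h = np.einsum("ij,jk,ik->i", X * W[:, None], info_inv, X)  # leverages
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta

# a rare-events / separated design: every focal-group response is 1,
# so ordinary maximum likelihood estimates would diverge to infinity
rng = np.random.default_rng(4)
half = 30
group = np.repeat([0.0, 1.0], half)
y = np.concatenate([rng.integers(0, 2, half).astype(float), np.ones(half)])
X = np.column_stack([np.ones(2 * half), group])
beta = firth_logit(X, y)          # finite despite the separation
```

In this intercept-plus-indicator case the Firth solution reduces to the classical half-count correction of the 2×2 table, so the group coefficient stays at a finite, interpretable value.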


2007, Vol 16 (S1), pp. 69-84
Author(s): Paul K. Crane, Laura E. Gibbons, Katja Ocepek-Welikson, Karon Cook, David Cella, ...

2017, Vol 42 (3), pp. 206-220
Author(s): Cheng-Te Chen, Bo-Sien Hwu

By design, large-scale educational testing programs often have a large proportion of missing data. Recent investigations of the effect of missing data on differential item functioning (DIF) assessment have found that Type I error rates tend to be inflated, so it is important to adapt existing DIF assessment methods to this inflation. The DIF-free-then-DIF (DFTD) strategy, which originally involves a single scale-purification procedure to identify DIF-free items, is extended in this study with a second scale-purification procedure for the DIF assessment itself; the new method is called the dual-scale purification (DSP) procedure. The performance of the DSP procedure in assessing DIF in large-scale programs, such as the Program for International Student Assessment (PISA), was compared with the DFTD strategy through a series of simulation studies. Results showed the superiority of the DSP procedure over the DFTD strategy when tests consisted of many DIF items and when data were missing by design, as in large-scale programs. Moreover, an empirical study of the PISA 2009 Taiwan sample is provided to show the implications of the DSP procedure. Applications and further studies of the DSP procedure are also discussed.
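A scale-purification loop of the kind the DSP procedure builds on can be sketched generically: test each item against an anchor set, drop flagged items from the anchors, and repeat until the set stabilizes. The sketch below does this with a simple rest-score logistic-regression DIF test on simulated binary items; it illustrates single-scale purification only, not the full dual-scale procedure, and every name and parameter here is illustrative.

```python
import numpy as np
from scipy.stats import norm

def logit_fit(X, y, n_iter=30):
    """Binary logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-np.clip(X @ beta, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))

def dif_z(resp, j, anchors, group):
    """Wald z for uniform DIF on item j, matching on the anchor-set rest score."""
    match = resp[:, anchors].sum(axis=1)
    X = np.column_stack([np.ones(len(group)), match, group])
    beta, se = logit_fit(X, resp[:, j])
    return beta[2] / se[2]

def purify(resp, group, alpha=0.05, max_rounds=10):
    """Drop flagged items from the anchor set and retest until it stabilizes."""
    n_items = resp.shape[1]
    anchors = list(range(n_items))
    crit = norm.ppf(1 - alpha / 2)
    flagged = []
    for _ in range(max_rounds):
        flagged = [j for j in range(n_items)
                   if abs(dif_z(resp, j, [a for a in anchors if a != j], group)) > crit]
        clean = [j for j in range(n_items) if j not in flagged]
        if clean == anchors:
            break
        anchors = clean
    return anchors, flagged

# ten simulated binary items; items 0 and 1 carry uniform DIF of 0.8
rng = np.random.default_rng(5)
n = 2000
theta = rng.normal(size=n)
group = rng.integers(0, 2, size=n).astype(float)
dif = np.zeros(10)
dif[:2] = 0.8
eta = theta[:, None] - dif[None, :] * group[:, None]
resp = (rng.uniform(size=(n, 10)) < 1 / (1 + np.exp(-eta))).astype(float)
anchors, flagged = purify(resp, group)
```

The DSP extension described in the study runs a second purification of this kind within the DIF-assessment stage itself, which matters when many items carry DIF or when data are missing by design.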


2016, Vol 2 (2), pp. 290
Author(s): Ying Jin, Hershel Eason

The effects of mean ability difference (MAD) and short tests on the performance of various DIF methods have been studied extensively in previous simulation studies. Their effects, however, have not been studied under a multilevel data structure. MAD is frequently observed in large-scale cross-country comparison studies, where the primary sampling units are more likely to be clusters (e.g., schools). With short tests, regular DIF methods under MAD-present conditions might suffer from an inflated Type I error rate due to the low reliability of test scores, which adversely affects the matching ability of the covariate (i.e., the total score) in DIF analysis. The current study compared the performance of three DIF methods: logistic regression (LR); hierarchical logistic regression (HLR), which takes the multilevel structure into account; and hierarchical logistic regression with a latent covariate (HLR-LC), which takes the multilevel structure into account while also accounting for low reliability and MAD. The results indicated that HLR-LC outperformed both LR and HLR under most simulated conditions, especially the MAD-present conditions with short tests. Practical implications of implementing HLR-LC are also discussed.

