The Effect of Differential Item Functioning in Common Items on the Ability Parameter Estimates of IRT Vertical Scale

2021 ◽  
Vol 34 (1) ◽  
pp. 101-129
Author(s):  
Hyesung Shin ◽  
Guemin Lee ◽  
Sang-Jin Kang


2014 ◽
Vol 114 (1) ◽  
pp. 104-125 ◽  
Author(s):  
Hung-Yu Huang

This study compares three methods of detecting differential item functioning (DIF): the equal mean difficulty (EMD), all-other-item (AOI), and constant item (CI) methods. The methods are compared in terms of estimation bias and rank-order change of ability estimates, using a series of simulations and two empirical examples. The CI method generated accurate DIF parameter estimates, whereas the EMD and AOI methods produced biased estimates. Moreover, as the percentage of DIF items in a test increased, the superiority of the CI method over the EMD and AOI methods became more apparent. This superiority held regardless of sample size, test length, and item type (dichotomous or polytomous). Two empirical examples, a mathematics test and a hostility questionnaire, demonstrated that the three methods yielded inconsistent DIF detections and produced different ability estimate rankings.
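The direction of the bias reported above can be illustrated with a small, noise-free sketch. It is not the study's simulation design. It assumes uniform DIF in one direction and no sampling error, and the function names `equal_mean_dif` and `anchored_dif` are hypothetical. Under these assumptions, forcing the two groups to have equal mean item difficulty subtracts the average true DIF from every item, while linking through a DIF-free anchor set leaves the true values intact.

```python
def equal_mean_dif(true_dif):
    """DIF estimates when both groups' mean item difficulties are forced equal.

    Idealized and noise-free (an assumption of this sketch): the constraint
    subtracts the mean true DIF from every item, so DIF-free items pick up
    spurious DIF of the opposite sign.
    """
    shift = sum(true_dif) / len(true_dif)
    return [d - shift for d in true_dif]


def anchored_dif(true_dif, anchors):
    """DIF estimates when the scale is linked through a fixed anchor item set.

    If every anchor item is truly DIF-free, the shift is zero and the
    true DIF values are recovered unchanged.
    """
    shift = sum(true_dif[i] for i in anchors) / len(anchors)
    return [d - shift for d in true_dif]
```

With 2 of 10 items carrying DIF of 0.5 logits, the equal-mean constraint shrinks those two estimates and assigns the eight clean items a small spurious DIF. Raising the share of DIF items enlarges that distortion, which is consistent with the pattern the abstract describes.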


2021 ◽  
Vol 6 ◽  
Author(s):  
Stephen Humphry ◽  
Paul Montuoro

This article demonstrates that the Rasch model cannot reveal systematic differential item functioning (DIF) in single tests. The person total score is the sufficient statistic for the person parameter estimate, which eliminates the possibility of residuals at the test level. An alternative approach is to use subset DIF analysis to search for DIF in item subsets that form the components of the broader latent trait. In this methodology, person parameter estimates are initially calculated using all test items. Then, in separate analyses, these person estimates are compared to the observed means in each subset, and the residuals are assessed. As such, this methodology tests the assumption that the person locations in each factor group are invariant across subsets. The first objective is to demonstrate that in single tests differences in factor groups will appear as differences in the mean person estimates and the distributions of these estimates. The second objective is to demonstrate how subset DIF analysis reveals differences between person estimates and the observed means in subsets. Implications for practitioners are discussed.
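A minimal sketch of the idea follows. It is not the authors' implementation. It assumes dichotomous items whose difficulties are already known, and it uses maximum likelihood person estimates. The names `rasch_p`, `ml_theta`, and `subset_residuals` are hypothetical. Because the ML estimate makes the expected total score equal the observed total score, the residual over the whole test is zero for every person, and group differences can only surface within item subsets.

```python
import math


def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ml_theta(total, difficulties, iters=50):
    """ML ability estimate for a raw total score (Newton-Raphson).

    Item difficulties are treated as known, which is an assumption of
    this sketch. The estimate depends on the responses only through the
    total score, the sufficient statistic under the Rasch model.
    """
    n = len(difficulties)
    if total <= 0 or total >= n:
        raise ValueError("ML estimate is not finite for zero or perfect scores")
    theta = 0.0
    for _ in range(iters):
        probs = [rasch_p(theta, b) for b in difficulties]
        expected = sum(probs)
        info = sum(p * (1.0 - p) for p in probs)
        theta += (total - expected) / info
    return theta


def subset_residuals(responses, groups, difficulties, subset):
    """Mean (observed - expected) score on an item subset, per group."""
    sums, counts = {}, {}
    for resp, g in zip(responses, groups):
        theta = ml_theta(sum(resp), difficulties)  # estimate from ALL items
        observed = sum(resp[i] for i in subset)
        expected = sum(rasch_p(theta, difficulties[i]) for i in subset)
        sums[g] = sums.get(g, 0.0) + (observed - expected)
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}
```

Calling `subset_residuals` with `subset` set to all item indices returns values at or near zero for each group, whereas a narrower subset can show residuals of opposite sign in two groups that have identical total scores.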


Assessment ◽  
2021 ◽  
pp. 107319112098661
Author(s):  
Colin E. Vize ◽  
Sean P. Lane

Numerous studies leverage item response theory (IRT) methods to examine measurement characteristics of alcohol use disorder (AUD) diagnostic criteria. Less work has examined the consistency of AUD IRT parameter estimates, an essential step for establishing measurement invariance, making statements about symptom diagnosticity, and validating the theoretical construct. A Bayesian meta-analysis of IRT discrimination values for AUD criteria across 33 independent samples (Total N = 321,998) revealed that overall consistency of AUD criteria discriminations was low (generalized intraclass correlation range = .105-.249). However, specific study characteristics accounted for substantial variability, suggesting that the unreliability is partially systematic. We replicated evidence of differential item functioning (DIF) via established factors (e.g., age, gender), but the magnitudes were small compared with DIF associated with assessment instrument. These results offer practical recommendations regarding which instruments to use when specific AUD criteria are of interest and which criteria are most sensitive when comparing demographic groups.


Diagnostica ◽  
2021 ◽  
Vol 67 (1) ◽  
pp. 13-23
Author(s):  
Ariana Garrote ◽  
Elisabeth Moser Opitz

Abstract. In this study, the MARKO-D test (Mathematik- und Rechenkonzepte im Vorschulalter–Diagnose) was trialed with a sample of children from German-speaking Switzerland (N = 555) in their first and second years of kindergarten, and it was analyzed whether the age norms of the German sample can be transferred to Switzerland. In addition, the test was examined for measurement invariance over time in a subsample (n = 87). The results of the unidimensional Rasch model show that the instrument is suitable for Switzerland. However, test performance depends on kindergarten attendance. For Switzerland, norms per kindergarten half-year would therefore have to be used in addition to age norms. The differential item functioning analysis showed that 17 of 55 items are affected by marked measurement non-invariance over time. For the instrument to be used in longitudinal studies, it would need to be developed further.


2019 ◽  
Vol 35 (6) ◽  
pp. 823-833 ◽  
Author(s):  
Desiree Thielemann ◽  
Felicitas Richter ◽  
Bernd Strauss ◽  
Elmar Braehler ◽  
Uwe Altmann ◽  
...  

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people of different age and SES. The DIF by gender found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.


1995 ◽  
Vol 11 (1) ◽  
pp. 14-20 ◽  
Author(s):  
Sean M. Hammond

This paper presents an IRT analysis of the Beck Depression Inventory (BDI), which was carried out to assess the assumption of an underlying latent trait common to non-clinical and patient samples. A one-parameter rating scale model was fitted to data drawn from a patient and a non-patient sample. Findings suggest that while the BDI fits the model reasonably well for the two samples separately, there is sufficient differential item functioning to raise serious doubts about the viability of using it analogously with patient and non-patient groups.
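For orientation, the rating scale model is commonly written as below. The abstract does not give the exact parameterization used, so the notation is an assumption: person location \theta_n, item location \delta_i, thresholds \tau_k shared by all items, and response categories 0 to m.

```latex
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=0}^{x} (\theta_n - \delta_i - \tau_k) \right)}
       {\sum_{j=0}^{m} \exp\left( \sum_{k=0}^{j} (\theta_n - \delta_i - \tau_k) \right)},
  \qquad x = 0, 1, \dots, m,
  \qquad \text{with } \sum_{k=0}^{0} (\theta_n - \delta_i - \tau_k) \equiv 0 .
```

In this form, differential item functioning between patient and non-patient samples would correspond to an item location \delta_i that differs between the two groups.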

