Detecting Gender-Based Differential Item Functioning on a Constructed-Response Science Test

1999 ◽  
Vol 12 (3) ◽  
pp. 211-235 ◽  
Author(s):  
Laura S. Hamilton
2020 ◽  
Vol 80 (4) ◽  
pp. 808-820
Author(s):  
Cindy M. Walker ◽  
Sakine Göçer Şahin

The purpose of this study was to investigate a new way of evaluating interrater reliability that allows one to determine whether two raters differ in their ratings on a polytomous rating scale or constructed-response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and were compared with traditional interrater reliability measures. Three procedures that can serve as measures of interrater reliability were compared: (1) the intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) the DIF statistic obtained from Poly-SIBTEST. The results indicated that DIF procedures appear to be a promising alternative for assessing the interrater reliability of constructed-response items and other polytomous items, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design and allows one to determine whether a rater is more severe or more lenient in scoring each individual polytomous item on a test or rating scale.
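One of the traditional baselines named above, Cohen’s kappa, compares observed agreement p_o between two raters with the agreement p_e expected by chance from each rater's marginal category frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below is a textbook illustration only, not the authors' analysis code; the function name `cohens_kappa` is hypothetical, and it computes the unweighted kappa, which treats score categories as nominal (a weighted kappa would credit near-misses on an ordered polytomous scale).

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same responses.

    Hypothetical helper for illustration; categories are treated as nominal.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of responses")
    n = len(rater_a)
    # Observed agreement: share of responses given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions, summed over categories.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who give identical scores to every response obtain κ = 1, while raters whose agreement is exactly what their marginals predict obtain κ = 0. Note that kappa summarizes agreement over all responses and, unlike the DIF approach described above, does not by itself indicate which rater is more severe.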


1995 ◽  
Vol 80 (3_suppl) ◽  
pp. 1071-1074 ◽  
Author(s):  
Thomas Uttaro

The Mantel-Haenszel chi-square (χ²MH) is widely used to detect differential item functioning (item bias) between ethnic and gender-based subgroups on educational and psychological tests. The empirical behavior of χ²MH has been incompletely understood, and previous research is inconclusive. The present simulation study explored the effects of sample size, number of items, and trait distributions on the power of χ²MH to detect modeled differential item functioning. A significant effect was obtained for sample size, with unacceptably low power at 250 subjects in each of the focal and reference groups. The discussion supports the 1990 recommendations of Swaminathan and Rogers and opposes the 1993 view of Zieky that a sample size of 250 for each group is adequate.
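For readers unfamiliar with the statistic, χ²MH pools one 2×2 table (group × correct/incorrect) per matched ability stratum, usually total test score. With A_k the reference-group correct count in stratum k, the textbook form with continuity correction is χ²MH = (|Σ A_k − Σ E(A_k)| − 0.5)² / Σ Var(A_k), often reported alongside the common odds ratio α_MH. The sketch below is a generic illustration of that formula, not the simulation code used in the study; the `Stratum` layout and the `mantel_haenszel` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Stratum:
    """Hypothetical 2x2 table for one matched score level."""
    ref_correct: int  # A_k
    ref_wrong: int    # B_k
    foc_correct: int  # C_k
    foc_wrong: int    # D_k


def mantel_haenszel(strata: Sequence[Stratum]) -> Tuple[float, float]:
    """Return (continuity-corrected MH chi-square, MH common odds ratio)."""
    sum_a = sum_e = sum_var = 0.0
    num_or = den_or = 0.0
    for s in strata:
        n_ref = s.ref_correct + s.ref_wrong
        n_foc = s.foc_correct + s.foc_wrong
        m_correct = s.ref_correct + s.foc_correct
        m_wrong = s.ref_wrong + s.foc_wrong
        t = n_ref + n_foc
        if t < 2:
            continue  # a stratum with fewer than two examinees has no variance
        sum_a += s.ref_correct
        # Expected reference-group correct count under the no-DIF hypothesis.
        sum_e += n_ref * m_correct / t
        # Hypergeometric variance of A_k given the table margins.
        sum_var += n_ref * n_foc * m_correct * m_wrong / (t * t * (t - 1))
        num_or += s.ref_correct * s.foc_wrong / t
        den_or += s.ref_wrong * s.foc_correct / t
    if sum_var == 0 or den_or == 0:
        raise ValueError("strata carry no information for the MH statistic")
    chi2 = (abs(sum_a - sum_e) - 0.5) ** 2 / sum_var
    return chi2, num_or / den_or
```

A stratum in which both groups answer correctly at the same rate yields α_MH = 1 and a small χ²MH, while a stratum favoring the reference group pushes α_MH above 1. Because Σ Var(A_k) grows with the number of examinees per stratum, small focal and reference groups give a small denominator relative to sampling noise, which is consistent with the low power the abstract reports at 250 subjects per group.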


2021 ◽  
Vol 112 ◽  
pp. 106658
Author(s):  
Brianna R. Altman ◽  
Maha N. Mian ◽  
Dev Dalal ◽  
Luna F. Ueno ◽  
Rachel Luba ◽  
...  

Body Image ◽  
2014 ◽  
Vol 11 (3) ◽  
pp. 206-209 ◽  
Author(s):  
Erin E. Reilly ◽  
Lisa M. Anderson ◽  
Katherine Schaumberg ◽  
Drew A. Anderson

2019 ◽  
Vol 52 (9) ◽  
pp. 1047-1051 ◽  
Author(s):  
Lauren M. Schaefer ◽  
Lisa M. Anderson ◽  
Melissa Simone ◽  
Shannon M. O'Connor ◽  
Hana Zickgraf ◽  
...  
