Evaluating the Magnitude of Differential Item Functioning in Polytomous Items

1996 ◽  
Vol 21 (3) ◽  
pp. 187-201 ◽  
Author(s):  
Rebecca Zwick ◽  
Dorothy T. Thayer

Several recent studies have investigated the application of statistical inference procedures to the analysis of differential item functioning (DIF) in polytomous test items that are scored on an ordinal scale. Mantel’s extension of the Mantel-Haenszel test is one of several hypothesis-testing methods for this purpose. The development of descriptive statistics for characterizing DIF in polytomous test items has received less attention. As a step in this direction, two possible standard error formulas for the polytomous DIF index proposed by Dorans and Schmitt were derived. These standard errors, as well as associated hypothesis-testing procedures, were evaluated through application to simulated data. The standard error that performed better is based on Mantel’s hypergeometric model. The alternative standard error, based on a multinomial model, tended to yield values that were too small.
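As a rough illustration of the two quantities named in the abstract above, the following sketch computes Mantel's chi-square for a set of 2 × C tables (one per level of the matching variable) and a standardized mean difference of the kind proposed by Dorans and Schmitt. The function names, the data layout (rows are matching strata, columns are ordered score categories) and the sign convention (focal minus reference) are assumptions made for this example and are not taken from the article. The two standard error formulas derived in the article are not reproduced here.

```python
import math
import numpy as np


def _prepare(ref, foc, scores):
    """Convert inputs to float arrays and drop strata that cannot be used."""
    ref = np.asarray(ref, dtype=float)   # shape (K, C): reference-group counts
    foc = np.asarray(foc, dtype=float)   # shape (K, C): focal-group counts
    y = np.asarray(scores, dtype=float)  # shape (C,): item score per category
    n_r = ref.sum(axis=1)
    n_f = foc.sum(axis=1)
    keep = (n_r > 0) & (n_f > 0)         # both groups must be present
    return ref[keep], foc[keep], y


def mantel_chi_square(ref, foc, scores):
    """Mantel's 1-df test for ordinal responses across K strata.

    Returns (chi_square, p_value). Uses the hypergeometric mean and
    variance of the focal group's score sum within each stratum.
    """
    ref, foc, y = _prepare(ref, foc, scores)
    col = ref + foc                      # column totals per stratum
    n_r = ref.sum(axis=1)
    n_f = foc.sum(axis=1)
    n = n_r + n_f
    t = (foc * y).sum(axis=1)            # observed focal score sum
    s1 = (col * y).sum(axis=1)           # sum of scores over everyone
    s2 = (col * y ** 2).sum(axis=1)      # sum of squared scores
    expected = n_f * s1 / n
    variance = n_r * n_f * (n * s2 - s1 ** 2) / (n ** 2 * (n - 1))
    chi2 = (t.sum() - expected.sum()) ** 2 / variance.sum()
    # Upper tail of a chi-square with 1 df, written with erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return float(chi2), p_value


def standardized_mean_difference(ref, foc, scores):
    """Focal-weighted difference in mean item score (focal minus reference)."""
    ref, foc, y = _prepare(ref, foc, scores)
    n_r = ref.sum(axis=1)
    n_f = foc.sum(axis=1)
    weights = n_f / n_f.sum()            # focal-group share in each stratum
    mean_f = (foc * y).sum(axis=1) / n_f
    mean_r = (ref * y).sum(axis=1) / n_r
    return float((weights * (mean_f - mean_r)).sum())


# Made-up counts: 2 strata, item scored 0, 1, 2.
reference = [[10, 20, 30], [5, 15, 40]]
focal = [[30, 20, 10], [40, 15, 5]]
print(mantel_chi_square(reference, focal, [0, 1, 2]))
print(standardized_mean_difference(reference, focal, [0, 1, 2]))
```

Under this convention a negative index means the focal group scores lower on the item than reference-group members with the same matching score. The chi-square only says whether that difference is distinguishable from zero; the index is the descriptive measure of its size.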

Author(s):  
Abdul Wahab Ibrahim

The study used statistical procedures based on Item Response Theory to detect differential item functioning (DIF) in polytomous tests, with a view to improving the quality of test item construction. The sample consisted of an intact class of 513 Part 3 undergraduate students who registered for the course EDU 304: Tests and Measurement at Sule Lamido University during the 2017/2018 second semester. A self-developed polytomous research instrument was used to collect data. The data were analysed using the Generalized Mantel-Haenszel test, the Simultaneous Item Bias Test, and Logistic Discriminant Function Analysis. The results showed no significant relationship between the proportions of test items flagged as functioning differentially in the polytomous test when the different statistical methods were used. Further, the three parametric and non-parametric methods complemented each other: all of them were able to detect DIF in the polytomous test format, but they performed differently. The study concluded that there was a high degree of correspondence between the three procedures in their ability to detect DIF in polytomous tests. It was recommended that test experts and developers consider using procedures based on Item Response Theory in DIF detection.


2021 ◽  
Author(s):  
John Marc Goodrich ◽  
Natalie Koziol ◽  
HyeonJin Yoon

When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English (Tabaku, Carbuccia-Abbott, & Saavedra, 2018). The degree to which alternate-language test items function equivalently must be evaluated, but traditional methods of investigating measurement equivalence may be confounded by group differences on characteristics other than ability level and language form. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Results indicated a minority of items functioned differently across the Spanish and English forms, and subsequent item content scrutiny indicated no plausible evidence of item bias. Evidence of selection bias—differences between groups in SES, age, and country of birth, in addition to mathematics ability and form language—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. Fewer items exhibited DIF when controlling for selection bias (11% vs. 25%), and the type and direction of DIF differed upon controlling for selection bias.


2018 ◽  
Vol 12 (4) ◽  
pp. 5
Author(s):  
Andreas Alm Fjellborg ◽  
Lena Molin

What types of test items benefit students who follow the syllabus in Swedish as a second language? A study using data from the Swedish national assessments in geography.

Pupils with a foreign background tend to perform worse in school than Swedish-born pupils, primarily because of weaker knowledge of the Swedish language. Using a statistical analysis (differential item functioning), this study identifies items from the national tests in geography (2014-2017) on which pupils who follow the syllabus in Swedish as a second language perform considerably better or worse than expected. Earlier research has shown that geographic concepts are particularly difficult for pupils whose native language is not Swedish, which is confirmed by the present study. The study further shows that items with little text that deal with geographic concepts produce the largest differences in performance between pupils following the syllabus in Swedish and pupils following the syllabus in Swedish as a second language. These findings can help teachers and test constructors adapt teaching and tests by avoiding items that measure irrelevant background factors, which affect pupils' ability to answer items adequately on the basis of their level of knowledge.

Keywords: National tests in geography, question format, pupils with a foreign background, Swedish-born pupils, DIF analysis


2019 ◽  
Vol 13 ◽  
Author(s):  
Yifang Wu ◽  
Yan Cai ◽  
Dongbo Tu

This article aimed to develop an adaptive version of the subjective well-being (SWB) scale to measure a comprehensive concept of SWB among Chinese university students. Item response theory was employed to build the item bank of the SWB scale and a computerized adaptive test (CAT) for SWB (CAT-SWB), based on several commonly used SWB scales, after unidimensionality testing, model selection, local dependence testing, parameter estimation, item fit testing and differential item functioning (DIF) analysis were performed. Finally, two CAT simulations, one using simulated data and one using real data, were carried out to verify and evaluate the CAT-SWB. Results indicated that the proposed CAT-SWB performed very well, in that it greatly reduced the number of test items and the testing time without losing measurement precision.


2005 ◽  
Vol 74 ◽  
pp. 135-145
Author(s):  
Tamara van Schilt-Mol ◽  
Ton Vallen ◽  
Henny Uiterwijk

Previous research has shown that the Dutch 'Final Test of Primary Education' contains a number of unintentionally, and therefore undesirably, difficult test items, leading to Differential Item Functioning (DIF) for immigrant minority students whose parents' dominant language is Turkish or Arabic/Berber. Two statistical procedures were used to identify DIF items in the Final Test of 1997. Subsequently, five experiments were conducted to detect causes of DIF, yielding a number of hypotheses concerning possible linguistic, cultural, and textual sources. These hypotheses were used to rewrite the original DIF items into items intended to be DIF-free. The article discusses three possible sources of DIF: (1) the use of fixed (misleading) answer options and (2) the use of misleading illustrations (both to the disadvantage of the minority students), and (3) the fact that questions concerning the past tense often lead to DIF (to their advantage).


2019 ◽  
Vol 31 (12) ◽  
pp. 1769-1779
Author(s):  
Nahathai Wongpakaran ◽  
Tinakon Wongpakaran ◽  
Surang Lertkachatarn ◽  
Thanitha Sirirak ◽  
Pimolpun Kuntawong

Objectives: The Core Symptom Index (CSI) is designed to measure anxiety, depression and somatization symptoms. This study examined the construct validity of the CSI using confirmatory factor analysis (CFA), including a bifactor model, and explored differential item functioning (DIF) of the CSI. The criterion and concurrent validity were also evaluated. Methods: In all, 803 elderly patients, average age 69.24 years, 70% female, were assessed for depressive disorders and completed the CSI and the Geriatric Depression Scale (GDS). A series of CFAs for ordinal scales was applied. Factor loadings and explained common variance were analyzed for general and specific factors, and Omega was calculated for model-based reliability. DIF was analyzed using the Multiple-Indicator Multiple-Cause model. Pearson’s correlation, ANOVA, and ROC analysis were used to assess associations and to compare the CSI and GDS in predicting major depressive disorder (MDD). Results: The bifactor model provided the best fit to the data. Most items loaded on the general rather than the specific factors. The explained common variance was acceptable, while Omega hierarchical and explained common variance for the subscales were low. Two DIF items were identified: ‘crying’ with respect to sex and ‘self-blaming’ with respect to education. The CSI correlated with clinical disorders and with the GDS. The AUC was 0.83 for the GDS and 0.81 for the CSI. Conclusion: The CSI appears sufficiently unidimensional. Its total score reflected a single general factor, permitting users to interpret the total score as a sufficiently reliable measure of the general factor. The CSI could serve as a screening tool for MDD.


2008 ◽  
Vol 68 (6) ◽  
pp. 940-958 ◽  
Author(s):  
Angel M. Fidalgo ◽  
Jaqueline M. Madeira

Mantel-Haenszel methods comprise a highly flexible methodology for assessing the degree of association between two categorical variables, whether they are nominal or ordinal, while controlling for other variables. The versatility of Mantel-Haenszel analytical approaches has made them very popular in the assessment of the differential functioning of both dichotomous and polytomous items. Up to now, researchers have limited the use of Mantel-Haenszel statistics to analyzing contingency tables of dimensions 2 × 2 (by means of the Mantel-Haenszel chi-square statistic) and of dimensions of 2 × C (by means of either the generalized Mantel-Haenszel test or Mantel's test). The main objective of this article is to illustrate a unified framework for the analysis of differential item functioning using the Mantel-Haenszel methods. This is done by means of the generalized Mantel-Haenszel statistic for the analysis of the general case of Q contingency tables with dimensions R × C. Moreover, with the new formulation in consideration, this article reviews the most recent research on differential item functioning and suggests new applications and research lines in relation to the statistics proposed.
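To make the general case mentioned above more concrete, here is a minimal sketch of a generalized Mantel-Haenszel statistic of the general-association type for Q strata of R × C tables. It follows the standard textbook construction (observed minus expected counts on the first R−1 rows and C−1 columns, with a multiple-hypergeometric covariance); it is not taken from the article, and the function name and array layout are assumptions for illustration. It also assumes every stratum has non-empty margins, since otherwise the covariance matrix becomes singular.

```python
import numpy as np


def generalized_mh(tables):
    """General-association Mantel-Haenszel statistic for Q tables of size R x C.

    tables: array-like of shape (Q, R, C) holding cell counts.
    Returns (statistic, degrees_of_freedom); under the null hypothesis the
    statistic is referred to a chi-square with (R-1)(C-1) df.
    """
    tables = np.asarray(tables, dtype=float)
    _, r, c = tables.shape
    dim = (r - 1) * (c - 1)
    diff = np.zeros(dim)          # summed observed-minus-expected counts
    cov = np.zeros((dim, dim))    # summed null covariance matrices
    for t in tables:
        n = t.sum()
        if n <= 1:
            continue              # a stratum this small carries no information
        p_row = t.sum(axis=1)[:-1] / n   # row proportions, last row dropped
        p_col = t.sum(axis=0)[:-1] / n   # column proportions, last column dropped
        observed = t[:-1, :-1].reshape(-1)
        expected = n * np.kron(p_row, p_col)
        cov_row = np.diag(p_row) - np.outer(p_row, p_row)
        cov_col = np.diag(p_col) - np.outer(p_col, p_col)
        diff += observed - expected
        cov += n ** 2 / (n - 1) * np.kron(cov_row, cov_col)
    statistic = float(diff @ np.linalg.solve(cov, diff))
    return statistic, dim


# A single 2 x 2 table reduces to the usual Mantel-Haenszel chi-square
# without continuity correction.
print(generalized_mh([[[10, 20], [30, 40]]]))
```

For DIF work, each stratum would be one level of the matching score, the rows would be the groups and the columns the response categories, so the 2 × 2 and 2 × C analyses described above appear as special cases of the same computation.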

