Regression Discontinuity and Differential Item Functioning

2021 ◽  
Author(s):  
John Marc Goodrich ◽  
Natalie Koziol ◽  
HyeonJin Yoon

When measuring academic skills among students whose primary language is not English, standardized assessments are often provided in languages other than English (Tabaku, Carbuccia-Abbott, & Saavedra, 2018). The degree to which alternate-language test items function equivalently must be evaluated, but traditional methods of investigating measurement equivalence may be confounded by group differences on characteristics other than ability level and language form. The primary purposes of this study were to investigate differential item functioning (DIF) and item bias across Spanish and English forms of an assessment of early mathematics skills. Secondary purposes were to investigate the presence of selection bias and demonstrate a novel approach for investigating DIF that uses a regression discontinuity design framework to control for selection bias. Data were drawn from 1,750 Spanish-speaking Kindergarteners participating in the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, who were administered either the Spanish or English version of the mathematics assessment based on their performance on an English language screening measure. Results indicated a minority of items functioned differently across the Spanish and English forms, and subsequent item content scrutiny indicated no plausible evidence of item bias. Evidence of selection bias—differences between groups in SES, age, and country of birth, in addition to mathematics ability and form language—highlighted limitations of a traditional approach for investigating DIF that only controlled for ability. Fewer items exhibited DIF when controlling for selection bias (11% vs. 25%), and the type and direction of DIF differed upon controlling for selection bias.

2022 ◽  
pp. 001316442110684
Author(s):  
Natalie A. Koziol ◽  
J. Marc Goodrich ◽  
HyeonJin Yoon

Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed.
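For orientation, the traditional logistic regression test for uniform DIF that serves as the comparison method can be sketched as follows. The abstract does not give the model specification, so this is a minimal sketch of one common formulation, not the authors' implementation: the item response is regressed on an ability proxy, and a likelihood-ratio test checks whether adding a group indicator improves fit. All function names and the synthetic data generator `make_data` are hypothetical.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(rows, y, iterations=25):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * x[j] * x[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def uniform_dif_test(ability, group, correct):
    """Likelihood-ratio test (1 df) of the group effect, conditioning on ability."""
    reduced = [[1.0, a] for a in ability]
    full = [[1.0, a, g] for a, g in zip(ability, group)]
    _, ll_reduced = fit_logistic(reduced, correct)
    beta, ll_full = fit_logistic(full, correct)
    g2 = max(0.0, 2.0 * (ll_full - ll_reduced))
    p_value = math.erfc(math.sqrt(g2 / 2.0))  # chi-square tail probability, 1 df
    return beta[2], g2, p_value

def make_data(shift, per_cell=20):
    """Hypothetical data: 5 ability levels x 2 groups; `shift` is the DIF size in logits."""
    ability, group, correct = [], [], []
    for g in (0, 1):
        for s in range(5):
            prob = 1.0 / (1.0 + math.exp(-(-2.0 + s + shift * g)))
            n_correct = round(per_cell * prob)
            for i in range(per_cell):
                ability.append(float(s))
                group.append(float(g))
                correct.append(1.0 if i < n_correct else 0.0)
    return ability, group, correct

group_effect, g2, p_value = uniform_dif_test(*make_data(shift=-1.5))
print(group_effect, g2, p_value)
```

This test conditions only on the ability proxy. Group differences on other characteristics are left uncontrolled, which is the selection-bias limitation the abstract attributes to traditional approaches.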


2002 ◽  
Vol 11 (3) ◽  
pp. 274-284 ◽  
Author(s):  
Carol Scheffner Hammer ◽  
Maria Pennock-Roman ◽  
Sarah Rzasa ◽  
J. Bruce Tomblin

The purpose of this research was to examine the Test of Language Development-P:2 (TOLD-P:2; Newcomer & Hammill, 1991) for item bias. The TOLD-P:2 was administered to 235 African American and 1,481 White kindergarten children living in the Midwest. Test items were examined for evidence of differential item functioning (DIF) using inferential and descriptive methods. Sixteen percent of all items of the TOLD-P:2 were found to have DIF. Of these items, 75% were found to be harder for the African American group. The percentages of items on the five core subtests identified as having DIF were as follows: Picture Vocabulary, 17%; Oral Vocabulary, 17%; Grammatic Understanding, 12%; Sentence Imitation, 20%; and Grammatic Completion, 13%. The implications of these findings are discussed in relation to the TOLD-P:3.


1995 ◽  
Vol 80 (3_suppl) ◽  
pp. 1071-1074 ◽  
Author(s):  
Thomas Uttaro

The Mantel-Haenszel chi-square (χ²MH) is widely used to detect differential item functioning (item bias) between ethnic and gender-based subgroups on educational and psychological tests. The empirical behavior of χ²MH has been incompletely understood; previous research is inconclusive. The present simulation study explored the effects of sample size, number of items, and trait distributions on the power of χ²MH to detect modeled differential item functioning. A significant effect was obtained for sample size, with unacceptably low power for 250 subjects each in the focal and reference groups. The discussion supports the 1990 recommendations of Swaminathan and Rogers and opposes the 1993 view of Zieky that a sample size of 250 for each group is adequate.
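As a reminder of what the statistic computes, here is a minimal sketch of the continuity-corrected Mantel-Haenszel chi-square and the accompanying common odds ratio for a single item. The abstract gives no formula, so the textbook form is assumed; the stratification (for example by total test score), the function name, and the counts are hypothetical.

```python
import math

def mantel_haenszel(tables):
    """Mantel-Haenszel test for one item.

    `tables` holds one 2x2 table per ability stratum as (a, b, c, d):
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    Returns (chi-square with continuity correction, common odds ratio, p-value).
    """
    sum_a = sum_expected = sum_var = num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        num += a * d / n
        den += b * c / n
    chi2 = (abs(sum_a - sum_expected) - 0.5) ** 2 / sum_var
    odds_ratio = num / den
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail probability, 1 df
    return chi2, odds_ratio, p_value

# Hypothetical counts: three strata in which the reference group does better.
dif_tables = [(30, 10, 15, 25)] * 3
chi2, odds_ratio, p_value = mantel_haenszel(dif_tables)
print(chi2, odds_ratio, p_value)
```

The variance term in the denominator shrinks as the strata get smaller, so with few examinees per group the statistic has little room to reach significance. That is the sample-size sensitivity the simulation examines.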


Author(s):  
Stella Eteng-Uket

The study used item response theory to detect differential item functioning in the West African Senior School Certificate English language test in south-south Nigeria. Two research questions guided the study. The study used a descriptive survey design; the population comprised 117,845 Senior Secondary 3 students in Edo, Delta, Rivers, and Bayelsa states. A sample of 1,309 students (604 males, 705 females), drawn through a multistage sampling technique, was used. Two valid instruments, the Socio-economic Status Questionnaire (SSQ) and the WASSCE/SSCE English Language Objective Test (ELOT), were used to collect data. The reliability of the instruments was estimated using the Cronbach alpha method of internal consistency and Kuder-Richardson 20, yielding coefficients of .84 for the English language objective test and .71 for the socio-economic status questionnaire. The chi-square and Lord's Wald test statistics implemented in Item Response Theory for Patient-Reported Outcomes (IRTPRO) were used to analyse the data and answer the research questions at the .05 level of significance. The results revealed that 13 items showed significant DIF between the male and female groups and 23 items showed significant DIF between the high and low socio-economic status groups. This corresponds to 18% DIF based on gender and 32% based on socio-economic status, indicating large DIF and potentially biased items. Based on the findings, recommendations were made, among them that item response theory should be used as a DIF detection method by large-scale public examination bodies and test developers.
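The Kuder-Richardson 20 coefficient mentioned above can be sketched in a few lines. This is a minimal illustration on a hypothetical response matrix, not the study's data or software; it uses the population variance of total scores, under which KR-20 coincides with Cronbach's alpha for dichotomously scored items.

```python
def kr20(responses):
    """Kuder-Richardson 20 for dichotomous (0/1) item responses.

    `responses` is a list of examinees, each a list of k item scores.
    Assumes k >= 2 and that total scores are not all identical.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n  # population variance
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion answering item j correctly
        sum_pq += p * (1.0 - p)
    return k / (k - 1) * (1.0 - sum_pq / var_total)

# Hypothetical 6 examinees x 4 items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(kr20(example))
```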


Author(s):  
Abdul Wahab Ibrahim

The study used statistical procedures based on Item Response Theory to detect Differential Item Functioning (DIF) in polytomous tests, with a view to improving the quality of test item construction. The sample consisted of an intact class of 513 Part 3 undergraduate students who registered for the course EDU 304: Tests and Measurement at Sule Lamido University during the 2017/2018 second semester. A self-developed polytomous research instrument was used to collect data. The data were analysed using the Generalized Mantel-Haenszel procedure, the Simultaneous Item Bias Test, and Logistic Discriminant Function Analysis. The results showed no significant relationship between the proportions of test items that functioned differentially in the polytomous test when the different statistical methods were used. Further, the three parametric and non-parametric methods complemented each other: all of them were able to detect DIF in the polytomous test format, but they performed differently. The study concluded that there was a high degree of correspondence between the three procedures in their ability to detect DIF in polytomous tests. It was recommended that test experts and developers consider using procedures based on Item Response Theory in DIF detection.


2018 ◽  
Vol 12 (4) ◽  
pp. 5
Author(s):  
Andreas Alm Fjellborg ◽  
Lena Molin

What types of test items benefit students who follow the syllabus in Swedish as a second language? A study using data from the Swedish national assessments in geography.

Pupils born outside Sweden tend to perform less well in school than native-born pupils, primarily because of weaker knowledge of the Swedish language. A statistical analysis (differential item functioning) of items from the national tests in geography (2014-2017) identified items on which pupils following the syllabus in Swedish as a second language performed considerably better or worse than expected. Earlier research has shown that pupils whose native language is not Swedish find geographic concepts particularly hard to comprehend, which the present study confirmed. The study further showed that items with little text that concern geographic concepts, in particular, produced larger-than-expected differences between native pupils following the syllabus in Swedish and foreign-born pupils following the syllabus in Swedish as a second language. These findings could help teachers and test constructors adjust teaching and tests by avoiding items that measure irrelevant background factors, which might affect pupils' ability to answer adequately given their level of knowledge.

Keywords: National tests in geography, question format, pupils born outside Sweden, Swedish-born pupils, DIF-analysis


2005 ◽  
Vol 74 ◽  
pp. 135-145
Author(s):  
Tamara van Schilt-Mol ◽  
Ton Vallen ◽  
Henny Uiterwijk

Previous research has shown that the Dutch 'Final Test of Primary Education' contains a number of unintentionally, and therefore unwantedly, difficult test items, leading to Differential Item Functioning (DIF) for immigrant minority students whose parents' dominant language is Turkish or Arabic/Berber. Two statistical procedures were used to identify DIF items in the Final Test of 1997. Subsequently, five experiments were conducted to detect causes of DIF, yielding a number of hypotheses concerning possible linguistic, cultural, and textual sources. These hypotheses were used to rewrite original DIF items into items intended to be DIF-free. The article discusses three possible sources of DIF: (1) the use of fixed (misleading) answer options and (2) of misleading illustrations (both to the disadvantage of the minority students), and (3) the fact that questions concerning the past tense often lead to DIF (to their advantage).

