Exploring Crossing Differential Item Functioning by Gender in Mathematics Assessment

2015 ◽  
Vol 15 (4) ◽  
pp. 337-355 ◽  
Author(s):  
Yoke Mooi Ong ◽  
Julian Williams ◽  
Iasonas Lamprianou
2021 ◽  
pp. 073428292110105
Author(s):  
Semirhan Gökçe ◽  
Giray Berberoğlu ◽  
Craig S. Wells ◽  
Stephen G. Sireci

The 2015 Trends in International Mathematics and Science Study (TIMSS) assessed students’ achievement in mathematics and science across 57 countries and 43 languages. The purpose of this study is to evaluate whether items and test scores are affected as the differences between language families and cultures increase. Using differential item functioning (DIF) procedures, we compared the consistency of students’ performance across three combinations of languages and countries: (a) same language but different countries, (b) same countries but different languages, and (c) different languages and different countries. For all paired comparisons within each condition, the analyses examined the number of DIF items, the direction and magnitude of DIF, and the differences between test characteristic curves. The more distant the countries were in culture and language family, the more DIF was present. The magnitude of DIF was greatest when both language and country differed, and smallest when the language was the same but the countries differed. Results suggest that when TIMSS results are compared across countries, language- and country-specific differences, which could reflect cultural, curricular, or other differences, should be considered.
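The abstract does not name the DIF procedure used. As one illustration of how the direction and magnitude of DIF are commonly quantified for a dichotomous item, the sketch below computes the Mantel-Haenszel common odds ratio across total-score strata and converts it to the ETS delta scale. The function name, the "ref"/"focal" labels, and the input layout are assumptions for illustration, not the authors' method.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(item_scores, groups, match_scores):
    """Mantel-Haenszel DIF for one dichotomous item (illustrative sketch).

    item_scores  : 0/1 responses to the studied item
    groups       : "ref" or "focal" for each examinee (hypothetical labels)
    match_scores : matching variable per examinee, e.g. total test score
    Returns (alpha_MH, MH D-DIF on the ETS delta scale).
    """
    # One 2x2 table per score stratum, stored as [A, B, C, D]:
    # A/B = reference correct/incorrect, C/D = focal correct/incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for x, g, s in zip(item_scores, groups, match_scores):
        idx = (0 if g == "ref" else 2) + (0 if x == 1 else 1)
        tables[s][idx] += 1

    # Pool the stratum-level odds ratios (each stratum has n >= 1).
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these data")

    alpha = num / den
    delta = -2.35 * math.log(alpha)  # ETS MH D-DIF
    return alpha, delta
```

Under the usual ETS convention, a negative MH D-DIF indicates an item that is relatively harder for the focal group than for reference-group examinees with the same matching score; counting flagged items and the sign of each value gives the number and direction of DIF items.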


2020 ◽  
Vol 5 (1) ◽  
pp. 51-60
Author(s):  
Elizar Elizar ◽  
Cut Khairunnisak

Mathematics assessments should be designed for all students, regardless of their background or gender. Rasch analysis, which is grounded in Item Response Theory (IRT), is one of the primary tools for analysing the inclusiveness of a mathematics assessment. However, mathematics test development has been dominated by Classical Test Theory (CTT). This preliminary study evaluates a mathematics comprehensive test and aims to demonstrate the use of Rasch analysis by assessing whether the test is appropriate for measuring students' mathematical understanding. Data were collected from one administration of the mathematics comprehensive test involving 48 undergraduate students in a mathematics education department. Rasch analysis was conducted with the ACER ConQuest 4 software to assess item difficulty and differential item functioning (DIF). The findings show that the item related to geometry was the easiest for students, while the item concerning calculus was the hardest. The test is viable for measuring students’ mathematical understanding, as it shows no evidence of DIF. DIF by gender was examined for each of the test items, and the assessment was found to be inclusive. Rasch analysis should be applied more widely to build thorough and robust mathematics assessments.
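As a point of reference, the dichotomous Rasch model and one common way of expressing gender DIF as a group-specific shift in item difficulty can be written as below. The symbols are introduced here for illustration and are not taken from the study.

```latex
P(X_{ni} = 1 \mid \theta_n) = \frac{\exp\left(\theta_n - \delta_{i\,g(n)}\right)}{1 + \exp\left(\theta_n - \delta_{i\,g(n)}\right)},
\qquad \delta_{ig} = \delta_i + \lambda_{ig}
```

Here θ_n is the ability of student n, δ_i the overall difficulty of item i, g(n) the student's gender group, and λ_{ig} a group-specific deviation. An item shows no gender DIF when λ_{i,female} = λ_{i,male}, so that students of equal ability have the same probability of answering it correctly regardless of gender.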


Diagnostica ◽  
2021 ◽  
Vol 67 (1) ◽  
pp. 13-23
Author(s):  
Ariana Garrote ◽  
Elisabeth Moser Opitz

Abstract. In this study, the test MARKO-D (Mathematik- und Rechenkonzepte im Vorschulalter–Diagnose; mathematical and arithmetic concepts at preschool age, diagnosis) was trialed with a sample of children from German-speaking Switzerland (N = 555) in their first and second kindergarten year, and we analyzed whether the age norms of the German sample can be transferred to Switzerland. In addition, measurement invariance over time was examined with a subsample (n = 87). The results of the unidimensional Rasch model show that the instrument is suitable for Switzerland. However, test performance depends on kindergarten attendance. For Switzerland, norms per kindergarten half-year should therefore be used in addition to age norms. The differential item functioning analysis showed that 17 of the 55 items are affected by substantial measurement variance over time. The instrument would need further development before it can be used in longitudinal studies.


2019 ◽  
Vol 35 (6) ◽  
pp. 823-833 ◽  
Author(s):  
Desiree Thielemann ◽  
Felicitas Richter ◽  
Bernd Strauss ◽  
Elmar Braehler ◽  
Uwe Altmann ◽  
...  

Abstract. Most instruments for the assessment of disordered eating were developed and validated in young female samples. However, they are often used in heterogeneous general population samples. Therefore, brief instruments of disordered eating should assess the severity of disordered eating equally well across individuals of different gender, age, body mass index (BMI), and socioeconomic status (SES). Differential item functioning (DIF) of two brief instruments of disordered eating (SCOFF, Eating Attitudes Test [EAT-8]) was modeled in a representative sample of the German population (N = 2,527) using a multigroup item response theory (IRT) approach and a multiple-indicator multiple-cause (MIMIC) structural equation model (SEM) approach. No DIF by age was found in either questionnaire. Three items of the EAT-8 showed DIF across gender, indicating that females are more likely to agree than males, given the same severity of disordered eating. One item of the EAT-8 revealed slight DIF by BMI. DIF with respect to the SCOFF seemed to be negligible. Both questionnaires are equally fair across people of different ages and SES. The DIF by gender that we found for the EAT-8 as a screening instrument may also be reflected in the use of different cutoff values for men and women. In general, both brief instruments assessing disordered eating revealed their strengths and limitations concerning test fairness for different groups.
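The abstract names the MIMIC approach without giving its equations. In a generic MIMIC formulation of DIF (notation chosen here for illustration, not taken from the study), each item's latent response depends on the latent trait and, directly, on a grouping covariate such as gender:

```latex
y^{*}_{ij} = \lambda_j \eta_i + \beta_j z_i + \varepsilon_{ij},
\qquad
\eta_i = \gamma z_i + \zeta_i
```

Here η_i is the severity of disordered eating for person i, z_i is the covariate (for example, 1 = female, 0 = male), and γ captures a genuine group difference in severity. A nonzero direct effect β_j means that the groups respond differently to item j at the same level of η_i, which is uniform DIF; the observed ordinal or binary answer is obtained by cutting the latent response y*_{ij} at item thresholds.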


1995 ◽  
Vol 11 (1) ◽  
pp. 14-20 ◽  
Author(s):  
Sean M. Hammond

This paper presents an IRT analysis of the Beck Depression Inventory (BDI), carried out to assess the assumption of an underlying latent trait common to non-clinical and patient samples. A one-parameter rating scale model was fitted to data drawn from a patient sample and a non-patient sample. Findings suggest that, while the BDI fits the model reasonably well for each sample separately, there is sufficient differential item functioning to raise serious doubts about the viability of using it analogously with patient and non-patient groups.
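Assuming the one-parameter rating scale model here is Andrich's rating scale model, in which all items share one set of category thresholds, the sketch below computes the category probabilities for a single item. The function name and the threshold values used in the tests are hypothetical.

```python
import math


def rating_scale_probs(theta, delta, taus):
    """Category probabilities under Andrich's rating scale model (sketch).

    theta : person location on the latent trait
    delta : item location
    taus  : category thresholds tau_1..tau_m, shared by all items
    Returns m + 1 probabilities for categories 0..m.
    """
    # Category k has logit sum_{j<=k} (theta - delta - tau_j);
    # category 0 has an empty sum, i.e. 0.
    logits = [0.0]
    for tau in taus:
        logits.append(logits[-1] + (theta - delta - tau))
    # Normalize with a softmax, shifting by the maximum for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

BDI items are scored 0 to 3, so three thresholds yield four category probabilities. In this framework, DIF between patient and non-patient samples would appear as a different item location (or thresholds) for respondents with the same θ.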

