Comparison of the Accuracy of Item Response Theory Models in Estimating Student’s Ability

2020 ◽  
Vol 6 (2) ◽  
pp. 178
Author(s):  
Ilham Falani ◽  
Makruf Akbar ◽  
Dali Santun Naga

This study aims to determine which item response theory model is more accurate in estimating students' mathematical abilities. The models compared are the Multiple Choice Model and the Three-Parameter Logistic Model. The data are the responses of 1704 eighth-grade junior high school students from six schools in Depok City, West Java, to a mathematics test. Sampling was done using a purposive random sampling technique. The mathematics test used for data collection consisted of 30 multiple-choice items. After the data were obtained, the research hypotheses were tested using a variance-ratio test (F-test) to find out which model is more accurate in estimating the ability parameters. The results showed an F-value of 1.089 against an F-table value of 1.087; since the F-value exceeds the F-table value, H0 was rejected. This means the Multiple Choice Model is more accurate than the Three-Parameter Logistic Model in estimating students' mathematical ability parameters, which makes it the recommended model for estimating ability on multiple-choice tests, especially in mathematics and other fields with similar characteristics.
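As a rough sketch of the variance-ratio comparison the abstract describes (with simulated, hypothetical ability estimates standing in for the study's data; only the critical value 1.087 and the sample size 1704 come from the abstract):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical ability (theta) estimates from the two models for the same examinees
theta_mcm = rng.normal(0.0, 1.00, size=1704)  # Multiple Choice Model
theta_3pl = rng.normal(0.0, 1.05, size=1704)  # Three-Parameter Logistic Model

# Variance-ratio (F) test: put the larger sample variance in the numerator
v1, v2 = theta_mcm.var(ddof=1), theta_3pl.var(ddof=1)
f_value = max(v1, v2) / min(v1, v2)

f_table = 1.087  # critical value reported in the abstract
print(round(f_value, 3), f_value > f_table)  # reject H0 of equal variances if True
```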

2020 ◽  
Vol 78 (4) ◽  
pp. 576-594
Author(s):  
Bing Jia ◽  
Dan He ◽  
Zhemin Zhu

The quality of multiple-choice questions (MCQs), as well as students' response behavior on them, is an educational concern. MCQs cover broad educational content and can be scored immediately and accurately. However, many studies have found flawed items in this exam format, which may yield misleading insights into students' performance and affect important decisions. This research sought to determine the characteristics of MCQs and the factors that may affect their quality by evaluating data with item response theory (IRT). Four samples of different sizes, drawn from the US and China and spanning secondary and higher education, were chosen. Item difficulty and discrimination were estimated using IRT item analysis models. Results were as follows. First, MCQ exams involve only a little guessing behavior, because all data sets fit the two-parameter logistic model better than the three-parameter logistic model. Second, the quality of MCQs depended more on the examiners' degree of training than on whether the setting was secondary or higher education. Lastly, MCQs must be evaluated to ensure that high-quality items can be used as bases of inference in secondary and higher education. Keywords: higher education, item evaluation, item response theory, multiple-choice test, secondary education
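The 2PL-versus-3PL distinction this abstract relies on comes down to the lower asymptote (guessing) parameter c. A minimal sketch of the two item response functions (illustrative parameters, not the study's estimates):

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL item response function: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_3pl(theta, a, b, c):
    """3PL adds a lower asymptote c that models guessing."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

theta = np.linspace(-3, 3, 7)
# With c = 0 the 3PL reduces exactly to the 2PL
assert np.allclose(p_3pl(theta, 1.2, 0.0, 0.0), p_2pl(theta, 1.2, 0.0))
# A nonzero guessing floor raises the probability for low-ability examinees
print(round(p_3pl(0.0, 1.2, 0.0, 0.2), 3))  # → 0.6
```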


Assessment ◽  
2019 ◽  
pp. 107319111986465
Author(s):  
Maria Anna Donati ◽  
Elisa Borace ◽  
Edoardo Franchi ◽  
Caterina Primi

The Multidimensional State Boredom Scale (MSBS) is widely used, but evidence regarding its psychometric properties among adolescents is lacking. In particular, the functioning of the scale across genders is unknown. As a result, we used item response theory (IRT) to investigate gender invariance of the Short Form of the MSBS (MSBS-SF) among adolescents. Four hundred and sixty-six Italian high school students (51% male; M = 16.7, SD = 1.44) were recruited. A confirmatory factor analysis demonstrated the unidimensionality of the scale, and IRT analyses indicated that the scale was sufficiently informative. Differential item functioning (DIF) across genders showed that only one item had DIF that was both nonuniform and small in size. Additionally, relationships with negative/positive urgency and present/future-oriented time perspectives were found. Overall, this study offers evidence that the MSBS-SF is a valuable and useful scale for measuring state boredom among male and female adolescents.
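The uniform/nonuniform DIF distinction mentioned above can be sketched with two hypothetical item response curves (illustrative parameters, not the MSBS-SF estimates): under uniform DIF the groups' curves are shifted but never cross, while under nonuniform DIF the difference changes sign across the ability range.

```python
import numpy as np

def irf(theta, a, b):
    """Logistic item response function with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

theta = np.linspace(-3, 3, 121)

# Uniform DIF: groups differ only in difficulty (one curve always above the other)
uniform_gap = irf(theta, 1.0, -0.2) - irf(theta, 1.0, 0.2)

# Nonuniform DIF: groups differ in discrimination (the curves cross)
nonuniform_gap = irf(theta, 1.4, 0.0) - irf(theta, 0.8, 0.0)

print(np.all(uniform_gap > 0))                                    # → True
print(np.any(nonuniform_gap > 0) and np.any(nonuniform_gap < 0))  # → True
```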


2015 ◽  
Vol 58 (3) ◽  
pp. 865-877 ◽  
Author(s):  
Gerasimos Fergadiotis ◽  
Stacey Kellough ◽  
William D. Hula

Purpose In this study, we investigated the fit of the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996) to an item-response-theory measurement model, estimated the precision of the resulting scores and item parameters, and provided a theoretical rationale for the interpretation of PNT overall scores by relating explanatory variables to item difficulty. This article describes the statistical model underlying the computer adaptive PNT presented in a companion article (Hula, Kellough, & Fergadiotis, 2015). Method Using archival data, we evaluated the fit of the PNT to 1- and 2-parameter logistic models and examined the precision of the resulting parameter estimates. We regressed the item difficulty estimates on three predictor variables: word length, age of acquisition, and contextual diversity. Results The 2-parameter logistic model demonstrated marginally better fit, but the fit of the 1-parameter logistic model was adequate. Precision was excellent for both person ability and item difficulty estimates. Word length, age of acquisition, and contextual diversity all independently contributed to variance in item difficulty. Conclusions Item-response-theory methods can be productively used to analyze and quantify anomia severity in aphasia. Regression of item difficulty on lexical variables supported the validity of the PNT and interpretation of anomia severity scores in the context of current word-finding models.
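The regression of item difficulty on the three lexical predictors can be sketched as follows, with simulated item-level data; the predictor weights and noise level are illustrative assumptions, not the PNT estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 175  # number of PNT items
# Hypothetical standardized item-level predictors
word_length = rng.normal(size=n_items)
aoa = rng.normal(size=n_items)          # age of acquisition
context_div = rng.normal(size=n_items)  # contextual diversity

# Simulated difficulties: assumed positive weights for length and AoA,
# negative for contextual diversity, plus noise
difficulty = (0.4 * word_length + 0.5 * aoa - 0.3 * context_div
              + rng.normal(0, 0.5, n_items))

# Ordinary least squares via the normal equations (intercept + three slopes)
X = np.column_stack([np.ones(n_items), word_length, aoa, context_div])
coef, *_ = np.linalg.lstsq(X, difficulty, rcond=None)
print(np.round(coef, 2))  # estimates recover the assumed weights approximately
```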


2021 ◽  
Vol 4 (1) ◽  
pp. 8-13
Author(s):  
Duden Sepuzaman ◽  
Edi Istiyono ◽  
Haryanto ◽  
Heri Retnawati ◽  
Yustiandi

This study compares students' abilities estimated with the Item Response Theory (IRT) approach under dichotomous and polytomous scoring. This research is descriptive and quantitative. The subjects were 1175 grade-XI high school students spread across West Java and Banten provinces, consisting of 450 male and 725 female students. The instrument was a test on work and energy, checked for validity, reliability, discriminating power, and difficulty level. Response data under dichotomous scoring were analyzed with the IRT approach using the BILOG-MG program, while the polytomous scoring was analyzed with the GPCM approach using the R program. The model fit test showed that the items fit the 2PL model best. The results showed that the average ability under polytomous scoring was greater than under dichotomous scoring, although the two estimates were nearly the same, with somewhat different distributions. The distribution of students' abilities under polytomous scoring is closer to the normal curve than under dichotomous scoring. The relationship between the two sets of ability scores is shown by a correlation coefficient of 0.990 and a determination index of 0.9808, with prediction line y = 0.9735x + 0.0036.
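A minimal sketch of how the correlation coefficient, determination index, and prediction line relate, using simulated ability estimates (only the sample size 1175 comes from the abstract; the values below are not the study's):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical ability estimates under the two scoring approaches,
# constructed to be nearly collinear as the abstract reports
theta_poly = rng.normal(size=1175)
theta_dich = 0.97 * theta_poly + rng.normal(0, 0.14, 1175)

r = np.corrcoef(theta_dich, theta_poly)[0, 1]   # correlation coefficient
r_squared = r ** 2                               # determination index
slope, intercept = np.polyfit(theta_poly, theta_dich, 1)  # prediction line y = slope*x + intercept

print(round(r, 3), round(r_squared, 3))
print(round(slope, 3), round(intercept, 3))
```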


2021 ◽  
Vol 15 (3) ◽  
pp. 17
Author(s):  
Hani Alkhaldi ◽  
Malek Alkhutaba ◽  
Mohammad Al-Dlalah

This study aimed to build a self-confidence scale for high school students in Al-Mafraq Governorate, Jordan, following Item Response Theory (IRT). The initial version of the scale included 50 items. To ensure external validity, the scale was reviewed by several experts; based on their feedback, some items were deleted or modified, and the resulting version included 44 items. The scale was then applied to a pilot sample of 310 male and female students to verify its psychometric characteristics. Finally, it was administered to a sample of 1060 male and female high school students in Al-Mafraq Governorate. Data were collected, coded, and analyzed using statistical programs (SPSS and WINSTEPS). The most important results were the following: the self-confidence measure was unidimensional, meaning it measures only a single dimension. The results further revealed fit to the partial credit model: the mean fit indices for persons and for the items approached zero, and the standard deviation approached one. The estimated threshold values of the scale items showed clear discriminatory ability and distinct threshold scores on the scale. After deleting the items that did not fit the model, the final version of the scale included 39 items. The results also showed that the threshold values, in logit units, fell within (-2.88, 2.77), within the range accepted in IRT.
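A partial credit model of the kind WINSTEPS fits can be sketched as follows: for one polytomous item, the probability of each response category is driven by the cumulative differences between the person's ability and the item's step thresholds. The thresholds below are illustrative assumptions, not the scale's estimates:

```python
import numpy as np

def pcm_probs(theta, thresholds):
    """Partial credit model category probabilities for one item.
    Category k's numerator is exp(sum_{j<=k} (theta - threshold_j)),
    with category 0 fixed at exp(0)."""
    steps = np.concatenate([[0.0], np.cumsum(theta - np.asarray(thresholds))])
    expd = np.exp(steps - steps.max())  # subtract max for numerical stability
    return expd / expd.sum()

# Person at theta = 0.5 answering an item with three thresholds (four categories)
probs = pcm_probs(0.5, [-1.0, 0.0, 1.2])
print(np.round(probs, 3), round(probs.sum(), 3))  # probabilities sum to 1
```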

