Modeling Guessing Components in the Measurement of Political Knowledge

2017 ◽  
Vol 25 (4) ◽  
pp. 483-504
Author(s):  
Tsung-han Tsai ◽  
Chang-chih Lin

Due to the crucial role of political knowledge in democratic participation, the measurement of political knowledge has been a major concern in political science. Common formats for political knowledge questions include multiple-choice items and open-ended identification questions. The conventional wisdom holds that multiple-choice items induce guessing behavior, which leads to underestimated item-difficulty parameters and biased estimates of political knowledge. This article examines guessing behavior on multiple-choice items and argues that a successful guess requires a certain level of knowledge, conditional on item difficulty. To address this issue, we propose a Bayesian IRT guessing model that accommodates the guessing components of item responses. The proposed model is applied to survey data from Taiwan, and the results show that it appropriately describes the guessing components in terms of respondents’ levels of political knowledge and item characteristics. In general, partially informed respondents are the most likely to guess successfully, because well-informed respondents do not need to guess and barely informed respondents are easily drawn to attractive distractors. We also examine the gender gap in political knowledge and find that, even when the guessing effect is accounted for, men are more knowledgeable than women about political affairs, which is consistent with the literature.
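For orientation, a minimal sketch of the standard three-parameter logistic (3PL) model, in which guessing is captured by a constant pseudo-guessing floor per item, is given below. The authors' Bayesian guessing model goes further by letting the chance of a successful guess depend on respondent ability and item difficulty, which this sketch does not reproduce; all parameter values are hypothetical.

import numpy as np

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the standard 3PL model:
    c + (1 - c) * logistic(a * (theta - b)), where theta is respondent
    ability and a, b, c are item discrimination, difficulty, and the
    pseudo-guessing floor."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: fairly hard (b = 1.5) with a 0.25 guessing floor,
# answered by a respondent of average ability (theta = 0).
print(p_correct_3pl(theta=0.0, a=1.2, b=1.5, c=0.25))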

2019 ◽  
Vol 44 (1) ◽  
pp. 33-48
Author(s):  
Daniel M. Bolt ◽  
Nana Kim ◽  
James Wollack ◽  
Yiqin Pan ◽  
Carol Eckerly ◽  
...  

Discrete-option multiple-choice (DOMC) items differ from traditional multiple-choice (MC) items in the sequential administration of response options (up to display of the correct option). DOMC can be appealing in computer-based test administrations due to its protection of item security and its potential to reduce testwiseness effects. A psychometric model for DOMC items that attends to the random positioning of key location across different administrations of the same item is proposed, a feature that has been shown to affect DOMC item difficulty. Using two empirical data sets having items administered in both DOMC and MC formats, the variability in key location effects across both items and persons is considered. The proposed model exploits the capacity of the DOMC format to isolate both (a) distinct sources of item difficulty (i.e., related to the identification of keyed responses versus the ruling out of distractor options) and (b) distinct person proficiencies related to the same two components. Practical implications in terms of the randomized process applied to schedule item key location in DOMC test administrations are considered.
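As a rough illustration only (not the model proposed in the paper), a DOMC item can be thought of as a chain of yes/no decisions: the respondent must rule out every distractor displayed before the key and then endorse the key, so a later key position makes the item harder. The probabilities below are hypothetical and assumed independent.

def p_correct_domc(key_position, p_reject_distractor, p_endorse_key):
    """Probability of answering a DOMC item correctly when the key is
    displayed in position `key_position` (1-based), assuming the respondent
    rules out each earlier distractor independently."""
    return (p_reject_distractor ** (key_position - 1)) * p_endorse_key

# Hypothetical values: each distractor is rejected with probability 0.8 and
# the key is endorsed with probability 0.9; later key positions are harder.
for position in range(1, 5):
    print(position, round(p_correct_domc(position, 0.8, 0.9), 3))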


2020 ◽  
Vol 2 (4) ◽  
pp. p59
Author(s):  
Michael Joseph Wise

The effectiveness of multiple-choice (MC) items depends on the quality of the response options, particularly how well the incorrect options (“distractors”) attract students who have incomplete knowledge. It is often contended that test-writers are unable to devise more than two plausible distractors for most MC items, and that the effort needed to do so is not worthwhile in terms of the items’ psychometric qualities. To test these contentions, I analyzed students’ performance on 545 MC items across six science courses that I have taught over the past decade. Each MC item contained four distractors, and the dataset included more than 19,000 individual responses. All four distractors were deemed plausible in one-third of the items, and three distractors were plausible in another third. Each additional plausible distractor led to an average 13% increase in item difficulty. Moreover, an increase in plausible distractors led to a significant increase in the discriminability of the items, with a leveling off by the fourth distractor. These results suggest that, at least for teachers writing tests to assess mastery of course content, it may be worthwhile to eschew recent skepticism and continue to attempt to write MC items with three or four distractors.
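A sketch of the kind of distractor analysis described above is shown below: it computes the proportion correct for an item and counts how many distractors attract at least a minimal share of examinees. The 5% plausibility threshold and the response data are assumptions for illustration, not necessarily the criteria used in the study.

import numpy as np

def item_stats(choices, key, n_options=5, plausibility_threshold=0.05):
    """choices: selected option index (0..n_options-1) for each examinee on one item.
    Returns the proportion correct and the number of 'plausible' distractors,
    i.e., incorrect options chosen by at least `plausibility_threshold` of examinees."""
    choices = np.asarray(choices)
    proportion_correct = np.mean(choices == key)
    shares = np.bincount(choices, minlength=n_options) / len(choices)
    plausible = sum(1 for option, share in enumerate(shares)
                    if option != key and share >= plausibility_threshold)
    return proportion_correct, plausible

# Hypothetical responses to one five-option item whose key is option 0.
rng = np.random.default_rng(0)
choices = rng.choice(5, size=200, p=[0.55, 0.20, 0.15, 0.07, 0.03])
print(item_stats(choices, key=0))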


2022 ◽  
Author(s):  
Achmad Shabir

The aim of this study was to describe the quality of the English test instrument used in the National Exam Try Out conducted by 40 junior high schools in Makassar, Sulawesi Selatan, using Item Response Theory (IRT), specifically the one-parameter (1PL), two-parameter (2PL), and three-parameter (3PL) logistic models. The data consist of 1,267 students’ answer sheets, and the test comprises 50 multiple-choice items. Results showed that the test performs reasonably well in both item difficulty and item discrimination as suggested by the 1PL and 2PL estimations. Under the 3PL estimation, however, the test was unable to discriminate students’ ability, and 38% of the items were easy to guess.
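A sketch of one way to flag “easy to guess” items after a 3PL calibration is given below, assuming estimated pseudo-guessing parameters are already available; the 0.25 cutoff (chance level for a four-option item) and the example estimates are assumptions, not values from the study.

import numpy as np

def flag_guessable(c_hat, cutoff=0.25):
    """Return the indices of items whose estimated 3PL pseudo-guessing
    parameter exceeds the cutoff."""
    c_hat = np.asarray(c_hat)
    return np.flatnonzero(c_hat > cutoff)

# Hypothetical pseudo-guessing estimates for a short test.
c_hat = [0.10, 0.31, 0.28, 0.05, 0.40]
flagged = flag_guessable(c_hat)
print(f"{len(flagged)} of {len(c_hat)} items "
      f"({100 * len(flagged) / len(c_hat):.0f}%) flagged as easy to guess")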


2020 ◽  
Vol 3 (1) ◽  
pp. 102-113
Author(s):  
Sutami

This research aims to produce a valid and reliable Indonesian language assessment instrument in the form of HOTS test items and to describe the quality of those items in measuring the HOTS skills of tenth-grade SMA and SMK students. This was a research and development study adapted from Borg & Gall’s development model, comprising the following steps: research and information collection, planning, early product development, limited try-out, revision of the early product, field try-out, and revision of the final product. The results show that the HOTS assessment instrument consists of 40 multiple-choice items and 5 essay items. Based on expert judgment of material, construction, and language, the instrument was valid and appropriate for use. The reliability coefficients were 0.88 for the multiple-choice items and 0.79 for the essays. The multiple-choice items had an average difficulty of 0.57 (moderate) and an average item discrimination of 0.44 (good), and the distractors functioned well. The essay items had an average difficulty of 0.60 (moderate) and an average item discrimination of 0.45 (good).
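The classical statistics reported above (Cronbach’s alpha for reliability, proportion correct for difficulty, and a corrected item-total correlation for discrimination) can be computed from a scored response matrix roughly as sketched below; the simulated 0/1 data are hypothetical.

import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

def difficulty_and_discrimination(scores):
    """Proportion correct per item and corrected item-total correlations."""
    scores = np.asarray(scores, dtype=float)
    difficulty = scores.mean(axis=0)
    total = scores.sum(axis=1)
    discrimination = np.array([np.corrcoef(scores[:, j], total - scores[:, j])[0, 1]
                               for j in range(scores.shape[1])])
    return difficulty, discrimination

# Hypothetical 0/1 scores for 100 respondents on 10 items, driven by a latent ability.
rng = np.random.default_rng(1)
ability = rng.normal(size=(100, 1))
item_difficulty = rng.normal(size=10)
scores = (rng.random((100, 10)) < 1 / (1 + np.exp(-(ability - item_difficulty)))).astype(int)
print(cronbach_alpha(scores))
print(difficulty_and_discrimination(scores))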


1991 ◽  
Vol 69 (3) ◽  
pp. 739-743 ◽  
Author(s):  
Claudio Violato

The effects of stem completeness (complete or incomplete stem) on the difficulty and discrimination of multiple-choice items were studied experimentally. Subjects (166 junior education students) were classified into three achievement groups (low, medium, high), and one of two forms of a multiple-choice test was randomly assigned to each subject. A two-way factorial design (completeness × achievement) was used as the experimental model. Analysis indicated that stem completeness had no effect on either item discrimination or difficulty, and there was no interaction effect with achievement. It was concluded that multiple-choice items may be very robust in measuring knowledge in a subject area irrespective of variations in stem construction.
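A minimal sketch of a two-way factorial analysis of this kind (completeness × achievement), using the statsmodels formula interface and hypothetical data, is shown below.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data for 166 subjects randomly assigned to one of two test forms.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "completeness": rng.choice(["complete", "incomplete"], size=166),
    "achievement": rng.choice(["low", "medium", "high"], size=166),
    "score": rng.normal(70, 10, size=166),
})

# Two-way ANOVA: main effects of completeness and achievement plus their interaction.
model = ols("score ~ C(completeness) * C(achievement)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))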


2011 ◽  
Vol 35 (4) ◽  
pp. 396-401 ◽  
Author(s):  
Jonathan D. Kibble ◽  
Teresa Johnson

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors “easy,” “moderate,” or “hard” and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = −0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ2 = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = −0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70–0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
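The correlation statistics reported above can be reproduced in outline with scipy, as sketched below; the intended-difficulty ratings, item p-values, and item/total scores are all hypothetical.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Spearman correlation between intended difficulty (1 = easy, 2 = moderate, 3 = hard)
# and the proportion of students answering each item correctly.
intended_difficulty = rng.choice([1, 2, 3], size=50)
proportion_correct = np.clip(0.9 - 0.1 * intended_difficulty + rng.normal(0, 0.1, 50), 0, 1)
print(stats.spearmanr(intended_difficulty, proportion_correct))

# Point-biserial correlation between a single item's 0/1 score and the total score.
item_score = rng.integers(0, 2, size=200)
total_score = item_score * 5 + rng.normal(20, 4, size=200)
print(stats.pointbiserialr(item_score, total_score))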


2019 ◽  
Vol 12 (3) ◽  
pp. 475-494
Author(s):  
Alireza Akbari

Purpose: The purpose of this paper is to measure the degree of item difficulty of translation multiple-choice items using the one-parameter logistic (1-PL) model of item response theory (IRT). The paper also proposes the hypothesis that a participant answering a translation test possesses some amount of translation competence that affects the end result.

Design/methodology/approach: In total, 150 translation students from Bachelor of Arts programs in Translation Studies at three Iranian universities participated in this research. Participants were asked to answer items in which the question was stated in English and the four choices were written in Farsi. To interpret the results, the paper employed the 1-PL and two-parameter logistic (2-PL) models using Stata (2016). In addition, item characteristic curves (graphical representations showing the degree of difficulty of each item) were used to present the 1-PL results.

Findings: Using the Stata platform, the findings show that, through the application of IRT, evaluators were able to calculate the degree of difficulty of each item (1-PL) and, correspondingly, the translation competence (2-PL) of each participant.

Research limitations/implications: One limitation is the relatively small number of translation participants at the Bachelor of Arts level.

Originality/value: Although a few studies have concentrated on the role of translation competence, no research has examined translation competence empirically in higher education.
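An item characteristic curve under the 1-PL (Rasch) model, of the kind the paper uses to display item difficulty, can be sketched as below; the difficulty value is hypothetical and the plot is for illustration only (the study itself used Stata).

import numpy as np
import matplotlib.pyplot as plt

# 1-PL (Rasch) item characteristic curve: P(correct) = 1 / (1 + exp(-(theta - b))).
theta = np.linspace(-4, 4, 200)   # ability scale
b = 0.8                           # hypothetical item difficulty
p_correct = 1.0 / (1.0 + np.exp(-(theta - b)))

plt.plot(theta, p_correct)
plt.xlabel("Ability (theta)")
plt.ylabel("P(correct)")
plt.title("1-PL item characteristic curve (hypothetical item, b = 0.8)")
plt.show()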


1987 ◽  
Vol 60 (3, Part 2) ◽  
pp. 1259-1262
Author(s):  
Claudio Violato ◽  
Peter H. Harasym

The effects of stem orientation (positively or negatively stated stem) and stem completeness (complete/closed stem or incomplete stem) of multiple-choice items on difficulty and discrimination were studied experimentally, employing 142 senior education students (82 women and 60 men). Incomplete stems increased item difficulty relative to complete stems but had no effect on discrimination. Stem orientation had no effect on either difficulty or discrimination. The implications of the results are discussed.

