The Reliability, Discrimination and Difficulty of Word-Knowledge Tests Employing Multiple Choice Items Containing Three, Four, or Five Alternatives

1973 ◽  
Vol 17 (1) ◽  
pp. 63-68 ◽  
Author(s):  
Donald Hogben

The multiple-choice item found on standardized and teacher-made tests typically requires the examinee to choose among either four or five alternative answers. The present study offers some evidence favouring three-option over four-option items. Word-knowledge tests containing three-, four-, and five-option items were developed and administered to pupils in ten upper-primary classes from schools in the Adelaide metropolitan area. In each of the classes, item discrimination was better and reliability was higher for the three-option test than for the test employing four-option items. The five-option test was also superior to the four-option test. Differences between the three- and five-option tests in reliability and item discrimination were not significant.

2011 ◽  
Vol 35 (4) ◽  
pp. 396-401 ◽  
Author(s):  
Jonathan D. Kibble ◽  
Teresa Johnson

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors “easy,” “moderate,” or “hard” and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = −0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ2 = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = −0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70–0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
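The statistics reported here (Cronbach's α and point-biserial item discrimination) can be computed directly from a matrix of dichotomous item scores. Below is a minimal sketch in Python/NumPy, with illustrative function names and toy data rather than anything drawn from the study:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an examinees x items matrix of 0/1 scores."""
    k = scores.shape[1]                        # number of items
    item_vars = scores.var(axis=0, ddof=1)     # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1) # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def point_biserial(scores: np.ndarray) -> np.ndarray:
    """Item-rest (point-biserial) discrimination for each item."""
    n_items = scores.shape[1]
    r = np.empty(n_items)
    for j in range(n_items):
        rest = scores.sum(axis=1) - scores[:, j]   # total score excluding item j
        r[j] = np.corrcoef(scores[:, j], rest)[0, 1]
    return r

# Toy example: 6 examinees, 4 items (0 = wrong, 1 = right)
X = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 0],
              [1, 1, 1, 1],
              [0, 1, 0, 0]])
print(cronbach_alpha(X), point_biserial(X))
```

The point-biserial computed here is the item-rest correlation (the item is removed from the total), a common correction; the abstract does not say which variant the authors used.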


1968 ◽  
Vol 23 (1) ◽  
pp. 301-302
Author(s):  
Lewis R. Aiken

It is demonstrated that the item discrimination index (d) and an index of the uniformity of the distribution of choices of distracters (U) provide useful information about the effectiveness of distracters on multiple-choice items.
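The abstract does not give the formulas for d or U. A minimal sketch, assuming the classical upper-minus-lower discrimination index for d and using a normalized-entropy measure as a stand-in for the uniformity of distracter choices (an assumption, not necessarily Aiken's exact U):

```python
import numpy as np
from collections import Counter

def discrimination_index(item: np.ndarray, total: np.ndarray, frac: float = 0.27) -> float:
    """Classical d: proportion correct in the upper-scoring group minus
    the proportion correct in the lower-scoring group."""
    n = len(item)
    g = max(1, int(round(frac * n)))
    order = np.argsort(total)
    lower, upper = order[:g], order[-g:]
    return item[upper].mean() - item[lower].mean()

def distractor_uniformity(choices, distractors) -> float:
    """Illustrative U: 1.0 when wrong answers spread evenly over the
    distracters, near 0 when one distracter absorbs almost all of them
    (normalized entropy; a stand-in, not necessarily Aiken's formula)."""
    counts = Counter(c for c in choices if c in distractors)
    p = np.array([counts.get(d, 0) for d in distractors], dtype=float)
    if p.sum() == 0:
        return 0.0
    p = p / p.sum()
    h = -np.sum(np.where(p > 0, p * np.log(p), 0.0))
    return float(h / np.log(len(distractors)))

# Toy usage: 0/1 item scores, total test scores, and examinees' chosen options
item  = np.array([1, 0, 1, 1, 0, 1, 0, 1])
total = np.array([9, 4, 8, 7, 3, 9, 5, 6])
picks = ["A", "C", "A", "A", "B", "A", "C", "A"]   # key is "A"
print(discrimination_index(item, total))
print(distractor_uniformity(picks, distractors=["B", "C", "D"]))
```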


2020 ◽  
Vol 3 (1) ◽  
pp. 102-113
Author(s):  
Sutami

This research aims to produce a valid and reliable Indonesian language assessment instrument in the form of HOTS test items and to describe the quality of those items for measuring the higher-order thinking skills of tenth-grade SMA and SMK students. The study was a research and development study adapted from Borg & Gall's development model, comprising the following steps: research and information collection, planning, early product development, limited try-out, revision of the early product, field try-out, and revision of the final product. The results show that the HOTS assessment instrument consists of 40 multiple-choice items and 5 essay items. Based on expert judgment of material, construction, and language, the instrument was valid and appropriate for use. The reliability coefficients were 0.88 for the multiple-choice items and 0.79 for the essays. The multiple-choice items have a mean difficulty of 0.57 (moderate) and a mean item discrimination of 0.44 (good), and the distractors function well. The essay items have a mean difficulty of 0.60 (moderate) and a mean item discrimination of 0.45 (good).


2017 ◽  
Vol 8 (6) ◽  
pp. 141 ◽  
Author(s):  
Sibel Toksöz ◽  
Ayşe Ertunç

Although foreign language testing has changed in line with different perspectives on learning and language teaching, multiple-choice items have remained popular regardless of these perspectives and trends. Some studies have examined the efficiency of multiple-choice items in different contexts. In the Turkish context, multiple-choice items are widely used in standardized high-stakes tests required for entry to undergraduate departments such as English Language Teaching, Western Languages and Literatures, and Translation Studies, and for tracking students' academic progress within those departments. Multiple-choice items are also used extensively at all levels of language instruction. However, there has not been enough item analysis of multiple-choice tests in terms of item discrimination, item facility, and distractor efficiency. The present study analyzes multiple-choice items testing grammar, vocabulary, and reading comprehension that were administered to preparatory class students at a state university. The responses of 453 students were analyzed for item facility, item discrimination, and distractor efficiency using the frequency distribution of the preparatory students' responses. The results reveal that most of the items are at a moderate level of item facility, while 28% of the items have a low item discrimination value. Finally, the frequency results were analyzed for distractor efficiency, and some distractors in the exam were found to be markedly ineffective and in need of revision.
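The three analyses named here can be reproduced from raw option frequencies. A minimal sketch in Python; the 5% cutoff for flagging a non-functional distractor is a common rule of thumb, not a value taken from the study:

```python
from collections import Counter

def item_facility(responses, key):
    """Proportion of examinees answering the item correctly."""
    return sum(r == key for r in responses) / len(responses)

def distractor_efficiency(responses, key, options, cutoff=0.05):
    """Flag distractors chosen by fewer than `cutoff` of examinees as
    non-functional (a rule of thumb, not the study's own criterion)."""
    n = len(responses)
    counts = Counter(responses)
    report = {}
    for opt in options:
        if opt == key:
            continue
        share = counts.get(opt, 0) / n
        report[opt] = {"share": round(share, 3), "functional": share >= cutoff}
    return report

# Toy example: answer key "B", options A-D
answers = list("BBABCBDBBABBBCBB")
print(item_facility(answers, "B"))
print(distractor_efficiency(answers, "B", ["A", "B", "C", "D"]))
```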


2017 ◽  
Vol 33 (3) ◽  
pp. 530
Author(s):  
Tao Xin ◽  
Mengcheng Wang ◽  
Tour Liu

Multiple-choice items are widely used in psychological and educational testing. The present study investigated whether a polytomously scored multiple-choice item has an advantage over a dichotomously scored item for ability estimation. An item response model, the nested logit model (NLM), was used to fit the multiple-choice data. Both a simulation study and an empirical study indicated that the accuracy and stability of ability estimation were enhanced by using the multiple-choice model rather than the dichotomous model, because the distractors of multiple-choice items carry additional information. However, the accuracy of ability parameter estimation showed little difference among 4-choice, 5-choice, and 6-choice items. Moreover, the NLM could extract more information from low-ability respondents than from high-ability ones, because low-ability respondents choose distractors more often. Furthermore, in an empirical study of a Chinese Vocabulary Test for Grade 1, the traces of distractor choice probabilities computed from the NLM showed that respondents at different trait levels were attracted by different distractors, suggesting that the responses of students at different levels might reflect the students' vocabulary development process.
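One common form of the nested logit model for multiple-choice items combines a 2PL model for the probability of a correct response with a nominal (softmax) model over the distractors, conditional on an incorrect response. A minimal sketch with made-up parameter values, showing how distractor choice probabilities trace out against ability:

```python
import numpy as np

def nlm_probs(theta, a, b, zeta, lam):
    """Category probabilities under a 2PL nested logit model.

    theta     : ability
    a, b      : 2PL discrimination and difficulty for the correct response
    zeta, lam : intercepts and slopes for each distractor (nominal part)
    Returns (P_correct, array of P_distractor_k).
    """
    p_correct = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    z = np.asarray(zeta) + np.asarray(lam) * theta
    w = np.exp(z - z.max())            # softmax over distractors
    p_given_wrong = w / w.sum()
    return p_correct, (1.0 - p_correct) * p_given_wrong

# Trace how distractor choice probabilities change with ability
for theta in (-2.0, 0.0, 2.0):
    pc, pd = nlm_probs(theta, a=1.2, b=0.3,
                       zeta=[0.5, 0.0, -0.4], lam=[-0.8, 0.1, 0.6])
    print(theta, round(pc, 3), np.round(pd, 3))
```

The negative slope on the first distractor and the positive slope on the last make different distractors most attractive at different trait levels, which is the pattern the study reports for the vocabulary test.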

