Are faculty predictions or item taxonomies useful for estimating the outcome of multiple-choice examinations?

The purpose of this study was to evaluate whether multiple-choice item difficulty could be predicted either by a subjective judgment by the question author or by applying a learning taxonomy to the items. Eight physiology faculty members teaching an upper-level undergraduate human physiology course consented to participate in the study. The faculty members annotated questions before exams with the descriptors “easy,” “moderate,” or “hard” and classified them according to whether they tested knowledge, comprehension, or application. Overall analysis showed a statistically significant, but relatively low, correlation between the intended item difficulty and actual student scores (ρ = −0.19, P < 0.01), indicating that, as intended item difficulty increased, the resulting student scores on items tended to decrease. Although this expected inverse relationship was detected, faculty members were correct only 48% of the time when estimating difficulty. There was also significant individual variation among faculty members in the ability to predict item difficulty (χ² = 16.84, P = 0.02). With regard to the cognitive level of items, no significant correlation was found between the item cognitive level and either actual student scores (ρ = −0.09, P = 0.14) or item discrimination (ρ = 0.05, P = 0.42). Despite the inability of faculty members to accurately predict item difficulty, the examinations were of high quality, as evidenced by reliability coefficients (Cronbach's α) of 0.70–0.92, the rejection of only 4 of 300 items in the postexamination review, and a mean item discrimination (point biserial) of 0.37. In conclusion, the effort of assigning annotations describing intended difficulty and cognitive levels to multiple-choice items is of doubtful value in terms of controlling examination difficulty. However, we also report that the process of annotating questions may enhance examination validity and can reveal aspects of the hidden curriculum.
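The abstract above reports Cronbach's α and a point-biserial item discrimination without showing how they are computed. The following is a minimal sketch of the textbook definitions, assuming dichotomously scored (0/1) items. The function names and the toy score matrix are illustrative assumptions, not taken from the study, and the point biserial shown here is the uncorrected version (the item is included in the total score).

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student lists of item scores."""
    k = len(scores[0])  # number of items
    # Variance of each item column across students.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def point_biserial(item, totals):
    """Pearson correlation between a 0/1 item column and total scores."""
    mi, mt = mean(item), mean(totals)
    cov = sum((x - mi) * (y - mt) for x, y in zip(item, totals))
    sx = sum((x - mi) ** 2 for x in item) ** 0.5
    sy = sum((y - mt) ** 2 for y in totals) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: 5 students (rows) by 4 items (columns), 1 = correct.
demo_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
demo_totals = [sum(row) for row in demo_scores]
demo_item = [row[1] for row in demo_scores]  # second item

print(cronbach_alpha(demo_scores))
print(point_biserial(demo_item, demo_totals))
```

A corrected point biserial would exclude the item's own score from the total before correlating, which matters most on short tests.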

The Reliability, Discrimination and Difficulty of Word-Knowledge Tests Employing Multiple Choice Items Containing Three, Four, or Five Alternatives

Australian Journal of Education ◽

10.1177/000494417301700107 ◽

1973 ◽

Vol 17 (1) ◽

pp. 63-68 ◽

Cited By ~ 6

Author(s):

Donald Hogben

Keyword(s):

Metropolitan Area ◽

Multiple Choice ◽

Word Knowledge ◽

Item Discrimination ◽

Multiple Choice Item ◽

Multiple Choice Items

The multiple-choice item found on standardized and teacher-made tests typically requires the examinee to choose among either four or five alternative answers. The present study offers some evidence favouring three-option over four-option items. Word knowledge tests containing three, four, and five-option items were developed and administered to pupils in ten upper primary classes from schools in the Adelaide metropolitan area. In each of the classes, item discrimination was better and reliability was higher for the three-option test compared to the test employing four-option items. The five-option test was also superior to the four-option test. Differences between the three and five-option tests on reliability and item discrimination were not significant.

Pengembangan Instrumen Asesmen Higher Order Thinking Skills (HOTS) pada Mata Pelajaran Bahasa Indonesia SMA dan SMK

DIGLOSIA Jurnal Kajian Bahasa Sastra dan Pengajarannya ◽

10.30872/diglosia.v3i1.24 ◽

2020 ◽

Vol 3 (1) ◽

pp. 102-113

Author(s):

Sutami

Keyword(s):

Item Difficulty ◽

Multiple Choice ◽

Thinking Skills ◽

Assessment Instrument ◽

Item Discrimination ◽

Test Items ◽

Tenth Grade ◽

Multiple Choice Items ◽

Essay Test

This research aims to produce a valid and reliable Indonesian language assessment instrument in the form of HOTS test items and to describe the quality of those items for measuring the higher-order thinking skills of tenth-grade SMA and SMK students. The study was a research and development study adapted from Borg & Gall’s development model, comprising the following steps: research and information collection, planning, early product development, limited tryout, revision of the early product, field tryout, and revision of the final product. The results show that the HOTS assessment instrument consists of 40 multiple-choice items and 5 essay items. Judged on its materials, construction, and language, the instrument was valid and appropriate for use. The reliability coefficients were 0.88 for the multiple-choice items and 0.79 for the essay items. The multiple-choice items have an average difficulty of 0.57 (average), an average item discrimination of 0.44 (good), and well-functioning distractors. The essay items have an average item difficulty of 0.60 (average) and an average item discrimination of 0.45 (good).
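The difficulty and discrimination figures quoted above are classical item statistics. The abstract does not say exactly how they were computed, so the sketch below assumes the common conventions: difficulty as the proportion of examinees answering correctly, and discrimination as the difference in that proportion between the top and bottom 27% of examinees by total score. The function names, the 27% default, and the sample data are illustrative assumptions.

```python
def item_difficulty(item):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(item) / len(item)


def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower discrimination: p(correct | top group) - p(correct | bottom group).

    `fraction` is the share of examinees placed in each group; 27% is a
    widely used convention and is an assumption here, not a value from the study.
    """
    n = max(1, round(len(item) * fraction))
    # Indices of examinees ordered from highest to lowest total score.
    order = sorted(range(len(item)), key=lambda i: totals[i], reverse=True)
    upper, lower = order[:n], order[-n:]
    p_upper = sum(item[i] for i in upper) / n
    p_lower = sum(item[i] for i in lower) / n
    return p_upper - p_lower


# Hypothetical data: one item's 0/1 scores and the total scores of 10 examinees.
demo_item = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
demo_totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

print(item_difficulty(demo_item))
print(discrimination_index(demo_item, demo_totals))
```

Note that under this convention a higher difficulty value means an easier item, since it is the proportion of correct answers.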

A Comparison of the Item Difficulty and Item Discrimination of Multiple-Choice Items Using the "None of the Above" and One Correct Response Options

Educational and Psychological Measurement ◽

10.1177/0013164487472010 ◽

1987 ◽

Vol 47 (2) ◽

pp. 377-383 ◽

Cited By ~ 10

Author(s):

Nona Tollefson

Keyword(s):

Correct Response ◽

Item Difficulty ◽

Multiple Choice ◽

Item Discrimination ◽

Response Options ◽

Multiple Choice Items

EFFECTS OF STRUCTURAL CHARACTERISTICS OF STEM FORMAT OF MULTIPLE-CHOICE ITEMS ON ITEM DIFFICULTY AND DISCRIMINATION

Psychological Reports ◽

10.2466/pr0.1987.60.3c.1259 ◽

1987 ◽

Vol 60 (3c) ◽

pp. 1259-1262 ◽

Cited By ~ 3

Author(s):

CLAUDIO VIOLATO ◽

PETER H. HARASYM

Keyword(s):

Item Difficulty ◽

Structural Characteristics ◽

Multiple Choice ◽

Multiple Choice Items

Discrimination and Uniformity of Test Item Distracters

Psychological Reports ◽

10.2466/pr0.1968.23.1.301 ◽

1968 ◽

Vol 23 (1) ◽

pp. 301-302

Author(s):

Lewis R. Aiken

Keyword(s):

Test Item ◽

Multiple Choice ◽

Discrimination Index ◽

Item Discrimination ◽

Multiple Choice Items

It is demonstrated that the item discrimination index (d) and an index of the uniformity of the distribution of choices of distracters (U) provide useful information about the effectiveness of distracters on multiple-choice items.

A Meta-Analytic Review of Item Discrimination and Difficulty in Multiple-Choice Items Using "None-Of-The-Above"

Educational and Psychological Measurement ◽

10.1177/0013164492052003006 ◽

1992 ◽

Vol 52 (3) ◽

pp. 571-577 ◽

Cited By ~ 4

Author(s):

Susan L. Knowles ◽

Cynthia A. Welch

Keyword(s):

Multiple Choice ◽

Item Discrimination ◽

Analytic Review ◽

Multiple Choice Items

On the Impact of the Response Options’ Position on Item Difficulty in Multiple-Choice-Items

European Journal of Psychological Assessment ◽

10.1027/1015-5759/a000615 ◽

2020 ◽

pp. 1-10

Author(s):

Bettina Hagenmüller

Keyword(s):

Large Scale ◽

Item Difficulty ◽

Multiple Choice ◽

Test Construction ◽

Test Model ◽

Group Setting ◽

Response Options ◽

Large Scale Assessment ◽

Multiple Choice Item

The multiple-choice item format is widely used in test construction and large-scale assessment. So far, there has been little research on the impact of the position of the solution among the response options, and the few existing results are inconsistent. Since altering the order of the response options would be an easy way to create parallel items for group settings, the influence of the response options’ position on item difficulty should be examined. The Linear Logistic Test Model (Fischer, 1972) was used to analyze the data of 829 students aged 8–20 years, who worked on general knowledge items. It was found that the position of the solution among the response options has an influence on item difficulty. Items are easiest when the solution is in first place and more difficult when the solution is placed in a middle position or at the end of the set of response options.

Effects of Item Characteristics on Multiple-Choice Item Difficulty

Educational and Psychological Measurement ◽

10.1177/0013164484443002 ◽

1984 ◽

Vol 44 (3) ◽

pp. 551-561 ◽

Cited By ~ 16

Author(s):

Kathy Green

Keyword(s):

Item Difficulty ◽

Multiple Choice ◽

Multiple Choice Item ◽

Item Characteristics

The Effect of the Most-Attractive-Distractor Location on Multiple-Choice Item Difficulty

The Journal of Experimental Education ◽

10.1080/00220973.2019.1629577 ◽

2019 ◽

Vol 88 (4) ◽

pp. 643-659

Author(s):

Jinnie Shin ◽

Okan Bulut ◽

Mark J. Gierl

Keyword(s):

Item Difficulty ◽

Multiple Choice ◽

Distractor Location ◽

Multiple Choice Item

The Advantages of Five-Option Multiple-Choice Items in Classroom Tests of Student Mastery

Journal of Education, Teaching and Social Studies ◽

10.22158/jetss.v2n4p59 ◽

2020 ◽

Vol 2 (4) ◽

pp. p59

Author(s):

Michael Joseph Wise

Keyword(s):

Item Difficulty ◽

Multiple Choice ◽

Response Options ◽

Course Content ◽

Science Courses ◽

Multiple Choice Items ◽

Writing Tests

The effectiveness of multiple-choice (MC) items depends on the quality of the response options—particularly how well the incorrect options (“distractors”) attract students who have incomplete knowledge. It is often contended that test-writers are unable to devise more than two plausible distractors for most MC items, and that the effort needed to do so is not worthwhile in terms of the items’ psychometric qualities. To test these contentions, I analyzed students’ performance on 545 MC items across six science courses that I have taught over the past decade. Each MC item contained four distractors, and the dataset included more than 19,000 individual responses. All four distractors were deemed plausible in one-third of the items, and three distractors were plausible in another third. Each additional plausible distractor led to an average 13% increase in item difficulty. Moreover, an increase in the number of plausible distractors led to a significant increase in the discriminability of the items, with a leveling off by the fourth distractor. These results suggest that—at least for teachers writing tests to assess mastery of course content—it may be worthwhile to eschew recent skepticism and continue to attempt to write MC items with three or four distractors.
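The abstract does not state the criterion used to deem a distractor "plausible". One convention found in the item-analysis literature is to count a distractor as functioning when at least 5% of examinees choose it. The sketch below applies that rule. The 5% threshold, the function name, and the response data are assumptions for illustration and may differ from the author's method.

```python
from collections import Counter


def plausible_distractors(responses, key, threshold=0.05):
    """Return the incorrect options chosen by at least `threshold` of examinees.

    responses: list of option labels chosen by examinees for one item.
    key: label of the correct option.
    threshold: minimum selection share (0.05 is an assumed convention).
    """
    counts = Counter(responses)
    n = len(responses)
    return sorted(
        option
        for option, count in counts.items()
        if option != key and count / n >= threshold
    )


# Hypothetical responses from 40 examinees to one five-option item (key = "A").
demo_responses = ["A"] * 24 + ["B"] * 8 + ["C"] * 6 + ["D"] * 1 + ["E"] * 1

print(plausible_distractors(demo_responses, key="A"))
```

Counting the returned options per item gives the number of plausible distractors, which could then be related to item difficulty and discrimination as the abstract describes.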
