A Baseline for Multiple-Choice Testing in the University Classroom

SAGE Open ◽  
2021 ◽  
Vol 11 (2) ◽  
pp. 215824402110168
Author(s):  
A. D. Slepkov ◽  
M. L. Van Bussel ◽  
K. M. Fitze ◽  
W. S. Burr

There is a broad literature in multiple-choice test development, both in terms of item-writing guidelines, and psychometric functionality as a measurement tool. However, most of the published literature concerns multiple-choice testing in the context of expert-designed high-stakes standardized assessments, with little attention being paid to the use of the technique within non-expert instructor-created classroom examinations. In this work, we present a quantitative analysis of a large corpus of multiple-choice tests deployed in the classrooms of a primarily undergraduate university in Canada. Our report aims to establish three related things. First, reporting on the functional and psychometric operation of 182 multiple-choice tests deployed in a variety of courses at all undergraduate levels of education establishes a much-needed baseline for actual as-deployed classroom tests. Second, we motivate and present modified statistical measures—such as item-excluded correlation measures of discrimination and length-normalized measures of reliability—that should serve as useful parameters for future comparisons of classroom test psychometrics. Finally, we use the broad empirical data from our survey of tests to update widely used item-quality guidelines.
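
The modified measures named above can be sketched concretely. The following Python fragment is an illustrative sketch only, not the authors' exact procedure: it computes an item-excluded (corrected item-total) discrimination for each item and projects KR-20 reliability to an assumed common length of 50 items via the Spearman-Brown formula; the simulated response matrix and the 50-item reference length are assumptions made for the example.

```python
# Illustrative sketch, not the authors' exact procedure: item-excluded
# discrimination and a length-normalized reliability estimate for a
# dichotomously scored multiple-choice test.
import numpy as np

def item_excluded_discrimination(responses):
    """Correlation of each item with the total score computed from the
    remaining items (the item itself is excluded from the total)."""
    n_items = responses.shape[1]
    total = responses.sum(axis=1)
    discs = []
    for j in range(n_items):
        rest = total - responses[:, j]          # exclude the item itself
        discs.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return np.array(discs)

def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomous items."""
    k = responses.shape[1]
    p = responses.mean(axis=0)
    var_total = responses.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - (p * (1 - p)).sum() / var_total)

def length_normalized_reliability(rel, k_actual, k_reference=50):
    """Spearman-Brown projection of reliability to a common test length,
    one plausible way to normalize reliability for test length."""
    factor = k_reference / k_actual
    return factor * rel / (1 + (factor - 1) * rel)

# Example with simulated data: 200 examinees, 30 items
rng = np.random.default_rng(0)
ability = rng.normal(size=(200, 1))
difficulty = rng.normal(size=(1, 30))
prob = 1 / (1 + np.exp(-(ability - difficulty)))
responses = (rng.random((200, 30)) < prob).astype(int)

print(item_excluded_discrimination(responses).round(2))
print(length_normalized_reliability(kr20(responses), k_actual=30))
```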

Author(s):  
V. L. Kiselev ◽  
V. V. Maretskaya ◽  
O. V. Spiridonov

Testing is one of the most effective ways to monitor students' current academic performance. Multiple-choice tests are the most common and most frequently used tasks in the practical work of higher education teachers. The article presents approaches to test development and gives examples of test tasks for students of engineering specialties at a higher educational institution.


2021 ◽  
Author(s):  
Rasmus Persson

In multiple-choice tests, guessing is a source of test error that can be suppressed if its expected score is made negative, either by penalizing wrong answers or by rewarding expressions of partial knowledge. We consider an arbitrary multiple-choice test taken by a rational test-taker who knows an arbitrary fraction of its keys and distractors. For this model, we compare the scores obtained under standard marking (where guessing is not penalized) and under marking schemes that suppress guessing, either through score penalties that make incorrect answers costly or through schemes that reward partial knowledge. While the “best” scoring system (in the sense that latent ability and test score are linearly related) will depend on the underlying ability distribution, we find the scoring rule of Zapechelnyuk (Economics Letters, 132, 2015) to be superior; however, except for item-level discrimination among test-takers, a single penalty for wrong answers seems to yield results just as good as or better than more intricate partial-credit schemes.
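
As a minimal sketch of the underlying arithmetic (not Persson's full model), the expected per-item score for a rational test-taker who does not know the key can be compared under standard marking and under a formula-scored penalty of 1/(k − 1), as a function of how many distractors the test-taker can eliminate; the parameter values below are chosen only for illustration.

```python
# Minimal sketch, assuming a k-option item and a rational test-taker who
# eliminates d distractors and otherwise guesses uniformly (or omits when
# guessing has negative expected score).
def expected_score(k, d, knows_key, penalty=0.0):
    """Expected score for one item.
    k: number of options, d: distractors recognized as wrong,
    knows_key: True if the key is known, penalty: deduction per wrong answer."""
    if knows_key:
        return 1.0
    remaining = k - d                  # options still in play
    p_correct = 1.0 / remaining        # guess uniformly among the rest
    expected_guess = p_correct * 1.0 - (1 - p_correct) * penalty
    # A rational test-taker guesses only if the expectation is non-negative,
    # otherwise omits the item (score 0).
    return max(expected_guess, 0.0)

k = 4
for d in range(k):                     # 0..3 distractors eliminated
    std = expected_score(k, d, knows_key=False, penalty=0.0)
    neg = expected_score(k, d, knows_key=False, penalty=1 / (k - 1))
    print(f"d={d}: standard={std:.2f}, formula-scored={neg:.2f}")
```

With the 1/(k − 1) penalty, blind guessing (d = 0) has an expected score of exactly zero, while any partial knowledge (d ≥ 1) still earns a positive expectation.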


2018 ◽  
Vol 8 (9) ◽  
pp. 1152
Author(s):  
Qingsong Gu ◽  
Michael W. Schwartz

In taking traditional multiple-choice tests, random guessing is unavoidable yet non-negligible. To uncover the “unfairness” caused by random guessing, this paper presents a Microsoft Excel template that uses relevant functions to automatically quantify the probability of answering correctly at random, and to determine the minimum score a testee should obtain to pass a traditional multiple-choice test under different probabilities of answering correctly at random, as well as the “luckiness” involved in passing it. The paper concludes that, although random guessing is non-negligible, it is unnecessary to remove traditional multiple-choice items from all testing activities, because guessing can be controlled by changing the passing score and the number of options, or by reducing the percentage of such items in a test.
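
The core computation behind such a template can be reproduced outside Excel. The Python sketch below (the spreadsheet's exact functions are not reproduced here, and the test parameters are chosen only for illustration) gives the probability of reaching a passing score by random guessing alone.

```python
# Probability of passing a multiple-choice test purely by random guessing,
# modeled as a binomial tail probability.
from math import comb

def prob_pass_by_guessing(n_items, n_options, pass_score):
    """P(score >= pass_score) when every item is answered at random."""
    p = 1 / n_options                          # chance of a lucky hit per item
    return sum(comb(n_items, x) * p**x * (1 - p)**(n_items - x)
               for x in range(pass_score, n_items + 1))

# Example: 50 four-option items, passing score 30 out of 50
print(prob_pass_by_guessing(50, 4, 30))        # vanishingly small
# Raising the passing score or the number of options drives this probability
# down, which is the kind of control the paper recommends.
```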


2020 ◽  
Vol 4 (3) ◽  
pp. 272
Author(s):  
M. S. D. Indrayani ◽  
A. A. I. N. Marhaeini ◽  
A. A. G. Y. Paramartha ◽  
L. G. E. Wahyuni

This study aimed at investigating and analyzing the quality of teacher-made multiple-choice tests used as summative assessment for the English subject. The quality of the tests was judged against the norms for constructing a good multiple-choice test. The research design was descriptive. Document study and interviews were used to collect the data. The data were analyzed by comparing the multiple-choice tests against the 18 norms for constructing a good multiple-choice test and then applying the formula suggested by Nurkencana. The results showed that the quality of the teacher-made multiple-choice tests is very good, with 79 items (99%) qualifying as very good and 1 item (1%) qualifying as good. Some problems related to particular norms were still found; therefore, it is suggested that teachers pay attention to these unfulfilled norms. To minimize the issues, it is further suggested that they carry out peer review, rechecking, and editing.


1999 ◽  
Vol 15 (2) ◽  
pp. 143-150 ◽  
Author(s):  
Gerardo Prieto ◽  
Ana R. Delgado

Summary: Most standardized tests instruct subjects to guess under scoring procedures that do not correct for guessing or correct only for expected random guessing. Other scoring rules, such as offering a small reward for omissions or punishing errors by discounting more than expected from random guessing, have been proposed. This study was designed to test the effects of these four instruction/scoring conditions on performance indicators and on score reliability of multiple-choice tests. Some 240 participants were randomly assigned to four conditions differing in how much they discourage guessing. Subjects performed two computerized psychometric tests, which differed only in the instructions provided and the associated scoring procedure. For both tests, our hypotheses predicted (0) an increasing trend in omissions (showing that instructions were effective); (1) decreasing trends in wrong and right responses; and (2) an increase in reliability estimates of both number-right and corrected scores. Predictions regarding performance indicators were mostly fulfilled, but the expected differences in reliability failed to appear. The discussion of results takes into account not only psychometric issues related to guessing, but also the misleading educational implications of recommendations to guess in testing contexts.
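
The instruction/scoring conditions contrasted here belong to a family of standard scoring rules. The sketch below illustrates that family with assumed parameter values rather than the study's exact ones; the item counts in the example are invented for illustration.

```python
# Hedged illustration of common multiple-choice scoring rules: number-right,
# correction for expected random guessing, a small reward for omissions, and
# a heavier-than-chance penalty for errors.
def score(right, wrong, omitted, k, rule="number_right"):
    """Score a k-option multiple-choice test under common rules."""
    if rule == "number_right":            # guessing not discouraged
        return right
    if rule == "formula":                 # subtract the expected random-guessing gain
        return right - wrong / (k - 1)
    if rule == "reward_omission":         # small reward for leaving items blank
        return right + omitted / k
    if rule == "strong_penalty":          # discount more than random guessing expects
        return right - wrong
    raise ValueError(rule)

# Example: 40 four-option items, 25 right, 10 wrong, 5 omitted
for rule in ("number_right", "formula", "reward_omission", "strong_penalty"):
    print(rule, score(25, 10, 5, k=4, rule=rule))
```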


1998 ◽  
Vol 14 (3) ◽  
pp. 197-201 ◽  
Author(s):  
Ana R. Delgado ◽  
Gerardo Prieto

This study examined the validity of an item-writing rule concerning the optimal number of options in the design of multiple-choice test items. Although measurement textbooks typically recommend the use of four or five options - and most ability and achievement tests still follow this rule - theoretical papers as well as empirical research over a period of more than half a century reveal that three options may be more suitable for most ability and achievement test items. Previous results show that three-option items, compared with their four-option versions, tend to be slightly easier (i.e., with higher traditional difficulty indexes) without showing any decrease in discrimination. In this study, two versions (with four and three options) of 90 items comprising three computerized examinations were applied in successive years, showing the expected trend. In addition, there were no systematic changes in reliability for the tests, which adds to the evidence favoring the use of the three-option test item.


1991 ◽  
Vol 69 (3) ◽  
pp. 769-770
Author(s):  
John Trinkaus

A number of studies performed primarily with students studying education and psychology suggest a generally held belief that more points are to be lost than gained by changing initial answers on multiple-choice tests. A survey of 442 undergraduate business students tended to confirm the results of a recent inquiry that implied business administration students appear to hold a similar belief.


1965 ◽  
Vol 16 (3_suppl) ◽  
pp. 1193-1196 ◽  
Author(s):  
Donald W. Zimmerman ◽  
Richard H. Williams

Chance success due to guessing is treated as a component of the error variance of a multiple-choice test score. It is shown that for a test of given item structure the minimum standard error of measurement can be estimated by the formula (N − X)/a, where N is the total number of items, X is the score, and a is the number of alternative choices per item. The significance of non-independence of true score and this component of error score on multiple-choice tests is discussed.
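
As a brief worked instance of the formula, with numbers chosen purely for illustration: for a test of N = 60 items with a = 4 options per item and an observed score of X = 40, the minimum standard error of measurement is (N − X)/a = (60 − 40)/4 = 5 score points.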


2010 ◽  
Vol 1 (4) ◽  
pp. 32-41 ◽  
Author(s):  
E. Serradell-Lopez ◽  
P. Lara ◽  
D. Castillo ◽  
I. González

The purpose of this paper is to determine the effectiveness of using multiple-choice tests in subjects related to business administration and management. To this end, the authors used a multiple-choice test with specific questions to verify the extent of knowledge gained and the confidence and trust in the answers. The analysis, based on tests given to a group of 200 students, was carried out in one subject related to investment analysis and measured the level of knowledge gained and the degree of trust and confidence in the responses at two different points of the business administration and management course. Measurements took into account the difficulty of the questions asked and the time students spent completing the test. Results confirm that students are generally able to gain knowledge along the way and to increase their degree of trust and confidence. It is estimated that improvement in the skills learned is viewed favourably by businesses and is important for job placement. Finally, the authors analyze a multiple-choice test using a combination of knowledge and confidence levels.
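
One possible way to combine correctness with a self-reported confidence level, assumed here purely for illustration rather than taken from the authors' instrument, is a weighted scheme in which confident correct answers earn the most and confident wrong answers lose the most.

```python
# Assumed, illustrative confidence-weighted scoring scheme (not the authors'
# exact method): a 3-point confidence scale scales both gains and penalties.
def confidence_weighted_score(correct, confidence):
    """confidence in {1, 2, 3} = low, medium, high (assumed scale)."""
    weights = {1: 0.5, 2: 1.0, 3: 1.5}
    return weights[confidence] if correct else -0.5 * weights[confidence]

# (correct?, confidence) for four answered items
answers = [(True, 3), (True, 1), (False, 3), (False, 1)]
print(sum(confidence_weighted_score(c, conf) for c, conf in answers))
```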


2001 ◽  
Vol 23 (3) ◽  
pp. 275-292 ◽  
Author(s):  
Ronald H. Heck ◽  
Marian Crislip

Performance tests are increasingly used as alternatives to, or in connection with, standardized multiple-choice tests as a means of assessing student learning and school accountability. Besides their proposed equity advantages over multiple-choice tests in measuring student learning across groups of students, performance assessments have also been viewed as having greater utility for monitoring school progress because of their proposed closer correspondence to the curriculum that is actually taught. We examined these assumptions by comparing third-grade student performance on a performance-based writing test and a multiple-choice test of language skills. We observed smaller differences in achievement on the writing performance assessment for some groups of students (e.g., low socioeconomic status, various ethnic backgrounds) than are commonly observed on multiple-choice tests. Girls, however, had higher mean scores than boys on both types of assessments. Moreover, the school's identification and commitment over time to improving its students' writing skills positively related to its students' outcomes on the writing performance test. Overall, our examination of performance-based writing assessment is encouraging with respect to providing a relatively fair assessment and measuring learning tasks that are related to the school's curricular practices.

