Empirical Option Weights Improve the Validity of a Multiple-Choice Knowledge Test

2017 ◽  
Vol 33 (5) ◽  
pp. 336-344 ◽  
Author(s):  
Birk Diedenhofen ◽  
Jochen Musch

Abstract. Standard dichotomous scoring of multiple-choice test items grants no partial credit for partial knowledge. Empirical option weighting is an alternative, polychotomous scoring method that uses the point-biserial correlation between option choices and total score as a weight for each answer alternative. Extant studies demonstrate that the method increases reliability of multiple-choice tests in comparison to conventional scoring. Most previous studies employed a correlational validation approach, however, and provided mixed findings with regard to the validity of empirical option weighting. The present study is the first investigation using an experimental approach to determine the reliability and validity of empirical option weighting. To obtain an external validation criterion, we experimentally induced various degrees of knowledge in a domain of which participants had no knowledge. We found that in comparison to dichotomous scoring, empirical option weighting increased both reliability and validity of a multiple-choice knowledge test employing distractors that were appealing to test takers with different levels of knowledge. A potential application of the present results is the computation and publication of empirical option weights for existing multiple-choice knowledge tests that have previously been scored dichotomously.
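As a minimal sketch of the scoring method described above, each option's weight is the point-biserial correlation between choosing that option and the total score. This is my own illustration assuming NumPy, not the authors' implementation; the function name is hypothetical.

```python
import numpy as np

def empirical_option_weights(choices, total_scores):
    """Estimate a weight for each answer option of a single item.

    choices:      (n_examinees,) array of chosen option indices for the item
    total_scores: (n_examinees,) array of examinees' total test scores
    Returns a dict mapping each option to its point-biserial weight.
    """
    weights = {}
    for option in np.unique(choices):
        indicator = (choices == option).astype(float)  # 1 if chosen, else 0
        # The point-biserial correlation equals the Pearson correlation
        # between a dichotomous indicator and a continuous variable.
        weights[option] = np.corrcoef(indicator, total_scores)[0, 1]
    return weights
```

Under this scheme, an examinee's polychotomous item score is the weight of the option they chose, summed over items to give the weighted total.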

1995 ◽  
Vol 77 (3) ◽  
pp. 760-762
Author(s):  
Kenneth S. Shultz

Little research has been conducted on the use of linear polychotomous scoring of multiple-choice test items. Therefore, several tests were analyzed using both dichotomous and polychotomous scoring of test items to assess how the alpha reliabilities of the tests change based on the type of scoring used. In each case, the alpha reliabilities of the tests increased, with the same number of items or fewer in each test, when polychotomous (vs. dichotomous) scoring of multiple-choice test items was used.
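Coefficient alpha can be computed on the same response matrix under either scoring rule, which is the comparison the study performs. A minimal sketch assuming NumPy, with a function name of my own choosing:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Coefficient alpha for an (n_examinees, n_items) score matrix.

    Accepts dichotomous (0/1) or polychotomous (weighted) item scores,
    so both scoring methods can be compared on the same responses.
    """
    k = item_scores.shape[1]
    sum_item_var = item_scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = item_scores.sum(axis=1).var(ddof=1)       # variance of total scores
    return k / (k - 1) * (1 - sum_item_var / total_var)
```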


Author(s):  
V. L. Kiselev ◽  
V. V. Maretskaya ◽  
O. V. Spiridonov

Testing is one of the most effective ways to monitor students' current academic performance. Multiple-choice tests are the most common tasks, and the ones most often used, in the practical work of higher education teachers. The article describes approaches to test development and presents examples of test tasks for students of engineering specialties at a higher educational institution.


1998 ◽  
Vol 14 (3) ◽  
pp. 197-201 ◽  
Author(s):  
Ana R. Delgado ◽  
Gerardo Prieto

This study examined the validity of an item-writing rule concerning the optimal number of options in the design of multiple-choice test items. Although measurement textbooks typically recommend the use of four or five options - and most ability and achievement tests still follow this rule - theoretical papers as well as empirical research over a period of more than half a century reveal that three options may be more suitable for most ability and achievement test items. Previous results show that three-option items, compared with their four-option versions, tend to be slightly easier (i.e., with higher traditional difficulty indexes) without showing any decrease in discrimination. In this study, two versions (with four and three options) of 90 items comprising three computerized examinations were applied in successive years, showing the expected trend. In addition, there were no systematic changes in reliability for the tests, which adds to the evidence favoring the use of the three-option test item.
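The two item statistics being compared across the three- and four-option versions are conventionally computed as below. This is a sketch assuming NumPy and an item-total point-biserial as the discrimination index; the abstract does not specify the exact formulas used, and the function names are mine.

```python
import numpy as np

def difficulty_index(correct):
    """Traditional difficulty index: the proportion answering correctly."""
    return correct.mean()

def discrimination_index(correct, total_scores):
    """Item-total point-biserial correlation as a discrimination index."""
    return np.corrcoef(correct.astype(float), total_scores)[0, 1]
```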


1984 ◽  
Vol 54 (2) ◽  
pp. 419-425
Author(s):  
R. A. Weitzman

In an ideal multiple-choice test, defined as a multiple-choice test containing only items with options that are all equally guessworthy, the probability of guessing the correct answer to an item is equal to the reciprocal of the number of the item's options. This article presents an asymptotically exact estimator of the test-retest reliability of an ideal multiple-choice test. When all test items have the same number of options, computation of the estimator requires, in addition to the number of options per item, the same information as computation of the Kuder-Richardson Formula 21: the total number of items answered correctly on a single testing occasion by each person tested. Both for ideal multiple-choice tests and for nonideal multiple-choice tests for which the average probability of guessing the correct answer to an item is equal to the reciprocal of the number of options per item, Monte Carlo data show that the estimator is considerably more accurate than the Kuder-Richardson Formula 21 and, in fact, is very nearly exact in populations of the order of 1000 persons.
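For reference, the Kuder-Richardson Formula 21 that serves as the baseline needs only the item count and each person's total score on a single occasion. The sketch below is a standard implementation of KR-21 (the function name is mine); Weitzman's estimator itself, which additionally uses the number of options per item, is not reproduced here.

```python
import numpy as np

def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21, computed from total scores alone."""
    k = n_items
    m = total_scores.mean()        # mean total score
    v = total_scores.var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - m * (k - m) / (k * v))
```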


2021 ◽  
Author(s):  
Rasmus Persson

In multiple-choice tests, guessing is a source of test error that can be suppressed if its expected score is made negative, either by penalizing wrong answers or by rewarding expressions of partial knowledge. We consider an arbitrary multiple-choice test taken by a rational test taker who knows an arbitrary fraction of its keys and distractors. For this model, we compare the relation between knowledge and obtained score under standard marking (where guessing is not penalized) and under marking schemes that suppress guessing, either through score penalties for incorrect answers or through credit for expressions of partial knowledge. While the "best" scoring system (in the sense that latent ability and test score are linearly related) will depend on the underlying ability distribution, we find the scoring rule of Zapechelnyuk (Economics Letters, 132, 2015) to perform best; however, except for item-level discrimination among test takers, a single penalty for wrong answers seems to yield results as good as or better than those of more intricate schemes with partial credit.
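The basic mechanism, choosing the penalty so that a blind guess has zero or negative expected score, can be shown with a short sketch. This illustrates classical formula scoring only, not Persson's model or Zapechelnyuk's rule; the function name is mine.

```python
def expected_guess_score(n_options, penalty):
    """Expected item score for a blind guess on an n-option item,
    scoring +1 for the key and -penalty for any of the n-1 distractors."""
    p_correct = 1.0 / n_options
    return p_correct - (1.0 - p_correct) * penalty

# A penalty of 1/(n-1) makes blind guessing worthless in expectation:
print(expected_guess_score(4, 1 / 3))  # 0.0
```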


2018 ◽  
Vol 8 (9) ◽  
pp. 1152
Author(s):  
Qingsong Gu ◽  
Michael W. Schwartz

In taking traditional multiple-choice tests, random guessing is unavoidable yet non-negligible. To uncover the "unfairness" caused by random guessing, this paper presents a Microsoft Excel template that uses relevant functions to automatically quantify the probability of answering correctly at random, ultimately determining the minimum score a testee must obtain to pass a traditional multiple-choice test under different probabilities of answering correctly at random, as well as the "luckiness" involved in passing it. The paper concludes that, although random guessing is non-negligible, it is unnecessary to remove traditional multiple-choice items from all testing activities, because guessing can be controlled by changing the passing score or the number of options, or by reducing the share of such items in a test.
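The quantity such a template computes, the probability of reaching a given score purely by chance, follows a binomial distribution. A minimal Python equivalent of that calculation, assuming SciPy, with a function name of my own choosing:

```python
from scipy.stats import binom

def p_pass_by_guessing(n_items, n_options, passing_score):
    """Probability of reaching the passing score by blind guessing alone."""
    p = 1.0 / n_options  # chance of guessing one item correctly
    # Survival function: P(correct answers >= passing_score)
    return binom.sf(passing_score - 1, n_items, p)

# e.g., 50 four-option items with a passing score of 30:
print(p_pass_by_guessing(50, 4, 30))  # vanishingly small
```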


2020 ◽  
Vol 4 (3) ◽  
pp. 272
Author(s):  
M.S.D. Indrayani ◽  
A.A.I.N. Marhaeini ◽  
A.A.G.Y. Paramartha ◽  
L.G.E. Wahyuni

This study aimed to investigate and analyze the quality of teacher-made multiple-choice tests used as summative assessments for the English subject. The quality of the tests was judged against the norms for constructing a good multiple-choice test. The research design was descriptive, with document study and interviews used to collect the data. The data were analyzed by comparing the multiple-choice tests against the 18 norms for constructing a good multiple-choice test and then applying the formula suggested by Nurkencana. The results showed that the quality of the teacher-made multiple-choice tests was very good, with 79 items (99%) qualifying as very good and 1 item (1%) qualifying as good. Some problems related to certain norms were still found, so teachers are advised to pay attention to these unfulfilled norms. To minimize the issues, peer review, rechecking, and editing are further suggested.

