Setting Standards With Multiple-Choice Tests: A Preliminary Intended-User Evaluation of SmartStandardSet

2021 ◽  
Vol 6 ◽  
Author(s):  
Gavin T. L. Brown ◽  
Paul Denny ◽  
David L. San Jose ◽  
Ellen Li

Software that helps higher education instructors easily remove poor-quality items and set appropriate grade boundaries is generally lacking. To address these challenges, the SmartStandardSet system provides a graphical user interface for removing defective items, weighting student scores with a two-parameter (2PL) IRT analysis, and setting grade boundaries (standard setting). We evaluated the system through six interviews with teachers and six focus groups involving 19 students to understand how key stakeholders would view the use of the tool in practice. Both groups of participants generally reported high levels of feasibility, accuracy, and utility in SmartStandardSet’s statistical scoring of items and score calculation for test-takers. Teachers indicated the data displays would help them improve future test items; students indicated the system would be fairer and would motivate greater effort on more difficult test items. However, both groups had concerns about implementing the system without institutional policy endorsement. Students in particular were concerned that academics might set grade boundaries on arbitrary and invalid grounds. Our results provide useful insights into the perceived benefits of using the tool for standard setting and suggest concrete next steps for gaining wider acceptance, which will be the focus of future work.
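The abstract does not describe how SmartStandardSet implements its scoring; as a rough illustration of how a two-parameter (2PL) IRT model weights responses by item discrimination and difficulty, the following Python sketch estimates a test-taker's ability by maximum likelihood. The item parameters, data, and the estimate_theta helper are assumptions for illustration, not the tool's code.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def estimate_theta(responses, a, b):
    """Maximum-likelihood ability estimate for one test-taker, given
    already-calibrated item discriminations (a) and difficulties (b)."""
    def neg_log_lik(theta):
        p = p_correct(theta, a, b)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

# toy example: five items, one student (all values hypothetical)
a = np.array([1.2, 0.8, 1.5, 1.0, 0.6])   # discrimination
b = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])  # difficulty
responses = np.array([1, 1, 1, 0, 0])
print(estimate_theta(responses, a, b))
```

Under such a model, two students with the same number-correct score can receive different weighted scores depending on which items they answered correctly.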

2020 ◽  
pp. 016327872090891
Author(s):  
Eric Shappell ◽  
Gregory Podolej ◽  
James Ahn ◽  
Ara Tekian ◽  
Yoon Soo Park

Mastery learning assessments have been described in simulation-based educational interventions; however, studies applying mastery learning to multiple-choice tests (MCTs) are lacking. This study investigates an approach to item generation and standard setting for mastery learning MCTs and evaluates the consistency of learner performance across sequential tests. Item models, variables for question stems, and mastery standards were established using a consensus process. Two test forms were created using the item models. Tests were administered at two training programs. The primary outcome, the test–retest consistency of pass–fail decisions across versions of the test, was 94% (κ = .54). Decision-consistency classification was .85. Item-level consistency was 90% (κ = .77, SE = .03). These findings support the use of automatic item generation to create mastery MCTs that produce consistent pass–fail decisions. This technique broadens the range of assessment methods available to educators in settings that require serial MCT testing, including mastery learning curricula.
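The consistency statistics above can, in principle, be reproduced from the pass–fail decisions made on the two forms. A minimal sketch, assuming a common mastery cut score (the cut score and scores below are hypothetical, not the study's data):

```python
import numpy as np

def pass_fail_consistency(scores_a, scores_b, cut_score):
    """Percent agreement and Cohen's kappa for pass/fail decisions
    made on two parallel test forms with a common cut score."""
    pass_a = np.asarray(scores_a) >= cut_score
    pass_b = np.asarray(scores_b) >= cut_score
    observed = np.mean(pass_a == pass_b)
    # chance agreement from the marginal pass rates of each form
    p_a, p_b = pass_a.mean(), pass_b.mean()
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return observed, (observed - expected) / (1 - expected)

# ten hypothetical learners, cut score of 12 out of 15
form1 = [14, 11, 13, 15, 9, 12, 13, 10, 14, 12]
form2 = [13, 12, 13, 14, 10, 12, 12, 9, 15, 12]
print(pass_fail_consistency(form1, form2, cut_score=12))
```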


Author(s):  
V. L. Kiselev ◽  
V. V. Maretskaya ◽  
O. V. Spiridonov

Testing is one of the most effective ways to monitor students’ current academic performance. Multiple-choice tests are the most common and most frequently used tasks in the day-to-day practice of higher education teachers. The article describes approaches to test development and presents example test tasks for students of engineering specialties at a higher educational institution.


2020 ◽  
Vol 1 (1) ◽  
pp. 17
Author(s):  
Ariza Fitriani Safira ◽  
Haratua Tiur Maria S ◽  
Syukran Mursyid

This research aims to examine the effectiveness of remediation integrated with the booklet-assisted CORE learning model in reducing students' learning difficulties with vibrations and waves at SMP Negeri 2 Pontianak. The research design is a One-Group Pretest-Posttest Design with 27 students in class VIII C as the sample, selected using the intact group technique. The instruments are two-tier multiple-choice tests to analyze students’ learning difficulties related to concepts and essay tests to analyze students’ learning difficulties related to errors in solving problems. Based on the results, it is found that: (1) Students’ total misconceptions decreased by 84.2%. (2) There is a significant conceptual change in all test items. (3) Students’ total errors in solving questions decreased by 78.32%. (4) There is a significant difference between the number of students' initial and final mistakes. (5) The percentage of all students’ learning difficulties decreased by 78.91%. (6) The remediation integrated with the booklet-assisted CORE model is highly effective in reducing students’ learning difficulties. The results of this study are expected to inform teachers in designing remediation activities for students' learning difficulties, especially about vibrations and waves.
Keywords: CORE Learning Model, Booklet, Students’ Learning Difficulties, Vibration and Waves
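The reported reductions are pretest-to-posttest percentage decreases; a minimal sketch of the arithmetic (the counts below are hypothetical, chosen only to show how a figure such as 84.2% arises):

```python
def percent_reduction(pre_count, post_count):
    """Percentage decrease from pretest to posttest, e.g. in the number
    of misconceptions identified by a two-tier multiple-choice test."""
    return 100.0 * (pre_count - post_count) / pre_count

# hypothetical counts, not the study's data
print(percent_reduction(pre_count=158, post_count=25))  # about 84.2%
```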


Seminar.net ◽  
2010 ◽  
Vol 6 (3) ◽  
Author(s):  
Bjørn Klefstad ◽  
Geir Maribu ◽  
Svend Andreas Horgen ◽  
Thorleif Hjeltnes

The use of digital multiple-choice tests in formative and summative assessment has many advantages. Such tests are effective, objective, and flexible. However, it is still challenging to create tests that are valid and reliable. Bloom’s taxonomy is used as a framework for assessment in higher education and therefore has a great deal of influence on how learning outcomes are formulated. Using digital tools to create tests has been common for some time, yet the tests are still mostly answered on paper. Our hypothesis has two parts: first, it is possible to create summative tests that match different levels and learning outcomes within a chosen subject; second, a test tool of some kind is necessary to enable teachers and examiners to take a more proactive attitude towards different levels and learning outcomes in a subject and so ensure the quality of digital test design. Based on an analysis of several digital tests, we examine to what degree learning outcomes and levels are reflected in the different test questions. We also suggest functionality for a future test tool to support an improved design process.
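A test tool of the kind proposed might, for example, tally how a test's questions distribute across Bloom levels and flag levels that the learning outcomes name but the test never reaches. A minimal sketch of that idea (the question-to-level mapping is hypothetical, not taken from the analyzed tests):

```python
from collections import Counter

BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# hypothetical mapping of test questions to Bloom levels, as an examiner
# might record it when reviewing a digital test
question_levels = {
    "Q1": "remember", "Q2": "understand", "Q3": "apply",
    "Q4": "remember", "Q5": "analyze", "Q6": "apply",
}

def coverage_report(question_levels):
    """Count how many questions target each Bloom level; zeros reveal
    levels that the test does not assess at all."""
    counts = Counter(question_levels.values())
    return {level: counts.get(level, 0) for level in BLOOM_LEVELS}

print(coverage_report(question_levels))
```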


1998 ◽  
Vol 14 (3) ◽  
pp. 197-201 ◽  
Author(s):  
Ana R. Delgado ◽  
Gerardo Prieto

This study examined the validity of an item-writing rule concerning the optimal number of options in the design of multiple-choice test items. Although measurement textbooks typically recommend the use of four or five options - and most ability and achievement tests still follow this rule - theoretical papers as well as empirical research over a period of more than half a century reveal that three options may be more suitable for most ability and achievement test items. Previous results show that three-option items, compared with their four-option versions, tend to be slightly easier (i.e., with higher traditional difficulty indexes) without showing any decrease in discrimination. In this study, two versions (with four and three options) of 90 items comprising three computerized examinations were applied in successive years, showing the expected trend. In addition, there were no systematic changes in reliability for the tests, which adds to the evidence favoring the use of the three-option test item.
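The "traditional difficulty index" and discrimination referred to above are the classical proportion-correct value and the corrected item-total correlation. A minimal sketch of how both are computed from scored responses (the data are hypothetical):

```python
import numpy as np

def item_statistics(responses):
    """Classical item statistics for a 0/1-scored response matrix
    (rows = test-takers, columns = items)."""
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    difficulty = responses.mean(axis=0)      # proportion correct per item
    discrimination = []
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]       # total score excluding item j
        discrimination.append(np.corrcoef(responses[:, j], rest)[0, 1])
    return difficulty, np.array(discrimination)

# six hypothetical test-takers, three items
data = [[1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 1, 0],
        [1, 1, 1],
        [0, 0, 0]]
print(item_statistics(data))
```

Dropping an option is expected to raise the difficulty index (the item becomes slightly easier) while leaving discrimination essentially unchanged, which is the pattern the study reports.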


2017 ◽  
Vol 33 (5) ◽  
pp. 336-344 ◽  
Author(s):  
Birk Diedenhofen ◽  
Jochen Musch

Standard dichotomous scoring of multiple-choice test items grants no partial credit for partial knowledge. Empirical option weighting is an alternative, polychotomous scoring method that uses the point-biserial correlation between option choices and total score as a weight for each answer alternative. Extant studies demonstrate that the method increases the reliability of multiple-choice tests in comparison to conventional scoring. Most previous studies employed a correlational validation approach, however, and provided mixed findings with regard to the validity of empirical option weighting. The present study is the first investigation using an experimental approach to determine the reliability and validity of empirical option weighting. To obtain an external validation criterion, we experimentally induced various degrees of knowledge in a domain of which participants had no prior knowledge. We found that, in comparison to dichotomous scoring, empirical option weighting increased both the reliability and the validity of a multiple-choice knowledge test employing distractors that were appealing to test takers with different levels of knowledge. A potential application of the present results is the computation and publication of empirical option weights for existing multiple-choice knowledge tests that have previously been scored dichotomously.
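A minimal sketch of the weighting procedure described above (the function names and data layout are assumptions, not the authors' code): each option's weight is its point-biserial correlation with the conventional total score, and a person's polychotomous score is the sum of the weights of the options they chose.

```python
import numpy as np

def option_weights(choices, totals, n_options):
    """Empirical option weights: for every option of every item, the
    point-biserial correlation between choosing that option (coded 0/1)
    and the dichotomously scored total."""
    choices = np.asarray(choices)
    totals = np.asarray(totals, dtype=float)
    weights = np.zeros((choices.shape[1], n_options))
    for j in range(choices.shape[1]):
        for k in range(n_options):
            picked = (choices[:, j] == k).astype(float)
            if picked.std() > 0:   # option must be chosen by some, but not all
                weights[j, k] = np.corrcoef(picked, totals)[0, 1]
    return weights

def polychotomous_score(choices, weights):
    """Sum, over items, the weight of the option each person chose."""
    choices = np.asarray(choices)
    persons, items = choices.shape
    return np.array([sum(weights[j, choices[p, j]] for j in range(items))
                     for p in range(persons)])
```

In practice, the weights would be estimated on a calibration sample and then applied, as published weights, to new test-takers' option choices.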


1970 ◽  
Vol 27 (1) ◽  
pp. 91-98 ◽  
Author(s):  
Robert M. Rippey

A system for responding to and scoring multiple-choice tests is proposed. This system asks students to express their distribution of preference for options as well as their certainty in that distribution. Such a system of scoring allows the use of types of test items which have previously been ignored.
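The abstract does not spell out the scoring rule. One common way to score a probability distribution expressed over an item's options is a logarithmic rule, shown below purely as an illustration of the idea, not necessarily as Rippey's rule:

```python
import math

def log_score(probabilities, correct_index, floor=0.01):
    """Logarithmic score for one item: the log of the probability the
    test-taker assigned to the keyed option (floored to avoid -inf)."""
    return math.log(max(probabilities[correct_index], floor))

# a test-taker who spreads 70/20/10 over three options; the key is option 0
print(log_score([0.7, 0.2, 0.1], correct_index=0))
```

Rules of this kind reward test-takers for reporting their actual uncertainty rather than forcing a single choice.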


1995 ◽  
Vol 77 (3) ◽  
pp. 760-762
Author(s):  
Kenneth S. Shultz

Little research has been conducted on the use of linear polychotomous scoring of multiple-choice test items. Therefore, several tests were analyzed using both dichotomous and polychotomous scoring of test items to assess how the alpha reliabilities of the tests change depending on the type of scoring used. In each case, the alpha reliability of the test increased, with the same number of items or fewer in each test, when polychotomous (vs. dichotomous) scoring of multiple-choice test items was used.
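Cronbach's alpha is computed the same way whether items are scored 0/1 or with partial credit, which is how such comparisons are typically made. A minimal sketch (the data are hypothetical):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (persons x items) score matrix, whether
    items carry dichotomous (0/1) or polychotomous (partial) credit."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# the same six response patterns scored two ways (values hypothetical)
dichotomous   = [[1, 1, 1], [1, 1, 0], [0, 0, 0],
                 [1, 0, 1], [0, 0, 0], [1, 1, 1]]
polychotomous = [[1.0, 1.0, 1.0], [1.0, 0.8, 0.3], [0.0, 0.2, 0.0],
                 [1.0, 0.3, 1.0], [0.3, 0.0, 0.0], [1.0, 1.0, 1.0]]
print(cronbach_alpha(dichotomous), cronbach_alpha(polychotomous))
```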


1978 ◽  
Vol 5 (3) ◽  
pp. 144-146 ◽  
Author(s):  
Andrew S. Bondy

Reviewing test items improves subsequent scores on identical items, but does not generalize to a rewrite of those items.


1984 ◽  
Vol 54 (2) ◽  
pp. 419-425
Author(s):  
R. A. Weitzman

In an ideal multiple-choice test, defined as a multiple-choice test containing only items with options that are all equally guessworthy, the probability of guessing the correct answer to an item is equal to the reciprocal of the number of the item's options. This article presents an asymptotically exact estimator of the test-retest reliability of an ideal multiple-choice test. When all test items have the same number of options, computation of the estimator requires, in addition to the number of options per item, the same information as computation of the Kuder-Richardson Formula 21: the total number of items answered correctly on a single testing occasion by each person tested. Both for ideal multiple-choice tests and for nonideal multiple-choice tests for which the average probability of guessing the correct answer to an item is equal to the reciprocal of the number of options per item, Monte Carlo data show that the estimator is considerably more accurate than the Kuder-Richardson Formula 21 and, in fact, is very nearly exact in populations of the order of 1000 persons.
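Weitzman's estimator itself is not reproduced in the abstract. For reference, the KR-21 baseline it is compared against needs only the number of items and each person's total number-correct score, and for an ideal test the probability of guessing an item correctly is simply the reciprocal of the number of options. A minimal sketch (the scores below are hypothetical):

```python
import numpy as np

def kr21(total_scores, n_items):
    """Kuder-Richardson Formula 21, computed from the number of items
    and each person's total number-correct score alone."""
    x = np.asarray(total_scores, dtype=float)
    m, var = x.mean(), x.var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - m * (n_items - m) / (n_items * var))

n_items, n_options = 40, 4
guessing_probability = 1 / n_options   # ideal test: all options equally guessworthy
print(kr21([28, 31, 25, 35, 22, 30, 27, 33, 24, 29], n_items), guessing_probability)
```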

