Creating Diagnostic Assessments

The goal of this paper is to propose a new method to generate multiple-choice items that can make creating quality assessments faster and more efficient, solving a practical issue that many instructors face. There are currently no systematic, efficient methods available to generate quality distractors (plausible but incorrect options), which are necessary for multiple-choice assessments that accurately assess students’ knowledge. We propose two methods to use technology to generate quality multiple-choice assessments: (1) manipulating the mathematical problem to emulate common student misconceptions or errors and (2) disguising options to protect the integrity of multiple-choice tests. By linking options to common student misconceptions and errors, instructors can use assessments as personalized diagnostic tools that can target and modify underlying misconceptions. Moreover, using technology to generate these quality distractors would allow for assessments to be developed efficiently, in terms of both time and resources. The method to disguise the options generated would have the added benefit of preventing students from working backwards from options to solution and thus would protect the integrity of the assessment.

Download Full-text

Screening for technical flaws in multiple-choice items. A generalizability study.

Dansk Universitetspædagogisk Tidsskrift ◽

10.7146/dut.v14i26.106496 ◽

2019 ◽

Vol 14 (26) ◽

pp. 51-65

Author(s):

Lotte Dyhrberg O'Neill ◽

Sara Mathilde Radl Mortensen ◽

Cita Nørgård ◽

Anne Lindebo Holm Øvrehus ◽

Ulla Glenert Friis

Keyword(s):

Test Validity ◽

Multiple Choice ◽

Sound Quality ◽

Review Committee ◽

Multiple Choice Tests ◽

Local Contexts ◽

Choice Tests ◽

Local Test ◽

Multiple Choice Items ◽

Generalizability Study

Construction errors in multiple-choice items are quite prevalent and constitute threats to test validity of multiple-choice tests. Currently very little research on the usefulness of systematic item screening by local review committees before test administration seem to exist. The aim of this study was therefore to examine validity and feasibility aspects of review committee screening for item flaws. We examined the reliability of item reviewers’ independent judgments of the presence/absence of item flaws with a generalizability study design and found only moderate reliability using five reviewers. Statistical analyses of actual exam scores could be a more efficient way of identifying flaws and improving average item discrimination of tests in local contexts. The question of validity of human judgments of item flaws is important - not just for sufficiently sound quality assurance procedures of tests in local test contexts - but also for the global research on item flaws.

Download Full-text

Experience Versus Potential of Multiple-Choice Tests in Marketing Courses

Journal of Marketing Education ◽

10.1177/027347537900100205 ◽

1979 ◽

Vol 1 (2) ◽

pp. 24-33 ◽

Cited By ~ 3

Author(s):

James R. McMillan

Keyword(s):

Multiple Choice ◽

Multiple Choice Questions ◽

High Quality ◽

Test Form ◽

Multiple Choice Tests ◽

Choice Tests ◽

Classroom Evaluation ◽

Quality Objective ◽

Multiple Choice Items

Most educators agree that classroom evaluation practices need improvement. One way to improve testing is to use high-quality objective multiple-choice exams. Almost any understanding or ability which can be tested by another test form can also be tested by means of multiple-choice items. Based on a survey of 173 respondents, it appears that marketing teachers are disenchanted with multiple-choice questions and use them sparingly. Further, their limited use is largely in the introductory marketing course even though there are emerging pressures for universities to take a closer look at the quality of classroom evaluation at all levels.

Download Full-text

Examination of the Quality of Multiple-choice Items on Classroom Tests

The Canadian Journal for the Scholarship of Teaching and Learning ◽

10.5206/cjsotl-rcacea.2011.2.4 ◽

2011 ◽

Vol 2 (2) ◽

Cited By ~ 26

Author(s):

David DiBattista ◽

Laura Kurzawa

Keyword(s):

Nous Avons ◽

Test Scores ◽

Item Analysis ◽

Multiple Choice ◽

Discriminatory Power ◽

Multiple Choice Tests ◽

Choice Tests ◽

Discrimination Coefficient ◽

Multiple Choice Items

Because multiple-choice testing is so widespread in higher education, we assessed the quality of items used on classroom tests by carrying out a statistical item analysis. We examined undergraduates’ responses to 1198 multiple-choice items on sixteen classroom tests in various disciplines. The mean item discrimination coefficient was +0.25, with more than 30% of items having unsatisfactory coefficients less than +0.20. Of the 3819 distractors, 45% were flawed either because less than 5% of examinees selected them or because their selection was positively rather than negatively correlated with test scores. In three tests, more than 40% of the items had an unsatisfactory discrimination coefficient, and in six tests, more than half of the distractors were flawed. Discriminatory power suffered dramatically when the selection of one or more distractors was positively correlated with test scores, but it was only minimally affected by the presence of distractors that were selected by less than 5% of examinees. Our findings indicate that there is considerable room for improvement in the quality of many multiple-choice tests. We suggest that instructors consider improving the quality of their multiple-choice tests by conducting an item analysis and by modifying distractors that impair the discriminatory power of items. Étant donné que les examens à choix multiple sont tellement généralisés dans l’enseignement supérieur, nous avons effectué une analyse statistique des items utilisés dans les examens en classe afin d’en évaluer la qualité. Nous avons analysé les réponses des étudiants de premier cycle à 1198 questions à choix multiples dans 16 examens effectués en classe dans diverses disciplines. Le coefficient moyen de discrimination de l’item était +0.25. Plus de 30 % des items avaient des coefficients insatisfaisants inférieurs à + 0.20. Sur les 3819 distracteurs, 45 % étaient imparfaits parce que moins de 5 % des étudiants les ont choisis ou à cause d’une corrélation négative plutôt que positive avec les résultats des examens. Dans trois examens, le coefficient de discrimination de plus de 40 % des items était insatisfaisant et dans six examens, plus de la moitié des distracteurs était imparfaits. Le pouvoir de discrimination était considérablement affecté en cas de corrélation positive entre un distracteur ou plus et les résultatsde l’examen, mais la présence de distracteurs choisis par moins de 5 % des étudiants avait une influence minime sur ce pouvoir. Nos résultats indiquent que les examens à choix multiple peuvent être considérablement améliorés. Nous suggérons que les enseignants procèdent à une analyse des items et modifient les distracteurs qui compromettent le pouvoir de discrimination des items.

Download Full-text

Effect of Number of Alternatives and Scoring Instructions on Examinees' Reactions to Multiple-Choice Tests

Psychological Reports ◽

10.2466/pr0.1971.29.3f.1229 ◽

1971 ◽

Vol 29 (3_suppl) ◽

pp. 1229-1230

Author(s):

Carrie Wherry Waters ◽

L. K. Waters

Keyword(s):

Multiple Choice ◽

Multiple Choice Tests ◽

Choice Tests ◽

Multiple Choice Items

Reactions of examinees to 2 scoring instructions were evaluated for 2-, 3-, and 5-alternative multiple-choice items. Examinees were more favorable toward the “reward for omitted items” than the “penalty for wrongs” instructions across all numbers of item alternatives.

Download Full-text

An Automatic Quantification of the Randomness of Answering Correctly in Taking Traditional Multiple-choice Tests

Theory and Practice in Language Studies ◽

10.17507/tpls.0809.07 ◽

2018 ◽

Vol 8 (9) ◽

pp. 1152

Author(s):

Qingsong Gu ◽

Michael W. Schwartz

Keyword(s):

Multiple Choice ◽

Choice Test ◽

Multiple Choice Test ◽

Microsoft Excel ◽

Automatic Quantification ◽

Multiple Choice Tests ◽

Choice Tests ◽

Multiple Choice Items ◽

Passing Score ◽

Random Guessing

In taking traditional multiple-choice tests, random guessing is unavoidable yet nonnegligible. To uncover the “unfairness” caused by random guessing, this paper designed a Microsoft Excel template with the use of relevant functions to automatically quantify the probability of answering correctly at random, eventually figuring out the least scores a testee should get to pass a traditional multiple-choice test with different probabilities of answering correctly at random and the “luckiness” for passing it. This paper concludes that, although random guessing is nonnegligible, it is unnecessary to remove traditional multiple-choice items from all testing activities, because it can be controlled through changing the passing score and the number of options or reducing its percentage in a test.

Download Full-text

On the Risk of Certain Psychotechnological Response Options in Multiple-Choice Tests

European Journal of Psychological Assessment ◽

10.1027/1015-5759/a000040 ◽

2010 ◽

Vol 26 (4) ◽

pp. 302-308 ◽

Cited By ~ 2

Author(s):

Klaus D. Kubinger ◽

Christine Wolfsbauer

Keyword(s):

Test Scores ◽

Multivariate Analyses ◽

Multiple Choice ◽

Response Options ◽

Multiple Choice Tests ◽

Choice Tests ◽

Different Types ◽

Moderate Effect ◽

Multiple Choice Items ◽

Lower Test

Test authors may think about adding the response options “I don’t know the solution” and “none of the other options is correct” in order to reduce a high guessing probability for multiple-choice items. However, in this paper it was expected that different types of personality would use these response options differently, as a consequence of which they would do more or less guessing and, therefore, achieve higher or lower test scores, on average. An experiment was performed based on randomizing participants into two groups, one of them being warned that it is better to admit being unable to solve the item, and the participants were classified according to their personality scores into high-, medium-, and low-scoring. Multivariate analyses of variance (195 pupils between 14 and 19 years) disclosed that only Openness to Experience showed any (moderate) effect, and even this only for a single subtest (Cattell’s culture fair test).

Download Full-text

Assessing vocabulary size through multiple-choice formats

ITL - International Journal of Applied Linguistics ◽

10.1075/itl.166.2.04gyl ◽

2015 ◽

Vol 166 (2) ◽

pp. 278-306 ◽

Cited By ~ 34

Author(s):

Henrik Gyllstad ◽

Laura Vilkaitė ◽

Norbert Schmitt

Keyword(s):

Test Scores ◽

Representative Sample ◽

Sampling Rate ◽

Multiple Choice ◽

Criterion Measure ◽

Vocabulary Size ◽

Multiple Choice Tests ◽

Choice Tests ◽

Multiple Choice Items

In most tests of vocabulary size, knowledge is assessed through multiple-choice formats. Despite advantages such as ease of scoring, multiple-choice tests (MCT) are accompanied with problems. One of the more central issues has to do with guessing and the presence of other construct-irrelevant strategies that can lead to overestimation of scores. A further challenge when designing vocabulary size tests is that of sampling rate. How many words constitute a representative sample of the underlying population of words that the test is intended to measure? This paper addresses these two issues through a case study based on data from a recent and increasingly used MCT of vocabulary size: the Vocabulary Size Test. Using a criterion-related validity approach, our results show that for multiple-choice items sampled from this test, there is a discrepancy between the test scores and the scores obtained from the criterion measure, and that a higher sampling rate would be needed in order to better represent knowledge of the underlying population of words. We offer two main interpretations of these results, and discuss their implications for the construction and use of vocabulary size tests.

Download Full-text

KAJIAN MISKONSEPSI SISWA MELALUI TES MULTIPLE CHOICE MENGGUNAKAN CERTAINTY OF RESPONSE INDEX (CRI) PADA MATERI REAKSI REDUKSI OKSIDASI KELAS X MIPA SMAN 1 PONTIANAK

AR-RAZI Jurnal Ilmiah ◽

10.29406/arz.v5i2.635 ◽

2017 ◽

Vol 5 (2) ◽

Author(s):

Nurlela Nurlela ◽

Mawardi Mawardi ◽

Tuti Kurniati

Keyword(s):

Data Collection ◽

Teaching Methods ◽

Multiple Choice ◽

Sampling Techniques ◽

Response Index ◽

Multiple Choice Tests ◽

Oxidation Reduction ◽

Student Misconceptions ◽

Measurement Results ◽

Choice Tests

ABSTRACTThis study aimed to describe the student misconceptions and find out the cause of misconceptions MIPA class X SMAN 1 Pontianak. The method used is descriptive qualitative approach. The technique of taking informants using purposive sampling techniques so that the informant obtained, namely class X MIPA 5 and 6. The data collection techniques such as measurement results of diagnostic tests and interviews. Data collection tools in the form of Multiple Choice Tests using Certainty Of Response Index (CRI), which consists of 10 items with five alternative answers and guidance interview. Based on the analysis of research data shows there are misconceptions students. The results showed the highest percentage of misconceptions in the amount of 63.93% on indicators to distinguish the concept of oxidation and reduction in terms of incorporation and release of oxygen and misconceptions lowest percentage that is equal to 4.92% on the indicator to distinguish the concept of oxidation and reduction in terms of release and acceptance of electrons. The cause of the misconceptions of students caused by the fault of students that includes associative thinking students, preconception or early concept is wrong, intuition is wrong, and the ability of students but it is also caused by the teaching methods are boring and less varied and selfconfidence of students is too great when filling criteria CRI.Keywords: student misconceptions, misconceptions cause, reaction oxidation reduction

Download Full-text