Empirical Estimates of Intercorrelations among the Components of Scores on Multiple-Choice Tests

1966 ◽ Vol 19 (2) ◽ pp. 651-654
Author(s): Donald W. Zimmerman, Richard H. Williams, Hubert H. Rehm, William Elmore

College students were instructed to indicate on various multiple-choice tests whether they “knew the answer” or “guessed” each item, and the results were treated as estimated true and error components of scores. The values of the intercorrelations of these components were similar to those given by a computer program described previously. The values found for all tests were consistent with the assumption that test scores consist of both independent and non-independent components of error and that the non-independent error component is relatively large.
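
To make the component logic concrete, here is a minimal sketch (not the authors' program) of how per-examinee "knew" and "guessed" tallies can be treated as estimated true and error components and intercorrelated. All counts below are simulated purely for illustration.

```python
# A minimal sketch of intercorrelating estimated true and error score
# components; the data are simulated, not taken from the study.
import numpy as np

rng = np.random.default_rng(0)
n_examinees = 100

# Hypothetical per-examinee tallies from a multiple-choice test:
# correct answers the examinee marked "knew" (estimated true component)
# and correct answers marked "guessed" (estimated error component).
true_component = rng.binomial(40, 0.6, n_examinees)
error_component = rng.binomial(10, 0.25, n_examinees)
observed_score = true_component + error_component

# Intercorrelations among the estimated components and the observed score.
components = np.vstack([true_component, error_component, observed_score])
r = np.corrcoef(components)
print("r(true, error)     =", round(r[0, 1], 3))
print("r(true, observed)  =", round(r[0, 2], 3))
print("r(error, observed) =", round(r[1, 2], 3))
```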

Author(s): David DiBattista, Laura Kurzawa

Because multiple-choice testing is so widespread in higher education, we assessed the quality of items used on classroom tests by carrying out a statistical item analysis. We examined undergraduates’ responses to 1198 multiple-choice items on sixteen classroom tests in various disciplines. The mean item discrimination coefficient was +0.25, with more than 30% of items having unsatisfactory coefficients less than +0.20. Of the 3819 distractors, 45% were flawed either because less than 5% of examinees selected them or because their selection was positively rather than negatively correlated with test scores. In three tests, more than 40% of the items had an unsatisfactory discrimination coefficient, and in six tests, more than half of the distractors were flawed. Discriminatory power suffered dramatically when the selection of one or more distractors was positively correlated with test scores, but it was only minimally affected by the presence of distractors that were selected by less than 5% of examinees. Our findings indicate that there is considerable room for improvement in the quality of many multiple-choice tests. We suggest that instructors consider improving the quality of their multiple-choice tests by conducting an item analysis and by modifying distractors that impair the discriminatory power of items.
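
As a rough illustration of the two checks described above, the sketch below (assumed data and array shapes, not the authors' analysis code) computes a corrected point-biserial discrimination coefficient for each item and flags distractors that are chosen by fewer than 5% of examinees or whose selection correlates positively with the rest-of-test score.

```python
# Illustrative item analysis on simulated responses: discrimination via a
# corrected point-biserial correlation, plus distractor flagging.
import numpy as np

rng = np.random.default_rng(1)
n_examinees, n_items, n_options = 200, 20, 4
key = rng.integers(0, n_options, n_items)                 # correct option per item
responses = rng.integers(0, n_options, (n_examinees, n_items))

correct = (responses == key).astype(float)                # 1 = right, 0 = wrong
total = correct.sum(axis=1)

def point_biserial(x, y):
    """Pearson correlation between a 0/1 vector and a continuous score."""
    return np.corrcoef(x, y)[0, 1]

for item in range(n_items):
    rest = total - correct[:, item]                       # item-excluded total
    disc = point_biserial(correct[:, item], rest)
    flags = []
    for opt in range(n_options):
        if opt == key[item]:
            continue
        chosen = (responses[:, item] == opt).astype(float)
        if chosen.mean() < 0.05:                          # chosen by < 5% of examinees
            flags.append(f"option {opt}: rarely chosen")
        elif point_biserial(chosen, rest) > 0:            # attracts higher scorers
            flags.append(f"option {opt}: positive correlation")
    status = "OK" if disc >= 0.20 else "low discrimination"
    print(f"item {item:2d}: D = {disc:+.2f} ({status}); flagged: {flags}")
```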


1982 ◽ Vol 51 (2) ◽ pp. 523-527
Author(s): Lillian M. Range, Howard N. Anderson, Andrea L. Wesley

On multiple-choice tests, 52 anxious college students changed answers significantly more often than nonanxious students. Nondepressed students, and those who held a positive view of the nature of man, were more successful in changing answers. Students who made B's were more successful in changing answers than students who made C's.


2010 ◽ Vol 26 (4) ◽ pp. 302-308
Author(s): Klaus D. Kubinger, Christine Wolfsbauer

Test authors may consider adding the response options “I don’t know the solution” and “none of the other options is correct” in order to reduce the high guessing probability of multiple-choice items. However, we expected that different personality types would use these response options differently, guess more or less as a consequence, and therefore achieve higher or lower test scores on average. In an experiment, participants were randomized into two groups, one of which was warned that it is better to admit being unable to solve an item, and participants were also classified as high-, medium-, or low-scoring on each personality scale. Multivariate analyses of variance (195 pupils between 14 and 19 years of age) disclosed that only Openness to Experience showed any (moderate) effect, and even this only for a single subtest (Cattell’s Culture Fair Test).


2015 ◽ Vol 166 (2) ◽ pp. 278-306
Author(s): Henrik Gyllstad, Laura Vilkaitė, Norbert Schmitt

In most tests of vocabulary size, knowledge is assessed through multiple-choice formats. Despite advantages such as ease of scoring, multiple-choice tests (MCTs) come with problems. One of the more central issues is guessing and the presence of other construct-irrelevant strategies that can lead to overestimation of scores. A further challenge when designing vocabulary size tests is sampling rate: how many words constitute a representative sample of the underlying population of words that the test is intended to measure? This paper addresses these two issues through a case study based on data from a recent and increasingly used MCT of vocabulary size: the Vocabulary Size Test. Using a criterion-related validity approach, our results show that for multiple-choice items sampled from this test, there is a discrepancy between the test scores and the scores obtained from the criterion measure, and that a higher sampling rate would be needed to better represent knowledge of the underlying population of words. We offer two main interpretations of these results and discuss their implications for the construction and use of vocabulary size tests.
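
To illustrate why sampling rate matters for a size estimate, the sketch below uses assumed figures (not the Vocabulary Size Test's actual design parameters) to convert a raw score into an estimated number of word families and to show how the sampling-based margin of error scales with the number of items.

```python
# Illustrative back-of-the-envelope calculation: each sampled item stands in
# for many words, so per-item noise is multiplied when a score is converted
# into an estimated vocabulary size. All figures below are assumptions.
import math

words_per_item = 100        # assumed: one test item per 100 word families
items_sampled = 140         # assumed number of multiple-choice items
p_known = 0.55              # hypothetical true proportion of words known

# Point estimate of vocabulary size from a test score.
score = round(p_known * items_sampled)
estimated_size = score * words_per_item

# Binomial standard error of the proportion, scaled to word families.
se_p = math.sqrt(p_known * (1 - p_known) / items_sampled)
se_size = se_p * items_sampled * words_per_item

print(f"estimated size ≈ {estimated_size} word families")
print(f"95% margin     ≈ ±{1.96 * se_size:.0f} word families")
print("doubling the sampling rate shrinks the margin by a factor of √2")
```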

