Are multiple-choice items unfair? And if so, for whom?

2019 · Vol 18 (3) · pp. 198-217
Author(s): Christin Siegfried, Eveline Wuttke

Due to their test economy and objective evaluability, multiple-choice items are used much more frequently than constructed-response questions to test knowledge. However, studies point out that an individual's test result may depend on the test format (multiple-choice or constructed-response). Studies testing economic knowledge (one dimension of economic competence) mainly use multiple-choice items and indicate gender-specific performance in the corresponding tests in favour of male test-takers. Gender-specific affinities and differences in cognitive abilities are mentioned as explanations for these "gender differences". The test format itself is also mentioned, but has hardly been investigated in detail to date. In order to answer the question to what extent students' test performance depends on the item format, we test economic knowledge using two test formats (constructed-response and multiple-choice) with the same content. Results from 201 business and business education students show that the use of constructed-response items can compensate for existing gender differences in 53% of all cases. This underlines that no general, gender-specific advantage or disadvantage can be assumed in relation to the item format. However, the mixed use of constructed-response and multiple-choice items seems promising as a way to compensate for potential gender differences.
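To make the "compensated in 53% of cases" figure concrete, the following minimal sketch (illustrative only; the column names "item", "format", "gender", and "score" are assumptions, and this is not the authors' analysis code) shows one way such a share could be computed from item-level responses when the same content is tested in both formats.

# Illustrative sketch: for each piece of content tested in both formats,
# compare the male-female gap on the multiple-choice (MC) version with the
# gap on the constructed-response (CR) version, and count how often the CR
# format closes a gap that favours male test-takers under MC.
import pandas as pd

def gap(df: pd.DataFrame) -> float:
    """Mean male score minus mean female score for one item/format."""
    means = df.groupby("gender")["score"].mean()
    return means.get("male", 0.0) - means.get("female", 0.0)

def share_compensated(responses: pd.DataFrame, tol: float = 0.0) -> float:
    """Share of items where an MC gap favouring males is no longer present
    (within `tol`) when the same content is tested with a CR item."""
    compensated, relevant = 0, 0
    for item, item_df in responses.groupby("item"):
        mc_gap = gap(item_df[item_df["format"] == "MC"])
        cr_gap = gap(item_df[item_df["format"] == "CR"])
        if mc_gap > tol:          # MC version favours male test-takers
            relevant += 1
            if cr_gap <= tol:     # CR version closes the gap
                compensated += 1
    return compensated / relevant if relevant else float("nan")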

2018 · Vol 47 (5) · pp. 284-294
Author(s): Sean F. Reardon, Demetra Kalogrides, Erin M. Fahle, Anne Podolsky, Rosalía C. Zárate

Prior research suggests that males outperform females, on average, on multiple-choice items compared to their relative performance on constructed-response items. This paper characterizes the extent to which gender achievement gaps on state accountability tests across the United States are associated with those tests’ item formats. Using roughly 8 million fourth- and eighth-grade students’ scores on state assessments, we estimate state- and district-level math and reading male-female achievement gaps. We find that the estimated gaps are strongly associated with the proportions of the test scores based on multiple-choice and constructed-response questions on state accountability tests, even when controlling for gender achievement gaps as measured by the National Assessment of Educational Progress (NAEP) or Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP) assessments, which have the same item format across states. We find that test item format explains approximately 25% of the variation in gender achievement gaps among states.
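The core of this design can be read as a variance-decomposition regression. The sketch below (assumed variable names 'state_gap', 'naep_gap', and 'mc_share'; not the authors' specification) illustrates how the "item format explains approximately 25% of the variation" style of result can be framed as an R² comparison between models with and without the item-format share.

# Rough sketch of a descriptive regression: male-female gaps on state tests
# regressed on the share of the test score coming from multiple-choice items,
# controlling for the gap on a format-constant assessment (e.g. NAEP or MAP).
import pandas as pd
import statsmodels.formula.api as smf

def format_share_of_variance(gaps: pd.DataFrame) -> float:
    """Extra variation in state-test gender gaps associated with item format.

    Expected columns (assumed): 'state_gap' (male-female gap on the state
    test), 'naep_gap' (gap on the format-constant assessment), 'mc_share'
    (proportion of the state test score based on multiple-choice items).
    """
    base = smf.ols("state_gap ~ naep_gap", data=gaps).fit()
    full = smf.ols("state_gap ~ naep_gap + mc_share", data=gaps).fit()
    return full.rsquared - base.rsquared  # additional R^2 tied to format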


1998 · Vol 20 (3) · pp. 179-195
Author(s): Laura S. Hamilton

Gender differences on the NELS:88 multiple-choice and constructed-response science tests were explored through a combination of statistical analyses and interviews. Performance gaps between males and females varied across formats (multiple-choice versus constructed-response) and across items within a format. Differences were largest for items that involved visual content and called on application of knowledge commonly acquired through extracurricular activities. Large-scale surveys such as NELS:88 are widely used by researchers to study the effects of various student and school characteristics on achievement. The results of this investigation reveal the value of studying the validity of the outcome measure and suggest that conclusions about group differences and about correlates of achievement depend heavily on specific features of the items that make up the test.


2020 · Vol 2 (4) · pp. p16
Author(s): Michael Joseph Wise

Although many instructors prefer multiple-choice (MC) items due to their convenience and objectivity, many others eschew their use due to concerns that they are less fair than constructed-response (CR) items at evaluating student mastery of course content. To address three common unfairness concerns, I analyzed performance on MC and CR items from tests within nine sections of five different biology courses I taught over a five-year period. In all nine sections, students’ scores on MC items were highly correlated with their scores on CR items (overall r = 0.90), suggesting that MC and CR items quantified mastery of content in an essentially equivalent manner—at least to the extent that students’ relative rankings depended very little on the type of test item. In addition, there was no evidence that any students were unfairly disadvantaged on MC items (relative to their performance on CR items) due to poor guessing abilities. Finally, there was no evidence that females were unfairly assessed by MC items, as they scored 4% higher on average than males on both MC and CR items. Overall, there was no evidence that MC items were any less fair than CR items within the same content domain.
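The headline r = 0.90 and the claim about stable relative rankings correspond to two simple agreement checks. The sketch below (assumed column names 'mc_score' and 'cr_score'; it is not the author's analysis code) shows how both could be computed from per-student scores.

# Small sketch: Pearson correlation between each student's MC and CR scores,
# plus a Spearman rank correlation as a check that relative rankings change
# little between the two item formats.
import pandas as pd

def mc_cr_agreement(scores: pd.DataFrame) -> dict:
    """`scores` is assumed to hold one row per student with numeric
    'mc_score' and 'cr_score' columns (e.g. percent correct per item type)."""
    pearson = scores["mc_score"].corr(scores["cr_score"])             # e.g. ~0.90
    rank_corr = scores["mc_score"].corr(scores["cr_score"], method="spearman")
    return {"pearson_r": pearson, "spearman_rho": rank_corr}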

