Are multiple-choice items unfair? And if so, for whom?

2019 · Vol 18 (3) · pp. 198-217
Author(s): Christin Siegfried, Eveline Wuttke

Due to their test economy and objective evaluability, multiple-choice items are used much more frequently than constructed-response questions to test knowledge. However, studies point out that an individual's test result may depend on the test format (multiple-choice or constructed-response). Studies testing economic knowledge (one dimension of economic competence) mainly use multiple-choice items and indicate gender-specific performance in the corresponding tests in favour of male test-takers. Gender-specific affinities and differences in cognitive abilities are mentioned as explanations for these "gender differences". The test format itself is also mentioned, but has hardly been investigated in detail to date. In order to answer the question to what extent students' test performance depends on the item format, we test economic knowledge using two test formats (constructed-response and multiple-choice) with the same content. Results from 201 business and business education students show that the use of constructed-response items can compensate for existing gender differences in 53% of all cases. This underlines that no general, gender-specific advantage or disadvantage can be assumed in relation to the item format. However, the mixed use of constructed-response and multiple-choice items seems promising as a way to compensate for potential gender differences.
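To make the "compensated in 53% of cases" figure concrete, the following minimal sketch (illustrative only; the column names "item", "format", "gender", and "score" are assumptions, and this is not the authors' analysis code) shows one way such a share could be computed from item-level responses when the same content is tested in both formats.

# Illustrative sketch: for each piece of content tested in both formats,
# compare the male-female gap on the multiple-choice (MC) version with the
# gap on the constructed-response (CR) version, and count how often the CR
# format closes a gap that favours male test-takers under MC.
import pandas as pd

def gap(df: pd.DataFrame) -> float:
    """Mean male score minus mean female score for one item/format."""
    means = df.groupby("gender")["score"].mean()
    return means.get("male", 0.0) - means.get("female", 0.0)

def share_compensated(responses: pd.DataFrame, tol: float = 0.0) -> float:
    """Share of items where an MC gap favouring males is no longer present
    (within `tol`) when the same content is tested with a CR item."""
    compensated, relevant = 0, 0
    for item, item_df in responses.groupby("item"):
        mc_gap = gap(item_df[item_df["format"] == "MC"])
        cr_gap = gap(item_df[item_df["format"] == "CR"])
        if mc_gap > tol:          # MC version favours male test-takers
            relevant += 1
            if cr_gap <= tol:     # CR version closes the gap
                compensated += 1
    return compensated / relevant if relevant else float("nan")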

2018 · Vol 47 (5) · pp. 284-294
Author(s): Sean F. Reardon, Demetra Kalogrides, Erin M. Fahle, Anne Podolsky, Rosalía C. Zárate

Prior research suggests that males outperform females, on average, on multiple-choice items compared to their relative performance on constructed-response items. This paper characterizes the extent to which gender achievement gaps on state accountability tests across the United States are associated with those tests’ item formats. Using roughly 8 million fourth- and eighth-grade students’ scores on state assessments, we estimate state- and district-level math and reading male-female achievement gaps. We find that the estimated gaps are strongly associated with the proportions of the test scores based on multiple-choice and constructed-response questions on state accountability tests, even when controlling for gender achievement gaps as measured by the National Assessment of Educational Progress (NAEP) or Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP) assessments, which have the same item format across states. We find that test item format explains approximately 25% of the variation in gender achievement gaps among states.
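The core of this design can be read as a variance-decomposition regression. The sketch below (assumed variable names 'state_gap', 'naep_gap', and 'mc_share'; not the authors' specification) illustrates how the "item format explains approximately 25% of the variation" style of result can be framed as an R² comparison between models with and without the item-format share.

# Rough sketch of a descriptive regression: male-female gaps on state tests
# regressed on the share of the test score coming from multiple-choice items,
# controlling for the gap on a format-constant assessment (e.g. NAEP or MAP).
import pandas as pd
import statsmodels.formula.api as smf

def format_share_of_variance(gaps: pd.DataFrame) -> float:
    """Extra variation in state-test gender gaps associated with item format.

    Expected columns (assumed): 'state_gap' (male-female gap on the state
    test), 'naep_gap' (gap on the format-constant assessment), 'mc_share'
    (proportion of the state test score based on multiple-choice items).
    """
    base = smf.ols("state_gap ~ naep_gap", data=gaps).fit()
    full = smf.ols("state_gap ~ naep_gap + mc_share", data=gaps).fit()
    return full.rsquared - base.rsquared  # additional R^2 tied to format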


1998 · Vol 20 (3) · pp. 179-195
Author(s): Laura S. Hamilton

Gender differences on the NELS:88 multiple-choice and constructed-response science tests were explored through a combination of statistical analyses and interviews. Performance gaps between males and females varied across formats (multiple-choice versus constructed-response) and across items within a format. Differences were largest for items that involved visual content and called on application of knowledge commonly acquired through extracurricular activities. Large-scale surveys such as NELS:88 are widely used by researchers to study the effects of various student and school characteristics on achievement. The results of this investigation reveal the value of studying the validity of the outcome measure and suggest that conclusions about group differences and about correlates of achievement depend heavily on specific features of the items that make up the test.


2020 · Vol 2 (4) · pp. p16
Author(s): Michael Joseph Wise

Although many instructors prefer multiple-choice (MC) items due to their convenience and objectivity, many others eschew their use due to concerns that they are less fair than constructed-response (CR) items at evaluating student mastery of course content. To address three common unfairness concerns, I analyzed performance on MC and CR items from tests within nine sections of five different biology courses I taught over a five-year period. In all nine sections, students’ scores on MC items were highly correlated with their scores on CR items (overall r = 0.90), suggesting that MC and CR items quantified mastery of content in an essentially equivalent manner—at least to the extent that students’ relative rankings depended very little on the type of test item. In addition, there was no evidence that any students were unfairly disadvantaged on MC items (relative to their performance on CR items) due to poor guessing abilities. Finally, there was no evidence that females were unfairly assessed by MC items, as they scored 4% higher on average than males on both MC and CR items. Overall, there was no evidence that MC items were any less fair than CR items within the same content domain.
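The headline r = 0.90 and the claim about stable relative rankings correspond to two simple agreement checks. The sketch below (assumed column names 'mc_score' and 'cr_score'; it is not the author's analysis code) shows how both could be computed from per-student scores.

# Small sketch: Pearson correlation between each student's MC and CR scores,
# plus a Spearman rank correlation as a check that relative rankings change
# little between the two item formats.
import pandas as pd

def mc_cr_agreement(scores: pd.DataFrame) -> dict:
    """`scores` is assumed to hold one row per student with numeric
    'mc_score' and 'cr_score' columns (e.g. percent correct per item type)."""
    pearson = scores["mc_score"].corr(scores["cr_score"])             # e.g. ~0.90
    rank_corr = scores["mc_score"].corr(scores["cr_score"], method="spearman")
    return {"pearson_r": pearson, "spearman_rho": rank_corr}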

