The Relationship Between Test Item Format and Gender Achievement Gaps on Math and ELA Tests in Fourth and Eighth Grades

Prior research suggests that males outperform females, on average, on multiple-choice items compared to their relative performance on constructed-response items. This paper characterizes the extent to which gender achievement gaps on state accountability tests across the United States are associated with those tests’ item formats. Using roughly 8 million fourth- and eighth-grade students’ scores on state assessments, we estimate state- and district-level math and reading male-female achievement gaps. We find that the estimated gaps are strongly associated with the proportions of the test scores based on multiple-choice and constructed-response questions on state accountability tests, even when controlling for gender achievement gaps as measured by the National Assessment of Educational Progress (NAEP) or Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP) assessments, which have the same item format across states. We find that test item format explains approximately 25% of the variation in gender achievement gaps among states.

An Empirical Investigation of the Fairness of Multiple-Choice Items Relative to Constructed-Response Items on Tests of Students’ Mastery of Course Content

Journal of Education, Teaching and Social Studies ◽

10.22158/jetss.v2n4p16 ◽

2020 ◽

Vol 2 (4) ◽

p. 16

Author(s):

Michael Joseph Wise

Keyword(s):

Test Item ◽

Empirical Investigation ◽

Multiple Choice ◽

Constructed Response ◽

Course Content ◽

Content Domain ◽

Multiple Choice Items ◽

Biology Courses ◽

Highly Correlated

Although many instructors prefer multiple-choice (MC) items due to their convenience and objectivity, many others eschew their use due to concerns that they are less fair than constructed response (CR) items at evaluating student mastery of course content. To address three common unfairness concerns, I analyzed performance on MC and CR items from tests within nine sections of five different biology courses I taught over a five-year period. In all nine sections, students’ scores on MC items were highly correlated with their scores on CR items (overall r = 0.90), suggesting that MC and CR items quantified mastery of content in an essentially equivalent manner—at least to the extent that students’ relative rankings depended very little on the type of test item. In addition, there was no evidence that any students were unfairly disadvantaged on MC items (relative to their performance on CR items) due to poor guessing abilities. Finally, there was no evidence that females were unfairly assessed by MC items, as they scored 4% higher on average than males on both MC and CR items. Overall, there was no evidence that MC items were any less fair than CR items testing within the same content domain.
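The agreement statistic reported in this abstract (overall r = 0.90) is a Pearson correlation between each student's MC score and CR score. A minimal sketch of that calculation follows, assuming one pair of percent scores per student. The function name `pearson_r` and the six score pairs are hypothetical illustrations, not the author's code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical percent scores for six students (NOT data from the study).
mc_scores = [55, 62, 70, 78, 85, 93]  # multiple-choice items
cr_scores = [50, 65, 68, 80, 82, 95]  # constructed-response items

r = pearson_r(mc_scores, cr_scores)
print(f"r = {r:.2f}")
```

A value of r near 1 means students are ranked almost identically by the two item types, which is the sense in which the abstract calls the formats essentially equivalent. It does not by itself show that the two formats yield the same mean scores.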

Are multiple-choice items unfair? And if so, for whom?

Citizenship, Social and Economics Education ◽

10.1177/2047173419892525 ◽

2019 ◽

Vol 18 (3) ◽

pp. 198-217

Author(s):

Christin Siegfried ◽

Eveline Wuttke

Keyword(s):

Gender Differences ◽

Test Performance ◽

Cognitive Abilities ◽

Multiple Choice ◽

Economic Knowledge ◽

Test Format ◽

Item Format ◽

Constructed Response ◽

Multiple Choice Items ◽

Gender Specific

Because they are economical to administer and can be scored objectively, multiple-choice items are used much more frequently to test knowledge than constructed-response questions. However, studies point out that an individual's test result may depend on the test format (multiple-choice or constructed-response). Studies testing economic knowledge (one dimension of economic competence) mainly use multiple-choice items and report gender-specific performance in favour of male test-takers. Explanations offered for these “gender differences” include gender-specific affinities and differences in cognitive abilities. The test format itself has also been mentioned but has hardly been investigated in detail to date. To determine the extent to which students’ test performance depends on item format, we test economic knowledge using two test formats (constructed-response and multiple-choice) with the same content. Results from 201 business and business education students show that the use of constructed-response items can compensate for existing gender differences in 53% of all cases. This underlines that no general, gender-specific advantage or disadvantage can be assumed in relation to the item format. However, the mixed use of constructed-response and multiple-choice items seems promising for compensating for potential gender differences.

Race and Gender

The Oxford Handbook of Ethics of AI ◽

10.1093/oxfordhb/9780190067397.013.16 ◽

2020 ◽

pp. 251-269 ◽

Cited By ~ 2

Author(s):

Timnit Gebru

Keyword(s):

Machine Learning ◽

Language Processing ◽

The United States ◽

Error Rates ◽

Political Factors ◽

Recidivism Rates ◽

Race And Gender ◽

Decision Tools ◽

And Gender ◽

Technical Solutions

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.

The Interplay of Developing Morality and Gender Attitudes

The Oxford Handbook of Moral Development ◽

10.1093/oxfordhb/9780190676049.013.42 ◽

2020 ◽

pp. 724-745

Author(s):

Rebecca S. Bigler ◽

Lynn S. Liben

Keyword(s):

Psychological Theory ◽

The United States ◽

Gender Nonconformity ◽

Gender Attitudes ◽

Future Research ◽

Policy And Practice ◽

Ethical Policy ◽

History Of ◽

And Gender ◽

And Behavior

Morality and gender are intersecting realms of human thought and behavior. Reasoning and action at their intersection (e.g., views of women’s rights legislation) carry important consequences for societies, communities, and individual lives. In this chapter, the authors argue that children’s developing views of morality and gender reciprocally shape one another in important and underexplored ways. The chapter begins with a brief history of psychological theory and research at the intersection of morality and gender and suggests reasons for the historical failure to view gender attitudes through moral lenses. The authors then describe reasons for expecting morality to play an important role in shaping children’s developing gender attitudes and, reciprocally, for gender attitudes to play an important role in shaping children’s developing moral values. The authors next illustrate the importance and relevance of these ideas by discussing two topics at the center of contentious debate in the United States concerning ethical policy and practice: treatment of gender nonconformity and gender-segregated schooling. The chapter concludes with suggestions for future research.

Military Subsidization of Human Capital and Gender Stratification in the US Economy

Review of Radical Political Economics ◽

10.1177/0486613420982627 ◽

2021 ◽

pp. 048661342098262

Author(s):

Tyler Saxon

Keyword(s):

Human Capital ◽

The United States ◽

Jel Classification ◽

Access To Higher Education ◽

Gender Stratification ◽

Us Economy ◽

The Us ◽

And Gender ◽

Capital Development ◽

The Military

In the United States, the military is the primary channel through which many are able to obtain supports traditionally provided by the welfare state, such as access to higher education, job training, employment, and health care. However, because the military is a highly gendered institution, these social welfare functions are not as accessible to women as they are to men. This amounts to a highly gender-biased state spending pattern that subsidizes substantially more human capital development for men than for women, effectively reinforcing women’s subordinate status in the US economy. JEL classification: B54, B52, Z13

Subjective cognitive decline higher among sexual and gender minorities in the United States, 2015–2018

Alzheimer's & Dementia: Translational Research & Clinical Interventions ◽

10.1002/trc2.12197 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Jason D. Flatt ◽

Ethan C. Cicero ◽

Nickolas H. Lambrou ◽

Whitney Wharton ◽

Joel G. Anderson ◽

...

Keyword(s):

United States ◽

Cognitive Decline ◽

The United States ◽

Subjective Cognitive Decline ◽

Sexual And Gender Minorities ◽

Gender Minorities ◽

And Gender

What Lies Beneath: The Role of Self-Efficacy, Causal Attribution Habits, and Gender in Accounting for the Success of College Students

Education Sciences ◽

10.3390/educsci11070333 ◽

2021 ◽

Vol 11 (7) ◽

pp. 333

Author(s):

Kerstin Hamann ◽

Maura A. E. Pilotti ◽

Bruce M. Wilson

Keyword(s):

General Education ◽

Self Efficacy ◽

Causal Attribution ◽

The United States ◽

Female Students ◽

Male Students ◽

Course Grades ◽

Success In Higher Education ◽

And Gender

Existing research has identified gender as a driving variable of student success in higher education: women attend college at a higher rate and are also more successful than their male peers. We build on the extant literature by asking whether specific cognitive variables (i.e., self-efficacy and causal attribution habits) distinguish male and female students with differing academic performance levels. Using a case study, we collected data from students enrolled in a general education course (sample size N = 400) at a large public university in the United States. Our findings indicate that while students’ course grades and cumulative college grades did not vary by gender, female and male students reported different self-efficacy and causal attribution habits for good grades and poor grades. To illustrate, self-efficacy for female students is broad and stretches across all their courses; in contrast, for male students, it is more limited to specific courses. These gender differences in cognition, particularly in accounting for undesirable events, may assist faculty members and advisors in understanding how students respond to difficulties and challenges.

Tax Progressivity of Personal Wages and Income Inequality

Journal of Risk and Financial Management ◽

10.3390/jrfm14020060 ◽

2021 ◽

Vol 14 (2) ◽

pp. 60

Author(s):

Nikolaos Papanikolaou

Keyword(s):

Income Inequality ◽

The United States ◽

Census Bureau ◽

Tax System ◽

Race And Gender ◽

Tax Progressivity ◽

Income Data ◽

Kakwani Index ◽

And Gender ◽

Wage Income

The paper examines tax progressivity and income inequality using Census Bureau Current Population Survey (CPS) personal income data. The Kakwani index is used to derive tax progressivity for the personal wage income of all, male, female, White, and African American CPS respondents, respectively. The results show a tax system that is partly progressive and mostly regressive. Because of this regressive nature, the tax system did not display tax progressivity over the entire period under analysis, 1996 to 2011, either for personal wage income respondents in the United States overall or when broken down by race and gender.
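For readers unfamiliar with the measure, the Kakwani index is the concentration coefficient of tax payments (with people ranked by pre-tax income) minus the Gini coefficient of pre-tax income. Positive values indicate a progressive tax and negative values a regressive one. The sketch below uses a common discrete rank-based estimator. The abstract does not say how the paper computed the index, so the function names, the estimator, and the four-person data are illustrative assumptions only.

```python
def concentration_coefficient(values, rank_by):
    """Concentration coefficient of `values`, with units ranked by `rank_by`.

    When `values` and `rank_by` are the same list, this is the Gini coefficient.
    """
    n = len(values)
    if n == 0 or n != len(rank_by):
        raise ValueError("need two non-empty lists of equal length")
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    # Indices of units sorted from lowest to highest ranking variable.
    order = sorted(range(n), key=lambda i: rank_by[i])
    # Rank-weighted sum, with ranks running from 1 to n.
    weighted = sum((rank + 1) * values[i] for rank, i in enumerate(order))
    return 2 * weighted / (n * total) - (n + 1) / n


def kakwani_index(incomes, taxes):
    """Tax concentration coefficient minus the pre-tax income Gini."""
    gini = concentration_coefficient(incomes, incomes)
    tax_concentration = concentration_coefficient(taxes, incomes)
    return tax_concentration - gini


# Hypothetical four-person economy (NOT data from the paper).
incomes = [10, 20, 30, 40]
k_proportional = kakwani_index(incomes, [1, 2, 3, 4])  # flat 10% rate
k_progressive = kakwani_index(incomes, [0, 1, 4, 9])   # burden rises with income
k_regressive = kakwani_index(incomes, [4, 4, 4, 4])    # equal lump-sum tax
print(k_proportional, k_progressive, k_regressive)
```

In this toy data, a flat-rate tax gives an index of zero, a tax whose share rises with income gives a positive index, and an equal lump-sum tax gives a negative one. That sign convention is what the abstract relies on when it describes the system as partly progressive and mostly regressive.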

Are Multiple‐Choice Exams Easier for Economics Students? A Comparison of Multiple‐Choice and “Equivalent” Constructed‐Response Exam Questions

Southern Economic Journal ◽

10.1002/j.2325-8012.2002.tb00469.x ◽

2002 ◽

Vol 68 (4) ◽

pp. 957-971

Author(s):

Nixon Chan ◽

Peter E. Kennedy

Keyword(s):

Multiple Choice ◽

Constructed Response ◽

Economics Students

The Portuguese Performance Assessment of Self-Care Skills Measure: Validity and Reliability

OTJR: Occupation, Participation and Health ◽

10.1177/15394492211021309 ◽

2021 ◽

pp. 153944922110213

Author(s):

Pedro L. Ferreira ◽

Ana L. Simões ◽

Marília Dourado ◽

Margo B. Holm ◽

Joan C. Rogers

Keyword(s):

Performance Assessment ◽

Self Care ◽

The United States ◽

Factor Analyses ◽

Validity And Reliability ◽

Portuguese Population ◽

Perfect Agreement ◽

Portuguese Version ◽

And Gender ◽

Effective Assessment

Performance Assessment of Self-Care Skills (PASS) is a performance-based scale developed in the United States. Because of cultural differences, a Portuguese version (P-PASS) was developed, validated in the Portuguese population, and tested for reliability. The objective of this study was to create a Portuguese version of PASS and test its psychometric properties. A linguistic validation with older adults with physical/cognitive disabilities enabled us to validate P-PASS. Some original tasks were changed. Data were analyzed by PASS construct (independence, safety, adequacy), age, and gender. Construct validity (known-group analyses, factor analyses), assessed with 98 individuals, yielded excellent results. Reliability between two observers for 30 participants yielded almost perfect agreement for all three constructs. Independence scores were highest, followed by safety and adequacy. Men, as well as participants younger than 60 years, showed greater independence. We obtained results comparable with the original version. In conclusion, P-PASS is valid and reliable for the Portuguese population, enabling effective assessment of function and measurement of health outcomes.
