Commentary on Reardon, Kalogrides, and Ho’s “Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale”

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620948267 ◽

2020 ◽

pp. 107699862094826

Author(s):

Daniel Bolt

Keyword(s):

Predictive Accuracy ◽

State Level ◽

State Assessments ◽

Aggregate Level ◽

National Assessment ◽

Common Scale ◽

Scale Linking ◽

Bias Evaluation ◽

Preliminary Support

The studies presented by Reardon, Kalogrides, and Ho provide preliminary support for a National Assessment of Educational Progress–based aggregate linking of state assessments when used for research purposes. In this commentary, I suggest future efforts to explore possible sources of district-level bias, evaluation of predictive accuracy at the state level, and a better understanding of the performance of the linking when applied to the inevitable nonrepresentative district samples that will be encountered in research studies.

Download Full-text

Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998619874089 ◽

2019 ◽

pp. 107699861987408 ◽

Cited By ~ 3

Author(s):

Sean F. Reardon ◽

Demetra Kalogrides ◽

Andrew D. Ho

Keyword(s):

School District ◽

Test Score ◽

Standard Errors ◽

Aggregate Level ◽

National Assessment ◽

Random Effects Models ◽

Validation Methods ◽

Common Scale ◽

Scale Linking

Linking score scales across different tests is considered speculative and fraught, even at the aggregate level. We introduce and illustrate validation methods for aggregate linkages, using the challenge of linking U.S. school district average test scores across states as a motivating example. We show that aggregate linkages can be validated both directly and indirectly under certain conditions such as when the scores for at least some target units (districts) are available on a common test (e.g., the National Assessment of Educational Progress). We introduce precision-adjusted random effects models to estimate linking error, for populations and for subpopulations, for averages and for progress over time. These models allow us to distinguish linking error from sampling variability and illustrate how linking error plays a larger role in aggregates with smaller sample sizes. Assuming that target districts generalize to the full population of districts, we can show that standard errors for district means are generally less than .2 standard deviation units, leading to reliabilities above .7 for roughly 90% of districts. We also show how sources of imprecision and linking error contribute to both within- and between-state district comparisons within versus between states. This approach is applicable whenever the essential counterfactual question—“what would means/variance/progress for the aggregate units be, had students taken the other test?”—can be answered directly for at least some of the units.

Download Full-text

Commentary on “Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale”

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620956668 ◽

2020 ◽

pp. 107699862095666

Author(s):

Alina A. von Davier

Keyword(s):

School District ◽

Test Score ◽

Applied Research ◽

Aggregate Level ◽

Validation Methods ◽

Common Scale ◽

Scale Linking ◽

Score Distributions

In this commentary, I share my perspective on the goals of assessments in general, on linking assessments that were developed according to different specifications and for different purposes, and I propose several considerations for the authors and the readers. This brief commentary is structured around three perspectives (1) the context of this research, (2) the methodology proposed here, and (3) the consequences for applied research.

Download Full-text

Aggregate-Level Test-Scale Linking: A New Solution for an Old Problem?

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620960089 ◽

2020 ◽

pp. 107699862096008

Author(s):

Tim Moses ◽

Neil J. Dorans

Keyword(s):

Data Structures ◽

Test Scores ◽

District Level ◽

State Assessments ◽

Aggregate Level ◽

Educational Progress ◽

National Assessment ◽

Validation Methods ◽

Scale Linking ◽

Level Scale

The Reardon, Kalogrides, and Ho article on validation methods for aggregate-level test scale linking is an attempt to validate a district-level scale aligning procedure that appears to be a new solution to an old problem. Their aligning procedure uses the National Assessment of Educational Progress (NAEP) scale to piece together a patchwork of data structures from different tests of different constructs obtained under different administration conditions and used in different ways by different states. In this article, we critique their linking and validation efforts. Our critique has three components. First, we review the recommendations for linking state assessments to NAEP from several studies and commentaries to provide background from which to interpret Reardon et al.’s validation attempts. Second, we provide a replication of the Reardon et al. empirical validations of its proposed linking procedure to demonstrate that correlations between district means on two test scores can be high even when (1) the constructs being measured by the tests are different and (2) the district-level means estimated using the Reardon et al. linking approach can differ substantially from actual district-level means. Then, we suggest additional checks for construct similarity and subpopulation invariance from other concordance studies that could be used to assess whether the inferences made by Reardon et al. are warranted. Finally, until such checks are made, we urge cautious use of the results of the Reardon et al. results.

Download Full-text

Commentary on “Validation Methods for Aggregate-Level Test Scale Linking: A Case Study Mapping School District Test Score Distributions to a Common Scale”

Journal of Educational and Behavioral Statistics ◽

10.3102/1076998620949172 ◽

2020 ◽

pp. 107699862094917

Author(s):

Mark L. Davison

Keyword(s):

School District ◽

Test Score ◽

Full Range ◽

Limited Range ◽

Aggregate Level ◽

Validity Data ◽

Level Data ◽

Common Scale ◽

Scale Linking

This paper begins by setting the linking methods of Reardon, Kalogrides, and Ho in the broader literature on linking. Trends in the validity data suggest that there may be a conditional bias in the estimates of district means, but the data in the article are not conclusive on this point. Further, the data used in their case study might support the validity of the methods only over a limited range of the ability continuum. Applications of the method are then discussed. Contrary to the title, the application of the linking results is not limited to aggregate-level data. Because the potential application is so broad, further research is needed on issues such as the possibility of conditional bias and the validity of estimates over the full range of possible values. Validity is not a dichotomous concept where validity exists or it does not. The evidence reported by Reardon et al. provides substantial, but incomplete, support for the validity of the linked measures in this case study.

Download Full-text

National Assessment of Title I Shows Student Progress On State Assessments; Little Progress on NAEP

PsycEXTRA Dataset ◽

10.1037/e433162005-001 ◽

2001 ◽

Author(s):

Keyword(s):

Title I ◽

State Assessments ◽

National Assessment ◽

Student Progress

Download Full-text

A Preregistered Validation Study of the Taylor Aggression Paradigm

10.31234/osf.io/8jgca ◽

2017 ◽

Cited By ~ 1

Author(s):

David Skylan Chester

Keyword(s):

Aggressive Behavior ◽

Validation Study ◽

Real World ◽

External Validity ◽

State Level ◽

Taylor Aggression Paradigm ◽

Psychological Science ◽

Laboratory Measure ◽

Behavioral Tasks ◽

Preliminary Support

The Taylor Aggression Paradigm (TAP) is a frequently-used laboratory measure of aggressive behavior. However, the flexibility inherent in its implementation and analysis can undermine its validity. To test whether the TAP was a valid aggression measure irrespective of this flexibility, I conducted a preregistered study of a 25-trial version of the TAP using a single scoring approach with 160 diverse undergraduate participants. TAP scores showed agreement with other laboratory aggression measures and were magnified by an experimental provocation manipulation. Mixed evidence was found for associations with aggressive dispositions and real-world violence. These results provide preliminary support for this approach to the TAP to measure state-level aggressive behavior. However, more evidence is needed to assess the TAP’s external validity and ability to measure dispositional forms of aggression. Using preregistered designs, researchers should validate specific variants of their behavioral tasks in order to optimize the veridicality and reproducibility of psychological science.

Download Full-text

Analytical and numerical assessment of a preliminary support design – a case study

Cogent Engineering ◽

10.1080/23311916.2020.1869367 ◽

2021 ◽

Vol 8 (1) ◽

pp. 1869367

Author(s):

Sylvanus Sebbeh-Newton ◽

Shaib Abdulazeez Shehu ◽

Prosper Ayawah ◽

Azupuri A. Kaba ◽

Hareyani Zabidi

Keyword(s):

Support Design ◽

Numerical Assessment ◽

Preliminary Support

Download Full-text

Local law enforcement terrorism prevention efforts: A state level case study

Journal of Criminal Justice ◽

10.1016/j.jcrimjus.2007.03.007 ◽

2007 ◽

Vol 35 (3) ◽

pp. 313-321 ◽

Cited By ~ 26

Author(s):

William V. Pelfrey

Keyword(s):

Law Enforcement ◽

State Level ◽

Local Law Enforcement ◽

Local Law ◽

Terrorism Prevention

Download Full-text

National assessment of arsenic within groundwater: A case study with Ireland

Arsenic in the Environment - Proceedings - Arsenic Research and Global Sustainability ◽

10.1201/b20466-16 ◽

2016 ◽

pp. 33-34

Author(s):

L Morrison ◽

E McGrory ◽

C Brown

Keyword(s):

National Assessment

Download Full-text

Cardiovascular Risk Detection Through Big Data Analysis

International Journal of Big Data and Analytics in Healthcare ◽

10.4018/ijbdah.2020070101 ◽

2020 ◽

Vol 5 (2) ◽

pp. 1-11

Author(s):

Miguel A. Sánchez-Acevedo ◽

Zaydi Anaí Acosta-Chi ◽

Ma. del Rocío Morales-Salgado

Keyword(s):

Nearest Neighbor ◽

State Level ◽

Mortality Data ◽

K Nearest Neighbor ◽

Web Tool ◽

Health And Nutrition ◽

Risk Detection ◽

Unhealthy Diet ◽

The Cost

Cardiovascular diseases are the main cause of mortality in the world. As more people suffer from diabetes and hypertension, the risk of cardiovascular disease (CVD) increases. A sedentary lifestyle, an unhealthy diet, and stressful activities are behaviors that can be changed to prevent CVD. Taking measures to prevent CVD lowers the cost of treatments and reduces mortality. Data-driven plans generate more effective results and can be applied to groups with similar characteristics. Currently, there are several databases that can be used to extract information in real time and improve decision making. This article proposes a methodology for the detection of CVD and a web tool to analyze the data more effectively. The methodology for extracting, describing, and visualizing data from a state-level case study of CVD in Mexico is presented. The data is obtained from the databases of the National Institute of Statistics and Geography (INEGI) and the National Survey of Health and Nutrition (ENSANUT). A k-nearest neighbor (KNN) algorithm is proposed to predict missing data.

Download Full-text