Distorting Value Added: The Use of Longitudinal, Vertically Scaled Student Achievement Data for Growth-Based, Value-Added Accountability

2006 ◽  
Vol 31 (1) ◽  
pp. 35-62 ◽  
Author(s):  
Joseph A. Martineau

Longitudinal, student performance-based, value-added accountability models have become increasingly popular in recent years. Such models require student data to be vertically scaled across wide grade and developmental ranges so that the value added to student growth/achievement by teachers, schools, and districts can be modeled accurately. Many assessment companies provide such vertical scales and claim that those scales are adequate for longitudinal value-added modeling. However, psychometricians tend to agree that scales spanning wide grade/developmental ranges also span wide content ranges, and that scores cannot be considered exchangeable along the various portions of the scale. This shift in the constructs being measured from grade to grade jeopardizes the validity of inferences made from longitudinal value-added models. This study demonstrates mathematically that the use of such “construct-shifting” vertical scales in longitudinal, value-added models introduces remarkable distortions in the value-added estimates of the majority of educators. These distortions include (a) identification of effective teachers/schools as ineffective (and vice versa) simply because their students’ achievement is outside the developmental range measured well by “appropriate” grade-level tests, and (b) the attribution of prior teacher/school effects to later teachers/schools. Therefore, theories, models, policies, rewards, and sanctions based upon such value-added estimates are likely to be invalid because of distorted conclusions about educator effectiveness in eliciting student growth. This study identifies highly restrictive scenarios in which current value-added models can be validly applied in high-stakes and low-stakes research uses.
This article further identifies one use of student achievement data for growth-based, value-added modeling that is not plagued by the problems of construct shift: the assessment of an upper grade content (e.g., fourth grade) in both the grade below and the appropriate grade to obtain a measure of student gain on a grade-specific mix of constructs. Directions for future research on methods to alleviate the problems of construct shift are identified as well.
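The construct-shift mechanism described above can be illustrated with a toy simulation (not the article's model; all numbers are hypothetical): two latent constructs grow at identical rates for every student, but successive grade-level tests weight the constructs differently, so simple gain scores on the vertical scale confound true growth with where a student started on each construct.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two hypothetical latent constructs (e.g., computation and reasoning);
# every student gains exactly 1.0 on each, so "true" growth is identical.
theta_pre = rng.normal(0.0, 1.0, size=(n, 2))
theta_post = theta_pre + 1.0

# Construct shift: adjacent grade-level tests weight the constructs differently.
w_pre = np.array([0.8, 0.2])   # lower-grade test: mostly construct 1
w_post = np.array([0.2, 0.8])  # upper-grade test: mostly construct 2

score_pre = theta_pre @ w_pre
score_post = theta_post @ w_post
gain = score_post - score_pre  # "vertically scaled" gain score

# True growth is 1.0 for everyone, yet measured gains vary systematically
# with students' starting status on each construct.
print(round(gain.mean(), 2))                                # close to 1.0 on average
print(round(np.corrcoef(theta_pre[:, 0], gain)[0, 1], 2))   # strongly negative
```

Teachers whose students start high on the construct emphasized by the lower-grade test would look ineffective here, even though every student grew by the same amount, which is the kind of distortion the article formalizes.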

2013 ◽  
Vol 83 (2) ◽  
pp. 349-370 ◽  
Author(s):  
Kimberlee Callister Everson ◽  
Erika Feinauer ◽  
Richard Sudweeks

In this article, the authors provide a methodological critique of the current standard of value-added modeling forwarded in educational policy contexts as a means of measuring teacher effectiveness. Conventional value-added estimates of teacher quality are attempts to determine to what degree a teacher would theoretically contribute, on average, to the test score gains of any student in the accountability population (i.e., district or state). Everson, Feinauer, and Sudweeks suggest an alternative statistical methodology, propensity score matching, which allows estimation of how well a teacher performs relative to teachers assigned comparable classes of students. This approach more closely fits the appropriate role of an accountability system: to estimate how well employees perform in the job to which they are actually assigned. It also has the benefit of requiring fewer statistical assumptions—assumptions that are frequently violated in value-added modeling. The authors conclude that this alternative method allows for more appropriate and policy-relevant inferences about the performance of teachers.
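The authors' comparable-classes idea can be sketched with simulated data. This is a minimal stand-in, not their method: nearest-neighbor matching on a single class-level covariate substitutes for a fitted propensity score, and all quantities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers, k = 200, 10

# Hypothetical data: classes are non-randomly assigned (prior achievement
# varies by teacher), and class gains depend on prior achievement AND on a
# true teacher effect we would like to recover.
teacher_effect = rng.normal(0.0, 0.1, n_teachers)
class_prior = rng.normal(0.0, 0.5, n_teachers)  # class-mean prior score
class_gain = 0.3 * class_prior + teacher_effect + rng.normal(0.0, 0.05, n_teachers)

# Matching step (stands in for propensity score matching on class
# composition): compare each teacher with the k teachers whose classes look
# most similar on the prior-achievement covariate.
relative_effect = np.empty(n_teachers)
for i in range(n_teachers):
    dist = np.abs(class_prior - class_prior[i])
    peers = np.argsort(dist)[1:k + 1]  # k most comparable classes, excluding self
    relative_effect[i] = class_gain[i] - class_gain[peers].mean()

# Ranking teachers against peers with comparable classes tracks the true
# effect much better than raw class gains do.
print(round(np.corrcoef(relative_effect, teacher_effect)[0, 1], 2))
print(round(np.corrcoef(class_gain, teacher_effect)[0, 1], 2))
```

The design choice mirrors the abstract's argument: the estimand is performance relative to teachers assigned comparable classes, not a counterfactual contribution to an arbitrary student in the accountability population.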


2011 ◽  
Vol 6 (1) ◽  
pp. 18-42 ◽  
Author(s):  
Cory Koedel ◽  
Julian R. Betts

Value-added modeling continues to gain traction as a tool for measuring teacher performance. However, recent research questions the validity of the value-added approach by showing that it does not mitigate student-teacher sorting bias (its presumed primary benefit). Our study explores this critique in more detail. Although we find that estimated teacher effects from some value-added models are severely biased, we also show that a sufficiently complex value-added model that evaluates teachers over multiple years reduces the sorting bias problem to statistical insignificance. One implication of our findings is that data from the first year or two of classroom teaching for novice teachers may be insufficient to make reliable judgments about quality. Overall, our results suggest that in some cases value-added modeling will continue to provide useful information about the effectiveness of educational inputs.
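The sorting-bias problem, and why conditioning on prior achievement mitigates it, can be shown in a small simulation. This is an illustrative sketch under simulated data, not the authors' specification (their preferred models are richer and use multiple years of scores).

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, class_size = 50, 30

# Hypothetical sorting: some teachers systematically receive higher-prior
# students, so raw class means confound sorting with teacher quality.
teacher_effect = rng.normal(0.0, 0.1, n_teachers)
sorting = rng.normal(0.0, 0.5, n_teachers)  # class-level prior-achievement tilt

teacher_id = np.repeat(np.arange(n_teachers), class_size)
prior = sorting[teacher_id] + rng.normal(0.0, 1.0, len(teacher_id))
score = 0.6 * prior + teacher_effect[teacher_id] + rng.normal(0.0, 0.3, len(prior))

# Value-added regression: current score on prior score plus teacher dummies.
dummies = (teacher_id[:, None] == np.arange(n_teachers)).astype(float)
X = np.column_stack([prior, dummies])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
vam_est = coef[1:]  # coefficients on teacher dummies = value-added estimates

# Naive alternative: raw class-mean scores, which absorb the sorting.
naive = np.array([score[teacher_id == t].mean() for t in range(n_teachers)])

print(round(np.corrcoef(vam_est, teacher_effect)[0, 1], 2))   # high
print(round(np.corrcoef(naive, teacher_effect)[0, 1], 2))     # much lower
```

Here a single prior-score control removes the simulated sorting; the article's point is that real sorting is on unobservables too, so richer models and multiple years of data are needed before the residual bias becomes negligible.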


2015 ◽  
Vol 23 ◽  
pp. 67 ◽  
Author(s):  
Maria Castro-Morera ◽  
Adán Moisés García-Medina ◽  
Luis Horacio Pedroza-Zuñiga ◽  
Joaquín Caso-Niebla

Contextual value-added models have served as a tool for developing evaluation systems based on accountability. However, studying schools' contributions after controlling for background characteristics related to the socio-cultural level of students and schools (technically, the residuals) can also serve to identify and study the good educational practices of centers that promote student performance beyond what their starting socio-cultural conditions would predict. The aim of this article is to differentially describe the characteristics of such secondary schools in the state of Baja California (Mexico). To do so, we used information from the instruments of the Estrategia Evaluativa Integral for 2010 and 2011. The results show that schools whose students perform beyond expectations are characterized by higher levels of democratic coexistence, sense of belonging, self-reliance, and school expectations among students; likewise, their mathematics teachers give students more support when they have doubts and offer richer feedback. Surprisingly, high and low value-added schools do not differ in the type of activities that teachers carry out with low-achieving students. This article is a good illustration of the many uses of contextual value-added models.


2017 ◽  
Vol 11 (3) ◽  
pp. 10 ◽  
Author(s):  
Kirsti Klette ◽  
Marte Blikstad-Balas ◽  
Astrid Roe

Educational research into instructional quality would benefit from macro- and meso-level instructional data – such as achievement data or large-scale student surveys – in relation to data from the micro level – such as detailed analyses of classroom practices. Several scholars have specifically asked for studies that correlate achievement data with records of learning processes and teaching strategies, and ongoing projects attempting to do so have shown promising results. Linking different data sources on instructional quality is quite demanding because it requires a concerted effort by researchers from different fields of expertise and different traditions. A main ambition of our ongoing research project is precisely to advance such integration. As the title of the project reveals, we are dedicated to Linking Instruction and Student Achievement (LISA). In this article, we start by providing a theoretical background and status of knowledge related to instructional quality. We go on to argue that video data have shown particular promise in studies aiming to obtain systematic data from a range of classrooms in order to compare classroom practices. We then present the three components of the LISA project’s design – student perception surveys, systematic classroom observation, and achievement gains on national tests – and the value of combining these three data sources. Finally, we outline some of our findings thus far and point to future research possibilities.
Key words: instructional quality; classroom practices; video studies; mathematics; language arts


Author(s):  
David Daniel Meyer ◽  
Loredana Werth

This quantitative study examines the correlation between international student achievement test outcomes and national competitiveness rankings. Student achievement data are derived from a variation-adjusted, common metric data set for 74 countries that have participated in any of the international mathematics and science achievement tests since 1964. National competitiveness data are taken from the 2014-15 Global Competitiveness Index (GCI) published by the World Economic Forum. A Spearman’s rank-order correlation was run to assess the relationship between student performance on international achievement tests and the competitiveness of nations. For all nations, there was a moderate positive correlation between student performance on international achievement tests and the competitiveness of a nation, rs(98)=0.688, p
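Spearman's rank-order correlation, the statistic reported above, is simply the Pearson correlation of the two variables' ranks. A minimal sketch with hypothetical country values (the real study used 74 countries and GCI scores; these eight rows are invented for illustration):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks.

    Ranks are 0-based via double argsort; ties are not handled here.
    """
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical achievement scores and GCI-style competitiveness values:
achievement = np.array([500, 420, 560, 480, 390, 530, 450, 510])
competitiveness = np.array([4.8, 4.1, 5.3, 4.6, 3.9, 5.0, 4.5, 4.7])

rho = spearman_rho(achievement, competitiveness)
print(round(rho, 2))  # strong positive monotonic association
```

Because it operates on ranks, Spearman's rho is appropriate when, as here, the two measures (test scores and index values) are on incommensurable scales and only a monotonic association is claimed.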


2013 ◽  
Vol 115 (12) ◽  
pp. 1-35 ◽  
Author(s):  
Stuart S. Yeh

Background: In principle, value-added modeling (VAM) might be justified if it can be shown to be a more reliable indicator of teacher quality than existing indicators for existing low-stakes decisions that are already being made, such as the award of small merit bonuses. However, a growing number of researchers now advocate the use of VAM to identify and replace large numbers of low-performing teachers. These proposals need to be evaluated because the active termination of large numbers of teachers based on VAM requires a much higher standard of reliability and validity. Furthermore, they must be evaluated to determine whether they are cost-effective compared with alternative proposals for raising student achievement. While VAM might be justified as a replacement for existing indicators (for existing decisions regarding merit compensation), it might not meet the higher standard of reliability and validity required for large-scale teacher termination, and it may not be the most cost-effective approach for raising student achievement. If society devotes its resources to approaches that are not cost-effective, the increase in achievement per dollar expended will remain low, inhibiting reduction of the achievement gap.

Objective: This article reviews the literature on the reliability and validity of VAM, then focuses on evaluating a proposal by Chetty, Friedman, and Rockoff to use VAM to identify and replace the lowest-performing 5% of teachers with average teachers. Chetty et al. estimate that implementing this proposal would increase the achievement and lifetime earnings of students, and their results appear likely to accelerate the adoption of VAM by school districts nationwide. The objective of the current article is to evaluate the Chetty et al. proposal and, more broadly, the strategy of raising student achievement by using VAM to identify and replace low-performing teachers.

Method: This article analyzes the assumptions of the Chetty et al. study and of similar VAM-based proposals to raise student achievement. This analysis establishes a basis for evaluating the Chetty et al. proposal and, in general, all VAM-based policies to raise achievement.

Conclusion: VAM is not reliable or valid, and VAM-based policies are not cost-effective for the purpose of raising student achievement and increasing earnings by terminating large numbers of low-performing teachers.


2014 ◽  
Vol 116 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Spyros Konstantopoulos

Background: In the last decade, the effects of teachers on student performance (typically measured with statewide standardized tests) have been re-examined using statistical models known as value-added models. These models aim to compute the unique contribution of teachers to student achievement gains from grade to grade, net of student background and prior ability. Value-added models are now widely used, and some states use them to rank teachers and to measure teacher performance or effectiveness (via student achievement gains), with the ultimate objective of rewarding or penalizing teachers. Such practices have generated considerable controversy in the education community about the role of value-added models in making important decisions about teachers, such as salary increases, promotion, or termination of employment.

Purpose: The purpose of this paper is to review the effects teachers have on student achievement, with an emphasis on value-added models. The paper also discusses whether value-added models are appropriately used as the sole indicator in evaluating teachers’ performance and making critical decisions about teachers’ futures in the profession.

Research Design: This is a narrative review of the literature on teacher effects that includes evidence about the stability of teacher effects estimated with value-added models.

Conclusions: More comprehensive systems for teacher evaluation are needed. We need more research on value-added models and more work on evaluating them, and the strengths and weaknesses of these models should be clearly described. We also need much more empirical evidence on the reliability and stability of value-added measures across states. The findings thus far do not seem robust and conclusive enough to warrant decisions about raises, tenure, or termination of employment. In other words, it is unclear that the value-added measures that inform the accountability system are adequate, and it is not obvious that we are better equipped now to make such important decisions about teachers than we were 35 years ago. Good et al. have argued that we need well-thought-out and well-developed criteria to guide accountability decisions. Perhaps such criteria should be standardized across school districts and states; that would ensure that empirical evidence across different states is comparable and would help determine whether findings converge or diverge.


2014 ◽  
Vol 116 (1) ◽  
pp. 1-32 ◽  
Author(s):  
Clarin Collins ◽  
Audrey Amrein-Beardsley

Background: Within the last few years, the focus of educational accountability has shifted from holding students responsible for their own performance to holding those shown to affect student performance responsible: students’ teachers. Encouraged and financially incentivized by federal programs, states are becoming ever more reliant on statistical models that measure students’ growth or value added and attribute such growth (or decline) to students’ teachers of record. As states continue to join the growth and value-added model movement, it is difficult to find inclusive resources documenting the types of models used and each state’s plans.

Objective: To capture state initiatives in this area, the researchers collected data from all 50 states and the District of Columbia to provide an inclusive national overview of growth and value-added models. The data include the types of growth or value-added models used in each state, the legislation behind each state’s reform efforts, the standardized tests used for growth or value-added calculations, and the strengths and weaknesses of each state’s models as described by state personnel.

Method: This article synthesizes qualitative and quantitative themes identified from data collected via multiple phone interviews and emails with state department of education personnel in charge of their own state’s initiatives, as well as from state websites. These data provide the most inclusive and up-to-date resource on national growth and value-added data usage, noting, however, that this picture is changing rapidly across the nation as policies and legislation are adjusted.

Conclusions: Findings from this study provide a one-stop resource on what each state has in place or in development regarding growth or value-added models as a key component of its teacher evaluation system. Despite widespread use, however, not one state has yet articulated a plan for formative data use by teachers; federal and state leaders seem to assume that implementing growth and value-added models leads automatically to data use by teachers. In addition, state representatives expressed concern that the current growth and value-added models can be applied only to math and English/language arts teachers with state standardized assessments (approximately 30% of all teachers). While some believe the implementation of the Common Core State Standards and its associated tests will help alleviate such fairness issues, more research is needed on (the lack of) fairness and formative use associated with growth and value-added models.


2017 ◽  
Vol 119 (6) ◽  
pp. 1-42 ◽  
Author(s):  
Stuart S. Yeh

Background: Value-added modeling (VAM) has been used to rank teachers and assess teacher and school quality. The apparent relationship between value-added teacher rankings and gains in student performance provides a foundation for the view that the contribution of teachers is the largest factor influencing student achievement, suggesting that differences in teacher quality might explain the persistence of the achievement gap as students advance through the K–12 years. However, several studies raise questions about the reliability and validity of VAM.

Purpose: The purpose of this article is to reconcile the evidence that the contribution of teachers to student achievement is large with the evidence that value-added rankings are unreliable and possibly invalid.

Design: The method involves an analytical review of the available evidence, development of a theoretical explanation for the contradictory results, and a test of this explanation using path analysis with three longitudinal datasets involving nationally representative samples of schools and students.

Conclusion: The hypothesis that the contribution of teachers is the strongest factor influencing student achievement is not supported. A stronger factor is the degree to which students believe that they are proficient students. This is consistent with the view that the persistence of the achievement gap is better explained as the outcome of structural factors embedded in the conventional model of schooling that undermine the self-efficacy, engagement, effort, and achievement of students who enter kindergarten performing below the level of their more advantaged peers.

