TIMSS 2015: Illustrating Advancements in Large-Scale International Assessments

International large-scale assessments of student achievement that have come to prominence over the past 25 years, such as the International Association for the Evaluation of Educational Achievement’s Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and the Organization for Economic Cooperation and Development’s Program for International Student Assessment, owe a great deal in methodological terms to pioneering work by the National Assessment of Educational Progress (NAEP). Using TIMSS as an example, this article describes how a number of core techniques, such as matrix sampling, student population sampling, item response theory scaling with population modeling, and resampling methods for variance estimation, have been adapted and implemented in an international context and have become fundamental to the international assessment effort. In addition to the methodological contributions of NAEP, this article illustrates how large-scale international assessments go beyond measuring student achievement by representing important aspects of community, home, school, and classroom contexts in ways that can be used to address issues of importance to researchers and policymakers.
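
To make "resampling methods for variance estimation" concrete, the sketch below shows a plain delete-one jackknife for the sampling variance of a weighted mean. It is an illustration under stated assumptions only: the function names and the toy scores are hypothetical, and it does not reproduce the exact replication scheme used operationally in TIMSS or NAEP.

```python
def weighted_mean(values, weights):
    """Weighted mean of `values` using sampling weights `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights):
    """Delete-one jackknife estimate of the variance of the weighted mean.

    Assumption: each observation is treated as its own sampling unit.
    Operational assessments usually drop whole schools or sampling
    zones instead of single students.
    """
    n = len(values)
    replicates = []
    for i in range(n):
        # Recompute the statistic with the i-th unit left out.
        vals = values[:i] + values[i + 1:]
        wts = weights[:i] + weights[i + 1:]
        replicates.append(weighted_mean(vals, wts))
    rep_mean = sum(replicates) / n
    return (n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)


# Hypothetical toy data: four scores with equal weights.
scores = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 1.0, 1.0, 1.0]
variance = jackknife_variance(scores, weights)
```

With equal weights, this estimate coincides with the familiar sample variance divided by the sample size, which is a convenient sanity check for the sketch.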

Assessing global competence in PISA 2018: Challenges and approaches to capturing a complex construct

International Journal of Development Education and Global Learning ◽

10.18546/ijdegl.10.1.02 ◽

2018 ◽

Vol 10 (1) ◽

pp. 5-20 ◽

Cited By ~ 5

Author(s):

Christine Sälzer ◽

Nina Roczen

Keyword(s):

Student Achievement ◽

Comparative Studies ◽

Large Scale ◽

International Student ◽

Student Assessment ◽

Quality Standards ◽

Global Competence ◽

International Student Assessment ◽

Large Scale Assessments ◽

International Comparative

International large-scale assessments such as the Programme for International Student Assessment (PISA) yield comparative indicators of student achievement in various competence domains. This article focuses on global competence as a suggested cross-curricular domain for the PISA 2018 study. The measurement of global competence is related to a number of challenges, which are elaborated, described and discussed. As these challenges have so far not been sufficiently targeted, Germany, among several other countries, has decided not to assess global competence in the upcoming PISA cycle. In conclusion, propositions are made regarding viable options to capture global competence in international comparative studies so that established quality standards can be met.

Are There Test Administrator Effects in Large-Scale Educational Assessments?

Methodology ◽

10.1027/1614-2241.3.4.149 ◽

2007 ◽

Vol 3 (4) ◽

pp. 149-159 ◽

Cited By ~ 9

Author(s):

Oliver Lüdtke ◽

Alexander Robitzsch ◽

Ulrich Trautwein ◽

Frauke Kreuter ◽

Jan Marten Ihme

Keyword(s):

Large Scale ◽

International Student ◽

Student Assessment ◽

Student Groups ◽

Mathematics Scores ◽

International Student Assessment ◽

Sample Attrition ◽

Educational Assessments ◽

Test Administrator ◽

Practical Implications

In large-scale educational assessments such as the Third International Mathematics and Science Study (TIMSS) or the Program for International Student Assessment (PISA), sizeable numbers of test administrators (TAs) are needed to conduct the assessment sessions in the participating schools. TA training sessions are run and administration manuals are compiled with the aim of ensuring standardized, comparable assessment situations in all student groups. To date, however, there has been no empirical investigation of the effectiveness of these standardizing efforts. In the present article, we probe for systematic TA effects on mathematics achievement and sample attrition in a student achievement study. Multilevel analyses for cross-classified data using Markov Chain Monte Carlo (MCMC) procedures were performed to separate the variance that can be attributed to differences between schools from the variance associated with TAs. After controlling for school effects, only a very small, nonsignificant proportion of the variance in mathematics scores and response behavior was attributable to the TAs (< 1%). We discuss practical implications of these findings for the deployment of TAs in educational assessments.

Early tracking and different types of inequalities in achievement: difference-in-differences evidence from 20 years of large-scale assessments

Educational Assessment Evaluation and Accountability ◽

10.1007/s11092-020-09346-4 ◽

2021 ◽

Vol 33 (1) ◽

pp. 139-167

Author(s):

Andrés Strello ◽

Rolf Strietholt ◽

Isa Steinmann ◽

Charlotte Siepmann

Keyword(s):

Large Scale ◽

International Student ◽

Student Assessment ◽

Difference In Differences ◽

Seminal Paper ◽

International Student Assessment ◽

Units Of Analysis ◽

Social Achievement ◽

Mathematics And Science ◽

Combined Data

Research to date on the effects of between-school tracking on inequalities in achievement and on performance has been inconclusive. A possible explanation is that different studies used different data, focused on different domains, and employed different measures of inequality. To address this issue, we used all accumulated data collected over the past 20 years in 75 countries and regions by the three largest international assessments: PISA (Programme for International Student Assessment), PIRLS (Progress in International Reading Literacy Study), and TIMSS (Trends in International Mathematics and Science Study). Following the seminal paper by Hanushek and Wößmann (2006), we combined data from a total of 21 cycles of primary and secondary school assessments to estimate difference-in-differences models for different outcome measures. We synthesized the effects using a meta-analytical approach and found strong evidence that tracking increased social achievement gaps, that it had smaller but still significant effects on dispersion inequalities, and that it had rather weak effects on educational inadequacies. In contrast, we did not find evidence that tracking increased performance levels. Besides these substantive findings, our study illustrated that the effect estimates varied considerably across the datasets used because the low number of countries as the units of analysis was a natural limitation. This finding casts doubt on the reproducibility of findings based on single international datasets and suggests that researchers should use different data sources to replicate analyses.
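
The difference-in-differences logic named in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' model: it assumes one inequality measure per country at primary and at secondary level, and every number and name in it is made up for demonstration.

```python
from statistics import mean


def did_estimate(rows):
    """Simple difference-in-differences estimate.

    `rows` is a list of (tracked, primary_outcome, secondary_outcome)
    tuples, one per country. The estimate is the mean primary-to-secondary
    change in tracked countries minus the same change in untracked ones.
    """
    change_tracked = [sec - pri for tracked, pri, sec in rows if tracked]
    change_untracked = [sec - pri for tracked, pri, sec in rows if not tracked]
    return mean(change_tracked) - mean(change_untracked)


# Hypothetical inequality measures (not real assessment data).
countries = [
    (True, 80.0, 95.0),   # early tracking: gap widens by 15
    (True, 70.0, 80.0),   # early tracking: gap widens by 10
    (False, 75.0, 78.0),  # no early tracking: gap widens by 3
    (False, 85.0, 86.0),  # no early tracking: gap widens by 1
]
effect = did_estimate(countries)
```

Differencing within countries first removes stable country-level differences in the outcome, so only the extra change observed in tracked systems is attributed to tracking. A published analysis would add standard errors and pool estimates across cycles.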

Principals and Student Achievement

Maximizing Social Science Research Through Publicly Accessible Data Sets - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-5225-3616-1.ch005 ◽

2018 ◽

pp. 84-117 ◽

Cited By ~ 1

Author(s):

S. Marshall Perry ◽

Karen M. Sealy ◽

Héctor X. Ramírez-Pérez ◽

Thomas C. DeNicola ◽

Yair Cohen

Keyword(s):

Academic Achievement ◽

Student Achievement ◽

Instructional Leadership ◽

Teaching And Learning ◽

International Student ◽

Student Assessment ◽

School Context ◽

School Level ◽

International Student Assessment ◽

Context Specific

This paper examines connections between principal leadership activities, school context, and student achievement. Data for this quantitative study are from the 2013 Teaching and Learning International Survey (TALIS) and the 2012 Programme for International Student Assessment (PISA). The eight countries examined participated in both TALIS and PISA, and the researchers merged the datasets, yielding a study sample of 1,301 schools. This paper supports a context-specific view of instructional leadership. Looking across countries, the researchers found that different practices were more strongly associated with the academic achievement of students, and they suggest that school leaders have a meaningful overall relationship with academic achievement, both directly and indirectly. This study therefore supports prior research about the direct and indirect effects of instructional leadership. Further study that accounts for differences in family academic resources and school-level opportunities to learn will better illuminate the connection between instructional leadership practices and academic achievement.

Assessment Background: What PISA Measures and How

Improving a Country’s Education ◽

10.1007/978-3-030-59031-4_12 ◽

2020 ◽

pp. 249-263

Author(s):

Luisa Araújo ◽

Patrícia Costa ◽

Nuno Crato

Keyword(s):

Student Performance ◽

Large Scale ◽

International Student ◽

Student Assessment ◽

Short Description ◽

School Characteristics ◽

Technical Aspects ◽

Performance Levels ◽

International Student Assessment ◽

Large Scale Assessments

This chapter provides a short description of what the Programme for International Student Assessment (PISA) measures and how it measures it. First, it details the concepts associated with the measurement of student performance and the concepts associated with capturing student and school characteristics and explains how they compare with some other International Large-Scale Assessments (ILSA). Second, it provides information on the assessment of reading, the main domain in PISA 2018. Third, it provides information on the technical aspects of the measurements in PISA. Lastly, it offers specific examples of PISA 2018 cognitive items, corresponding domains (mathematics, science, and reading), and related performance levels.

Reflections on Analytical Choices in the Scaling Model for Test Scores in International Large-Scale Assessment Studies

10.31234/osf.io/pkjth ◽

2021 ◽

Author(s):

Alexander Robitzsch ◽

Oliver Lüdtke

Keyword(s):

Large Scale ◽

International Student ◽

Student Assessment ◽

Individual Student ◽

Scaling Model ◽

Large Scale Assessment ◽

International Student Assessment ◽

Wide Range ◽

Analytical Strategies ◽

The Individual

International large-scale assessments (LSAs) such as the Programme for International Student Assessment (PISA) provide important information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of these content domains offer policymakers important information for evaluating educational reforms and receive considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to reflect critically on the conceptual foundations of analytical choices in LSA studies. This article discusses methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. It is argued that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) the specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article's primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.

Multigroup CFA and alignment approaches for testing measurement invariance and factor score estimation: Illustration with the schoolwork-related anxiety survey across countries and gender

Methodology ◽

10.5964/meth.2281 ◽

2021 ◽

Vol 17 (1) ◽

pp. 22-38

Author(s):

Jason C. Immekus

Keyword(s):

Measurement Invariance ◽

Large Scale ◽

International Student ◽

Student Assessment ◽

International Studies ◽

Invariance Testing ◽

International Student Assessment ◽

Item Parameters ◽

And Gender ◽

Gender Groups

Within large-scale international studies, the utility of survey scores to yield meaningful comparative data hinges on the degree to which their item parameters demonstrate measurement invariance (MI) across compared groups (e.g., culture). To date, methodological challenges have restricted the ability to test the measurement invariance of item parameters of these instruments in the presence of many groups (e.g., countries). This study compares multigroup confirmatory factor analysis (MGCFA) and the alignment method to investigate the MI of the schoolwork-related anxiety survey across gender groups within the 35 Organisation for Economic Co-operation and Development (OECD) countries (gender × country) of the Programme for International Student Assessment 2015 study. Subsequently, the predictive validity of MGCFA and alignment-based factor scores for subsequent mathematics achievement is examined. Considerations related to invariance testing of noncognitive instruments with many groups are discussed.

Resultados brasileiros no PISA e seus (des)usos (Brazilian results in PISA and its (mis)uses)

Estudos em Avaliação Educacional ◽

10.18222/eae.v28i68.4553 ◽

2017 ◽

Vol 28 (68) ◽

pp. 344 ◽

Cited By ~ 1

Author(s):

Maria de Lourdes Haywanon Santos Araújo ◽

Robinson Moreira Tenório

Keyword(s):

Large Scale ◽

International Student ◽

Student Assessment ◽

Educational Assessment ◽

Educational Context ◽

Structured Interviews ◽

Fundamental Factor ◽

International Student Assessment ◽

Documentary Analysis ◽

Palabras Clave

The objective of this study was to analyze how the results of the Program for International Student Assessment (PISA) were used in the Brazilian educational context. The literature review identified assessment as a fundamental factor in improving the quality of education, provided an overview of PISA research in Brazil, and prompted discussion of the need to use the results of large-scale assessments. Based on documentary analysis and semi-structured interviews, it was possible not only to present a study on the use of the PISA results in the country but also to establish categories of use, such as Improper Use or Non-Use, showing the possibilities and difficulties of such use and the role of administrators in this process. Keywords: PISA; Use of Results; Educational Assessment; Public Policies.

Beyond the Competencies Agenda in Large-Scale International Assessments: A Confucian Alternative

Philosophical Inquiry in Education ◽

10.7202/1071418ar ◽

2020 ◽

Vol 26 (1) ◽

pp. 20-32 ◽

Cited By ~ 1

Author(s):

Charlene Tan

Keyword(s):

Large Scale ◽

International Student ◽

Student Assessment ◽

Social Practices ◽

Relational Model ◽

Moral Cultivation ◽

Personal Mastery ◽

International Student Assessment ◽

The Social ◽

Technical Rationality

This article examines a Confucian conception of competence and its corresponding response to the competencies agenda that underpins international large-scale assessments such as the Programme for International Student Assessment (PISA). It is argued that standardised transnational assessments are underpinned by a technical rationality that emphasises proficiency in discrete skills for their instrumental worth at the expense of moral cultivation and personal mastery. Challenging the competencies agenda, this paper draws upon a relational model of competence proposed by Jones and Moore (1995) that views competence as essentially communal, situated within social practices, and manifested through tacit achievement. A Confucian notion of competence is advocated where skills are premised on the virtue of ren (humanity) and demonstrated through appropriate judgement in everyday settings. A Confucian perspective offers an alternative to the behaviourist and generic notions of performance in global assessments by highlighting the social, cultural and ethical dimensions of competence.
