Multigroup CFA and alignment approaches for testing measurement invariance and factor score estimation: Illustration with the schoolwork-related anxiety survey across countries and gender

Methodology ◽  
2021 ◽  
Vol 17 (1) ◽  
pp. 22-38
Author(s):  
Jason C. Immekus

Within large-scale international studies, the utility of survey scores to yield meaningful comparative data hinges on the degree to which their item parameters demonstrate measurement invariance (MI) across compared groups (e.g., culture). To date, methodological challenges have restricted the ability to test the measurement invariance of item parameters of these instruments in the presence of many groups (e.g., countries). This study compares multigroup confirmatory factor analysis (MGCFA) and the alignment method to investigate the MI of the schoolwork-related anxiety survey across gender groups within the 35 Organisation for Economic Co-operation and Development (OECD) countries (gender × country) of the Programme for International Student Assessment 2015 study. Subsequently, the predictive validity of MGCFA and alignment-based factor scores for subsequent mathematics achievement is examined. Considerations related to invariance testing of noncognitive instruments with many groups are discussed.

2010 ◽  
Vol 106 (1) ◽  
pp. 49-53 ◽  
Author(s):  
Robert M. Capraro ◽  
Mary Margaret Capraro ◽  
Z. Ebrar Yetkiner ◽  
Serkan Özel ◽  
Hae Gyu Kim ◽  
...  

This study extends the scope of international comparisons examining students' conceptions of the equal sign. Specifically, Korean (n = 193) and Turkish (n = 334) Grade 6 students were examined to assess whether their conceptions and responses were similar to prior findings published for Chinese and U.S. students and to hypothesize relationships about problem types and conceptual understanding of the equal sign. About 59.6% of the Korean participants correctly answered all items providing conceptually accurate solutions, as compared to 28.4% of the Turkish sample. Comparison with previous studies in China and the USA indicated that the Chinese sample outperformed those from other nations, followed by Korea, Turkey, and the USA. In large-scale international studies such as the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA), students from China and Korea have been among the high achievers.


2020 ◽  
Vol 84 (1) ◽  
pp. 109-133
Author(s):  
Nurullah Erylmaz ◽  
Mauricio Rivera-Gutiérrez ◽  
Andrés Sandoval-Hernández

It has been claimed that the socio-economic background scales of International Large-Scale Assessments (ILSAs) lack theory-driven constructs and cross-country comparability. To address these issues, a new socio-economic background scale was first created based on Pierre Bourdieu’s cultural reproduction theory, which distinguishes economic, cultural and social capital. Second, measurement invariance of this construct was tested across countries participating in the Programme for International Student Assessment (PISA). After dividing the countries which participated in PISA 2015 into three groups, i.e., Latin American, European, and Asian, a Multi-Group Confirmatory Factor Analysis was carried out in order to examine the measurement invariance of this new socio-economic scale. The results of this study revealed that this questionnaire, which measures socio-economic background, was not fully invariant in the analysis involving all countries. However, when analysing more homogenous groups, measurement invariance was verified at the metric level, except for the group of Latin American countries. Further, implications for policymakers and recommendations for future studies are discussed.


2017 ◽  
Vol 43 (3) ◽  
pp. 241-250 ◽  
Author(s):  
Janine Buchholz ◽  
Johannes Hartig

Questionnaires for the assessment of attitudes and other psychological traits are crucial in educational and psychological research, and item response theory (IRT) has become a viable tool for scaling such data. Many international large-scale assessments aim at comparing these constructs across countries, and the invariance of measures across countries is thus required. In its most recent cycle, the Programme for International Student Assessment (PISA 2015) implemented an innovative approach for testing the invariance of IRT-scaled constructs in the context questionnaires administered to students, parents, school principals, and teachers. On the basis of a concurrent calibration with equal item parameters across all groups (i.e., languages within countries), a group-specific item-fit statistic (root mean square deviance [RMSD]) was used as a measure for the invariance of item parameters for individual groups. The present simulation study examines the statistic’s distribution under different types and extents of (non)invariance in polytomous items. Responses to five 4-point Likert-type items were generated under the generalized partial credit model (GPCM) for 1,000 simulees in 50 groups each. For one of the five items, either location or discrimination parameters were drawn from a normal distribution. In addition to the type of noninvariance, the extent of noninvariance was varied by manipulating the variation of these distributions. The results indicate that the RMSD statistic is better at detecting noninvariance related to between-group differences in item location than in item discrimination. The study’s findings may be used as a starting point for sensitivity analyses that aim to define cutoff values for determining (non)invariance.
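The data-generating step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the discrimination value, the step locations, the standard deviation of the group-specific shifts, and the reading of the design as 1,000 simulees per group are all assumptions chosen for illustration. The sketch generates responses to a single 4-point item under the GPCM, with the item location shifted by a normally distributed amount in each of 50 groups.

```python
import numpy as np


def gpcm_probs(theta, a, b):
    """Category probabilities under the generalized partial credit model.

    theta: abilities, shape (n,); a: discrimination (scalar);
    b: step locations, shape (m,). Returns an (n, m + 1) array.
    """
    theta = np.asarray(theta, dtype=float)
    b = np.asarray(b, dtype=float)
    steps = a * (theta[:, None] - b[None, :])
    # Category 0 has an empty sum (logit 0); category k sums the first k steps.
    logits = np.concatenate(
        [np.zeros((theta.size, 1)), np.cumsum(steps, axis=1)], axis=1
    )
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def simulate_item(rng, n_groups=50, n_per_group=1000, a=1.0,
                  b=(-1.0, 0.0, 1.0), shift_sd=0.3):
    """Simulate one noninvariant item: each group gets its own location shift.

    All default values are hypothetical, not taken from the study.
    """
    b = np.asarray(b, dtype=float)
    shifts = rng.normal(0.0, shift_sd, size=n_groups)
    responses = np.empty((n_groups, n_per_group), dtype=int)
    for g in range(n_groups):
        theta = rng.standard_normal(n_per_group)
        p = gpcm_probs(theta, a, b + shifts[g])
        # Inverse-CDF sampling: count cumulative probabilities below u.
        u = rng.random(n_per_group)
        cats = (u[:, None] > np.cumsum(p, axis=1)).sum(axis=1)
        responses[g] = np.minimum(cats, b.size)  # guard against rounding
    return shifts, responses


rng = np.random.default_rng(0)
shifts, responses = simulate_item(rng)
```

Raising `shift_sd` corresponds to a larger extent of location noninvariance; drawing `a` per group instead of shifting `b` would give the discrimination condition.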


Methodology ◽  
2007 ◽  
Vol 3 (4) ◽  
pp. 149-159 ◽  
Author(s):  
Oliver Lüdtke ◽  
Alexander Robitzsch ◽  
Ulrich Trautwein ◽  
Frauke Kreuter ◽  
Jan Marten Ihme

Abstract. In large-scale educational assessments such as the Third International Mathematics and Science Study (TIMSS) or the Program for International Student Assessment (PISA), sizeable numbers of test administrators (TAs) are needed to conduct the assessment sessions in the participating schools. TA training sessions are run and administration manuals are compiled with the aim of ensuring standardized, comparable assessment situations in all student groups. To date, however, there has been no empirical investigation of the effectiveness of these standardizing efforts. In the present article, we probe for systematic TA effects on mathematics achievement and sample attrition in a student achievement study. Multilevel analyses for cross-classified data using Markov Chain Monte Carlo (MCMC) procedures were performed to separate the variance that can be attributed to differences between schools from the variance associated with TAs. After controlling for school effects, only a very small, nonsignificant proportion of the variance in mathematics scores and response behavior was attributable to the TAs (< 1%). We discuss practical implications of these findings for the deployment of TAs in educational assessments.


2021 ◽  
Vol 33 (1) ◽  
pp. 139-167
Author(s):  
Andrés Strello ◽  
Rolf Strietholt ◽  
Isa Steinmann ◽  
Charlotte Siepmann

Abstract. Research to date on the effects of between-school tracking on inequalities in achievement and on performance has been inconclusive. A possible explanation is that different studies used different data, focused on different domains, and employed different measures of inequality. To address this issue, we used all accumulated data collected in the three largest international assessments, namely PISA (Programme for International Student Assessment), PIRLS (Progress in International Reading Literacy Study), and TIMSS (Trends in International Mathematics and Science Study), in the past 20 years in 75 countries and regions. Following the seminal paper by Hanushek and Wößmann (2006), we combined data from a total of 21 cycles of primary and secondary school assessments to estimate difference-in-differences models for different outcome measures. We synthesized the effects using a meta-analytical approach and found strong evidence that tracking increased social achievement gaps, that it had smaller but still significant effects on dispersion inequalities, and that it had rather weak effects on educational inadequacies. In contrast, we did not find evidence that tracking increased performance levels. Besides these substantive findings, our study illustrated that the effect estimates varied considerably across the datasets used because the low number of countries as the units of analysis was a natural limitation. This finding casts doubt on the reproducibility of findings based on single international datasets and suggests that researchers should use different data sources to replicate analyses.


2020 ◽  
Vol 64 (3) ◽  
pp. 205-226
Author(s):  
John Ainley ◽  
Dan Cloney ◽  
Jessica Thompson

Declines in the scores of Australian 15-year-old students from the Programme for International Student Assessment are a matter of policy interest. Some of the declines may have resulted from shifts in the age-grade distributions of students in the Programme for International Student Assessment samples. We use multiple regression methods to model the student-level effects of grade for each Programme for International Student Assessment cycle allowing for the effects of student characteristics (e.g. socioeconomic background and gender) and jurisdiction. We estimate an average net effect of grade over the Programme for International Student Assessment cycles since 2006 as 42 scale points with no difference between reading and mathematics. We explore the extent to which differences between grades in achievement and changes in the grade distributions of students contributed to changes in average Programme for International Student Assessment achievement scores. We conclude that the relatively greater decline in Grade 11, compared to Grade 10 achievement, contributed to the overall decline and that shifts in distributions may have also contributed a little to those declines.


2019 ◽  
Vol 44 (6) ◽  
pp. 752-781
Author(s):  
Michael O. Martin ◽  
Ina V.S. Mullis

International large-scale assessments of student achievement such as International Association for the Evaluation of Educational Achievement’s Trends in International Mathematics and Science Study (TIMSS) and Progress in International Reading Literacy Study and Organization for Economic Cooperation and Development’s Program for International Student Assessment that have come to prominence over the past 25 years owe a great deal in methodological terms to pioneering work by National Assessment of Educational Progress (NAEP). Using TIMSS as an example, this article describes how a number of core techniques, such as matrix sampling, student population sampling, item response theory scaling with population modeling, and resampling methods for variance estimation, have been adapted and implemented in an international context and are fundamental to the international assessment effort. In addition to the methodological contributions of NAEP, this article illustrates how the large-scale international assessments go beyond measuring student achievement by representing important aspects of community, home, school, and classroom contexts in ways that can be used to address issues of importance to researchers and policymakers.


2020 ◽  
pp. 249-263
Author(s):  
Luisa Araújo ◽  
Patrícia Costa ◽  
Nuno Crato

Abstract. This chapter provides a short description of what the Programme for International Student Assessment (PISA) measures and how it measures it. First, it details the concepts associated with the measurement of student performance and the concepts associated with capturing student and school characteristics and explains how they compare with some other International Large-Scale Assessments (ILSA). Second, it provides information on the assessment of reading, the main domain in PISA 2018. Third, it provides information on the technical aspects of the measurements in PISA. Lastly, it offers specific examples of PISA 2018 cognitive items, corresponding domains (mathematics, science, and reading), and related performance levels.


2021 ◽  
Author(s):  
Alexander Robitzsch ◽  
Oliver Lüdtke

International large-scale assessments (LSAs) such as the Programme for International Student Assessment (PISA) provide important information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of these content domains offer policymakers important information for evaluating educational reforms and have received considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to critically reflect on the conceptual foundations of analytical choices in LSA studies. This article discusses methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. It is argued that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) the specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article's primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.

