On the Treatment of Missing Item Responses in Educational Large-Scale Assessment Data: An Illustrative Simulation Study and a Case Study Using PISA 2018 Mathematics Data

2021
Vol 11 (4)
pp. 1653-1687
Author(s):  
Alexander Robitzsch

Missing item responses are prevalent in educational large-scale assessment studies such as the Programme for International Student Assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians have advocated a model-based treatment based on the latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model, which characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item itself. The scoring of missing item responses as wrong and the latent ignorable model are both submodels of the Mislevy-Wu model. In an illustrative simulation study, it is shown that the Mislevy-Wu model provides unbiased model parameter estimates. The simulation also replicates the finding, reported in various simulation studies in the literature, that scoring missing item responses as wrong yields biased estimates if latent ignorability holds in the data-generating model. Conversely, if missing item responses can only arise from incorrect item responses, applying an item response model that relies on latent ignorability results in biased estimates. The Mislevy-Wu model guarantees unbiased parameter estimates whenever this more general model holds in the data-generating process. In addition, this article uses the PISA 2018 mathematics dataset as a case study to investigate the consequences of different missing data treatments on country means and country standard deviations. The obtained country means and standard deviations can differ substantially across scaling models. In contrast to previous statements in the literature, scoring missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item on the item itself, after conditioning on the latent response propensity, was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be rejected for two reasons. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, in the discussion section, we argue that model fit should play only a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that can simply be manipulated by countries (and, hence, their students) result in unfair country comparisons.
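To make the missingness mechanisms discussed above concrete, the following sketch simulates item responses and response indicators under a Mislevy-Wu-type mechanism. It is an illustration only, not the scaling code used in the article: the Rasch-type response model, the logistic parameterization, and the parameter names (in particular delta, which lets the probability of omission depend on the item response itself) are assumptions made here; setting delta to zero recovers the latent ignorable special case.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_persons, n_items = 2000, 20
theta = rng.normal(0, 1, n_persons)              # latent ability
xi = 0.5 * theta + rng.normal(0, 1, n_persons)   # latent response propensity
b = rng.normal(0, 1, n_items)                    # item difficulties

# Rasch-type item responses (1 = correct).
X = (rng.random((n_persons, n_items)) <
     sigmoid(theta[:, None] - b[None, :])).astype(int)

# Mislevy-Wu-type missingness: the probability of observing a response depends
# on the response propensity xi AND, through delta, on the (possibly
# unobserved) item response X itself. delta = 0 is the latent ignorable case;
# a large positive delta means correct responses are almost always observed,
# so omissions arise mostly from incorrect answers.
beta, delta = -1.0, 2.0
R = (rng.random((n_persons, n_items)) <
     sigmoid(beta + xi[:, None] + delta * X)).astype(int)   # 1 = observed

X_obs = np.where(R == 1, X, np.nan)              # observed responses with omissions
print(f"share of missing item responses: {np.isnan(X_obs).mean():.2f}")
```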



Land
2021
Vol 10 (3)
pp. 295
Author(s):
Yuan Gao
Anyu Zhang
Yaojie Yue
Jing’ai Wang
Peng Su

Suitable land is an important prerequisite for crop cultivation and, given the prospect of climate change, it is essential to assess such suitability to minimize crop production risks and to ensure food security. Although a variety of methods to assess suitability are available, a comprehensive, objective, and large-scale screening of the environmental variables that influence the results of these methods, and therefore their accuracy, has rarely been explored. An approach to selecting such variables is proposed, and criteria are established for a big-data-based, large-scale assessment of land suitability, with maize (Zea mays L.) cultivation as a case study. The predicted suitability matched the past distribution of maize with an overall accuracy of 79% and a Kappa coefficient of 0.72. Land suitability for maize is likely to decrease markedly at low latitudes and even at mid latitudes. The total area suitable for maize, globally and in most major maize-producing countries, will decrease, and the decrease will be particularly steep in the regions currently best suited for maize. Compared with earlier research, the method proposed here is simple yet objective, comprehensive, and reliable for large-scale assessment. The findings highlight the necessity of adopting relevant strategies to cope with the adverse impacts of climate change.
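The agreement statistics quoted above (overall accuracy and the Kappa coefficient) are standard and can be computed from a confusion matrix that cross-tabulates predicted suitability against the past maize distribution. The sketch below uses an invented two-class confusion matrix purely for illustration; it does not reproduce the study's reported values of 79% and 0.72.

```python
import numpy as np

# Hypothetical confusion matrix: rows = predicted (suitable / unsuitable),
# columns = observed past maize distribution. Counts are invented.
confusion = np.array([[430, 70],
                      [90, 410]])

total = confusion.sum()
overall_accuracy = np.trace(confusion) / total
# Chance agreement computed from the row and column marginals.
expected_agreement = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total**2
kappa = (overall_accuracy - expected_agreement) / (1 - expected_agreement)

print(f"overall accuracy:  {overall_accuracy:.2f}")  # 0.84 for this toy matrix
print(f"Kappa coefficient: {kappa:.2f}")             # 0.68 for this toy matrix
```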


2018
Vol 43 (7)
pp. 543-561
Author(s):
Yuan-Pei Chang
Chia-Yi Chiu
Rung-Ching Tsai

Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, it has not received the same degree of research and development in small-scale settings, such as at the course-based level, where it would be most useful. The main obstacle is that the statistical estimation techniques successfully applied in large-scale assessments require large samples to guarantee reliable calibration of the item parameters and accurate estimation of the examinees’ proficiency class membership; such samples are simply not obtainable in course-based settings. This study therefore proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and can thus be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile from the item responses observed so far, and then selects the item that best discriminates between the estimated attribute profile and the other attribute profiles. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms, and the differences were substantial when the calibration samples were small.
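For readers unfamiliar with the nonparametric classification (NPC) step mentioned above, the sketch below illustrates its core idea: classify an examinee to the attribute profile whose ideal response pattern is closest, in Hamming distance, to the observed responses. The conjunctive (DINA-type) ideal-response rule, the Q-matrix, and the response vector are assumptions chosen for illustration; the item selection step of the NPS method is not shown.

```python
import itertools
import numpy as np

def ideal_response(q_matrix: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Conjunctive (DINA-type) ideal responses: an item is answered correctly
    only if every attribute it requires has been mastered."""
    return np.all(alpha >= q_matrix, axis=1).astype(int)

def npc_classify(responses: np.ndarray, q_matrix: np.ndarray) -> np.ndarray:
    """Return the attribute profile whose ideal response pattern has the
    smallest Hamming distance to the observed responses."""
    n_attributes = q_matrix.shape[1]
    profiles = np.array(list(itertools.product([0, 1], repeat=n_attributes)))
    distances = [np.abs(responses - ideal_response(q_matrix, a)).sum()
                 for a in profiles]
    return profiles[int(np.argmin(distances))]

# Toy 5-item, 3-attribute Q-matrix and one observed response vector.
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [1, 0, 1]])
x = np.array([1, 1, 0, 1, 0])
print(npc_classify(x, Q))    # -> [1 1 0]
```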


Author(s):  
Michael Walker
Douglas C. Schmidt
Jules White

To address the efficiency and resulting scalability problems of Learning-at-Scale, this chapter proposes a platform called SPLAShED: Software Platform for Large-Scale Assessment of Software-Development for Education-at-a-Distance, which uses Linux containers to provide OS-level virtualization. This gives each desired service the equivalent of a Virtual Private Server (VPS) that creates a temporary private userspace on the server. Each VPS provides a separate working environment for each desired application but does not incur the overhead of traditional virtualization techniques. Our SPLAShED platform applies recent advances in Linux container deployment automation, resource isolation, portability, and usability. These advances enable the SPLAShED platform to serve both as … This chapter explores an Android-based software design assignment as a case study, which shows how the SPLAShED platform can accommodate and facilitate advanced software development courses with features and capabilities not currently available.
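The chapter excerpt describes the container tooling only at a high level, so the following is a speculative sketch (not SPLAShED's actual implementation) of how OS-level virtualization could isolate each graded submission. It uses the docker Python SDK; the image name, command, and resource limits are hypothetical.

```python
import docker  # docker Python SDK; assumes a local container daemon is running

client = docker.from_env()

def grade_submission(submission_dir: str) -> str:
    """Run one student's build/test step in its own Linux container, giving it
    an isolated userspace (a lightweight VPS-like environment) without the
    overhead of a full virtual machine."""
    container = client.containers.run(
        image="gradle:8-jdk17",          # hypothetical build image
        command="gradle test",
        working_dir="/workspace",
        volumes={submission_dir: {"bind": "/workspace", "mode": "rw"}},
        mem_limit="512m",                # cap resources per submission
        network_disabled=True,           # no network access while grading
        detach=True,
    )
    exit_info = container.wait()         # block until the build/tests finish
    logs = container.logs().decode()
    container.remove()
    return logs if exit_info["StatusCode"] == 0 else "BUILD FAILED\n" + logs
```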


Author(s):  
Giorgio Bolondi
Federica Ferretti

We report an example of a research approach aimed at gathering quantitative evidence of solid findings in mathematics education. The main goal of this project is to provide an additional perspective on solid findings in education, to be used by teachers and by researchers in their work. As a case study, we present a situation of “loss of meaning” in algebra, exploring it with data coming from a large-scale assessment interpreted through theoretical lenses. We are able to give information about the extent of the phenomenon and to highlight that it is relevant even for high-achieving students. This approach can provide a link between large-scale assessment results, educational research, and teachers’ practices, and it suggests further research issues.


2020
Author(s):  
Alexander Robitzsch

In recent literature, alternative models for handling missing item responses in large-scale assessments have been proposed, based on simulations and on arguments from test theory (Rose, 2013). In those approaches, it is argued that missing item responses should never be scored as incorrect but should rather be treated as ignorable (e.g., Pohl et al., 2014). The present contribution shows that these arguments have limited validity and illustrates the consequences in a country comparison using the PIRLS 2011 study. Treating missing item responses differently than recoding them as incorrect leads to significant changes in country rankings, which has nonignorable consequences for the validity of the results. Additionally, two alternative item response models based on different assumptions for missing item responses are proposed.
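To illustrate, in the simplest possible terms, why the choice of treatment moves country means, the toy sketch below compares raw proportion-correct country means when omitted responses are scored as incorrect versus when they are ignored. The data are invented and the operational studies use IRT scaling rather than raw proportions, so this conveys only the direction of the effect, not its operational size.

```python
import numpy as np
import pandas as pd

# Toy item-response matrix: rows are students (indexed by country), columns
# are items, and NaN marks an omitted item response. All values are invented.
rng = np.random.default_rng(1)
responses = pd.DataFrame(rng.integers(0, 2, size=(6, 4)).astype(float),
                         index=["A", "A", "A", "B", "B", "B"])
responses.iloc[0, 1] = np.nan      # one omission in country A
responses.iloc[3, :2] = np.nan     # two omissions for a student in country B

# Treatment 1: score missing item responses as incorrect.
as_wrong = responses.fillna(0).mean(axis=1).groupby(level=0).mean()

# Treatment 2: treat missing responses as ignorable and exclude them.
ignored = responses.mean(axis=1).groupby(level=0).mean()

print(pd.DataFrame({"scored_as_wrong": as_wrong, "ignored": ignored}))
```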

