On the Treatment of Missing Item Responses in Educational Large-Scale Assessment Data: An Illustrative Simulation Study and a Case Study Using PISA 2018 Mathematics Data

2021
Vol 11 (4)
pp. 1653-1687
Author(s):  
Alexander Robitzsch

Missing item responses are prevalent in educational large-scale assessment studies such as the Programme for International Student Assessment (PISA). The current operational practice scores missing item responses as wrong, but several psychometricians have advocated a model-based treatment based on the latent ignorability assumption. In this approach, item responses and response indicators are jointly modeled conditional on a latent ability and a latent response propensity variable. Alternatively, imputation-based approaches can be used. The latent ignorability assumption is weakened in the Mislevy-Wu model, which characterizes a nonignorable missingness mechanism and allows the missingness of an item to depend on the item itself. The scoring of missing item responses as wrong and the latent ignorable model are both submodels of the Mislevy-Wu model. In an illustrative simulation study, it is shown that the Mislevy-Wu model provides unbiased model parameter estimates. The simulation also replicates the finding, reported in various simulation studies in the literature, that scoring missing item responses as wrong yields biased estimates if latent ignorability holds in the data-generating model. Conversely, if missing item responses can only arise from incorrect item responses, applying an item response model that relies on latent ignorability results in biased estimates. The Mislevy-Wu model guarantees unbiased parameter estimates whenever this more general model holds in the data-generating process. In addition, this article uses the PISA 2018 mathematics dataset as a case study to investigate the consequences of different missing data treatments on country means and country standard deviations. The obtained country means and standard deviations can differ substantially across scaling models. In contrast to previous statements in the literature, scoring missing item responses as incorrect provided a better model fit than a latent ignorable model for most countries. Furthermore, the dependence of the missingness of an item on the item itself, after conditioning on the latent response propensity, was much more pronounced for constructed-response items than for multiple-choice items. As a consequence, scaling models that presuppose latent ignorability should be rejected for two reasons. First, the Mislevy-Wu model is preferred over the latent ignorable model for reasons of model fit. Second, in the discussion section, we argue that model fit should play only a minor role in choosing psychometric models in large-scale assessment studies because validity aspects are most relevant. Missing data treatments that can simply be manipulated by countries (and, hence, their students) result in unfair country comparisons.
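To make the missingness mechanisms discussed above concrete, the following sketch simulates item responses and response indicators under a Mislevy-Wu-type mechanism. It is an illustration only, not the scaling code used in the article: the Rasch-type response model, the logistic parameterization, and the parameter names (in particular delta, which lets the probability of omission depend on the item response itself) are assumptions made here; setting delta to zero recovers the latent ignorable special case.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_persons, n_items = 2000, 20
theta = rng.normal(0, 1, n_persons)              # latent ability
xi = 0.5 * theta + rng.normal(0, 1, n_persons)   # latent response propensity
b = rng.normal(0, 1, n_items)                    # item difficulties

# Rasch-type item responses (1 = correct).
X = (rng.random((n_persons, n_items)) <
     sigmoid(theta[:, None] - b[None, :])).astype(int)

# Mislevy-Wu-type missingness: the probability of observing a response depends
# on the response propensity xi AND, through delta, on the (possibly
# unobserved) item response X itself. delta = 0 is the latent ignorable case;
# a large positive delta means correct responses are almost always observed,
# so omissions arise mostly from incorrect answers.
beta, delta = -1.0, 2.0
R = (rng.random((n_persons, n_items)) <
     sigmoid(beta + xi[:, None] + delta * X)).astype(int)   # 1 = observed

X_obs = np.where(R == 1, X, np.nan)              # observed responses with omissions
print(f"share of missing item responses: {np.isnan(X_obs).mean():.2f}")
```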



Land
2021
Vol 10 (3)
pp. 295
Author(s):
Yuan Gao
Anyu Zhang
Yaojie Yue
Jing’ai Wang
Peng Su

Suitable land is an important prerequisite for crop cultivation and, given the prospect of climate change, it is essential to assess such suitability to minimize crop production risks and to ensure food security. Although a variety of methods to assess suitability are available, a comprehensive, objective, and large-scale screening of the environmental variables that influence the results of these methods, and therefore their accuracy, has rarely been explored. An approach to selecting such variables is proposed, and criteria are established for a big-data-based, large-scale assessment of land suitability, with maize (Zea mays L.) cultivation as a case study. The predicted suitability matched the past distribution of maize with an overall accuracy of 79% and a Kappa coefficient of 0.72. Land suitability for maize is likely to decrease markedly at low latitudes and even at mid latitudes. The total area suitable for maize, globally and in most major maize-producing countries, will decrease, and the decrease will be particularly steep in the regions currently best suited for maize. Compared with earlier research, the method proposed here is simple yet objective, comprehensive, and reliable for large-scale assessment. The findings highlight the necessity of adopting relevant strategies to cope with the adverse impacts of climate change.
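The agreement statistics quoted above (overall accuracy and the Kappa coefficient) are standard and can be computed from a confusion matrix that cross-tabulates predicted suitability against the past maize distribution. The sketch below uses an invented two-class confusion matrix purely for illustration; it does not reproduce the study's reported values of 79% and 0.72.

```python
import numpy as np

# Hypothetical confusion matrix: rows = predicted (suitable / unsuitable),
# columns = observed past maize distribution. Counts are invented.
confusion = np.array([[430, 70],
                      [90, 410]])

total = confusion.sum()
overall_accuracy = np.trace(confusion) / total
# Chance agreement computed from the row and column marginals.
expected_agreement = (confusion.sum(axis=1) @ confusion.sum(axis=0)) / total**2
kappa = (overall_accuracy - expected_agreement) / (1 - expected_agreement)

print(f"overall accuracy:  {overall_accuracy:.2f}")  # 0.84 for this toy matrix
print(f"Kappa coefficient: {kappa:.2f}")             # 0.68 for this toy matrix
```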


2018
Vol 43 (7)
pp. 543-561
Author(s):
Yuan-Pei Chang
Chia-Yi Chiu
Rung-Ching Tsai

Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, it has not received the same degree of research and development in small-scale settings, such as at the course-based level, where it would be most useful. The main obstacle is that the statistical estimation techniques successfully applied in large-scale assessments require large samples to guarantee reliable calibration of the item parameters and accurate estimation of the examinees’ proficiency class membership; such samples are simply not obtainable in course-based settings. This study therefore proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and can thus be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee’s attribute profile from the item responses observed so far, and then selects the item that best discriminates between the estimated attribute profile and the other attribute profiles. The simulation results show that the NPS method outperformed the compared parametric CD-CAT algorithms, and the differences were substantial when the calibration samples were small.
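For readers unfamiliar with the nonparametric classification (NPC) step mentioned above, the sketch below illustrates its core idea: classify an examinee to the attribute profile whose ideal response pattern is closest, in Hamming distance, to the observed responses. The conjunctive (DINA-type) ideal-response rule, the Q-matrix, and the response vector are assumptions chosen for illustration; the item selection step of the NPS method is not shown.

```python
import itertools
import numpy as np

def ideal_response(q_matrix: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Conjunctive (DINA-type) ideal responses: an item is answered correctly
    only if every attribute it requires has been mastered."""
    return np.all(alpha >= q_matrix, axis=1).astype(int)

def npc_classify(responses: np.ndarray, q_matrix: np.ndarray) -> np.ndarray:
    """Return the attribute profile whose ideal response pattern has the
    smallest Hamming distance to the observed responses."""
    n_attributes = q_matrix.shape[1]
    profiles = np.array(list(itertools.product([0, 1], repeat=n_attributes)))
    distances = [np.abs(responses - ideal_response(q_matrix, a)).sum()
                 for a in profiles]
    return profiles[int(np.argmin(distances))]

# Toy 5-item, 3-attribute Q-matrix and one observed response vector.
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [1, 0, 1]])
x = np.array([1, 1, 0, 1, 0])
print(npc_classify(x, Q))    # -> [1 1 0]
```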


Author(s):  
Michael Walker
Douglas C. Schmidt
Jules White

To address the efficiency and resulting scalability problems of Learning-at-Scale, this chapter proposes a platform called SPLAShED: Software Platform for Large-Scale Assessment of Software-Development for Education-at-a-Distance, which uses Linux containers to provide OS-level virtualization. This gives each desired service the equivalent of a Virtual Private Server (VPS) that creates a temporary private userspace on the server. Each VPS provides a separate working environment for each desired application but does not incur the overhead of traditional virtualization techniques. Our SPLAShED platform applies recent advances in Linux container deployment automation, resource isolation, portability, and usability. These advances enable the SPLAShED platform to serve both as … This chapter explores an Android-based software design assignment as a case study, which shows how the SPLAShED platform can accommodate and facilitate advanced software development courses with features and capabilities not currently available.
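The chapter excerpt describes the container tooling only at a high level, so the following is a speculative sketch (not SPLAShED's actual implementation) of how OS-level virtualization could isolate each graded submission. It uses the docker Python SDK; the image name, command, and resource limits are hypothetical.

```python
import docker  # docker Python SDK; assumes a local container daemon is running

client = docker.from_env()

def grade_submission(submission_dir: str) -> str:
    """Run one student's build/test step in its own Linux container, giving it
    an isolated userspace (a lightweight VPS-like environment) without the
    overhead of a full virtual machine."""
    container = client.containers.run(
        image="gradle:8-jdk17",          # hypothetical build image
        command="gradle test",
        working_dir="/workspace",
        volumes={submission_dir: {"bind": "/workspace", "mode": "rw"}},
        mem_limit="512m",                # cap resources per submission
        network_disabled=True,           # no network access while grading
        detach=True,
    )
    exit_info = container.wait()         # block until the build/tests finish
    logs = container.logs().decode()
    container.remove()
    return logs if exit_info["StatusCode"] == 0 else "BUILD FAILED\n" + logs
```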


Author(s):  
Giorgio Bolondi
Federica Ferretti

We report an example of a research approach aimed at gathering quantitative evidence of solid findings in mathematics education. The main goal of this project is to provide an additional perspective on solid findings in education, to be used by teachers and by researchers in their work. As a case study, we present a situation of “loss of meaning” in algebra, exploring it with data coming from a large-scale assessment interpreted through theoretical lenses. We are able to give information about the extent of the phenomenon and to highlight that it is relevant even for high-achieving students. This approach can provide a link between large-scale assessment results, educational research, and teachers’ practices, and it suggests further research issues.


2020
Author(s):  
Alexander Robitzsch

In recent literature, alternative models for handling missing item responses in large-scale assessments have been proposed, based on simulations and on arguments from test theory (Rose, 2013). In those approaches, it is argued that missing item responses should never be scored as incorrect but should rather be treated as ignorable (e.g., Pohl et al., 2014). The present contribution shows that these arguments have limited validity and illustrates the consequences in a country comparison using the PIRLS 2011 study. Treating missing item responses differently than recoding them as incorrect leads to significant changes in country rankings, which has nonignorable consequences for the validity of the results. Additionally, two alternative item response models based on different assumptions for missing item responses are proposed.
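To illustrate, in the simplest possible terms, why the choice of treatment moves country means, the toy sketch below compares raw proportion-correct country means when omitted responses are scored as incorrect versus when they are ignored. The data are invented and the operational studies use IRT scaling rather than raw proportions, so this conveys only the direction of the effect, not its operational size.

```python
import numpy as np
import pandas as pd

# Toy item-response matrix: rows are students (indexed by country), columns
# are items, and NaN marks an omitted item response. All values are invented.
rng = np.random.default_rng(1)
responses = pd.DataFrame(rng.integers(0, 2, size=(6, 4)).astype(float),
                         index=["A", "A", "A", "B", "B", "B"])
responses.iloc[0, 1] = np.nan      # one omission in country A
responses.iloc[3, :2] = np.nan     # two omissions for a student in country B

# Treatment 1: score missing item responses as incorrect.
as_wrong = responses.fillna(0).mean(axis=1).groupby(level=0).mean()

# Treatment 2: treat missing responses as ignorable and exclude them.
ignored = responses.mean(axis=1).groupby(level=0).mean()

print(pd.DataFrame({"scored_as_wrong": as_wrong, "ignored": ignored}))
```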

