Cognitive Diagnosis Modeling Incorporating Item-Level Missing Data Mechanism

2020 ◽  
Vol 11 ◽  
Author(s):  
Na Shan ◽  
Xiaofei Wang

The aim of cognitive diagnosis is to classify respondents' mastery status on latent attributes from their responses to multiple items. Because respondents may answer some but not all items, item-level missing data often occur. Even when the primary interest is the diagnostic classification of respondents, misspecification of the missing data mechanism may lead to biased conclusions. This paper proposes a joint cognitive diagnosis model of item responses and the item-level missing data mechanism. A Bayesian Markov chain Monte Carlo (MCMC) method is developed for model parameter estimation. Simulation studies examine parameter recovery under different missing data mechanisms. Parameters were recovered well when the missing data mechanism was correctly specified in the fitted model, and results under a not-at-random missingness mechanism were less sensitive to incorrect specification. The Program for International Student Assessment (PISA) 2015 computer-based mathematics data are used to demonstrate the practical value of the proposed method.
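As a hedged illustration of how such a joint model can be specified (a minimal sketch, not necessarily the authors' exact parameterization), one option combines a DINA measurement model with a logistic model for the item-level missingness indicator, letting both components depend on the latent attribute profile:

$$
P(Y_{ij}=1 \mid \boldsymbol{\alpha}_i) = g_j + (1 - s_j - g_j)\,\eta_{ij}, \qquad \eta_{ij} = \prod_{k=1}^{K} \alpha_{ik}^{q_{jk}},
$$
$$
\operatorname{logit} P(R_{ij}=1 \mid \boldsymbol{\alpha}_i, Y_{ij}) = \gamma_{0j} + \gamma_{1j}\,\eta_{ij} + \gamma_{2j}\,Y_{ij},
$$

where $R_{ij}=1$ indicates a missing response to item $j$, $s_j$ and $g_j$ are slip and guessing parameters, and $q_{jk}$ is the Q-matrix entry. A nonzero $\gamma_{2j}$ lets missingness depend on the possibly unobserved response itself, corresponding to missingness that is not at random; an MCMC sampler can then draw attribute profiles, item parameters, and missingness parameters from the joint posterior.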

2019 ◽  
Vol 43 (8) ◽  
pp. 639-654 ◽  
Author(s):  
Kaiwen Man ◽  
Jeffrey R. Harring ◽  
Hong Jiao ◽  
Peida Zhan

Computer-based testing (CBT) is becoming increasingly popular for assessing test-takers' latent abilities and making inferences about their cognitive processes. In addition to collecting item responses, an important benefit of CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive attributes and the working speed of test-takers, this research proposes a joint modeling approach that integrates compensatory multidimensional latent traits and response speediness using item responses and RTs. The joint model is cast as a multilevel model in which the structural relation between working speed and accuracy is captured through their variance-covariance structure. The feasibility of this modeling approach is investigated via a Monte Carlo simulation study using a Bayesian estimation scheme. The results indicate that integrating RTs improved model parameter recovery and precision. In addition, Program for International Student Assessment (PISA) 2015 mathematics standard unit items are analyzed to further evaluate how well the approach recovers model parameters.
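The following is a minimal sketch of a joint model in this spirit (the exact form used by the authors may differ), assuming a compensatory two-parameter multidimensional IRT model for accuracy, a lognormal model for response times, and a joint multivariate normal distribution that links the person parameters through its covariance matrix:

$$
\operatorname{logit} P(Y_{ij}=1 \mid \boldsymbol{\theta}_i) = \mathbf{a}_j^{\top}\boldsymbol{\theta}_i - b_j, \qquad
\log T_{ij} \sim N(\lambda_j - \tau_i,\ \sigma_j^{2}), \qquad
(\boldsymbol{\theta}_i^{\top}, \tau_i)^{\top} \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
$$

where $\boldsymbol{\theta}_i$ collects the multidimensional abilities, $\tau_i$ is working speed, $\lambda_j$ is the item time intensity, and the off-diagonal elements of $\boldsymbol{\Sigma}$ carry the structural relation between speed and accuracy that the joint model is designed to estimate.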


2020 ◽  
Vol 80 (6) ◽  
pp. 1168-1195
Author(s):  
Hung-Yu Huang

In educational assessments and achievement tests, test developers and administrators commonly assume that test-takers attempt all test items with full effort and leave no blank responses with unplanned missing values. However, aberrant response behavior, such as performance decline, dropping out beyond a certain point, and skipping certain items over the course of the test, is inevitable, especially in low-stakes assessments and speeded tests, owing to low motivation and time limits, respectively. In this study, test-takers are classified as normal or aberrant using a mixture item response theory (IRT) modeling approach, and aberrant response behavior is described and modeled using item response trees (IRTrees). Simulations are conducted to evaluate the efficiency and quality of the new class of mixture IRTree models using Bayesian estimation in WinBUGS. The results show that parameter recovery is satisfactory for the proposed mixture IRTree model and that treating missing values as ignorable or as incorrect responses while ignoring possible performance decline results in biased estimates. Finally, the applicability of the new model is illustrated by means of an empirical example based on the Program for International Student Assessment.
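As a small illustration of the IRTree idea (a sketch only; the proposed model additionally includes a mixture structure and nodes for performance decline that are not shown here), item responses with omissions can be recoded into pseudo-items, one per tree node, which are then analyzed with standard IRT models:

```python
import numpy as np

def irtree_pseudo_items(responses):
    """Recode scored responses (1 = correct, 0 = incorrect, np.nan = omitted)
    into two pseudo-items per item for a simple two-node IRTree:
      node 1: 1 = item attempted, 0 = item omitted
      node 2: 1 = correct, 0 = incorrect, np.nan = undefined (item omitted)."""
    responses = np.asarray(responses, dtype=float)
    attempted = (~np.isnan(responses)).astype(float)            # node 1
    correct = np.where(np.isnan(responses), np.nan, responses)  # node 2
    return np.stack([attempted, correct], axis=-1)

# One test-taker's responses to five items; the last two items were omitted.
print(irtree_pseudo_items([1, 0, 1, np.nan, np.nan]))
```

Treating the node-2 pseudo-items as undefined rather than incorrect is what distinguishes this coding from the "missing as incorrect" treatment that the simulation shows to be biased.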


2018 ◽  
Vol 26 (2) ◽  
pp. 196-212 ◽  
Author(s):  
Kentaro Yamamoto ◽  
Mary Louise Lennon

Purpose: Fabricated data jeopardize the reliability of large-scale population surveys and reduce the comparability of such efforts by destroying the linkage between data and measurement constructs. Such data result in the loss of comparability across participating countries and, in the case of cyclical surveys, between past and present surveys. This paper aims to describe how data fabrication can be understood in the context of the complex processes involved in the collection, handling, submission, and analysis of large-scale assessment data. The actors involved in those processes, and their possible motivations for data fabrication, are also elaborated.

Design/methodology/approach: Computer-based assessments produce new types of information that make it possible to detect likely data fabrication and to identify where further investigation and analysis are needed. The paper presents three examples that illustrate how data fabrication was identified and documented in the Programme for the International Assessment of Adult Competencies (PIAAC) and the Programme for International Student Assessment (PISA) and discusses the resulting remediation efforts.

Findings: For two countries that participated in the first round of PIAAC, the data showed a subset of interviewers who handled many more cases than others. In Case 1, the average proficiency of respondents in those interviewers' caseloads was much higher than expected, and their data included many duplicate response patterns. In Case 2, anomalous response patterns were identified. Case 3 presents findings based on data analyses for one PISA country, where results for human-coded responses were shown to be highly inflated compared with past results.

Originality/value: This paper shows how new sources of data, such as timing information collected in computer-based assessments, can be combined with more traditional sources to detect fabrication.
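The following sketch shows one simple screening heuristic in the spirit of these checks (column names and thresholds are hypothetical, and this is not the operational procedure used by PIAAC or PISA): flag interviewers with unusually large caseloads or a high share of exactly duplicated response patterns.

```python
import pandas as pd

def screen_interviewers(df, id_col="interviewer_id", item_cols=None,
                        caseload_z=2.0, max_duplicate_share=0.20):
    """Flag interviewers whose caseload is unusually large or whose cases
    contain a high share of exactly duplicated response patterns.
    df has one row per respondent; item_cols are scored item-response columns."""
    if item_cols is None:
        item_cols = [c for c in df.columns if c.startswith("item_")]
    patterns = df[item_cols].astype(str).apply("|".join, axis=1)
    summary = df.assign(pattern=patterns).groupby(id_col).agg(
        caseload=("pattern", "size"),
        duplicate_share=("pattern", lambda p: 1.0 - p.nunique() / len(p)),
    )
    z = (summary["caseload"] - summary["caseload"].mean()) / summary["caseload"].std()
    summary["flagged"] = (z > caseload_z) | (summary["duplicate_share"] > max_duplicate_share)
    return summary.sort_values("duplicate_share", ascending=False)
```

In practice such screens would be combined with the timing information mentioned above, for example implausibly short case durations for the flagged interviewers.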


2020 ◽  
Vol 20 (1) ◽  
pp. 59-78
Author(s):  
Mohammed A. A. Abulela ◽  
Michael Harwell

Data analysis is a significant methodological component when conducting quantitative education studies. Guidelines for conducting data analyses in quantitative education studies are common but often underemphasize four important methodological components that affect the validity of inferences: the quality of constructed measures, proper handling of missing data, the level of measurement of the dependent variable, and model checking. This paper highlights these components for novice researchers to help ensure that statistical inferences are valid. We used empirical examples involving contingency tables, group comparisons, regression analysis, and multilevel modeling to illustrate these components using Program for International Student Assessment (PISA) data. For every example, we stated a research question and provided evidence on the quality of the constructed measures, since measures with weak reliability and validity evidence can bias estimates and distort inferences. Appropriate strategies for handling missing data were also illustrated. The level of measurement of the dependent variable was assessed, and the proper statistical technique was applied accordingly. Model residuals were checked for normality and homogeneity of variance. Recommendations for obtaining stronger inferences and for reporting the related evidence were also provided. This work provides an important methodological resource for novice researchers conducting data analyses by promoting improved practice and stronger inferences.
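As a minimal sketch of the residual checks described here (assuming an ordinary least squares regression; the variables and the specific diagnostics are placeholders rather than the authors' PISA analyses), normality and homogeneity of variance can be examined as follows:

```python
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

def check_residuals(y, X):
    """Fit an OLS regression and report simple residual diagnostics:
    Shapiro-Wilk for normality and Breusch-Pagan for homogeneity of variance."""
    X = sm.add_constant(X)
    fit = sm.OLS(y, X, missing="drop").fit()
    shapiro_stat, shapiro_p = stats.shapiro(fit.resid)
    bp_stat, bp_p, _, _ = het_breuschpagan(fit.resid, fit.model.exog)
    return {"shapiro_p": shapiro_p, "breusch_pagan_p": bp_p, "model": fit}
```

Small p-values on either test would prompt remedies such as transforming the dependent variable or using robust standard errors, in line with the model-checking component emphasized above.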


2017 ◽  
Vol 43 (3) ◽  
pp. 316-353 ◽  
Author(s):  
Simon Grund ◽  
Oliver Lüdtke ◽  
Alexander Robitzsch

Multiple imputation (MI) can be used to address missing data at Level 2 in multilevel research. In this article, we compare joint modeling (JM) and the fully conditional specification (FCS) of MI as well as different strategies for including auxiliary variables at Level 1 using either their manifest or their latent cluster means. We show with theoretical arguments and computer simulations that (a) an FCS approach that uses latent cluster means is comparable to JM and (b) using manifest cluster means provides similar results except in relatively extreme cases with unbalanced data. We outline a computational procedure for including latent cluster means in an FCS approach using plausible values and provide an example using data from the Programme for International Student Assessment 2012 study.
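A simplified illustration of the manifest cluster-mean strategy is sketched below (single imputation with a generic FCS-style imputer; the authors' procedure, including latent cluster means based on plausible values and proper pooling over multiple imputations, requires dedicated multilevel MI software):

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_with_manifest_cluster_means(df, cluster_col, level1_aux, target_cols):
    """Add the manifest cluster means of a Level-1 auxiliary variable as a
    predictor and run an FCS-style (chained) imputation over the target columns."""
    work = df.copy()
    mean_col = f"{level1_aux}_cluster_mean"
    work[mean_col] = work.groupby(cluster_col)[level1_aux].transform("mean")
    cols = list(target_cols) + [level1_aux, mean_col]
    work[cols] = IterativeImputer(max_iter=20, random_state=0).fit_transform(work[cols])
    return work
```

For genuine multiple imputation, this step would be repeated across several imputations (for example, with posterior draws in the imputation model) and the analyses pooled over the completed data sets.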


2019 ◽  
Vol 2 (1) ◽  
pp. 43
Author(s):  
Heru Sukoco

The results of the PISA (Program for International Student Assessment) surveys in 2012-2015 on the mathematical competencies of Indonesian students showed a significant increase, but overall achievement was still below the average of the countries belonging to the Organisation for Economic Co-operation and Development (OECD). Furthermore, the Trends in International Mathematics and Science Study (TIMSS) reports showed that although many students like and feel good about mathematics, their confidence in their own mathematical abilities is quite low. Many studies have revealed a close association between Mathematics Self-Efficacy (MSE) and students' mathematical performance and achievement. In 2015, the PISA survey was administered by computer except in 15 countries, one of which was Indonesia. This study therefore aimed to produce the first computer-based MSE scale developed in Indonesia.


2020 ◽  
Vol 64 (3) ◽  
pp. 282-303
Author(s):  
Claire Scoular ◽  
Sofia Eleftheriadou ◽  
Dara Ramalingam ◽  
Dan Cloney

Collaboration is a complex skill, comprising multiple subskills, that is of growing interest to policy makers, educators, and researchers. Several definitions and frameworks have been proposed in the literature to support the assessment of collaboration; however, the inherent structure of the construct still needs better definition. In 2015, the Organisation for Economic Co-operation and Development, in its Programme for International Student Assessment, assessed 15-year-old students' collaborative problem solving achievement with the use of computer-simulated agents, aiming to address the lack of internationally comparable data in this field. This paper explores what the data from this assessment tell us about the skill and how these data compare with data from two other assessments of collaboration. The analyses enable comment on the extent to which the three assessments measure the same construct and the extent to which the construct can be covered using computer-based assessments. These investigations generate a better understanding of this complex and innovative domain.


2019 ◽  
Vol 24 (3) ◽  
pp. 231-242 ◽  
Author(s):  
Herbert W. Marsh ◽  
Philip D. Parker ◽  
Reinhard Pekrun

Abstract. We simultaneously resolve three paradoxes in academic self-concept research with a single unifying meta-theoretical model based on frame-of-reference effects across 68 countries, 18,292 schools, and 485,490 15-year-old students. Paradoxically, but consistent with predictions, effects on math self-concept were negative for:

• being from countries where country-average achievement was high, explaining the paradoxical cross-cultural self-concept effect;

• attending schools where school-average achievement was high, demonstrating big-fish-little-pond effects (BFLPE) that generalized over 68 countries, Organisation for Economic Co-operation and Development (OECD)/non-OECD countries, high/low achieving schools, and high/low achieving students;

• year in school relative to age, unifying different research literatures on the negative effects associated with starting school at a younger age and with acceleration/grade skipping, and the positive effects of starting school at an older age ("academic red shirting") and, paradoxically, even of repeating a grade.

Contextual effects matter, resulting in significant and meaningful effects on self-beliefs, not only at the student (year in school) and local school (BFLPE) levels but, remarkably, even at the macro-contextual country level. Finally, we juxtapose the cross-cultural generalizability based on the Programme for International Student Assessment (PISA) data used here with generalizability based on meta-analyses, arguing that although the two approaches are similar in many ways, the generalizability shown here is stronger in its support for the universality of frame-of-reference effects.
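In sketch form (a simplified single-equation rendering of the frame-of-reference logic, not the exact multilevel specification estimated in the paper), math self-concept is modeled as a function of individual achievement and of achievement aggregated at the school and country levels:

$$
\mathrm{MSC}_{isc} = \beta_0 + \beta_1 x_{isc} + \beta_2 \bar{x}_{\cdot sc} + \beta_3 \bar{x}_{\cdot\cdot c} + u_{sc} + v_c + e_{isc},
$$

where $x_{isc}$ is the achievement of student $i$ in school $s$ of country $c$, and $\bar{x}_{\cdot sc}$ and $\bar{x}_{\cdot\cdot c}$ are school- and country-average achievement. A positive $\beta_1$ combined with negative $\beta_2$ and $\beta_3$ corresponds to the big-fish-little-pond effect and the cross-cultural paradox described above.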


Methodology ◽  
2007 ◽  
Vol 3 (4) ◽  
pp. 149-159 ◽  
Author(s):  
Oliver Lüdtke ◽  
Alexander Robitzsch ◽  
Ulrich Trautwein ◽  
Frauke Kreuter ◽  
Jan Marten Ihme

Abstract. In large-scale educational assessments such as the Third International Mathematics and Science Study (TIMSS) or the Program for International Student Assessment (PISA), sizeable numbers of test administrators (TAs) are needed to conduct the assessment sessions in the participating schools. TA training sessions are run and administration manuals are compiled with the aim of ensuring standardized, comparable assessment situations in all student groups. To date, however, there has been no empirical investigation of the effectiveness of these standardizing efforts. In the present article, we probe for systematic TA effects on mathematics achievement and sample attrition in a student achievement study. Multilevel analyses for cross-classified data using Markov chain Monte Carlo (MCMC) procedures were performed to separate the variance attributable to differences between schools from the variance associated with TAs. After controlling for school effects, only a very small, nonsignificant proportion of the variance in mathematics scores and response behavior (< 1%) was attributable to the TAs. We discuss the practical implications of these findings for the deployment of TAs in educational assessments.
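In sketch form (notation simplified relative to the article), the cross-classified model separates school and TA variance components as follows:

$$
y_{i(j,k)} = \beta_0 + \mathbf{x}_{i}^{\top}\boldsymbol{\beta} + u_j + v_k + e_{i(j,k)}, \qquad
u_j \sim N(0, \sigma^2_{\mathrm{school}}), \quad v_k \sim N(0, \sigma^2_{\mathrm{TA}}), \quad e_{i(j,k)} \sim N(0, \sigma^2_{e}),
$$

where student $i$ is nested in the crossing of school $j$ and test administrator $k$. The TA share of variance, $\sigma^2_{\mathrm{TA}} / (\sigma^2_{\mathrm{school}} + \sigma^2_{\mathrm{TA}} + \sigma^2_{e})$, is the quantity reported above to be below 1% after controlling for school effects.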

