A Multilevel Mixture IRT Framework for Modeling Response Times as Predictors or Indicators of Response Engagement in IRT Models

2021, pp. 001316442110453
Author(s): Gabriel Nagy, Esther Ulitzsch

Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline how response time-based procedures for classifying response engagement and IRT models for response engagement build on common ideas, and we propose a distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models differ in whether response times are assumed to reflect or to predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all of these IRT models can be estimated by means of marginal maximum likelihood. The framework builds on the widely used Mplus software, making the procedure accessible to a broad audience. The procedures are illustrated with publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals' proficiency estimates relative to a conventional IRT model.
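
To make the "response times as indicators" idea concrete, here is a minimal Python sketch of the posterior engagement probability for a single response under a two-class mixture: an engaged 2PL process versus a disengaged guessing process, with a class-specific lognormal response-time component. The functional forms and all parameter values are illustrative assumptions, not the paper's fitted models (which are estimated in Mplus via marginal maximum likelihood).

```python
import numpy as np
from scipy.stats import norm

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def engagement_posterior(y, log_rt, theta, a, b,
                         pi_engaged=0.9, guess_p=0.25,
                         mu_rt=(3.5, 1.0), sd_rt=(0.6, 0.4)):
    """Posterior probability that a single response was engaged.

    Mixture of an engaged 2PL process and a disengaged guessing process;
    the log response time acts as an indicator of the latent class
    (normal on the log scale with class-specific mean/sd). All parameter
    values here are illustrative placeholders, not estimates.
    """
    p_correct_engaged = sigmoid(a * (theta - b))
    # Likelihood of the observed response under each class
    lik_resp_e = p_correct_engaged**y * (1 - p_correct_engaged)**(1 - y)
    lik_resp_d = guess_p**y * (1 - guess_p)**(1 - y)
    # Likelihood of the observed log response time under each class
    lik_rt_e = norm.pdf(log_rt, mu_rt[0], sd_rt[0])
    lik_rt_d = norm.pdf(log_rt, mu_rt[1], sd_rt[1])
    num = pi_engaged * lik_resp_e * lik_rt_e
    den = num + (1 - pi_engaged) * lik_resp_d * lik_rt_d
    return num / den

# Example: a very fast incorrect response looks disengaged
print(engagement_posterior(y=0, log_rt=0.8, theta=0.5, a=1.2, b=0.0))
```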

Author(s): Jun Huang, Linchuan Xu, Jing Wang, Lei Feng, Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume that all labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially in large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only reveal interesting knowledge but also help build a more robust learning model. In this paper, a novel approach named DLCL (Discovering Latent Class Labels for MLL) is proposed that can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show the competitive performance of DLCL against other state-of-the-art MLL approaches.
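
The abstract does not spell out DLCL's algorithm, so the following Python sketch is not DLCL itself but a generic two-stage stand-in for the idea: propose latent labels by clustering structure the known labels do not explain, then train one multi-label model over known plus latent labels. The data, the clustering choice, and the column split are toy assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))          # toy features
Y_known = (X[:, :3] > 0).astype(int)    # 3 known labels (toy labeling rule)

# Stage 1: propose latent labels by clustering the part of the feature
# space that the known labels leave unexplained (here: remaining columns).
latent = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[:, 3:])
Y_latent = np.eye(2, dtype=int)[latent]            # one-hot latent labels
Y_all = np.hstack([Y_known, Y_latent])             # known + latent targets

# Stage 2: a single multi-label model predicts known and latent labels
# for new instances simultaneously.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y_all)
print(clf.predict(rng.normal(size=(3, 10))))
```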


2021, pp. 43-48
Author(s): Rosa Fabbricatore, Francesco Palumbo

Evaluating learners' competencies is a crucial concern in education, and structured tests, administered at home or in the classroom, are an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or to more than one topic. Several statistical approaches allow evaluating students in a multidimensional way, accounting for the structure of the items. Depending on the evaluation's final aim, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful tool for developing tailored recommendations and remediations for each group. To this end, latent class models are a natural reference. Within the item response theory (IRT) paradigm, multidimensional latent class IRT models, which relax both the traditional assumption of unidimensionality and the continuous nature of the latent trait, make it possible to detect sub-populations of homogeneous students according to their proficiency level while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation has several practical advantages: it avoids normality assumptions that may not hold and reduces the computational demand. This study compares the results of multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the latter step, both parametric and non-parametric approaches were considered. Data refer to the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The students involved were N = 944, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was used for students' ability estimation.
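
The two-step procedure lends itself to a short sketch. Assuming the step-1 ability estimates are already in hand (here replaced by simulated values standing in for the multidimensional 2PL estimates), step 2 clusters them with a parametric method (Gaussian mixture) and a non-parametric method (k-means), mirroring the two clustering approaches the study compares. Cluster counts and data are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Stand-in for step 1: per-student ability estimates on the five domains
# (in the paper these come from a multidimensional 2PL IRT model).
theta_hat = rng.normal(size=(944, 5))

# Step 2a: parametric clustering (Gaussian mixture over ability estimates)
gmm = GaussianMixture(n_components=3, random_state=1).fit(theta_hat)
classes_parametric = gmm.predict(theta_hat)

# Step 2b: non-parametric clustering of the same estimates
classes_kmeans = KMeans(n_clusters=3, n_init=10,
                        random_state=1).fit_predict(theta_hat)

print(np.bincount(classes_parametric), np.bincount(classes_kmeans))
```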


2019, Vol 79 (5), pp. 931-961
Author(s): Cengiz Zopluoglu

Machine-learning methods are used frequently in many fields, but relatively few studies have applied them to identifying potential testing fraud. In this study, a technical review of a recently developed state-of-the-art algorithm, Extreme Gradient Boosting (XGBoost), is provided, and the utility of XGBoost for detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four different XGBoost models were trained using different sets of input features based on (a) only dichotomous item responses, (b) only nominal item responses, (c) both dichotomous item responses and response times, and (d) both nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures such as the false-positive rate, true-positive rate, and precision. For comparison purposes, the results from two person-fit statistics on the same data set were also provided. The results indicated that XGBoost successfully classified the honest test takers and fraudulent test takers with item preknowledge. In particular, the classification performance of XGBoost was reasonably good when both item responses and response time information were taken into account.
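
A hedged sketch of model (c), dichotomous responses plus response times, using the xgboost scikit-learn interface and the evaluation metrics the study reports (AUC, TPR, FPR, precision). The data are simulated placeholders and the hyperparameters are illustrative, not those tuned in the study.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix, precision_score

rng = np.random.default_rng(2)
n, k = 1000, 50
responses = rng.integers(0, 2, size=(n, k))   # dichotomous item scores (toy)
log_rt = rng.normal(3.0, 0.5, size=(n, k))    # log response times (toy)
flagged = rng.integers(0, 2, size=n)          # 1 = item preknowledge (toy)

X = np.hstack([responses, log_rt])            # model (c): responses + RTs
X_tr, X_te, y_tr, y_te = train_test_split(X, flagged, test_size=0.3,
                                          random_state=2)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      eval_metric="logloss")
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("AUC:", roc_auc_score(y_te, proba))
print("TPR:", tp / (tp + fn), "FPR:", fp / (fp + tn),
      "precision:", precision_score(y_te, pred))
```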


2020
Author(s): Benjamin Domingue, Klint Kanopka, Ben Stenhaug, Jim Soland, Megan Kuhfeld, ...

As our ability to collect data about respondents increases, approaches for incorporating ancillary data features such as response time are of heightened interest. Models for response time have been advanced, but relatively few large-scale empirical investigations have been conducted. We take advantage of a unique and massive dataset, drawn from computer adaptive administrations of the NWEA MAP Growth assessment in two states and consisting of roughly a quarter billion item responses with associated response times, to shed light on emergent features of response time behavior. We focus on two behaviors in particular. The first, response acceleration, is a reduction in response time for responses that occur relatively late in the assessment. Such reductions are heterogeneous as a function of estimated ability (lower ability estimates are associated with larger increases in acceleration), and reductions in response time on later items lead to reductions in accuracy relative to expectation. We also document variation in the interplay between speed and accuracy: in some cases, additional time spent on an item is associated with an increase in accuracy; in other cases, the opposite is true. This finding has potential connections to the nascent literature on different within-person response processes. We argue that our approach may be useful in other settings and that the behaviors observed here should be of interest in other datasets.
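
Response acceleration can be probed with a simple regression of log response time on item position, adding a position-by-ability interaction to capture the reported heterogeneity. The sketch below uses simulated data with the pattern deliberately built in; it is a stand-in for the analysis idea, not the MAP Growth data or the paper's exact models.

```python
import numpy as np

rng = np.random.default_rng(3)
n_persons, n_items = 500, 40
theta = rng.normal(size=n_persons)
position = np.tile(np.arange(n_items), n_persons)
ability = np.repeat(theta, n_items)

# Toy log response times with built-in acceleration that is stronger
# for lower-ability respondents (mimicking the reported pattern).
log_rt = (3.0 - 0.01 * position + 0.005 * position * ability
          + rng.normal(0, 0.4, size=n_persons * n_items))

# OLS of log RT on position, ability, and their interaction
X = np.column_stack([np.ones_like(log_rt), position, ability,
                     position * ability])
beta, *_ = np.linalg.lstsq(X, log_rt, rcond=None)
print("position slope:", beta[1])        # negative = acceleration
print("position x ability:", beta[3])    # positive = weaker acceleration
                                         # at higher ability
```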


2005
Author(s): Yanyan Sheng

As item response theory (IRT) models gain increased popularity in large-scale educational and measurement testing situations, many studies have been conducted on the development and application of unidimensional and multidimensional models. However, to date, no study has examined models in the IRT framework with an overall ability dimension underlying all test items and several ability dimensions specific to each subtest. This study proposes such a model and compares it with conventional IRT models using Bayesian methodology. The results suggest that the proposed model offers a better way to represent test situations not captured by existing models. The model specification also has implications for test developers regarding test design. In addition, the proposed IRT model can be applied in other areas, such as intelligence or psychology research.
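
The proposed structure, an overall ability plus subtest-specific abilities, implies an item response function along the following lines. The exact parameterization below (a compensatory logistic form with general and specific slopes) is an assumption for illustration, not the paper's specification.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_correct(theta_g, theta_s, a_g, a_s, b):
    """Response function with an overall (general) ability and one
    subtest-specific ability, in the spirit of the proposed model:
    P(y = 1) = sigmoid(a_g * theta_g + a_s * theta_s - b).
    The parameterization is an illustrative assumption.
    """
    return sigmoid(a_g * theta_g + a_s * theta_s - b)

# An item on one subtest, for a person with high overall ability
# but below-average subtest-specific ability:
print(p_correct(theta_g=1.0, theta_s=-0.5, a_g=1.2, a_s=0.8, b=0.3))
```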


2018, Vol 43 (7), pp. 543-561
Author(s): Yuan-Pei Chang, Chia-Yi Chiu, Rung-Ching Tsai

Cognitive diagnostic computerized adaptive testing (CD-CAT) has been suggested by researchers as a diagnostic tool for assessment and evaluation. Although model-based CD-CAT is relatively well researched in the context of large-scale assessment systems, it has not received the same degree of research and development in small-scale settings, such as the course-based level, where such a system would be most useful. The main obstacle is that the statistical estimation techniques successfully applied in large-scale assessments require large samples to guarantee reliable calibration of the item parameters and accurate estimation of the examinees' proficiency class membership; such samples are simply not obtainable in course-based settings. This study therefore proposes a nonparametric item selection (NPS) method that does not require any parameter calibration and thus can be used in small educational programs. The proposed nonparametric CD-CAT uses the nonparametric classification (NPC) method to estimate an examinee's attribute profile from the item responses, and then selects the item that best discriminates between the estimated attribute profile and the other attribute profiles. The simulation results show that the NPS method outperformed the parametric CD-CAT algorithms it was compared with, and the differences were substantial when the calibration samples were small.
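
A minimal sketch of the NPC classification step and a simple discrimination-based item selection, assuming a conjunctive (DINA-like) ideal-response rule and a toy Q-matrix. The actual NPS criterion in the paper may weight competing profiles differently; this shows only the general shape of the procedure.

```python
import numpy as np
from itertools import product

def ideal_response(alpha, Q):
    """Conjunctive (DINA-like) ideal response: 1 iff all attributes an
    item requires are mastered. One common choice for the NPC method."""
    return np.all(alpha >= Q, axis=1).astype(int)

def npc_classify(y, administered, Q):
    """Assign the attribute profile whose ideal responses on the
    administered items have minimal Hamming distance to the observed y."""
    K = Q.shape[1]
    profiles = np.array(list(product([0, 1], repeat=K)))
    dists = [np.sum(y != ideal_response(a, Q)[administered])
             for a in profiles]
    return profiles[int(np.argmin(dists))]

def select_next_item(alpha_hat, administered, Q):
    """Pick the unadministered item whose ideal response under alpha_hat
    disagrees most often with the ideal responses of the other profiles,
    i.e., the item that best separates the current estimate from its
    competitors (a simplified discrimination criterion)."""
    K = Q.shape[1]
    profiles = np.array(list(product([0, 1], repeat=K)))
    eta_hat = ideal_response(alpha_hat, Q)
    scores = []
    for j in range(Q.shape[0]):
        if j in administered:
            scores.append(-1)          # never re-administer an item
            continue
        disagree = sum(ideal_response(a, Q)[j] != eta_hat[j]
                       for a in profiles)
        scores.append(disagree)
    return int(np.argmax(scores))

Q = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])  # toy Q-matrix
alpha_hat = npc_classify(np.array([1, 0]), [0, 1], Q)
print(alpha_hat, select_next_item(alpha_hat, [0, 1], Q))
```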


2008, Vol 216 (2), pp. 89-101
Author(s): Johannes Hartig, Jana Höhler

Multidimensional item response theory (MIRT) holds considerable promise for the development of psychometric models of competence. It provides an ideal foundation for modeling performance in complex domains, simultaneously taking into account multiple basic abilities. The aim of this paper is to illustrate the relations between a two-dimensional IRT model with between-item multidimensionality and a nested-factor model with within-item multidimensionality, and the different substantive meanings of the ability dimensions in the two models. Both models are applied to empirical data from a large-scale assessment of reading and listening comprehension in a foreign language. In the between-item model, performance in the reading and listening items is modeled by two separate dimensions. In the within-item model, one dimension represents the abilities common to both tests, and a second dimension represents abilities specific to listening comprehension. Distinct relations of external variables, such as gender and cognitive abilities, with ability scores demonstrate that the alternative models have substantively different implications.
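
The contrast between the two models is easiest to see in their loading structures. The sketch below builds toy loading-pattern matrices (1 = item loads on that dimension) for the between-item specification (each item loads on exactly one dimension) and the within-item, nested-factor specification (all items load on a general dimension; listening items additionally load on a listening-specific dimension). Item counts are placeholders.

```python
import numpy as np

n_read, n_listen = 4, 4  # toy numbers of reading and listening items

# Between-item model: each item loads on exactly one dimension
# (columns: reading, listening).
between = np.block([
    [np.ones((n_read, 1)),    np.zeros((n_read, 1))],
    [np.zeros((n_listen, 1)), np.ones((n_listen, 1))],
])

# Within-item (nested-factor) model: all items load on a general
# comprehension dimension (column 1); listening items additionally
# load on a listening-specific dimension (column 2).
within = np.block([
    [np.ones((n_read, 1)),   np.zeros((n_read, 1))],
    [np.ones((n_listen, 1)), np.ones((n_listen, 1))],
])

print("between-item loading pattern:\n", between)
print("within-item loading pattern:\n", within)
```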


2017, Vol 33 (3), pp. 181-189
Author(s): Christoph J. Kemper, Michael Hock

Abstract. Anxiety Sensitivity (AS) denotes the tendency to fear anxiety-related sensations. Trait AS is an established risk factor for anxiety pathology. The Anxiety Sensitivity Index-3 (ASI-3) is a widely used measure of AS and its three most robust dimensions, with well-established construct validity. At present, the dimensional conceptualization of AS, and thus the construct validity of the ASI-3, is being challenged: a latent class structure with two distinct and qualitatively different forms, an adaptive form (normative AS) and a maladaptive form (the AS taxon, predisposing for anxiety pathology), has been postulated. Item response theory (IRT) models were applied to item-level data of the ASI-3 in an attempt to replicate previous findings in a large nonclinical sample (N = 2,603) and to examine possible interpretations of the latent discontinuity observed. Two latent classes with distinct patterns of responses to ASI-3 items were found. However, the classes reflected participants' differential use of the response scale (midpoint and extreme response styles) rather than differences in AS content (adaptive and maladaptive AS forms). A dimensional structure of AS and the construct validity of the ASI-3 were thus supported.
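
One way to probe for response-style classes, in the spirit of the reported finding, is to compute person-level midpoint and extreme response indices and fit a two-class mixture over them. This is a rough stand-in, not the IRT mixture models used in the study, and the 0-4 rating data below are simulated.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
# Toy ASI-3-like data: 18 items rated on a 5-point scale (0-4)
items = rng.integers(0, 5, size=(2603, 18))

# Person-level response-style indices: share of extreme (0 or 4)
# and midpoint (2) category choices.
extreme = np.mean((items == 0) | (items == 4), axis=1)
midpoint = np.mean(items == 2, axis=1)

# A two-class mixture over the style indices; classes that separate
# on these indices would point to response styles rather than AS content.
gmm = GaussianMixture(n_components=2, random_state=4).fit(
    np.column_stack([extreme, midpoint]))
print(gmm.means_)
```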

