A Model for Detecting Lack of Invariance for Item Responses and Response Times

2014 ◽  
Author(s):  
Emily Hailey


2020 ◽  
pp. 001316442096863
Author(s):  
Kaiwen Man ◽  
Jeffrey R. Harring

Many approaches have been proposed to jointly analyze item responses and response times to understand behavioral differences between normally and aberrantly behaved test-takers. Biometric information, such as data from eye trackers, can be used to better identify these deviant testing behaviors in addition to more conventional data types. Given this context, this study demonstrates the application of a new method for multiple-group analysis that concurrently models item responses, response times, and visual fixation counts collected from an eye-tracker. It is hypothesized that differences in behavioral patterns between normally behaved test-takers and those who have different levels of preknowledge about the test items will manifest in latent characteristics of the different data types. A Bayesian estimation scheme is used to fit the proposed model to experimental data and the results are discussed.
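
For readers unfamiliar with this class of models, the sketch below simulates one plausible joint generative process of the kind described: dichotomous responses from a 2PL model, log-normal response times, and Poisson fixation counts, with the preknowledge group assumed to respond faster and fixate less. All parameter values and group effects are illustrative assumptions, not the authors' estimated model.

```python
# Illustrative sketch (not the authors' estimated model): simulate a joint
# generative process for dichotomous item responses (2PL), log-normal
# response times, and Poisson visual fixation counts for two groups.
import numpy as np

rng = np.random.default_rng(42)
n_persons, n_items = 200, 20

# Item parameters (assumed values, for illustration only)
a = rng.lognormal(0.0, 0.3, n_items)      # discrimination
b = rng.normal(0.0, 1.0, n_items)         # difficulty
beta = rng.normal(4.0, 0.3, n_items)      # time intensity (log-seconds)
lam = rng.uniform(3.0, 8.0, n_items)      # expected fixation counts

# Person parameters: the preknowledge group is assumed to work faster
# and to fixate less on (compromised) items.
group = np.repeat([0, 1], n_persons // 2)             # 1 = preknowledge
theta = rng.normal(0.0, 1.0, n_persons)               # ability
tau = rng.normal(0.5 * group, 0.3)                    # speed (higher = faster)

p_correct = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
responses = rng.binomial(1, p_correct)
log_rt = rng.normal(beta - tau[:, None], 0.4)
fixations = rng.poisson(lam * np.exp(-0.3 * group)[:, None])

print(responses.shape, log_rt.shape, fixations.shape)  # (200, 20) each
```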


2021 ◽  
Vol 12 ◽  
Author(s):  
Denise Reis Costa ◽  
Maria Bolsinova ◽  
Jesper Tijmstra ◽  
Björn Andersson

Log-file data from computer-based assessments can provide useful collateral information for estimating student abilities and can thereby improve traditional approaches that consider only response accuracy. Based on the amount of time students spent on 10 mathematics items from PISA 2012, this study evaluated the overall changes in ability estimates and their measurement precision, and explored country-level heterogeneity, when combining item responses and time-on-task measurements in a joint framework. Our findings suggest a notable increase in precision with the incorporation of response times and indicate differences between countries both in how respondents approached the items and in their response processes. Results also showed that additional information could be captured through differences in the modeling structure when response times were included; however, such information may not reflect the testing objective.
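
The precision gain reported here can be illustrated with a standard hierarchical assumption: if ability and speed are bivariate normal with correlation rho, conditioning on speed shrinks the ability variance by a factor of (1 - rho^2). The snippet below is a minimal numerical illustration of this mechanism, not the study's actual model.

```python
# Minimal numerical illustration (assumed bivariate-normal setup, not the
# study's model): if ability (theta) and speed (tau) are jointly normal with
# correlation rho, then Var(theta | tau) = (1 - rho**2) * Var(theta), so
# response-time information tightens ability estimates whenever rho != 0.
for rho in (0.0, 0.3, 0.5, 0.7):
    print(f"rho = {rho:.1f}: Var(theta | tau) = {1 - rho**2:.2f} * Var(theta)")
```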


2019 ◽  
Vol 79 (5) ◽  
pp. 931-961 ◽  
Author(s):  
Cengiz Zopluoglu

Machine-learning methods are used extensively across many fields, yet relatively few studies have applied them to detecting testing fraud. In this study, a technical review of a recently developed state-of-the-art algorithm, Extreme Gradient Boosting (XGBoost), is provided, and the utility of XGBoost in detecting examinees with potential item preknowledge is investigated using a real data set that includes examinees who engaged in fraudulent testing behavior, such as illegally obtaining live test content before the exam. Four different XGBoost models were trained using different sets of input features based on (a) only dichotomous item responses, (b) only nominal item responses, (c) both dichotomous item responses and response times, and (d) both nominal item responses and response times. The predictive performance of each model was evaluated using the area under the receiver operating characteristic curve and several classification measures such as the false-positive rate, true-positive rate, and precision. For comparison, results from two person-fit statistics on the same data set were also provided. The results indicated that XGBoost successfully classified honest test takers and fraudulent test takers with item preknowledge. In particular, the classification performance of XGBoost was reasonably good when both item responses and response time information were taken into account.
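
A minimal sketch of this general workflow, using synthetic data rather than the study's operational data set, is given below; the feature construction, hyperparameters, and prevalence of preknowledge are assumptions for illustration only.

```python
# Minimal sketch of the general workflow (not the study's exact pipeline):
# train XGBoost on item-response + response-time features and evaluate with
# ROC AUC, false-positive rate, true-positive rate, and precision.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix, precision_score

rng = np.random.default_rng(0)
n, n_items = 1000, 30
y = rng.binomial(1, 0.1, n)                        # 1 = item preknowledge (synthetic)
responses = rng.binomial(1, 0.5 + 0.2 * y[:, None], (n, n_items))
log_rt = rng.normal(4.0 - 0.5 * y[:, None], 0.6, (n, n_items))
X = np.hstack([responses, log_rt])                 # responses plus response times

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                           stratify=y, random_state=1)
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                      eval_metric="auc")
model.fit(X_tr, y_tr)

prob = model.predict_proba(X_te)[:, 1]
pred = (prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print("AUC:", roc_auc_score(y_te, prob))
print("FPR:", fp / (fp + tn), "TPR:", tp / (tp + fn),
      "precision:", precision_score(y_te, pred))
```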


2020 ◽  
Author(s):  
Benjamin Domingue ◽  
Klint Kanopka ◽  
Ben Stenhaug ◽  
Jim Soland ◽  
Megan Kuhfeld ◽  
...  

As our ability to collect data about respondents increases, approaches for incorporating ancillary data features such as response time are of heightened interest. Models for response time have been advanced, but relatively few large-scale empirical investigations have been conducted. We take advantage of a unique and massive dataset, drawn from computer adaptive administrations of the NWEA MAP Growth assessment in two states and consisting of roughly 1/4 billion item responses with associated response times, to shed light on emergent features of response time behavior. We focus on two behaviors in particular. The first, response acceleration, is a reduction in response time for responses that occur relatively late on the assessment. We further note that such reductions are heterogeneous as a function of estimated ability (lower ability estimates are associated with greater acceleration) and that reduced response times on later items lead to reductions in accuracy relative to expectation. We also document variation in the interplay between speed and accuracy: in some cases, additional time spent on an item is associated with an increase in accuracy; in other cases, the opposite is true. This finding has potential connections to the nascent literature on within-person differences in response processes. We argue that our approach may be useful in other settings and that the behaviors observed here should be of interest in other datasets.
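
One simple way to probe the acceleration pattern described here is to summarize response time and accuracy, relative to model expectation, by item position. The sketch below uses synthetic data and assumed column names; it is not the authors' analysis code.

```python
# Illustrative sketch of one way to probe response acceleration (synthetic
# data and assumed column names, not the NWEA analysis): summarize response
# time and accuracy, relative to expectation, by item position.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_persons, n_positions = 1000, 40

# Synthetic long-format log: later positions get faster and less accurate.
pos = np.tile(np.arange(1, n_positions + 1), n_persons)
speedup = (pos - 1) / (n_positions - 1)                    # 0 -> 1 over the test
log = pd.DataFrame({
    "person_id": np.repeat(np.arange(n_persons), n_positions),
    "position": pos,
    "rt_seconds": rng.lognormal(np.log(30) - 0.8 * speedup, 0.5),
    "p_expected": np.full(pos.size, 0.6),
    "correct": rng.binomial(1, 0.6 - 0.15 * speedup),
})

by_pos = (log.groupby("position")
             .agg(median_rt=("rt_seconds", "median"),
                  accuracy=("correct", "mean"),
                  expected=("p_expected", "mean"))
             .assign(residual=lambda d: d["accuracy"] - d["expected"]))

# Acceleration shows up as falling median RT and a negative accuracy
# residual toward the end of the test.
print(by_pos.tail(5).round(2))
```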


Psychometrika ◽  
2021 ◽  
Author(s):  
Udo Boehm ◽  
Maarten Marsman ◽  
Han L. J. van der Maas ◽  
Gunter Maris

The emergence of computer-based assessments has made response times, in addition to response accuracies, available as a source of information about test takers’ latent abilities. The development of substantively meaningful accounts of the cognitive process underlying item responses is critical to establishing the validity of psychometric tests. However, existing substantive theories such as the diffusion model have been slow to gain traction due to their unwieldy functional form and regular violations of model assumptions in psychometric contexts. In the present work, we develop an attention-based diffusion model based on process assumptions that are appropriate for psychometric applications. This model is straightforward to analyse using Gibbs sampling and can be readily extended. We demonstrate our model’s good computational and statistical properties in a comparison with two well-established psychometric models.
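
As a point of reference, the snippet below simulates a standard two-boundary diffusion process with a simple Euler scheme; it illustrates the class of process models discussed, not the attention-based variant developed in the paper, and all parameter values are assumptions.

```python
# Euler simulation of a standard two-boundary diffusion (Wiener) process,
# included only to illustrate the class of process models discussed above;
# it is not the attention-based variant developed in the paper.
import numpy as np

def simulate_diffusion(drift, boundary, ndt=0.3, dt=0.001, sigma=1.0,
                       max_t=10.0, rng=None):
    """Return (response, rt): response is 1 if the upper boundary is hit."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = 0.0, 0.0                       # start midway between the boundaries
    while abs(x) < boundary and t < max_t:
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return int(x >= boundary), ndt + t    # non-decision time added to RT

rng = np.random.default_rng(3)
sims = [simulate_diffusion(drift=1.0, boundary=1.5, rng=rng) for _ in range(500)]
print("accuracy ~", np.mean([r for r, _ in sims]).round(2),
      "| mean RT ~", np.mean([t for _, t in sims]).round(2), "s")
```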


2019 ◽  
Vol 43 (8) ◽  
pp. 639-654 ◽  
Author(s):  
Kaiwen Man ◽  
Jeffrey R. Harring ◽  
Hong Jiao ◽  
Peida Zhan

Computer-based testing (CBT) is becoming increasingly popular in assessing test-takers’ latent abilities and making inferences regarding their cognitive processes. In addition to collecting item responses, an important benefit of CBT is that response times (RTs) can also be recorded and used in subsequent analyses. To better understand the structural relations between multidimensional cognitive attributes and the working speed of test-takers, this research proposes a joint-modeling approach that integrates compensatory multidimensional latent traits and response speediness using item responses and RTs. The joint model is cast as a multilevel model in which the structural relation between working speed and accuracy is captured through their variance-covariance structure. The feasibility of this modeling approach is investigated via a Monte Carlo simulation study using a Bayesian estimation scheme. The results indicate that integrating RTs improved the recovery and precision of model parameters. In addition, Programme for International Student Assessment (PISA) 2015 mathematics standard unit items are analyzed to further evaluate the feasibility of the approach in recovering model parameters.
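
The variance-covariance linkage can be sketched as follows: person-level ability dimensions and speed are drawn from one multivariate normal distribution, responses follow a compensatory logistic model, and response times follow a log-normal model. All parameter values below are illustrative assumptions rather than the paper's specifications.

```python
# Hedged sketch of the variance-covariance linkage: two compensatory ability
# dimensions and a speed factor drawn from one multivariate normal, responses
# from a compensatory logistic model, RTs from a log-normal model.
import numpy as np

rng = np.random.default_rng(7)
n_persons, n_items = 500, 15

# Person-level covariance linking ability (theta1, theta2) and speed (tau)
cov = np.array([[1.0, 0.4, 0.3],
                [0.4, 1.0, 0.2],
                [0.3, 0.2, 0.5]])
persons = rng.multivariate_normal(np.zeros(3), cov, n_persons)
theta, tau = persons[:, :2], persons[:, 2]

a = rng.lognormal(0.0, 0.3, (n_items, 2))      # discriminations (2 dimensions)
d = rng.normal(0.0, 1.0, n_items)              # intercepts
beta = rng.normal(4.0, 0.3, n_items)           # time intensities
alpha = rng.lognormal(0.5, 0.2, n_items)       # time precisions

logit = theta @ a.T + d                        # compensatory structure
responses = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
log_rt = rng.normal(beta - tau[:, None], 1.0 / alpha)

# Empirical correlation between overall ability and speed, induced by cov
print(np.corrcoef(theta.mean(axis=1), tau)[0, 1].round(2))
```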


2021 ◽  
pp. 001316442110453
Author(s):  
Gabriel Nagy ◽  
Esther Ulitzsch

Disengaged item responses pose a threat to the validity of the results provided by large-scale assessments. Several procedures for identifying disengaged responses on the basis of observed response times have been suggested, and item response theory (IRT) models for response engagement have been proposed. We outline that response time-based procedures for classifying response engagement and IRT models for response engagement rest on common ideas, and we propose a distinction between independent and dependent latent class IRT models. In all IRT models considered, response engagement is represented by an item-level latent class variable, but the models differ in whether response times are assumed to reflect or to predict engagement. We summarize existing IRT models that belong to each group and extend them to increase their flexibility. Furthermore, we propose a flexible multilevel mixture IRT framework in which all of these IRT models can be estimated by means of marginal maximum likelihood. The framework is based on the widely used Mplus software, thereby making the procedure accessible to a broad audience. The procedures are illustrated with publicly available large-scale data. Our results show that the different IRT models for response engagement provided slightly different adjustments of item parameters and individuals’ proficiency estimates relative to a conventional IRT model.
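
For contrast with the model-based approaches discussed here, the sketch below implements a simple response time-based classification rule: a response is flagged as disengaged when its time falls below a fixed fraction of the item-level median response time. The 10% fraction and the column names are assumptions, and the rule is far simpler than the latent class IRT models summarized in the abstract.

```python
# Minimal sketch of a response-time-threshold classification of disengaged
# (rapid-guessing) responses: flag a response when its RT falls below a
# fraction of that item's median RT. Fraction and column names are assumed.
import pandas as pd

def flag_disengaged(log: pd.DataFrame, fraction: float = 0.10) -> pd.DataFrame:
    """Add a 'disengaged' indicator based on item-level RT thresholds."""
    thresholds = fraction * log.groupby("item_id")["rt_seconds"].transform("median")
    return log.assign(disengaged=log["rt_seconds"] < thresholds)

# Hypothetical usage on a long-format response log with columns
# person_id, item_id, rt_seconds, correct:
# log = pd.read_csv("responses.csv")
# print(flag_disengaged(log)["disengaged"].mean())   # share of flagged responses
```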

