An Application of Item Response Time: The Effort-Moderated IRT Model

2006 ◽  
Vol 43 (1) ◽  
pp. 19-38 ◽  
Author(s):  
Steven L. Wise ◽  
Christine E. DeMars


2019 ◽  
Author(s):  
Kyosuke Bunji ◽  
Kensuke Okada

By combining linear ballistic accumulation (LBA) with item response theory (IRT), this paper proposes a new class of item response models, LBA IRT, which incorporates observed response times through the LBA mechanism. Our main objective is to develop a simple yet effective alternative to the diffusion IRT model, one of the best-known response time (RT)-incorporating IRT models that explicitly represent the psychological process underlying the elicited item response. A simulation study shows that the proposed model recovers parameter estimates comparable to those of the diffusion IRT model while converging much faster. Furthermore, an application to real personality measurement data indicates that the proposed model fits better than the diffusion IRT model in terms of predictive performance. The proposed model thus performs well and shows promise for capturing the cognitive and psychometric processes underlying the observed data.
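To make the mechanism concrete, the Python sketch below simulates one binary item response and its response time from a two-accumulator LBA whose mean drift rates are tied to a 2PL-style linear predictor. The specific drift-trait link, the threshold, and all parameter values are hypothetical illustrations, not the authors' exact parameterization.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_lba_irt_trial(theta, a, b, A=1.0, thresh=2.0, s=1.0):
        # One trial of a two-accumulator LBA whose mean drift rates are
        # tied to a 2PL-style predictor (hypothetical link, for illustration).
        eta = a * (theta - b)                  # 2PL-style linear predictor
        v = np.array([1.0 + eta, 1.0 - eta])   # mean drifts: keyed vs. non-keyed
        drifts = rng.normal(v, s)              # trial-specific drift rates
        while not np.any(drifts > 0):          # some accumulator must finish
            drifts = rng.normal(v, s)
        starts = rng.uniform(0.0, A, size=2)   # uniform start points in [0, A]
        times = np.where(drifts > 0, (thresh - starts) / drifts, np.inf)
        winner = int(np.argmin(times))         # first to reach the threshold
        return winner == 0, float(times[winner])   # (keyed response?, RT)

    resp, rt = simulate_lba_irt_trial(theta=0.5, a=1.2, b=-0.3)

A higher theta relative to the item location b raises the keyed accumulator's mean drift, making the keyed response both more probable and faster; this race is the mechanism through which LBA contributes response times to the IRT model.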


1993 ◽  
Vol 18 (2) ◽  
pp. 131-154 ◽  
Author(s):  
John R. Donoghue ◽  
Nancy L. Allen

This Monte Carlo study examined strategies for forming the matching variable for the Mantel-Haenszel (MH) differential item functioning (DIF) procedure; thin matching on total test score was compared with forms of thick matching, which pool levels of the matching variable. Data were generated using a three-parameter logistic (3PL) item response theory (IRT) model with a common guessing parameter. Number of subjects and test length were manipulated, as were the difficulty, discrimination, and presence/absence of DIF in the studied item. Outcome measures were the transformed log-odds Δ̂MH, its standard error, and the MH chi-square statistic. For short tests (5 or 10 items), thin matching yielded very poor results, with a tendency to falsely identify items as possessing DIF against the reference group. The best methods of thick matching yielded outcome measure values closer to the expected value for non-DIF items, as well as larger values than thin matching when the studied item possessed DIF. Intermediate-length tests yielded similar results for thin matching and the best methods of thick matching. The method of thick matching that performed best depended on the measure used to detect DIF. Both difficulty and discrimination of the studied item were found to have a strong effect on the value of Δ̂MH.
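For reference, the MH statistics used here follow directly from the stratified two-by-two tables. The Python sketch below forms the common odds ratio and maps it onto the ETS delta scale via Δ̂MH = -2.35 ln(α̂MH); the toy counts are invented for illustration.

    import numpy as np

    def mh_d_dif(ref_correct, ref_wrong, foc_correct, foc_wrong):
        # One count array per matching (score) level; thick matching simply
        # pools adjacent levels before the counts are passed in.
        A, B = np.asarray(ref_correct, float), np.asarray(ref_wrong, float)
        C, D = np.asarray(foc_correct, float), np.asarray(foc_wrong, float)
        N = A + B + C + D
        alpha = np.sum(A * D / N) / np.sum(B * C / N)   # MH common odds ratio
        return alpha, -2.35 * np.log(alpha)             # ETS delta scale

    # toy counts over three score levels; negative delta = DIF against the focal group
    alpha, delta = mh_d_dif([40, 60, 80], [60, 40, 20], [30, 55, 75], [70, 45, 25])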


Psihologija ◽  
2012 ◽  
Vol 45 (2) ◽  
pp. 189-207 ◽  
Author(s):  
Bojana Dinic ◽  
Bojan Janicic

The aim of this research was to examine the psychometric properties of the Buss-Perry Aggression Questionnaire (AQ) on a Serbian sample, using the IRT model for graded responses. The AQ contains four subscales: Physical aggression, Verbal aggression, Hostility, and Anger. The sample included 1272 participants of both genders, aged 18 to 68 years, with an average age of 31.39 (SD = 12.63) years. Results of the IRT analysis suggested that the subscales provided the most information in the range of above-average scores, that is, for participants with higher levels of aggressiveness. The exception was the Hostility subscale, which was informative across a wider range of the trait. On the other hand, this subscale contains two items that violate the assumption of homogeneity. Implications for the measurement of aggressiveness are discussed.
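In Samejima's graded response model, each item has cumulative boundary curves, and category probabilities are differences between adjacent boundaries. The Python sketch below illustrates this with invented parameter values; an item with high thresholds, like those found here, separates respondents mainly at above-average trait levels.

    import numpy as np

    def grm_category_probs(theta, a, b):
        # Boundary curves P(X >= k) for ordered thresholds b (length K-1),
        # padded with 1 and 0; category probabilities are their differences.
        theta = np.atleast_1d(np.asarray(theta, float))
        b = np.asarray(b, float)
        cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
        cum = np.hstack([np.ones((len(theta), 1)), cum, np.zeros((len(theta), 1))])
        return cum[:, :-1] - cum[:, 1:]        # each row sums to 1

    # invented item: high thresholds, so response categories discriminate
    # mainly among respondents with above-average aggressiveness
    probs = grm_category_probs([-1.0, 0.0, 1.5], a=2.0, b=[0.5, 1.2, 2.0])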


2021 ◽  
pp. 43-48
Author(s):  
Rosa Fabbricatore ◽  
Francesco Palumbo

Evaluating learners' competencies is a crucial concern in education, and structured tests administered at home or in the classroom represent an effective assessment tool. Structured tests consist of sets of items that can refer to several abilities or more than one topic. Several statistical approaches allow students to be evaluated with the items treated multidimensionally, accounting for their structure. Depending on the evaluation's final aim, the assessment process either assigns a final grade to each student or clusters students into homogeneous groups according to their level of mastery and ability. The latter is a helpful basis for developing tailored recommendations and remediations for each group, and latent class models are the reference approach for it. In the item response theory (IRT) paradigm, multidimensional latent class IRT models, relaxing both the traditional constraints of unidimensionality and of a continuous latent trait, make it possible to detect sub-populations of homogeneous students according to their proficiency level while also accounting for the multidimensional nature of their ability. Moreover, the semi-parametric formulation offers several practical advantages: it avoids normality assumptions that may not hold and reduces the computational demand. This study compares the results of multidimensional latent class IRT models with those obtained by a two-step procedure, which consists of first fitting a multidimensional IRT model to estimate students' ability and then applying a clustering algorithm to classify students accordingly. For the latter, both parametric and non-parametric approaches were considered. Data refer to the admission test for the degree course in psychology administered in 2014 at the University of Naples Federico II. The students involved were N = 944, and their ability dimensions were defined according to the domains assessed by the entrance exam, namely Humanities, Reading and Comprehension, Mathematics, Science, and English. In particular, a multidimensional two-parameter logistic IRT model for dichotomously scored items was used for students' ability estimation.
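A minimal Python sketch of the two-step alternative follows. It assumes the N x 5 matrix of ability estimates from the multidimensional 2PL fit is already available (random numbers stand in for it here) and uses k-means as the clustering step; the number of groups and the use of scikit-learn's KMeans are illustrative choices, not the paper's exact specification.

    import numpy as np
    from sklearn.cluster import KMeans

    # Step 1 (assumed done elsewhere): fit a multidimensional 2PL model and
    # collect an N x 5 matrix of ability estimates, one column per domain
    # (Humanities, Reading and Comprehension, Mathematics, Science, English).
    rng = np.random.default_rng(1)
    theta_hat = rng.normal(size=(944, 5))     # random stand-in for real estimates

    # Step 2: cluster students on their estimated ability profiles.
    groups = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(theta_hat)
    profiles = np.array([theta_hat[groups == g].mean(axis=0) for g in range(4)])
    # 'profiles' holds each group's mean ability per domain, the starting
    # point for tailored recommendations and remediations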


2017 ◽  
Vol 43 (3) ◽  
pp. 259-285 ◽  
Author(s):  
Yang Liu ◽  
Ji Seung Yang

The uncertainty arising from item parameter estimation is often not negligible and must be accounted for when calculating latent variable (LV) scores in item response theory (IRT). This is particularly so when the calibration sample size is limited and/or the calibration IRT model is complex. In the current work, we treat two-stage IRT scoring as a predictive inference problem: The target of prediction is a random variable that follows the true posterior of the LV conditional on the response pattern being scored. Various Bayesian, fiducial, and frequentist prediction intervals of LV scores, which can be obtained from a simple yet generic Monte Carlo recipe, are evaluated and contrasted via simulations based on several measures of prediction quality. An empirical data example is also presented to illustrate the use of candidate methods.
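The generic Monte Carlo recipe can be sketched for a 2PL model as follows: repeatedly draw item parameters from an approximation to the calibration posterior, draw LV values from the posterior given the response pattern under each draw, and take percentiles of the pooled draws. The Python below is a sketch under an assumed normal approximation to the calibration posterior; the Bayesian, fiducial, and frequentist intervals studied in the paper differ in how the item-parameter draws are generated.

    import numpy as np

    rng = np.random.default_rng(2)

    def lv_prediction_interval(y, item_mean, item_cov,
                               n_item_draws=200, n_theta_draws=50):
        # y: 0/1 response pattern; item_mean stacks (a_j, b_j) pairs and
        # item_cov is their estimated covariance from calibration.
        grid = np.linspace(-4.0, 4.0, 201)
        draws = []
        for _ in range(n_item_draws):
            pars = rng.multivariate_normal(item_mean, item_cov).reshape(-1, 2)
            a, b = pars[:, 0], pars[:, 1]
            p = 1.0 / (1.0 + np.exp(-a * (grid[:, None] - b)))      # 2PL curves
            loglik = (np.log(p) * y + np.log1p(-p) * (1 - y)).sum(axis=1)
            post = np.exp(loglik - loglik.max()) * np.exp(-grid**2 / 2)  # N(0,1) prior
            post /= post.sum()
            draws.append(rng.choice(grid, size=n_theta_draws, p=post))
        return np.percentile(np.concatenate(draws), [2.5, 97.5])    # 95% interval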


2006 ◽  
Vol 31 (1) ◽  
pp. 63-79 ◽  
Author(s):  
Henry May

A new method is presented and implemented for deriving a scale of socioeconomic status (SES) from international survey data using a multilevel Bayesian item response theory (IRT) model. The proposed model incorporates both international anchor items and nation-specific items and is able to (a) produce student family SES scores that are internationally comparable, (b) reduce the influence of irrelevant national differences in culture on the SES scores, and (c) effectively and efficiently deal with the problem of missing data in a manner similar to Rubin’s (1987) multiple imputation approach. The results suggest that this model is superior to conventional models in terms of its fit to the data and its ability to use information collected via international surveys.
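A hierarchical sketch of this kind of model is shown below in Python with PyMC. The variable names, priors, toy index arrays, and single student level are illustrative assumptions; the actual model additionally distinguishes international anchor items from nation-specific items, which is omitted here for brevity.

    import numpy as np
    import pymc as pm

    # Toy dimensions and index arrays; in practice these come from the survey data.
    rng = np.random.default_rng(3)
    n_countries, n_students, n_items, n_obs = 4, 200, 10, 2000
    country = rng.integers(n_countries, size=n_students)   # student -> country
    student = rng.integers(n_students, size=n_obs)         # observation -> student
    item = rng.integers(n_items, size=n_obs)               # observation -> item
    y = rng.integers(2, size=n_obs)                        # binary SES indicators

    with pm.Model() as ses_irt:
        mu_c = pm.Normal("mu_c", 0.0, 1.0, shape=n_countries)   # country SES means
        sigma = pm.HalfNormal("sigma", 1.0)
        ses = pm.Normal("ses", mu_c[country], sigma, shape=n_students)
        a = pm.LogNormal("a", 0.0, 0.5, shape=n_items)          # item discriminations
        b = pm.Normal("b", 0.0, 1.0, shape=n_items)             # item difficulties
        pm.Bernoulli("obs", logit_p=a[item] * (ses[student] - b[item]), observed=y)
        # observations with missing responses are simply dropped from y/student/item;
        # the posterior for ses then integrates over the resulting uncertainty,
        # much as multiple imputation would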


1983 ◽  
Vol 8 (2) ◽  
pp. 137-156 ◽  
Author(s):  
Nancy S. Petersen ◽  
Linda L. Cook ◽  
Martha L. Stocking

Scale drift for the verbal and mathematical portions of the Scholastic Aptitude Test (SAT) was investigated using linear, equipercentile, and item response theory (IRT) equating methods. The linear methods investigated were the Tucker, Levine Equally Reliable, and Levine Unequally Reliable models. Three IRT calibration designs were employed: (1) concurrent calibration, (2) the fixed-b's method, and (3) the characteristic curve transformation method. The results of the various equating methods were compared both graphically and analytically. These results indicated that for reasonably parallel tests, linear equating methods perform adequately. However, when tests differ somewhat in content and length, methods based on the three-parameter logistic IRT model lead to greater stability of equating results. Of the conventional equating methods investigated, the Levine Equally Reliable model appears to be the most robust for the type of equating situation used in this study. The IRT method that provided the most stable equating results overall was the concurrent calibration method.
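Of the designs listed, the characteristic curve transformation method (the Stocking-Lord approach) is the most self-contained to sketch: it chooses slope A and intercept B so that the test characteristic curve of the transformed new-form parameters matches that of the old form over a grid of ability values. The Python below is a sketch for the common (anchor) items of a 3PL test; the grid, the D = 1.7 scaling constant, and the optimizer are conventional choices rather than details from this study.

    import numpy as np
    from scipy.optimize import minimize

    def p3pl(theta, a, b, c):
        # 3PL item characteristic curves evaluated over an ability grid
        return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta[:, None] - b)))

    def stocking_lord(a_new, b_new, c_new, a_old, b_old, c_old):
        # Find slope A and intercept B (theta_old = A * theta_new + B) by
        # matching the test characteristic curves of the common items;
        # transformed parameters are a/A, A*b + B, with c unchanged.
        grid = np.linspace(-4.0, 4.0, 81)
        tcc_old = p3pl(grid, a_old, b_old, c_old).sum(axis=1)
        def loss(x):
            A, B = x
            tcc_new = p3pl(grid, a_new / A, A * b_new + B, c_new).sum(axis=1)
            return np.sum((tcc_old - tcc_new) ** 2)
        return minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead").x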

