Evaluating Anchor-Item Designs for Concurrent Calibration With the GGUM

2016 ◽  
Vol 41 (2) ◽  
pp. 83-96 ◽  
Author(s):  
Seang-Hwane Joo ◽  
Philseok Lee ◽  
Stephen Stark

Concurrent calibration using anchor items has proven to be an effective alternative to separate calibration and linking for developing large item banks, which are needed to support continuous testing. In principle, anchor-item designs and estimation methods that have proven effective with dominance item response theory (IRT) models, such as the 3PL model, should also lead to accurate parameter recovery with ideal point IRT models, but surprisingly little research has been devoted to this issue. This study, therefore, had two purposes: (a) to develop software for concurrent calibration with what is now the most widely used ideal point model, the generalized graded unfolding model (GGUM); and (b) to compare the efficacy of different GGUM anchor-item designs and develop empirically based guidelines for practitioners. A Monte Carlo study was conducted to compare the efficacy of three anchor-item designs in vertical and horizontal linking scenarios. The authors found that a block-interlaced design provided the best parameter recovery in nearly all conditions. The implications of these findings for concurrent calibration with the GGUM and practical recommendations for pretest designs involving ideal point computer adaptive testing (CAT) applications are discussed.
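For readers less familiar with the GGUM, the short sketch below computes its category response probabilities for one person-item pair, following the standard Roberts, Donoghue, and Laughlin parameterization; the function name and parameter values are illustrative only and are not taken from the authors' software.

```python
import numpy as np

def ggum_probs(theta, alpha, delta, taus):
    """Category response probabilities under the GGUM for one person-item pair.

    theta : person location on the latent trait
    alpha : item discrimination
    delta : item location
    taus  : the C subjective-category thresholds (tau_1 .. tau_C);
            tau_0 = 0 is prepended internally.
    Returns an array of probabilities for observed categories z = 0 .. C.
    """
    C = len(taus)
    M = 2 * C + 1
    cum_tau = np.cumsum(np.concatenate(([0.0], taus)))  # sum_{k=0}^{z} tau_k
    z = np.arange(C + 1)
    dist = theta - delta
    num = (np.exp(alpha * (z * dist - cum_tau)) +
           np.exp(alpha * ((M - z) * dist - cum_tau)))
    return num / num.sum()

# Example: a 4-category item whose location is close to the person's ideal point
print(ggum_probs(theta=0.2, alpha=1.2, delta=0.0, taus=np.array([-1.5, -1.0, -0.5])))
```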

2021 ◽  
pp. 014662162110138
Author(s):  
Joseph A. Rios ◽  
James Soland

Suboptimal effort is a major threat to valid score-based inferences. While the effects of such behavior have been frequently examined in the context of mean group comparisons, minimal research has considered its effects on individual score use (e.g., identifying students for remediation). Focusing on the latter context, this study addressed two related questions via simulation and applied analyses. First, we investigated how much including noneffortful responses in scoring using a three-parameter logistic (3PL) model affects person parameter recovery and classification accuracy for noneffortful responders. Second, we explored whether improvements in these individual-level inferences were observed when employing the Effort Moderated IRT (EM-IRT) model under conditions in which its assumptions were met and violated. Results demonstrated that including 10% noneffortful responses in scoring led to average bias in ability estimates and misclassification rates by as much as 0.15 SDs and 7%, respectively. These results were mitigated when employing the EM-IRT model, particularly when model assumptions were met. However, once model assumptions were violated, the EM-IRT model’s performance deteriorated, though still outperforming the 3PL model. Thus, findings from this study show that (a) including noneffortful responses when using individual scores can lead to potentially unfounded inferences and potential score misuse, and (b) the negative impact that noneffortful responding has on person ability estimates and classification accuracy can be mitigated by employing the EM-IRT model, particularly when its assumptions are met.
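For context, the effort-moderated model referenced here treats responses flagged as noneffortful (typically via a response-time threshold) as uninformative about ability instead of scoring them with the 3PL. A minimal sketch of that idea, with hypothetical parameter names and a fixed guessing floor assumed purely for illustration:

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) item response function."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def em_irt_prob(theta, a, b, c, effortful, n_options=4):
    """Effort-moderated response probability: the 3PL applies to responses
    flagged as effortful; responses flagged as noneffortful are modeled as
    random guesses, so they carry no information about theta."""
    if effortful:
        return p_3pl(theta, a, b, c)
    return 1.0 / n_options  # constant in theta

# Same response, scored with and without the effort flag
print(em_irt_prob(theta=-0.5, a=1.1, b=0.3, c=0.2, effortful=True))
print(em_irt_prob(theta=-0.5, a=1.1, b=0.3, c=0.2, effortful=False))
```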


2008 ◽  
Vol 24 (1) ◽  
pp. 65-77 ◽  
Author(s):  
Anke M. Weekers ◽  
Rob R. Meijer

Stark, Chernyshenko, Drasgow, and Williams (2006) and Chernyshenko, Stark, Drasgow, and Roberts (2007) suggested that unfolding item response theory (IRT) models are important alternatives to dominance IRT models to describe the response processes on self-report personality inventories. To obtain more insight into the structure of personality data, we investigated whether dominance or unfolding IRT models are a better description of the response processes on personality trait inventories constructed using dominance response processes or ideal-point response processes. Data from 866 adolescents on a Dutch personality inventory, the NPV-J (Luteijn, van Dijk, & Barelds, 2005), and from 704 adolescents on a Dutch translation of an Order scale (Chernyshenko et al., 2007) were used. Results from Stark et al. (2006) and Chernyshenko et al. (2007) were partly supported. The self-report inventory that was constructed using dominance response processes (NPV-J) consisted mostly of items with monotonically increasing item response functions (IRFs), but some IRFs were single-peaked. The Order scale (constructed on the basis of ideal-point response processes) consisted of items with monotonically increasing, decreasing, and single-peaked IRFs. Further implications for personality test construction are discussed.
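To make the dominance-versus-unfolding contrast concrete, the toy computation below compares a monotonically increasing 2PL IRF with a single-peaked ideal-point IRF based on a simple squared-distance kernel; the parameter values are illustrative and are not estimates from the NPV-J or the Order scale.

```python
import numpy as np

theta = np.linspace(-3, 3, 7)
item_location = 0.0

# Dominance (2PL-type) IRF: endorsement probability rises monotonically with theta
dominance = 1.0 / (1.0 + np.exp(-1.5 * (theta - item_location)))

# Ideal-point IRF (squared-distance kernel): endorsement peaks at the item
# location and falls off as the person moves away in either direction
ideal_point = np.exp(-0.8 * (theta - item_location) ** 2)

for t, d, u in zip(theta, dominance, ideal_point):
    print(f"theta={t:+.1f}  dominance={d:.2f}  ideal_point={u:.2f}")
```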


2020 ◽  
Vol 8 (1) ◽  
pp. 5 ◽  
Author(s):  
Paul-Christian Bürkner

Raven’s Standard Progressive Matrices (SPM) test and related matrix-based tests are widely applied measures of cognitive ability. Using Bayesian item response theory (IRT) models, I reanalyzed data from an SPM short form proposed by Myszkowski and Storme (2018) and, at the same time, illustrated the application of these models. Results indicate that a three-parameter logistic (3PL) model is sufficient to describe participants’ dichotomous responses (correct vs. incorrect), while persons’ ability parameters are quite robust across IRT models of varying complexity. These conclusions are in line with the original results of Myszkowski and Storme (2018). Using Bayesian as opposed to frequentist IRT models offered advantages in the estimation of more complex (i.e., 3–4PL) IRT models and provided more sensible and robust uncertainty estimates.
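The original analyses fit Bayesian IRT models in a probabilistic programming framework; the sketch below specifies a comparable Bayesian 3PL in PyMC rather than reproducing the author's code, with simulated placeholder data and priors chosen purely for illustration.

```python
import numpy as np
import pymc as pm

# Placeholder data: 0/1 responses of n_persons to n_items, in long format
rng = np.random.default_rng(1)
n_persons, n_items = 200, 10
person_idx = np.repeat(np.arange(n_persons), n_items)
item_idx = np.tile(np.arange(n_items), n_persons)
y = rng.integers(0, 2, size=n_persons * n_items)

with pm.Model() as bayes_3pl:
    theta = pm.Normal("theta", 0.0, 1.0, shape=n_persons)  # person ability
    a = pm.HalfNormal("a", 1.0, shape=n_items)              # discrimination
    b = pm.Normal("b", 0.0, 2.0, shape=n_items)             # difficulty
    c = pm.Beta("c", 2.0, 10.0, shape=n_items)              # guessing

    p = c[item_idx] + (1 - c[item_idx]) * pm.math.invlogit(
        a[item_idx] * (theta[person_idx] - b[item_idx]))
    pm.Bernoulli("y", p=p, observed=y)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```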


2019 ◽  
Vol 80 (2) ◽  
pp. 293-311
Author(s):  
Igor Himelfarb ◽  
Katerina M. Marcoulides ◽  
Guoliang Fang ◽  
Bruce L. Shotts

The chiropractic clinical competency examination uses groups of items that are integrated by a common case vignette. The nature of the vignette items violates the assumption of local independence for items nested within a vignette. This study examines via simulation a new algorithmic approach for addressing the local independence violation problem using a two-level alternating directions testlet model. Parameter values for item difficulty, discrimination, test-taker ability, and test-taker secondary abilities associated with a particular testlet are generated, and parameter recovery through Markov chain Monte Carlo Bayesian methods and generalized maximum likelihood estimation methods is compared. To aid with the complex computational effort, the TensorFlow platform is used. Both estimation methods provided satisfactory parameter recovery, although the Bayesian methods were found to be somewhat superior in recovering item discrimination parameters. The practical significance of the results is discussed in relation to obtaining accurate estimates of item, test, and ability parameters, as well as measurement reliability information.
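For context, a generic testlet extension of the 3PL adds a person-by-testlet random effect that absorbs the local dependence induced by a shared vignette; the formulation below follows the common Bradlow-Wainer-Wang parameterization and is not necessarily the exact two-level model estimated in this study.

$$P(y_{ij}=1 \mid \theta_j) \;=\; c_i + (1 - c_i)\,\frac{1}{1 + \exp\!\left\{-a_i\left[\theta_j - b_i - \gamma_{j\,d(i)}\right]\right\}}, \qquad \gamma_{j\,d(i)} \sim N\!\left(0, \sigma^2_{d(i)}\right),$$

where $d(i)$ indexes the vignette (testlet) containing item $i$, and $\gamma_{j\,d(i)}$ is examinee $j$'s testlet-specific effect whose variance $\sigma^2_{d(i)}$ quantifies the strength of the within-vignette dependence.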


2018 ◽  
Vol 43 (3) ◽  
pp. 195-210 ◽  
Author(s):  
Chen-Wei Liu ◽  
Wen-Chung Wang

It is commonly known that respondents exhibit different response styles when responding to Likert-type items. For example, some respondents tend to select the extreme categories (e.g., strongly disagree and strongly agree), whereas some tend to select the middle categories (e.g., disagree, neutral, and agree). Furthermore, some respondents tend to disagree with every item (e.g., strongly disagree and disagree), whereas others tend to agree with every item (e.g., agree and strongly agree). In such cases, fitting standard unfolding item response theory (IRT) models that assume no response style will yield a poor fit and biased parameter estimates. Although there have been attempts to develop dominance IRT models to accommodate the various response styles, such models are usually restricted to a specific response style and cannot be used for unfolding data. In this study, a general unfolding IRT model is proposed that can be combined with a softmax function to accommodate various response styles via scoring functions. The parameters of the new model can be estimated using Bayesian Markov chain Monte Carlo algorithms. An empirical data set is used for demonstration purposes, followed by simulation studies to assess the parameter recovery of the new model, as well as the consequences of ignoring the impact of response styles on parameter estimates by fitting standard unfolding IRT models. The results suggest that the new model exhibits good parameter recovery, whereas ignoring response styles leads to seriously biased estimates.
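To illustrate the general idea of combining an unfolding kernel with response-style scoring functions through a softmax, the toy sketch below shifts category utilities toward the extreme categories for a person with an extreme response style; the specific utility functions, scoring vector, and parameter values are placeholders for illustration and are not the authors' parameterization.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

# Toy 5-point Likert item (0 = strongly disagree ... 4 = strongly agree).
# Each category is mapped to the person-item distance it "expresses":
# strong agreement ~ distance 0, strong disagreement ~ a large distance.
expressed_distance = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
extreme_scoring = np.array([1.0, 0.0, 0.0, 0.0, 1.0])  # scoring function for extremes

def category_probs(theta, delta, alpha, omega):
    unfolding_util = -alpha * (abs(theta - delta) - expressed_distance) ** 2
    style_util = omega * extreme_scoring  # omega > 0: extreme response style
    return softmax(unfolding_util + style_util)

# A person near the item, without (omega=0) and with (omega=1.5) an extreme style
print(category_probs(theta=0.2, delta=0.0, alpha=1.0, omega=0.0))
print(category_probs(theta=0.2, delta=0.0, alpha=1.0, omega=1.5))
```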


2011 ◽  
Vol 35 (4) ◽  
pp. 280-295 ◽  
Author(s):  
Louis Tay ◽  
Usama S. Ali ◽  
Fritz Drasgow ◽  
Bruce Williams

2018 ◽  
Vol 43 (2) ◽  
pp. 172-173 ◽  
Author(s):  
Jorge N. Tendeiro ◽  
Sebastian Castro-Alvarez

In this article, the newly created GGUM R package is presented. This package finally brings the generalized graded unfolding model (GGUM) to the front stage for practitioners and researchers. It makes it possible to fit this type of item response theory (IRT) model in settings that were previously out of reach, beyond the limitations imposed by the widespread GGUM2004 software. The outcome is therefore a unique software package, not limited by the dimensions of the data matrix or the operating system used. It includes various routines that allow fitting the model, checking model fit, plotting the results, and also interacting with GGUM2004 for those interested. The software should be of interest to anyone interested in IRT in general or in ideal point models in particular.


2021 ◽  
pp. 107699862110571
Author(s):  
Kuan-Yu Jin ◽  
Yi-Jhen Wu ◽  
Hui-Fang Chen

For surveys of complex issues that entail multiple steps, multiple reference points, and nongradient attributes (e.g., social inequality), this study proposes a new multiprocess model that integrates ideal-point and dominance approaches into a treelike structure (IDtree). In the IDtree, an ideal-point approach describes an individual’s attitude, and a dominance approach then describes their tendency to use extreme response categories. Evaluations on two empirical data sets showed that the IDtree fit the data better than competing models. Furthermore, simulation studies showed satisfactory parameter recovery for the IDtree. Thus, the IDtree model sheds light on the response processes underlying a multistage structure.
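Schematically, a tree model of this kind factors each observed category into sequential pseudo-decisions. The decomposition below is a generic two-node illustration for a four-point scale (direction node modeled with an ideal-point function, extremity node with a dominance function); it is not the exact IDtree specification.

$$P(Y=\text{strongly agree}) = P_{\text{dir}}(\text{agree}\mid\theta,\delta)\,P_{\text{ext}}(\text{extreme}\mid\eta), \qquad P(Y=\text{agree}) = P_{\text{dir}}(\text{agree}\mid\theta,\delta)\,\bigl[1-P_{\text{ext}}(\text{extreme}\mid\eta)\bigr],$$

where $P_{\text{dir}}$ is single-peaked in the person-item distance $|\theta-\delta|$ (the ideal-point node for attitude direction) and $P_{\text{ext}}$ is a monotone dominance function of a separate extremity trait $\eta$.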

