The Delta-Scoring Method of Tests With Binary Items: A Note on True Score Estimation and Equating

2017 ◽  
Vol 78 (5) ◽  
pp. 805-825 ◽  
Author(s):  
Dimiter M. Dimitrov

This article presents some new developments in the methodology of an approach to scoring and equating of tests with binary items, referred to as delta scoring (D-scoring), which is being piloted with large-scale assessments at the National Center for Assessment in Saudi Arabia. This presentation builds on previous work on delta scoring and adds procedures for scaling and equating, an item response function, and estimation of true values and standard errors of D scores. Also, unlike the previous work on this topic, where D-scoring involves estimates of item and person parameters in the framework of item response theory, the approach presented here does not require item response theory calibration.
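For orientation, the sketch below illustrates a difficulty-weighted scoring rule in the spirit of D-scoring: each correct response is weighted by an item-difficulty weight and the total is normalized to a 0-1 scale. The specific weighting (proportion incorrect in the sample) is an illustrative assumption, not necessarily the article's exact formulation, which also covers equating, item response functions, and standard errors.

```python
import numpy as np

def d_scores(responses, difficulties=None):
    """Difficulty-weighted proportion scores on a 0-1 scale.

    responses    : (n_persons, n_items) binary matrix (1 = correct).
    difficulties : per-item difficulty weights; if None, the proportion
                   of incorrect answers in this sample is used (assumption).
    """
    responses = np.asarray(responses, dtype=float)
    if difficulties is None:
        difficulties = 1.0 - responses.mean(axis=0)   # harder item -> larger weight
    difficulties = np.asarray(difficulties, dtype=float)
    return responses @ difficulties / difficulties.sum()

# toy example: 4 persons, 3 items
X = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 1],
              [0, 0, 0]])
print(d_scores(X))   # scores bounded between 0 and 1
```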

2001 ◽  
Vol 26 (1) ◽  
pp. 31-50 ◽  
Author(s):  
Haruhiko Ogasawara

The asymptotic standard errors of the estimates of the equated scores by several types of item response theory (IRT) true score equatings are provided. The first group of equatings does not use IRT equating coefficients. The second group uses the IRT equating coefficients given by the moment or characteristic curve methods. The equating designs considered in this article cover those with internal or external common items and the methods with separate or simultaneous estimation of item parameters of associated tests. For the estimates of the asymptotic standard errors of the equated true scores, item parameters are estimated by marginal maximum likelihood.
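As context for the quantities whose standard errors are derived, the sketch below shows the standard IRT true-score equating transformation for two forms on a common theta scale: solve the test characteristic curve of Form X for theta at a given number-correct score, then evaluate the Form Y characteristic curve at that theta. It assumes 2PL items with hypothetical parameters; the delta-method standard errors themselves are beyond this short example.

```python
import numpy as np
from scipy.optimize import brentq

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def true_score(theta, a, b):
    """Test characteristic curve: expected number-correct score at theta."""
    return p2pl(theta, a, b).sum()

def irt_true_score_equate(x, aX, bX, aY, bY, lo=-8.0, hi=8.0):
    """Map a number-correct score x on Form X to the Form Y scale.

    Solves TCC_X(theta) = x for theta, then returns TCC_Y(theta).
    Assumes both forms' parameters are already on a common theta scale.
    """
    # scores at (or beyond) the extremes have no interior solution
    if x <= true_score(lo, aX, bX):
        return true_score(lo, aY, bY)
    if x >= true_score(hi, aX, bX):
        return true_score(hi, aY, bY)
    theta = brentq(lambda t: true_score(t, aX, bX) - x, lo, hi)
    return true_score(theta, aY, bY)

# toy forms with 5 items each (hypothetical parameters)
aX, bX = np.array([1.0, 1.2, 0.8, 1.5, 1.1]), np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
aY, bY = np.array([0.9, 1.3, 1.0, 1.4, 1.0]), np.array([-1.2, -0.4, 0.0, 0.9, 1.3])
for x in range(1, 5):
    print(x, round(irt_true_score_equate(x, aX, bX, aY, bY), 3))
```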


2020 ◽  
Vol 2 (1) ◽  
pp. 90-105
Author(s):  
Jimmy Y. Zhong

Focusing on 12 allocentric/survey-based strategy items of the Navigation Strategy Questionnaire (Zhong & Kozhevnikov, 2016), the current study applied item response theory-based analysis to determine whether a bidimensional model could better describe the latent structure of the survey-based strategy. Results from item and model fit diagnostics, categorical response curves, and item information curves showed that the item with the lowest rotated component loading (.27), SURVEY12, could be considered for exclusion in future studies, and that a bidimensional model with three preference-related items constituting a content factor offered a better representation of the latent structure than a unidimensional model per se. Mean scores from these three items also correlated significantly with a pointing-to-landmarks task, to the same relative magnitude as the mean scores from all items and from all items excluding SURVEY12. These findings gave early evidence suggesting that the three preference-related items could constitute a subscale for deriving quick estimates of large-scale allocentric spatial processing in healthy adults in both experimental and clinical settings. Potential cognitive and brain mechanisms were discussed, followed by calls for future studies to gather further evidence confirming the predictive validity of the full scale and subscale, along with the design of new items focusing on environmental familiarity.
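The categorical response and item information curves mentioned above can be computed directly from graded response model (GRM) parameters. The sketch below assumes a GRM parameterization and hypothetical parameters for a single 5-point item; it illustrates the diagnostics rather than reproducing the reported analysis.

```python
import numpy as np

def grm_category_probs(theta, a, b):
    """Category probabilities for one graded response model item.

    theta : array of ability values
    a     : discrimination
    b     : ordered thresholds (K-1 of them for K categories)
    """
    theta = np.atleast_1d(theta)
    # cumulative P(X >= k) for k = 1..K-1, padded with 1 and 0
    pstar = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    cum = np.hstack([np.ones((theta.size, 1)), pstar, np.zeros((theta.size, 1))])
    return cum[:, :-1] - cum[:, 1:]           # shape (n_theta, K)

def grm_item_information(theta, a, b):
    """Fisher information I(theta) = sum_k P_k'(theta)^2 / P_k(theta)."""
    theta = np.atleast_1d(theta)
    pstar = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    cum = np.hstack([np.ones((theta.size, 1)), pstar, np.zeros((theta.size, 1))])
    dcum = a * cum * (1.0 - cum)              # derivative of each cumulative curve
    probs = cum[:, :-1] - cum[:, 1:]
    dprobs = dcum[:, :-1] - dcum[:, 1:]
    return (dprobs ** 2 / probs).sum(axis=1)

grid = np.linspace(-3, 3, 7)
print(grm_category_probs(grid, a=1.4, b=[-1.0, 0.0, 1.2, 2.0]))   # 5-point item
print(grm_item_information(grid, a=1.4, b=[-1.0, 0.0, 1.2, 2.0]))
```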


2019 ◽  
Vol 80 (1) ◽  
pp. 91-125
Author(s):  
Stella Y. Kim ◽  
Won-Chan Lee ◽  
Michael J. Kolen

A theoretical and conceptual framework for true-score equating using a simple-structure multidimensional item response theory (SS-MIRT) model is developed. A true-score equating method, referred to as the SS-MIRT true-score equating (SMT) procedure, is also developed. SS-MIRT has several advantages over other, more complex multidimensional item response theory models, including improved efficiency in estimation and straightforward interpretability. The performance of the SMT procedure was examined and evaluated through four studies using different data types. In these studies, results from the SMT procedure were compared with results from four other equating methods to assess the relative benefits of SMT over the other procedures. In general, SMT showed more accurate equating results than traditional unidimensional IRT (UIRT) equating when the data were multidimensional. The more accurate performance of SMT over UIRT true-score equating was consistently observed across the studies, which supports the benefits of a multidimensional approach in equating for multidimensional data. Also, SMT performed similarly to an SS-MIRT observed-score method across all studies.
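Under simple structure, each item loads on exactly one dimension, so the test characteristic function is simply a sum of per-dimension characteristic curves. The sketch below illustrates that expected number-correct score under assumed 2PL parameters; the full SMT procedure adds the equating machinery described in the article and is not reproduced here.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ss_mirt_true_score(thetas, a, b, dim_of_item):
    """Expected number-correct score under a simple-structure MIRT model.

    Each item loads on exactly one dimension, so its probability depends
    only on the theta for that dimension.
    """
    thetas = np.asarray(thetas, dtype=float)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dim_of_item = np.asarray(dim_of_item)
    return p2pl(thetas[dim_of_item], a, b).sum()

# toy test: 6 items, items 0-2 on dimension 0, items 3-5 on dimension 1
a = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
b = [-0.5, 0.0, 0.7, -0.2, 0.4, 1.0]
dims = [0, 0, 0, 1, 1, 1]
print(ss_mirt_true_score([0.3, -0.1], a, b, dims))
```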


Author(s):  
Heon-Jae Jeong ◽  
Hsun-Hsiang Liao ◽  
Su Ha Han ◽  
Wui-Chiang Lee

Patient safety culture is important in preventing medical errors, and many instruments have been developed to measure it. Yet few studies focus on the data processing step. This study, by analyzing the Chinese version of the Safety Attitudes Questionnaire dataset containing 37,163 questionnaires collected in Taiwan, found critical issues with the currently used mean-scoring method: the instrument, like other popular ones, uses a 5-point Likert scale, and because the scale is ordinal, mean scores are not strictly meaningful. Instead, item response theory (IRT) was applied. Construct validity was satisfactory, and the item properties of the instrument were estimated through confirmatory factor analysis. The IRT-based domain scores and mean domain scores of each respondent were estimated and compared. In terms of resolution, the mean approach yielded only around 20 unique values on a 0 to 100 scale for each domain, whereas the IRT method yielded at least 440 unique values. Moreover, IRT scores ranged widely at each unique mean score, meaning that the mean approach was less precise. The theoretical soundness and empirical strength of IRT suggest that healthcare institutions should adopt IRT as a new scoring method, which is the core step of processing collected data.
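The resolution argument can be seen with a small numerical example: two response patterns with identical item means can receive different IRT-based scores, because IRT weights items by their measurement properties. The sketch below uses a generic graded response model with EAP scoring and hypothetical item parameters, which is an assumption; the article's scoring was built on its own confirmatory factor analysis results.

```python
import numpy as np

def grm_probs(theta, a, b):
    """Graded response model category probabilities at a single theta."""
    pstar = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b))))
    cum = np.concatenate(([1.0], pstar, [0.0]))
    return cum[:-1] - cum[1:]

def eap_score(pattern, a_list, b_list, grid=None):
    """EAP ability estimate for one response pattern (0-based categories)."""
    if grid is None:
        grid = np.linspace(-4, 4, 81)
    prior = np.exp(-0.5 * grid ** 2)          # standard normal prior (unnormalised)
    like = np.ones_like(grid)
    for x, a, b in zip(pattern, a_list, b_list):
        like *= np.array([grm_probs(t, a, b)[x] for t in grid])
    post = like * prior
    return (grid * post).sum() / post.sum()

# three hypothetical 5-point items with different discriminations
a_list = [2.0, 1.0, 0.6]
b_list = [[-1.5, -0.5, 0.5, 1.5]] * 3

# two response patterns with the SAME mean raw score (categories 0..4)
p1 = [4, 2, 0]   # strong agreement on the most discriminating item
p2 = [0, 2, 4]   # strong agreement on the least discriminating item
print(np.mean(p1), np.mean(p2))                                       # identical means
print(eap_score(p1, a_list, b_list), eap_score(p2, a_list, b_list))   # different IRT scores
```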


2019 ◽  
Vol 80 (3) ◽  
pp. 461-475
Author(s):  
Lianne Ippel ◽  
David Magis

In the dichotomous item response theory (IRT) framework, the asymptotic standard error (ASE) is the most common statistic for evaluating the precision of various ability estimators. Easy-to-use ASE formulas are readily available; however, the accuracy of some of these formulas was recently questioned, and new ASE formulas were derived from a general asymptotic theory framework. Furthermore, exact standard errors have been suggested to better evaluate the precision of ability estimators, especially with short tests for which the asymptotic framework is invalid. Unfortunately, the accuracy of exact standard errors has so far been assessed only in a very limited setting. The purpose of this article is to perform a global comparison of exact versus (classical and new formulations of) asymptotic standard errors for a wide range of usual IRT ability estimators and IRT models, and with short tests. Results indicate that exact standard errors globally outperform the ASE versions in terms of reduced bias and root mean square error, while the new ASE formulas are also globally less biased than their classical counterparts. Further discussion of the usefulness and practical computation of exact standard errors is provided.
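For a short dichotomous test, an exact standard error can be obtained by enumerating all response patterns and computing the spread of the ability estimator over their probabilities at a given theta, whereas the classical ASE is one over the square root of the test information. The sketch below contrasts the two for a hypothetical five-item 2PL test with a bounded ML estimator; it follows the general idea rather than the specific formulations compared in the article.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize_scalar

def p2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def ml_estimate(x, a, b, bound=4.0):
    """Bounded ML ability estimate for one binary response pattern."""
    x = np.asarray(x, dtype=float)
    def negloglik(theta):
        p = p2pl(theta, a, b)
        return -(x * np.log(p) + (1 - x) * np.log(1 - p)).sum()
    return minimize_scalar(negloglik, bounds=(-bound, bound), method="bounded").x

def asymptotic_se(theta, a, b):
    """Classical ASE: 1 / sqrt(test information) for the 2PL."""
    p = p2pl(theta, a, b)
    info = (a ** 2 * p * (1 - p)).sum()
    return 1.0 / np.sqrt(info)

def exact_se(theta_true, a, b):
    """SD of the (bounded) ML estimator over all response patterns at theta_true."""
    p = p2pl(theta_true, a, b)
    ests, probs = [], []
    for pattern in product([0, 1], repeat=len(a)):
        x = np.array(pattern)
        probs.append(np.prod(np.where(x == 1, p, 1 - p)))
        ests.append(ml_estimate(x, a, b))
    ests, probs = np.array(ests), np.array(probs)
    mean = (probs * ests).sum()
    return np.sqrt((probs * (ests - mean) ** 2).sum())

a = np.array([1.2, 0.9, 1.5, 1.0, 0.8])     # hypothetical short 5-item test
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.4])
theta0 = 0.5
print("ASE at theta=0.5:", round(asymptotic_se(theta0, a, b), 3))
print("exact SE at theta=0.5:", round(exact_se(theta0, a, b), 3))
```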

