True Score Equating
Recently Published Documents

TOTAL DOCUMENTS: 27 (FIVE YEARS: 7)
H-INDEX: 7 (FIVE YEARS: 1)

2021 · Vol 12
Author(s): Patrícia Silva Lúcio, Fausto Coutinho Lourenço, Hugo Cogo-Moreira, Deborah Bandalos, Carolina Alves Ferreira de Carvalho, et al.

Equating is used to directly compare alternate forms of tests. We describe the equating of two alternate forms of a reading comprehension test for Brazilian children (2nd to 5th grade), Form A (n = 427) and Form B (n = 321). We employed a nonequivalent random groups design with internal anchor items. Local independence was assessed via Pearson bivariate correlations of the standardized residuals. First, from 176 items, we selected 42 for each form (33 unique and 9 in common) using the two-parameter logistic (2PL) model, a unidimensional item response theory (IRT) model. Using the equateIRT package for R, the anchor items were used to link the two forms. Linking coefficients were estimated under two methods (Haebara and Stocking–Lord), and scores were then equated by two methods: observed score equating (OSE) and true score equating (TSE). We provide age-specific reference intervals for the sample. The final version was informative across a wide range of theta abilities. We conclude that the forms can be used interchangeably.
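
The linking-and-equating chain the abstract describes can be illustrated compactly. Below is a minimal Python sketch of Stocking–Lord linking followed by true score equating under the 2PL model; the handful of anchor-item parameters is invented for illustration (the study itself used the equateIRT package in R, and full forms would have far more items).

```python
import numpy as np
from scipy.optimize import minimize, brentq

def p2pl(theta, a, b):
    """2PL response probabilities on a grid of theta values."""
    return 1.0 / (1.0 + np.exp(-a * (np.asarray(theta)[:, None] - b)))

# Hypothetical anchor-item parameters, each on its own form's scale.
a_A = np.array([1.2, 0.8, 1.5]); b_A = np.array([-0.5, 0.3, 1.0])
a_B = np.array([1.1, 0.9, 1.4]); b_B = np.array([-0.2, 0.6, 1.3])

grid = np.linspace(-4, 4, 41)                 # quadrature grid
w = np.exp(-0.5 * grid**2); w /= w.sum()      # N(0, 1) weights

def sl_loss(coef):
    """Stocking-Lord criterion: squared gap between anchor TCCs after
    rescaling Form B parameters onto the Form A scale."""
    A, B = coef
    tcc_A = p2pl(grid, a_A, b_A).sum(axis=1)
    tcc_B = p2pl(grid, a_B / A, A * b_B + B).sum(axis=1)
    return np.sum(w * (tcc_A - tcc_B) ** 2)

A, B = minimize(sl_loss, x0=[1.0, 0.0]).x     # linking coefficients

def tcc(theta, a, b):
    return float(p2pl([theta], a, b).sum())

def true_score_equate(tau_B):
    """TSE: invert Form B's TCC at tau_B, transform theta, read off Form A."""
    theta_B = brentq(lambda t: tcc(t, a_B, b_B) - tau_B, -8.0, 8.0)
    return tcc(A * theta_B + B, a_A, b_A)

print(f"A = {A:.3f}, B = {B:.3f}")
print(f"Form B true score 1.5 -> Form A true score {true_score_equate(1.5):.2f}")
```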


2021 · pp. 014662162110131
Author(s): Zhonghua Zhang

In this study, the delta method was applied to estimate the standard errors of true score equating when using the characteristic curve methods with the generalized partial credit model under the common-item nonequivalent groups equating design. Simulation studies were further conducted to compare the performance of the delta method with that of the bootstrap method and the multiple imputation method. The results indicated that the standard errors produced by the delta method were very close to the empirical criterion standard errors, as well as to those yielded by the bootstrap and multiple imputation methods, under all manipulated conditions.
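
The delta method itself is generic: for any smooth function f of the estimated item parameters, the standard error is approximately sqrt(gᵀ Σ g), with g the gradient of f and Σ the parameter covariance matrix. A minimal Python sketch with a numerical gradient follows; the one-item 2PL function and covariance matrix are invented stand-ins, not the paper's GPCM equating function.

```python
import numpy as np

def delta_method_se(f, params, cov, eps=1e-5):
    """Standard error of f(params) via the delta method, using central
    differences for the gradient g and SE = sqrt(g' cov g)."""
    g = np.zeros_like(params, dtype=float)
    for i in range(len(params)):
        step = np.zeros_like(g); step[i] = eps
        g[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return float(np.sqrt(g @ cov @ g))

# Toy f: a single 2PL item's true score at theta = 0, parameters (a, b).
f = lambda p: 1.0 / (1.0 + np.exp(-p[0] * (0.0 - p[1])))
params = np.array([1.2, 0.4])                      # estimates
cov = np.array([[0.010, 0.002], [0.002, 0.008]])   # their covariance
print(f"delta-method SE: {delta_method_se(f, params, cov):.4f}")
```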


2019 · Vol 44 (4) · pp. 296-310
Author(s): Yong He, Zhongmin Cui

Item parameter estimates of a common item on a new test form may change abnormally for reasons such as item overexposure or curriculum change. A common item whose change does not fit the pattern implied by the normally behaved common items is defined as an outlier. Although detecting and eliminating outliers improves equating accuracy, it may cause a content imbalance among the common items. Robust scale transformation methods have recently been proposed to solve this problem when only one outlier is present in the data, although multiple outliers are not uncommon in practice. In this simulation study, the authors examined the robust scale transformation methods under conditions with multiple outlying common items. Results indicated that the robust scale transformation methods could reduce the influence of multiple outliers on scale transformation and equating. The robust methods performed similarly to a traditional outlier detection and elimination method in reducing the influence of outliers while maintaining adequate content balance.
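
As a generic illustration of the idea (not necessarily the authors' estimator), a scale transformation becomes outlier-resistant when its moments are replaced by robust counterparts. The Python sketch below is a median/MAD variant of the mean-sigma method, with invented common-item difficulties in which one item drifts.

```python
import numpy as np

def robust_mean_sigma(b_new, b_old):
    """Slope A and intercept B placing new-form difficulties on the
    old-form scale, with medians/MADs bounding each item's influence."""
    mad = lambda x: np.median(np.abs(x - np.median(x)))
    A = mad(b_old) / mad(b_new)
    B = np.median(b_old) - A * np.median(b_new)
    return A, B

# Common-item difficulties; the last item drifts badly on the new form.
b_old = np.array([-1.0, -0.3, 0.2, 0.8, 1.1])
b_new = np.array([-0.9, -0.2, 0.3, 0.9, 2.6])
A, B = robust_mean_sigma(b_new, b_old)
print(f"A = {A:.3f}, B = {B:.3f}")   # barely affected by the outlier
```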


2019 · Vol 44 (3) · pp. 215-218
Author(s): Kyung Yong Kim, Uk Hyun Cho

Item response theory (IRT) true-score equating for the bifactor model is often conducted by first numerically integrating the specific factors out of the item response function and then applying the unidimensional IRT true-score equating method to the marginalized bifactor model. An alternative way to obtain the marginalized bifactor model, however, is to project the nuisance dimensions of the bifactor model onto the dominant dimension. Projection, which can be viewed as an approximation to numerical integration, has the advantage of providing item parameters for the marginalized bifactor model; projection can therefore be used with existing equating software packages that require item parameters. In this paper, IRT true-score equating results obtained with projection are compared to those obtained with numerical integration. Simulation results show that the two procedures provide very similar equating results.
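
The numerical-integration route is easy to sketch: integrate the specific factor out of a bifactor item response function with Gauss–Hermite quadrature. The Python sketch below uses an invented bifactor 2PL item; the final comment notes the closed-form marginal slope that motivates the projection approach in a normal-ogive metric.

```python
import numpy as np

# Probabilists' Gauss-Hermite nodes/weights, normalized to N(0, 1).
nodes, weights = np.polynomial.hermite_e.hermegauss(21)
weights = weights / weights.sum()

def marginal_irf(theta_g, a_gen, a_spec, d):
    """P(correct | general factor), specific factor integrated out."""
    z = a_gen * theta_g + a_spec * nodes + d    # logit at each node
    return float(np.sum(weights / (1.0 + np.exp(-z))))

print(marginal_irf(theta_g=0.5, a_gen=1.4, a_spec=0.8, d=-0.2))

# Projection instead yields closed-form marginal parameters; e.g., in a
# normal-ogive metric the marginal slope is a_gen / sqrt(1 + a_spec**2).
```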


2019 · Vol 80 (1) · pp. 91-125
Author(s): Stella Y. Kim, Won-Chan Lee, Michael J. Kolen

A theoretical and conceptual framework for true-score equating using a simple-structure multidimensional item response theory (SS-MIRT) model is developed. A true-score equating method, referred to as the SS-MIRT true-score equating (SMT) procedure, is also developed. SS-MIRT has several advantages over more complex multidimensional IRT models, including improved estimation efficiency and straightforward interpretability. The performance of the SMT procedure was examined and evaluated in four studies using different data types. In these studies, results from the SMT procedure were compared with results from four other equating methods to assess the relative benefits of SMT. In general, SMT produced more accurate equating results than traditional unidimensional IRT (UIRT) equating when the data were multidimensional. The more accurate performance of SMT over UIRT true-score equating was observed consistently across the studies, which supports the benefits of a multidimensional approach when equating multidimensional data. SMT also performed similarly to an SS-MIRT observed-score method across all studies.
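
Simple structure is what keeps SS-MIRT tractable: each item loads on exactly one factor, so the test characteristic surface is just a sum of per-dimension unidimensional TCCs. A small Python sketch with invented parameters:

```python
import numpy as np

a = np.array([1.0, 1.3, 0.9, 1.1])    # discriminations
b = np.array([-0.4, 0.2, 0.5, 1.0])   # difficulties
dim = np.array([0, 0, 1, 1])          # each item's single dimension

def ss_mirt_true_score(theta):
    """Expected raw score at ability vector theta (one theta per factor);
    each item's 2PL ICC depends only on its own dimension."""
    th = np.asarray(theta)[dim]        # pick each item's dimension
    return float(np.sum(1.0 / (1.0 + np.exp(-a * (th - b)))))

print(ss_mirt_true_score([0.3, -0.1]))
```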


2016 · Vol 76 (6) · pp. 954-975
Author(s): Dimiter M. Dimitrov

This article describes an approach to test scoring, referred to as delta scoring (D-scoring), for tests with dichotomously scored items. D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the examinee's response vector weighted by the expected difficulties (not "easiness") of the test items. The expected difficulty of each item is obtained as an analytic function of its IRT parameters. Because they are based on expected item difficulties, D-scores are independent of the sample of test-takers. It is shown that the D-scale performs considerably better than the IRT logit scale on criteria of scale intervalness. To equate D-scales, it is sufficient to rescale the item parameters, thus avoiding the tedious and error-prone procedure of mapping test characteristic curves under IRT true score equating, which is often used in large-scale testing practice. The proposed D-scaling has proved promising in its current piloting with large-scale assessments, and the hope is that it can efficiently complement IRT procedures in large-scale testing in education and psychology.
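
A rough Python sketch of the scoring rule as the abstract describes it (the weighting and normalization here are illustrative guesses, not Dimitrov's exact formulas): each item's expected difficulty is one minus its 2PL ICC averaged over an N(0, 1) ability distribution, and the D-score is the difficulty-weighted sum of correct responses, rescaled to [0, 1].

```python
import numpy as np

nodes, weights = np.polynomial.hermite_e.hermegauss(31)   # N(0,1) quadrature
weights = weights / weights.sum()

def expected_difficulty(a, b):
    """delta_j = 1 - E[P_j(theta)] for a 2PL item, theta ~ N(0, 1)."""
    p = 1.0 / (1.0 + np.exp(-a * (nodes - b)))
    return 1.0 - float(np.sum(weights * p))

a = np.array([1.2, 0.8, 1.5, 1.0]); b = np.array([-1.0, 0.0, 0.5, 1.5])
delta = np.array([expected_difficulty(ai, bi) for ai, bi in zip(a, b)])

u = np.array([1, 1, 0, 1])                 # one examinee's responses
d_score = np.dot(delta, u) / delta.sum()   # sample-free: expected deltas only
print(f"D-score: {d_score:.3f}")
```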


2015 · Vol 1 (1) · pp. 100
Author(s): Rahmawati Rahmawati, Djemari Mardapi

This study aimed to: (1) revise the criterion used in the Robust Z method for detecting item parameter drift (IPD), (2) identify the strengths and weaknesses of the modified Robust Z method, and (3) investigate the effect of IPD on examinees' classification consistency using empirical data. The study used two types of data. The simulated data were responses of 20,000 students to 40 dichotomous items, generated by manipulating six variables: (1) ability distribution, (2) differences in ability between groups, (3) type of drift, (4) magnitude of drift, (5) anchor test length, and (6) number of drifting items. The empirical data were the responses of 4,187,444 students who took one of 41 test forms of the 2011 UN SD/MI (the Indonesian national elementary school examination) in Indonesian language, mathematics, and science. The modified Robust Z method was used to detect IPD, and the IRT true score equating method was used to analyze classification consistency. The results show that: (1) a criterion of a 0.5-point raw-score difference between test characteristic curves (TCCs) leads to 100% consistency in passing classification; (2) the modified Robust Z method accurately detects b- and ab-drift when the anchor test is at least 25% of the test length; and (3) IPD in the empirical data affected the passing status of more than 2,000 students.
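
For reference, the unmodified Robust Z statistic commonly used for IPD screening standardizes each anchor item's between-administration b-parameter difference with the median and 0.74 × IQR (a robust sigma estimate). A minimal Python sketch with invented differences and the conventional 1.645 cutoff (the study's point is precisely to revise such criteria):

```python
import numpy as np

def robust_z(d):
    """Robust Z for between-administration parameter differences d."""
    iqr = np.subtract(*np.percentile(d, [75, 25]))
    return (d - np.median(d)) / (0.74 * iqr)

d = np.array([0.05, -0.08, 0.02, 0.10, -0.04, 0.95])  # last item drifts
flagged = np.where(np.abs(robust_z(d)) > 1.645)[0]
print(f"flagged items: {flagged}")
```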

