A comparison of linear and equipercentile equating and IRT equating with FIPC across multidimensional test forms for non-equivalent groups

Author(s):  
Ki Cole ◽  
Sohee Kim ◽  
Mwarumba Mwavita
2013 ◽  
Vol 113 (1) ◽  
pp. 291-313
Author(s):  
Xiuyuan Zhang ◽  
Paul A. McDermott ◽  
John W. Fantuzzo ◽  
Vivian L. Gadsden

A multiscale criterion-referenced test featuring two presumably equivalent forms (A and B) was administered to 1,667 Head Start children at each of four points over an academic year. Using a randomly equivalent groups design, three equating methods were applied: common-item IRT equating using concurrent calibration, linear transformation, and equipercentile transformation. The methods were compared by examining mean score differences, weighted mean squared difference, and Kolmogorov's D statistics for each subscale. The results indicated that over time the IRT equating method and conventional equating methods exhibited different patterns of discrepancy between the two test forms. IRT equating yielded marginally smaller form-to-form mean score differences and generated slightly fewer distributional discrepancies between Forms A and B than both linear and equipercentile equating. However, the results were mixed, indicating that more studies are needed to provide additional information on the relative merits and weaknesses of each approach.
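To make the two conventional transformations named in this abstract concrete, the following is a minimal Python sketch under a randomly equivalent groups design. It is not the study's procedure: the function names and the toy score lists are hypothetical, and the equipercentile function is a simplified, unsmoothed version that interpolates linearly between observed Form B percentile ranks.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map score x on Form X to the Form Y scale by matching means and SDs."""
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)


def percentile_rank(x, scores):
    """Percent of scores below x plus half the percent of scores equal to x."""
    below = sum(s < x for s in scores)
    at = sum(s == x for s in scores)
    return 100.0 * (below + 0.5 * at) / len(scores)


def equipercentile_equate(x, scores_x, scores_y):
    """Simplified equipercentile equating (assumption: no smoothing).

    Finds the Form Y score whose percentile rank equals the percentile
    rank of x on Form X, interpolating between observed Form Y scores.
    """
    target = percentile_rank(x, scores_x)
    candidates = sorted(set(scores_y))
    ranks = [percentile_rank(y, scores_y) for y in candidates]
    # Clamp to the observed Form Y range when the target rank falls outside it.
    if target <= ranks[0]:
        return float(candidates[0])
    if target >= ranks[-1]:
        return float(candidates[-1])
    for i in range(len(candidates) - 1):
        r0, r1 = ranks[i], ranks[i + 1]
        if r0 <= target <= r1:
            y0, y1 = candidates[i], candidates[i + 1]
            return y0 + (y1 - y0) * (target - r0) / (r1 - r0)


# Hypothetical toy data: Form B scores are a stretched, shifted copy of Form A.
form_a = [1, 2, 3, 4, 5]
form_b = [3, 5, 7, 9, 11]
linear_result = linear_equate(4, form_a, form_b)
equipercentile_result = equipercentile_equate(4, form_a, form_b)
```

On these toy data both methods agree, because Form B is an exact linear function of Form A; with real score distributions that differ in shape, the two transformations generally diverge.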


2001 ◽  
Vol 26 (1) ◽  
pp. 31-50 ◽  
Author(s):  
Haruhiko Ogasawara

The asymptotic standard errors of the estimates of the equated scores by several types of item response theory (IRT) true score equatings are provided. The first group of equatings does not use IRT equating coefficients. The second group of equatings uses the IRT equating coefficients given by the moment or characteristic curve methods. The equating designs considered in this article cover those with internal or external common items and the methods with separate or simultaneous estimation of item parameters of associated tests. For the estimates of the asymptotic standard errors of the equated true scores, the method of marginal maximum likelihood estimation is employed for estimation of item parameters.
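As an illustration of what "IRT equating coefficients given by the moment methods" can mean, the sketch below computes the slope A and intercept B of the linear scale transformation with the mean/sigma method, one common moment method, from the difficulty estimates of the common items. It is an assumption that this particular variant matches the ones analysed in the article; the function names and parameter values are hypothetical, and the standard errors that the article derives are not computed here.

```python
from statistics import mean, pstdev


def mean_sigma_coefficients(b_from, b_to):
    """Mean/sigma equating coefficients from common-item difficulties.

    b_from: difficulty estimates of the common items on the source scale.
    b_to:   difficulty estimates of the same items on the target scale.
    Returns (A, B) such that theta_to = A * theta_from + B.
    """
    A = pstdev(b_to) / pstdev(b_from)
    B = mean(b_to) - A * mean(b_from)
    return A, B


def rescale_item(a, b, A, B):
    """Place a discrimination a and difficulty b on the target scale."""
    return a / A, A * b + B


# Hypothetical common-item difficulties; here b_to = 1.5 * b_from + 1.0 exactly.
b_source = [-1.0, 0.0, 1.0]
b_target = [-0.5, 1.0, 2.5]
A, B = mean_sigma_coefficients(b_source, b_target)
a_new, b_new = rescale_item(1.2, 0.5, A, B)
```

Characteristic curve methods instead choose A and B by minimising a distance between item or test characteristic curves, which requires numerical optimisation and is not shown.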


1984 ◽  
Vol 9 (1) ◽  
pp. 25-44 ◽  
Author(s):  
Michael J. Kolen

An analytic procedure for smoothing in equipercentile equating using cubic smoothing splines is described and illustrated. The effectiveness of the procedure is judged by comparing the results from smoothed equipercentile equating with those from other equating methods using multiple cross-validations for a variety of sample sizes. Data on randomly equivalent groups of approximately 3,000 examinees per form from four forms of each of the four tests of the ACT Assessment Program (AAP) were used in this evaluation. Relative to the other equating procedures studied, smoothed equipercentile equating was found to be most adequate for the AAP, especially for the most dissimilar form pairs.


1983 ◽  
Vol 8 (2) ◽  
pp. 137-156 ◽  
Author(s):  
Nancy S. Petersen ◽  
Linda L. Cook ◽  
Martha L. Stocking

Scale drift for the verbal and mathematical portions of the Scholastic Aptitude Test (SAT) was investigated using linear, equipercentile and item response theory (IRT) equating methods. The linear methods investigated were the Tucker, Levine Equally Reliable and Levine Unequally Reliable models. Three IRT calibration designs were employed. These designs are referred to as (1) concurrent, (2) fixed b’s method, and (3) characteristic curve transformation method. The results of the various equating methods were compared both graphically and analytically. These results indicated that for reasonably parallel tests, linear equating methods perform adequately. However, when tests differ somewhat in content and length, methods based on the three-parameter logistic IRT model lead to greater stability of equating results. Of the conventional equating methods investigated, the Levine Equally Reliable model appears to be the most robust for the type of equating situation used in this study. The IRT method that provided the most stable equating results overall was the concurrent calibration method.


2008 ◽  
Vol 25 (2) ◽  
pp. 187-210 ◽  
Author(s):  
Chisato Saida ◽  
Tamaki Hattori
Keyword(s):  
Post Hoc
