A Comparison of the Accuracy of Multidimensional IRT Equating Methods

The asymptotic standard errors of the estimates of the equated scores are provided for several types of item response theory (IRT) true score equating. The first group of equatings does not use IRT equating coefficients. The second group uses the IRT equating coefficients given by the moment or characteristic curve methods. The equating designs considered in this article cover those with internal or external common items, and the methods with separate or simultaneous estimation of the item parameters of the associated tests. To estimate the asymptotic standard errors of the equated true scores, the item parameters are estimated by marginal maximum likelihood.
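As a rough illustration of what IRT true score equating computes, the sketch below assumes a three-parameter logistic (3PL) model and item parameters for both forms that are already on a common scale. It finds the ability at which the test characteristic curve of form X equals a given true score, then evaluates the test characteristic curve of form Y at that ability. The function names, the scaling constant D = 1.7, the bisection bounds, and the item parameters in the usage note are all assumptions made for illustration; the abstract specifies none of them, and the standard errors it discusses are not computed here.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response (D = 1.7 is an assumed scaling constant)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)


def true_score_equate(tau_x, items_x, items_y, lo=-8.0, hi=8.0, tol=1e-10):
    """Map a true score on form X to the equivalent true score on form Y.

    items_x and items_y are lists of (a, b, c) tuples on a common scale.
    The search interval [lo, hi] is a hypothetical default, not a standard.
    """
    # True scores are only defined between the sum of the guessing
    # parameters and the number of items.
    floor_x = sum(c for _, _, c in items_x)
    if not (floor_x < tau_x < len(items_x)):
        raise ValueError("tau_x must lie between sum(c) and the number of items")
    if not (tcc(lo, items_x) < tau_x < tcc(hi, items_x)):
        raise ValueError("search interval does not bracket tau_x")
    # Bisection works because the TCC is increasing in theta when every a > 0.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcc(mid, items_x) < tau_x:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return tcc(theta, items_y)
```

For example, with made-up parameters `items_x = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.7, 0.25)]`, equating form X to itself returns the input true score, and equating to a harder form (each `b` shifted upward) returns a lower score.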

A Comparison of Two Procedures for Computing IRT Equating Coefficients

Journal of Educational Measurement ◽ 10.1111/j.1745-3984.1991.tb00350.x ◽ 1991 ◽ Vol 28 (2) ◽ pp. 147-162 ◽ Cited By ~ 50

Author(s): Frank B. Baker ◽ Ali Al-Karni

Keyword(s): IRT Equating ◽ Equating Coefficients

A comparison of linear and equipercentile equating and IRT equating with FIPC across multidimensional test forms for non-equivalent groups

International Journal of Quantitative Research in Education ◽ 10.1504/ijqre.2019.10021821 ◽ 2019 ◽ Vol 4 (4) ◽ pp. 293

Author(s): Ki Cole ◽ Mwarumba Mwavita ◽ Sohee Kim

Keyword(s): Equipercentile Equating ◽ IRT Equating

IRT Versus Conventional Equating Methods: A Comparative Study of Scale Stability

Journal of Educational Statistics ◽ 10.3102/10769986008002137 ◽ 1983 ◽ Vol 8 (2) ◽ pp. 137-156 ◽ Cited By ~ 19

Author(s): Nancy S. Petersen ◽ Linda L. Cook ◽ Martha L. Stocking

Keyword(s): Item Response ◽ Characteristic Curve ◽ Calibration Method ◽ Transformation Method ◽ Scholastic Aptitude Test ◽ IRT Model ◽ Scholastic Aptitude ◽ Reliable Model ◽ Parallel Tests ◽ IRT Equating

Scale drift for the verbal and mathematical portions of the Scholastic Aptitude Test (SAT) was investigated using linear, equipercentile and item response theory (IRT) equating methods. The linear methods investigated were the Tucker, Levine Equally Reliable and Levine Unequally Reliable models. Three IRT calibration designs were employed. These designs are referred to as (1) concurrent, (2) fixed b’s method, and (3) characteristic curve transformation method. The results of the various equating methods were compared both graphically and analytically. These results indicated that for reasonably parallel tests, linear equating methods perform adequately. However, when tests differ somewhat in content and length, methods based on the three-parameter logistic IRT model lead to greater stability of equating results. Of the conventional equating methods investigated, the Levine Equally Reliable model appears to be the most robust for the type of equating situation used in this study. The IRT method that provided the most stable equating results overall was the concurrent calibration method.
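To make the idea of linear equating concrete, the sketch below shows only its simplest form: a form X score is mapped to the form Y score that has the same standardized deviation from the mean. This is not the Tucker or Levine method named in the abstract; those also use anchor-test information to estimate the means and standard deviations for a common population. The function name and the score lists in the usage note are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map score x on form X to the form Y scale by matching z-scores.

    Assumes scores_x and scores_y come from the same group or from
    randomly equivalent groups (an assumption of this sketch).
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("form X scores have zero variance")
    # y = mu_y + (sd_y / sd_x) * (x - mu_x)
    return mu_y + (sd_y / sd_x) * (x - mu_x)
```

With made-up data `scores_x = [10, 12, 14, 16, 18]` and `scores_y = [20, 24, 28, 32, 36]`, the form Y scores are twice as spread out around a mean of 28, so a form X score one unit above its mean of 14 maps to two units above 28.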
