On the Integrity of Reliability Estimation in Classical Test Theory: The Case for an Additive Coefficient of Stability

2008, Vol 103 (2), pp. 545-565
Author(s): Gilbert Becker

This article addresses deficiencies in the most widely used estimators of reliability and explains why the issue is important. Accurate calibration of relationships between constructs is critical to theory development. Unless workers have accurate estimates of scale reliability, accurate estimates of those relationships will not be forthcoming, because the classical disattenuation formula requires them. This article shows that classical test theory can easily accommodate the partitioning of its error component E in test scores into two sources: inconsistency across content (E1) and inconsistency across time (E2). Viewed from this extended model, the alternate forms approach to reliability estimation is complete in that it gauges both sources of error simultaneously. Because that approach is rarely used today for that purpose, the integrity of estimation has been lost. In its place arose estimators of partial reliability, which estimate generalizability over one medium or the other but not both, thereby precluding the additivity of error components. Recent developments promise to restore the integrity of the alternate forms approach without the need for alternate forms, and they suggest an additive alternative to the current nonadditive coefficient of stability.
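The dependence on reliability mentioned in this abstract can be made concrete with the classical disattenuation formula, in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below is only an illustration: the function name `disattenuate` and all numeric values are hypothetical and do not come from the article. It shows how a partial reliability estimate that ignores one error source, and is therefore too high, leads to an under-corrected correlation.

```python
import math


def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, chosen only for illustration.
observed_r = 0.40
partial_reliability = 0.80   # assumed estimate that ignores one error source
complete_reliability = 0.64  # assumed estimate that reflects both E1 and E2

under_corrected = disattenuate(observed_r, partial_reliability, partial_reliability)
fully_corrected = disattenuate(observed_r, complete_reliability, complete_reliability)

print(f"corrected with partial reliability:  {under_corrected:.3f}")
print(f"corrected with complete reliability: {fully_corrected:.3f}")
```

With these assumed numbers the two corrected correlations differ noticeably, which is the practical cost of relying on a reliability estimate that omits an error source.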

2006, Vol 32 (4)
Author(s): G K Huysamen

Reliability is conceptually defined in terms of consistency across test occasions, but coefficient alpha, the most popular reliability estimation method, precludes the examination of such consistency. Three recent proposals to estimate transient error separately within the classical test theory tradition are reviewed, together with the results they have yielded. The merits of these proposals are compared with those of generalisability theory, which differentiates between different sources of error variation. Although the procedures reviewed cannot match the advantages of generalisability theory, they may be sufficient in many applications.
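To see why coefficient alpha cannot speak to consistency across occasions, it helps to look at what goes into it. The sketch below uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and needs only the item scores from a single administration. The function name `cronbach_alpha` and the data are hypothetical illustrations, not material from the article.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Coefficient alpha for a respondents-by-items score matrix.

    `scores` holds one row per respondent and one column per item,
    all collected on a single test occasion.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) and of the total score (row sum).
    item_variances = [variance(column) for column in zip(*scores)]
    total_variance = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents, 3 items, one occasion.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"coefficient alpha: {alpha:.3f}")
```

Nothing in the input identifies a second occasion, so error that is stable within a session but varies between sessions cannot be separated out by this computation.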


2001, Vol 89 (2), pp. 403-424
Author(s): Gilbert Becker

Two assumptions in classical test theory, essential tau-equivalence and independence of measurement errors, may, when violated, produce attenuated or inflated estimates of reliability, respectively. Inflation stemming from correlated errors can be controlled by a procedure in which systematically created equivalent halves of a given measuring instrument are administered across two occasions. When poor approximations to equivalent halves are constructed for this purpose, however, distortion in the opposite direction may result, and it is sometimes quite large when measuring instruments are not essentially tau-equivalent (or, at the practical level, unidimensional). The nature of these decrements is discussed and illustrated, and a number of procedures for eliminating them are introduced.


2021, Vol 104 (3), pp. 003685042110283
Author(s): Meltem Yurtcu, Hülya Kelecioglu, Edward L Boone

Bayesian Nonparametric (BNP) modelling can be used to obtain more detailed information in test equating studies and to increase the accuracy of equating by accounting for covariates. In this study, two covariates, one continuous and one discrete, are included in the equating under the BNP model. Scores equated with this model were obtained for a single group design with a small group. The equated scores obtained with the model were compared with those from the mean and linear equating methods of Classical Test Theory. Across the three methods, the equated scores obtained with the BNP model produced a distribution closer to that of the target test. Even the classical methods will give a good result with the smallest error when using a small sample, making equating studies valuable. Including covariates in the classical test equating process rests on some assumptions and cannot be achieved, especially with small groups. The BNP model will be more beneficial than frequentist methods, regardless of this limitation. Information about booklets and variables can be obtained from the distributors and from the equated scores that were obtained with the BNP model. This makes it possible to compare sub-categories, which can be interpreted as indicating the presence of differential item functioning (DIF). Therefore, the BNP model can be used actively in test equating studies, and it provides an opportunity to examine the characteristics of the individual participants at the same time. Thus, it allows test equating even in a small sample and offers the opportunity to reach a value closer to the scores in the target test.
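For orientation, the two classical baselines named in this abstract can be sketched in a few lines. Mean equating shifts a score by the difference in form means, and linear equating additionally rescales by the ratio of standard deviations. The code below is a minimal illustration under a single group design, where the same examinees take both forms. The function names and scores are hypothetical, and this is not the BNP model or the authors' implementation.

```python
from statistics import mean, stdev


def mean_equate(x: float, form_x: list[float], form_y: list[float]) -> float:
    """Mean equating: shift x by the difference between the form means."""
    return x - mean(form_x) + mean(form_y)


def linear_equate(x: float, form_x: list[float], form_y: list[float]) -> float:
    """Linear equating: match both the mean and the standard deviation."""
    slope = stdev(form_y) / stdev(form_x)
    return slope * (x - mean(form_x)) + mean(form_y)


# Hypothetical scores for the same five examinees on two forms.
form_x_scores = [10, 12, 14, 16, 18]
form_y_scores = [12, 15, 18, 21, 24]

raw_score = 16
print("mean equated:  ", mean_equate(raw_score, form_x_scores, form_y_scores))
print("linear equated:", round(linear_equate(raw_score, form_x_scores, form_y_scores), 3))
```

Neither function has any place for covariates, which is the gap the abstract says the BNP model is meant to fill.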

