A Fine-Tooth Comb for Measurement Reliability: Predicting True Score and Error Variance in Hierarchical Models

2020 ◽  
Author(s):  
Donald Ray Williams ◽  
Stephen Ross Martin ◽  
Michaela C DeBolt ◽  
Lisa Oakes ◽  
Philippe Rast

The primary objective of this work is to extend classical test theory (CTT), in particular, for the case of repeated measurement studies. The guiding idea that motivates this work is that any theory ought to be expanded when it is not compatible with commonly observed phenomena, namely, that homogeneous variance components appear to be the exception and not the rule in psychological applications. Additionally, advancements in methodology should also be considered in light of theory expansion, when appropriate. We argue both goals can be accomplished by merging heterogeneous variance modeling with the central tenets of CTT. To this end, we introduce novel methodology that is based on the mixed-effects location scale model. This allows for fitting explanatory models to the true score (between-group) and error (within-group) variance. Two illustrative examples, spanning from educational research to infant cognition, highlight such possibilities. The results revealed that there can be substantial individual differences in error variance, which necessarily implies the same for reliability, and that true score variance can be a function of covariates. We incorporate this variance heterogeneity into novel reliability indices that can be used to forecast group- or person-specific reliability. These extend traditional formulations that assume the variance components are homogeneous. This powerful approach can be used to identify predictors of true score and error variance, which can then be used to refine measurement. The methods are implemented in the user-friendly R package ICCier.
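The group-specific reliability index the abstract describes can be sketched numerically: with a common true score (between-group) variance and group-specific error (within-group) variances, each group gets its own intraclass-correlation-style reliability. The variance values below are invented for illustration and are not output of ICCier:

```python
# Sketch: group-specific reliability under heterogeneous error variance.
# CTT: reliability = true score variance / (true score + error variance).
# All numbers are illustrative assumptions, not estimates from the paper.
sigma2_true = 4.0                                 # between-group (true score) variance
sigma2_error = {"A": 1.0, "B": 4.0, "C": 12.0}    # group-specific error variances

def reliability(s2_true, s2_err):
    """CTT reliability: proportion of observed variance due to true scores."""
    return s2_true / (s2_true + s2_err)

for group, s2e in sigma2_error.items():
    print(group, round(reliability(sigma2_true, s2e), 3))
# Group A: 4/5 = 0.8; B: 4/8 = 0.5; C: 4/16 = 0.25 — the same instrument
# yields very different reliabilities once error variance is allowed to vary.
```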

2018 ◽  
Author(s):  
Sam Parsons

The relationship between measurement reliability and statistical power is a complex one. Where reliability is defined by classical test theory as the proportion of 'true' variance to total variance (the sum of true score and error variance), power is only functionally related to total variance. Therefore, to explore direct relationships between reliability and power, one must hold either true-score variance or error variance constant while varying the other. Here, visualisations are used to illustrate the reliability-power relationship under conditions of fixed true-score variance and fixed error variance. From these visualisations, conceptual distinctions between fixing true-score or error variance can be raised. Namely, when true-score variance is fixed, low reliability (and low power) suggests a true effect may be hidden by error, whereas when error variance is fixed, high reliability (and low power) may simply suggest a very small effect. I raise several observations I hope will be useful in considering the utility of measurement reliability and its relationship to effect sizes and statistical power.
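The fixed-true-score-variance case can be made concrete with a small numerical sketch (invented values, and a two-sided one-sample z-test approximation rather than the paper's visualisations): as error variance grows, total variance grows, so reliability and power fall together even though the raw effect is unchanged.

```python
from math import sqrt, erf

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_one_sample(mu, total_var, n, alpha_z=1.96):
    # Approximate power of a two-sided one-sample z-test: power depends on
    # the raw effect mu only through the standardized effect mu/sd(total).
    delta = mu / sqrt(total_var)
    return 1.0 - norm_cdf(alpha_z - delta * sqrt(n))

sigma2_true = 1.0   # held fixed
mu = 0.5            # raw effect, held fixed
n = 30
for sigma2_err in (0.25, 1.0, 4.0):   # growing error variance
    rel = sigma2_true / (sigma2_true + sigma2_err)
    pw = power_one_sample(mu, sigma2_true + sigma2_err, n)
    print(round(rel, 2), round(pw, 3))
# Reliability and power drop together: total variance grows, shrinking the
# standardized effect while the raw effect mu stays the same.
```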


2014 ◽  
Vol 35 (4) ◽  
pp. 201-211 ◽  
Author(s):  
André Beauducel ◽  
Anja Leue

It is shown that a minimal assumption should be added to the assumptions of Classical Test Theory (CTT) in order to have positive inter-item correlations, which are regarded as a basis for the aggregation of items. Moreover, it is shown that the assumption of zero correlations between the error score estimates is substantially violated in the population of individuals when the number of items is small. Instead, a negative correlation between error score estimates occurs. The reason for the negative correlation is that the error score estimates for different items of a scale are based on insufficient true score estimates when the number of items is small. A test of the assumption of uncorrelated error score estimates by means of structural equation modeling (SEM) is proposed that takes this effect into account. The SEM-based procedure is demonstrated by means of empirical examples based on the Edinburgh Handedness Inventory and the Eysenck Personality Questionnaire-Revised.
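The mechanism behind the negative correlation can be simulated directly. This is a sketch of the effect only, not the SEM-based test the paper proposes: when the true score is estimated by the mean of few parallel items, the error score estimates x_j − x̄ for two items correlate at about −1/(k−1), reaching exactly −1 with k = 2 items.

```python
import random

random.seed(1)

def simulate_error_corr(k_items, n_persons=20000):
    """Correlation of error score estimates e_j_hat = x_j - mean(x) for the
    first two items. The item mean is an insufficient true score estimate
    when k_items is small, inducing a correlation of about -1/(k_items-1)."""
    e1, e2 = [], []
    for _ in range(n_persons):
        t = random.gauss(0.0, 1.0)                              # true score
        items = [t + random.gauss(0.0, 1.0) for _ in range(k_items)]
        xbar = sum(items) / k_items                             # true score estimate
        e1.append(items[0] - xbar)
        e2.append(items[1] - xbar)
    m1, m2 = sum(e1) / n_persons, sum(e2) / n_persons
    cov = sum((a - m1) * (b - m2) for a, b in zip(e1, e2)) / n_persons
    v1 = sum((a - m1) ** 2 for a in e1) / n_persons
    v2 = sum((b - m2) ** 2 for b in e2) / n_persons
    return cov / (v1 * v2) ** 0.5

print(round(simulate_error_corr(2), 3))    # exactly -1 with two items
print(round(simulate_error_corr(10), 3))   # close to -1/9
```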


2009 ◽  
Vol 31 (1) ◽  
pp. 81
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical procedure, particularly useful for performance testing, because it enables estimating the percentages of variance attributable to persons and to multiple sources of error. This study focuses on a generalizability study (G study) conducted to investigate such variance components for a paper-pencil multiple-choice vocabulary test used as a diagnostic pretest. Further, a decision study (D study) was conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the items effect; further, the G coefficient for absolute decisions was low.
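The D-study computation for absolute decisions can be sketched with the usual persons × items dependability formula. The variance components below are invented for illustration (they are not the study's estimates), chosen so that the item effect dominates, as in the G study above:

```python
def phi_absolute(var_p, var_i, var_pi_e, n_items):
    """D-study dependability (phi) coefficient for absolute decisions in a
    persons x items design: person variance over person variance plus all
    absolute-error sources (item and interaction/residual components),
    each divided by the number of items averaged over."""
    abs_error = (var_i + var_pi_e) / n_items
    return var_p / (var_p + abs_error)

# Invented variance components: items dominate, person variance is small,
# so dependability stays modest unless many items are used.
var_p, var_i, var_pi_e = 1.0, 4.6, 4.4
for n_items in (10, 25, 50):
    print(n_items, round(phi_absolute(var_p, var_i, var_pi_e, n_items), 3))
# phi rises with the number of items, since absolute error shrinks as
# more items are averaged over.
```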


2020 ◽  
Author(s):  
Stephen Ross Martin ◽  
Philippe Rast

Reliability is a crucial concept in psychometrics. Although it is typically estimated as a single fixed quantity, previous work suggests that reliability can vary across persons, groups, and covariates. We propose a novel method for estimating and modeling case-specific reliability without repeated measurements or parallel tests. The proposed method employs a “Reliability Factor” that models the error variance of each case across multiple indicators, thereby producing case-specific reliability estimates. Additionally, we use Gaussian process modeling to estimate a non-linear, non-monotonic function between the latent factor itself and the reliability of the measure, providing an analogue to test information functions in item response theory. The reliability factor model is a new tool for examining latent regions with poor conditional reliability, and correlates thereof, in a classical test theory framework.
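A toy illustration of the idea, not the authors' model: let the error variance be a hypothetical non-monotonic function of the latent score, so conditional reliability dips in some latent regions, analogous to a test information function.

```python
from math import exp

def error_var(theta):
    # Hypothetical non-monotonic error-variance function of the latent score:
    # measurement is noisiest near theta = 1 (a Gaussian bump on a baseline).
    return 0.3 + 1.5 * exp(-((theta - 1.0) ** 2))

def conditional_reliability(theta, loading=1.0, factor_var=1.0):
    # CTT-style conditional reliability: signal variance over signal plus
    # the error variance evaluated at this latent score.
    signal = loading ** 2 * factor_var
    return signal / (signal + error_var(theta))

for theta in (-2.0, 0.0, 1.0, 2.0):
    print(theta, round(conditional_reliability(theta), 3))
# Reliability is lowest where error variance peaks, flagging a latent
# region in which the measure is least informative.
```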


1975 ◽  
Vol 36 (1) ◽  
pp. 115-118
Author(s):  
Joseph Levin

The problem of the effect of restriction of range on reliability is explored, under the assumption of classical reliability theory. Since both true score and error are truncated by selection, the standard formula based on the ratio of error variance to total variance cannot be employed to correct for restriction of range. It is also pointed out that split-halves and similar methods cannot be used to establish the reliability of the data that have been used for selection. The proper treatment of the problem is administration of an alternate form or a retest of the selected group; correction for range should be treated as a case of selection in a multivariate distribution, instead of using formulae of test theory.
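The breakdown of the variance-ratio formula under selection can be seen in a small simulation (all values invented): reliability is computed both as the squared true-observed correlation and via the variance ratio, and the two agree in the full population but diverge once the group is selected on the observed score.

```python
import random

random.seed(7)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def corr(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (var(xs) * var(ys)) ** 0.5

n = 50000
true = [random.gauss(0.0, 1.0) for _ in range(n)]
err = [random.gauss(0.0, 0.5) for _ in range(n)]   # error variance 0.25
obs = [t + e for t, e in zip(true, err)]

# Full population: both definitions agree (about 1/1.25 = 0.8).
print(round(corr(true, obs) ** 2, 2), round(1 - var(err) / var(obs), 2))

# Select the top half on the observed score, as in admissions settings.
cutoff = sorted(obs)[n // 2]
kept = [i for i, o in enumerate(obs) if o >= cutoff]
t_s = [true[i] for i in kept]
e_s = [err[i] for i in kept]
o_s = [obs[i] for i in kept]

# After selection, truncation hits true score and error alike, and the
# variance-ratio formula no longer matches the squared true-observed
# correlation, so it cannot be used to correct for range.
print(round(corr(t_s, o_s) ** 2, 2), round(1 - var(e_s) / var(o_s), 2))
```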


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Charlotte E. Dean ◽  
Shazia Akhtar ◽  
Tim M. Gale ◽  
Karen Irvine ◽  
Richard Wiseman ◽  
...  

Abstract Background This study describes the construction and validation of a new scale for measuring belief in paranormal phenomena. The work aims to address psychometric and conceptual shortcomings associated with existing measures of paranormal belief. The study also compares the use of classical test theory and modern test theory as methods for scale development. Method We combined novel items and amended items taken from existing scales to produce an initial corpus of 29 items. Two hundred and thirty-one adult participants rated their level of agreement with each item using a seven-point Likert scale. Results Classical test theory methods (including exploratory factor analysis and principal components analysis) reduced the scale to 14 items and one overarching factor: Supernatural Beliefs. The factor demonstrated high internal reliability, with an excellent test–retest reliability for the total scale. Modern test theory methods (Rasch analysis using a rating scale model) reduced the scale to 13 items with a four-point response format. The Rasch scale was found to be most effective at differentiating between individuals with moderate-high levels of paranormal beliefs, and differential item functioning analysis indicated that the Rasch scale represents a valid measure of belief in paranormal phenomena. Conclusions The scale developed using modern test theory is identified as the final scale as this model allowed for in-depth analyses and refinement of the scale that was not possible using classical test theory. Results support the psychometric reliability of this new scale for assessing belief in paranormal phenomena, particularly when differentiating between individuals with higher levels of belief.
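The internal-reliability step of a CTT scale development like the one above is usually Cronbach's alpha. A minimal sketch with an invented toy dataset (three 7-point items, five respondents; not the study's data):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha, the standard CTT internal-consistency estimate:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    item_var_sum = sum(var(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Invented example: 3 items rated on a 7-point agreement scale by 5 people.
items = [
    [7, 6, 2, 5, 1],
    [6, 7, 1, 5, 2],
    [7, 5, 2, 6, 1],
]
print(round(cronbach_alpha(items), 3))  # high agreement across items -> high alpha
```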


Methodology ◽  
2018 ◽  
Vol 14 (3) ◽  
pp. 133-142 ◽  
Author(s):  
Zhehan Jiang

Abstract. Extending classical test theory, G theory allows more sources of variation to be investigated and thereby characterizes the accuracy of generalizing observed scores to a broader universe. However, G theory has seen less use due to the absence of analytic facilities for it in popular statistical software packages. Moreover, there is rarely a systematic introduction to G theory in the context of linear mixed-effects models, a widely taught technique in statistical analysis curricula. The present paper fits G theory into linear mixed-effects models and estimates the variance components via the well-known lme4 package in R. Concrete examples, modeling procedures, and R syntax are illustrated so that practitioners may use G theory in their studies. Realizing G theory estimation in R provides more flexible features than other platforms, such that users need not rely on specialized software such as GENOVA and urGENOVA.
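The variance-component estimation at the heart of this approach can be sketched without R: below is the classical expected-mean-squares (ANOVA) estimator for a balanced, fully crossed persons × items design, on simulated data with known components. The paper itself uses REML via lme4; for balanced data the ANOVA estimates coincide with the REML targets, so this is an illustration of the decomposition, not of the lme4 interface.

```python
import random

random.seed(3)
n_p, n_i = 200, 20
var_p_true, var_i_true, var_res_true = 1.0, 0.5, 0.25   # simulated components

p_eff = [random.gauss(0, var_p_true ** 0.5) for _ in range(n_p)]
i_eff = [random.gauss(0, var_i_true ** 0.5) for _ in range(n_i)]
x = [[p_eff[p] + i_eff[i] + random.gauss(0, var_res_true ** 0.5)
      for i in range(n_i)] for p in range(n_p)]

grand = sum(sum(row) for row in x) / (n_p * n_i)
p_means = [sum(row) / n_i for row in x]
i_means = [sum(x[p][i] for p in range(n_p)) / n_p for i in range(n_i)]

# Sums of squares for persons, items, and the residual (pi interaction + error)
ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
ss_tot = sum((x[p][i] - grand) ** 2 for p in range(n_p) for i in range(n_i))
ss_res = ss_tot - ss_p - ss_i

ms_p = ss_p / (n_p - 1)
ms_i = ss_i / (n_i - 1)
ms_res = ss_res / ((n_p - 1) * (n_i - 1))

# Solve the expected-mean-squares equations for the variance components
var_res = ms_res
var_p = (ms_p - ms_res) / n_i
var_i = (ms_i - ms_res) / n_p
print(round(var_p, 2), round(var_i, 2), round(var_res, 2))  # near 1.0, 0.5, 0.25
```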


2013 ◽  
Vol 93 (4) ◽  
pp. 562-569 ◽  
Author(s):  
Richard A. Preuss

Clinical assessment protocols must produce data that are reliable, with a clinically attainable minimal detectable change (MDC). In a reliability study, generalizability theory has 2 advantages over classical test theory. These advantages provide information that allows assessment protocols to be adjusted to match individual patient profiles. First, generalizability theory allows the user to simultaneously consider multiple sources of measurement error variance (facets). Second, it allows the user to generalize the findings of the main study across the different study facets and to recalculate the reliability and MDC based on different combinations of facet conditions. In doing so, clinical assessment protocols can be chosen based on minimizing the number of measures that must be taken to achieve a realistic MDC, using repeated measures to minimize the MDC, or simply based on the combination that best allows the clinician to monitor an individual patient's progress over a specified period of time.
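The recalculation step described above can be sketched as follows: once a G study has produced error variance components for each facet, the SEM (and hence the MDC) for any proposed combination of facet conditions follows from dividing each component by the number of conditions averaged over. The facet names and variance values below are invented for illustration.

```python
from math import sqrt

def sem_absolute(error_components, n_per_facet):
    """Standard error of measurement for absolute decisions: square root of
    the summed error variance components, each divided by the number of
    conditions averaged over for its facet."""
    return sqrt(sum(v / n_per_facet[f] for f, v in error_components.items()))

def mdc95(sem):
    # Minimal detectable change at 95% confidence for the difference of
    # two measurements: 1.96 * sqrt(2) * SEM.
    return 1.96 * sqrt(2.0) * sem

# Invented components for a hypothetical raters x trials design.
error = {"rater": 2.0, "trial": 1.5, "residual": 3.0}

for design in ({"rater": 1, "trial": 1, "residual": 1},
               {"rater": 2, "trial": 3, "residual": 6}):
    s = sem_absolute(error, design)
    print(round(s, 2), round(mdc95(s), 2))
# Averaging over more raters and trials shrinks the SEM, and with it the
# MDC, so the protocol can be tuned until the MDC is clinically attainable.
```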

