The Reliability Factor: Modeling individual reliability with multiple items from a single assessment

2020 ◽  
Author(s):  
Stephen Ross Martin ◽  
Philippe Rast

Reliability is a crucial concept in psychometrics. Although it is typically estimated as a single fixed quantity, previous work suggests that reliability can vary across persons, groups, and covariates. We propose a novel method for estimating and modeling case-specific reliability without repeated measurements or parallel tests. The proposed method employs a “Reliability Factor” that models the error variance of each case across multiple indicators, thereby producing case-specific reliability estimates. Additionally, we use Gaussian process modeling to estimate a non-linear, non-monotonic function relating the latent factor itself to the reliability of the measure, providing an analogue to test information functions in item response theory. The reliability factor model is a new tool for examining latent regions with poor conditional reliability, and correlates thereof, in a classical test theory framework.
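
One way to formalize the core idea, in notation of our own choosing rather than the authors' exact parameterization: a standard factor model is augmented with a second, person-level latent variable that drives the log error variances, so that reliability becomes case-specific. Under this sketch, Rel_i is the reliability of person i's unweighted sum score, with psi the variance of theta.

```latex
\begin{aligned}
x_{ij} &= \nu_j + \lambda_j \theta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim \mathcal{N}\!\big(0,\ \sigma_{ij}^{2}\big),\\
\log \sigma_{ij} &= \tau_j + \gamma_j \omega_i, \qquad \omega_i \ \text{the person-level reliability factor},\\
\mathrm{Rel}_i &= \frac{\big(\textstyle\sum_j \lambda_j\big)^{2}\,\psi}{\big(\textstyle\sum_j \lambda_j\big)^{2}\,\psi + \textstyle\sum_j \sigma_{ij}^{2}}.
\end{aligned}
```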

2009 ◽  
Vol 31 (1) ◽  
pp. 81
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical procedure that is particularly useful for performance testing because it enables estimation of the variance components attributable to persons and to multiple sources of error. This study focuses on a generalizability study (G study) conducted to investigate such variance components for a paper-and-pencil multiple-choice vocabulary test used as a diagnostic pretest. Further, a decision study (D study) was conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the items effect; further, the G coefficient for absolute decisions was low.
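
As a sketch of the computations the abstract describes, assuming a fully crossed persons x items design with one observation per cell (all numbers below are invented for illustration, not from the study):

```python
import numpy as np

# Illustrative persons x items data: person effects, item effects, noise
rng = np.random.default_rng(0)
n_p, n_i = 30, 20
X = (rng.normal(0, 1.0, (n_p, 1))        # person (universe score) effects
     + rng.normal(0, 0.8, (1, n_i))      # item difficulty effects
     + rng.normal(0, 1.0, (n_p, n_i)))   # interaction + residual error

# Mean squares from the two-way ANOVA (one observation per cell)
grand = X.mean()
ms_p = n_i * np.sum((X.mean(axis=1) - grand) ** 2) / (n_p - 1)
ms_i = n_p * np.sum((X.mean(axis=0) - grand) ** 2) / (n_i - 1)
resid = X - X.mean(axis=1, keepdims=True) - X.mean(axis=0, keepdims=True) + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

# Expected-mean-square solutions for the G-study variance components
var_p = max(0.0, (ms_p - ms_res) / n_i)   # persons
var_i = max(0.0, (ms_i - ms_res) / n_p)   # items
var_res = ms_res                          # person x item interaction + error

# D study: dependability (phi) coefficient for absolute decisions
phi = var_p / (var_p + (var_i + var_res) / n_i)
total = var_p + var_i + var_res
print(f"items effect: {var_i / total:.0%} of total variance; phi = {phi:.2f}")
```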


Assessment ◽  
2021 ◽  
pp. 107319112199416
Author(s):  
Desirée Blázquez-Rincón ◽  
Juan I. Durán ◽  
Juan Botella

A reliability generalization meta-analysis was carried out to estimate the average reliability of the seven-item, 5-point Likert-type Fear of COVID-19 Scale (FCV-19S), one of the most widespread scales developed around the COVID-19 pandemic. Different reliability coefficients from classical test theory and the Rasch Measurement Model were meta-analyzed, heterogeneity among the most reported reliability estimates was examined by searching for moderators, and a predictive model to estimate the expected reliability was proposed. At least one reliability estimate was available for a total of 44 independent samples out of 42 studies, being that Cronbach’s alpha was most frequently reported. The coefficients exhibited pooled estimates ranging from .85 to .90. The moderator analyses led to a predictive model in which the standard deviation of scores explained 36.7% of the total variability among alpha coefficients. The FCV-19S has been shown to be consistently reliable regardless of the moderator variables examined.
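
A minimal sketch of how alpha coefficients are commonly pooled in a reliability generalization meta-analysis, assuming Bonett's ln(1 - alpha) transformation (approximate sampling variance 2k/[(k - 1)(n - 2)]) and DerSimonian-Laird random-effects weights; all per-study inputs below are made up for illustration:

```python
import numpy as np

# Hypothetical per-study inputs: alpha, sample size n; k = 7 items (FCV-19S)
alphas = np.array([0.86, 0.88, 0.85, 0.90, 0.87])
ns = np.array([312, 528, 204, 871, 456])
k = 7

# Bonett transformation: L = ln(1 - alpha), Var(L) ~= 2k / ((k - 1)(n - 2))
L = np.log(1 - alphas)
v = 2 * k / ((k - 1) * (ns - 2))

# Fixed-effect pooling, then DerSimonian-Laird between-study variance
w = 1 / v
L_fe = np.sum(w * L) / np.sum(w)
Q = np.sum(w * (L - L_fe) ** 2)
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(alphas) - 1)) / c)

# Random-effects pooled estimate, back-transformed to the alpha metric
w_re = 1 / (v + tau2)
L_re = np.sum(w_re * L) / np.sum(w_re)
print(f"pooled alpha ~= {1 - np.exp(L_re):.3f}, tau^2 = {tau2:.4f}")
```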


2018 ◽  
Author(s):  
Sam Parsons

The relationship between measurement reliability and statistical power is a complex one. Where reliability is defined by classical test theory as the proportion of 'true' variance to total variance (the sum of true-score and error variance), power is functionally related only to total variance. Therefore, to explore direct relationships between reliability and power, one must hold either true-score variance or error variance constant while varying the other. Here, visualisations are used to illustrate the reliability-power relationship under conditions of fixed true-score variance and fixed error variance. From these visualisations, conceptual distinctions between fixing true-score variance and fixing error variance can be drawn. Namely, when true-score variance is fixed, low reliability (and low power) suggests a true effect may be hidden by error; when error variance is fixed, high reliability (and low power) may simply indicate a very small effect. I raise several observations I hope will be useful in considering the utility of measurement reliability and its relationship to effect sizes and statistical power.
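
The two scenarios are easy to reproduce numerically. The following sketch (all numbers illustrative, not from the paper) computes normal-approximation power for a two-group mean comparison, once with true-score variance fixed at 1 and once with error variance fixed at 1:

```python
import numpy as np
from scipy import stats

n = 50            # per-group sample size (illustrative)
delta = 0.5       # raw mean difference on the true-score metric (illustrative)

def power_two_sample(delta, sd, n, alpha=0.05):
    """Normal-approximation power for a two-sample comparison of means."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    return 1 - stats.norm.cdf(z_crit - (delta / sd) * np.sqrt(n / 2))

# reliability = sigma_T^2 / (sigma_T^2 + sigma_E^2)
for rel in (0.5, 0.7, 0.9):
    sd_fixed_true = np.sqrt(1 / rel)        # sigma_T^2 = 1, error varies
    sd_fixed_err = np.sqrt(1 / (1 - rel))   # sigma_E^2 = 1, true varies
    print(f"rel={rel}: "
          f"power(fixed true)={power_two_sample(delta, sd_fixed_true, n):.2f}, "
          f"power(fixed error)={power_two_sample(delta, sd_fixed_err, n):.2f}")
```

With true-score variance fixed, raising reliability shrinks total variance and power rises; with error variance fixed, raising reliability grows total variance and power for the same raw effect falls, matching the conceptual distinction above.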


2021 ◽  
Vol 12 ◽  
Author(s):  
David Alpizar ◽  
Brian F. French

The Motivational-Developmental Assessment (MDA) measures a university student’s motivational and developmental attributes by utilizing overlapping constructs measured across four writing prompts. The MDA’s format may lead to the violation of the local item independence (LII) assumption for unidimensional item response theory (IRT) scoring models, or the uncorrelated errors assumption for scoring models in classical test theory (CTT) due to the measurement of overlapping constructs within a prompt. This assumption violation is known as a testlet effect, which can be viewed as a method effect. The application of a unidimensional IRT or CTT model to score the MDA can result in imprecise parameter estimates when this effect is ignored. To control for this effect in the MDA responses, we first examined the presence of local dependence via a restricted bifactor model and Yen’s Q3 statistic. Second, we applied bifactor models to account for the testlet effect in the responses, as this effect is modeled as an additional latent variable in a factor model. Results support the presence of local dependence in two of the four MDA prompts, and the use of the restricted bifactor model to account for the testlet effect in the responses. Modeling the testlet effect through the restricted bifactor model supports a scoring inference in a validation argument framework. Implications are discussed.
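
For concreteness, a minimal sketch of Yen's Q3 computed from the residuals of an already-fitted unidimensional model; the 2PL form and every input below are assumptions for illustration, not the MDA's actual calibration:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed outputs of a prior unidimensional 2PL calibration (illustrative):
# theta: person abilities; a, b: item discriminations and difficulties
n, k = 500, 8
theta = rng.normal(size=n)
a, b = rng.uniform(0.8, 1.6, k), rng.normal(size=k)
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))   # model-implied P(correct)
responses = (rng.random((n, k)) < p).astype(float)

# Yen's Q3: correlations among person-item residuals after removing the
# fitted unidimensional model; elevated pairs flag local dependence
resid = responses - p
q3 = np.corrcoef(resid, rowvar=False)
off_diag = q3[~np.eye(k, dtype=bool)]
# A common rule of thumb flags item pairs well above the mean off-diagonal
# Q3 (e.g., ~0.2 above it) as candidate testlets.
print(f"mean off-diagonal Q3 = {off_diag.mean():.3f}, max = {off_diag.max():.3f}")
```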


2001 ◽  
Vol 89 (2) ◽  
pp. 291-307 ◽  
Author(s):  
Gilbert Becker

Violation of either of two basic assumptions in classical test theory may lead to biased estimates of reliability. Violation of the assumption of essential tau-equivalence may produce underestimates, and the presence of correlated errors among measurement units may result in overestimates. Many researchers do not fully appreciate how widespread the circumstances are in which this problem can occur. This article surveys a variety of settings in which biased reliability estimates may be found, in an effort to increase awareness of the prevalence of the problem.
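
Both biases are easy to reproduce in simulation. The sketch below (loadings and sample size invented for illustration) shows Cronbach's alpha underestimating sum-score reliability for congeneric items, then overestimating it once two items share correlated errors:

```python
import numpy as np

rng = np.random.default_rng(1)

def cronbach_alpha(X):
    k = X.shape[1]
    return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                          / X.sum(axis=1).var(ddof=1))

n = 100_000
T = rng.normal(size=n)                       # true scores

# Congeneric items (unequal loadings, unit error variances) violate
# essential tau-equivalence, so alpha underestimates true reliability:
loadings = np.array([0.9, 0.7, 0.5, 0.3])
X = T[:, None] * loadings + rng.normal(size=(n, 4))
true_rel = loadings.sum() ** 2 / (loadings.sum() ** 2 + 4.0)
print(f"true reliability {true_rel:.3f} vs alpha {cronbach_alpha(X):.3f}")

# Correlated errors (a shared nuisance factor in two items) inflate the
# inter-item covariances, so alpha overestimates reliability for T:
nuisance = rng.normal(size=(n, 1))
Xc = X.copy()
Xc[:, :2] += 0.8 * nuisance
true_rel_c = loadings.sum() ** 2 / (loadings.sum() ** 2 + 4.0 + (2 * 0.8) ** 2)
print(f"true reliability {true_rel_c:.3f} vs alpha {cronbach_alpha(Xc):.3f}")
```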


2013 ◽  
Vol 93 (4) ◽  
pp. 562-569 ◽  
Author(s):  
Richard A. Preuss

Clinical assessment protocols must produce data that are reliable, with a clinically attainable minimal detectable change (MDC). In a reliability study, generalizability theory has 2 advantages over classical test theory. These advantages provide information that allows assessment protocols to be adjusted to match individual patient profiles. First, generalizability theory allows the user to simultaneously consider multiple sources of measurement error variance (facets). Second, it allows the user to generalize the findings of the main study across the different study facets and to recalculate the reliability and MDC based on different combinations of facet conditions. Clinical assessment protocols can therefore be chosen to minimize the number of measures that must be taken to achieve a realistic MDC, to use repeated measures to minimize the MDC, or simply to adopt the combination that best allows the clinician to monitor an individual patient's progress over a specified period of time.
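
A hedged sketch of the D-study recalculation the abstract describes, using invented variance components from a persons x trials design to show how the dependability coefficient and MDC change as more trials are averaged:

```python
import numpy as np

# Hypothetical G-study variance components (persons x trials design)
var_person = 4.0     # sigma^2_p: between-patient variance
var_trial = 0.5      # sigma^2_t: trial main effect
var_resid = 1.5      # sigma^2_pt,e: interaction + residual error

for n_trials in (1, 2, 4, 8):
    var_abs_error = (var_trial + var_resid) / n_trials   # absolute error variance
    phi = var_person / (var_person + var_abs_error)      # dependability coefficient
    sem = np.sqrt(var_abs_error)                         # standard error of measurement
    mdc95 = 1.96 * np.sqrt(2) * sem                      # 95% minimal detectable change
    print(f"trials={n_trials}: phi={phi:.2f}, MDC95={mdc95:.2f}")
```

Averaging more trials shrinks the error variance and therefore the MDC, which is exactly the trade-off a clinician can tune to an individual patient's profile.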


2014 ◽  
Vol 35 (4) ◽  
pp. 250-261 ◽  
Author(s):  
Matthias Ziegler ◽  
Arthur Poropat ◽  
Julija Mell

Short personality questionnaires are increasingly used in research and practice, with some scales including as few as two to five items per personality domain. Despite the frequency of their use, these short scales are often criticized on the basis of their reduced internal consistencies and their purported failure to assess the breadth of broad constructs, such as the Big 5 factors of personality. One reason for this might be the use of principles rooted in Classical Test Theory during test construction. In this study, Generalizability Theory is used to compare psychometric properties of different scales based on the NEO-PI-R and BFI, two widely used personality questionnaire families. Applying both Classical Test Theory (CTT) and Generalizability Theory (GT) made it possible to identify the inner workings of test shortening. CTT-based analyses indicated that longer is generally better for reliability, whereas GT differentiated between reliability for relative and absolute decisions while revealing how different variance sources affect test score reliability estimates. These variance sources differed with scale length, and only GT allowed a clear description of these internal consequences, allowing more effective identification of the advantages and disadvantages of shorter and longer scales. Most importantly, the findings highlight the potential for error in focusing solely on reliability and scale length in test construction. Practical as well as theoretical consequences are discussed.
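
To make the relative/absolute distinction concrete, here is a sketch (variance components invented for illustration) comparing the CTT Spearman-Brown projection with GT's relative (G) and absolute (phi) coefficients as a scale is lengthened:

```python
import numpy as np

# Hypothetical persons x items G-study variance components
var_p, var_i, var_pi = 1.0, 0.3, 1.2   # persons, items, interaction + error

r1 = var_p / (var_p + var_pi)          # single-item reliability (relative)
for k in (2, 5, 12):
    sb = k * r1 / (1 + (k - 1) * r1)                  # Spearman-Brown (CTT)
    g_rel = var_p / (var_p + var_pi / k)              # relative decisions (GT)
    g_abs = var_p / (var_p + (var_i + var_pi) / k)    # absolute decisions (GT)
    print(f"k={k:2d}: SB={sb:.2f}, G={g_rel:.2f}, phi={g_abs:.2f}")
```

In this design the Spearman-Brown projection coincides algebraically with the relative G coefficient; only GT additionally yields the absolute (phi) coefficient, which penalizes item main-effect variance and so diverges more sharply for short scales.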


2020 ◽  
Author(s):  
Donald Ray Williams ◽  
Stephen Ross Martin ◽  
Michaela C DeBolt ◽  
Lisa Oakes ◽  
Philippe Rast

The primary objective of this work is to extend classical test theory (CTT), in particular for the case of repeated measurement studies. The guiding idea that motivates this work is that any theory ought to be expanded when it is not compatible with commonly observed phenomena: namely, that homogeneous variance components appear to be the exception and not the rule in psychological applications. Additionally, advancements in methodology should also be considered in light of theory expansion, when appropriate. We argue both goals can be accomplished by merging heterogeneous variance modeling with the central tenets of CTT. To this end, we introduce novel methodology that is based on the mixed-effects location scale model. This allows for fitting explanatory models to the true score (between-group) and error (within-group) variance. Two illustrative examples, spanning educational research and infant cognition, highlight such possibilities. The results revealed that there can be substantial individual differences in error variance, which necessarily implies the same for reliability, and that true score variance can be a function of covariates. We incorporate this variance heterogeneity into novel reliability indices that can be used to forecast group- or person-specific reliability. These extend traditional formulations that assume the variance components are homogeneous. This powerful approach can be used to identify predictors of true score and error variance, which can then be used to refine measurement. The methods are implemented in the user-friendly R package ICCier.
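
A generic mixed-effects location scale formulation conveys the idea (notation illustrative, not necessarily the package's exact parameterization): the within-person error variance is itself modeled on the log scale as a function of covariates and a person-level random effect, so the ICC, and hence reliability, varies by person.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + u_{0i} + \varepsilon_{ij}, \qquad u_{0i} \sim \mathcal{N}\big(0,\ \sigma_{u}^{2}\big),\\
\varepsilon_{ij} &\sim \mathcal{N}\big(0,\ \sigma_{\varepsilon i}^{2}\big), \qquad \log \sigma_{\varepsilon i} = \eta_0 + \boldsymbol{x}_i^{\top}\boldsymbol{\eta} + t_{0i},\\
\mathrm{ICC}_i &= \frac{\sigma_{u}^{2}}{\sigma_{u}^{2} + \sigma_{\varepsilon i}^{2}}.
\end{aligned}
```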

