Confidence Intervals About Score Reliability Coefficients

2011 ◽  
pp. 69-88 ◽  
Author(s):  
Xitao Fan ◽  
Bruce Thompson

2020 ◽  
Author(s):  
Peter E. Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Joseph A. Olsen ◽  
Michael J. Larson

The reliability of event-related brain potential (ERP) scores depends on the study context and on how those scores will be used, and reliability should therefore be evaluated routinely. Many factors can influence ERP score reliability, and generalizability (G) theory provides a multifaceted approach to estimating the internal consistency and temporal stability of scores that is well suited to ERPs. The G-theory approach possesses a number of advantages over classical test theory that make it ideal for pinpointing sources of error in scores. The current primer outlines the G-theory approach to estimating internal consistency (coefficients of equivalence) and test-retest reliability (coefficients of stability) and uses it to evaluate the reliability of ERP measurements. The primer shows how to estimate reliability coefficients that account for the impact of the number of trials, events, occasions, and groups. The uses of two different G-theory reliability coefficients (i.e., generalizability and dependability) in ERP research are elaborated, and a dataset from the companion manuscript, which examines N2 amplitudes to Go/NoGo stimuli, is used as an example of applying these coefficients to ERPs. The developed algorithms are implemented in the ERP Reliability Analysis (ERA) Toolbox, open-source software designed for estimating score reliability using G theory. The toolbox facilitates the application of G theory in an effort to simplify the study-by-study evaluation of ERP score reliability. The formulas provided in this primer should enable researchers to pinpoint the sources of measurement error in ERP scores from multiple recording sessions and then plan studies that optimize score reliability.
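As a rough illustration of the kind of computation involved, the sketch below estimates generalizability and dependability coefficients for a simplified single-facet (persons x trials) design using ANOVA-based variance components. It is not the ERA Toolbox's implementation, which handles additional facets such as events, occasions, and groups; the function name and the complete-data assumption are illustrative.

import numpy as np

def g_and_phi(scores, k=None):
    # scores: complete persons-by-trials matrix (rows = persons, columns = trials)
    # k: number of trials averaged into each person's score (defaults to observed trials)
    x = np.asarray(scores, dtype=float)
    n_p, n_t = x.shape
    k = n_t if k is None else k

    grand = x.mean()
    ss_p = n_t * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_t

    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))

    # Expected-mean-square estimators of the variance components (truncated at zero)
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_t, 0.0)
    var_t = max((ms_t - ms_res) / n_p, 0.0)

    # Generalizability uses relative error; dependability also counts the trial facet
    g = var_p / (var_p + var_res / k)
    phi = var_p / (var_p + (var_t + var_res) / k)
    return g, phi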


2000 ◽  
Vol 5 (3) ◽  
pp. 356-369 ◽  
Author(s):  
Jorge L. Mendoza ◽  
Karen L. Stafford ◽  
Joseph M. Stauffer

2017 ◽  
Vol 78 (1) ◽  
pp. 32-45 ◽  
Author(s):  
Björn Andersson ◽  
Tao Xin

In applications of item response theory (IRT), an estimate of the reliability of the ability estimates or sum scores is often reported. However, analytical expressions for the standard errors of the estimators of the reliability coefficients are not available in the literature, so the variability associated with an estimated reliability is typically not reported. In this study, the asymptotic variances of the IRT marginal and test reliability coefficient estimators are derived for dichotomous and polytomous IRT models, assuming an asymptotically normally distributed item parameter estimator. The results are used to construct confidence intervals for the reliability coefficients. Simulations show that the confidence intervals for the test reliability coefficient have good coverage properties in finite samples under a variety of settings with the generalized partial credit model and the three-parameter logistic model. The estimator of the marginal reliability coefficient, however, has finite-sample bias, so its confidence intervals do not attain the nominal level for small sample sizes, although the bias tends to zero as the sample size increases.
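For intuition, the sketch below builds the kind of Wald-type confidence interval described above from a reliability point estimate and its asymptotic (e.g., delta-method) standard error; it presumes those two quantities are already available and does not reproduce the authors' derivation of the asymptotic variance. The function name and the example numbers are hypothetical.

from statistics import NormalDist

def reliability_ci(rho_hat, se_rho, level=0.95):
    # Wald-type interval, truncated to the admissible range [0, 1]
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = max(rho_hat - z * se_rho, 0.0)
    upper = min(rho_hat + z * se_rho, 1.0)
    return lower, upper

# Hypothetical example: estimated test reliability of 0.87 with standard error 0.015
print(reliability_ci(0.87, 0.015))  # roughly (0.841, 0.899)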


2002 ◽  
Vol 18 (1) ◽  
pp. 52-62 ◽  
Author(s):  
Olga F. Voskuijl ◽  
Tjarda van Sliedregt

Summary: This paper presents a meta-analysis of published job analysis interrater reliability data in order to predict the expected levels of interrater reliability within specific combinations of moderators, such as rater source, rater experience, and type of job descriptive information. The overall mean interrater reliability of the 91 reliability coefficients reported in the literature was .59. Experienced professionals (job analysts) produced the highest reliability coefficients (.76). The method of data collection (job contact versus job description) affected only the results of experienced job analysts: for this group, higher interrater reliability coefficients were obtained for analyses based on job contact (.87) than for those based on job descriptions (.71). For other rater categories (e.g., students, organization members), neither the method of data collection nor training had a significant effect on interrater reliability. Analyses based on scales with defined levels yielded significantly higher interrater reliability coefficients than analyses based on scales with undefined levels. Behavior and job worth dimensions were rated more reliably (.62 and .60, respectively) than attributes and tasks (.49 and .29, respectively). Furthermore, the results indicated that if nonprofessional raters are used (e.g., incumbents or students), at least two to four raters are required to obtain a reliability coefficient of .80. These findings have implications for research and practice.
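The "two to four raters" figure is the kind of result given by the Spearman-Brown prophecy formula applied to a single-rater reliability; a minimal sketch of that arithmetic follows, using the overall mean of .59 reported above (the helper functions are illustrative, not taken from the paper).

from math import ceil

def spearman_brown(r1, k):
    # Reliability of the mean of k raters, given single-rater reliability r1
    return k * r1 / (1 + (k - 1) * r1)

def raters_needed(r1, target):
    # Smallest number of raters whose averaged ratings reach the target reliability
    k = target * (1 - r1) / (r1 * (1 - target))
    return ceil(k)

# With a single-rater reliability of .59, about three raters reach .80
print(raters_needed(0.59, 0.80))   # 3
print(spearman_brown(0.59, 3))     # about 0.81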

