Comparing Hyperprior Distributions to Estimate Variance Components for Interrater Reliability Coefficients

Author(s):  
Debby ten Hove ◽  
Terrence D. Jorgensen ◽  
L. Andries van der Ark
2002 ◽  
Vol 18 (1) ◽  
pp. 52-62 ◽  
Author(s):  
Olga F. Voskuijl ◽  
Tjarda van Sliedregt

Summary: This paper presents a meta-analysis of published job analysis interrater reliability data to predict the expected levels of interrater reliability within specific combinations of moderators, such as rater source, experience of the rater, and type of job descriptive information. The overall mean interrater reliability of 91 reliability coefficients reported in the literature was .59. Experienced professionals (job analysts) showed the highest reliability coefficients (.76). The method of data collection (job contact versus job description) affected only the results of experienced job analysts: for this group, higher interrater reliability coefficients were obtained for analyses based on job contact (.87) than for those based on job descriptions (.71). For other rater categories (e.g., students, organization members), neither the method of data collection nor training had a significant effect on interrater reliability. Analyses based on scales with defined levels yielded significantly higher interrater reliability coefficients than analyses based on scales with undefined levels. Behavior and job worth dimensions were rated more reliably (.62 and .60, respectively) than attributes and tasks (.49 and .29, respectively). Furthermore, the results indicated that if nonprofessional raters are used (e.g., incumbents or students), at least two to four raters are required to obtain a reliability coefficient of .80. These findings have implications for research and practice.
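The claim that two to four nonprofessional raters suffice to reach .80 follows from the Spearman-Brown prophecy formula, which projects the reliability of a mean rating across k raters from the single-rater reliability. A minimal sketch (the function names are illustrative; the .59 input is the overall mean single-rater reliability reported above):

```python
import math

def spearman_brown(r_single, k):
    """Reliability of the mean rating of k raters, given single-rater reliability."""
    return k * r_single / (1 + (k - 1) * r_single)

def raters_needed(r_single, r_target):
    """Smallest number of raters whose averaged ratings reach r_target."""
    return math.ceil(r_target * (1 - r_single) / (r_single * (1 - r_target)))

# Using the overall mean single-rater reliability of .59 reported above:
print(raters_needed(0.59, 0.80))           # 3 raters
print(round(spearman_brown(0.59, 3), 2))   # 0.81
```

With the lowest rater-category coefficients (e.g., .29 for tasks), the same formula gives a much larger required panel, which is consistent with the "two to four raters" range holding only near the overall mean.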


1987 ◽  
Vol 61 (3) ◽  
pp. 1009-1010 ◽  
Author(s):  
Mark A. Runco

The reliability and true variance of a socially valid measure of creativity were assessed by asking three judges to rate the creativity of 29 adolescents. Interitem reliability was .93; interrater reliability was .48; and true score variance, estimated from the interitem and interrater reliability coefficients, was .65.


1992 ◽  
Vol 74 (2) ◽  
pp. 347-353 ◽  
Author(s):  
Elizabeth M. Mason

The purpose of this study was to investigate the interrater reliability of the visual-motor portion of the Copying subtest of the Stanford-Binet Intelligence Scale: Fourth Edition. Eight raters independently scored 11 protocols completed by children aged 5 through 10 years, using the scoring criteria and guidelines in the manual. The raters marked each of 10 items pass or fail and computed a total raw score for each protocol. Interrater reliability coefficients were obtained for each child's protocol, and the kappa coefficient was computed for each item. Significant interrater reliability coefficients ranged from .82 to .91, which were low in comparison to the test-retest reliability and Kuder-Richardson-20 coefficients reported for this and other Stanford-Binet subtests in the technical manual. Percent agreement among the 8 raters also indicated weak reliability. Although some of the obtained interrater reliability coefficients were within acceptable levels, questions were raised about the scoring criteria for individual items. Caution is warranted in the use of cognitive measures that require subjective judgement by the examiner in applying scoring criteria.


1993 ◽  
Vol 77 (3_suppl) ◽  
pp. 1215-1218 ◽  
Author(s):  
Susan Dickerson Mayes ◽  
Edward O. Bixler

Agreement between raters using global impressions to assess methylphenidate response was analyzed for children with Attention-Deficit Hyperactivity Disorder (ADHD) undergoing double-blind, placebo-controlled, crossover methylphenidate trials. Caregivers were more likely to disagree than agree when asked to rate the children as “better, same, or worse” during each day of the trial. Overall agreement was 42.9%, only 9.6% above what would be expected by chance alone. Further, none of the interrater reliability coefficients (Cohen's kappa) for the individual children were statistically significant.
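Cohen's kappa, the statistic used here for the individual children, corrects raw percent agreement for the agreement expected by chance. With the 42.9% observed agreement and the roughly 33.3% chance agreement implied above, kappa is only about .14. A minimal sketch (the contingency-table function is a generic implementation, not the authors' code):

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square contingency table of two raters' categorical codes."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n                                      # observed agreement
    p_chance = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_obs - p_chance) / (1 - p_chance)

# Kappa computed directly from the agreement rates reported above:
p_obs, p_chance = 0.429, 0.333
print(round((p_obs - p_chance) / (1 - p_chance), 2))  # 0.14
```

A kappa near .14 is conventionally read as slight agreement, consistent with the nonsignificant coefficients reported.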


2012 ◽  
Vol 52 (11) ◽  
pp. 1046 ◽  
Author(s):  
Hasan Baneh ◽  
Mojtaba Najafi ◽  
Ghodrat Rahimi

The present study was carried out to estimate variance components for growth traits in Naeini goats. Bodyweight records were collected for two flocks under supervision of the Agriculture Organisation of the Esfahan province between 2000 and 2007. Investigated traits were birthweight (BW; n = 2483), weaning weight (WW; n = 1211) and average daily gain from birth to weaning (ADG; n = 1211). Environmental effects were investigated using fixed-effect models, while (co)variance components and genetic parameters were estimated with single- and three-trait analyses using REML methods and WOMBAT software. Six different animal models were fitted to the traits, with the best model for each trait determined by log-likelihood ratio tests (LRT). All traits were significantly influenced by herd, birth year, sex of the kid, birth type and dam age (P < 0.01). On the basis of LRT, maternal permanent environmental effects (c2) were significant for WW and ADG, while BW was affected only by direct genetic effects. Direct heritability estimates for BW, WW and ADG were 0.25 ± 0.05, 0.07 ± 0.06 and 0.21 ± 0.11, respectively. The estimate of c2 was 0.16 ± 0.06 for both WW and ADG. Estimates of genetic correlation for BW–ADG, BW–WW and ADG–WW were 0.49, 0.61 and 0.94, respectively. The estimated phenotypic correlations were positive and were between 0.03 (BW–ADG) and 0.95 (ADG–WW). These results indicate that selection can be used to improve growth traits in this goat breed.
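The log-likelihood ratio test (LRT) used above to choose among the six animal models compares nested models: twice the difference in log-likelihoods is referred to a chi-square distribution with degrees of freedom equal to the number of extra (co)variance parameters. A minimal sketch, assuming SciPy is available (the log-likelihood values are hypothetical, for illustration only):

```python
from scipy.stats import chi2

def lr_test(loglik_full, loglik_reduced, df_diff):
    """Likelihood-ratio test between nested models fit by (RE)ML.
    df_diff is the number of extra (co)variance parameters in the full model."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2.sf(stat, df_diff)
    return stat, p_value

# Hypothetical log-likelihoods: does adding a maternal permanent
# environmental effect (one extra parameter) improve the fit?
stat, p = lr_test(-1450.2, -1454.9, df_diff=1)
print(round(stat, 1), p < 0.05)  # 9.4 True
```

A significant LRT for the maternal permanent environmental variance is what would justify retaining c2 in the models for WW and ADG, as reported above.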


2017 ◽  
Vol 25 (0) ◽  
Author(s):  
Maria Alzete de Lima ◽  
Lorita Marlena Freitag Pagliuca ◽  
Jennara Cândido do Nascimento ◽  
Joselany Áfio Caetano

Objective: to compare interrater reliability between two eye assessment methods. Method: quasi-experimental study conducted with 324 college students at a public university, comprising an eye self-examination and an eye assessment performed by the researchers. The kappa coefficient was used to verify agreement. Results: interrater reliability coefficients ranged from 0.85 to 0.95, with statistical significance at the 0.05 level. The exams for near acuity and peripheral vision showed only reasonable agreement (kappa > 0.2); the remaining coefficients were higher, ranging from very reliable to totally reliable. Conclusion: the results of the two methods were comparable. The virtual manual on eye self-examination can be used to screen for eye conditions.

