Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

An Alternative to Cohen's κ

European Psychologist ◽

10.1027/1016-9040.11.1.12 ◽

2006 ◽

Vol 11 (1) ◽

pp. 12-24 ◽

Cited By ~ 19

Author(s):

Alexander von Eye

Keyword(s):

Simulation Study ◽

Null Hypothesis ◽

Categorical Variables ◽

Alternative Measure ◽

Rater Agreement ◽

Verbal Processing ◽

Heavy Tailed ◽

Applicant Selection

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κ_s, is proposed as an alternative measure of rater agreement. Both κ and κ_s allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κ_s = 0. In addition to evaluating rater agreement in a fashion parallel to κ, the coefficient κ_s allows one to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported, which suggest that (a) the four measures of rater agreement, Cohen's κ, Brennan and Prediger's κ_n, raw agreement, and κ_s, are sensitive to the same data characteristics when evaluating rater agreement and (b) both the z-statistic for Cohen's κ and Stouffer's z for κ_s are unimodally and symmetrically distributed, but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.
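
Three of the four measures compared in this abstract (raw agreement, Cohen's κ, and Brennan and Prediger's κ_n) can be computed directly from a two-rater agreement table. The following is a minimal sketch using the standard textbook definitions; the function name `agreement_measures` and the example table are illustrative assumptions, not taken from the article, and κ_s itself is not implemented because its definition is not given in the abstract.

```python
def agreement_measures(table):
    """Return raw agreement, Cohen's kappa, and Brennan-Prediger kappa_n.

    table[i][j] is the number of objects that rater A placed in category i
    and rater B placed in category j (a square two-rater agreement table).
    """
    k = len(table)
    if k < 2 or any(len(r) != k for r in table):
        raise ValueError("table must be square with at least two categories")
    n = sum(sum(r) for r in table)
    if n == 0:
        raise ValueError("table must contain at least one rated object")

    # Observed (raw) agreement: proportion of objects on the main diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n

    # Marginal proportions for each rater.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    # Chance agreement under independence of the two raters (Cohen).
    p_e = sum(row[i] * col[i] for i in range(k))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_o - p_e) / (1 - p_e)

    # Brennan-Prediger: chance agreement fixed at 1/k (uniform categories).
    kappa_n = (p_o - 1 / k) / (1 - 1 / k)

    return {"raw": p_o, "kappa": kappa, "kappa_n": kappa_n}


# Hypothetical 2x2 table, for illustration only.
result = agreement_measures([[20, 5], [10, 15]])
print(result)
```

For this made-up table, raw agreement is 0.70 and both κ and κ_n come out at roughly 0.40. The two chance-corrected coefficients coincide here only because one rater's marginals happen to be uniform.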

The relationship between behavioral specificity, rater agreement, and performance ratings

PsycEXTRA Dataset ◽

10.1037/e518532013-767 ◽

2007 ◽

Author(s):

Traxler W. Littlejohn ◽

Anthony R. Paquin

Keyword(s):

Performance Ratings ◽

Rater Agreement ◽

And Performance ◽

The Relationship

Supplemental Material for Personality Characteristics Below Facets: A Replication and Meta-Analysis of Cross-Rater Agreement, Rank-Order Stability, Heritability, and Utility of Personality Nuances

Journal of Personality and Social Psychology ◽

10.1037/pspp0000202.supp ◽

2018 ◽

Keyword(s):

Rank Order ◽

Meta Analysis ◽

Personality Characteristics ◽

Rater Agreement

538-P: Inter-Rater Agreement of Vibration and Pressure Sensation in the Assessment of Sensorimotor Dysfunction in Diabetes

Diabetes ◽

10.2337/db20-538-p ◽

2020 ◽

Vol 69 (Supplement 1) ◽

pp. 538-P

Author(s):

EDWARD B. JUDE ◽

ANASTASIOS TENTOLOURIS ◽

IOANNA ELEFTHERIADOU ◽

NIKOLAOS TENTOLOURIS

Keyword(s):

Rater Agreement ◽

Pressure Sensation

Faculty Opinions recommendation of Reproducibility of the Endometriosis Fertility Index: a prospective inter/intra-rater agreement study.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.736218416.793563314 ◽

2019 ◽

Author(s):

Jim Tsaltas

Keyword(s):

Rater Agreement ◽

Fertility Index ◽

Agreement Study

Evaluation of the Modified Naranjo Criteria for Assessing Causal Attribution of Clinical Outcome to Homeopathic Intervention as Presented in Case Reports

Homeopathy ◽

10.1055/s-0040-1701251 ◽

2020 ◽

Vol 109 (04) ◽

pp. 191-197

Author(s):

Chetna Deep Lamba ◽

Vishwa Kumar Gupta ◽

Robbert van Haselen ◽

Lex Rutten ◽

Nidhi Mahajan ◽

...

Keyword(s):

Clinical Outcome ◽

Causal Relationship ◽

Clinical Case ◽

Assessment Tool ◽

Causal Attribution ◽

Case Reports ◽

Face Validity ◽

Rater Agreement ◽

The Face ◽

Selection Of

Abstract Objectives The objective of this study was to establish the reliability and content validity of the “Modified Naranjo Criteria for Homeopathy—Causal Attribution Inventory” as a tool for attributing a causal relationship between the homeopathic intervention and outcome in clinical case reports. Methods Purposive sampling was adopted for the selection of information-rich case reports using pre-defined criteria. Eligible case reports had to fulfil a minimum of nine items of the CARE Clinical Case Reporting Guideline checklist and a minimum of three of the homeopathic HOM-CASE CARE extension items. The Modified Naranjo Criteria for Homeopathy Inventory consists of 10 domains. Inter-rater agreement in the scoring of these domains was determined by calculating the percentage agreement and kappa (κ) values. A κ greater than 0.4, indicating fair agreement between raters, in conjunction with the absence of concerns regarding the face validity, was taken to indicate the validity of a given domain. Each domain was assessed by four raters for the selected case reports. Results Sixty case reports met the inclusion criteria. Inter-rater agreement/concordance per domain was “perfect” for domains 1 (100%, κ = 1.00) and 2 (100%, κ = 1.00); “almost perfect” for domain 8 (97.5%, κ = 0.86); “substantial” for domains 3 (96.7%, κ = 0.80) and 5 (91.1%, κ = 0.70); “moderate” for domains 4 (83.3%, κ = 0.60), 7 (67.8%, κ = 0.46) and 9 (99.2%, κ = 0.50); and “fair” for domain 10 (56.1%, κ = 0.38). For domains 6A (46.7%, κ = 0.03) and 6B (50.3%, κ = 0.18), there was “slight agreement” only. Thus, the validity of the Modified Naranjo Criteria for Homeopathy tool was established for each of its domains, except for the two that pertain to direction of cure (domains 6A and 6B). 
Conclusion The Modified Naranjo Criteria for Homeopathy—Causal Attribution Inventory was identified as a valid tool for assessing the likelihood of a causal relationship between a homeopathic intervention and clinical outcome. Improved wordings for several criteria have been proposed for the assessment tool, under the new acronym “MONARCH”. Further assessment of two MONARCH domains is required.
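
The verbal labels in this abstract ("slight", "fair", "moderate", "substantial", "almost perfect") match the commonly used Landis and Koch bands for κ. A minimal sketch of that mapping, assuming those conventional cut-offs; the function name `landis_koch_label` is hypothetical, and the abstract's separate "perfect" label for κ = 1.00 is the authors' own addition, reproduced here as a special case:

```python
def landis_koch_label(kappa):
    """Map a kappa value to a conventional Landis-Koch agreement label."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa == 1.0:
        return "perfect"  # label used in the abstract, not a Landis-Koch band
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values reported per domain in the abstract above.
for value in (1.00, 0.86, 0.80, 0.60, 0.38, 0.18, 0.03):
    print(value, landis_koch_label(value))
```

Under these cut-offs the reported values reproduce the labels given in the abstract, for example 0.80 as "substantial" and 0.38 as "fair".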

Reliability and Accuracy of Remote Fiberoptic Nasopharyngolaryngoscopy in the Pediatric Population

Ear Nose & Throat Journal ◽

10.1177/0145561320919109 ◽

2020 ◽

pp. 014556132091910

Author(s):

Lauren E. Miller ◽

Adva Buzi ◽

Ashley Williams ◽

Rachel S. Rogers ◽

Angel G. Ortiz ◽

...

Keyword(s):

Pediatric Population ◽

Upper Airway ◽

Office Visit ◽

Rater Agreement ◽

Vocal Cords ◽

Coupling Device ◽

Overall Evaluation ◽

Survey Results ◽

Device Use ◽

False Vocal

Introduction: Telemedicine is an increasingly prevalent component of medical practice. In otolaryngology, there is the potential for telemedicine services to be performed in conjunction with device use, such as with a nasolaryngoscope. This study evaluates the reliability of remote examinations of the upper airway through an iPhone recording using a coupling device attached to a nasopharyngolaryngoscope (NPL). Methods: A prospective, blinded study was performed for pediatric patients requiring an NPL during an office visit. The NPL was performed using a coupling device attached to a smartphone to record the examination. A second, remote otolaryngologist then evaluated the recorded examination. Both otolaryngologists evaluated findings of anatomic sites including nasopharynx, oropharynx, base of tongue, larynx including subsites of epiglottis, arytenoids, aryepiglottic folds, false vocal cords, true vocal cords, patency of airway, and diagnostic impression, all of which were documented through a survey. Results of the survey were evaluated through inter-rater agreement using the κ statistic. Results: Forty-five patients underwent an NPL, all of whom were included in the study. The average age was 4.9 years. The most common complaint requiring NPL was noisy breathing (n = 16). The inter-rater agreement for overall diagnosis was 0.74 with 80% agreement, rated as “good.” Other anatomic subsites with “good” or better inter-rater agreement were nasopharynx (0.75), oropharynx (0.75), and true vocal cords (0.71), with strong percentage agreement of 89%, 91%, and 87%, respectively. Both users of the adaptor found the recording setup to run smoothly. Conclusion: A telemedicine device for NPL use demonstrates strong diagnostic accuracy across providers and good overall evaluation. It holds potential for use in remote settings.

Interobserver agreement of the Paris and simplified classifications of superficial colonic lesions: a Western study

Endoscopy International Open ◽

10.1055/a-1352-3437 ◽

2021 ◽

Vol 09 (03) ◽

pp. E388-E394

Author(s):

Francesco Cocomazzi ◽

Marco Gentile ◽

Francesco Perri ◽

Antonio Merla ◽

Fabrizio Bossa ◽

...

Keyword(s):

Interobserver Agreement ◽

Classification Systems ◽

Sensitivity Analyses ◽

Size Estimation ◽

Rater Agreement ◽

Staff Members ◽

Video Clips ◽

Paris Classification ◽

Level Of Agreement

Abstract Background and study aims The Paris classification of superficial colonic lesions has been widely adopted, but a simplified description that subgroups the shape into pedunculated, sessile/flat and depressed lesions has been proposed recently. The aim of this study was to evaluate the accuracy and inter-rater agreement among 13 Western endoscopists for the two classification systems. Methods Seventy video clips of superficial colonic lesions were classified according to the two classifications, and their size estimated. The interobserver agreement for each classification was assessed using both Cohen's κ and AC1 statistics. Accuracy was taken as the concordance between the standard morphology definition and that made by participants. Sensitivity analyses investigated agreement between trainees (T) and staff members (SM), simple or mixed lesions, distinct lesion phenotypes, and for laterally spreading tumors (LSTs). Results Overall, the interobserver agreement for the Paris classification was substantial (κ = 0.61; AC1 = 0.66), with 79.3 % accuracy. Between SM and T, the values were superimposable. For size estimation, the agreement was 0.48 by the κ-value, and 0.50 by AC1. For single or mixed lesions, κ-values were 0.60 and 0.43, respectively; corresponding AC1 values were 0.68 and 0.57. Evaluating the several different polyp subtypes separately, agreement differed significantly when analyzed by the κ statistic (0.08–0.12) or the AC1 statistic (0.59–0.71). Analyses of LSTs provided a κ-value of 0.50 and an AC1 score of 0.62, with 77.6 % accuracy. The simplified classification outperformed the Paris classification: κ = 0.68, AC1 = 0.82, accuracy = 91.6 %. Conclusions Agreement is often measured with Cohen’s κ, but we documented higher levels of agreement when analyzed with the AC1 statistic. The level of agreement was substantial for the Paris classification, and almost perfect for the simplified system.
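
The gap between κ and AC1 reported for the individual polyp subtypes is typical of tables where one category dominates: Cohen's chance term grows with skewed marginals, while the chance term of Gwet's AC1 shrinks. The sketch below illustrates this for the two-rater case only, using the standard two-rater formulas; the study itself involved 13 raters, so this is not the authors' computation, and the function name and example table are assumptions made for illustration.

```python
def kappa_and_ac1(table):
    """Return (Cohen's kappa, Gwet's AC1) for a square two-rater table.

    table[i][j] counts objects rated category i by rater A and j by rater B.
    """
    k = len(table)
    if k < 2 or any(len(r) != k for r in table):
        raise ValueError("table must be square with at least two categories")
    n = sum(sum(r) for r in table)
    if n == 0:
        raise ValueError("table must contain at least one rated object")

    p_o = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    # Cohen: chance agreement is the product of the raters' marginals.
    pe_kappa = sum(row[i] * col[i] for i in range(k))
    if pe_kappa == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    # Gwet: chance agreement from the average marginal pi_i of both raters.
    # For k >= 2 this term is at most 1/k, so the AC1 denominator is nonzero.
    pi = [(row[i] + col[i]) / 2 for i in range(k)]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (k - 1)

    kappa = (p_o - pe_kappa) / (1 - pe_kappa)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Hypothetical skewed table: 92% raw agreement, one dominant category.
kappa, ac1 = kappa_and_ac1([[90, 4], [4, 2]])
print(round(kappa, 2), round(ac1, 2))
```

With this invented table the two raters agree on 92 of 100 items, yet κ is only about 0.29 while AC1 is about 0.91, the same direction of divergence the abstract reports.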

Inter‐rater agreement for sonographic stomach position classification in fetal diaphragmatic hernia across the North American Fetal Therapy Network (NAFTNet)

Prenatal Diagnosis ◽

10.1002/pd.5949 ◽

2021 ◽

Author(s):

Nimrah Abbasi ◽

Greg Ryan ◽

Rodrigo Ruano ◽

Magda Sanz Cortes ◽

Xiang Y. Ye ◽

...

Keyword(s):

Diaphragmatic Hernia ◽

North American ◽

Fetal Therapy ◽

Rater Agreement ◽

The North

Assessment of 2D ultrasound fluid volume estimation accuracy in different shaped objects: an in vitro study

Acta Radiologica ◽

10.1177/0284185119854198 ◽

2019 ◽

Vol 61 (2) ◽

pp. 253-259

Author(s):

Iroshani Kodikara ◽

Iroshini Abeysekara ◽

Dhanusha Gamage ◽

Isurani Ilayperuma

Keyword(s):

Estimation Error ◽

In Vitro Study ◽

High Accuracy ◽

Volume Estimation ◽

Estimation Accuracy ◽

Rater Agreement ◽

Actual Volume ◽

2D Ultrasound ◽

One Way Anova

Background Volume estimation of organs using two-dimensional (2D) ultrasonography is frequently warranted. Considering the influence of estimated volume on patient management, maintenance of its high accuracy is empirical. However, data are scarce regarding the accuracy of estimated volume of non-globular shaped objects of different volumes. Purpose To evaluate the volume estimation accuracy of different shaped and sized objects using high-end 2D ultrasound scanners. Material and Methods Globular (n=5), non-globular elongated (n=5), and non-globular near-spherical shaped (n=4) hollow plastic objects were scanned to estimate the volumes; actual volumes were compared with estimated volumes. T-test and one-way ANOVA were used to compare means; P<0.05 was considered significant. Results The actual volumes of the objects were in the range of 10–445 mL; estimated volumes ranged from 6.4 to 425 mL (P=0.067). The estimated volume was lower than the actual volume; such volume underestimation was marked for non-globular elongated objects. Regardless of the scanner, the highest volume estimation error was for non-globular elongated objects (<40%) followed by non-globular near-spherical shaped objects (<23.88%); the lowest was for globular objects (<3.6%). Irrespective of the shape or the volume of the object, volume estimation difference among the scanners was not significant: globular (F=0.430, P=0.66); non-globular elongated (F=3.69, P=0.064); and non-globular near-spherical (F=4.00, P=0.06). A good inter-rater agreement (R=0.99, P<0.001) and a good correlation between actual versus estimated volumes (R=0.98, P<0.001) were noted. Conclusion The 2D ultrasonography can be recommended for volume estimation purposes of different shaped and different sized objects, regardless of the type of high-end scanner used.
