Inter-Rater Agreement Measures and the Refinement of Metrics in the PLATO MT Evaluation Paradigm

Author(s):  
Keith J. Miller ◽  
Michelle Vanni
2006 ◽  
Vol 11 (1) ◽  
pp. 12-24 ◽  
Author(s):  
Alexander von Eye

At the level of manifest categorical variables, a large number of coefficients and models for examining rater agreement have been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κ_s, is proposed as an alternative measure of rater agreement. Both κ and κ_s allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κ_s = 0. In addition to evaluating rater agreement in a fashion parallel to κ, the coefficient κ_s allows one to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported which suggest that (a) the four measures of rater agreement (Cohen's κ, Brennan and Prediger's κ_n, raw agreement, and κ_s) are sensitive to the same data characteristics and (b) both the z statistic for Cohen's κ and Stouffer's z for κ_s are unimodally and symmetrically distributed but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.
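
Three of the coefficients named above can be made concrete with a short Python sketch for two raters. It computes raw agreement, Cohen's κ, and Brennan and Prediger's κ_n from a square agreement table, plus Stouffer's combined z for a list of independent z scores. It does not implement κ_s itself, whose exact definition is not given here; the function names and the 2 × 2 example table are hypothetical.

```python
import math

def raw_agreement(table):
    """Proportion of items on the diagonal (both raters chose the same category)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n

def cohen_kappa(table):
    """Cohen's kappa: observed agreement corrected for chance agreement,
    where chance is estimated from the product of the raters' marginal proportions."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = raw_agreement(table)
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(rows, cols))
    return (p_o - p_e) / (1 - p_e)

def brennan_prediger_kappa(table):
    """Brennan-Prediger kappa_n: chance agreement fixed at 1/k for k categories
    (uniform marginals) instead of being estimated from the data."""
    k = len(table)
    p_o = raw_agreement(table)
    return (p_o - 1 / k) / (1 - 1 / k)

def stouffer_z(z_scores):
    """Stouffer's method: combine m independent z scores as sum(z) / sqrt(m)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Hypothetical 2 x 2 table: rows = rater A's category, columns = rater B's.
table = [[30, 5],
         [5, 10]]
print(raw_agreement(table))           # 0.8
print(cohen_kappa(table))             # about 0.524
print(brennan_prediger_kappa(table))  # about 0.6
print(stouffer_z([1.0, 2.0, 3.0]))    # about 3.464
```

With skewed marginals (70/30 for both raters), Cohen's chance term rises to 0.58, so κ falls below κ_n even though raw agreement is the same for both.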


Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 538-P
Author(s):  
EDWARD B. JUDE ◽  
ANASTASIOS TENTOLOURIS ◽  
IOANNA ELEFTHERIADOU ◽  
NIKOLAOS TENTOLOURIS

Homeopathy ◽  
2020 ◽  
Vol 109 (04) ◽  
pp. 191-197
Author(s):  
Chetna Deep Lamba ◽  
Vishwa Kumar Gupta ◽  
Robbert van Haselen ◽  
Lex Rutten ◽  
Nidhi Mahajan ◽  
...  

Objectives: The objective of this study was to establish the reliability and content validity of the “Modified Naranjo Criteria for Homeopathy—Causal Attribution Inventory” as a tool for attributing a causal relationship between the homeopathic intervention and outcome in clinical case reports. Methods: Purposive sampling was used to select information-rich case reports according to predefined criteria. Eligible case reports had to fulfil at least nine items of the CARE Clinical Case Reporting Guideline checklist and at least three of the homeopathic HOM-CASE CARE extension items. The Modified Naranjo Criteria for Homeopathy Inventory consists of 10 domains. Each domain was assessed by four raters for the selected case reports. Inter-rater agreement in the scoring of these domains was determined by calculating the percentage agreement and kappa (κ) values. A κ greater than 0.2, indicating at least fair agreement between raters, in conjunction with the absence of concerns regarding face validity, was taken to indicate the validity of a given domain. Results: Sixty case reports met the inclusion criteria. Inter-rater agreement/concordance per domain was “perfect” for domains 1 (100%, κ = 1.00) and 2 (100%, κ = 1.00); “almost perfect” for domain 8 (97.5%, κ = 0.86); “substantial” for domains 3 (96.7%, κ = 0.80) and 5 (91.1%, κ = 0.70); “moderate” for domains 4 (83.3%, κ = 0.60), 7 (67.8%, κ = 0.46), and 9 (99.2%, κ = 0.50); and “fair” for domain 10 (56.1%, κ = 0.38). For domains 6A (46.7%, κ = 0.03) and 6B (50.3%, κ = 0.18), there was only “slight” agreement. Thus, the validity of the Modified Naranjo Criteria for Homeopathy tool was established for each of its domains except the two that pertain to direction of cure (domains 6A and 6B).
Conclusion: The Modified Naranjo Criteria for Homeopathy—Causal Attribution Inventory was identified as a valid tool for assessing the likelihood of a causal relationship between a homeopathic intervention and clinical outcome. Improved wording for several criteria has been proposed for the assessment tool, under the new acronym “MONARCH”. Further assessment of two MONARCH domains is required.
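
Because each domain was scored by four raters, a multi-rater coefficient is needed. The abstract does not say which κ was used, so the sketch below assumes Fleiss' κ as one common choice. It also maps κ to the verbal bands of Landis and Koch (1977), which match the labels used above for κ < 1 (“perfect” for κ = 1.00 is the study's own label). The function names and example counts are hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N subjects, each rated by the same number of raters.

    counts[i][j] = number of raters who assigned subject i to category j.
    Undefined (division by zero) if every rating falls in a single category.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    # Per-subject agreement: proportion of rater pairs that agree on subject i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Overall proportion of all assignments that fall in each category.
    n_cats = len(counts[0])
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

def landis_koch_label(kappa):
    """Verbal band for a kappa value (Landis & Koch, 1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical: 4 case reports, 4 raters, 2 categories (criterion met / not met).
counts = [[4, 0], [0, 4], [3, 1], [4, 0]]
k = fleiss_kappa(counts)
print(round(k, 3), landis_koch_label(k))  # 0.709 substantial
```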


2020 ◽  
pp. 014556132091910
Author(s):  
Lauren E. Miller ◽  
Adva Buzi ◽  
Ashley Williams ◽  
Rachel S. Rogers ◽  
Angel G. Ortiz ◽  
...  

Introduction: Telemedicine is an increasingly prevalent component of medical practice. In otolaryngology, telemedicine services could potentially be performed in conjunction with device use, such as with a nasolaryngoscope. This study evaluates the reliability of remote examinations of the upper airway recorded on an iPhone through a coupling device attached to a nasopharyngolaryngoscope (NPL). Methods: A prospective, blinded study was performed in pediatric patients requiring NPL during an office visit. The NPL was performed with a coupling device attached to a smartphone to record the examination, and a second, remote otolaryngologist then evaluated the recorded examination. Both otolaryngologists evaluated findings at anatomic sites including the nasopharynx, oropharynx, base of tongue, and larynx (including the subsites of epiglottis, arytenoids, aryepiglottic folds, false vocal cords, and true vocal cords), as well as airway patency and diagnostic impression, all of which were documented through a survey. Survey results were evaluated for inter-rater agreement using the κ statistic. Results: Forty-five patients underwent NPL, all of whom were included in the study. The average age was 4.9 years. The most common complaint requiring NPL was noisy breathing (n = 16). Inter-rater agreement for overall diagnosis was κ = 0.74 with 80% agreement, rated as “good.” Other anatomic sites with “good” or better inter-rater agreement were the nasopharynx (κ = 0.75), oropharynx (κ = 0.75), and true vocal cords (κ = 0.71), with percentage agreement of 89%, 91%, and 87%, respectively. Both users of the adaptor found that the recording setup ran smoothly. Conclusion: A telemedicine device for NPL use demonstrates strong diagnostic accuracy across providers and good overall evaluation. It holds potential for use in remote settings.
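
Reporting κ alongside percentage agreement lets a reader back out the chance-agreement term, since κ = (p_o − p_e)/(1 − p_e) rearranges to p_e = (p_o − κ)/(1 − κ). The sketch below applies this to the figures reported above, assuming each κ is an unweighted two-rater Cohen's κ and the percentage agreement is its p_o; because the published values are rounded, the results are approximate. The function name is hypothetical.

```python
def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    return (p_o - kappa) / (1 - kappa)

# (percentage agreement, kappa) as reported in the abstract
reported = {
    "overall diagnosis": (0.80, 0.74),
    "nasopharynx": (0.89, 0.75),
    "oropharynx": (0.91, 0.75),
    "true vocal cords": (0.87, 0.71),
}
for site, (p_o, kappa) in reported.items():
    print(f"{site}: p_e ~ {implied_chance_agreement(p_o, kappa):.2f}")
```

The implied p_e is about 0.23 for overall diagnosis but between 0.55 and 0.64 for the anatomic sites, which is why 87% to 91% raw agreement at those sites yields roughly the same κ as 80% agreement on the diagnosis.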


2021 ◽  
Vol 09 (03) ◽  
pp. E388-E394
Author(s):  
Francesco Cocomazzi ◽  
Marco Gentile ◽  
Francesco Perri ◽  
Antonio Merla ◽  
Fabrizio Bossa ◽  
...  

Background and study aims: The Paris classification of superficial colonic lesions has been widely adopted, but a simplified description that groups lesion shape into pedunculated, sessile/flat, and depressed lesions has recently been proposed. The aim of this study was to evaluate the accuracy and inter-rater agreement of the two classification systems among 13 Western endoscopists. Methods: Seventy video clips of superficial colonic lesions were classified according to both classifications, and lesion size was estimated. Interobserver agreement for each classification was assessed using both Cohen's κ and the AC1 statistic. Accuracy was taken as the concordance between the standard morphology definition and that made by participants. Sensitivity analyses investigated agreement between trainees (T) and staff members (SM), for simple versus mixed lesions, for distinct lesion phenotypes, and for laterally spreading tumors (LSTs). Results: Overall, interobserver agreement for the Paris classification was substantial (κ = 0.61; AC1 = 0.66), with 79.3% accuracy. Values for SM and T were comparable. For size estimation, agreement was 0.48 by κ and 0.50 by AC1. For simple and mixed lesions, κ values were 0.60 and 0.43, respectively; the corresponding AC1 values were 0.68 and 0.57. When the polyp subtypes were evaluated separately, agreement differed significantly depending on whether it was analyzed with the κ statistic (0.08–0.12) or the AC1 statistic (0.59–0.71). Analysis of LSTs yielded κ = 0.50 and AC1 = 0.62, with 77.6% accuracy. The simplified classification outperformed the Paris classification: κ = 0.68, AC1 = 0.82, accuracy = 91.6%. Conclusions: Agreement is often measured with Cohen's κ, but we documented higher levels of agreement when it was analyzed with the AC1 statistic. Agreement was substantial for the Paris classification and almost perfect for the simplified system.
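
The gap between κ and AC1 reported above is typical when category prevalence is skewed. Cohen's κ estimates chance agreement from the product of the raters' marginals, whereas Gwet's AC1 uses Σ π_k(1 − π_k)/(q − 1), with π_k the average marginal proportion for category k among q categories; this term stays small when one category dominates. The two-rater sketch below uses a hypothetical table, not the study's data (which involved 13 raters and would need the multi-rater forms of both statistics), and shows high raw agreement producing a near-zero κ but a high AC1.

```python
def cohen_kappa(table):
    """Two-rater Cohen's kappa from a q x q contingency table."""
    q = len(table)
    n = sum(map(sum, table))
    p_a = sum(table[i][i] for i in range(q)) / n
    rows = [sum(table[k]) / n for k in range(q)]
    cols = [sum(row[k] for row in table) / n for k in range(q)]
    p_e = sum(r * c for r, c in zip(rows, cols))
    return (p_a - p_e) / (1 - p_e)

def gwet_ac1(table):
    """Two-rater Gwet's AC1 from a q x q contingency table."""
    q = len(table)
    n = sum(map(sum, table))
    p_a = sum(table[i][i] for i in range(q)) / n
    # pi_k: average of the two raters' marginal proportions for category k
    pi = [(sum(table[k]) + sum(row[k] for row in table)) / (2 * n) for k in range(q)]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical: 100 lesions, category 0 very prevalent; raters agree on 90%.
table = [[90, 5],
         [5, 0]]
print(f"kappa = {cohen_kappa(table):.3f}")  # kappa = -0.053
print(f"AC1   = {gwet_ac1(table):.3f}")     # AC1   = 0.890
```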


2019 ◽  
Vol 61 (2) ◽  
pp. 253-259
Author(s):  
Iroshani Kodikara ◽  
Iroshini Abeysekara ◽  
Dhanusha Gamage ◽  
Isurani Ilayperuma

Background: Volume estimation of organs using two-dimensional (2D) ultrasonography is frequently warranted. Given the influence of estimated volume on patient management, maintaining high accuracy is imperative. However, data are scarce regarding the accuracy of volume estimation for non-globular objects of different volumes. Purpose: To evaluate the accuracy of volume estimation for objects of different shapes and sizes using high-end 2D ultrasound scanners. Material and Methods: Globular (n=5), non-globular elongated (n=5), and non-globular near-spherical (n=4) hollow plastic objects were scanned to estimate their volumes, and the actual volumes were compared with the estimated volumes. The t-test and one-way ANOVA were used to compare means; P<0.05 was considered significant. Results: The actual volumes of the objects ranged from 10 to 445 mL; estimated volumes ranged from 6.4 to 425 mL (P=0.067). Estimated volumes were lower than actual volumes, and this underestimation was marked for non-globular elongated objects. Regardless of the scanner, the highest volume estimation error was for non-globular elongated objects (<40%), followed by non-globular near-spherical objects (<23.88%); the lowest was for globular objects (<3.6%). Irrespective of the shape or volume of the object, differences in volume estimation among the scanners were not significant: globular (F=0.430, P=0.66), non-globular elongated (F=3.69, P=0.064), and non-globular near-spherical (F=4.00, P=0.06). Good inter-rater agreement (R=0.99, P<0.001) and a good correlation between actual and estimated volumes (R=0.98, P<0.001) were noted. Conclusion: 2D ultrasonography can be recommended for estimating the volumes of objects of different shapes and sizes, regardless of the type of high-end scanner used.
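
A correlation of R = 0.98 between actual and estimated volumes is compatible with the systematic underestimation also reported, because Pearson's r is unaffected by a constant proportional bias. The sketch below assumes that percentage error is defined as (estimated − actual)/actual × 100 and that R is Pearson's correlation coefficient; the abstract states neither definition, and the volumes are hypothetical values inside the reported 10–445 mL range.

```python
import math

def percent_error(actual, estimated):
    """Signed percentage error; negative values mean underestimation."""
    return [(e - a) / a * 100 for a, e in zip(actual, estimated)]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical volumes (mL): every estimate is 10% too low.
actual = [10, 50, 120, 250, 445]
estimated = [0.9 * v for v in actual]
print([round(p, 1) for p in percent_error(actual, estimated)])  # [-10.0, -10.0, -10.0, -10.0, -10.0]
print(round(pearson_r(actual, estimated), 6))                   # 1.0
```

A perfect correlation coexists here with a uniform 10% underestimate, so the error figures, not R, carry the accuracy information.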

