One EEG, one read – a manifesto towards reducing interrater variability among experts

Background: Techniques are needed to assess anesthesiologists' performance when responding to critical events. Patient simulators allow presentation of similar crisis situations to different clinicians. This study evaluated ratings of performance, and the interrater variability of the ratings, made by multiple independent observers viewing videotapes of simulated crises. Methods: Raters scored the videotapes of 14 different teams that were managing two scenarios: malignant hyperthermia (MH) and cardiac arrest. Technical performance and crisis management behaviors were rated. Technical ratings could range from 0.0 to 1.0 based on scenario-specific checklists of appropriate actions. Ratings of 12 crisis management behaviors were made using a five-point ordinal scale. Several statistical assessments of interrater variability were applied. Results: Technical ratings were high for most teams in both scenarios (0.78 +/- 0.08 for MH, 0.83 +/- 0.06 for cardiac arrest). Ratings of crisis management behavior varied, with some teams rated as minimally acceptable or poor (28% for MH, 14% for cardiac arrest). The agreement between raters was fair to excellent, depending on the item rated and the statistical test used. Conclusions: Both technical and behavioral performance can be assessed from videotapes of simulations. The behavioral rating system can be improved; one particular difficulty was aggregating a single rating for a behavior that fluctuated over time. These performance assessment tools might be useful for educational research or for tracking a resident's progress. The rating system needs more refinement before it can be used to assess clinical competence for residency graduation or board certification.

Interrater Variability in the Assessment of Skin Reactions in Breast Cancer Radiation Therapy: Impact of Grading Scales

International Journal of Radiation Oncology*Biology*Physics ◽

10.1016/j.ijrobp.2013.06.1306 ◽

2013 ◽

Vol 87 (2) ◽

pp. S494 ◽

Cited By ~ 1

Author(s):

A. Kapur ◽

C. Evans ◽

B. Bloom ◽

J. Ames ◽

C. Morgenstern ◽

...

Keyword(s):

Breast Cancer ◽

Radiation Therapy ◽

Skin Reactions ◽

Interrater Variability ◽

Grading Scales

Interrater Variability in Diagnosis of Cervical Biopsies from Women with HIV-1: Results from the Womenʼs Interagency HIV Study

Journal of Lower Genital Tract Disease ◽

10.1097/00128360-200004040-00002 ◽

2000 ◽

Vol 4 (4) ◽

pp. 190-194

Author(s):

Stewart L. Massad ◽

Lynn Kirstein ◽

Teresa Darragh ◽

Pincas Bitterman ◽

Mary Sidawy ◽

...

Keyword(s):

Interrater Variability ◽

Women With Hiv ◽

Hiv 1

Could Residents Adequately Assess the Severity of Hidradenitis Suppurativa? Interrater and Intrarater Reliability Assessment of Major Scoring Systems

Dermatology ◽

10.1159/000501771 ◽

2019 ◽

Vol 236 (1) ◽

pp. 8-14 ◽

Cited By ~ 1

Author(s):

Katarzyna Włodarek ◽

Aleksandra Stefaniak ◽

Łukasz Matusiak ◽

Jacek C. Szepietowski

Keyword(s):

Interrater Reliability ◽

Hidradenitis Suppurativa ◽

Intraclass Correlation ◽

Scoring Systems ◽

Staging System ◽

Severity Index ◽

Assessment Tools ◽

Intrarater Reliability ◽

Global Assessment Scale ◽

Interrater Variability

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS), but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters for most of the evaluated instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
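The abstract above reports interrater variability as a coefficient of variation (CV) without giving the formula. A minimal Python sketch of the usual definition, the standard deviation divided by the mean of the scores that several raters give one patient, might look like the following. The function name, the use of the sample standard deviation, and the example scores are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return the sample standard deviation divided by the mean.

    `scores` holds the severity scores that different raters assigned
    to the same patient. A lower value means the raters agreed more closely.
    """
    if len(scores) < 2:
        raise ValueError("need scores from at least two raters")
    average = mean(scores)
    if average == 0:
        raise ValueError("CV is undefined when the mean score is zero")
    return stdev(scores) / average


# Hypothetical scores from five raters for one patient (invented values).
patient_scores = [4, 5, 4, 6, 6]
cv = coefficient_of_variation(patient_scores)
print(f"CV = {cv:.2f}")
```

Because the CV is scaled by the mean, it lets scales with different numeric ranges be compared, which is one plausible reason to use it when setting a 3-stage system beside an open-ended score.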

Systematic review and meta-analysis of diagnostic agreement in suspected TIA

Neurology Clinical Practice ◽

10.1212/cpj.0000000000000830 ◽

2020 ◽

pp. 10.1212/CPJ.0000000000000830

Author(s):

Seong Hoon Lee ◽

Kah Long Aw ◽

Ferghal McVerry ◽

Mark O. McCarron

Keyword(s):

Systematic Review ◽

Confidence Interval ◽

Administrative Data ◽

Meta Analysis ◽

Diagnostic Agreement ◽

Number Of Patients ◽

Interrater Variability ◽

Good For ◽

Patient Referrals ◽

Good Agreement

Objective: To determine the interrater variability for TIA diagnostic agreement among expert clinicians (neurologists/stroke physicians), administrative data, and nonspecialists. Methods: We performed a meta-analysis of studies from January 1984 to January 2019 using MEDLINE, EMBASE, and PubMed. Two reviewers independently screened for eligible studies and extracted interrater variability measurements using Cohen's kappa scores to assess diagnostic agreement. Results: Nineteen original studies consisting of 19,421 patients were included. Expert clinicians demonstrated good agreement for TIA diagnosis (κ = 0.71, 95% confidence interval [CI] = 0.62–0.81). Interrater variability between clinicians' TIA diagnosis and administrative data also demonstrated good agreement (κ = 0.68, 95% CI = 0.62–0.74). There was moderate agreement (κ = 0.41, 95% CI = 0.22–0.61) between referring clinicians and clinicians at TIA clinics receiving the referrals. Sixty percent of 748 patient referrals to TIA clinics were TIA mimics. Conclusions: Overall agreement between expert clinicians was good for TIA diagnosis, although variation still existed for a sizeable proportion of cases. Diagnostic agreement for TIA decreased among nonspecialists. The number of patients referred to TIA clinics with other (often neurologic) diagnoses was substantial, suggesting that clinicians who are proficient in managing TIAs and their mimics should run TIA clinics.
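The agreement figures above are Cohen's kappa values, which correct the raw agreement between two raters for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal Python sketch of that calculation for two raters follows. The function name and the example labels are hypothetical and invented for illustration; the pooling of kappa across studies done in the meta-analysis is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases where the two labels match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical diagnoses for ten referrals (invented values).
referrer = ["TIA", "TIA", "mimic", "TIA", "mimic", "mimic", "TIA", "mimic", "TIA", "TIA"]
clinic = ["TIA", "mimic", "mimic", "TIA", "mimic", "TIA", "TIA", "mimic", "mimic", "TIA"]
kappa = cohens_kappa(referrer, clinic)
print(f"kappa = {kappa:.2f}")
```

In this invented example the raters agree on 7 of 10 cases, yet half of that agreement is expected by chance, so kappa is much lower than the raw 70% figure suggests.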

Reliability Assessment of the Endoscopic Examination in Patients with Allergic Rhinitis

Allergy & Rhinology ◽

10.2500/ar.2016.7.0176 ◽

2016 ◽

Vol 7 (3) ◽

pp. ar.2016.7.0176

Author(s):

Georges K. Ziade ◽

Reem A. Karami ◽

Ghina B. Fakhri ◽

Elie S. Alam ◽

Abdul Latif Hamdan ◽

...

Keyword(s):

Allergic Rhinitis ◽

Endoscopic Examination ◽

Nasal Endoscope ◽

Mild Disease ◽

Endoscopic Findings ◽

Interrater Variability ◽

Mucosal Edema ◽

Nasal Secretions ◽

A Prospective Study ◽

Bluish Discoloration

Objective: To study whether the nasal endoscope can be a reliable tool in assessing patients with allergic rhinitis. Materials and Methods: A prospective study. Patients who were diagnosed with allergic rhinitis underwent a nasal endoscopic examination performed by two physicians blinded to each other's scoring. A correlation was made among symptom severity, endoscopic findings, and interrater variability. Results: Ninety patients were included in the study: 34 patients had mild disease and 56 had moderate-to-severe allergic rhinitis according to the Allergic Rhinitis and its Impact on Asthma guidelines. Increases in mucosal edema and bluish discoloration were predictive of the severity of allergic rhinitis disease (p < 0.05). The presence of nasal secretions was not predictive of allergic rhinitis. Interrater reliability was fair for mucosal edema and moderate to almost perfect for the rest of the endoscopic findings. Conclusion: Nasal endoscopy may reveal signs that are predictive of the severity of allergic rhinitis. A detailed checklist is needed for the nasal endoscopic examination to decrease interrater variability.

Interrater Variability Amongst Radiation and Medical Oncologists in the Assessment of Radiation Dermatitis for Breast Cancer Patients

International Journal of Radiation Oncology*Biology*Physics ◽

10.1016/j.ijrobp.2015.07.664 ◽

2015 ◽

Vol 93 (3) ◽

pp. E48 ◽

Cited By ~ 1

Author(s):

A. Kapur ◽

B.F. Bloom ◽

L. Potters

Keyword(s):

Breast Cancer ◽

Cancer Patients ◽

Radiation Dermatitis ◽

Breast Cancer Patients ◽

Medical Oncologists ◽

Interrater Variability

Prehospital Unassisted Assessment of Stroke Severity Using Telemedicine

Stroke ◽

10.1161/strokeaha.113.002079 ◽

2013 ◽

Vol 44 (10) ◽

pp. 2907-2909 ◽

Cited By ~ 45

Author(s):

Robbert-Jan Van Hooff ◽

Melissa Cambron ◽

Rita Van Dyck ◽

Ann De Smedt ◽

Maarten Moens ◽

...

Keyword(s):

Concurrent Validity ◽

Mobile Network ◽

National Institutes Of Health ◽

Fourth Generation ◽

Stroke Severity ◽

Examination Time ◽

Excellent Internal Consistency ◽

Remote Assessment ◽

Interrater Variability ◽

The Mean

Background and Purpose— We evaluated the feasibility and the reliability of remote stroke severity quantification in the prehospital setting using the Unassisted TeleStroke Scale (UTSS) via a telestroke ambulance system and a fourth-generation mobile network. Methods— The technical feasibility and the reliability of the UTSS were studied in healthy volunteers mimicking 41 stroke syndromes during ambulance transportation. Results— Except for 1 issue, high-quality telestroke assessment was feasible in all scenarios. The mean examination time for the UTSS was 3.1 minutes (SD, 0.4). The UTSS showed excellent intrarater and interrater variability (ρ=0.98 and 0.97; P <0.001), as well as excellent internal consistency and rater agreement. Adequate concurrent validity can be derived from the strong correlation between the UTSS and the National Institutes of Health Stroke Scale (ρ=0.90; P <0.001). Conclusions— Remote assessment of stroke severity in fast-moving ambulances using a system dedicated to prehospital telemedicine, 4G technology, and the UTSS is feasible and reliable.

Classical, Generalizability, and Multifaceted Rasch Detection of Interrater Variability in Large, Sparse Data Sets

The Journal of Experimental Education ◽

10.1080/00220970009598501 ◽

2000 ◽

Vol 68 (2) ◽

pp. 167-190 ◽

Cited By ~ 10

Author(s):

Peter D. Macmillan

Keyword(s):

Sparse Data ◽

Data Sets ◽

Interrater Variability ◽

Sparse Data Sets

Diagnostic performance of brain MRI in pharmacovigilance of natalizumab-treated MS patients

Multiple Sclerosis Journal ◽

10.1177/1352458515615225 ◽

2016 ◽

Vol 22 (9) ◽

pp. 1174-1183 ◽

Cited By ~ 15

Author(s):

Mike P Wattjes ◽

Martijn T Wijburg ◽

Anke Vennegoor ◽

Birgit I Witte ◽

Stefan D Roosendaal ◽

...

Keyword(s):

Sensitivity And Specificity ◽

Diagnostic Performance ◽

Brain Mri ◽

Lesion Detection ◽

Interrater Agreement ◽

Diagnostic Study ◽

Mri Scans ◽

Interrater Variability ◽

Magnetic Resonance Imaging Mri

Background: In natalizumab-treated multiple sclerosis (MS) patients, magnetic resonance imaging (MRI) is considered a sensitive tool for detecting both MS disease activity and progressive multifocal leukoencephalopathy (PML). Objective: To investigate the performance of neuroradiologists using brain MRI in detecting new MS lesions and asymptomatic PML lesions and in differentiating between MS and PML lesions in natalizumab-treated MS patients. The secondary aim was to investigate interrater variability. Methods: In this retrospective diagnostic study, four blinded neuroradiologists assessed reference and follow-up brain MRI scans of 48 natalizumab-treated MS patients with new asymptomatic PML lesions (n = 21), new MS lesions (n = 20), or no new lesions (n = 7). Sensitivity and specificity for detection of new lesions in general (MS and PML lesions), MS and PML lesion differentiation, and PML detection were determined. Interrater agreement was calculated. Results: Overall sensitivity and specificity for the detection of new lesions, regardless of the nature of the lesions, were 77.4% and 89.3%, respectively; for PML-MS lesion differentiation, 74.2% and 84.7%, respectively; and for asymptomatic PML lesion detection, 59.5% and 91.7%, respectively. Interrater agreement for the tested categories was fair to moderate. Conclusion: The diagnostic performance of trained neuroradiologists using brain MRI in pharmacovigilance of natalizumab-treated MS patients is moderately good. Interrater agreement among trained readers is fair to moderate.
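The sensitivity and specificity figures above follow the standard definitions: sensitivity is the share of true lesion cases that a reader flags, and specificity is the share of lesion-free cases that a reader correctly clears. A minimal Python sketch of that arithmetic for a single reader is given below. The function name and the example data are hypothetical, and the sketch does not reproduce how the study combined results across its four readers.

```python
def sensitivity_specificity(has_lesion, reader_call):
    """Return (sensitivity, specificity) for one reader.

    `has_lesion` is the reference standard per scan and `reader_call`
    is the reader's decision for the same scan, both as booleans.
    """
    pairs = list(zip(has_lesion, reader_call))
    tp = sum(1 for truth, call in pairs if truth and call)
    fn = sum(1 for truth, call in pairs if truth and not call)
    tn = sum(1 for truth, call in pairs if not truth and not call)
    fp = sum(1 for truth, call in pairs if not truth and call)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reference standard and reader decisions for eight scans.
has_lesion = [True, True, True, True, False, False, False, False]
reader_call = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(has_lesion, reader_call)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```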
