One EEG, one read – a manifesto towards reducing interrater variability among experts

Author(s):  
Fábio A. Nascimento ◽  
Jin Jing ◽  
Sándor Beniczky ◽  
Selim R. Benbadis ◽  
Jay Gavvala ◽  
...  
1998 ◽  
Vol 89 (1) ◽  
pp. 8-18 ◽  
Author(s):  
David M. Gaba ◽  
Steven K. Howard ◽  
Brendan Flanagan ◽  
Brian E. Smith ◽  
Kevin J. Fish ◽  
...  

Background Techniques are needed to assess anesthesiologists' performance when responding to critical events. Patient simulators allow presentation of similar crisis situations to different clinicians. This study evaluated ratings of performance, and the interrater variability of the ratings, made by multiple independent observers viewing videotapes of simulated crises. Methods Raters scored the videotapes of 14 different teams that were managing two scenarios: malignant hyperthermia (MH) and cardiac arrest. Technical performance and crisis management behaviors were rated. Technical ratings could range from 0.0 to 1.0 based on scenario-specific checklists of appropriate actions. Ratings of 12 crisis management behaviors were made using a five-point ordinal scale. Several statistical assessments of interrater variability were applied. Results Technical ratings were high for most teams in both scenarios (0.78 +/- 0.08 for MH, 0.83 +/- 0.06 for cardiac arrest). Ratings of crisis management behavior varied, with some teams rated as minimally acceptable or poor (28% for MH, 14% for cardiac arrest). The agreement between raters was fair to excellent, depending on the item rated and the statistical test used. Conclusions Both technical and behavioral performance can be assessed from videotapes of simulations. The behavioral rating system can be improved; one particular difficulty was aggregating a single rating for a behavior that fluctuated over time. These performance assessment tools might be useful for educational research or for tracking a resident's progress. The rating system needs more refinement before it can be used to assess clinical competence for residency graduation or board certification.


2000 ◽  
Vol 4 (4) ◽  
pp. 190-194
Author(s):  
Stewart L. Massad ◽  
Lynn Kirstein ◽  
Teresa Darragh ◽  
Pincas Bitterman ◽  
Mary Sidawy ◽  
...  

Dermatology ◽  
2019 ◽  
Vol 236 (1) ◽  
pp. 8-14 ◽  
Author(s):  
Katarzyna Włodarek ◽  
Aleksandra Stefaniak ◽  
Łukasz Matusiak ◽  
Jacek C. Szepietowski

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) to date, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Intrarater reliability was then calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). For all 6 scoring systems the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters for most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
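As a rough illustration of the coefficient of variation mentioned in this abstract, the sketch below computes a CV across raters for a single patient. The abstract does not state the exact formula the authors used, so the sample standard deviation divided by the mean is an assumption, and the scores and the function name `coefficient_of_variation` are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return the CV (sample standard deviation / mean) as a percentage.

    Assumption: the sample (n - 1) standard deviation is used; the
    abstract does not say which variant the study applied.
    """
    m = mean(scores)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(scores) / m


# Hypothetical severity scores given by five raters to the same patient.
rater_scores = [10, 12, 11, 9, 13]
cv = coefficient_of_variation(rater_scores)
print(f"CV across raters: {cv:.1f}%")
```

A lower CV means the raters' scores for that patient lie closer together relative to their mean, which is the sense in which the abstract calls some scales more reproducible.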


2020 ◽  
pp. 10.1212/CPJ.0000000000000830
Author(s):  
Seong Hoon Lee ◽  
Kah Long Aw ◽  
Ferghal McVerry ◽  
Mark O. McCarron

Objective To determine the interrater variability for TIA diagnostic agreement among expert clinicians (neurologists/stroke physicians), administrative data, and nonspecialists. Methods We performed a meta-analysis of studies from January 1984 to January 2019 using MEDLINE, EMBASE, and PubMed. Two reviewers independently screened for eligible studies and extracted interrater variability measurements using Cohen's kappa scores to assess diagnostic agreement. Results Nineteen original studies comprising 19,421 patients were included. Expert clinicians demonstrated good agreement for TIA diagnosis (κ = 0.71, 95% confidence interval [CI] = 0.62–0.81). Agreement between clinicians' TIA diagnoses and administrative data was also good (κ = 0.68, 95% CI = 0.62–0.74). There was moderate agreement (κ = 0.41, 95% CI = 0.22–0.61) between referring clinicians and clinicians at TIA clinics receiving the referrals. Sixty percent of 748 patient referrals to TIA clinics were TIA mimics. Conclusions Overall agreement between expert clinicians was good for TIA diagnosis, although variation still existed for a sizeable proportion of cases. Diagnostic agreement for TIA decreased among nonspecialists. The number of patients referred to TIA clinics with other (often neurologic) diagnoses was large, suggesting that clinicians who are proficient in managing TIAs and their mimics should run TIA clinics.
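For readers unfamiliar with Cohen's kappa, the statistic this meta-analysis pools, a minimal sketch of the standard two-rater formula follows: observed agreement corrected for the agreement expected by chance. The diagnoses below are invented for illustration and do not come from any of the included studies; the function name `cohens_kappa` is likewise hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses from two clinicians for ten referrals.
clinician_1 = ["TIA"] * 5 + ["mimic"] * 5
clinician_2 = ["TIA"] * 4 + ["mimic", "TIA"] + ["mimic"] * 4
kappa = cohens_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the clinicians agree on 8 of 10 cases, chance agreement is 0.5, and kappa is therefore about 0.6.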


2016 ◽  
Vol 7 (3) ◽  
pp. ar.2016.7.0176
Author(s):  
Georges K. Ziade ◽  
Reem A. Karami ◽  
Ghina B. Fakhri ◽  
Elie S. Alam ◽  
Abdul Latif Hamdan ◽  
...  

Objective To study whether nasal endoscopy can be a reliable tool in assessing patients with allergic rhinitis. Materials and Methods A prospective study. Patients who were diagnosed with allergic rhinitis underwent a nasal endoscopic examination performed by two physicians blinded to each other's scoring. A correlation was made among symptom severity, endoscopic findings, and interrater variability. Results Ninety patients were included in the study: 34 patients had mild disease and 56 had moderate-to-severe allergic rhinitis according to the Allergic Rhinitis and its Impact on Asthma guidelines. Increases in mucosal edema and bluish discoloration were predictive of the severity of allergic rhinitis disease (p < 0.05). The presence of nasal secretions was not predictive of allergic rhinitis. Interrater reliability was fair for mucosal edema and moderate to almost perfect for the rest of the endoscopic findings. Conclusion Nasal endoscopy may reveal signs that are predictive of the severity of allergic rhinitis. A detailed checklist is needed for the nasal endoscopic examination to decrease interrater variability.


Stroke ◽  
2013 ◽  
Vol 44 (10) ◽  
pp. 2907-2909 ◽  
Author(s):  
Robbert-Jan Van Hooff ◽  
Melissa Cambron ◽  
Rita Van Dyck ◽  
Ann De Smedt ◽  
Maarten Moens ◽  
...  

Background and Purpose— We evaluated the feasibility and the reliability of remote stroke severity quantification in the prehospital setting using the Unassisted TeleStroke Scale (UTSS) via a telestroke ambulance system and a fourth-generation mobile network. Methods— The technical feasibility and the reliability of the UTSS were studied in healthy volunteers mimicking 41 stroke syndromes during ambulance transportation. Results— Except for 1 issue, high-quality telestroke assessment was feasible in all scenarios. The mean examination time for the UTSS was 3.1 minutes (SD, 0.4). The UTSS showed excellent intrarater and interrater variability (ρ=0.98 and 0.97; P <0.001), as well as excellent internal consistency and rater agreement. Adequate concurrent validity can be derived from the strong correlation between the UTSS and the National Institutes of Health Stroke Scale (ρ=0.90; P <0.001). Conclusions— Remote assessment of stroke severity in fast-moving ambulances using a system dedicated to prehospital telemedicine, 4G technology, and the UTSS is feasible and reliable.


2016 ◽  
Vol 22 (9) ◽  
pp. 1174-1183 ◽  
Author(s):  
Mike P Wattjes ◽  
Martijn T Wijburg ◽  
Anke Vennegoor ◽  
Birgit I Witte ◽  
Stefan D Roosendaal ◽  
...  

Background: In natalizumab-treated multiple sclerosis (MS) patients, magnetic resonance imaging (MRI) is considered a sensitive tool for detecting both MS disease activity and progressive multifocal leukoencephalopathy (PML). Objective: To investigate the performance of neuroradiologists using brain MRI in detecting new MS lesions and asymptomatic PML lesions and in differentiating between MS and PML lesions in natalizumab-treated MS patients. The secondary aim was to investigate interrater variability. Methods: In this retrospective diagnostic study, four blinded neuroradiologists assessed reference and follow-up brain MRI scans of 48 natalizumab-treated MS patients with new asymptomatic PML lesions (n = 21), new MS lesions (n = 20), or no new lesions (n = 7). Sensitivity and specificity for detection of new lesions in general (MS and PML lesions), MS and PML lesion differentiation, and PML detection were determined. Interrater agreement was calculated. Results: Overall sensitivity and specificity for the detection of new lesions, regardless of the nature of the lesions, were 77.4% and 89.3%, respectively; for PML-MS lesion differentiation, 74.2% and 84.7%, respectively; and for asymptomatic PML lesion detection, 59.5% and 91.7%, respectively. Interrater agreement for the tested categories was fair to moderate. Conclusion: The diagnostic performance of trained neuroradiologists using brain MRI in pharmacovigilance of natalizumab-treated MS patients is moderately good. Interrater agreement among trained readers is fair to moderate.
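The sensitivity and specificity figures in this abstract follow the usual definitions, sketched below from confusion-matrix counts. The counts are hypothetical and are not the study's data; the abstract does not report the underlying read-level counts, so nothing here reproduces its percentages.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive cases that were called positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative cases that were called negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical pooled reads: 40 scans with a new lesion, 50 without.
sens = sensitivity(true_pos=30, false_neg=10)
spec = specificity(true_neg=45, false_pos=5)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```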

