rater variability
Recently Published Documents


TOTAL DOCUMENTS

70
(FIVE YEARS 27)

H-INDEX

10
(FIVE YEARS 1)

2022 ◽  
Author(s):  
Miranda Julia Say ◽  
Ciarán O'Driscoll

Background: Despite its wide use in dementia diagnosis on the basis of cut-off points, the inter-rater variability of the ACE-III has been poorly studied. Methods: 31 healthcare professionals from an older adults’ mental health team scored two ACE-III protocols based on mock patients in a computerised form. Scoring accuracy, as well as total and domain-specific scoring variability, were calculated; factors relevant to participants were obtained, including their level of experience and self-rated confidence in administering the ACE-III. Results: There was considerable inter-rater variability (up to 18 points for one of the cases), and one case’s mean score was significantly higher (by four points) than the true score. The Fluency, Visuospatial and Attention domains had greater levels of variability than Language and Memory. Higher levels of scoring accuracy were associated with neither greater levels of experience nor higher self-confidence in administering the ACE-III. Conclusions: The results suggest that the ACE-III is susceptible to scoring error and considerable inter-rater variability, which highlights the critical importance of initial, and continued, administration and scoring training.


2021 ◽  
pp. 104063872110628
Author(s):  
Jane Westendorf ◽  
Bruce Wobeser ◽  
Tasha Epp

Inter- and intra-rater variability negatively affects the reliability of various histopathology grading scales used as prognostic aids in human and veterinary medicine. The Kenney–Doig categorization (grading) scale, which is used to associate equine endometrial histologic lesions with prognostic estimation of a broodmare’s reproductive potential, has not been evaluated for inter- or intra-rater variability, to our knowledge. To assess whether the Kenney–Doig system produces reliable results among observers, 8 pathologists, all with American College of Veterinary Pathologists certification, were recruited to blindly categorize the same set of 63 digital equine endometrial biopsy slides and to anonymously re-evaluate 21 of these 63 slides at a later time. Cohen kappa values for pairwise comparison of final Kenney–Doig categories were −0.05 to 0.46 (unweighted) and 0.08–0.64 (weighted), with an average Light kappa of 0.19 (unweighted) and 0.36 (weighted) across all 8 pathologists, 0.14 (unweighted) and 0.33 (weighted) for pathologists at different institutions, and 0.22 (unweighted) and 0.46 (weighted) for pathologists at the same institution. Intra-class correlations measuring intra-rater agreement were 0.12–0.77, with an average of 0.55 for all 8 pathologists. We found that the 8 pathologists using the Kenney–Doig scale produced only slight-to-moderate inter-rater agreement and poor-to-good intra-rater agreement, suggesting that the system is subject to significant observer variability and that care should be taken when communicating Kenney–Doig categories to submitting clinicians, with emphasis on the quality of the endometrial lesions present rather than on the category and its associated expected foaling rate.
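The pairwise Cohen kappa and the Light kappa averaging reported above can be sketched in a few lines of pure Python. This is an illustration of the statistics, not the study's code, and the rater labels below are invented Kenney–Doig-style categories:

```python
from itertools import combinations

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa between two raters' category labels."""
    n = len(r1)
    cats = set(r1) | set(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n                    # observed agreement
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

def light_kappa(all_ratings):
    """Light's kappa: mean Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(all_ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical categories (I, IIA, IIB, III) from three raters on five slides
rater1 = ["I", "IIA", "IIA", "IIB", "III"]
rater2 = ["I", "IIA", "IIB", "IIB", "III"]
rater3 = ["IIA", "IIA", "IIA", "III", "III"]
print(light_kappa([rater1, rater2, rater3]))
```

The weighted variants reported in the abstract additionally assign partial credit to near-miss categories via a weight matrix, which this unweighted sketch omits.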


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christoph Haarburger ◽  
Gustav Müller‑Franzes ◽  
Leon Weninger ◽  
Christiane Kuhl ◽  
Daniel Truhn ◽  
...  

Author(s):  
Sebastian Werner ◽  
Regina Gast ◽  
Rainer Grimmer ◽  
Andreas Wimmer ◽  
Marius Horger

Purpose To test the accuracy and reproducibility of a software prototype for semi-automated computer-aided volumetry (CAV) of part-solid pulmonary nodules (PSN) with separate segmentation of the solid part. Materials and Methods 66 PSNs were retrospectively identified in 34 thin-slice unenhanced chest CTs of 19 patients. CAV was performed by two medical students. Manual volumetry (MV) was carried out by two radiology residents. The reference standard was determined by an experienced radiologist in consensus with one of the residents. Visual assessment of CAV accuracy was performed. Measurement variability between CAV/MV and the reference standard as a measure of accuracy, CAV inter- and intra-rater variability, as well as CAV intrascan variability between two reconstruction kernels, was determined via the Bland-Altman method and intraclass correlation coefficients (ICC). Results Subjectively assessed accuracy of CAV/MV was 77 %/79 %–80 % for the solid part and 67 %/73 %–76 % for the entire nodule. Measurement variability between CAV and the reference standard ranged from −151 % to +117 % for the solid part and from −106 % to +54 % for the entire nodule. Inter-rater variability was −16 % to +16 % for the solid part (ICC 0.998) and −102 % to +65 % for the entire nodule (ICC 0.880). Intra-rater variability was −70 % to +49 % for the solid part (ICC 0.992) and −111 % to +31 % for the entire nodule (ICC 0.929). Intrascan variability between the smooth and the sharp reconstruction kernel was −45 % to +39 % for the solid part and −21 % to +46 % for the entire nodule. Conclusion Although the software prototype delivered satisfactory results when segmentation was evaluated subjectively, quantitative statistical analysis revealed room for improvement, especially regarding the segmentation accuracy of the solid part and the reproducibility of measurements of the nodule’s subsolid margins.
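The Bland-Altman method used above computes the mean of the paired differences (the bias) and 95 % limits of agreement (bias ± 1.96 SD of the differences); the percentage ranges in the results are of this kind, with each difference expressed relative to the pairwise mean. A minimal sketch with made-up nodule volumes (not the study's data):

```python
import statistics

def bland_altman_percent(a, b):
    """Bias and 95% limits of agreement for paired measurements,
    expressed as percentage differences relative to the pairwise mean."""
    pct = [200.0 * (x - y) / (x + y) for x, y in zip(a, b)]  # (x-y)/mean(x,y) in %
    bias = statistics.mean(pct)
    sd = statistics.stdev(pct)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical nodule volumes (mm^3): CAV vs. reference standard
cav = [120.0, 85.0, 240.0, 60.0, 150.0]
ref = [110.0, 90.0, 260.0, 55.0, 150.0]
bias, (lo, hi) = bland_altman_percent(cav, ref)
```

Wide limits of agreement (as reported for the entire nodule) indicate poor reproducibility even when the bias itself is small.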


2021 ◽  
Vol 3 ◽  
Author(s):  
Christophe Letellier ◽  
Manel Lujan ◽  
Jean-Michel Arnal ◽  
Annalisa Carlucci ◽  
Michelle Chatwin ◽  
...  

Background: Patient-ventilator synchronization during non-invasive ventilation (NIV) can be assessed by visual inspection of flow and pressure waveforms, but this remains time consuming and there is large inter-rater variability, even among expert physicians. SyncSmart™ software developed by Breas Medical (Mölnycke, Sweden) provides automatic detection and scoring of patient-ventilator asynchrony to help physicians in their daily clinical practice. This study was designed to assess the performance of automatic scoring by the SyncSmart software using expert clinicians as a reference in patients with chronic respiratory failure receiving NIV. Methods: 20-min data sets from nine patients were analyzed automatically by SyncSmart software and reviewed by nine expert physicians who were asked to score auto-triggering (AT), double-triggering (DT), and ineffective efforts (IE). The study procedure was similar to the one commonly used for validating automatic sleep scoring techniques. For each patient, the asynchrony index was computed by automatic scoring and by each expert, respectively. Considering successively each expert scoring as a reference, sensitivity, specificity, positive predictive value (PPV), κ-coefficients, and agreement were calculated. Results: The asynchrony index assessed by SyncSmart was not significantly different from the one assessed by the experts (18.9 ± 17.7 vs. 12.8 ± 9.4, p = 0.19). When compared to an expert, the sensitivity and specificity provided by SyncSmart for DT, AT, and IE were significantly greater than those provided by an expert when compared to another expert. Conclusions: SyncSmart software is able to score asynchrony events within the inter-rater variability. Provided the breathing frequency is not too high (<24), it therefore gives a reliable assessment of patient-ventilator asynchrony; otherwise, AT is over-detected.
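Treating one expert's annotation as the reference, the per-event validation metrics above reduce to standard confusion-matrix ratios over detected vs. annotated asynchrony events. A minimal sketch with invented counts (not the study's data):

```python
def detection_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV of an automatic scorer
    against a reference annotation (counts of asynchrony events)."""
    sensitivity = tp / (tp + fn)   # detected true events / all true events
    specificity = tn / (tn + fp)   # correctly rejected / all non-events
    ppv = tp / (tp + fp)           # detected true events / all detections
    return sensitivity, specificity, ppv

# Hypothetical double-triggering counts over a 20-min recording:
# 18 events found by both, 2 false alarms, 6 missed, 574 breaths with no event
sens, spec, ppv = detection_metrics(tp=18, fp=2, fn=6, tn=574)
```

Comparing these expert-vs-software values against expert-vs-expert values, as the study does, is what allows the claim that the software scores "within the inter-rater variability".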


2021 ◽  
pp. 026565902110238
Author(s):  
Ava Karusoo-Musumeci ◽  
Wendy M Pearce ◽  
Michelle Donaghy

Oral narrative assessments are important for the diagnosis of language disorders in school-age children, so scoring needs to be reliable and consistent. This study explored the impact of training on the variability of story grammar scores in children’s oral narrative assessments scored by multiple raters. Fifty-one speech pathologists and 19 final-year speech pathology students attended training workshops on oral narrative assessment scoring and analysis. Participants scored two oral narratives prompted by two different story stimuli and produced by two children of differing ages. Demographic information, story grammar scores and a confidence survey were collected pre- and post-training. The total story grammar score changed significantly for one of the two oral narratives. A significant effect was observed for rater years of experience and the change in total story grammar scores post-training, with undergraduate students showing the greatest change. Two story grammar elements, character and attempt, changed significantly for both stories, with an overall trend of increased element scores post-training. Confidence ratings also increased post-training. Findings indicated that training via an interactive workshop can reduce rater variability when using researcher-developed narrative scoring systems.


2021 ◽  
Author(s):  
Molly C. Goodier ◽  
Joel G. DeKoven ◽  
James S. Taylor ◽  
Denis Sasseville ◽  
Joseph F. Fowler ◽  
...  

Author(s):  
Freerk Prenzel ◽  
Uta Ceglarek ◽  
Ines Adams ◽  
Jutta Hammermann ◽  
Ulrike Issa ◽  
...  

Abstract Objectives: Sweat chloride testing (SCT) is the mainstay of cystic fibrosis (CF) diagnosis and a biomarker in the evaluation of CFTR-modifying drugs. To be a reliable and valid tool, analytical variance (CVA) must be minimized. However, external quality assessments have revealed significant deviations in routine clinical practice. Our goal was to identify and quantify technical errors through proficiency testing and simulations. Methods: Chloride concentrations of three blinded samples (each as triplicates) were measured in 9 CF centers using a chloridometer in a routine setting. Technical errors were simulated and quantified in a series of measurements. We compared imprecision and bias before and after a counseling session by evaluating coefficients of variation (CV), adherence to tolerance limits, and inter-rater variability coefficients. Results: Pipetting errors resulting in changes in sample volume were identified as the main source of error, with deviations of up to 41%. After the counseling session, the overall CVA decreased from 7.6 to 5.2%, the pass rate increased from 67 to 92%, and the inter-rater variability diminished. Significant deviations continued to be observed in individual centers. Conclusions: Prevention of technical errors in SCT decreases imprecision and bias. Quality assurance programs must be established in all CF centers, including staff training, standard operating procedures, and proficiency testing.
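The analytical coefficient of variation (CVA) quoted in the results is simply the replicate standard deviation expressed as a percentage of the mean. A minimal sketch with made-up triplicate chloride readings (not the study's data):

```python
import statistics

def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

# Hypothetical triplicate chloride readings (mmol/L) for one blinded sample
print(cv_percent([48.0, 50.0, 52.0]))  # → 4.0
```

Computed per sample and center, such values are what the overall 7.6 % → 5.2 % improvement aggregates.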


2021 ◽  
Vol 94 (1120) ◽  
pp. 20201159
Author(s):  
Laura L. Wuyts ◽  
Michael Camerlinck ◽  
Didier De Surgeloose ◽  
Liesbet Vermeiren ◽  
David Ceulemans ◽  
...  

Objectives: To determine whether the revised 2018 ATS/ERS/JRS/ALAT radiological criteria for usual interstitial pneumonia (UIP) provide better diagnostic agreement compared to the 2011 guidelines. Methods: The cohort for this cross-sectional study (single center, nonacademic) was recruited from a multidisciplinary team discussion (MDD) from July 2010 until November 2018, with clinical suspicion of fibrosing interstitial lung disease (n = 325). Exclusion criteria were technical HRCT issues, known connective tissue disease (rheumatoid arthritis, systemic sclerosis, poly- or dermatomyositis), exposure to pulmonary toxins, or lack of a working diagnosis after MDD. Four readers with varying degrees of experience in HRCT interpretation independently categorized 192 HRCTs according to both the previous and current ATS/ERS/JRS/ALAT radiological criteria. An inter-rater variability analysis (Gwet’s second-order agreement coefficient, AC2) was performed. Results: The resulting Gwet’s AC2 for the 2011 and 2018 ATS/ERS/JRS/ALAT radiological criteria is 0.62 (±0.05) and 0.65 (±0.05), respectively. We report only minor differences in agreement level among the readers. Distribution according to the 2011 guidelines is as follows: 57.3% ‘UIP pattern’, 24% ‘possible UIP pattern’, 18.8% ‘inconsistent with UIP pattern’; and for the 2018 guidelines: 59.6% ‘UIP’, 14.5% ‘probable UIP’, 15.9% ‘indeterminate for UIP’ and 10% ‘alternative diagnosis’. Conclusions: No statistically significant higher degree of diagnostic agreement is observed when applying the revised 2018 ATS/ERS/JRS/ALAT radiological criteria for UIP compared to those of 2011. The inter-rater variability for categorizing the HRCT patterns is moderate for both classification systems, independent of experience in HRCT interpretation. The major advantage of the current guidelines is the better subdivision of the categories with a lower diagnostic certainty for UIP.
Advances in knowledge: - In 2018, a revision of the 2011 ATS/ERS/JRS/ALAT radiological criteria for UIP was published, as part of diagnostic guidelines for idiopathic pulmonary fibrosis. - The inter-rater agreement among radiologists is moderate for both classification systems, without a significantly higher degree of agreement when applying the revised radiological criteria.
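Gwet's AC statistics correct observed agreement by a chance-agreement term derived from average category prevalence, which makes them more stable than kappa when category distributions are skewed (as here, where most scans fall in the 'UIP' category). The study used the weighted, multi-rater second-order coefficient (AC2); as a simplified illustration of the idea only, here is the two-rater unweighted first-order coefficient (AC1), with invented 2018-criteria labels:

```python
def gwet_ac1(r1, r2):
    """Two-rater, unweighted Gwet's AC1 for categorical ratings."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    q = len(cats)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from average category prevalence pi_k
    pi = [(r1.count(c) + r2.count(c)) / (2 * n) for c in cats]
    p_exp = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical 2018-criteria categories from two readers on six HRCTs
reader1 = ["UIP", "UIP", "probable UIP", "indeterminate", "alternative", "UIP"]
reader2 = ["UIP", "probable UIP", "probable UIP", "indeterminate", "alternative", "UIP"]
print(gwet_ac1(reader1, reader2))
```

The AC2 used in the study extends this with agreement weights between the four ordered categories and averages over all reader pairs.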

