rater variability
Recently Published Documents


TOTAL DOCUMENTS

70
(FIVE YEARS 27)

H-INDEX

10
(FIVE YEARS 1)

2022 ◽  
Author(s):  
Miranda Julia Say ◽  
Ciarán O'Driscoll

Background: Despite its wide use in dementia diagnosis on the basis of cut-off points, the inter-rater variability of the ACE-III has been poorly studied. Methods: 31 healthcare professionals from an older adults’ mental health team scored two ACE-III protocols based on mock patients in a computerised form. Scoring accuracy, as well as total and domain-specific scoring variability, were calculated; factors relevant to participants were obtained, including their level of experience and self-rated confidence in administering the ACE-III. Results: There was considerable inter-rater variability (up to 18 points for one of the cases), and one case’s mean score was significantly higher (by four points) than the true score. The Fluency, Visuospatial and Attention domains had greater levels of variability than Language and Memory. Higher levels of scoring accuracy were associated with neither greater levels of experience nor higher self-confidence in administering the ACE-III. Conclusions: The results suggest that the ACE-III is susceptible to scoring error and considerable inter-rater variability, which highlights the critical importance of initial, and continued, administration and scoring training.


2021 ◽  
pp. 104063872110628
Author(s):  
Jane Westendorf ◽  
Bruce Wobeser ◽  
Tasha Epp

Inter- and intra-rater variability negatively affects the reliability of various histopathology grading scales used as prognostic aids in human and veterinary medicine. The Kenney–Doig categorization (grading) scale, which is used to associate equine endometrial histologic lesions with prognostic estimation of a broodmare’s reproductive potential, has not been evaluated for inter- or intra-rater variability, to our knowledge. To assess whether the Kenney–Doig system produces reliable results among observers, 8 pathologists, all with American College of Veterinary Pathologists certification, were recruited to blindly categorize the same set of 63 digital equine endometrial biopsy slides and to anonymously re-evaluate 21 of these 63 slides at a later time. Cohen kappa values for pairwise comparison of final Kenney–Doig categories were −0.05 to 0.46 (unweighted) and 0.08–0.64 (weighted), with an average Light kappa of 0.19 (unweighted) and 0.36 (weighted) across all 8 pathologists, 0.14 (unweighted) and 0.33 (weighted) for pathologists at different institutions, and 0.22 (unweighted) and 0.46 (weighted) for pathologists at the same institution. Intra-class correlations measuring intra-rater agreement were 0.12–0.77, with an average of 0.55 for all 8 pathologists. We found that the 8 pathologists using the Kenney–Doig scale produced only slight-to-moderate inter-rater agreement and poor-to-good intra-rater agreement, suggesting that the system is subject to significant observer variability and that care should be taken when communicating Kenney–Doig categories to submitting clinicians, with emphasis on the quality of the endometrial lesions present rather than on the category and its associated expected foaling rate.
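The pairwise Cohen kappa and the Light kappa averaging reported above can be sketched in a few lines of pure Python. This is an illustration of the statistics, not the study's code, and the rater labels below are invented Kenney–Doig-style categories:

```python
from itertools import combinations

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa between two raters' category labels."""
    n = len(r1)
    cats = set(r1) | set(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n                    # observed agreement
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

def light_kappa(all_ratings):
    """Light's kappa: mean Cohen's kappa over all pairs of raters."""
    pairs = list(combinations(all_ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Hypothetical categories (I, IIA, IIB, III) from three raters on five slides
rater1 = ["I", "IIA", "IIA", "IIB", "III"]
rater2 = ["I", "IIA", "IIB", "IIB", "III"]
rater3 = ["IIA", "IIA", "IIA", "III", "III"]
print(light_kappa([rater1, rater2, rater3]))
```

The weighted variants reported in the abstract additionally assign partial credit to near-miss categories via a weight matrix, which this unweighted sketch omits.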


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christoph Haarburger ◽  
Gustav Müller‑Franzes ◽  
Leon Weninger ◽  
Christiane Kuhl ◽  
Daniel Truhn ◽  
...  

Author(s):  
Sebastian Werner ◽  
Regina Gast ◽  
Rainer Grimmer ◽  
Andreas Wimmer ◽  
Marius Horger

Purpose To test the accuracy and reproducibility of a software prototype for semi-automated computer-aided volumetry (CAV) of part-solid pulmonary nodules (PSN) with separate segmentation of the solid part. Materials and Methods 66 PSNs were retrospectively identified in 34 thin-slice unenhanced chest CTs of 19 patients. CAV was performed by two medical students. Manual volumetry (MV) was carried out by two radiology residents. The reference standard was determined by an experienced radiologist in consensus with one of the residents. Visual assessment of CAV accuracy was performed. Measurement variability between CAV/MV and the reference standard as a measure of accuracy, CAV inter- and intra-rater variability, as well as CAV intrascan variability between two reconstruction kernels, was determined via the Bland-Altman method and intraclass correlation coefficients (ICC). Results Subjectively assessed accuracy of CAV/MV was 77 %/79 %–80 % for the solid part and 67 %/73 %–76 % for the entire nodule. Measurement variability between CAV and the reference standard ranged from −151 % to +117 % for the solid part and from −106 % to +54 % for the entire nodule. Inter-rater variability was −16 % to +16 % for the solid part (ICC 0.998) and −102 % to +65 % for the entire nodule (ICC 0.880). Intra-rater variability was −70 % to +49 % for the solid part (ICC 0.992) and −111 % to +31 % for the entire nodule (ICC 0.929). Intrascan variability between the smooth and the sharp reconstruction kernel was −45 % to +39 % for the solid part and −21 % to +46 % for the entire nodule. Conclusion Although the software prototype delivered satisfactory results when segmentation was evaluated subjectively, quantitative statistical analysis revealed room for improvement, especially regarding the segmentation accuracy of the solid part and the reproducibility of measurements of the nodule’s subsolid margins.
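The Bland-Altman method used above computes the mean of the paired differences (the bias) and 95 % limits of agreement (bias ± 1.96 SD of the differences); the percentage ranges in the results are of this kind, with each difference expressed relative to the pairwise mean. A minimal sketch with made-up nodule volumes (not the study's data):

```python
import statistics

def bland_altman_percent(a, b):
    """Bias and 95% limits of agreement for paired measurements,
    expressed as percentage differences relative to the pairwise mean."""
    pct = [200.0 * (x - y) / (x + y) for x, y in zip(a, b)]  # (x-y)/mean(x,y) in %
    bias = statistics.mean(pct)
    sd = statistics.stdev(pct)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical nodule volumes (mm^3): CAV vs. reference standard
cav = [120.0, 85.0, 240.0, 60.0, 150.0]
ref = [110.0, 90.0, 260.0, 55.0, 150.0]
bias, (lo, hi) = bland_altman_percent(cav, ref)
```

Wide limits of agreement (as reported for the entire nodule) indicate poor reproducibility even when the bias itself is small.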


2021 ◽  
Vol 3 ◽  
Author(s):  
Christophe Letellier ◽  
Manel Lujan ◽  
Jean-Michel Arnal ◽  
Annalisa Carlucci ◽  
Michelle Chatwin ◽  
...  

Background: Patient-ventilator synchronization during non-invasive ventilation (NIV) can be assessed by visual inspection of flow and pressure waveforms, but this remains time consuming and there is large inter-rater variability, even among expert physicians. SyncSmart™ software developed by Breas Medical (Mölnycke, Sweden) provides automatic detection and scoring of patient-ventilator asynchrony to help physicians in their daily clinical practice. This study was designed to assess the performance of automatic scoring by the SyncSmart software using expert clinicians as a reference in patients with chronic respiratory failure receiving NIV. Methods: 20-min data sets from nine patients were analyzed automatically by SyncSmart software and reviewed by nine expert physicians who were asked to score auto-triggering (AT), double-triggering (DT), and ineffective efforts (IE). The study procedure was similar to the one commonly used for validating automatic sleep scoring techniques. For each patient, the asynchrony index was computed by automatic scoring and by each expert, respectively. Considering successively each expert scoring as a reference, sensitivity, specificity, positive predictive value (PPV), κ-coefficients, and agreement were calculated. Results: The asynchrony index assessed by SyncSmart was not significantly different from the one assessed by the experts (18.9 ± 17.7 vs. 12.8 ± 9.4, p = 0.19). When compared to an expert, the sensitivity and specificity provided by SyncSmart for DT, AT, and IE were significantly greater than those provided by an expert when compared to another expert. Conclusions: SyncSmart software is able to score asynchrony events within the inter-rater variability. Provided the breathing frequency is not too high (<24), it therefore gives a reliable assessment of patient-ventilator asynchrony; otherwise, AT is over-detected.
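Treating one expert's annotation as the reference, the per-event validation metrics above reduce to standard confusion-matrix ratios over detected vs. annotated asynchrony events. A minimal sketch with invented counts (not the study's data):

```python
def detection_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity and PPV of an automatic scorer
    against a reference annotation (counts of asynchrony events)."""
    sensitivity = tp / (tp + fn)   # detected true events / all true events
    specificity = tn / (tn + fp)   # correctly rejected / all non-events
    ppv = tp / (tp + fp)           # detected true events / all detections
    return sensitivity, specificity, ppv

# Hypothetical double-triggering counts over a 20-min recording:
# 18 events found by both, 2 false alarms, 6 missed, 574 breaths with no event
sens, spec, ppv = detection_metrics(tp=18, fp=2, fn=6, tn=574)
```

Comparing these expert-vs-software values against expert-vs-expert values, as the study does, is what allows the claim that the software scores "within the inter-rater variability".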


2021 ◽  
pp. 026565902110238
Author(s):  
Ava Karusoo-Musumeci ◽  
Wendy M Pearce ◽  
Michelle Donaghy

Oral narrative assessments are important for the diagnosis of language disorders in school-age children, so scoring needs to be reliable and consistent. This study explored the impact of training on the variability of story grammar scores in children’s oral narrative assessments scored by multiple raters. Fifty-one speech pathologists and 19 final-year speech pathology students attended training workshops on oral narrative assessment scoring and analysis. Participants scored two oral narratives prompted by two different story stimuli and produced by two children of differing ages. Demographic information, story grammar scores and a confidence survey were collected pre- and post-training. The total story grammar score changed significantly for one of the two oral narratives. A significant effect was observed for rater years of experience and the change in total story grammar scores post-training, with undergraduate students showing the greatest change. Two story grammar elements, character and attempt, changed significantly for both stories, with an overall trend of increased element scores post-training. Confidence ratings also increased post-training. Findings indicated that training via an interactive workshop can reduce rater variability when using researcher-developed narrative scoring systems.


2021 ◽  
Author(s):  
Molly C. Goodier ◽  
Joel G. DeKoven ◽  
James S. Taylor ◽  
Denis Sasseville ◽  
Joseph F. Fowler ◽  
...  

Author(s):  
Freerk Prenzel ◽  
Uta Ceglarek ◽  
Ines Adams ◽  
Jutta Hammermann ◽  
Ulrike Issa ◽  
...  

Abstract Objectives: Sweat chloride testing (SCT) is the mainstay of cystic fibrosis (CF) diagnosis and a biomarker in the evaluation of CFTR-modifying drugs. To be a reliable and valid tool, analytical variance (CVA) must be minimized. However, external quality assessments have revealed significant deviations in routine clinical practice. Our goal was to identify and quantify technical errors through proficiency testing and simulations. Methods: Chloride concentrations of three blinded samples (each as triplicates) were measured in 9 CF centers using a chloridometer in a routine setting. Technical errors were simulated and quantified in a series of measurements. We compared imprecision and bias before and after a counseling session by evaluating coefficients of variation (CV), adherence to tolerance limits, and inter-rater variability coefficients. Results: Pipetting errors resulting in changes in sample volume were identified as the main source of error, with deviations of up to 41%. After the counseling session, the overall CVA decreased from 7.6 to 5.2%, the pass rate increased from 67 to 92%, and the inter-rater variability diminished. Significant deviations continued to be observed in individual centers. Conclusions: Prevention of technical errors in SCT decreases imprecision and bias. Quality assurance programs must be established in all CF centers, including staff training, standard operating procedures, and proficiency testing.
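The analytical coefficient of variation (CVA) quoted in the results is simply the replicate standard deviation expressed as a percentage of the mean. A minimal sketch with made-up triplicate chloride readings (not the study's data):

```python
import statistics

def cv_percent(replicates):
    """Coefficient of variation (%) of replicate measurements."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

# Hypothetical triplicate chloride readings (mmol/L) for one blinded sample
print(cv_percent([48.0, 50.0, 52.0]))  # → 4.0
```

Computed per sample and center, such values are what the overall 7.6 % → 5.2 % improvement aggregates.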


2021 ◽  
Vol 94 (1120) ◽  
pp. 20201159
Author(s):  
Laura L. Wuyts ◽  
Michael Camerlinck ◽  
Didier De Surgeloose ◽  
Liesbet Vermeiren ◽  
David Ceulemans ◽  
...  

Objectives: To determine whether the revised 2018 ATS/ERS/JRS/ALAT radiological criteria for usual interstitial pneumonia (UIP) provide better diagnostic agreement compared to the 2011 guidelines. Methods: The cohort for this cross-sectional study (single center, nonacademic) was recruited from a multidisciplinary team discussion (MDD) from July 2010 until November 2018, with clinical suspicion of fibrosing interstitial lung disease (n = 325). Exclusion criteria were technical HRCT issues, known connective tissue disease (rheumatoid arthritis, systemic sclerosis, poly- or dermatomyositis), exposure to pulmonary toxins, or lack of a working diagnosis after MDD. Four readers with varying degrees of experience in HRCT interpretation independently categorized 192 HRCTs according to both the previous and current ATS/ERS/JRS/ALAT radiological criteria. An inter-rater variability analysis (Gwet’s second-order agreement coefficient, AC2) was performed. Results: The resulting Gwet’s AC2 for the 2011 and 2018 ATS/ERS/JRS/ALAT radiological criteria is 0.62 (±0.05) and 0.65 (±0.05), respectively. We report only minor differences in agreement level among the readers. Distribution according to the 2011 guidelines is as follows: 57.3% ‘UIP pattern’, 24% ‘possible UIP pattern’, 18.8% ‘inconsistent with UIP pattern’; and for the 2018 guidelines: 59.6% ‘UIP’, 14.5% ‘probable UIP’, 15.9% ‘indeterminate for UIP’ and 10% ‘alternative diagnosis’. Conclusions: No statistically significant higher degree of diagnostic agreement is observed when applying the revised 2018 ATS/ERS/JRS/ALAT radiological criteria for UIP compared to those of 2011. The inter-rater variability for categorizing the HRCT patterns is moderate for both classification systems, independent of experience in HRCT interpretation. The major advantage of the current guidelines is the better subdivision of the categories with a lower diagnostic certainty for UIP.
Advances in knowledge: - In 2018, a revision of the 2011 ATS/ERS/JRS/ALAT radiological criteria for UIP was published, as part of diagnostic guidelines for idiopathic pulmonary fibrosis. - The inter-rater agreement among radiologists is moderate for both classification systems, without a significantly higher degree of agreement when applying the revised radiological criteria.
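Gwet's AC statistics correct observed agreement by a chance-agreement term derived from average category prevalence, which makes them more stable than kappa when category distributions are skewed (as here, where most scans fall in the 'UIP' category). The study used the weighted, multi-rater second-order coefficient (AC2); as a simplified illustration of the idea only, here is the two-rater unweighted first-order coefficient (AC1), with invented 2018-criteria labels:

```python
def gwet_ac1(r1, r2):
    """Two-rater, unweighted Gwet's AC1 for categorical ratings."""
    n = len(r1)
    cats = sorted(set(r1) | set(r2))
    q = len(cats)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement from average category prevalence pi_k
    pi = [(r1.count(c) + r2.count(c)) / (2 * n) for c in cats]
    p_exp = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical 2018-criteria categories from two readers on six HRCTs
reader1 = ["UIP", "UIP", "probable UIP", "indeterminate", "alternative", "UIP"]
reader2 = ["UIP", "probable UIP", "probable UIP", "indeterminate", "alternative", "UIP"]
print(gwet_ac1(reader1, reader2))
```

The AC2 used in the study extends this with agreement weights between the four ordered categories and averages over all reader pairs.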

