Reliability of the Grading System for Voiding Cystourethrograms in the Management of Vesicoureteral Reflux: An Interrater Comparison

Aim. Vesicoureteral reflux (VUR) is one of the most common conditions seen in pediatric urology. Fortunately, there are many treatment options for this disorder. The grading system for VUR varies among doctors, and the literature on its reliability is sparse. Here, we assessed the effectiveness of the current VUR grading system. Methods. A series of 40 voiding cystourethrogram (VCUG) studies were selected. Four pediatric urologists (PU) and four pediatric radiologists (PR) independently graded each VCUG and then agreed on a uniform interpretation. For statistical analysis, the intraclass correlation coefficient (ICC) was applied to assess interrater agreement. Results. ICC values ranging from 0.82 to 0.88 reflected the strong reliability of VCUG for grading cases of VUR among pediatric urologists and radiologists as separate groups, and the reliability between the two groups was also good, as indicated by an ICC of 0.89. Despite the high ICC, disagreement existed between raters; the lowest agreement was associated with middle grades (III and IV). Conclusions. The interrater reliability of the international grading system for VUR was high but imperfect. Thus, grading differences at middle grades can profoundly influence the type of treatment pursued.
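The abstract above does not state which form of the ICC was used. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)), the coefficient can be computed from a subjects-by-raters table as follows. The function name `icc_2_1` and the example grades are hypothetical and are not taken from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one row per subject (e.g. one VCUG study),
    with one numeric grade per rater in each row. Assumes no missing values.
    """
    n = len(ratings)          # number of subjects
    k = len(ratings[0])       # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical grades (I-V coded 1-5) from three raters on five studies
example = [
    [1, 1, 2],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 3],
    [4, 3, 4],
]
print(round(icc_2_1(example), 3))
```

Because this form penalizes systematic offsets between raters as well as random disagreement, a rater who consistently grades one step higher lowers the coefficient even if the rank order is identical.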


Intrarater and Interrater Reliability of Infrared Image Analysis of Forearm Acupoints before and after Moxibustion

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2020/6328756 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Jiali Lou ◽

Yongliang Jiang ◽

Hantong Hu ◽

Xiaoyu Li ◽

Yajun Zhang ◽

...

Keyword(s):

Image Analysis ◽

Correlation Coefficient ◽

Temperature Change ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Infrared Image ◽

Infrared Images ◽

Intrarater Reliability ◽

Before And After

The objective of this study was to determine the intrarater and interrater reliabilities of infrared image analysis of forearm acupoints before and after moxibustion. In this work, infrared images of acupoints in the forearm of 20 volunteers (M/F, 10/10) were collected prior to and after moxibustion by infrared thermography (IRT). Two trained raters performed the analysis of infrared images in two different periods at a one-week interval. The intraclass correlation coefficient (ICC) was calculated to determine the intrarater and interrater reliabilities. With regard to the intrarater reliability, ICC values were between 0.758 and 0.994 (substantial to excellent). For the interrater reliability, ICC values ranged from 0.707 to 0.964 (moderate to excellent). Given that the intrarater and interrater reliability levels show excellent concordance, IRT could be a reliable tool to monitor the temperature change of forearm acupoints induced by moxibustion.


Repeatability and Agreement of Central Corneal Thickness and Keratometry Measurements between Four Different Devices

Journal of Ophthalmology ◽

10.1155/2017/6181405 ◽

2017 ◽

Vol 2017 ◽

pp. 1-8 ◽

Cited By ~ 9

Author(s):

Laszlo Kiraly ◽

Jana Stange ◽

Kathleen S. Kunert ◽

Saadettin Sel

Keyword(s):

Clinical Practice ◽

Statistical Analysis ◽

Correlation Coefficient ◽

Central Corneal Thickness ◽

Corneal Thickness ◽

Intraclass Correlation ◽

Bland Altman Method ◽

Significant Difference ◽

Healthy Eyes ◽

Iolmaster 700

Background. To estimate repeatability and comparability of central corneal thickness (CCT) and keratometry measurements obtained by four different devices in healthy eyes. Methods. Fifty-five healthy eyes from 55 volunteers were enrolled in this study. CCT (IOLMaster 700, Pentacam HR, and Cirrus HD-OCT) and keratometry readings (IOLMaster 700, Pentacam HR, and iDesign) were measured. For statistical analysis, the corneal spherocylinder was converted into power vectors (J0, J45). Repeatability was assessed by intraclass correlation coefficient (ICC). Agreement of measurements between the devices was evaluated by the Bland-Altman method. Results. The analysis of repeatability of CCT data of IOLMaster 700, Pentacam HR, and Cirrus HD-OCT showed high ICCs (range 0.995 to 0.999). The comparison of CCT measurements revealed statistically significant differences between Pentacam HR versus IOLMaster 700 (p<0.0001) and Pentacam HR versus Cirrus HD-OCT (p<0.0001), respectively. There was no difference in CCT measurements between IOLMaster 700 and Cirrus HD-OCT (p=0.519). The repeatability of keratometry readings (J0 and J45) of IOLMaster 700, Pentacam HR, and iDesign was also high, with ICCs ranging from 0.974 to 0.999. The Pentacam HR revealed significantly higher J0 in comparison to IOLMaster 700 (p=0.009) and iDesign (p=0.041); however, there was no significant difference between IOLMaster 700 and iDesign (p=0.426). Comparison of J45 showed no significant difference between IOLMaster 700, Pentacam HR, and iDesign. These results were in accordance with Bland-Altman plots. Conclusion. In clinical practice, the devices analyzed should not be used interchangeably due to low agreement regarding CCT as well as keratometry readings.
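The Bland-Altman method named in the abstract summarizes agreement between two devices by the mean of the paired differences (the bias) and the 95% limits of agreement. A minimal sketch follows, assuming the conventional limits of bias ± 1.96 standard deviations of the differences; the function name `bland_altman` and the CCT values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def bland_altman(device_a, device_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences a - b; the limits of
    agreement are bias -/+ 1.96 * sample SD of those differences.
    """
    diffs = [a - b for a, b in zip(device_a, device_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical central corneal thickness readings in micrometres
cct_a = [542, 556, 531, 570]
cct_b = [540, 552, 525, 562]
bias, lower, upper = bland_altman(cct_a, cct_b)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Whether two devices are interchangeable depends on whether these limits are narrow enough for the clinical purpose, which is a judgment the statistic itself does not make.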


Evaluating interrater agreement with intraclass correlation coefficient in SPICE-based software process assessment

Third International Conference on Quality Software, 2003. Proceedings. ◽

10.1109/qsic.2003.1319115 ◽

2003 ◽

Cited By ~ 2

Author(s):

Hyung-Min Park ◽

Ho-Won Jung

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Software Process ◽

Intraclass Correlation ◽

Interrater Agreement ◽

Process Assessment ◽

Software Process Assessment


Reliability of the International Knee Documentation Committee Radiographic Grading System

The American Journal of Sports Medicine ◽

10.1177/0363546507299742 ◽

2007 ◽

Vol 35 (6) ◽

pp. 933-935 ◽

Cited By ~ 28

Author(s):

Vishal M. Mehta ◽

Liz W. Paxton ◽

Stefan X. Fornalski ◽

Rick P. Csintalan ◽

Donald C. Fithian

Keyword(s):

Correlation Coefficient ◽

Interrater Reliability ◽

Patellofemoral Joint ◽

Case Series ◽

International Knee Documentation Committee ◽

Joint Space ◽

Grading System ◽

Intrarater Reliability ◽

Mixed Effect ◽

Radiographic Grading

Background: The International Knee Documentation Committee (IKDC) forms are commonly used to measure outcomes after anterior cruciate ligament (ACL) reconstruction. The knee examination portion of the IKDC forms includes a radiographic grading system to grade degenerative changes. The interrater and intrarater reliability of this radiographic grading system remain unknown. Hypothesis: We hypothesize that the IKDC radiographic grading system will have acceptable interrater and intrarater reliability. Study Design: Case series (diagnosis); Level of evidence, 4. Methods: Radiographs of 205 ACL-reconstructed knees were obtained at 5-year follow-up. Specifically, weightbearing posteroanterior radiographs of the operative knee in 35° to 45° of flexion and a lateral radiograph in 30° of flexion were used. The radiographs were independently graded by 2 sports medicine fellowship-trained orthopaedic surgeons using the IKDC 2000 standard instructions. One surgeon graded the same radiographs 6 months apart, blinded to patient and prior IKDC grades. The percentage agreement was calculated for each of the 5 knee compartments as defined by the IKDC. Interrater reliability was evaluated using the intraclass correlation coefficient (ICC) 2-way mixed effect model with absolute agreement. The Spearman rank-order correlation coefficient (rs) was applied to evaluate intrarater reliability. Results: The interrater agreement between the 2 surgeons was 59% for the medial joint space (ICC = 0.46; 95% confidence interval [CI] = 0.35-0.56), 54% for the lateral joint space (ICC = 0.45; 95% CI = 0.27-0.58), 49% for the patellofemoral joint (ICC = 0.40; 95% CI = 0.26-0.52), 63% for the anterior joint space (ICC = 0.20; 95% CI = 0.05-0.34), and 44% for the posterior joint space (ICC = 0.28; 95% CI = 0.15-0.40). The intrarater agreement was 83% for the medial joint space (rs = .77, P < .001), 86% for the lateral joint space (rs = .76, P < .001), 81% for the patellofemoral joint (rs = .79, P < .001), 91% for the anterior joint space (rs = .48, P < .001), and 69% for the posterior joint space (rs = .64, P < .001). Conclusions: While intrarater reliability was acceptable, interrater reliability was poor. These findings suggest that multiple raters may score the same radiographs differently using the IKDC radiographic grading system. The use of a single rater to grade all radiographs when using the IKDC radiographic grading system maximizes reliability.


Assessment of Occupational Functioning for Screening of Patients to Occupational Therapy in General Psychiatric Care

The Occupational Therapy Journal of Research ◽

10.1177/153944929801800405 ◽

1998 ◽

Vol 18 (4) ◽

pp. 193-206 ◽

Cited By ~ 6

Author(s):

Lena Haglund ◽

Lars-Hakan Thorell ◽

Jan Walinder

Keyword(s):

Occupational Therapy ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

Psychiatric Care ◽

Rating Scale ◽

Intraclass Correlation ◽

Case Analysis ◽

Occupational Therapist ◽

Swedish Version

A Swedish version of the Occupational Case Analysis Interview and Rating Scale (OCAIRS-S) has been tested earlier for interrater reliability. The present study, using the second version of OCAIRS-S and including a sample of 145 patients, showed interrater correlations between .88 and .96 (Intraclass Correlation Coefficient). The results indicate that OCAIRS-S predicts which patients should be included in and excluded from occupational therapy and identifies patients who should be observed more before making such decisions. The study indicates a need for further investigations regarding which components in OCAIRS-S influence the occupational therapist in judging the patient's need for occupational therapy.


The reliability and validity of goniometric elbow measurements in adults: A systematic review of the literature

Shoulder & Elbow ◽

10.1177/1758573218774326 ◽

2018 ◽

Vol 10 (4) ◽

pp. 274-284 ◽

Cited By ~ 8

Author(s):

Suzanne F van Rijn ◽

Elisa L Zwerus ◽

Koen LM Koenraadt ◽

Wilco CH Jacobs ◽

Michel PJ van den Bekerom ◽

...

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

Literature Search ◽

Meta Analysis ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Diagnostic Reliability ◽

Measuring Tool ◽

Simple Measuring

Background: The universal goniometer is a simple measuring tool. With this review we aimed to investigate the reliability and validity of the universal goniometer in measurements of the adult elbow. Methods: Preferred Reporting Items for Systematic reviews and Meta-Analysis guidelines were followed and our study protocol was published online at PROSPERO. A literature search was conducted on relevant studies. Methodological quality was assessed using the Quality Appraisal of Diagnostic Reliability (QAREL) scoring system. Results: Out of 697 studies yielded from our literature search, 12 were included. Six studies were rated as high quality. The intraclass correlation coefficient ranged from 0.45 to 0.99 for intrarater reliability and from 0.53 to 0.97 for interrater reliability. One study providing instructions on goniometric alignment did not find a difference in expert versus non-expert examiners. Another study in which examiners were not instructed found a higher interrater reliability in expert examiners. One study investigating the validity of the goniometer in elbow measurements found a maximum standard error of the mean of 11.5° for total range of motion. Discussion: Overall, the studies showed high intra- and interrater reliability of the universal goniometer. The reliability of the universal goniometer in non-expert examiners can be increased by clear instructions on goniometric alignment.


Interrater reliability of 2 sedation scales in a medical intensive care unit: a preliminary report

American Journal of Critical Care ◽

10.4037/ajcc2001.10.2.79 ◽

2001 ◽

Vol 10 (2) ◽

pp. 79-83 ◽

Cited By ~ 23

Author(s):

LH Hogg ◽

MB Bobek ◽

LC Mion ◽

BM Legere ◽

S Banjac ◽

...

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Motor Activity ◽

Correlation Coefficient ◽

Interrater Reliability ◽

Medical Intensive Care Unit ◽

Intraclass Correlation ◽

Assessment Scale ◽

Activity Assessment ◽

Medical Intensive Care

BACKGROUND: Critical care nurses must assess the effectiveness of sedatives and analgesic agents in order to titrate doses. OBJECTIVES: To measure the interrater reliability of 2 sedation scales used to assess patients in medical intensive care units. METHODS: The interrater reliabilities of the Motor Activity Assessment Scale and the Luer sedation scale were compared prospectively in 31 patients receiving mechanical ventilation in an 18-bed medical intensive care unit of a tertiary care institution. Three registered nurses, 1 clinical pharmacist, and 1 physician simultaneously and independently followed a standardized procedure to rate each patient by using the 2 scales. Scales were randomly ordered to counteract ordering effect. Analysis of variance with post hoc Duncan multiple range tests was used to detect bias; a correlation coefficient matrix was used to examine degree of association among raters; and the intraclass correlation coefficient was measured to control for multiple raters. RESULTS: No significant bias was detected with either scale. The Motor Activity Assessment Scale had less variation (Pearson r = 0.75-0.92) than did the Luer scale (Pearson r = 0.37-0.94) and had a stronger intraclass correlation coefficient (0.81 vs 0.79). CONCLUSIONS: The Motor Activity Assessment Scale showed the highest consistency among raters.


Interrater Agreement and Interrater Reliability: Implications for Multilevel Research

Oxford Research Encyclopedia of Business and Management ◽

10.1093/acrefore/9780190224851.013.222 ◽

2021 ◽

Author(s):

Jenell L. S. Wittmer ◽

James M. LeBreton

Keyword(s):

Customer Service ◽

Interrater Reliability ◽

Research Question ◽

Intraclass Correlation ◽

Interrater Agreement ◽

Main Research ◽

Level Data ◽

Multilevel Research ◽

Data Points ◽

Research Questions

Statistics used to index interrater similarity are prevalent in many areas of the social sciences, with multilevel research being one of the most common domains for estimating interrater similarity. Multilevel research spans multiple hierarchical levels, such as individuals, teams, departments, and the organization. There are three main research questions that multilevel researchers answer using indices of interrater agreement and interrater reliability: (a) Does the nesting of lower-level units (e.g., employees) within higher-level units (e.g., work teams) result in the non-independence of residuals, which is an assumption of the general linear model?; (b) Is there sufficient agreement between scores on measures collected from lower-level units (e.g., employees' perceptions of customer service climate) to justify aggregating data to the higher level (e.g., team-level climate)?; and (c) Following data aggregation, how effective are the higher-level unit means at distinguishing between those higher levels (e.g., how reliably do team climate scores distinguish between the teams)? Interrater agreement and interrater reliability refer to the extent to which lower-level data nested or clustered within a higher-level unit are similar to one another. While closely related, interrater agreement and reliability differ from one another in how similarity is defined. Interrater reliability is the relative consistency in lower-level data. For example, to what degree do the scores assigned by raters tend to correlate with one another? Alternatively, interrater agreement is the consensus of the lower-level data points. For example, estimates of interrater agreement are used to determine the extent to which ratings made by judges/observers could be considered interchangeable or equivalent in terms of their values.
Thus, while interrater agreement and reliability both estimate the similarity of ratings by judges/observers, they define interrater similarity in slightly different ways, and these statistics are suited to address different types of research questions. The first research question that these statistics address, the issue of non-independence, is typically measured using an intraclass correlation statistic that is a function of both interrater reliability and agreement. However, in the context of non-independence, the intraclass correlation is most often interpreted as an effect size. The second multilevel research question, concerning adequate agreement to aggregate lower-level data to a higher level, would require a measure of interrater agreement, as the researcher is looking for consensus among raters. Finally, the third multilevel research question, concerning the reliability of higher-level means, not only requires a different variation of the intraclass correlation, but is also a function of both interrater reliability and agreement. Multilevel research requires researchers to appropriately apply interrater agreement and/or reliability statistics to their data, as well as follow best practices for calculating and interpreting these statistics.
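The abstract does not name the specific intraclass correlations it has in mind. A minimal sketch, assuming the one-way ANOVA forms commonly labeled ICC(1) and ICC(2) in multilevel research (for the first and third questions, respectively) and assuming equal group sizes, is given below. The function name `icc1_icc2` and the climate scores are hypothetical.

```python
from statistics import mean


def icc1_icc2(groups):
    """One-way random-effects ICC(1) and ICC(2) for equal-sized groups.

    `groups` is a list of lists: one inner list of lower-level scores
    (e.g. employee ratings) per higher-level unit (e.g. team).
    ICC(1) indexes non-independence; ICC(2) indexes the reliability
    of the group means.
    """
    k = len(groups[0])                       # members per group
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    j = len(groups)                          # number of groups
    grand = mean(x for g in groups for x in g)

    # Between-group and within-group mean squares from one-way ANOVA
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (j - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (j * (k - 1))

    icc1 = (msb - msw) / (msb + (k - 1) * msw)
    icc2 = (msb - msw) / msb
    return icc1, icc2


# Hypothetical team climate scores: three teams of three employees
teams = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
icc1, icc2 = icc1_icc2(teams)
print(round(icc1, 3), round(icc2, 3))
```

Under these assumptions the two values are linked by the Spearman-Brown relation, ICC(2) = k·ICC(1) / (1 + (k − 1)·ICC(1)), which is why group-mean reliability rises with group size even when ICC(1) is unchanged.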


Perineal Ultrasound as a Complement to POP-Q in the Assessment of Cystoceles

BioMed Research International ◽

10.1155/2014/740925 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Laila Najjari ◽

Julia Hennemann ◽

Pia Larscheid ◽

Thomas Papathemelis ◽

Nicolai Maass

Keyword(s):

Correlation Coefficient ◽

Classification System ◽

Interrater Reliability ◽

Pearson Correlation ◽

Kappa Coefficient ◽

Interrater Agreement ◽

Perineal Ultrasound ◽

Clinical Use ◽

Pearson Correlation Coefficient ◽

Reliable Tool

Purpose. In the present study we propose a classification system to quantify cystoceles by perineal ultrasound (PUS). Materials and Methods. 120 PUS data were analyzed measuring the distance between the lowest point of the bladder and the midpubic line (MPL) during rest and Valsalva. Results were classified into groups and compared to POP-Q using the κ-coefficient. Results for exact bladder position were checked for interrater reliability using ICC and Pearson's coefficient and results for classification were checked using the κ-coefficient. Bladder positions at rest and Valsalva were correlated with the distance between these points. Results. Highly significant differences concerning the position at rest and the distance between rest and Valsalva were found between the groups. For the interrater agreement, the Pearson correlation coefficient was ρ = 0.98, the ICC (A-1) = 0.98, and κ = 1.00. Comparing the classification results for POP-Q and PUS, the kappa-coefficient was κ = 0.65. Conclusion. PUS using the MPL and the classification system is a highly reliable tool for the evaluation of cystoceles. PUS shows good correlation with POP-Q. Furthermore, PUS offers an unambiguous identification of the descending organ. Further studies are needed to evaluate the clinical use of the classification system proposed here.
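The κ-coefficient reported above corrects raw agreement on categorical classifications for the agreement expected by chance. A minimal sketch of unweighted Cohen's kappa for two raters (or two methods) follows; the abstract does not say whether a weighted variant was used, so the unweighted form is an assumption, and the function name `cohen_kappa` and the example grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e the agreement expected from the marginal label
    frequencies. Undefined (division by zero) when p_e equals 1.
    """
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical cystocele grades assigned to four patients by two methods
method_1 = [0, 0, 1, 1]
method_2 = [0, 1, 1, 1]
print(cohen_kappa(method_1, method_2))
```

In this toy example the two methods agree on three of four cases (75%), yet half of that agreement is expected by chance, which is the distinction kappa is designed to expose.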


Reliability of Measurement of Glenohumeral Internal Rotation, External Rotation, and Total Arc of Motion in 3 Test Positions

Journal of Athletic Training ◽

10.4085/1062-6050-49.3.31 ◽

2014 ◽

Vol 49 (5) ◽

pp. 640-646 ◽

Cited By ~ 10

Author(s):

Mark A. Kevern ◽

Michael Beecher ◽

Smita Rao

Keyword(s):

Internal Rotation ◽

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

External Rotation ◽

Intraclass Correlation ◽

Test Procedure ◽

Intrarater Reliability ◽

Testing Procedures ◽

Test Position

Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants: Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68) and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value than the traditional supine-with-overpressure method.
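The objective above lists the standard error of measurement (SEM) alongside the ICC, but the abstract does not say how it was calculated. A minimal sketch, assuming the common formula SEM = SD × √(1 − ICC), where SD is the sample standard deviation of the measured scores, is shown below. The function name and the range-of-motion values are hypothetical.

```python
from math import sqrt
from statistics import stdev


def standard_error_of_measurement(scores, icc):
    """SEM = SD * sqrt(1 - ICC), expressed in the units of `scores`.

    Assumes `icc` is a reliability coefficient between 0 and 1 estimated
    on the same kind of measurements as `scores`.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return stdev(scores) * sqrt(1.0 - icc)


# Hypothetical internal-rotation measurements in degrees
rotation_deg = [40, 50, 60]
print(standard_error_of_measurement(rotation_deg, icc=0.75))
```

Unlike the ICC, which is unitless, the SEM is in degrees, so it indicates how much of an observed change between two sessions could be measurement error alone.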
