Observer reliability of arteriovenous malformations grading scales using current imaging modalities

2014 ◽  
Vol 120 (5) ◽  
pp. 1179-1187 ◽  
Author(s):  
Christoph J. Griessenauer ◽  
Joseph H. Miller ◽  
Bonita S. Agee ◽  
Winfield S. Fisher ◽  
Joel K. Curé ◽  
...  

Object The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice. Methods Five experienced raters, including 1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents, independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated for measurement of intrarater reliability. Three months after the initial assessment, raters reassessed those scans on which there was disagreement. In this second assessment, raters were asked to justify their rating with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and the intraclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's correlation coefficient (tau-b), and the ICC were used to assess repeat-measurement agreement for each rater. Results Interrater reliability for the overall 5-tier Spetzler-Martin scale was fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms resulted in the highest agreement, followed by MRI and digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade was fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade was excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81). Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) in 3 of the 5 raters and fair to good (ICC > 0.40) in the other 2 raters. Conclusion The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was explained by a training effect from the initial assessment and the requirement to defend the rating, which highlights a potential downside of using grades assigned in routine clinical practice for scientific purposes.
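For readers who want to reproduce the kind of concordance analysis reported above, Kendall's W can be computed directly from a subjects-by-raters matrix of ordinal grades. The sketch below is a minimal Python illustration with made-up grades and no tie correction (a published analysis of tied ordinal grades would normally include one).

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's coefficient of concordance for an n-subjects x m-raters
    matrix of ordinal scores (no correction for tied ranks)."""
    ratings = np.asarray(ratings, dtype=float)
    n, m = ratings.shape
    ranks = np.apply_along_axis(rankdata, 0, ratings)  # rank subjects within each rater
    rank_sums = ranks.sum(axis=1)                      # R_i for each subject
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical example: 5 raters grading 15 AVM studies on the 5-tier scale
rng = np.random.default_rng(0)
grades = rng.integers(1, 6, size=(15, 5))
print(round(kendalls_w(grades), 2))
```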

2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Jiali Lou ◽  
Yongliang Jiang ◽  
Hantong Hu ◽  
Xiaoyu Li ◽  
Yajun Zhang ◽  
...  

The objective of this study was to determine the intrarater and interrater reliabilities of infrared image analysis of forearm acupoints before and after moxibustion. In this work, infrared images of acupoints in the forearm of 20 volunteers (M/F, 10/10) were collected prior to and after moxibustion by infrared thermography (IRT). Two trained raters performed the analysis of infrared images in two different periods at a one-week interval. The intraclass correlation coefficient (ICC) was calculated to determine the intrarater and interrater reliabilities. With regard to the intrarater reliability, ICC values were between 0.758 and 0.994 (substantial to excellent). For the interrater reliability, ICC values ranged from 0.707 to 0.964 (moderate to excellent). Given that the intrarater and interrater reliability levels show excellent concordance, IRT could be a reliable tool to monitor the temperature change of forearm acupoints induced by moxibustion.
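ICC estimates of the kind reported here can be obtained from a long-format table of (image, rater, reading) rows. The sketch below assumes the third-party pingouin package is available; the temperature values are invented, and the appropriate ICC form (e.g., two-way random effects, absolute agreement) should be chosen to match the study design.

```python
import pandas as pd
import pingouin as pg

# Toy long-format data: one row per (acupoint image, rater) temperature reading in °C
df = pd.DataFrame({
    "image": [1, 1, 2, 2, 3, 3, 4, 4],
    "rater": ["A", "B"] * 4,
    "temp":  [33.1, 33.0, 34.2, 34.4, 32.8, 32.9, 35.0, 34.8],
})

icc = pg.intraclass_corr(data=df, targets="image", raters="rater", ratings="temp")
print(icc[["Type", "ICC", "CI95%"]])  # the ICC2/ICC2k rows correspond to absolute-agreement interrater use
```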


2017 ◽  
Vol 127 (1) ◽  
pp. 32-35 ◽  
Author(s):  
Paul M. Foreman ◽  
Christoph J. Griessenauer ◽  
Kimberly P. Kicielinski ◽  
Philip G. R. Schmalz ◽  
Brandon G. Rocque ◽  
...  

OBJECTIVE Blunt traumatic cerebrovascular injury (TCVI) represents structural injury to a vessel due to high-energy trauma. The Biffl Scale is a widely accepted grading scheme for these injuries that was developed using digital subtraction angiography. In recent years, screening CT angiography (CTA) has been used to identify patients with TCVI. The reliability of this scale, with injuries assessed using CTA, has not yet been determined. METHODS Seven raters, including 2 neurosurgeons, 2 neuroradiologists, 2 neurosurgical residents, and 1 neurosurgical vascular fellow, independently reviewed each presenting CTA of the neck performed in 40 patients with confirmed TCVI and assigned a Biffl grade. Ten images were repeated to assess intrarater reliability, for a total of 50 CTAs. Fleiss' multirater kappa (κ) and the intraclass correlation coefficient were calculated as measures of interrater reliability. Weighted Cohen's κ was used to assess intrarater reliability. RESULTS Fleiss' multirater κ was 0.65 (95% CI 0.61–0.69), indicating substantial agreement as to the Biffl grade assignment among the 7 raters. The intraclass correlation coefficient was 0.82, demonstrating excellent agreement among the raters. Intrarater reliability was perfect (weighted Cohen's κ = 1) in 2 raters, and near perfect (weighted Cohen's κ > 0.8) in the remaining 5 raters. CONCLUSIONS Grading of TCVI with CTA using the Biffl Scale is reliable.
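Fleiss' multirater kappa is straightforward to compute once the ratings are arranged as a subjects-by-raters matrix. The following sketch uses statsmodels with entirely synthetic Biffl grades, so the resulting value is illustrative only.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Synthetic Biffl grades (0-5): 50 CTAs, each read by 7 raters
rng = np.random.default_rng(1)
grades = rng.integers(0, 6, size=(50, 7))

# Convert the label matrix into a subjects x categories count table, then compute kappa
counts, _ = aggregate_raters(grades)
print("Fleiss' kappa:", round(fleiss_kappa(counts, method="fleiss"), 2))
```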


2007 ◽  
Vol 35 (6) ◽  
pp. 933-935 ◽  
Author(s):  
Vishal M. Mehta ◽  
Liz W. Paxton ◽  
Stefan X. Fornalski ◽  
Rick P. Csintalan ◽  
Donald C. Fithian

Background The International Knee Documentation Committee (IKDC) forms are commonly used to measure outcomes after anterior cruciate ligament (ACL) reconstruction. The knee examination portion of the IKDC forms includes a radiographic grading system to grade degenerative changes. The interrater and intrarater reliability of this radiographic grading system remain unknown. Hypothesis We hypothesize that the IKDC radiographic grading system will have acceptable interrater and intrarater reliability. Study Design Case series (diagnosis); Level of evidence, 4. Methods Radiographs of 205 ACL-reconstructed knees were obtained at 5-year follow-up. Specifically, weightbearing posteroanterior radiographs of the operative knee in 35° to 45° of flexion and a lateral radiograph in 30° of flexion were used. The radiographs were independently graded by 2 sports medicine fellowship-trained orthopaedic surgeons using the IKDC 2000 standard instructions. One surgeon graded the same radiographs 6 months apart, blinded to patient and prior IKDC grades. The percentage agreement was calculated for each of the 5 knee compartments as defined by the IKDC. Interrater reliability was evaluated using the intraclass correlation coefficient (ICC) 2-way mixed effect model with absolute agreement. The Spearman rank-order correlation coefficient (rs) was applied to evaluate intrarater reliability. Results The interrater agreement between the 2 surgeons was 59% for the medial joint space (ICC = 0.46; 95% confidence interval [CI] = 0.35-0.56), 54% for the lateral joint space (ICC = 0.45; 95% CI = 0.27-0.58), 49% for the patellofemoral joint (ICC = 0.40; 95% CI = 0.26-0.52), 63% for the anterior joint space (ICC = 0.20; 95% CI = 0.05-0.34), and 44% for the posterior joint space (ICC = 0.28; 95% CI = 0.15-0.40). The intrarater agreement was 83% for the medial joint space (rs = .77, P < .001), 86% for the lateral joint space (rs = .76, P < .001), 81% for the patellofemoral joint (rs = .79, P < .001), 91% for the anterior joint space (rs = .48, P < .001), and 69% for the posterior joint space (rs = .64, P < .001). Conclusions While intrarater reliability was acceptable, interrater reliability was poor. These findings suggest that multiple raters may score the same radiographs differently using the IKDC radiographic grading system. The use of a single rater to grade all radiographs when using the IKDC radiographic grading system maximizes reliability.
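Percentage agreement and the Spearman rank-order correlation used in this study are simple to reproduce. A small Python sketch with invented compartment grades (IKDC grades A-D coded as 0-3):

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical IKDC compartment grades from two raters (A-D coded as 0-3)
rater1 = np.array([0, 1, 1, 2, 0, 3, 2, 1, 0, 2])
rater2 = np.array([0, 1, 2, 2, 0, 3, 1, 1, 0, 3])

percent_agreement = np.mean(rater1 == rater2) * 100   # exact category matches
rho, p = spearmanr(rater1, rater2)                     # rank-order correlation
print(f"agreement = {percent_agreement:.0f}%, rs = {rho:.2f}, p = {p:.3f}")
```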


Author(s):  
James C. Borders ◽  
Jordanna S. Sevitz ◽  
Jaime Bauer Malandraki ◽  
Georgia A. Malandraki ◽  
Michelle S. Troche

Purpose The COVID-19 pandemic has drastically increased the use of telehealth. Prior studies of telehealth clinical swallowing evaluations provide positive evidence for telemanagement of swallowing. However, the reliability of these measures in clinical practice, as opposed to well-controlled research conditions, remains unknown. This study aimed to investigate the reliability of outcome measures derived from clinical swallowing tele-evaluations in real-world clinical practice (e.g., variability in devices and Internet connectivity, lack of in-person clinician assistance, or remote patient/caregiver training). Method Seven raters asynchronously judged clinical swallowing tele-evaluations of 12 patients with movement disorders. Outcomes included the Timed Water Swallow Test (TWST), Test of Masticating and Swallowing Solids (TOMASS), and common observations of oral intake. Statistical analyses were performed to examine inter- and intrarater reliability, and qualitative analyses explored patient- and clinician-specific factors impacting reliability. Results Forty-four trials were included for reliability analyses. All rater dyads demonstrated “good” to “excellent” interrater reliability for measures of the TWST (intraclass correlation coefficients [ICCs] ≥ .93) and observations of oral intake (≥ 77% agreement). The majority of TOMASS outcomes demonstrated “good” to “excellent” interrater reliability (ICCs ≥ .84), with the exception of the number of bites (ICCs = .43–.99) and swallows (ICCs = .21–.85). Immediate and delayed intrarater reliability were “excellent” for most raters across all tasks, ranging between ICCs of .63 and 1.00. Exploratory factors potentially impacting reliability included infrequent instances of suboptimal video quality, reduced camera stability, camera distance, and obstruction of the patient's mouth during tasks. Conclusions Subjective observations of oral intake and objective measures taken from the TWST and the TOMASS can be reliably measured via telehealth in clinical practice. Our results provide support for the feasibility and reliability of telehealth for outpatient clinical swallowing evaluations during COVID-19 and beyond. Supplemental Material https://doi.org/10.23641/asha.13661378
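The verbal labels used above ("good", "excellent") are typically assigned from ICC point estimates using published cut points. The helper below illustrates the commonly cited Koo and Li (2016) bands; the exact thresholds vary between papers, so treat the labels as approximate.

```python
def interpret_icc(icc: float) -> str:
    """Map an ICC point estimate to the approximate Koo & Li (2016) bands."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"

for value in (0.21, 0.63, 0.84, 0.93):
    print(value, interpret_icc(value))
```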


2014 ◽  
Vol 49 (5) ◽  
pp. 640-646 ◽  
Author(s):  
Mark A. Kevern ◽  
Michael Beecher ◽  
Smita Rao

Context: Athletes who participate in throwing and racket sports consistently demonstrate adaptive changes in glenohumeral-joint internal and external rotation in the dominant arm. Measurements of these motions have demonstrated excellent intrarater and poor interrater reliability. Objective: To determine intrarater reliability, interrater reliability, and standard error of measurement for shoulder internal rotation, external rotation, and total arc of motion using an inclinometer in 3 testing procedures in National Collegiate Athletic Association Division I baseball and softball athletes. Design: Cross-sectional study. Setting: Athletic department. Patients or Other Participants: Thirty-eight players participated in the study. Shoulder internal rotation, external rotation, and total arc of motion were measured by 2 investigators in 3 test positions. The standard supine position was compared with a side-lying test position, as well as a supine test position without examiner overpressure. Results: Excellent intrarater reliability was noted for all 3 test positions and ranges of motion, with intraclass correlation coefficient values ranging from 0.93 to 0.99. Results for interrater reliability were less favorable. Reliability for internal rotation was highest in the side-lying position (0.68), and reliability for external rotation and total arc was highest in the supine-without-overpressure position (0.774 and 0.713, respectively). The supine-with-overpressure position yielded the lowest interrater reliability results in all positions. The side-lying position had the most consistent results, with very little variation among intraclass correlation coefficient values for the various test positions. Conclusions: The results of our study clearly indicate that the side-lying test procedure is of equal or greater value compared with the traditional supine-with-overpressure method.
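The standard error of measurement mentioned in the objective is usually derived from the sample standard deviation and the reliability coefficient as SEM = SD x sqrt(1 - ICC). A brief sketch with hypothetical rotation values:

```python
import numpy as np

def standard_error_of_measurement(scores, icc):
    """Distribution-based SEM: sample SD multiplied by sqrt(1 - ICC)."""
    return np.std(scores, ddof=1) * np.sqrt(1.0 - icc)

# Hypothetical internal-rotation measurements (degrees) and a reported ICC
rotation_deg = [48, 52, 55, 60, 44, 58, 50, 47, 62, 53]
print(round(standard_error_of_measurement(rotation_deg, icc=0.93), 1), "degrees")
```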


2010 ◽  
Vol 11 (2) ◽  
pp. 113-124 ◽  
Author(s):  
Elizabeth Davis ◽  
Jane Galvin ◽  
Cheryl Soo

Introduction: The ability to use both hands to interact with objects is required in daily activities and is therefore important to measure in clinical practice. The Assisting Hand Assessment (AHA) is unique in evaluating the function of a child or youth's assisting hand, through observing the spontaneous manipulation of objects during bimanual activity. The AHA was developed for children with unilateral motor impairment, and shows strong psychometric properties when used with children who have cerebral palsy (CP) or obstetric brachial plexus palsy (OBPP). The AHA is currently used in clinical practice with children who have an acquired brain injury (ABI); however, there is limited research on the measurement properties of its use with this population. Objectives: The study aimed to determine the interrater and intrarater reliability of the AHA for children and youth with unilateral motor impairment following ABI. Methods: For interrater reliability, two occupational therapists (OT1 and OT2) independently rated the same 26 children and youth. For intrarater reliability, OT2 conducted a second assessment on the 26 participants 1 week later. Associations between item scores on the AHA were analysed using weighted kappa (Kw), while intraclass correlation coefficients (ICCs) were used for domain and total scores. Results: The AHA items demonstrated good to excellent intrarater reliability (Kw = 0.67–1.00). Interrater reliability was good to excellent (Kw = 0.60–0.84) for 20 of the 22 items of the AHA. Interrater and intrarater reliability coefficients for all domain and total scores were in the excellent range (ICC = 0.85–0.99). Conclusion: The current study indicates that the AHA shows good interrater and intrarater reliability when used with the paediatric ABI population. Findings provide preliminary support for the continued use of the AHA for children and youth with acquired hemiplegia.
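Weighted kappa for ordinal item scores, as used here, penalises near-miss disagreements less than distant ones. A minimal sketch with invented AHA-style item scores, using scikit-learn's linearly weighted Cohen's kappa:

```python
from sklearn.metrics import cohen_kappa_score

# Invented ordinal item scores (1-4) from two occupational therapists
ot1 = [3, 4, 2, 1, 4, 3, 2, 2, 4, 1]
ot2 = [3, 4, 2, 2, 4, 3, 3, 2, 4, 1]

# weights="linear" makes a 1-point disagreement count less than a 3-point one
print(round(cohen_kappa_score(ot1, ot2, weights="linear"), 2))
```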


Children ◽  
2020 ◽  
Vol 7 (12) ◽  
pp. 306
Author(s):  
Iñaki Pastor-Pons ◽  
María Orosia Lucha-López ◽  
Marta Barrau-Lalmolda ◽  
Iñaki Rodes-Pastor ◽  
Ángel Luis Rodríguez-Fernández ◽  
...  

(1) Background: anthropometric measurements with calipers are used to objectify cranial asymmetry in positional plagiocephaly but there is controversy regarding the reliability of different methodologies. Purpose: to analyze the interrater and intrarater reliability of direct anthropometric measurements with caliper on defined craniofacial references in infants with positional plagiocephaly. (2) Methods: 62 subjects (<28 weeks), with a difference of at least 5 mm between cranial diagonal diameters. Maximal cranial circumference, length and width and diagonal cranial diameters were measured. Intrarater (2 measurements) and interrater (2 raters) reliability was analyzed. (3) Results: intra- and interrater reliability of the maximal cranial length and width and right cranial diagonal was excellent: intraclass correlation coefficient (ICC) > 0.9. Intrarater and interrater reliability for the left cranial diagonal was excellent: ICC > 0.9 and difference in agreement in the Bland-Altman plot 0.0 mm, respectively. Intrarater and interrater reliability for the maximal cranial circumference was good: differences in agreement in Bland-Altman plots: intra: −0.03 cm; inter: −0.12 cm. (4) Conclusions: anthropometric measurements in a sample of infants with moderate positional plagiocephaly have shown excellent intra- and interrater reliability for maximal cranial length, maximal cranial width, and right and left cranial diagonals, and good intra- and interrater reliability in maximal cranial circumference measurement.
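The Bland-Altman agreement quoted above reduces to the mean of the paired differences (bias) and its 95% limits of agreement. A minimal sketch with invented circumference values:

```python
import numpy as np

def bland_altman(measure_1, measure_2):
    """Return bias and 95% limits of agreement for paired measurements."""
    diff = np.asarray(measure_1, float) - np.asarray(measure_2, float)
    bias = diff.mean()
    half_width = 1.96 * diff.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Hypothetical maximal cranial circumference (cm) measured by two raters
rater1 = [42.1, 41.5, 43.0, 40.8, 42.6]
rater2 = [42.2, 41.6, 42.9, 41.0, 42.7]
print(bland_altman(rater1, rater2))
```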


2013 ◽  
Vol 93 (4) ◽  
pp. 551-561 ◽  
Author(s):  
Nienke M. de Vries ◽  
J. Bart Staal ◽  
Marcel G.M. Olde Rikkert ◽  
Maria W.G. Nijhuis-van der Sanden

Background Physical activity is assumed to be important in the prevention and treatment of frailty. It is unclear, however, to what extent frailty can be influenced, because instruments designed to assess frailty have not been validated as evaluative outcome instruments in clinical practice. Objectives The aims of this study were: (1) to develop a frailty index (ie, the Evaluative Frailty Index for Physical Activity [EFIP]) based on the method of deficit accumulation and (2) to test the clinimetric properties of the EFIP. Design The content of the EFIP was determined using a written Delphi procedure. Intrarater reliability, interrater reliability, and construct validity were determined in an observational study (n=24). Method Intrarater reliability and interrater reliability were calculated using Cohen kappa and intraclass correlation coefficients (ICCs). Construct validity was determined by correlating the score on the EFIP with those on the Timed “Up & Go” Test (TUG), the Performance-Oriented Mobility Assessment (POMA), and the Cumulative Illness Rating Scale for Geriatrics (CIRS-G). Results Fifty items were included in the EFIP. Interrater reliability (Cohen kappa=0.72, ICC=.96) and intrarater reliability (Cohen kappa=0.77 and 0.80, ICC=.93 and .98) were good. As expected, a fair to moderate correlation with the TUG, POMA, and CIRS-G was found (.61, −.70, and .66, respectively). Limitations Reliability and validity of the EFIP have been tested in a small sample. These and other clinimetric properties, such as responsiveness, will be assessed or reassessed in a larger study population. Conclusion The EFIP is a reliable and valid instrument to evaluate the effect of physical activity on frailty in research and in clinical practice.
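The deficit-accumulation method on which the EFIP is based expresses frailty as the proportion of assessed deficits that are present. The sketch below shows that generic calculation only; the actual EFIP items and scoring rules are those defined in the article, not reproduced here.

```python
import numpy as np

def frailty_index(deficits):
    """Deficit-accumulation index: sum of deficit scores / number of deficits assessed.
    Each item is scored from 0 (absent) to 1 (fully present); missing items are dropped."""
    deficits = np.asarray(deficits, dtype=float)
    deficits = deficits[~np.isnan(deficits)]
    return deficits.sum() / deficits.size

# Hypothetical 50-item screen with 14 deficits present
items = np.zeros(50)
items[:14] = 1.0
print(frailty_index(items))  # 0.28
```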


1989 ◽  
Vol 34 (4) ◽  
pp. 283-290 ◽  
Author(s):  
J.H. Beitchman ◽  
Bastian Kruidenier ◽  
Marjorie Clegg ◽  
Jane Hood ◽  
Angela Corradini

There have been few attempts to standardize assessment methods in Child Psychiatry. This paper describes a semi-structured approach to diagnostic interviewing of the child. Thirty-four children six to 13 years of age, and their parents, were interviewed two weeks apart by two different psychiatrists. A diagnostic coding form consisting of 29 clinical symptom items, eight summary items, and nine positive health ratings was used. Three diagnostic items were also included: “severity of clinical condition,” “probability of disorder,” and “adjustment status.” Twelve of the Time 2 interviews with the child and parent were videotaped and rated by three different psychiatrists. Results indicated that summary items had higher reliability than individual symptom items and the three diagnostic items had the highest reliability, suggesting that reliability is better for broad classes of behaviour. Interrater reliability was higher for the face-to-face ratings than for the videotaped ratings. This suggests, first, that face-to-face interviews are reasonably stable over a two-week period and, second, since videotaped ratings had the lowest reliability on items that depended on inferences about the child's feelings, that an important source of variance in assessment may be the clinician's ability to empathize with the child and draw inferences about internal feeling-states. It was concluded that this interview schedule can be a part of routine clinical practice. It ensures a reasonably standard, yet flexible and reliable approach to diagnostic interviewing.


2013 ◽  
Vol 93 (8) ◽  
pp. 1102-1115 ◽  
Author(s):  
Charlotte S.L. Tsang ◽  
Lin-Rong Liao ◽  
Raymond C.K. Chung ◽  
Marco Y.C. Pang

Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a new balance assessment, but its psychometric properties have not been specifically tested in individuals with stroke. Objectives The purpose of this study was to examine the reliability and validity of the Mini-BESTest and its accuracy in categorizing people with stroke based on fall history. Design An observational measurement study with a test-retest design was conducted. Methods One hundred six people with chronic stroke were recruited. Intrarater reliability was evaluated by repeating the Mini-BESTest within 10 days by the same rater. The Mini-BESTest was administered by 2 independent raters to establish interrater reliability. Validity was assessed by correlating Mini-BESTest scores with scores of other balance measures (Berg Balance Scale, one-leg-standing, Functional Reach Test, and Timed “Up & Go” Test) in the stroke group and by comparing Mini-BESTest scores between the stroke group and 48 control participants, and between fallers (≥1 falls in the previous 12 months, n=25) and nonfallers (n=81) in the stroke group. Results The Mini-BESTest had excellent internal consistency (Cronbach alpha=.89–.94), intrarater reliability (intraclass correlation coefficient [3,1]=.97), and interrater reliability (intraclass correlation coefficient [2,1]=.96). The minimal detectable change at the 95% confidence level was 3.0 points. The Mini-BESTest was strongly correlated with other balance measures. Significant differences in Mini-BESTest total scores were found between the stroke and control groups and between fallers and nonfallers in the stroke group. In terms of floor and ceiling effects, the Mini-BESTest was significantly less skewed than other balance measures, except for one-leg-standing on the nonparetic side. The Berg Balance Scale showed significantly better ability to identify fallers (positive likelihood ratio=2.6) than the Mini-BESTest (positive likelihood ratio=1.8). Limitations The results are generalizable only to people with mild to moderate chronic stroke. Conclusions The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke.
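Two of the quantities reported above follow directly from standard formulas: the minimal detectable change (MDC95 = 1.96 x sqrt(2) x SEM, with SEM = SD x sqrt(1 - ICC)) and the positive likelihood ratio (sensitivity / (1 - specificity)). The sketch below uses hypothetical inputs, not the study's raw data.

```python
import math

def mdc95(sd_baseline, icc):
    """Minimal detectable change at the 95% level: 1.96 * sqrt(2) * SEM."""
    sem = sd_baseline * math.sqrt(1.0 - icc)
    return 1.96 * math.sqrt(2.0) * sem

def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

print(round(mdc95(sd_baseline=4.4, icc=0.97), 1))       # hypothetical SD and ICC
print(round(positive_likelihood_ratio(0.72, 0.60), 1))  # hypothetical sensitivity/specificity
```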

