Reliability assessment of the Biffl Scale for blunt traumatic cerebrovascular injury as detected on computer tomography angiography

2017 ◽  
Vol 127 (1) ◽  
pp. 32-35 ◽  
Author(s):  
Paul M. Foreman ◽  
Christoph J. Griessenauer ◽  
Kimberly P. Kicielinski ◽  
Philip G. R. Schmalz ◽  
Brandon G. Rocque ◽  
...  

OBJECTIVE Blunt traumatic cerebrovascular injury (TCVI) represents structural injury to a vessel due to high-energy trauma. The Biffl Scale is a widely accepted grading scheme for these injuries that was developed using digital subtraction angiography. In recent years, screening CT angiography (CTA) has been used to identify patients with TCVI. The reliability of this scale, with injuries assessed using CTA, has not yet been determined. METHODS Seven independent raters, including 2 neurosurgeons, 2 neuroradiologists, 2 neurosurgical residents, and 1 neurosurgical vascular fellow, independently reviewed each presenting CTA of the neck performed in 40 patients with confirmed TCVI and assigned a Biffl grade. Ten images were repeated to assess intrarater reliability, for a total of 50 CTAs. Fleiss' multirater kappa (κ) and interclass correlation were calculated as a measure of interrater reliability. Weighted Cohen's κ was used to assess intrarater reliability. RESULTS Fleiss' multirater κ was 0.65 (95% CI 0.61–0.69), indicating substantial agreement as to the Biffl grade assignment among the 7 raters. Interclass correlation was 0.82, demonstrating excellent agreement among the raters. Intrarater reliability was perfect (weighted Cohen's κ = 1) in 2 raters, and near perfect (weighted Cohen's κ > 0.8) in the remaining 5 raters. CONCLUSIONS Grading of TCVI with CTA using the Biffl Scale is reliable.
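
For readers unfamiliar with the multirater statistic named in the abstract above, the following is a minimal sketch of the standard Fleiss' kappa calculation. It is not the authors' analysis code; the function name `fleiss_kappa` and the toy ratings are hypothetical, and the sketch assumes every subject is rated by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each a list of rater labels.

    Hypothetical helper for illustration; assumes the same number of raters
    per subject and that more than one category is actually used
    (otherwise chance agreement is 1 and the statistic is undefined).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of raters who assigned category j to subject i.
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement for each subject, then averaged over subjects.
    p_i = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Overall proportion of assignments per category gives chance agreement.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Toy data (invented): 3 subjects graded by 3 raters with full agreement.
toy = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
kappa = fleiss_kappa(toy, categories=[1, 2, 3])
```

With the invented toy data every rater agrees on every subject, so the statistic reaches its maximum of 1; real grading data would fall below that.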

2011 ◽  
Vol 101 (5) ◽  
pp. 407-414 ◽  
Author(s):  
Paul Jeong Kim ◽  
Ruth Peace ◽  
Jamie Mieras ◽  
Tanya Thoms ◽  
Denise Freeman ◽  
...  

Background: Goniometric measurement is currently being used as a diagnostic and outcomes assessment tool for ankle joint dorsiflexion. Despite its common use, its interrater and intrarater reliability has been questioned. Methods: This is a prospective study examining whether the experience of the examiner or the technique used affects the interrater and intrarater reliability for measuring ankle joint dorsiflexion. Fourteen asymptomatic individuals (8 male and 6 female) with a mean age of 28.2 years (range, 23–52) were enrolled into this study. The years of clinical experience of the five examiners averaged 10.4 years (range, 0–26). Four examiners used a modified Root, Weed and Orien method of measuring ankle joint dorsiflexion. The fifth examiner utilized a nonstandardized technique. A standard goniometer was used for bilateral measurements of ankle joint dorsiflexion with the knee extended and flexed. All five examiners repeated each measurement three times during each of the three sessions, with each session spaced at least 1 week apart. Results: The interclass correlation coefficient reveals a moderate intrarater and poor interrater reliability in ankle joint dorsiflexion measurements using a standard goniometer. More importantly, further analysis indicates that the use of a standardized technique for measurement of ankle joint dorsiflexion or years of clinical experience does not increase the intrarater or interrater reliability. Conclusions: The utility of the goniometric measurement of ankle joint dorsiflexion may be limited. (J Am Podiatr Med Assoc 101(5): 407–414, 2011)


2014 ◽  
Vol 120 (5) ◽  
pp. 1179-1187 ◽  
Author(s):  
Christoph J. Griessenauer ◽  
Joseph H. Miller ◽  
Bonita S. Agee ◽  
Winfield S. Fisher ◽  
Joel K. Curé ◽  
...  

Object The aim of this study was to examine observer reliability of frequently used arteriovenous malformation (AVM) grading scales, including the 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale, using current imaging modalities in a setting closely resembling routine clinical practice. Methods Five experienced raters, including 1 vascular neurosurgeon, 2 neuroradiologists, and 2 senior neurosurgical residents independently reviewed 15 MRI studies, 15 CT angiograms, and 15 digital subtraction angiograms obtained at the time of initial diagnosis. Assessments of 5 scans of each imaging modality were repeated for measurement of intrarater reliability. Three months after the initial assessment, raters reassessed those scans where there was disagreement. In this second assessment, raters were asked to justify their rating with comments and illustrations. Generalized kappa (κ) analysis for multiple raters, Kendall's coefficient of concordance (W), and interclass correlation coefficient (ICC) were applied to determine interrater reliability. For intrarater reliability analysis, Cohen's kappa (κ), Kendall's correlation coefficient (tau-b), and ICC were used to assess repeat measurement agreement for each rater. Results Interrater reliability for the overall 5-tier Spetzler-Martin scale was fair to good (ICC = 0.69) to extremely strong (Kendall's W = 0.73) on initial assessment and improved on reassessment. Assessment of CT angiograms resulted in the highest agreement, followed by MRI and digital subtraction angiography. Agreement for the overall 3-tier Spetzler-Ponce grade was fair to good (ICC = 0.68) to strong (Kendall's W = 0.70) on initial assessment, improved on reassessment, and was comparable to agreement for the 5-tier Spetzler-Martin scale. Agreement for the overall Pollock-Flickinger radiosurgery-based grade was excellent (ICC = 0.89) to extremely strong (Kendall's W = 0.81). 
Intrarater reliability for the overall 5-tier Spetzler-Martin grade was excellent (ICC > 0.75) in 3 of the 5 raters and fair to good (ICC > 0.40) in the other 2 raters. Conclusion The 5-tier Spetzler-Martin scale, the 3-tier Spetzler-Ponce scale, and the Pollock-Flickinger radiosurgery-based scale all showed a high level of agreement. The improved reliability on reassessment was explained by a training effect from the initial assessment and the requirement to defend the rating, which outlines a potential downside for grades determined as part of routine clinical practice to be used for scientific purposes.


Author(s):  
Wen Liu ◽  
Melissa Batchelor ◽  
Kristine Williams

Abstract Background and Objectives Mealtime engagement is defined as verbal and nonverbal assistance provided by caregivers to guide and motivate care recipients in eating. Quality mealtime engagement is critical to improve mealtime difficulties and intake among older adults with dementia requiring eating assistance. Few tools are feasible and valid to measure mealtime engagement. This study developed and tested the Mealtime Engagement Scale (MES). Research Design and Methods Items were developed based on literature review and expert review and finalized based on content validity and corrected item-total correlation. A secondary analysis of 87 videotaped observations capturing 18 nursing home staff providing mealtime care to residents with dementia was conducted. Internal consistency, interrater reliability, and intrarater reliability were assessed. Concurrent and convergent validity were examined through correlation (rs) with the Relational Behavior Scale (RBS) and the Mealtime Relational Care Checklist (M-RCC), respectively. Results The 18-item MES was developed with adequate content validity (Scale-content validity index [CVI] = 1.00; Scale-CVI/Average = 0.962–0.987). Each item is scored from 0 (never) to 3 (always). The total scale score ranges from 0 to 54. Higher scores indicate greater mealtime engagement. The MES had very good internal consistency (Cronbach’s α = 0.837), outstanding interrater reliability (interclass correlation = 0.920), outstanding intrarater reliability (interclass correlation = 0.956), adequate concurrent validity based on strong correlation with the RBS (rs = 0.821, p < .001), and fair convergent validity based on weak correlation with the M-RCC (rs = 0.219, p = .042). Discussion and Implications Findings provide preliminary psychometric evidence of MES to measure mealtime engagement. Future testing is needed among more and diverse samples in different care settings to accumulate psychometric evidence.
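
The scoring rule stated in the abstract above (18 items, each scored 0 to 3, total 0 to 54) can be sketched as follows. This is only an illustration of the arithmetic; the function name `mes_total` and the validation behavior are assumptions, not part of the published instrument.

```python
def mes_total(item_scores):
    """Sum 18 item scores, each an integer from 0 (never) to 3 (always).

    Hypothetical helper: the abstract states only the item range and the
    0-54 total range; the input checks here are an assumption.
    """
    if len(item_scores) != 18:
        raise ValueError("expected exactly 18 item scores")
    for score in item_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


# Invented example: every item scored "always" gives the maximum total.
maximum = mes_total([3] * 18)
```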


2017 ◽  
Vol 5 (1) ◽  
pp. 59-68 ◽  
Author(s):  
Pauli Olavi Rintala ◽  
Arja Kaarina Sääkslahti ◽  
Susanna Iivonen

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.


2013 ◽  
Vol 19 (3) ◽  
pp. 269-278 ◽  
Author(s):  
Christopher P. Ames ◽  
Justin S. Smith ◽  
Justin K. Scheer ◽  
Christopher I. Shaffrey ◽  
Virginie Lafage ◽  
...  

Object Cervical spine osteotomies are powerful techniques to correct rigid cervical spine deformity. Many variations exist, however, and there is no current standardized system with which to describe and classify cervical osteotomies. This complicates the ability to compare outcomes across procedures and studies. The authors' objective was to establish a universal nomenclature for cervical spine osteotomies to provide a common language among spine surgeons. Methods A proposed nomenclature with 7 anatomical grades of increasing extent of bone/soft tissue resection and destabilization was designed. The highest grade of resection is termed the major osteotomy, and an approach modifier is used to denote the surgical approach(es), including anterior (A), posterior (P), anterior-posterior (AP), posterior-anterior (PA), anterior-posterior-anterior (APA), and posterior-anterior-posterior (PAP). For cases in which multiple grades of osteotomies were performed, the highest grade is termed the major osteotomy, and lower-grade osteotomies are termed minor osteotomies. The nomenclature was evaluated by 11 reviewers through 25 different radiographic clinical cases. The review was performed twice, separated by a minimum 1-week interval. Reliability was assessed using Fleiss kappa coefficients. Results The average intrarater reliability was classified as “almost perfect agreement” for the major osteotomy (0.89 [range 0.60–1.00]) and approach modifier (0.99 [0.95–1.00]); it was classified as “moderate agreement” for the minor osteotomy (0.73 [range 0.41–1.00]). The average interrater reliability for the 2 readings was the following: major osteotomy, 0.87 (“almost perfect agreement”); approach modifier, 0.99 (“almost perfect agreement”); and minor osteotomy, 0.55 (“moderate agreement”). 
Analysis of only major osteotomy plus approach modifier yielded a classification that was “almost perfect” with an average intrarater reliability of 0.90 (0.63–1.00) and an interrater reliability of 0.88 and 0.86 for the two reviews. Conclusions The proposed cervical spine osteotomy nomenclature provides the surgeon with a simple, standard description of the various cervical osteotomies. The reliability analysis demonstrated that this system is consistent and directly applicable. Future work will evaluate the relationship between this system and health-related quality of life metrics.


2018 ◽  
Vol 160 (3) ◽  
pp. 533-539 ◽  
Author(s):  
Steven Coppess ◽  
Reema Padia ◽  
David Horn ◽  
Sanjay R. Parikh ◽  
Andrew Inglis ◽  
...  

Objective While the Benjamin-Inglis classification system is widely used to categorize laryngeal clefts, it does not clearly differentiate a type 1 cleft from normal anatomy, and there is no widely accepted or validated protocol for systematically evaluating interarytenoid mucosal height. We sought to propose the interarytenoid assessment protocol as a method to standardize the description of the interarytenoid anatomy and to test its reliability. Study Design Retrospective review of endoscopic videos. Setting Pediatric academic center. Subjects and Methods The interarytenoid assessment protocol comprises 4 steps for evaluation of the interarytenoid region relative to known anatomic landmarks in the supraglottis, glottis, and subglottis. Thirty consecutively selected videos of the protocol were reviewed by 4 otolaryngologists. The raters were blinded to identifying information, and the video order was randomized for each review. We assessed protocol completion times and calculated Cohen’s linear-weighted κ coefficient between blinded expert raters and with the operating surgeon to evaluate interrater/intrarater reliability. Results Median age was 4.9 years (59 months; range, 1 month to 20 years). Median completion time was 144 seconds. Interrater and intrarater reliability showed substantial agreement (interrater κ = 0.71 [95% confidence interval (CI), 0.55-0.87]; intrarater mean κ = 0.70 [95% CI, 0.59-0.92/rater 1, 0.47-0.85/rater 2]; P < .001). Comparing raters to the operating surgeon demonstrated substantial agreement (mean κ = 0.62; 95% CI, 0.31-0.79/rater 1, 0.48-0.89/rater 2; P < .001). Conclusion The interarytenoid assessment protocol appears reliable in describing interarytenoid anatomy. Rapid completion times and substantial interrater/intrarater reliability were demonstrated. Incorporation of this protocol may provide important steps toward improved standardization in the anatomic description of the interarytenoid region in pediatric dysphagia.
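
As a rough illustration of the linear-weighted Cohen's κ mentioned in the abstract above, the sketch below computes it for two raters on an ordinal scale. It is a generic textbook formulation, not the study's code; the function name `linear_weighted_kappa` and the example ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, categories):
    """Linear-weighted Cohen's kappa for two raters on ordered categories.

    Hypothetical helper for illustration. Disagreement weights are |i - j|;
    the usual 1/(k - 1) scaling cancels between numerator and denominator.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected disagreement.
    num = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1 - num / den


# Invented ratings: the two raters differ by one step on the last case only.
kappa_w = linear_weighted_kappa([1, 2, 3, 1], [1, 2, 3, 2], categories=[1, 2, 3])
```

Because disagreements are weighted by their distance on the ordinal scale, a one-step difference is penalized less than a two-step difference, which is why a weighted form suits graded anatomical scales.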


2013 ◽  
Vol 11 (5) ◽  
pp. 547-551 ◽  
Author(s):  
Fabio A. Frisoli ◽  
Shih-Shan Lang ◽  
Arastoo Vossough ◽  
Anne Marie Cahill ◽  
Gregory G. Heuer ◽  
...  

Object Cerebral arteriovenous malformations (AVMs) have a higher postresection recurrence rate in children than in adults. The authors' previous study demonstrated that a diffuse AVM (low compactness score) predicts postresection recurrence. The aims of this study were to evaluate the intra- and interrater reliability of the AVM compactness score. Methods Angiograms of 24 patients assigned a preoperative compactness score (scale of 1–3; 1 = most diffuse, 3 = most compact) in the authors' previous study were rerated by the same pediatric neuroradiologist 9 months later. A pediatric neurosurgeon, pediatric neuroradiology fellow, and interventional radiologist blinded to each other's ratings, the original ratings, and AVM recurrence also rated each AVM's compactness. Intrarater and interrater reliability were calculated using the κ statistic. Results Of the 24 AVMs, scores by the original neuroradiologist were 1 in 6 patients, 2 in 16 patients, and 3 in 2 patients. Intrarater reliability was 1.0. The κ statistic among the 4 raters was 0.69 (95% CI 0.44–0.89), which indicates substantial reliability. The interrater reliability between the neuroradiologist and neuroradiology fellow was moderate (κ = 0.59 [95% CI 0.20–0.89]) and was substantial between the neuroradiologist and neurosurgeon (κ = 0.74 [95% CI 0.41–1.0]). The neuroradiologist and interventional radiologist had perfect agreement (κ = 1.0). Conclusions Intrarater and interrater reliability of the AVM compactness score were excellent and substantial, respectively. These results demonstrate that the AVM compactness score is reproducible. However, the neuroradiologist and interventional radiologist had perfect agreement, which indicates that the compactness score is applied most accurately by those with extensive angiography experience.


Author(s):  
Emily Q Zhang ◽  
Vivian SY Leung ◽  
Daniel SJ Pang

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95% CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.


Dermatology ◽  
2019 ◽  
Vol 236 (1) ◽  
pp. 8-14 ◽  
Author(s):  
Katarzyna Włodarek ◽  
Aleksandra Stefaniak ◽  
Łukasz Matusiak ◽  
Jacek C. Szepietowski

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) until now, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of the measurement instrument depends on the purpose of use and even on the physician’s experience in the subject of HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study has also demonstrated moderate agreement between raters in most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the other when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
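
The coefficient of variation used above as a measure of interrater variability is the standard deviation of the raters' scores divided by their mean. A minimal sketch follows, assuming the sample standard deviation (the abstract does not say which form was used); the function name and the scores are invented for illustration.

```python
import statistics


def coefficient_of_variation(scores):
    """CV of one patient's scores across raters: sample SD divided by mean.

    Hypothetical helper; the choice of the sample (n - 1) standard deviation
    is an assumption, and a mean of zero would make the CV undefined.
    """
    return statistics.stdev(scores) / statistics.mean(scores)


# Invented scores given to one patient by two raters.
cv = coefficient_of_variation([8, 12])
```

A lower CV means the raters' scores for the same patient cluster more tightly, which is the sense in which the abstract calls some scales more reproducible.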


2020 ◽  
Vol 100 (3) ◽  
pp. 468-476 ◽  
Author(s):  
Bolette S Rafn ◽  
Chiara A Singh ◽  
Julie Midtgaard ◽  
Pat G Camp ◽  
Margaret L McNeely ◽  
...  

Abstract Background Early identification of breast cancer–related upper body issues is important to enable timely physical therapist treatment. Objective This study evaluated the feasibility and reliability of women performing self-managed prospective surveillance for upper body issues in the early postoperative phase as part of a hospital-based physical therapy program. Design This was a prospective, single-site, single-group feasibility and reliability study. Methods Presurgery arm circumference measurements were completed at home and at the hospital by participants and by a physical therapist. Instruction in self-measurement was provided using a video guide. After surgery, all circumference measurements were repeated along with self-assessment and therapist assessment for shoulder flexion and abduction active range of motion. Feasibility was determined by recruitment/retention rates and participant-reported ease of performing self-measurements (1 [very difficult] to 10 [very easy]). Reliability was determined as intrarater reliability, interrater reliability, and agreement. Results Thirty-three women who were 53.4 (SD = 11.4) years old participated, with recruitment and retention rates of 79% and 94%, respectively. Participant-reported ease of measurement was 8.2 (SD = 2.2) before surgery and 8.0 (SD = 1.9) after surgery. The intrarater reliability and interrater reliability were excellent before surgery (intraclass correlation coefficient [ICC] ≥ 0.94; 95% confidence interval = 0.87–0.97) and after surgery (ICC ≥ 0.91; 95% confidence interval = 0.76–0.96). Agreement between self-assessed and therapist-assessed active shoulder flexion (κ = 0.79) and abduction (κ = 0.71) was good. 
Limitations Further testing is needed using a prospective design with a longer follow-up to determine whether self-managed prospective surveillance and timely treatment can hinder the development of chronic breast cancer–related upper body issues. Conclusions Self-measured arm circumference and shoulder range of motion are reliable, and their inclusion in a hospital-based program of prospective surveillance for upper body issues seems feasible. This approach may improve early detection and treatment.

