Interrater and Intrarater Reliability of the Beighton Score: A Systematic Review

2021 ◽  
Vol 9 (1) ◽  
pp. 232596712096809
Author(s):  
Lauren N. Bockhorn ◽  
Angelina M. Vera ◽  
David Dong ◽  
Domenica A. Delgado ◽  
Kevin E. Varner ◽  
...  

Background: The Beighton score is commonly used to assess the degree of hypermobility in patients with hypermobility spectrum disorder. Since proper diagnosis and treatment in this challenging patient population require valid, reliable, and responsive clinical assessments such as the Beighton score, studies must properly evaluate its efficacy and effectiveness. Purpose: To determine, through a systematic review, the inter- and intrarater reliability of the Beighton score and the methodological quality of the analyzed studies for use in clinical applications. Study Design: Systematic review; Level of evidence, 3. Methods: A systematic review of the MEDLINE, Embase, CINAHL, and SPORTDiscus databases was performed. Studies that measured inter- or intrarater reliability of the Beighton score in humans with and without hypermobility were included. Non-English, animal, cadaveric, and level 5 evidence studies, as well as studies utilizing the Beighton score self-assessment version, were excluded. Data were extracted to compare scoring methods, population characteristics, and measurements of inter- and intrarater reliability. Risk of bias was assessed with the COSMIN (Consensus-Based Standards for the Selection of Health Measurement Instruments) 2017 checklist. Results: Twenty-four studies were analyzed (1333 patients; mean ± SD age, 28.19 ± 17.34 years [range, 4-71 years]; 640 females, 594 males, 273 unknown sex). Of the 24 studies, 18 reported that the raters were health care professionals or health care professional students. For interrater reliability, 5 of 8 (62.5%) intraclass correlation coefficients and 12 of 19 (63.2%) kappa values were substantial to almost perfect. Intrarater reliability was reported as excellent in all studies utilizing intraclass correlation coefficients, and 3 of the 7 articles using kappa values reported almost perfect values.
Utilizing the COSMIN criteria, we determined that 1 study met “very good” criteria, 7 met “adequate,” 15 met “doubtful,” and 1 met “inadequate” for overall risk of bias in the reliability domain. Conclusion: The Beighton score is a highly reliable clinical tool that shows substantial to excellent inter- and intrarater reliability when used by raters of variable backgrounds and experience levels. While individual components of risk of bias among studies demonstrated large discrepancies, most of the items were adequate to very good.
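The kappa values summarized above are conventionally interpreted with the Landis and Koch benchmarks (0.61–0.80 "substantial", 0.81–1.00 "almost perfect"). As a minimal sketch, not taken from any of the reviewed studies, Cohen's kappa for two hypothetical raters judging one binary Beighton component across ten patients can be computed as:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' nominal judgments on the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail judgments for one Beighton item (1 = hypermobile).
a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(round(cohens_kappa(a, b), 3))  # 0.583 -> "moderate" on the Landis-Koch scale
```

Note that kappa corrects raw percent agreement (0.8 here) for the agreement expected by chance, which is why a score of 80% agreement can still land only in the "moderate" band.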

2012 ◽  
Vol 102 (2) ◽  
pp. 130-138 ◽  
Author(s):  
Jeanna M. Fascione ◽  
Ryan T. Crews ◽  
James S. Wrobel

Background: Identifying the variability of footprint measurement collection techniques and the reliability of footprint measurements would assist with appropriate clinical foot posture appraisal. We sought to identify relationships between these measures in a healthy population. Methods: On 30 healthy participants, midgait dynamic footprint measurements were collected using an ink mat, paper pedography, and electronic pedography. The footprints were then digitized, and the following footprint indices were calculated with photo digital planimetry software: footprint index, arch index, truncated arch index, Chippaux-Smirak Index, and Staheli Index. Differences between techniques were identified with repeated-measures analysis of variance with post hoc test of Scheffe. In addition, to assess practical similarities between the different methods, intraclass correlation coefficients (ICCs) were calculated. To assess intrarater reliability, footprint indices were calculated twice on 10 randomly selected ink mat footprint measurements, and the ICC was calculated. Results: Dynamic footprint measurements collected with an ink mat significantly differed from those collected with paper pedography (ICC, 0.85–0.96) and electronic pedography (ICC, 0.29–0.79), regardless of the practical similarities noted with ICC values (P = .00). Intrarater reliability for dynamic ink mat footprint measurements was high for the footprint index, arch index, truncated arch index, Chippaux-Smirak Index, and Staheli Index (ICC, 0.74–0.99). Conclusions: Footprint measurements collected with various techniques demonstrate differences. Interchangeable use of exact values without adjustment is not advised. Intrarater reliability of a single method (ink mat) was found to be high. (J Am Podiatr Med Assoc 102(2): 130–138, 2012)
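Several abstracts in this listing report agreement as intraclass correlation coefficients. As a minimal sketch (not the authors' code), a two-way random-effects, absolute-agreement, single-measures ICC(2,1) in the Shrout–Fleiss formulation can be computed directly from a subjects × raters matrix:

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape                      # n subjects, k raters
    grand = x.mean()
    row_means = x.mean(axis=1)          # per-subject means
    col_means = x.mean(axis=0)          # per-rater means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between-subjects MS
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between-raters MS
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))         # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical footprint-index scores: 5 feet, each scored by 2 raters.
ratings = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 6]]
print(round(icc_2_1(ratings), 3))  # 0.968
```

Because ICC(2,1) penalizes systematic offsets between raters as well as random error, it is the form usually appropriate for the absolute-agreement questions these studies pose.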


2002 ◽  
Vol 96 (5) ◽  
pp. 1129-1139 ◽  
Author(s):  
Jason Slagle ◽  
Matthew B. Weinger ◽  
My-Than T. Dinh ◽  
Vanessa V. Brumer ◽  
Kevin Williams

Background Task analysis may be useful for assessing how anesthesiologists alter their behavior in response to different clinical situations. In this study, the authors examined the intraobserver and interobserver reliability of an established task analysis methodology. Methods During 20 routine anesthetic procedures, a trained observer sat in the operating room and categorized the anesthetist's activities in real time into 38 task categories. Two weeks later, the same observer performed task analysis from videotapes obtained intraoperatively. A different observer performed task analysis from the videotapes on two separate occasions. Data were analyzed for percent of time spent on each task category, average task duration, and number of task occurrences. Rater reliability and agreement were assessed using intraclass correlation coefficients. Results Intrarater reliability was generally good for categorization of percent time on task and task occurrence (mean intraclass correlation coefficients of 0.84-0.97). There was comparably high concordance between real-time and video analyses. Interrater reliability was generally good for percent time and task occurrence measurements. However, the interrater reliability of the task duration metric was unsatisfactory, primarily because of the technique used to capture multitasking. Conclusions A task analysis technique used in anesthesia research for several decades showed good intrarater reliability. Off-line analysis of videotapes is a viable alternative to real-time data collection. Acceptable interrater reliability requires the use of strict task definitions, sophisticated software, and rigorous observer training. New techniques must be developed to more accurately capture multitasking. Substantial effort is required to conduct task analyses that will have sufficient reliability for purposes of research or clinical evaluation.


2020 ◽  
Vol 29 (3) ◽  
pp. 1650-1654
Author(s):  
Cara Donohue ◽  
James L. Coyle

Purpose In dysphagia research involving kinematic analyses of individual swallow parameters, randomization is used to ensure judges are not influenced by judgments made for other parameters within the same swallow or by judgments made for other swallows from the same participant. Yet, the necessity of randomizing swallows to avoid bias during kinematic analyses is largely assumed and untested. This study investigated whether randomization of the order of swallows presented to judges impacts analyses of temporal kinematic events from videofluoroscopic swallow studies. Method One hundred twenty-seven swallows were analyzed from 18 healthy adults who underwent standardized videofluoroscopic swallow studies. Swallows were first analyzed by two trained raters sequentially, analyzing all kinematic events within each swallow, and then a second time in random order, measuring one kinematic event at a time. Intrarater reliability between random and sequential swallow judgments was calculated for all kinematic events using intraclass correlation coefficients and percent exact agreement within a three-frame tolerance. Results Intraclass correlation coefficients (1.00) and percent exact agreement (89%) were excellent for all kinematic events between analysis methods, indicating there were no significant differences between measurements performed in random and sequential order. Conclusions This study provides preliminary evidence that randomization may be unnecessary during temporal swallow kinematic analyses for research, which may lead to more efficient analysis and dissemination of findings, and better alignment of findings with clinical interpretations. Replication of this design with swallows from people with dysphagia would strengthen the generalizability of the results.
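The percent-exact-agreement criterion used in this study, agreement within a three-frame tolerance, can be sketched as below. The frame indices are hypothetical illustrations, not the study's data:

```python
def percent_agreement(frames_a, frames_b, tolerance=3):
    """Share of paired frame judgments that differ by <= tolerance frames."""
    hits = sum(abs(a - b) <= tolerance for a, b in zip(frames_a, frames_b))
    return 100 * hits / len(frames_a)

# Hypothetical frame indices for one kinematic event, judged under
# the sequential and the randomized presentation orders.
sequential = [101, 140, 176, 203, 250, 290]
randomized = [102, 139, 181, 203, 251, 288]
# One pair (176 vs 181) exceeds the tolerance, so 5 of 6 pairs agree.
print(round(percent_agreement(sequential, randomized), 1))  # 83.3
```

A frame tolerance is used because videofluoroscopy is sampled discretely (commonly 30 frames per second), so small disagreements in the chosen frame are measurement noise rather than true disagreement about the event.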


2002 ◽  
Vol 92 (6) ◽  
pp. 317-326 ◽  
Author(s):  
Bart Van Gheluwe ◽  
Kevin A. Kirby ◽  
Philip Roosen ◽  
Robert D. Phillips

The reliability of biomechanical measurements of the lower extremities, as they are commonly used in podiatric practice, was quantified by means of intraclass correlation coefficients (ICCs). This was done not only to evaluate interrater and intrarater reliability but also to provide an estimate of the accuracy of the measurements. The measurement protocol involved 30 asymptomatic subjects and five raters of varying experience. Each subject was measured twice by the same rater, with the retest immediately following the test. The study demonstrated that the interrater ICCs were quite low (≤0.51), except for the measurements of relaxed calcaneal stance position and forefoot varus (0.61 and 0.62 for the left and right sides, respectively). However, the intrarater ICCs were relatively high (>0.8) for most raters and measurement variables. Measurement accuracy was moderate between raters. (J Am Podiatr Med Assoc 92(6): 317-326, 2002)


Author(s):  
James C. Borders ◽  
Jordanna S. Sevitz ◽  
Jaime Bauer Malandraki ◽  
Georgia A. Malandraki ◽  
Michelle S. Troche

Purpose The COVID-19 pandemic has drastically increased the use of telehealth. Prior studies of telehealth clinical swallowing evaluations provide positive evidence for telemanagement of swallowing. However, the reliability of these measures in clinical practice, as opposed to well-controlled research conditions, remains unknown. This study aimed to investigate the reliability of outcome measures derived from clinical swallowing tele-evaluations in real-world clinical practice (e.g., variability in devices and Internet connectivity, lack of in-person clinician assistance, or remote patient/caregiver training). Method Seven raters asynchronously judged clinical swallowing tele-evaluations of 12 patients with movement disorders. Outcomes included the Timed Water Swallow Test (TWST), Test of Masticating and Swallowing Solids (TOMASS), and common observations of oral intake. Statistical analyses examined inter- and intrarater reliability, and qualitative analyses explored patient- and clinician-specific factors impacting reliability. Results Forty-four trials were included for reliability analyses. All rater dyads demonstrated “good” to “excellent” interrater reliability for measures of the TWST (intraclass correlation coefficients [ICCs] ≥ .93) and observations of oral intake (≥ 77% agreement). The majority of TOMASS outcomes demonstrated “good” to “excellent” interrater reliability (ICCs ≥ .84), with the exception of the number of bites (ICCs = .43–.99) and swallows (ICCs = .21–.85). Immediate and delayed intrarater reliability were “excellent” for most raters across all tasks, ranging between ICCs of .63 and 1.00. Exploratory factors potentially impacting reliability included infrequent instances of suboptimal video quality, reduced camera stability, camera distance, and obstruction of the patient's mouth during tasks.
Conclusions Subjective observations of oral intake and objective measures taken from the TWST and the TOMASS can be reliably measured via telehealth in clinical practice. Our results provide support for the feasibility and reliability of telehealth for outpatient clinical swallowing evaluations during COVID-19 and beyond. Supplemental Material https://doi.org/10.23641/asha.13661378


2012 ◽  
Vol 92 (7) ◽  
pp. 958-966 ◽  
Author(s):  
Anuschka S. Niemeijer ◽  
Heleen A. Reinders-Messelink ◽  
Laurien M. Disseldorp ◽  
Marianne K. Nieuwenhuis

Background Burns occur frequently in young children. To date, insufficient data are available to fully describe the functional consequences of burns. In different patient populations and countries, the WeeFIM instrument (“WeeFIM”) often is used to measure functional independence in children. Objective The purpose of this study was to examine the psychometric properties of the WeeFIM instrument for use in Dutch burn centers. Design This was an observational study. Methods The WeeFIM instrument was translated into Dutch. All clinicians who rated the children with the instrument passed the WeeFIM credentialing examination. They scored consecutive children (n=134), aged 6 months to 16 years, admitted to Dutch burn centers with acute burns during a 1-year period, at 2 to 3 weeks, 3 months, and 6 months postburn. To examine reliability, 2 raters scored a child at the same time (n=52, 9 raters) or the same rater scored a child twice within 1 week (n=7, 3 raters). Results After a few weeks of use, the WeeFIM assessment could be administered in less than 15 minutes. Clinicians found it difficult to rate children aged between 2 and 4 years, as well as to score the cognitive items. Nevertheless, reliability was good (all intraclass correlation coefficients [1,1] were above .80). The standard error of measurement was 3.7. Limitations Intrarater reliability was based on only 7 test-retest measurements. Within our clinical setting, it proved difficult to schedule the same rater and patient twice in one week for repeated assessments. Assessments for interrater reliability, on the other hand, worked out well. Conclusions The WeeFIM instrument is a feasible and reliable instrument for use in children with burns. For evaluation of a child's individual progress, an improvement of at least 11 points should be observed to state that the child has significantly improved.


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Lisa A. Dudley ◽  
Craig A. Smith ◽  
Brandon K. Olson ◽  
Nicole J. Chimera ◽  
Brian Schmitz ◽  
...  

Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. 40 participants were video recorded performing the TJA using the published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Scores for the 10 technique flaws were summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for the first and second sessions, respectively) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between the 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. Published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learned effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.


2020 ◽  
Vol 163 (4) ◽  
pp. 822-828
Author(s):  
Anisha R. Noble ◽  
Erin Christianson ◽  
Susan J. Norton ◽  
Henry C. Ou ◽  
Grace S. Phillips ◽  
...  

Objectives Cochlear implant depth of insertion affects audiologic outcomes and can be measured in adults using plain films obtained in the “cochlear view.” The objective of this study was to assess interrater and intrarater reliability of measuring depth of insertion using cochlear view radiography. Study Design Prospective, observational. Setting Tertiary referral pediatric hospital. Subjects and Methods Patients aged 11 months to 20 years (median, 4 years; interquartile range [IQR], 1-8 years) undergoing cochlear implantation at our institution were studied over 1 year. Children underwent cochlear view imaging on postoperative day 1. Films were deidentified and 1 image per ear was selected. Two cochlear implant surgeons and 2 radiologists evaluated each image and determined angular depth of insertion. Images were re-reviewed 6 weeks later by all raters. Inter- and intrarater reliability were calculated with intraclass correlation coefficients (ICCs). Results Fifty-seven ears were imaged from 42 children. Forty-nine ears (86%) had successful cochlear view x-rays. Median angular depth of insertion was 381° (minimum, 272°; maximum, 450°; IQR, 360°-395°) during the first round of measurement. Measurements of the same images reviewed 6 weeks later showed median depth of insertion of 382° (minimum, 272°; maximum, 449°; IQR, 360°-397°). Interrater and intrarater reliability ICCs ranged between 0.81 and 0.96, indicating excellent reliability. Conclusions Postoperative cochlear view radiography is a reliable tool for measurement of cochlear implant depth of insertion in infants and children. Further studies are needed to determine reliability of intraoperatively obtained cochlear view radiographs in this population.


2019 ◽  
Vol 31 (4) ◽  
pp. 448-457 ◽  
Author(s):  
Simon A. Rogers ◽  
Peter Hassmén ◽  
Alexandra H. Roberts ◽  
Alison Alcock ◽  
Wendy L. Gilleard ◽  
...  

Purpose: A novel 4-task Athlete Introductory Movement Screen was developed and tested to provide an appropriate and reliable movement screening tool for youth sport practitioners. Methods: The overhead squat, lunge, push-up, and a prone brace with shoulder touches were selected based on previous assessments. A total of 28 mixed-sport junior athletes (18 boys and 10 girls; mean age = 15.7 [1.8] y) completed screening after viewing standardized demonstration videos. Athletes were filmed performing 8 repetitions of each task and assessed retrospectively by 2 independent raters using a 3-point scale. The primary rater reassessed the footage 3 weeks later. A subgroup (n = 11) repeated the screening 7 days later, and a further 8 athletes were reassessed 6 months later. Intraclass correlation coefficients (ICC), typical error (TE), coefficient of variation (CV%), and weighted kappa (k) were used in the reliability analysis. Results: For the Athlete Introductory Movement Screen 4-task sum score, intrarater reliability was high (ICC = .97; CV = 2.8%), whereas interrater reliability was good (ICC = .88; CV = 5.6%). Agreement between raters across individual movements ranged from fair to almost perfect (k = .31–.89). The 7-day and 6-month test–retests showed good reliability and acceptable CVs (≤ 10%) for sum scores. Conclusion: The 4-task Athlete Introductory Movement Screen appears to be a reliable tool for profiling emerging athletes. Reliability was strongest within the same rater; it was lower, yet acceptable, between 2 raters. Scores can provide an overview of appropriate movement competencies, helping practitioners assess training interventions in the athlete development pathway.
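The typical error (TE) and coefficient of variation (CV%) reported in this abstract are commonly derived from test–retest difference scores: TE = SD of the differences / √2, and CV% = 100 × TE / grand mean. A minimal sketch under that assumption, with hypothetical sum scores rather than the study's data:

```python
import statistics as st
from math import sqrt

def typical_error_cv(trial1, trial2):
    """Typical (within-subject) error and CV% from paired test-retest scores."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    te = st.stdev(diffs) / sqrt(2)        # SD of difference scores / sqrt(2)
    grand_mean = st.mean(trial1 + trial2)
    return te, 100 * te / grand_mean

# Hypothetical 4-task sum scores for 4 athletes, screened twice 7 days apart.
t1 = [10, 12, 14, 16]
t2 = [11, 12, 13, 17]
te, cv = typical_error_cv(t1, t2)
print(round(te, 2), round(cv, 1))  # 0.68 5.2
```

Dividing the SD of the differences by √2 removes the doubling of within-subject variance that occurs when two noisy measurements are subtracted, which is why TE, not the raw difference SD, is the conventional reliability statistic.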

