Visual assessment of movement quality: a study on intra- and interrater reliability of a multi-segmental single leg squat test

Abstract Background The Single Leg Squat test (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, but there is not one established SLS test used in the clinic. Based on previous scientific findings on the reliability of the SLS test and with a methodologically rigorous setup, the aim of the present study was to investigate the intra- and interrater reliability of a standardised multi-segmental SLS test. Methods We performed a study of measurement properties to investigate the intra- and interrater reliability of a standardised multi-segmental SLS test including the assessment of the foot, knee, pelvis, and trunk. Novice and experienced physiotherapists rated 65 video recorded SLS tests from 34 test persons. We followed the Quality Appraisal for Reliability Studies checklist. Results Regardless of the raters' experience, the interrater reliability varied from “moderate” for the knee variable (ĸ = 0.41, 95% CI 0.10–0.72) to “almost perfect” for the foot (ĸ = 1.00, 95% CI 1.00–1.00). The intrarater reliability varied from “slight” (pelvic variable; ĸ = 0.17, 95% CI -0.22–0.55) to “almost perfect” (foot variable; ĸ = 1.00, 95% CI 1.00–1.00; trunk variable; ĸ = 0.82, 95% CI 0.66–0.97). A generalised kappa coefficient including the values from all raters and segments reached “moderate” interrater reliability (ĸ = 0.52, 95% CI 0.43–0.61); the corresponding value for the intrarater reliability reached “almost perfect” (ĸ = 0.82, 95% CI 0.77–0.86). Conclusions The present study shows a “moderate” interrater reliability and an “almost perfect” intrarater reliability for the variable all segments regardless of the raters' experience. Thus, we conclude that the proposed standardised multi-segmental SLS test is reliable enough to be used in an active population.
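
The quoted labels ("slight", "moderate", "almost perfect") look like the widely used Landis and Koch descriptive bands for kappa. The abstract does not name the scale, so that reading is an assumption. Under it, a minimal sketch of the mapping could look like this (the function name `landis_koch_label` is hypothetical):

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to a Landis and Koch style descriptive band.

    Assumption: the abstract's labels follow these bands; the source
    does not state which scale was used.
    """
    if kappa < 0.0:
        return "poor"
    # Upper bounds are inclusive, e.g. 0.20 still counts as "slight".
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1.0")
```

With these bands, the values reported above (0.17, 0.41, 0.52, 0.82, 1.00) receive the same labels as in the abstract.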

Visual assessment of movement quality in the single leg squat test: a review and meta-analysis of inter-rater and intrarater reliability

BMJ Open Sport & Exercise Medicine ◽

10.1136/bmjsem-2019-000541 ◽

2019 ◽

Vol 5 (1) ◽

pp. e000541 ◽

Cited By ~ 3

Author(s):

John Ressman ◽

Wilhelmus Johannes Andreas Grooten ◽

Eva Rasmussen Barr

Keyword(s):

Rating Scales ◽

Rating Scale ◽

Meta Analysis ◽

Intraclass Correlation ◽

Cochrane Library ◽

Intrarater Reliability ◽

Rater Reliability ◽

Movement Quality ◽

Step Down ◽

Single Leg Squat

Single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, but also to assess lower extremity function in active people. Objectives To conduct a review and meta-analysis on the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests. Design Review with meta-analysis. Data sources CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up until December 2018. Eligibility criteria Studies were eligible for inclusion if they were methodological studies which assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality. Results Thirty-one studies were included. The reliability varied widely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most of the studies reached ‘moderate’ measures of agreement. The pooled results of ICC/kappa showed a ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and a ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed a higher pooled agreement for inter-rater reliability of ≤3-point rating scales while no difference was found for different numbers of segmental assessments. Conclusion Our findings indicate that the SLS test including the FSD and LSD tests can be suitable for clinical use regardless of number of observed segments and particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution. PROSPERO registration number CRD42018077822.

Functional Index-3: A Valid and Reliable Functional Outcome Assessment Measure in Patients With Dermatomyositis and Polymyositis

The Journal of Rheumatology ◽

10.3899/jrheum.191374 ◽

2020 ◽

Vol 48 (1) ◽

pp. 94-100 ◽

Cited By ~ 2

Author(s):

Floranne C. Ernste ◽

Christopher Chong ◽

Cynthia S. Crowson ◽

Tanaz A. Kermani ◽

Orla Ni Mhuircheartaigh ◽

...

Keyword(s):

Construct Validity ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Health Assessment ◽

Correlation Coefficients ◽

Measurement Properties ◽

Muscle Endurance ◽

Spearman Correlation ◽

Intrarater Reliability ◽

Functional Index

Objective. Patients with dermatomyositis (DM) and polymyositis (PM) have reduced muscle endurance. The aim of this study was to streamline the Functional Index-2 (FI-2) by developing the Functional Index-3 (FI-3) and to evaluate its measurement properties, content and construct validity, and intra- and interrater reliability. Methods. A dataset of the previously performed and validated FI-2 (n = 63) was analyzed for internal redundancy, floor, and ceiling effects. The content of the FI-2 was revised into the FI-3. Construct validity and intrarater reliability of FI-3 were tested on 43 DM and PM patients at 2 rheumatology centers. Interrater reliability was tested in 25 patients. The construct validity was compared with the Myositis Activities Profile (MAP), Health Assessment Questionnaire (HAQ), and Borg CR-10 using Spearman correlation coefficient. Results. Spearman correlation coefficients of 63 patients performing FI-3 revealed moderate to high correlations between shoulder flexion and hip flexion tasks and similar correlations with MAP and HAQ scores; there were lower correlations for neck flexion task. All FI-3 tasks had very low to moderate correlations with the Borg scale. Intraclass correlation coefficients (ICC) of FI-3 tasks for intrarater reliability (n = 25) were moderate to good (0.88–0.98). ICC of FI-3 tasks for interrater reliability (n = 17) were fair to good (range 0.83–0.96). Conclusion. The FI-3 is an efficient and valid method for clinically assessing muscle endurance in DM and PM patients. FI-3 construct validity is supported by the significant correlations between functional tasks and the MAP, HAQ, and Borg CR-10 scores.

Reliability of the Assisting Hand Assessment (AHA) for Children and Youth With Acquired Brain Injury

Brain Impairment ◽

10.1375/brim.11.2.113 ◽

2010 ◽

Vol 11 (2) ◽

pp. 113-124 ◽

Cited By ~ 3

Author(s):

Elizabeth Davis ◽

Jane Galvin ◽

Cheryl Soo

Keyword(s):

Clinical Practice ◽

Brain Injury ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Motor Impairment ◽

Acquired Brain Injury ◽

Weighted Kappa ◽

Children And Youth ◽

Measurement Properties ◽

Intrarater Reliability

Abstract Introduction: The ability to use both hands to interact with objects is required in daily activities and is therefore important to measure in clinical practice. The Assisting Hand Assessment (AHA) is unique in evaluating the function of a child or youth's assisting hand, through observing the spontaneous manipulation of objects during bimanual activity. The AHA was developed for children with unilateral motor impairment, and shows strong psychometric properties when used with children who have cerebral palsy (CP) or obstetric brachial plexus palsy (OBPP). The AHA is currently used in clinical practice with children who have an acquired brain injury (ABI); however, there is limited research on the measurement properties of its use with this population. Objectives: The study aimed to determine the interrater and intrarater reliability of the AHA for children and youth with unilateral motor impairment following ABI. Methods: For interrater reliability, two occupational therapists (OT1 and OT2) independently rated the same 26 children and youth. For intrarater reliability, OT2 conducted a second assessment on the 26 participants 1 week later. Association between item scores on the AHA was analysed using weighted kappa (Kw), while intraclass correlation coefficients (ICCs) were used for domain and total scores. Results: The AHA items demonstrated good to excellent intrarater reliability (Kw = 0.67–1.00). Interrater reliability was good to excellent (Kw = 0.60–0.84) for 20 of the 22 items of the AHA. Interrater and intrarater reliability coefficients for all domain and total scores were in the excellent range (ICC = 0.85–0.99). Conclusion: The current study indicates that the AHA shows good interrater and intrarater reliability when used with the paediatric ABI population. Findings provide preliminary support for the continued use of the AHA for children and youth with acquired hemiplegia.
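
Weighted kappa gives partial credit when two ratings on an ordinal item are close but not identical. The abstract does not say which weighting scheme was used, so the sketch below assumes linear weights; the function name `weighted_kappa` and its arguments are hypothetical.

```python
from collections import Counter


def weighted_kappa(rater_a, rater_b, categories):
    """Linearly weighted kappa for two raters on an ordinal scale.

    `categories` lists the scale points in order. Linear weighting is an
    assumption made for illustration; the study's scheme is not stated.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Joint and marginal counts of the two raters' category indices.
    joint = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed += weight * joint[(i, j)] / n
            expected += weight * marg_a[i] * marg_b[j] / (n * n)

    if expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected
```

Identical ratings give a value of 1, and ratings that are systematically reversed give a negative value.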

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽

10.1123/jmld.2016-0007 ◽

2017 ◽

Vol 5 (1) ◽

pp. 59-68 ◽

Cited By ~ 16

Author(s):

Pauli Olavi Rintala ◽

Arja Kaarina Sääkslahti ◽

Susanna Iivonen

Keyword(s):

Motor Development ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Intrarater Reliability ◽

Gross Motor ◽

Gross Motor Development ◽

Percent Agreement ◽

Two Samples ◽

Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.
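
Percent agreement and kappa can diverge because kappa discounts the agreement expected by chance. The sketch below assumes two raters scoring each performance criterion as 0 or 1; the function names and the toy data in the usage note are hypothetical and not taken from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same score, in percent."""
    matches = sum(x == y for x, y in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters on categorical scores."""
    n = len(rater_a)
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions.
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_observed - p_chance) / (1.0 - p_chance)
```

For example, with the made-up scores `[1, 1, 0, 1, 0, 1, 1, 0, 1, 1]` and `[1, 1, 0, 1, 1, 1, 0, 0, 1, 1]`, the raters agree on 8 of 10 items (80%), yet kappa is only about 0.52 because both raters score 1 most of the time.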

A standardized nomenclature for cervical spine soft-tissue release and osteotomy for deformity correction

Journal of Neurosurgery Spine ◽

10.3171/2013.5.spine121067 ◽

2013 ◽

Vol 19 (3) ◽

pp. 269-278 ◽

Cited By ~ 50

Author(s):

Christopher P. Ames ◽

Justin S. Smith ◽

Justin K. Scheer ◽

Christopher I. Shaffrey ◽

Virginie Lafage ◽

...

Keyword(s):

Cervical Spine ◽

Soft Tissue ◽

Interrater Reliability ◽

Soft Tissue Release ◽

Spine Deformity ◽

Intrarater Reliability ◽

Perfect Agreement ◽

Moderate Agreement ◽

Posterior Anterior ◽

Anterior Posterior

Object Cervical spine osteotomies are powerful techniques to correct rigid cervical spine deformity. Many variations exist, however, and there is no current standardized system with which to describe and classify cervical osteotomies. This complicates the ability to compare outcomes across procedures and studies. The authors' objective was to establish a universal nomenclature for cervical spine osteotomies to provide a common language among spine surgeons. Methods A proposed nomenclature with 7 anatomical grades of increasing extent of bone/soft tissue resection and destabilization was designed. The highest grade of resection is termed the major osteotomy, and an approach modifier is used to denote the surgical approach(es), including anterior (A), posterior (P), anterior-posterior (AP), posterior-anterior (PA), anterior-posterior-anterior (APA), and posterior-anterior-posterior (PAP). For cases in which multiple grades of osteotomies were performed, the highest grade is termed the major osteotomy, and lower-grade osteotomies are termed minor osteotomies. The nomenclature was evaluated by 11 reviewers through 25 different radiographic clinical cases. The review was performed twice, separated by a minimum 1-week interval. Reliability was assessed using Fleiss kappa coefficients. Results The average intrarater reliability was classified as “almost perfect agreement” for the major osteotomy (0.89 [range 0.60–1.00]) and approach modifier (0.99 [0.95–1.00]); it was classified as “moderate agreement” for the minor osteotomy (0.73 [range 0.41–1.00]). The average interrater reliability for the 2 readings was the following: major osteotomy, 0.87 (“almost perfect agreement”); approach modifier, 0.99 (“almost perfect agreement”); and minor osteotomy, 0.55 (“moderate agreement”). 
Analysis of only major osteotomy plus approach modifier yielded a classification that was “almost perfect” with an average intrarater reliability of 0.90 (0.63–1.00) and an interrater reliability of 0.88 and 0.86 for the two reviews. Conclusions The proposed cervical spine osteotomy nomenclature provides the surgeon with a simple, standard description of the various cervical osteotomies. The reliability analysis demonstrated that this system is consistent and directly applicable. Future work will evaluate the relationship between this system and health-related quality of life metrics.
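
Fleiss' kappa extends chance-corrected agreement to more than two raters. As a minimal sketch, assuming the ratings are tabulated as one row per case and one column per category (the function name `fleiss_kappa` and this input layout are hypothetical, not taken from the paper):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table of category counts.

    counts[i][j] is the number of raters who assigned case i to
    category j. Every case must be rated by the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    # Per-case agreement: proportion of rater pairs that agree.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_chance = sum(p * p for p in proportions)

    return (p_observed - p_chance) / (1.0 - p_chance)
```

In a design like the one described, each row would hold the 11 reviewers' votes for one of the 25 cases.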

The Head Control Scale: Interrater Reliability Among Therapy Students

American Journal of Occupational Therapy ◽

10.5014/ajot.2021.75s2-po12 ◽

2021 ◽

Vol 75 (Supplement_2) ◽

pp. 7512500012p1-7512500012p1

Author(s):

Amy Armstrong-Heimsoth ◽

Rachel Reed ◽

Samantha Grant ◽

Jodi Thomas ◽

Roy St. Laurent

Keyword(s):

Physical Therapy ◽

Interrater Reliability ◽

Kappa Coefficient ◽

Control Scale ◽

Head Control ◽

Primary Author ◽

Strong Agreement

Abstract Date Presented 04/13/21 This study assesses reliability and accuracy of the Head Control Scale (HCS) when used by inexperienced raters. Physical therapy and OT students used the HCS to rate five videotaped pediatric subjects. The kappa coefficient for interrater reliability among students was "almost perfect" (>.80). In one subscale, when comparing student raters with clinicians, there was strong agreement in grading between each group. The HCS may be consistently used by both new and experienced raters. Primary Author and Speaker: Amy Armstrong-Heimsoth Additional Authors and Speakers: Emily Mei Chun, Elizabeth Diane Hesse, Kelsey E. Ranneklev, and Camila E. Sanchez

Intrarater and interrater reliability of the pediatric arteriovenous malformation compactness score in children

Journal of Neurosurgery Pediatrics ◽

10.3171/2013.2.peds12465 ◽

2013 ◽

Vol 11 (5) ◽

pp. 547-551 ◽

Cited By ~ 8

Author(s):

Fabio A. Frisoli ◽

Shih-Shan Lang ◽

Arastoo Vossough ◽

Anne Marie Cahill ◽

Gregory G. Heuer ◽

...

Keyword(s):

Arteriovenous Malformation ◽

Recurrence Rate ◽

Arteriovenous Malformations ◽

Interrater Reliability ◽

Intrarater Reliability ◽

Cerebral Arteriovenous Malformations ◽

Perfect Agreement ◽

Interventional Radiologist ◽

Κ Statistic

Object Cerebral arteriovenous malformations (AVMs) have a higher postresection recurrence rate in children than in adults. The authors' previous study demonstrated that a diffuse AVM (low compactness score) predicts postresection recurrence. The aims of this study were to evaluate the intra- and interrater reliability of the AVM compactness score. Methods Angiograms of 24 patients assigned a preoperative compactness score (scale of 1–3; 1 = most diffuse, 3 = most compact) in the authors' previous study were rerated by the same pediatric neuroradiologist 9 months later. A pediatric neurosurgeon, pediatric neuroradiology fellow, and interventional radiologist blinded to each other's ratings, the original ratings, and AVM recurrence also rated each AVM's compactness. Intrarater and interrater reliability were calculated using the κ statistic. Results Of the 24 AVMs, scores by the original neuroradiologist were 1 in 6 patients, 2 in 16 patients, and 3 in 2 patients. Intrarater reliability was 1.0. The κ statistic among the 4 raters was 0.69 (95% CI 0.44–0.89), which indicates substantial reliability. The interrater reliability between the neuroradiologist and neuroradiology fellow was moderate (κ = 0.59 [95% CI 0.20–0.89]) and was substantial between the neuroradiologist and neurosurgeon (κ = 0.74 [95% CI 0.41–1.0]). The neuroradiologist and interventional radiologist had perfect agreement (κ = 1.0). Conclusions Intrarater and interrater reliability of the AVM compactness score were excellent and substantial, respectively. These results demonstrate that the AVM compactness score is reproducible. However, the neuroradiologist and interventional radiologist had perfect agreement, which indicates that the compactness score is applied most accurately by those with extensive angiography experience.

Influence of Rater Training on Inter- and Intrarater Reliability When Using the Rat Grimace Scale

Journal of the American Association for Laboratory Animal Science ◽

10.30802/aalas-jaalas-18-000044 ◽

2019 ◽

Vol 58 (2) ◽

pp. 178-183 ◽

Cited By ~ 8

Author(s):

Emily Q Zhang ◽

Vivian SY Leung ◽

Daniel SJ Pang

Keyword(s):

Acute Pain ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Training Group ◽

Intrarater Reliability ◽

Rater Training ◽

Trainee Group ◽

Pain Models ◽

Ongoing Pain ◽

And Performance

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95% CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.
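
The ICC comes in several forms, and the abstract does not state which one was used. As an illustration only, the sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), computed from the mean squares of a subjects-by-raters table. The function name `icc_2_1` and the input layout are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score that rater j gave to subject (image) i.
    The choice of this ICC form is an assumption for illustration.
    """
    n = len(scores)      # subjects
    k = len(scores[0])   # raters
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 subjects and raters")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares for subjects, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because this form penalises systematic offsets between raters, a rater who scores consistently higher than the others lowers the coefficient even when the rank order of images is identical.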

Could Residents Adequately Assess the Severity of Hidradenitis Suppurativa? Interrater and Intrarater Reliability Assessment of Major Scoring Systems

Dermatology ◽

10.1159/000501771 ◽

2019 ◽

Vol 236 (1) ◽

pp. 8-14 ◽

Cited By ~ 1

Author(s):

Katarzyna Włodarek ◽

Aleksandra Stefaniak ◽

Łukasz Matusiak ◽

Jacek C. Szepietowski

Keyword(s):

Interrater Reliability ◽

Hidradenitis Suppurativa ◽

Intraclass Correlation ◽

Scoring Systems ◽

Staging System ◽

Severity Index ◽

Assessment Tools ◽

Intrarater Reliability ◽

Global Assessment Scale ◽

Interrater Variability

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) until now, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of the measurement instrument depends on the purpose of use and even on the physician’s experience in the subject of HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study has also demonstrated moderate agreement between raters in most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
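
The coefficient of variation expresses the spread of scores relative to their mean, so a lower CV across raters suggests closer agreement. The abstract does not describe how the CV was aggregated; the sketch below assumes it is computed per patient across raters using the sample standard deviation. The function name and the example scores are hypothetical.

```python
import statistics


def coefficient_of_variation(scores):
    """CV in percent: sample standard deviation divided by the mean.

    `scores` holds the values that different raters gave to one patient.
    Per-patient computation is an assumption made for illustration.
    """
    mean = statistics.mean(scores)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(scores) / mean
```

For four made-up rater scores of 10, 12, 11 and 13 on one patient, the mean is 11.5 and the CV is roughly 11%.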

Self-Managed Surveillance for Breast Cancer–Related Upper Body Issues: A Feasibility and Reliability Study

Physical Therapy ◽

10.1093/ptj/pzz181 ◽

2020 ◽

Vol 100 (3) ◽

pp. 468-476 ◽

Cited By ~ 2

Author(s):

Bolette S Rafn ◽

Chiara A Singh ◽

Julie Midtgaard ◽

Pat G Camp ◽

Margaret L McNeely ◽

...

Keyword(s):

Breast Cancer ◽

Interrater Reliability ◽

Physical Therapist ◽

Retention Rates ◽

Upper Body ◽

Intrarater Reliability ◽

Prospective Surveillance ◽

Reliability Study ◽

Shoulder Flexion ◽

Arm Circumference

Abstract Background Early identification of breast cancer–related upper body issues is important to enable timely physical therapist treatment. Objective This study evaluated the feasibility and reliability of women performing self-managed prospective surveillance for upper body issues in the early postoperative phase as part of a hospital-based physical therapy program. Design This was a prospective, single-site, single-group feasibility and reliability study. Methods Presurgery arm circumference measurements were completed at home and at the hospital by participants and by a physical therapist. Instruction in self-measurement was provided using a video guide. After surgery, all circumference measurements were repeated along with self-assessment and therapist assessment for shoulder flexion and abduction active range of motion. Feasibility was determined by recruitment/retention rates and participant-reported ease of performing self-measurements (1 [very difficult] to 10 [very easy]). Reliability was determined as intrarater reliability, interrater reliability, and agreement. Results Thirty-three women who were 53.4 (SD = 11.4) years old participated, with recruitment and retention rates of 79% and 94%, respectively. Participant-reported ease of measurement was 8.2 (SD = 2.2) before surgery and 8.0 (SD = 1.9) after surgery. The intrarater reliability and interrater reliability were excellent before surgery (intraclass correlation coefficient [ICC] ≥ 0.94; 95% confidence interval = 0.87–0.97) and after surgery (ICC ≥ 0.91; 95% confidence interval = 0.76–0.96). Agreement between self-assessed and therapist-assessed active shoulder flexion (κ = 0.79) and abduction (κ = 0.71) was good. 
Limitations Further testing is needed using a prospective design with a longer follow-up to determine whether self-managed prospective surveillance and timely treatment can hinder the development of chronic breast cancer–related upper body issues. Conclusions Self-measured arm circumference and shoulder range of motion are reliable, and their inclusion in a hospital-based program of prospective surveillance for upper body issues seems feasible. This approach may improve early detection and treatment.
