Interrater Reliability of the Berg Balance Scale When Used by Clinicians of Various Experience Levels to Assess People With Lower Limb Amputations

Background People with lower limb amputations frequently have impaired balance ability. The Berg Balance Scale (BBS) has excellent psychometric properties for people with neurologic disorders and elderly people dwelling in the community. A Rasch analysis demonstrated the validity of the BBS for people with lower limb amputations of all ability strata, but rater reliability has not been tested. Objective The study objective was to determine the interrater reliability and intrarater reliability of BBS scores and the differences in scores assigned by testers with various levels of experience when assessing people with lower limb amputations. Design This reliability study of video-recorded single-session BBS assessments had a cross-sectional design. Methods From a larger study of people with lower limb amputations, 5 consecutively recruited participants using prostheses were video recorded during an in-person BBS assessment. Sixteen testers independently rated the video-recorded assessments. Testers were 3 physical therapists, 1 occupational therapist, 3 third-year and 4 second-year doctor of physical therapy (DPT) students, and 5 first-year DPT students without clinical training. Rater reliability was calculated using intraclass correlation coefficients (ICC [2,k]). Differences in scores assigned by testers with various levels of experience were determined by use of an analysis of variance with Tukey post hoc tests. Results The average age of the participants was 53.0 years (SD=15.7). Amputations had occurred at the ankle disarticulation, transtibial, and transfemoral levels because of vascular, trauma, and medical etiologies an average of 8.2 years earlier (SD=7.9). Berg Balance Scale scores spanned all ability strata. Interrater reliability (ICC [2,k]=.99) and intrarater reliability of scores determined in person and through video-recorded assessments by the same testers (ICC [2,k]=.99) were excellent. 
For participants with the lowest levels of ability, licensed professionals assigned lower scores than did DPT students without clinical training. Limitations Intrarater reliability calculations were based on 2 testers. Conclusions Berg Balance Scale scores assigned to people using prostheses by testers with various levels of clinical experience had excellent interrater reliability and intrarater reliability.
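The ICC (2,k) reported above is the two-way random-effects, average-measures form of the intraclass correlation. A minimal sketch of how it can be computed from a subjects-by-raters table follows, assuming complete data with no missing ratings. The function name `icc_two_way_random` is hypothetical, and the example scores are the widely reproduced textbook data set from Shrout and Fleiss (1979), not data from this study.

```python
def icc_two_way_random(ratings):
    """Return (ICC(2,1), ICC(2,k)) for a complete subjects-by-raters table."""
    n = len(ratings)       # number of subjects (rows)
    k = len(ratings[0])    # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Single-rater and average-of-k-raters absolute-agreement coefficients.
    single = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    average = (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
    return single, average


# Textbook example (6 subjects rated by 4 judges), not data from this study.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc_single, icc_average = icc_two_way_random(example)
print(round(icc_single, 2), round(icc_average, 2))  # prints 0.29 0.62
```

The average-measures value is always at least as high as the single-rater value, which is worth keeping in mind when comparing an ICC (2,k) with an ICC (2,1) from another study.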

Validity, Reliability, and Ability to Identify Fall Status of the Berg Balance Scale, BESTest, Mini-BESTest, and Brief-BESTest in Patients With COPD

Physical Therapy ◽

10.2522/ptj.20150391 ◽

2016 ◽

Vol 96 (11) ◽

pp. 1807-1815 ◽

Cited By ~ 35

Author(s):

Cristina Jácome ◽

Joana Cruz ◽

Ana Oliveira ◽

Alda Marques

Keyword(s):

Interrater Reliability ◽

Intraclass Correlation ◽

Berg Balance Scale ◽

Performance Validity ◽

Operating Characteristics ◽

Intrarater Reliability ◽

Balance Test ◽

Balance Scale ◽

Balance Tests ◽

Abc Scale

Abstract Background The Berg Balance Scale (BBS), Balance Evaluation Systems Test (BESTest), Mini-BESTest, and Brief-BESTest are useful in the assessment of balance. Their psychometric properties, however, have not been tested in patients with chronic obstructive pulmonary disease (COPD). Objective This study aimed to compare the validity, reliability, and ability to identify fall status of the BBS, BESTest, Mini-BESTest, and the Brief-BESTest in patients with COPD. Design A cross-sectional study was conducted. Methods Forty-six patients (24 men, 22 women; mean age=75.9 years, SD=7.1) were included. Participants were asked to report their falls during the previous 12 months and to fill in the Activity-specific Balance Confidence (ABC) Scale. The BBS and the BESTest were administered. Mini-BESTest and Brief-BESTest scores were computed based on the participants' BESTest performance. Validity was assessed by correlating balance tests with each other and with the ABC Scale. Interrater reliability (2 raters), intrarater reliability (48–72 hours), and minimal detectable changes (MDCs) were established. Receiver operating characteristics assessed the ability of each balance test to differentiate between participants with and without a history of falls. Results Balance test scores were significantly correlated with each other (Spearman correlation rho=.73–.90) and with the ABC Scale (rho=.53–.75). Balance tests presented high interrater reliability (intraclass correlation coefficient [ICC]=.85–.97) and intrarater reliability (ICC=.52–.88) and acceptable MDCs (MDC=3.3–6.3 points). Although all balance tests were able to identify fall status (area under the curve=0.74–0.84), the BBS (sensitivity=73%, specificity=77%) and the Brief-BESTest (sensitivity=81%, specificity=73%) had the higher ability to identify fall status. Limitations Findings are generalizable mainly to older patients with moderate COPD. 
Conclusions The 4 balance tests are valid, reliable, and valuable in identifying fall status in patients with COPD. The Brief-BESTest presented slightly higher interrater reliability and ability to differentiate participants' fall status.
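The sensitivity and specificity values above can be converted to likelihood ratios with simple arithmetic. The sketch below is illustrative only: the function name `likelihood_ratios` is hypothetical, and the likelihood ratios it prints are derived here from the rounded percentages in the abstract rather than reported by the authors.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    positive = sensitivity / (1.0 - specificity)
    negative = (1.0 - sensitivity) / specificity
    return positive, negative


# Sensitivity and specificity as stated in the abstract (rounded values).
reported = {"BBS": (0.73, 0.77), "Brief-BESTest": (0.81, 0.73)}

for name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```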

Effects of robot-(Morning Walk®) assisted gait training for patients after stroke: a randomized controlled trial

Clinical Rehabilitation ◽

10.1177/0269215518806563 ◽

2018 ◽

Vol 33 (3) ◽

pp. 516-523 ◽

Cited By ~ 5

Author(s):

JaYoung Kim ◽

Dae Yul Kim ◽

Min Ho Chun ◽

Seong Woo Kim ◽

Ha Ra Jeon ◽

...

Keyword(s):

Randomized Controlled Trial ◽

Lower Limb ◽

Controlled Trial ◽

Gait Training ◽

Berg Balance Scale ◽

Control Group ◽

Balance Scale ◽

Randomized Controlled ◽

Scale Scores ◽

Conventional Physiotherapy

Objective: To investigate the effects of Morning Walk®–assisted gait training for patients with stroke. Design: Prospective randomized controlled trial. Setting: Three hospital rehabilitation departments (two tertiary and one secondary). Patients: We enrolled 58 patients with hemiparesis following a first-time stroke within the preceding year and with Functional Ambulation Category scores ⩾2. Intervention: The patients were randomly assigned to one of two treatment groups: 30 minutes of training with Morning Walk®, a lower limb rehabilitation robot, plus 1 hour of conventional physiotherapy (Morning Walk® group; n = 28); or 1.5 hours of conventional physiotherapy (control group; n = 30). All received treatment five times per week for three weeks. Main outcome measurements: The primary outcomes were walking ability, assessed using the Functional Ambulation Category scale, and lower limb function, assessed using the Motricity Index-Lower. Secondary outcomes included the 10 Meter Walk Test, Modified Barthel Index, Rivermead Mobility Index, and Berg Balance Scale scores. Results: A total of 10 patients were lost to follow-up, leaving a cohort of 48 for the final analyses. After training, all outcome measures significantly improved in both groups. In Motricity Index-Lower of the affected limb, the Morning Walk® group (∆mean ± SD; 19.68 ± 14.06) showed greater improvement (p = .034) than the control group (∆mean ± SD; 11.70 ± 10.65). Berg Balance Scale scores also improved more (p = .047) in the Morning Walk® group (∆mean ± SD; 14.36 ± 9.01) than in the control group (∆mean ± SD; 9.65 ± 8.14). Conclusion: Compared with conventional physiotherapy alone, our results suggest that voluntary strength and balance of stroke patients with hemiparesis might be improved with Morning Walk®–assisted gait training combined with conventional physiotherapy.

Between-Rater Reliability of the 6-Minute Walk Test, Berg Balance Scale, and Handheld Dynamometry in People with Multiple Sclerosis

International Journal of MS Care ◽

10.7224/1537-2073.2011-036 ◽

2013 ◽

Vol 15 (1) ◽

pp. 1-6 ◽

Cited By ~ 14

Author(s):

Elaine Toomey ◽

Susan Coote

Keyword(s):

Multiple Sclerosis ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Berg Balance Scale ◽

Small Sample ◽

Walk Test ◽

Rater Reliability ◽

Balance Scale ◽

6 Minute Walk Test ◽

Minute Walk

This study investigated the between-rater reliability of the Berg Balance Scale (BBS), 6-Minute Walk test (6MW), and handheld dynamometry (HHD) in people with multiple sclerosis (MS). Previous studies that examined BBS and 6MW reliability in people with MS have not used more than two raters, or analyzed different mobility levels separately. The reliability of HHD has not been previously reported for people with MS. In this study, five physical therapists assessed eight people with MS using the BBS, 6MW, and HHD, resulting in 12 pairs of data. Data were analyzed using intraclass correlation coefficients (ICCs), Spearman correlation coefficients (SCCs), and Bland and Altman methods. The results suggest excellent agreement for the BBS (SCC = 0.95, mean difference between raters [d̄] = 2.08, standard error of measurement [SEM] = 1.77) and 6MW (ICC = 0.98, d̄ = 5.22 m, SEM = 24.76 m) when all mobility levels are analyzed together. Reliability is lower in less mobile people with MS (BBS SCC = 0.6, d̄ = −1.83; 6MW ICC = 0.95, d̄ = 20.04 m). Although the ICC and SCC results for HHD suggest good-to-excellent reliability (0.65–0.85), d̄ ranges up to 17.83 N, with SEM values as high as 40.95 N. While the small sample size is a limitation of this study, the preliminary evidence suggests strong agreement between raters for the BBS and 6MW and decreased agreement between raters for people with greater mobility problems. The mean differences between raters for HHD are probably too high for it to be applied in clinical practice.
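The mean difference between raters (d̄) quoted above comes from the Bland and Altman approach, which also yields 95% limits of agreement. A minimal sketch follows, assuming paired scores from two raters on the same people. The function name `bland_altman` and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def bland_altman(rater_a, rater_b):
    """Return (mean difference, lower limit, upper limit) for paired scores."""
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    d_bar = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return d_bar, d_bar - 1.96 * sd_diff, d_bar + 1.96 * sd_diff


# Hypothetical BBS totals from two raters scoring the same six people.
rater_a = [52, 45, 38, 50, 41, 30]
rater_b = [50, 46, 35, 49, 40, 27]

d_bar, lower, upper = bland_altman(rater_a, rater_b)
print(f"mean difference = {d_bar:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

A correlation coefficient can be high even when one rater scores consistently higher than the other, so the mean difference and limits of agreement are a useful complement to the ICC or SCC.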

Interrater and Intrarater Reliability of the Tuck Jump Assessment by Health Professionals of Varied Educational Backgrounds

Journal of Sports Medicine ◽

10.1155/2013/483503 ◽

2013 ◽

Vol 2013 ◽

pp. 1-5 ◽

Cited By ~ 9

Author(s):

Lisa A. Dudley ◽

Craig A. Smith ◽

Brandon K. Olson ◽

Nicole J. Chimera ◽

Brian Schmitz ◽

...

Keyword(s):

Health Professionals ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Clinical Implementation ◽

Intrarater Reliability ◽

Study Objective ◽

Intraclass Correlation Coefficients ◽

Educational Backgrounds ◽

And Training

Objective. The Tuck Jump Assessment (TJA), a clinical plyometric assessment, identifies 10 jumping and landing technique flaws. The study objective was to investigate TJA interrater and intrarater reliability with raters of different educational and clinical backgrounds. Methods. Forty participants were video recorded performing the TJA using the published protocol and instructions. Five raters of varied educational and clinical backgrounds scored the TJA. Each score of the 10 technique flaws was summed for the total TJA score. Approximately one month later, 3 raters scored the videos again. Intraclass correlation coefficients determined interrater (5 and 3 raters for the first and second sessions, respectively) and intrarater (3 raters) reliability. Results. Interrater reliability with 5 raters was poor (ICC = 0.47; 95% confidence interval (CI) 0.33–0.62). Interrater reliability between the 3 raters who completed 2 scoring sessions improved from 0.52 (95% CI 0.35–0.68) for session one to 0.69 (95% CI 0.55–0.81) for session two. Intrarater reliability was poor to moderate, ranging from 0.44 (95% CI 0.22–0.68) to 0.72 (95% CI 0.55–0.84). Conclusion. The published protocol and training of raters were insufficient to allow consistent TJA scoring. There may be a learning effect with the TJA since interrater reliability improved with repetition. TJA instructions and training should be modified and enhanced before clinical implementation.

Psychometric Properties of the Mini-Balance Evaluation Systems Test (Mini-BESTest) in Community-Dwelling Individuals With Chronic Stroke

Physical Therapy ◽

10.2522/ptj.20120454 ◽

2013 ◽

Vol 93 (8) ◽

pp. 1102-1115 ◽

Cited By ~ 104

Author(s):

Charlotte S.L. Tsang ◽

Lin-Rong Liao ◽

Raymond C.K. Chung ◽

Marco Y.C. Pang

Keyword(s):

Psychometric Properties ◽

Correlation Coefficient ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Positive Likelihood Ratio ◽

Berg Balance Scale ◽

Chronic Stroke ◽

Intrarater Reliability ◽

Evaluation Systems ◽

Stroke Group

Background The Mini-Balance Evaluation Systems Test (Mini-BESTest) is a new balance assessment, but its psychometric properties have not been specifically tested in individuals with stroke. Objectives The purpose of this study was to examine the reliability and validity of the Mini-BESTest and its accuracy in categorizing people with stroke based on fall history. Design An observational measurement study with a test-retest design was conducted. Methods One hundred six people with chronic stroke were recruited. Intrarater reliability was evaluated by repeating the Mini-BESTest within 10 days by the same rater. The Mini-BESTest was administered by 2 independent raters to establish interrater reliability. Validity was assessed by correlating Mini-BESTest scores with scores of other balance measures (Berg Balance Scale, one-leg-standing, Functional Reach Test, and Timed “Up & Go” Test) in the stroke group and by comparing Mini-BESTest scores between the stroke group and 48 control participants, and between fallers (≥1 falls in the previous 12 months, n=25) and nonfallers (n=81) in the stroke group. Results The Mini-BESTest had excellent internal consistency (Cronbach alpha=.89–.94), intrarater reliability (intraclass correlation coefficient [3,1]=.97), and interrater reliability (intraclass correlation coefficient [2,1]=.96). The minimal detectable change at 95% confidence interval was 3.0 points. The Mini-BESTest was strongly correlated with other balance measures. Significant differences in Mini-BESTest total scores were found between the stroke and control groups and between fallers and nonfallers in the stroke group. In terms of floor and ceiling effects, the Mini-BESTest was significantly less skewed than other balance measures, except for one-leg-standing on the nonparetic side. The Berg Balance Scale showed significantly better ability to identify fallers (positive likelihood ratio=2.6) than the Mini-BESTest (positive likelihood ratio=1.8). Limitations The results are generalizable only to people with mild to moderate chronic stroke. Conclusions The Mini-BESTest is a reliable and valid tool for evaluating balance in people with chronic stroke.
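A minimal detectable change at the 95% confidence level is commonly derived from the standard error of measurement (SEM) as MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − ICC). The sketch below illustrates that conventional calculation. It assumes this formula, which the abstract does not state, and the sample standard deviation is a hypothetical value, so the output is not a reproduction of the authors' analysis.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a sample SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at the 95% confidence level for a test-retest design."""
    return 1.96 * math.sqrt(2.0) * sem


# Hypothetical sample SD of 6.0 points; ICC of .97 as reported for intrarater reliability.
sem = sem_from_icc(sd=6.0, icc=0.97)
print(f"SEM = {sem:.2f} points, MDC95 = {mdc95(sem):.2f} points")

# Working backward under the same assumed formula: the SEM implied by an MDC95 of 3.0 points.
implied_sem = 3.0 / (1.96 * math.sqrt(2.0))
print(f"implied SEM = {implied_sem:.2f} points")
```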

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽

10.1123/jmld.2016-0007 ◽

2017 ◽

Vol 5 (1) ◽

pp. 59-68 ◽

Cited By ~ 16

Author(s):

Pauli Olavi Rintala ◽

Arja Kaarina Sääkslahti ◽

Susanna Iivonen

Keyword(s):

Motor Development ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Intrarater Reliability ◽

Gross Motor ◽

Gross Motor Development ◽

Percent Agreement ◽

Two Samples ◽

Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.

Influence of Rater Training on Inter- and Intrarater Reliability When Using the Rat Grimace Scale

Journal of the American Association for Laboratory Animal Science ◽

10.30802/aalas-jaalas-18-000044 ◽

2019 ◽

Vol 58 (2) ◽

pp. 178-183 ◽

Cited By ~ 8

Author(s):

Emily Q Zhang ◽

Vivian SY Leung ◽

Daniel SJ Pang

Keyword(s):

Acute Pain ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Training Group ◽

Intrarater Reliability ◽

Rater Training ◽

Trainee Group ◽

Pain Models ◽

Ongoing Pain ◽

And Performance

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in the 95% CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.

Could Residents Adequately Assess the Severity of Hidradenitis Suppurativa? Interrater and Intrarater Reliability Assessment of Major Scoring Systems

Dermatology ◽

10.1159/000501771 ◽

2019 ◽

Vol 236 (1) ◽

pp. 8-14 ◽

Cited By ~ 1

Author(s):

Katarzyna Włodarek ◽

Aleksandra Stefaniak ◽

Łukasz Matusiak ◽

Jacek C. Szepietowski

Keyword(s):

Interrater Reliability ◽

Hidradenitis Suppurativa ◽

Intraclass Correlation ◽

Scoring Systems ◽

Staging System ◽

Severity Index ◽

Assessment Tools ◽

Intrarater Reliability ◽

Global Assessment Scale ◽

Interrater Variability

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS), but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters for most of the evaluated instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4 and HSSI.
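The coefficient of variation used above for interrater variability is the standard deviation of the raters' scores for one patient expressed as a percentage of their mean. A minimal sketch follows. The function name and the scores are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the abstract does not say which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(scores):
    """Return the CV (%) of several raters' scores for a single patient."""
    return 100.0 * stdev(scores) / mean(scores)


# Hypothetical severity scores given to one patient by five raters.
scores = [10, 12, 11, 9, 13]
print(f"CV = {coefficient_of_variation(scores):.1f}%")
```

A lower CV indicates closer agreement among raters, but the statistic is undefined when the mean score is zero, so it suits scales whose scores stay well above zero.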

Balancing Access with Technology: Comparing In-Person and Telerehabilitation Berg Balance Scale Scores among Stroke Survivors

Physiotherapy Canada ◽

10.3138/ptc-2019-0095 ◽

2020 ◽

pp. e20190095

Author(s):

Dan Gillespie ◽

Crystal MacLellan ◽

Martin Ferguson-Pell ◽

Andrea Taeger ◽

Patricia J. Manns

Keyword(s):

Berg Balance Scale ◽

Stroke Survivors ◽

Balance Scale ◽

Scale Scores

Intrarater and Interrater Reliability of Infrared Image Analysis of Forearm Acupoints before and after Moxibustion

Evidence-based Complementary and Alternative Medicine ◽

10.1155/2020/6328756 ◽

2020 ◽

Vol 2020 ◽

pp. 1-8

Author(s):

Jiali Lou ◽

Yongliang Jiang ◽

Hantong Hu ◽

Xiaoyu Li ◽

Yajun Zhang ◽

...

Keyword(s):

Image Analysis ◽

Correlation Coefficient ◽

Temperature Change ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Infrared Image ◽

Infrared Images ◽

Intrarater Reliability ◽

Before And After

The objective of this study was to determine the intrarater and interrater reliabilities of infrared image analysis of forearm acupoints before and after moxibustion. In this work, infrared images of acupoints in the forearm of 20 volunteers (M/F, 10/10) were collected prior to and after moxibustion by infrared thermography (IRT). Two trained raters performed the analysis of infrared images in two different periods at a one-week interval. The intraclass correlation coefficient (ICC) was calculated to determine the intrarater and interrater reliabilities. With regard to the intrarater reliability, ICC values were between 0.758 and 0.994 (substantial to excellent). For the interrater reliability, ICC values ranged from 0.707 to 0.964 (moderate to excellent). Given that the intrarater and interrater reliability levels show excellent concordance, IRT could be a reliable tool to monitor the temperature change of forearm acupoints induced by moxibustion.
