Intrarater and Interrater Reliability of the Flexicurve Index, Flexicurve Angle, and Manual Inclinometer for the Measurement of Thoracic Kyphosis

Objective. This study aimed to describe the interrater and intrarater reliability of the flexicurve index, flexicurve angle, and manual inclinometer in swimmers. A secondary objective was to determine the level of agreement between the inclinometer angle and the flexicurve angle and to provide an equation to approximate one angle from the other. Methods. Thirty swimmers participated. Thoracic kyphosis was measured using the flexicurve and the manual inclinometer. Intraclass correlation coefficient, 95% confidence interval, and standard error of measurement were computed. Results. The flexicurve angle and index showed excellent intrarater (ICC = 0.94) and good interrater (ICC = 0.86) reliability. The inclinometer demonstrated excellent intrarater (ICC = 0.92) and interrater (ICC = 0.90) reliability. The flexicurve angle was systematically smaller and correlated poorly with the inclinometer angle (R² = 0.384). The following equations can be used for approximate conversions: flexicurve angle = (0.275 × inclinometer angle) + 8.478; inclinometer angle = (1.396 × flexicurve angle) + 8.694. Conclusion. The inclinometer and flexicurve are both reliable instruments for thoracic kyphosis measurement in swimmers. Although the flexicurve and inclinometer angles are not directly comparable, the approximate conversion factors provided will permit translation of flexicurve angle to inclinometer angle and vice versa.
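
The two conversion equations reported in this abstract can be sketched in Python as follows. The coefficients are taken from the abstract; the function names and the 40° example reading are hypothetical and only illustrate the arithmetic. The sketch also shows that the two equations are not inverses of each other, which is expected for a pair of separately fitted regression lines when the correlation is weak.

```python
# Coefficients as reported in the abstract; function names are illustrative only.
def inclinometer_to_flexicurve(inclinometer_deg: float) -> float:
    """Approximate flexicurve angle (degrees) from an inclinometer angle."""
    return 0.275 * inclinometer_deg + 8.478


def flexicurve_to_inclinometer(flexicurve_deg: float) -> float:
    """Approximate inclinometer angle (degrees) from a flexicurve angle."""
    return 1.396 * flexicurve_deg + 8.694


example_inclinometer = 40.0  # hypothetical reading, not a value from the study
approx_flexicurve = inclinometer_to_flexicurve(example_inclinometer)
round_trip = flexicurve_to_inclinometer(approx_flexicurve)

# For two ordinary least-squares lines fitted in opposite directions,
# the product of the slopes equals R squared.
slope_product = 0.275 * 1.396

print(f"flexicurve estimate: {approx_flexicurve:.1f} deg")
print(f"converted back:      {round_trip:.1f} deg (does not return 40)")
print(f"slope product:       {slope_product:.3f}")
```

The slope product comes out close to the reported R² of 0.384, which is consistent with the two equations being a pair of simple linear regressions. Converting a value and then converting it back therefore pulls it toward the sample mean, so each equation should be applied in one direction only.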

Intrarater and Interrater Reliability of First Metatarsophalangeal Joint Dorsiflexion

Journal of the American Podiatric Medical Association ◽

10.7547/1020290 ◽

2012 ◽

Vol 102 (4) ◽

pp. 290-298 ◽

Cited By ~ 12

Author(s):

Angela M. Jones ◽

Sarah A. Curran

Keyword(s):

Standard Error ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Metatarsophalangeal Joint ◽

Visual Estimation ◽

First Metatarsophalangeal Joint ◽

Real Size ◽

Continued Use ◽

Standard Error Of Measurement ◽

Error Of Measurement

Background: Visual estimation (VE) and goniometric measurement (GM) are commonly used to assess first metatarsophalangeal joint dorsiflexion. The purposes of this study were to determine the intrarater and interrater reliability of VE and GM and to establish whether reliability was influenced by the experience of the examiner. Methods: Ten experienced and ten inexperienced examiners evaluated three real-size photographs of a first metatarsophalangeal joint positioned in various degrees of dorsiflexion on two separate occasions. Results: Experienced examiners demonstrated excellent intrarater and interrater reliability for GM (intraclass correlation coefficient [ICC], >0.953; standard error of measurement [SEM], 1.8°–2.5°) compared with inexperienced examiners, who showed fair-to-good intrarater and interrater reliability (ICC, 0.322–0.597; SEM, 2.0°–3.0°). For VE, inexperienced examiners demonstrated fair-to-good interrater and excellent intrarater reliability (ICC, 0.666–0.808), which was higher than that of experienced examiners (ICC, 0.167–0.672). The SEM (2.8°–4.4°) was less varied than that of experienced examiners (SEM, 3.8°–6.4°) for VE, but neither group’s SEMs were clinically acceptable. Conclusions: Although minimal differences between intrarater and interrater reliability of GM and VE are noted, this study suggests that GM is more reliable than VE when used by experienced examiners. These findings support the continued use of GM for first metatarsophalangeal joint dorsiflexion assessment. (J Am Podiatr Med Assoc 102(4): 290–298, 2012)

Feasibility, Reliability, and Agreement of the WeeFIM Instrument in Dutch Children With Burns

Physical Therapy ◽

10.2522/ptj.20110419 ◽

2012 ◽

Vol 92 (7) ◽

pp. 958-966 ◽

Cited By ~ 3

Author(s):

Anuschka S. Niemeijer ◽

Heleen A. Reinders-Messelink ◽

Laurien M. Disseldorp ◽

Marianne K. Nieuwenhuis

Keyword(s):

Intraclass Correlation ◽

Correlation Coefficients ◽

Functional Independence ◽

Intrarater Reliability ◽

Burn Centers ◽

Intraclass Correlation Coefficients ◽

Functional Consequences ◽

Standard Error Of Measurement ◽

Dutch Children ◽

Error Of Measurement

Background: Burns occur frequently in young children. To date, insufficient data are available to fully describe the functional consequences of burns. In different patient populations and countries, the WeeFIM instrument (“WeeFIM”) often is used to measure functional independence in children. Objective: The purpose of this study was to examine the psychometric properties of the WeeFIM instrument for use in Dutch burn centers. Design: This was an observational study. Methods: The WeeFIM instrument was translated into Dutch. All clinicians who rated the children with the instrument passed the WeeFIM credentialing examination. They scored consecutive children (n=134) aged 6 months to 16 years admitted to Dutch burn centers with acute burns during a 1-year period at 2 to 3 weeks, 3 months, and 6 months postburn. To examine reliability, 2 raters scored a child at the same time (n=52, 9 raters) or the same rater scored a child twice within 1 week (n=7, 3 raters). Results: After a few weeks, the WeeFIM assessment could be administered in less than 15 minutes. Clinicians thought it was difficult to rate a child aged between 2 and 4 years as well as the cognitive items. Nevertheless, reliability was good (all intraclass correlation coefficients [1,1] were above .80). The standard error of measurement was 3.7. Limitations: Intrarater reliability was based on only 7 test-retest measurements. Within our clinical setting, it turned out to be difficult to schedule the same rater and patient twice in one week for repeated assessments. Assessments for interrater reliability, on the other hand, worked out well. Conclusions: The WeeFIM instrument is a feasible and reliable instrument for use in children with burns. For evaluation of a child's individual progress, at least 11 points' improvement should be observed to state that a child has significantly improved.
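
The abstract does not state how the 11-point threshold was derived from the standard error of measurement of 3.7. A minimal sketch, assuming the commonly used 95% minimal detectable change formula (1.96 × √2 × SEM) and rounding up to the next whole point, reproduces that figure. The function name is hypothetical, and the formula is an assumption rather than the authors' stated method.

```python
import math


def minimal_detectable_change(sem: float, z: float = 1.96) -> float:
    """Assumed MDC formula: z * sqrt(2) * SEM.

    The sqrt(2) term reflects that a change score combines the
    measurement error of two separate assessments.
    """
    return z * math.sqrt(2) * sem


reported_sem = 3.7  # value from the abstract
mdc = minimal_detectable_change(reported_sem)
threshold = math.ceil(mdc)  # smallest whole-point change exceeding the MDC

print(f"MDC95 is about {mdc:.2f} points; whole-point threshold: {threshold}")
```

Under this assumption the result is a little over 10 points, so a change of 11 whole points is the first that exceeds it.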

Reliability and Validity of a 1-Person Technique to Measure Humeral Torsion Using Ultrasound

Journal of Athletic Training ◽

10.4085/1062-6050-213-17 ◽

2018 ◽

Vol 53 (6) ◽

pp. 590-596

Author(s):

Daniel C. Hannah ◽

Jason S. Scibek ◽

Christopher R. Carcia ◽

Amy L. Phelps

Keyword(s):

Injury Risk ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Intrarater Reliability ◽

Convenience Sample ◽

Humeral Torsion ◽

Standard Error Of Measurement ◽

Bilateral Difference ◽

Strong Linear Relationship ◽

Error Of Measurement

Context: Knowledge of the bilateral difference in humeral torsion (HT) enables clinicians to implement appropriate interventions for soft tissue restrictions of the shoulder to restore rotational motion and reduce injury risk. Whereas the current ultrasound method for measuring HT requires 2 assessors, a more efficient 1-person technique (1PT) may be of value. Objective: To determine if a 1PT is a reliable and valid alternative to the established 2-person technique (2PT) for indirectly measuring HT using ultrasound. Design: Descriptive laboratory study. Setting: Biomechanics laboratory. Patients or Other Participants: A convenience sample of 16 volunteers (7 men, 9 women; age = 26.9 ± 6.8 years, height = 172.2 ± 10.7 cm, mass = 80.0 ± 13.3 kg). Main Outcome Measure(s): We collected the HT data using both the 1PT and 2PT from a total of 30 upper extremities (16 left, 14 right). Within-session intrarater reliability (intraclass correlation coefficient; ICC [3,1]) and standard error of measurement (SEM) were assessed for both techniques. Simple linear regression and Bland-Altman analysis were used to examine the validity of the 1PT when compared with the established 2PT. Results: The 1PT (ICC [3,1] = 0.992, SEM = 0.8°) and 2PT (ICC [3,1] = 0.979, SEM = 1.1°) demonstrated excellent within-session intrarater reliability. A strong linear relationship was demonstrated between the HT measurements collected with both techniques (r = 0.963, r² = 0.928, F(1,28) = 361.753, P < .001). A bias of −1.2° ± 2.6° was revealed, and the 95% limits of agreement indicated the 2 techniques can be expected to vary from −6.3° to 3.8°. Conclusions: The 1PT for measuring HT using ultrasound was a reliable and valid alternative to the 2PT. By reducing the number of testers involved, the 1PT may provide clinicians with a more efficient and practical means of obtaining these valuable clinical data.
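
The 95% limits of agreement in a Bland-Altman analysis are conventionally computed as the bias plus or minus 1.96 times the standard deviation of the paired differences. A minimal sketch, assuming the reported ±2.6° is that standard deviation (the abstract does not say so explicitly), is shown below. The function name is hypothetical.

```python
def limits_of_agreement(bias: float, sd_diff: float, z: float = 1.96):
    """Return (lower, upper) 95% limits of agreement: bias -/+ z * SD."""
    half_width = z * sd_diff
    return bias - half_width, bias + half_width


# Reported bias and, by assumption, the SD of the paired differences (degrees).
lower, upper = limits_of_agreement(bias=-1.2, sd_diff=2.6)

print(f"limits of agreement: {lower:.1f} to {upper:.1f} degrees")
```

With the rounded inputs this gives roughly −6.3° to 3.9°. The upper limit differs slightly from the reported 3.8°, which is plausibly because the published limits were computed from unrounded values.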

Reliability of the Active-Knee-Extension and Straight-Leg-Raise Tests in Subjects With Flexibility Deficits

Journal of Sport Rehabilitation ◽

10.1123/jsr.2014-0220 ◽

2015 ◽

Vol 24 (4) ◽

Cited By ~ 15

Author(s):

Tiago Neto ◽

Lia Jacobsohn ◽

Ana I. Carita ◽

Raul Oliveira

Keyword(s):

Intraclass Correlation ◽

Relevant Information ◽

Knee Extension ◽

Lower Limbs ◽

Intrarater Reliability ◽

Test Retest Reliability ◽

Straight Leg Raise ◽

Academic Laboratory ◽

Standard Error Of Measurement ◽

Error Of Measurement

Context: The active-knee-extension test (AKE) and the straight-leg-raise test (SLR) are widely used for flexibility assessment. A number of investigations have tested the reliability of these measures, especially the AKE. However, in most studies, the sample involved subjects with normal flexibility. In addition, few studies have determined the standard error of measurement (SEM) and minimal detectable difference (MDD), which can provide complementary and more clinically relevant information than the intraclass correlation coefficient (ICC) alone. Objectives: This study aimed to determine the AKE and SLR intrarater (test-retest) reliability in subjects with flexibility deficits, as well as the correlation between the 2 tests. Design: Reliability study. Setting: Academic laboratory. Subjects: 102 recreationally active participants (48 male, 54 female) with no injury to the lower limbs and with flexibility deficits in the hamstrings muscle group. Main Outcomes: Intrarater reliability was determined using the ICC, complemented by the SEM and MDD. Measures: All participants performed, in each lower limb, 2 trials of the AKE and the SLR. Results: The ICC values found for AKE and SLR tests were, respectively, .87-.94 and .93-.97. The values for SEM were low for both tests (2.6-2.9° for AKE, 2.2-2.6° for SLR), as well as the calculated MDD (7-8° for AKE, 6-7° for SLR). A moderate to strong, and significant, correlation between AKE and SLR was determined for the dominant limb (r = .71) and the nondominant limb (r = .67). Conclusions: These findings suggest that both AKE and SLR have excellent intrarater reliability. The SEMs and MDDs recorded are also very encouraging for the use of these tests in subjects with flexibility deficits.

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽

10.1123/jmld.2016-0007 ◽

2017 ◽

Vol 5 (1) ◽

pp. 59-68 ◽

Cited By ~ 16

Author(s):

Pauli Olavi Rintala ◽

Arja Kaarina Sääkslahti ◽

Susanna Iivonen

Keyword(s):

Motor Development ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Intrarater Reliability ◽

Gross Motor ◽

Gross Motor Development ◽

Percent Agreement ◽

Two Samples ◽

Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged between 3 and 9 years, divided into three separate samples of 20. Two samples of 20 were used to examine the intrarater reliability of two different assessors, and the third sample of 20 was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and later assessed using an intraclass correlation coefficient, a kappa statistic, and a percent agreement calculation. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87 to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement of 83% was observed for locomotor skills, ball skills, and total skills, respectively. Hop, horizontal jump, and two-hand strike assessments showed the most difference between the assessors. These results show acceptable reliability for the TGMD-3 to analyze children’s gross motor skills.
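
The abstract reports percent agreement alongside a kappa statistic without giving formulas. A minimal sketch of both, assuming Cohen's kappa for two raters and using invented pass/fail scores (not study data), shows why percent agreement typically looks higher than a chance-corrected statistic. All names and values in the example are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Percentage of items on which the two raters gave the same score."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal score frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented scores for ten performance criteria (1 = criterion met, 0 = not met).
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)

print(f"percent agreement: {agreement:.0f}%")
print(f"Cohen's kappa:     {kappa:.2f}")
```

In this invented example the raters agree on 8 of 10 items, yet kappa is only about 0.52 because both raters score "met" most of the time and much of the agreement could occur by chance.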

Reliability of an isometric test for measuring the strength of the hip abductors and adductors

Bioscience Journal ◽

10.14393/bj-v36n3a2020-42693 ◽

2020 ◽

Vol 36 (3) ◽

Author(s):

Manoella Regina de Souza Silva ◽

Thiago Keitiney Souza Teixeira Da Silva ◽

Valdemi Xavier Delmondes Junior ◽

Silvia Ribeiro Santos Araújo ◽

Adriano Percival Calderaro Calvo ◽

...

Keyword(s):

Intraclass Correlation ◽

Complex Structures ◽

Hip Joints ◽

Physically Active ◽

Adductor Muscles ◽

Hip Abduction ◽

Standard Error Of Measurement ◽

Dynamic Variables ◽

Error Of Measurement ◽

Familiarization Procedure

This study aimed to measure the reliability of a test for measuring the strength and strength imbalance of the hip abductors and adductors, using isokinetic equipment adapted for isometric testing. Thirteen healthy, physically active male individuals took part in the research. Two unilateral isometric tests were undertaken using a load cell attached to an adapted abductor bench machine: a hip abduction test and a hip adduction test. Tests consisted of two maximum voluntary isometric contractions held for six seconds with a break of one minute between them. The following dynamic variables were measured: maximum force, mean force, rate of force development for each limb (right and left), and the existence of asymmetries between the limbs. For statistical analysis, the t-test, intraclass correlation coefficient (ICC), and standard error of measurement (SEM) were applied. Results: The methodology used to evaluate the hip abductors and adductors did not show reliability in most of the parameters researched, with ICC values that were insufficient or low and retest performance higher than test performance (p < 0.05). The applied test was not reliable for assessing strength and strength imbalances of hip abductors and adductors in most of the parameters investigated. These results indicate that the hip joints, and more precisely the abductor and adductor muscles, are complex structures to assess. Participants need to be familiarized with the proposed exercise beforehand, as these movements are not performed habitually. It is recommended that new tests for measuring hip abduction and adduction strength include a prior familiarization procedure.

Assessment of Knee Laxity Following Anterior Cruciate Ligament Reconstruction

Journal of Sport Rehabilitation ◽

10.1123/jsr.2.2.97 ◽

1993 ◽

Vol 2 (2) ◽

pp. 97-103 ◽

Cited By ~ 6

Author(s):

Kelly R. Holcomb ◽

Cheryl A. Skaggs ◽

Teddy W. Worrell ◽

Mark DeCarlo ◽

K. Donald Shelbourne

Keyword(s):

Anterior Cruciate Ligament ◽

Standard Error ◽

Cruciate Ligament ◽

Intraclass Correlation ◽

Knee Laxity ◽

Testing Procedures ◽

Kt 1000 ◽

Anterior Cruciate ◽

Standard Error Of Measurement ◽

Error Of Measurement

A paucity of information exists concerning reliability of the KT-1000 knee arthrometer (MEDmetric Corp., San Diego, CA) when used by different clinicians to assess the same anterior cruciate ligament-deficient patient. The purpose of this study was to determine the reliability and standard error of measurement of four clinicians who routinely report KT-1000 arthrometer values to referring orthopedic surgeons. Two physical therapists and two athletic trainers performed anterior laxity tests using the KT-1000 on 19 subjects. Intraclass correlation coefficients (ICC) and standard error of measurement (SEM) were used to determine reliability. Intratester ICC ranged from .98 to 1.0 and intratester SEM ranged from 0.0 to .28 mm. Intertester ICC and SEM for all four testers were .53 and 1.2 mm, respectively. A 95% confidence interval (M ± 1.96 × SEM) of the intertester variability ranged from −0.18 to 4.52 mm. Therefore, large intertester variation existed in KT-1000 values. Each facility should standardize testing procedures and establish intratester and intertester reliability for all clinicians reporting KT-1000 values.
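
The interval formula quoted in this abstract, M ± 1.96 × SEM, can be checked against the reported numbers. The abstract does not give the mean M, so the sketch below backs it out as the midpoint of the reported interval; that implied mean is an inference, not a published value, and the function name is hypothetical.

```python
def ci_from_sem(mean: float, sem: float, z: float = 1.96):
    """Return (lower, upper) for the interval mean -/+ z * SEM."""
    margin = z * sem
    return mean - margin, mean + margin


reported_lower, reported_upper = -0.18, 4.52  # mm, from the abstract
intertester_sem = 1.2                          # mm, from the abstract

# Midpoint of the reported interval; an inferred value, not stated in the source.
implied_mean = (reported_lower + reported_upper) / 2

lower, upper = ci_from_sem(implied_mean, intertester_sem)

print(f"implied mean: {implied_mean:.2f} mm")
print(f"recomputed interval: {lower:.2f} to {upper:.2f} mm")
```

The recomputed interval matches the reported one to two decimal places, so the published range is consistent with the intertester SEM of 1.2 mm and a mean of about 2.17 mm.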

Influence of Rater Training on Inter- and Intrarater Reliability When Using the Rat Grimace Scale

Journal of the American Association for Laboratory Animal Science ◽

10.30802/aalas-jaalas-18-000044 ◽

2019 ◽

Vol 58 (2) ◽

pp. 178-183 ◽

Cited By ~ 8

Author(s):

Emily Q Zhang ◽

Vivian SY Leung ◽

Daniel SJ Pang

Keyword(s):

Acute Pain ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Training Group ◽

Intrarater Reliability ◽

Rater Training ◽

Trainee Group ◽

Pain Models ◽

Ongoing Pain ◽

And Performance

Rodent grimace scales facilitate assessment of ongoing pain. Reported rater training using these scales varies considerably and may contribute to the observed variability in interrater reliability. This study evaluated the effect of training on interrater reliability with the Rat Grimace Scale (RGS). Two training sets (42 and 150 images) were prepared from acute pain models. Four trainee raters progressed through 2 rounds of training, scoring 42 images (set 1) followed by 150 images (set 2a). After each round, trainees reviewed the RGS and any problematic images with an experienced rater. The 150 images were then rescored (set 2b). Four years later, trainees rescored the 150 images (set 2c). A second group of raters (no-training group) scored the same image sets without review with the experienced rater. Inter- and intrarater reliability were evaluated by using the intraclass correlation coefficient (ICC), and ICC values were compared by using the Feldt test. In the trainee group, interrater reliability increased from moderate to very good between sets 1 and 2b and increased between sets 2a and 2b. Action units with the highest and lowest ICC at set 2b were orbital tightening and whiskers, respectively. In comparison to an experienced rater, the ICC for all trainees improved, ranging from 0.88 to 0.91 at set 2b. Four years later, very good interrater reliability was retained, and intrarater reliability was good or very good. The interrater reliability of the no-training group was moderate and did not improve from set 1 to set 2b. Training improved interrater reliability, with an associated reduction in 95% CI. In addition, training improved interrater reliability with an experienced rater, and performance was retained.

Could Residents Adequately Assess the Severity of Hidradenitis Suppurativa? Interrater and Intrarater Reliability Assessment of Major Scoring Systems

Dermatology ◽

10.1159/000501771 ◽

2019 ◽

Vol 236 (1) ◽

pp. 8-14 ◽

Cited By ~ 1

Author(s):

Katarzyna Włodarek ◽

Aleksandra Stefaniak ◽

Łukasz Matusiak ◽

Jacek C. Szepietowski

Keyword(s):

Interrater Reliability ◽

Hidradenitis Suppurativa ◽

Intraclass Correlation ◽

Scoring Systems ◽

Staging System ◽

Severity Index ◽

Assessment Tools ◽

Intrarater Reliability ◽

Global Assessment Scale ◽

Interrater Variability

A wide variety of assessment tools have been proposed for hidradenitis suppurativa (HS) until now, but none of them meets the criteria for an ideal score. Because there is no gold standard scoring system, the choice of measurement instrument depends on the purpose of use and even on the physician’s experience with HS. The aim of this study was to assess the intrarater and interrater reliability of 6 scoring systems commonly used for grading severity of HS: the Hurley Staging System, the Refined Hurley Staging, the International Hidradenitis Suppurativa Severity Score System (IHS4), the Hidradenitis Suppurativa Severity Index (HSSI), the Sartorius Hidradenitis Suppurativa Score, and the Hidradenitis Suppurativa Physician’s Global Assessment Scale (HS-PGA). On the scoring day, 9 HS patients underwent a physical examination and disease severity assessment by a group of 16 dermatology residents using all evaluated instruments. Then, intrarater reliability was calculated using the intraclass correlation coefficient (ICC), and interrater variability was evaluated using the coefficient of variation (CV). In all 6 scorings the ICCs were >0.75, indicating high intrarater reliability of all presented scales. The study also demonstrated moderate agreement between raters in most of the evaluated measurement instruments. The most reproducible methods, according to CVs, seem to be the Hurley staging, IHS4, and HSSI. None of the 6 evaluated scoring systems showed a significant advantage over the others when comparing ICCs, and all the instruments seem to be very reliable methods. The interrater reliability was usually good, but the most repeatable results between researchers were obtained for the easiest scales, including Hurley scoring, IHS4, and HSSI.

Test–Retest Reliability of Skill Tests in the F-MARC Battery for Youth Soccer Players

Perceptual and Motor Skills ◽

10.1177/0031512519866038 ◽

2019 ◽

Vol 126 (5) ◽

pp. 1006-1023 ◽

Cited By ~ 1

Author(s):

Alexis Padrón-Cabo ◽

Ezequiel Rey ◽

Alexandra Pérez-Ferreirós ◽

Anton Kalén

Keyword(s):

Intraclass Correlation ◽

Young Male ◽

Soccer Players ◽

Limits Of Agreement ◽

Retest Reliability ◽

Youth Soccer ◽

Test Retest Reliability ◽

Standard Error Of Measurement ◽

Absolute Reliability ◽

Error Of Measurement

This study aimed to evaluate the test–retest reliability of soccer skill tests belonging to the F-MARC test battery. To avoid bias during talent identification and development, coaches and scouts should be using reliable tests for assessing soccer-specific skills in young male players. Fifty-two U-14 outfield male soccer players performed F-MARC soccer skill tests on two occasions, separated by 7 days. After familiarization, we administered two trial sessions of five skill tests: speed dribbling, juggling, shooting, passing, and heading. We assessed absolute reliability by expressing the standard error of measurement as a coefficient of variation with 95% limits of agreement, and we assessed relative reliability with the intraclass correlation coefficient and with Pearson’s correlation (r). The results demonstrated satisfactory relative and absolute reliability for speed dribbling, right foot juggling, short passing, shooting a dead ball right, shooting from a pass, heading in front, and heading right. However, reliability values for left foot juggling, chest-head-foot juggling, head-left-foot-right foot-chest-head juggling, long pass, and shooting a dead ball left tests were not strong enough to suggest their usage by coaches in training or sport scientists in research.
