Interrater and Intrarater Reliability in the Measurement of Ankle Joint Dorsiflexion is Independent of Examiner Experience and Technique Used

Background: Goniometric measurement is currently used as a diagnostic and outcomes assessment tool for ankle joint dorsiflexion. Despite its common use, its interrater and intrarater reliability have been questioned. Methods: This prospective study examined whether examiner experience or measurement technique affects the interrater and intrarater reliability of ankle joint dorsiflexion measurement. Fourteen asymptomatic individuals (8 male and 6 female) with a mean age of 28.2 years (range, 23–52) were enrolled. The five examiners had a mean of 10.4 years of clinical experience (range, 0–26). Four examiners used a modified Root, Weed, and Orien method of measuring ankle joint dorsiflexion; the fifth examiner used a nonstandardized technique. A standard goniometer was used for bilateral measurements of ankle joint dorsiflexion with the knee extended and flexed. Each examiner repeated each measurement three times in each of three sessions, with sessions spaced at least 1 week apart. Results: Intraclass correlation coefficients revealed moderate intrarater and poor interrater reliability for ankle joint dorsiflexion measured with a standard goniometer. More importantly, further analysis indicated that neither the use of a standardized technique nor years of clinical experience increased intrarater or interrater reliability. Conclusions: The utility of goniometric measurement of ankle joint dorsiflexion may be limited. (J Am Podiatr Med Assoc 101(5): 407–414, 2011)
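
The abstract reports intraclass correlation coefficients without naming the ICC form. As an illustration only, the following Python sketch computes ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single rater) from a complete matrix whose rows are subjects and whose columns are examiners. The choice of this form and the function name `icc_2_1` are assumptions, not the authors' stated method.

```python
from statistics import fmean


def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the measurement of subject i by examiner j
    (n subjects x k examiners, no missing values). A matrix with no
    variance at all would make the denominator zero and is not handled.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(ratings[i][j] for i in range(n)) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between examiners
    ss_err = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

For example, `icc_2_1([[1, 2], [2, 3], [3, 4]])` evaluates to 2/3: the two examiners rank the subjects identically, but the constant 1-unit offset between them counts against absolute agreement.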

Reliability assessment of the Biffl Scale for blunt traumatic cerebrovascular injury as detected on computer tomography angiography

Journal of Neurosurgery ◽

10.3171/2016.7.jns16849 ◽

2017 ◽

Vol 127 (1) ◽

pp. 32-35 ◽

Cited By ~ 15

Author(s):

Paul M. Foreman ◽

Christoph J. Griessenauer ◽

Kimberly P. Kicielinski ◽

Philip G. R. Schmalz ◽

Brandon G. Rocque ◽

...

Keyword(s):

Interrater Reliability ◽

High Energy ◽

Digital Subtraction ◽

Intrarater Reliability ◽

Substantial Agreement ◽

Cerebrovascular Injury ◽

Intraclass Correlation ◽

Grade Assignment ◽

Computer Tomography Angiography ◽

Structural Injury

OBJECTIVE Blunt traumatic cerebrovascular injury (TCVI) represents structural injury to a vessel due to high-energy trauma. The Biffl Scale is a widely accepted grading scheme for these injuries that was developed using digital subtraction angiography. In recent years, screening CT angiography (CTA) has been used to identify patients with TCVI. The reliability of this scale for injuries assessed with CTA has not yet been determined. METHODS Seven independent raters, including 2 neurosurgeons, 2 neuroradiologists, 2 neurosurgical residents, and 1 neurosurgical vascular fellow, independently reviewed the presenting CTA of the neck for each of 40 patients with confirmed TCVI and assigned a Biffl grade. Ten images were repeated to assess intrarater reliability, for a total of 50 CTAs. Fleiss' multirater kappa (κ) and the intraclass correlation were calculated as measures of interrater reliability. Weighted Cohen's κ was used to assess intrarater reliability. RESULTS Fleiss' multirater κ was 0.65 (95% CI 0.61–0.69), indicating substantial agreement in Biffl grade assignment among the 7 raters. The intraclass correlation was 0.82, demonstrating excellent agreement among the raters. Intrarater reliability was perfect (weighted Cohen's κ = 1) for 2 raters and near perfect (weighted Cohen's κ > 0.8) for the remaining 5 raters. CONCLUSIONS Grading of TCVI on CTA using the Biffl Scale is reliable.
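
Weighted Cohen's κ gives partial credit for near-misses on an ordinal scale such as the Biffl grades. The abstract does not state the weighting scheme, so the minimal Python sketch below assumes linear disagreement weights by default and offers quadratic weights as an option. The helper name `weighted_kappa` is hypothetical; a rater's first and second readings of the same CTAs would be passed as `r1` and `r2`.

```python
def weighted_kappa(r1, r2, categories, weighting="linear"):
    """Weighted Cohen's kappa for two paired sets of ordinal ratings.

    r1, r2: paired ratings (e.g. one rater's first and second reading).
    categories: every possible grade, in ordinal order.
    weighting: "linear" or "quadratic" disagreement weights (assumption).
    Undefined (division by zero) if both readings use a single grade only.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions: obs[i][j] = share of items graded i, then j.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[pos[a]][pos[b]] += 1.0 / n

    # Marginal proportions of each grade in each reading.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for the extreme grades
        return d if weighting == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * obs[i][j] for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    return 1.0 - observed / expected
```

Identical readings yield κ = 1 under either weighting, which is the "perfect" intrarater result reported above for 2 raters.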

Development and Psychometric Testing of the Mealtime Engagement Scale in Direct Care Providers of Nursing Home Residents With Dementia

The Gerontologist ◽

10.1093/geront/gnaa097 ◽

2020 ◽

Cited By ~ 1

Author(s):

Wen Liu ◽

Melissa Batchelor ◽

Kristine Williams

Keyword(s):

Nursing Home ◽

Internal Consistency ◽

Content Validity ◽

Convergent Validity ◽

Interrater Reliability ◽

Eating Quality ◽

Intrarater Reliability ◽

Intraclass Correlation ◽

Psychometric Evidence ◽

Engagement Scale

Background and Objectives: Mealtime engagement is defined as verbal and nonverbal assistance provided by caregivers to guide and motivate care recipients in eating. Quality mealtime engagement is critical to improving mealtime difficulties and intake among older adults with dementia who require eating assistance. Few feasible and valid tools exist to measure mealtime engagement. This study developed and tested the Mealtime Engagement Scale (MES). Research Design and Methods: Items were developed based on literature review and expert review and were finalized based on content validity and corrected item-total correlation. A secondary analysis was conducted of 87 videotaped observations capturing 18 nursing home staff providing mealtime care to residents with dementia. Internal consistency, interrater reliability, and intrarater reliability were assessed. Concurrent and convergent validity were examined through correlation (rs) with the Relational Behavior Scale (RBS) and the Mealtime Relational Care Checklist (M-RCC), respectively. Results: The 18-item MES was developed with adequate content validity (Scale-content validity index [CVI] = 1.00; Scale-CVI/Average = 0.962–0.987). Each item is scored from 0 (never) to 3 (always), so the total scale score ranges from 0 to 54. Higher scores indicate greater mealtime engagement. The MES had very good internal consistency (Cronbach’s α = 0.837), outstanding interrater reliability (intraclass correlation = 0.920), outstanding intrarater reliability (intraclass correlation = 0.956), adequate concurrent validity based on strong correlation with the RBS (rs = 0.821, p < .001), and fair convergent validity based on weak correlation with the M-RCC (rs = 0.219, p = .042). Discussion and Implications: The findings provide preliminary psychometric evidence for using the MES to measure mealtime engagement. Further testing is needed in larger and more diverse samples across different care settings to accumulate psychometric evidence.
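
Cronbach's α, reported above for internal consistency, compares the sum of the individual item variances with the variance of the total score. The following is a minimal Python sketch, assuming a complete matrix with one row per observation and one column per item (for the MES, 18 columns scored 0 to 3). The function name `cronbach_alpha` is hypothetical and not part of the MES materials.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for an observations x items score matrix.

    scores[i][j] is the score of observation i on item j; no missing values.
    Sample variances are used throughout, so the n - 1 factors cancel.
    """
    k = len(scores[0]) if scores else 0
    if len(scores) < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 rows and >= 2 items")

    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])  # variance of scale totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When every item moves in lockstep, the total-score variance is as large as it can be relative to the item variances, and α reaches 1.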

The Reliability of a Smartphone Goniometer Application Compared With a Traditional Goniometer for Measuring Ankle Joint Range of Motion

Journal of the American Podiatric Medical Association ◽

10.7547/16-128 ◽

2019 ◽

Vol 109 (1) ◽

pp. 22-29 ◽

Cited By ~ 3

Author(s):

Motaz Abdalla Alawna ◽

Bayram H. Unver ◽

Ertugrul O. Yuksel

Keyword(s):

Range Of Motion ◽

Ankle Joint ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Ankle Dorsiflexion ◽

Data Sets ◽

Intrarater Reliability ◽

New Device ◽

Advantages And Disadvantages

Background: Evaluation of range of motion (ROM) is integral to assessment of the musculoskeletal system, is required in health, fitness, and pathologic conditions, and is used as an objective outcome measure. Several methods for measuring ROM have been described, each with advantages and disadvantages. This study therefore introduces a new device, a smartphone goniometer, for measuring ankle joint ROM. Objective: To test the reliability of smartphone goniometry at the ankle joint by comparing it with the universal goniometer (UG), and to assess the interrater and intrarater reliability of the smartphone goniometer record (SGR) application. Methods: Fifty-eight healthy volunteers (29 men and 29 women aged 18–30 years) underwent SGR and UG measurement of ankle joint dorsiflexion and plantarflexion. Two examiners measured ankle joint ROM. Descriptive statistics were calculated for descriptive and anthropometric variables, as were intraclass correlation coefficients (ICCs). Results: There were 58 usable data sets. For ankle dorsiflexion ROM, both instruments showed excellent interrater reliability: UG (ICC = 0.87) and SGR (ICC = 0.89). Intrarater reliability for ankle dorsiflexion was excellent with both instruments: UG and SGR (mean ICC = 0.91). For ankle plantarflexion, both instruments showed excellent interrater reliability: UG (ICC = 0.76) and SGR (ICC = 0.82). Intrarater reliability for ankle plantarflexion was excellent with both instruments: UG (mean ICC = 0.85) and SGR (mean ICC = 0.82). Conclusions: Smartphone-based goniometers can be used to assess active ROM of the ankle joint because they can achieve a high degree of intrarater and interrater reliability.

The Design, Development, and Reliability Testing of a New Innovative Device to Measure Ankle Joint Dorsiflexion

Journal of the American Podiatric Medical Association ◽

10.7547/14-051 ◽

2016 ◽

Vol 106 (5) ◽

pp. 338-343

Author(s):

James Charles

Keyword(s):

Confidence Interval ◽

Ankle Joint ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Design Development ◽

Single Measure ◽

Ankle Joint Dorsiflexion ◽

Left And Right ◽

Treatment Table ◽

Average Measure

Background: In clinical and research settings, ankle joint dorsiflexion needs to be measured reliably. Dorsiflexion is often measured by goniometry, but the intrarater and interrater reliability of this technique have been reported to be poor. Many devices to measure dorsiflexion have been developed for clinical and research use, but an evaluation of 12 current tools showed that none met all of the desirable criteria. The purpose of this study was to design and develop a device that rates highly on all of these criteria and can be shown to be highly reliable. Methods: With each of 14 participants supine on a treatment table, a foot was placed in the Charles device, and ankle joint dorsiflexion was measured and recorded three times with a digital inclinometer. The mean of the three readings was taken as the ankle joint dorsiflexion. Results: Reliability was analyzed using the intraclass correlation coefficient (ICC). There was very little difference in single- or average-measure ICCs between left and right feet, so the data were pooled (N = 28). The single-measure ICC was 0.998 (95% confidence interval, 0.996–0.998). The average-measure ICC was 0.998 (95% confidence interval, 0.995–0.999). Limits of agreement for the average measure were also very good: −1.30° to 1.65°. Conclusions: The Charles device meets all of the desirable criteria and has many innovative features, increasing its appropriateness for clinical and research applications. It has a suitable design for measuring dorsiflexion and high intrarater and interrater reliability.
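
Limits of agreement are commonly computed with the Bland–Altman method: the mean difference between paired readings, plus or minus 1.96 standard deviations of those differences. The abstract does not state exactly which readings were paired, so the Python sketch below simply assumes two lists of paired dorsiflexion angles in degrees (for example, two trials on the same feet). The helper name `limits_of_agreement` is hypothetical.

```python
from statistics import fmean, stdev


def limits_of_agreement(a, b, z=1.96):
    """Approximate 95% Bland-Altman limits of agreement for paired readings.

    a, b: paired measurements of the same feet (same units, e.g. degrees).
    Returns (lower, upper) = mean difference -/+ z * SD of the differences.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two pairs of measurements")
    diffs = [x - y for x, y in zip(a, b)]
    bias = fmean(diffs)           # systematic difference between readings
    spread = z * stdev(diffs)     # sample SD of the differences
    return bias - spread, bias + spread
```

A nonzero mean difference shifts both limits in the same direction, which is why reported limits such as −1.30° to 1.65° need not be symmetric about zero.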

Measurement of Foot Dorsiflexion

Journal of the American Podiatric Medical Association ◽

10.7547/0940573 ◽

2004 ◽

Vol 94 (6) ◽

pp. 573-577 ◽

Cited By ~ 14

Author(s):

Rolf Scharfbillig ◽

Sheila D. Scutter

Keyword(s):

Correlation Coefficient ◽

Ankle Joint ◽

Intraclass Correlation Coefficient ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Adolescent Population ◽

Ankle Joint Dorsiflexion

The Lidcombe template was introduced in 1991 for the nonweightbearing assessment of ankle joint dorsiflexion, and it has shown excellent reliability in impaired and unimpaired adult populations. We discuss limitations of the original template and test the reliability of a modified apparatus in an adolescent population. Intrarater and interrater reliability were assessed for 14 children (28 limbs) aged 7 to 14 years, returning intraclass correlation coefficient (1,1) results of greater than 0.99 for both aspects of reliability. (J Am Podiatr Med Assoc 94(6): 573–577, 2004)
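
The ICC(1,1) cited here is, in Shrout and Fleiss's notation, the one-way random-effects, single-measure coefficient: every reading of a limb is treated as interchangeable, and any systematic rater or session effect is pooled into the error term. A minimal Python sketch, assuming a complete limbs × readings matrix; `icc_1_1` is a hypothetical helper name.

```python
from statistics import fmean


def icc_1_1(ratings):
    """ICC(1,1): one-way random effects, single measurement.

    ratings[i][j] is reading j of limb i (n limbs x k readings, complete).
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n, k >= 2")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]

    # Between-limb mean square.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-limb mean square: rater/session differences and error together.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

For example, `icc_1_1([[1, 2], [2, 3], [3, 4]])` returns 0.6, because the constant offset between the two readings is counted as error.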

Hypertonia Assessment Tool

Journal of Child Neurology ◽

10.1177/0883073816671681 ◽

2016 ◽

Vol 32 (1) ◽

pp. 132-138 ◽

Cited By ~ 8

Author(s):

Petra Marsico ◽

Victoria Frontzek-Weps ◽

Julia Balzer ◽

Hubertus J. A. van Hedel

Keyword(s):

Clinical Diagnosis ◽

Interrater Reliability ◽

Assessment Tool ◽

Kappa Statistics ◽

Percentage Agreement ◽

Intrarater Reliability ◽

First Order ◽

Agreement Coefficient ◽

Item Scores ◽

Good Agreement

The Hypertonia Assessment Tool is a 7-item instrument that discriminates spasticity, dystonia, and rigidity on 3 levels: item scores, subtype, and hypertonia diagnosis for each extremity. We quantified the inter- and intrarater reliability using kappa statistics, Gwet’s first-order agreement coefficient (both with 95% confidence intervals), and percentage agreement for all levels. For validity, we compared the Hypertonia Assessment Tool subtype with the clinical diagnosis provided by the physicians. Two physiotherapists tested 45 children with neuromotor disorders. The interrater reliability (n = 45) of the Hypertonia Assessment Tool subtype was moderate to substantial, whereas the intrarater reliability (n = 42) was almost perfect. The Hypertonia Assessment Tool showed good agreement in detecting spasticity. In contrast, dystonia was detected 24% to 25% more often with the Hypertonia Assessment Tool than in the clinical diagnosis. Although some individual items showed lower agreement between raters, the Hypertonia Assessment Tool subtypes and diagnosis were reliable. The validity of the Hypertonia Assessment Tool for detecting spasticity is confirmed, whereas further studies are needed for dystonia and rigidity.
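
Alongside κ, the study reports Gwet’s first-order agreement coefficient (AC1) and percentage agreement. The Python sketch below implements the two-rater, nominal-category form of AC1. The function names and any category labels passed in are hypothetical, and the full list of possible categories must be supplied because AC1’s chance-agreement term depends on it.

```python
from collections import Counter


def percent_agreement(r1, r2):
    """Share of subjects on which the two raters chose the same category."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def gwet_ac1(r1, r2, categories):
    """Gwet's AC1 for two raters assigning nominal categories.

    categories: every possible category (at least two), used or not.
    """
    q = len(categories)
    if q < 2:
        raise ValueError("need at least two categories")
    n = len(r1)
    pa = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # pi_q: share of all ratings (both raters pooled) in category q.
    pis = [(c1[c] + c2[c]) / (2 * n) for c in categories]
    # Chance agreement; it never exceeds 1/q, so 1 - pe stays positive.
    pe = sum(p * (1 - p) for p in pis) / (q - 1)
    return (pa - pe) / (1 - pe)
```

One reason AC1 is often reported next to κ is that its chance term stays bounded, so the coefficient remains stable when most subjects fall into a single category, a situation in which κ can be unexpectedly low despite high percentage agreement.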

CODEM Instrument

GeroPsych ◽

10.1024/1662-9647/a000100 ◽

2014 ◽

Vol 27 (1) ◽

pp. 23-31 ◽

Cited By ~ 4

Author(s):

Anne Kuemmel (This author contributed eq ◽

Julia Haberstroh (This author contributed ◽

Johannes Pantel

Keyword(s):

Convergent Validity ◽

Interrater Reliability ◽

Discriminant Validity ◽

Assessment Tool ◽

Intraclass Correlation ◽

Well Being ◽

Communication Behavior ◽

People With Dementia ◽

Pearson's R

Communication and communication behaviors in situational contexts are essential conditions for well-being and quality of life in people with dementia. Methods for measuring them, however, are limited. The CODEM instrument, a standardized observational assessment tool for communication behavior, was developed and evaluated on the basis of current research in dementia care and social-communicative behavior. Interrater reliability was first examined by means of video ratings (N = 10 people with dementia). Subsequently, six caregivers in six German nursing homes observed 69 residents with dementia and used CODEM to rate their communication behavior. The interrater reliability of CODEM was excellent (mean κ = .79; intraclass correlation = .91). Statistical analysis indicated that CODEM had excellent internal consistency (Cronbach’s α = .95). CODEM also showed excellent convergent validity (Pearson’s R = .88) as well as discriminant validity (Pearson’s R = .63). Confirmatory factor analysis verified the two-factor solution of verbal/content aspects and nonverbal/relationship aspects. The content and relational aspects of communication exhibited different trends with disease severity. CODEM proved to be a reliable, valid, and sensitive assessment tool for examining communication behavior in dementia. CODEM also gives researchers a feasible tool for measuring the effects of psychosocial interventions that aim to improve communication behavior and well-being in dementia.

Reliability and Validity Analyses of the Youth Level of Service/Case Management Inventory

Criminal Justice and Behavior ◽

10.1177/0093854804274373 ◽

2005 ◽

Vol 32 (3) ◽

pp. 329-344 ◽

Cited By ~ 103

Author(s):

Fred Schmidt ◽

Robert D. Hoge ◽

Lezlie Gomes

Keyword(s):

Juvenile Offenders ◽

Case Management ◽

Interrater Reliability ◽

Assessment Tool ◽

Reliability And Validity ◽

Level Of Service ◽

Risk Level ◽

Criminogenic Needs ◽

Health Assessments ◽

Males And Females

The Youth Level of Service/Case Management Inventory (YLS/CMI) is a structured assessment tool designed to facilitate the effective intervention and rehabilitation of juvenile offenders by assessing each youth’s risk level and criminogenic needs. The present study examined the YLS/CMI’s reliability and validity in a sample of 107 juvenile offenders who were court-referred for mental health assessments. Results demonstrated the YLS/CMI’s internal consistency and interrater reliability. Moreover, the instrument’s predictive validity was substantiated on a number of recidivism measures for both males and females. Limitations of the current findings are discussed.

Reliability Assessment of Scores From Video-Recorded TGMD-3 Performances

Journal of Motor Learning and Development ◽

10.1123/jmld.2016-0007 ◽

2017 ◽

Vol 5 (1) ◽

pp. 59-68 ◽

Cited By ~ 16

Author(s):

Pauli Olavi Rintala ◽

Arja Kaarina Sääkslahti ◽

Susanna Iivonen

Keyword(s):

Motor Development ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Intrarater Reliability ◽

Gross Motor ◽

Gross Motor Development ◽

Percent Agreement ◽

Two Samples ◽

Ball Skills

This study examined the intrarater and interrater reliability of the Test of Gross Motor Development—3rd Edition (TGMD-3). Participants were 60 Finnish children aged 3 to 9 years, divided into three separate samples of 20. Two samples were used to examine the intrarater reliability of two different assessors, and the third sample was used to establish interrater reliability. Children’s TGMD-3 performances were video-recorded and scored later, and reliability was analyzed using an intraclass correlation coefficient, a kappa statistic, and percent agreement. The intrarater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.69 to 0.77, and percent agreement ranged from 87% to 91%. The interrater reliability of the locomotor subtest, ball skills subtest, and gross motor total score ranged from 0.56 to 0.64. Percent agreement was 83% for locomotor skills, ball skills, and total skills. The hop, horizontal jump, and two-hand strike assessments showed the largest differences between the assessors. These results show acceptable reliability for using the TGMD-3 to analyze children’s gross motor skills.

A standardized nomenclature for cervical spine soft-tissue release and osteotomy for deformity correction

Journal of Neurosurgery Spine ◽

10.3171/2013.5.spine121067 ◽

2013 ◽

Vol 19 (3) ◽

pp. 269-278 ◽

Cited By ~ 50

Author(s):

Christopher P. Ames ◽

Justin S. Smith ◽

Justin K. Scheer ◽

Christopher I. Shaffrey ◽

Virginie Lafage ◽

...

Keyword(s):

Cervical Spine ◽

Soft Tissue ◽

Interrater Reliability ◽

Soft Tissue Release ◽

Spine Deformity ◽

Intrarater Reliability ◽

Perfect Agreement ◽

Moderate Agreement ◽

Posterior Anterior ◽

Anterior Posterior

Object Cervical spine osteotomies are powerful techniques to correct rigid cervical spine deformity. Many variations exist, however, and there is no current standardized system with which to describe and classify cervical osteotomies. This complicates the ability to compare outcomes across procedures and studies. The authors' objective was to establish a universal nomenclature for cervical spine osteotomies to provide a common language among spine surgeons. Methods A proposed nomenclature with 7 anatomical grades of increasing extent of bone/soft tissue resection and destabilization was designed. The highest grade of resection is termed the major osteotomy, and an approach modifier is used to denote the surgical approach(es), including anterior (A), posterior (P), anterior-posterior (AP), posterior-anterior (PA), anterior-posterior-anterior (APA), and posterior-anterior-posterior (PAP). For cases in which multiple grades of osteotomies were performed, the highest grade is termed the major osteotomy, and lower-grade osteotomies are termed minor osteotomies. The nomenclature was evaluated by 11 reviewers through 25 different radiographic clinical cases. The review was performed twice, separated by a minimum 1-week interval. Reliability was assessed using Fleiss kappa coefficients. Results The average intrarater reliability was classified as “almost perfect agreement” for the major osteotomy (0.89 [range 0.60–1.00]) and approach modifier (0.99 [0.95–1.00]); it was classified as “moderate agreement” for the minor osteotomy (0.73 [range 0.41–1.00]). The average interrater reliability for the 2 readings was the following: major osteotomy, 0.87 (“almost perfect agreement”); approach modifier, 0.99 (“almost perfect agreement”); and minor osteotomy, 0.55 (“moderate agreement”). 
Analysis of only major osteotomy plus approach modifier yielded a classification that was “almost perfect” with an average intrarater reliability of 0.90 (0.63–1.00) and an interrater reliability of 0.88 and 0.86 for the two reviews. Conclusions The proposed cervical spine osteotomy nomenclature provides the surgeon with a simple, standard description of the various cervical osteotomies. The reliability analysis demonstrated that this system is consistent and directly applicable. Future work will evaluate the relationship between this system and health-related quality of life metrics.
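
Fleiss' κ extends chance-corrected agreement to more than two raters, such as the 11 reviewers here (and the 7 raters in the Biffl study). The following is a minimal Python sketch. It assumes that every case was graded by the same number of reviewers and that the labels are supplied as one list per case. The helper name `fleiss_kappa` is hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for N subjects, each rated by the same m raters.

    ratings[i] is the list of category labels the m raters gave subject i.
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one subject")
    m = len(ratings[0])
    if m < 2 or any(len(r) != m for r in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]
    categories = {c for r in ratings for c in r}

    # P_i: proportion of agreeing rater pairs for subject i.
    p_i = [(sum(v * v for v in c.values()) - m) / (m * (m - 1)) for c in counts]
    p_bar = sum(p_i) / n
    # p_j: overall share of all ratings falling in category j.
    p_j = [sum(c[cat] for c in counts) / (n * m) for cat in categories]
    p_e = sum(p * p for p in p_j)  # expected agreement by chance
    if p_e >= 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)
```

Because the chance term is computed from pooled category shares, κ can drop well below raw agreement when one grade dominates the sample.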