Development and reliability of a structured interview guide for the Montgomery-Åsberg Depression Rating Scale (SIGMA)

Background: The Montgomery-Åsberg Depression Rating Scale (MADRS) is often used in clinical trials to select patients and to assess treatment efficacy. The scale was originally published without suggested questions for clinicians to use in gathering the information necessary to rate the items. Structured and semi-structured interview guides have been found to improve reliability with other scales. Aims: To describe the development and test-retest reliability of a structured interview guide for the MADRS (SIGMA). Method: A total of 162 test-retest interviews were conducted by 81 rater pairs. Each patient was interviewed twice, once by each rater conducting an independent interview. Results: The intraclass correlation for total score between raters using the SIGMA was r = 0.93, P < 0.0001. All ten items had good to excellent interrater reliability. Conclusions: Use of the SIGMA can result in high reliability of MADRS scores in evaluating patients with depression.
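The abstract does not say which form of the intraclass correlation was used. As a minimal sketch, assuming a one-way random-effects ICC(1,1), which is a common choice when each patient is rated by a different pair of raters, the coefficient can be computed from the between-patient and within-patient mean squares. The function name and the example scores are hypothetical and are not data from the study.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).

    `ratings` is a list with one inner list per subject, each holding
    the k scores that subject received (k must be the same for all).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 subjects, each with the same k >= 2 ratings")

    grand = mean([x for row in ratings for x in row])
    subject_means = [mean(row) for row in ratings]

    # Between-subject mean square (n - 1 degrees of freedom).
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (n * (k - 1) degrees of freedom).
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, subject_means) for x in row
    ) / (n * (k - 1))

    # Undefined (division by zero) if every score in the data is identical.
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical total scores: one row per patient, one column per interview.
example = [[22, 24], [30, 29], [15, 17], [35, 33]]
print(round(icc_oneway(example), 3))
```

Values near 1 mean that most of the score variance comes from differences between patients rather than from disagreement between the two interviews of the same patient.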

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Educational and Psychological Measurement ◽

10.1177/0013164419899731 ◽

2020 ◽

Vol 80 (4) ◽

pp. 808-820

Author(s):

Cindy M. Walker ◽

Sakine Göçer Şahin

Keyword(s):

Differential Item Functioning ◽

Interrater Reliability ◽

Rating Scales ◽

Rating Scale ◽

Intraclass Correlation ◽

Kappa Statistic ◽

Promising Alternative ◽

Constructed Response ◽

Polytomous Item ◽

Item Functioning

The purpose of this study was to investigate a new way of evaluating interrater reliability that can allow one to determine if two raters differ with respect to their rating on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and compared with traditional interrater reliability measures. Three different procedures that can be used as measures of interrater reliability were compared: (1) intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) DIF statistic obtained from Poly-SIBTEST. The results of this investigation indicated that DIF procedures appear to be a promising alternative to assess the interrater reliability of constructed response items, or other polytomous types of items, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design and allows one to determine if a rater is either more severe, or more lenient, in their scoring of each individual polytomous item on a test or rating scale.
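For orientation, the second of the three measures compared, Cohen's kappa, can be sketched in a few lines. This is the standard unweighted kappa for two raters, not the Poly-SIBTEST DIF procedure the study proposes; the function name and the example scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)

    # Observed proportion of exact agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)


# Hypothetical scores on a 0-2 polytomous item from two raters.
a = [0, 1, 2, 1, 0, 2, 1, 1]
b = [0, 1, 2, 2, 0, 2, 1, 0]
print(round(cohens_kappa(a, b), 3))
```

Kappa is symmetric in the two raters, so a low value signals disagreement but does not show which rater is more severe or more lenient; that directional information is what the abstract attributes to the DIF approach.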

Reliability and Responsiveness of Two Physical Performance Measures Examined in the Context of a Functional Training Intervention

Physical Therapy ◽

10.1093/ptj/80.1.8 ◽

2000 ◽

Vol 80 (1) ◽

pp. 8-16 ◽

Cited By ~ 61

Author(s):

Mary B King ◽

James O Judge ◽

Robert Whipple ◽

Leslie Wolfson

Keyword(s):

Physical Performance ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Intervention Group ◽

Training Intervention ◽

Control Group ◽

Walk Test ◽

Test Retest Reliability ◽

6 Minute Walk Test ◽

Minute Walk

Background and Purpose. The reliability and responsiveness of 2 physical performance measures were assessed in this nonrandomized, controlled pilot exercise intervention. Subjects. Forty-five older individuals with mobility impairment (mean age=77.9 years, SD=5.9, range=70–92) were sequentially assigned to participate in an exercise program (intervention group) or to a control group. Methods. The intervention group performed exercise 3 times a week for 12 weeks that targeted muscle force, endurance, balance, and flexibility. Outcome measures were the 8-item Physical Performance Test (PPT-8) and the 6-minute walk test. Test-retest reliability and responsiveness indexes were determined for both tests; interrater reliability was measured for the PPT-8. Results. The intraclass correlation coefficient for interrater reliability for the PPT-8 was .96. Intraclass correlation coefficients for test-retest reliability were .88 for the PPT-8 and .93 for the 6-minute walk test. The intervention group improved 2.4 points and the control group improved 0.7 point on the PPT-8, as compared with baseline measurements. There was no change in 6-minute walk test distance in the intervention group when compared with the control group. The responsiveness index was .8 for the PPT-8 and .6 for the 6-minute walk test. Conclusion and Discussion. Measurements for both the PPT-8 and the 6-minute walk test appeared to be highly reliable. The PPT-8 was more responsive than the 6-minute walk test to change in performance expected with this functional training intervention.

Structured Interview Guide for the Hamilton Rating Scale for Depression

PsycTESTS Dataset ◽

10.1037/t67131-000 ◽

1988 ◽

Cited By ~ 17

Author(s):

Janet B. Williams

Keyword(s):

Rating Scale ◽

Structured Interview ◽

Interview Guide ◽

Hamilton Rating Scale

Reliability of Safe Maximum Lifting Determinations of a Functional Capacity Evaluation

Physical Therapy ◽

10.1093/ptj/82.4.364 ◽

2002 ◽

Vol 82 (4) ◽

pp. 364-371 ◽

Cited By ~ 73

Author(s):

Douglas P Gross ◽

Michele C Battié

Keyword(s):

Functional Capacity ◽

Repeated Measures ◽

Interrater Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Functional Capacity Evaluation ◽

Measurement Variability ◽

Retest Reliability ◽

Repeated Measures Design ◽

Test Retest Reliability

Background and Purpose. Functional capacity evaluations (FCEs) are measurement tools used in predicting readiness to return to work following injury. The interrater and test-retest reliability of determinations of maximal safe lifting during kinesiophysical FCEs were examined in a sample of people who were off work and receiving workers' compensation. Subjects. Twenty-eight subjects with low back pain who had plateaued with treatment were enrolled. Five occupational therapists, trained and experienced in kinesiophysical methods, conducted testing. Methods. A repeated-measures design was used, with raters testing subjects simultaneously, yet independently. Subjects were rated on 2 occasions, separated by 2 to 4 days. Analyses included intraclass correlation coefficients (ICCs) and 95% confidence intervals. Results. The ICC values for interrater reliability ranged from .95 to .98. Test-retest values ranged from .78 to .94. Discussion and Conclusion. Inconsistencies in subjects' performance across sessions were the greatest source of FCE measurement variability. Overall, however, test-retest reliability was good and interrater reliability was excellent.
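When the same raters score every subject, as in a repeated-measures design with raters testing simultaneously, a two-way ICC is often used. The abstract does not state which ICC form the authors chose, so the following is only a sketch of one common option, the two-way random-effects, absolute-agreement, single-rater ICC(2,1). The function name and example values are hypothetical.

```python
from statistics import mean


def icc_twoway_agreement(ratings):
    """Two-way random-effects, absolute-agreement ICC(2,1).

    `ratings` has one row per subject and one column per rater,
    with every rater scoring every subject.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical lifting determinations (kg): rows = subjects, columns = raters.
example = [[20.0, 22.5], [15.0, 15.0], [30.0, 27.5], [25.0, 25.0]]
print(round(icc_twoway_agreement(example), 3))
```

Because this form measures absolute agreement, a rater who scores every subject consistently higher than a colleague lowers the coefficient even when the two raters rank the subjects identically.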

Preschool Children's Taste Acceptance of Highly Concentrated Fluoride Compounds: Effects on Nonverbal Behavior

Journal of Clinical Pediatric Dentistry ◽

10.17796/jcpd.38.1.1501887254xt5u07 ◽

2013 ◽

Vol 38 (1) ◽

pp. 31-37 ◽

Cited By ~ 1

Author(s):

AK Kolb ◽

K Schmied ◽

P Faßheber ◽

R Heinrich-Weltzien

Keyword(s):

Preschool Children ◽

Nonverbal Behavior ◽

Interrater Reliability ◽

Rating Scale ◽

Intraclass Correlation ◽

Positive Behavior ◽

Behavior Rating ◽

Negative Behavior ◽

Behavior Rating Scale ◽

Standardized Protocol

Objective: The aim of this video-based study was to examine the taste acceptance of children between the ages of 2 and 5 years regarding highly concentrated fluoride preparations in kindergarten-based preventive programs. Study design: The fluoride preparation Duraphat was applied to 16 children, Elmex fluid to 15 children, and Fluoridin N5 to 14 children. The procedure was conducted according to a standardized protocol and videotaped. Three raters evaluated the children's nonverbal behavior as a measure of taste acceptance on the Frankl Behavior Rating Scale. The interrater reliability (intraclass correlation coefficient; ICC) was .86. In an interview, children indicated the taste of the fluoride preparations on a three-point “smiley” rating scale. The interviewer used a hand puppet during the survey to establish confidence between the children and examiners. Results: Children's nonverbal behavior was significantly more positive after Fluoridin N5 and Duraphat were applied compared to the application of Elmex fluid. The same trend was found during the smiley assessment. The response of children who displayed cooperative positive behavior before the application of fluoride preparations was significantly more positive than those who displayed uncooperative negative behavior. Conclusion: To achieve a high acceptance of the application of fluoride preparations among preschool children, flavorful preparations should be used.

Test-Retest and Interrater Reliability of the Functional Movement Screen

Journal of Athletic Training ◽

10.4085/1062-6050-48.2.11 ◽

2013 ◽

Vol 48 (3) ◽

pp. 331-336 ◽

Cited By ~ 51

Author(s):

Rebecca Shultz ◽

Scott C. Anderson ◽

Gordon O. Matheson ◽

Brandon Marcello ◽

Thor Besier

Keyword(s):

Interrater Reliability ◽

Intraclass Correlation ◽

Video Recording ◽

Functional Movement Screen ◽

Good Test ◽

Good Reliability ◽

Functional Movement ◽

Retest Reliability ◽

Test Retest Reliability ◽

First Session

Context: The Functional Movement Screen (FMS) is a popular test to evaluate the degree of painful, dysfunctional, and asymmetric movement patterns. Despite great interest in the FMS, test-retest reliability data have not been published. Objective: To assess the test-retest and interrater reliability of the FMS and to compare the scoring by 1 rater during a live session and the same session on video. Design: Cross-sectional study. Setting: Human performance laboratory in the sports medicine center. Patients or Other Participants: A total of 21 female (age = 19.6 ± 1.5 years, height = 1.7 ± 0.1 m, mass = 64.4 ± 5.1 kg) and 18 male (age = 19.7 ± 1.0 years, height = 1.9 ± 0.1 m, mass = 80.1 ± 9.9 kg) National Collegiate Athletic Association Division IA varsity athletes volunteered. Intervention(s): Each athlete was tested and retested 1 week later by the same rater who also scored the athlete's first session from a video recording. Five other raters scored the video from the first session. Main Outcome Measure(s): The Krippendorff α (K α) was used to assess the interrater reliability, whereas intraclass correlation coefficients (ICCs) were used to assess the test-retest reliability and reliability of live-versus-video scoring. Results: Good reliability was found for the test-retest (ICC = 0.6), and excellent reliability was found for the live-versus-video sessions (ICC = 0.92). Poor reliability was found for the interrater reliability (K α = .38). Conclusions: The good test-retest and high live-versus-video session reliability show that the FMS is a usable tool within 1 rater. However, the low interrater K α values suggest that the FMS within the limits of generalization should not be used indiscriminately to detect deficiencies that place the athlete at greater risk for injury. The FMS interrater reliability may be improved with better training for the rater.

Development and reliability of the Korean version of the Feeding Abilities Assessment

Hong Kong Journal of Occupational Therapy ◽

10.1177/1569186119850694 ◽

2019 ◽

Vol 32 (1) ◽

pp. 69-74

Author(s):

Seul Gi Koo ◽

Hae Yean Park ◽

Jongbae Kim ◽

Areum Han

Keyword(s):

Correlation Coefficient ◽

Internal Consistency ◽

Content Validity ◽

Assessment Tool ◽

High Reliability ◽

Intraclass Correlation ◽

Rater Reliability ◽

Retest Reliability ◽

Korean Version ◽

Test Retest Reliability

Objective: The purpose of this study is to introduce a standardised assessment tool by verifying the reliability of the translated Korean version of the Feeding Abilities Assessment (K-FAA), which was developed to suit Korean culture. Methods: The research subjects were 65 patients with dementia living in nursing homes. The K-FAA was completed by verifying the suitability of translation and reverse translation. The validity of the K-FAA was established through content validity, while its reliability was analysed based on internal consistency reliability for the items, test–retest reliability and inter-rater reliability. Results: The content validity index determined, based on the assessment of professors, occupational therapists, and nurses, was more than .70. Cronbach’s α was more than .929, showing good internal consistency. A test–retest reliability of .884 was derived using Pearson’s correlation coefficient (p < .01), and an inter-rater reliability of .800 was derived using the kappa coefficients; intraclass correlation coefficient was .897, which also indicated good reliability. Conclusion: The K-FAA was modified to fit the Korean domestic situation, and this assessment had high reliability. Therefore, K-FAA can evaluate the feeding ability of patients with dementia. Future studies should focus on providing evidence-based data to maintain or supplement the feeding ability of patients with dementia in Korea.
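Internal consistency of the kind reported here is usually summarised with Cronbach's α, which compares the sum of the item variances with the variance of the total score. The sketch below uses the standard formula with sample variances; the function name and the item scores are hypothetical and are not K-FAA data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table with one row per respondent
    and one column per item (sample variances, n - 1 denominator)."""
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 respondents, each with the same k >= 2 items")

    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical item scores: rows = patients, columns = items.
example = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1], [2, 2, 3]]
print(round(cronbach_alpha(example), 3))
```

When all items move together across respondents, the total-score variance is much larger than the sum of the item variances and α approaches 1.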

Fatigue numeric rating scale validity, discrimination and responder definition in patients with psoriatic arthritis

RMD Open ◽

10.1136/rmdopen-2019-000928 ◽

2020 ◽

Vol 6 (1) ◽

pp. e000928 ◽

Cited By ~ 2

Author(s):

Dafna Gladman ◽

Peter Nash ◽

Hitoshi Goto ◽

Julie A Birt ◽

Chen-Yen Lin ◽

...

Keyword(s):

Clinical Trials ◽

Clinical Practice ◽

Psoriatic Arthritis ◽

Psychometric Properties ◽

Rating Scale ◽

Numeric Rating Scale ◽

Routine Clinical Practice ◽

Patient Reported ◽

Test Retest Reliability ◽

Numeric Rating

Objectives: This study assessed the psychometric properties of the fatigue numeric rating scale (NRS) and sought to establish values for clinically meaningful change (responder definition). Methods: Using disease-specific clinician-reported and patient-reported data from two randomised clinical trials of patients with psoriatic arthritis (PsA), the fatigue NRS was evaluated for test–retest reliability, construct validity and responsiveness. A responder definition was also explored using anchor-based and distribution-based methods. Results: Test–retest reliability analyses supported the reproducibility of the fatigue NRS in patients with PsA (intraclass correlation coefficient=0.829). Mean (SD) values at baseline and week 2 were 5.7 (2.2) and 5.7 (2.4), respectively. Supporting construct validity of the fatigue NRS, moderate-to-large correlations (by Sackett’s conventions) with other assessments measuring similar concepts were demonstrated. Fatigue severity was reduced when the underlying disease activity improved, and reductions remained consistent at weeks 12 and 24. A 3-point improvement was identified as being optimal for demonstrating a level of clinically meaningful improvement in fatigue NRS after 12–24 weeks of treatment. Conclusions: Fatigue NRS is a valid and responsive patient-reported outcome instrument for use in patients with PsA. The established psychometric properties from this study support the use of fatigue NRS in clinical trials and in routine clinical practice. Robust validation of reliability for use in routine clinical practice in treating patients with active PsA in less active disease states and other more diverse ethnic groups is needed.

Reliability and Validity of the Dyskinesia Impairment Scale in Children and Young Adults with Inherited or Idiopathic Dystonia

Journal of Clinical Medicine ◽

10.3390/jcm9082597 ◽

2020 ◽

Vol 9 (8) ◽

pp. 2597

Author(s):

Annika Danielsson ◽

Inti Vanmechelen ◽

Cecilia Lidbeck ◽

Lena Krumlinde-Sundholm ◽

Els Ortibus ◽

...

Keyword(s):

Young Adults ◽

Rating Scale ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Children And Youth ◽

Rater Reliability ◽

Retest Reliability ◽

Children And Young Adults ◽

Test Retest Reliability ◽

Idiopathic Dystonia

Background: The Dyskinesia Impairment Scale (DIS) is a new assessment scale for dystonia and choreoathetosis in children and youth with dyskinetic cerebral palsy. Today, the Burke–Fahn–Marsden Dystonia Rating Scale (BFM) is mostly used to assess dystonia in children with inherited dystonia. The aim of this study was to assess reliability and validity of the DIS in children and youth with inherited or idiopathic dystonia. Methods: Reliability was measured by (1) the intraclass correlation coefficients (ICCs) for inter-rater and test-retest reliability, as well as (2) standard error of measurement (SEM) and minimal detectable difference (MDD). For concurrent validity of the DIS-dystonia subscale, the BFM was administered. Results: In total, 11 males and 9 females (median age 16 years and 7 months, range 6 to 24 years) were included. For inter-rater reliability, the ICCs for the DIS total score and the dystonia and choreoathetosis subscale scores were 0.83, 0.87, and 0.71, respectively. For test-retest reliability, the ICCs for the DIS total score and the dystonia and choreoathetosis subscale scores were 0.95, 0.88, and 0.93, respectively. The SEM and MDD for the total DIS were 3.98% and 11.04%, respectively. The Spearman correlation coefficient between the dystonia subscale and the BFM was 0.88 (p < 0.01). Conclusions: Good to excellent inter-rater, test-retest reliability, and validity were found for the total DIS and the dystonia subscale. The choreoathetosis subscale showed moderate inter-rater reliability and excellent test-retest reliability. The DIS may be a promising tool to assess dystonia and choreoathetosis in children and young adults with inherited or idiopathic dystonia.
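The abstract does not give the formulas used for the SEM and MDD. A minimal sketch, assuming the conventional definitions SEM = SD × √(1 − ICC) and MDD = 1.96 × √2 × SEM (95% confidence level), is shown below; the function names and the standard deviation in the first call are hypothetical. Under this assumption, the reported SEM of 3.98% gives an MDD of about 11.03%, close to the reported 11.04%, with the small gap plausibly due to rounding of the SEM.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM from the between-subject SD and a reliability coefficient."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_difference(sem, z=1.96):
    """MDD at the confidence level implied by z (1.96 for 95%).

    The factor sqrt(2) reflects that a change score involves two
    measurements, each carrying its own measurement error.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical SD of 10 with an ICC of 0.75 gives an SEM of 5.
print(standard_error_of_measurement(10.0, 0.75))

# SEM reported in the abstract for the total DIS (3.98%).
print(round(minimal_detectable_difference(3.98), 2))
```

A change smaller than the MDD cannot be distinguished from measurement error at the chosen confidence level, which is why the two statistics are usually reported together.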
