The Neurobehavioral Rating Scale: An Interrater Reliability Study in the HIV Seropositive Population

A clinician should not rely entirely on a caregiver's report of behavioral pathology when planning a treatment strategy; direct observational instruments are needed alongside caregiver-based assessments. A new scale for the empirical (observational) evaluation of behavioral symptoms in Alzheimer's disease (AD) and related dementias, the Empirical Behavioral Pathology in Alzheimer's Disease Rating Scale (E-BEHAVE-AD), was developed, and its interrater reliability was examined. In addition, the occurrence of behavioral symptoms observed with this new instrument was compared with their occurrence as recorded on a similarly designed, caregiver-based instrument. In the interrater reliability study, two raters simultaneously evaluated 20 dementia patients. The comparative study used a cross-sectional design (N = 49). Individuals were evaluated in an outpatient clinic setting, and the study population consisted of cognitively normal individuals and dementia patients. Evaluations included the new observationally based behavioral assessment (the E-BEHAVE-AD), a caregiver-based behavioral assessment (the Behavioral Pathology in Alzheimer's Disease Rating Scale; BEHAVE-AD), a clinical global measure (the Global Deterioration Scale), and a mental status assessment (the Mini-Mental State Examination). The interrater reliability study yielded an intraclass correlation coefficient of .97 (p < .01) for total scores on the E-BEHAVE-AD. The correlation coefficient for agreement on the presence of symptoms in six symptom categories, between caregiver-reported behavioral pathology on the BEHAVE-AD and the clinician's observations on the E-BEHAVE-AD, was .51 (p < .01). The new E-BEHAVE-AD rating instrument showed excellent interrater reliability.
Furthermore, there was a statistically significant relationship between clinician-observed behavioral pathology on the E-BEHAVE-AD and caregiver-reported pathology on the BEHAVE-AD. However, the magnitude of the correlation between these measures indicated that most of the variance was independent and nonoverlapping (a correlation of .51 corresponds to only about 26% shared variance). Consequently, these data support theoretical models suggesting that the assessment of behavioral pathology in dementia should ideally encompass both direct observational and caregiver-report approaches, using measures such as the E-BEHAVE-AD as well as measures such as the BEHAVE-AD.
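
The abstract reports an intraclass correlation coefficient of .97 for two raters but does not say which ICC form was used. As a minimal sketch, assuming the Shrout and Fleiss ICC(2,1) form (two-way random effects, absolute agreement, single rater), the coefficient can be computed from a subjects × raters table of total scores. The function name `icc_2_1` and the example scores are hypothetical illustrations, not study data.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Assumed layout: `ratings` is an n x k table, one row per subject and
    one column per rater (e.g. total scores from two raters).
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with at least 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Rater variance (ms_cols) enters the denominator, so systematic
    # offsets between raters lower the coefficient.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

With a hypothetical table in which the second rater always scores one point higher, `[[1, 2], [2, 3], [3, 4]]`, the function returns 2/3 rather than 1: the absolute-agreement form penalizes a constant offset between raters, which a consistency-type ICC would ignore.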

Interrater reliability of a method to assess hypothalamic involvement in pediatric adamantinomatous craniopharyngioma

Journal of Neurosurgery Pediatrics, 2020, Vol 25 (1), pp. 37-42. DOI: 10.3171/2019.8.peds19295

Author(s): Ros Whelan, Eric Prince, David M. Mirsky, Robert Naftel, Aashim Bhatia, ...

Keyword(s): Quality Of Life, Brain Tumors, Interrater Reliability, Statistical Evaluation, Grading System, Reliability Study, Postoperative MRI, MRI Scans, Adamantinomatous Craniopharyngioma

OBJECTIVE: Pediatric adamantinomatous craniopharyngiomas (ACPs) are histologically benign brain tumors that confer significant neuroendocrine morbidity. Previous studies have shown that injury to the hypothalamus is associated with worse quality of life and a shorter lifespan, an insight that helps many surgeons define the goals of surgery for patients with ACP. Puget and colleagues proposed a 3-tiered preoperative and postoperative grading system based on the degree of hypothalamic involvement seen on MRI. In a prospective cohort from their institution, Puget and colleagues found that using the system to guide operative goals was associated with decreased morbidity. To date, however, the Puget system has not been externally validated. Here, the authors present an interrater reliability study that assesses the generalizability of this system for surgeons planning the initial operative intervention for children with craniopharyngiomas. METHODS: A panel of 6 experts, consisting of pediatric neurosurgeons and pediatric neuroradiologists, graded 30 preoperative and postoperative MRI scans according to the Puget system. Interrater reliability was calculated using Fleiss’ κ and Krippendorff’s α. RESULTS: Preoperative interrater reliability showed moderate agreement (κ = 0.50, α = 0.51). Postoperative interrater reliability was lower, at 0.27 with both statistics. CONCLUSIONS: Interrater reliability for the system as currently defined is moderate. Slight refinements of the Puget MRI grading system, such as collapsing the 3 grades into 2, may improve its reliability and make the system more generalizable.
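
Fleiss’ κ extends chance-corrected agreement to more than two raters, which suits a 6-expert panel assigning scans to one of 3 grades. The sketch below is a minimal illustration of Fleiss’ κ only (Krippendorff’s α is not reproduced). It assumes every scan is graded by the same number of raters and that the ratings have already been tallied into per-scan counts for each grade; the function name and this data layout are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    Assumed layout: counts[i][j] is the number of raters who assigned
    subject i (e.g. one MRI scan) to category j (e.g. one grade).
    Every subject must be rated by the same number of raters (at least 2).
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("need at least one subject")
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of raters")

    # Observed agreement for subject i: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the pooled proportion of ratings in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For example, if all 6 raters place each of three scans in the same grade (`[[6, 0, 0], [0, 6, 0], [0, 0, 6]]`), every rater pair agrees and the function returns 1.0; values near 0 indicate agreement no better than chance given how often each grade is used.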

Using Differential Item Functioning to Test for Interrater Reliability in Constructed Response Items

Educational and Psychological Measurement, 2020, Vol 80 (4), pp. 808-820. DOI: 10.1177/0013164419899731

Author(s): Cindy M. Walker, Sakine Göçer Şahin

Keyword(s): Differential Item Functioning, Interrater Reliability, Rating Scales, Rating Scale, Intraclass Correlation, Kappa Statistic, Promising Alternative, Constructed Response, Polytomous Item, Item Functioning

The purpose of this study was to investigate a new way of evaluating interrater reliability that allows one to determine whether two raters differ in their ratings on a polytomous rating scale or constructed response item. Specifically, differential item functioning (DIF) analyses were used to assess interrater reliability and were compared with traditional interrater reliability measures. Three procedures that can serve as measures of interrater reliability were compared: (1) the intraclass correlation coefficient (ICC), (2) Cohen’s kappa statistic, and (3) the DIF statistic obtained from Poly-SIBTEST. The results indicated that DIF procedures appear to be a promising alternative for assessing the interrater reliability of constructed response items and other polytomous item types, such as rating scales. Furthermore, using DIF to assess interrater reliability does not require a fully crossed design, and it allows one to determine whether a rater is more severe or more lenient in scoring each individual polytomous item on a test or rating scale.
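
Of the three procedures compared, Cohen’s kappa is the simplest to state: the observed agreement between two raters, corrected for the agreement expected by chance from each rater's own category frequencies. A minimal sketch of the unweighted statistic follows; Poly-SIBTEST is not reproduced, and the function name and inputs are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")

    # Observed agreement: share of items given the same category by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal category proportions.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

Kappa summarizes how much two raters agree, but not the direction of their disagreements: it cannot say which rater scores more severely, which is exactly the information the abstract attributes to the DIF approach.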

Speech-Language Pathologists' Ratings of Speech Accuracy in Children With Speech Sound Disorders

American Journal of Speech-Language Pathology, 2021, pp. 1-12. DOI: 10.1044/2021_ajslp-20-00381

Author(s): Linye Jing, Maria I. Grigos

Keyword(s): Interrater Reliability, Rating Scales, Rating Scale, Speech Sound, Future Research, Training Procedure, Speech Sound Disorders, Speech Language Pathologists, Point Rating Scale, Whole Word

Purpose: Forming accurate and consistent speech judgments can be challenging when working with children with speech sound disorders who produce many errors of varied types. Rating scales offer a systematic approach to assessing the whole word rather than individual sounds, so they can be an efficient way for speech-language pathologists (SLPs) to monitor treatment progress. This study evaluated the interrater reliability of an existing 3-point rating scale using a large group of SLPs as raters. Method: Using an online platform, 30 SLPs completed a brief training and then rated single words produced by children with typical speech patterns and children with speech sound disorders. Words were closely balanced across the three rating categories of the scale. The interrater reliability of each SLP's ratings relative to a consensus judgment was examined. Results: Most SLPs (87%) reached substantial interrater reliability with the consensus judgment using the 3-point rating scale. Correct productions had the highest interrater reliability, and productions with extensive errors had higher agreement than those with minor errors. Certain error types, such as vowel distortions, were especially difficult for SLPs to judge. Conclusions: This study demonstrated substantial interrater reliability with a consensus judgment for a large majority of the 30 SLPs using a 3-point rating scale. The clinical implications of the findings are discussed, along with proposed modifications to the training procedure to guide future research.

Electromyographic analysis of maximal voluntary contraction of female pelvic floor muscles: Intrarater and interrater reliability study

Neurourology and Urodynamics, 2021. DOI: 10.1002/nau.24834

Author(s): Maria P. Volpato, Michele Menezes, Tirza Sathler Prado, Adriana Piccini, Arthur Sá Ferreira, ...

Keyword(s): Pelvic Floor, Maximal Voluntary Contraction, Interrater Reliability, Voluntary Contraction, Pelvic Floor Muscles, Reliability Study, Female Pelvic Floor, Electromyographic Analysis

Characteristics Explaining Performance in Downhill Mountain Biking

International Journal of Sports Physiology and Performance, 2015, Vol 10 (2), pp. 183-190. DOI: 10.1123/ijspp.2014-0135

Author(s): Joel B. Chidley, Alexandra L. MacGregor, Caoimhe Martin, Calum A. Arthur, Jamie H. Macdonald

Keyword(s): Aerobic Capacity, Interrater Reliability, Ventilatory Threshold, Anaerobic Power, Important Variable, Mountain Biking, Reliability Study, Self Confidence, Group 2, Specific Contribution

Purpose: To identify physiological, psychological, and skill characteristics that explain performance in downhill (DH) mountain bike racing. Methods: Four studies were used to (1) identify factors potentially contributing to DH performance (using an expert focus group), (2) develop and validate a measure of rider skill (using video analysis and expert judge evaluation), (3) evaluate whether physiological, psychological, and skill variables contribute to performance at a DH competition, and (4) test the specific contribution of aerobic capacity to DH performance. Results: Study 1 identified aerobic capacity, handgrip endurance, anaerobic power, rider skill, and self-confidence as potentially important for DH. In study 2, the rider-skill measure displayed good interrater reliability. Study 3 found that rider skill and handgrip endurance were significantly related to DH ride time (β = −0.76 and −0.14, respectively; R² = .73), with exploratory analyses suggesting that DH ride time may also be influenced by self-confidence and aerobic capacity. Study 4 confirmed aerobic capacity as an important variable influencing DH performance (for a DH ride, mean oxygen uptake was 49 ± 5 mL·kg⁻¹·min⁻¹, and 90% of the ride was completed above the first ventilatory threshold). Conclusions: In order of importance, rider skill, handgrip endurance, self-confidence, and aerobic capacity were identified as variables influencing DH performance. Practically, this study provides a novel assessment of rider skill that coaches could use to monitor training and identify talent. Novel intervention targets for enhancing DH performance were also identified, including self-confidence and aerobic capacity.

An Intercenter Comparison of Nasolabial Appearance Including a Center Using Nasoalveolar Molding

The Cleft Palate-Craniofacial Journal, 2018, Vol 55 (5), pp. 655-663. DOI: 10.1177/1055665618754947

Author(s): Supakit Peanchitlertkajorn, Ana Mercado, John Daskalogiannakis, Ronald Hathaway, Kathleen Russell, ...

Keyword(s): Interrater Reliability, Cleft Lip, Rating Scale, Cleft Lip And Palate, Nasal Reconstruction, Kappa Statistics, Symmetry Center, Specific Protocol, Nasoalveolar Molding, Presurgical Orthopedics

Objective: To compare preadolescent nasolabial appearance outcomes of patients with complete unilateral cleft lip and palate (CUCLP) from 4 cleft centers, including a center using nasoalveolar molding (NAM) and primary nasal reconstruction. Design: Retrospective cohort study. Setting: Four cleft centers in North America. Patients: 135 subjects with repaired CUCLP. Methods: Frontal and profile facial photographs were assessed using the Asher-McDade rating scale. Intrarater and interrater reliability were tested using weighted kappa statistics, and median scores by center were compared with the Kruskal-Wallis test. Results: Intrarater reliability scores were moderate to good, and interrater reliability scores were moderate. Significant differences (P < .05) among centers were found. For nasal form, center G (median = 2.83) had better scores than centers C and D (C median = 3.33, D median = 3.17). For nose symmetry, center G (median = 2.33) had better scores than all other centers (B median = 2.67, C median = 2.83, D median = 2.83). For vermilion border, center G (median = 2.58) had better scores than centers B and C (B median = 3.17, C median = 3.17). For nasolabial profile, center G (median = 2.67) had better scores than center C (median = 3.00). For total nasolabial score, center G (median = 2.67) had better scores than all other centers (B median = 2.83, C median = 3.00, D median = 2.83). Conclusion: The protocol followed by center G, the only center that performed NAM and primary nasal reconstruction, produced better results in all categories than center C, the only center that performed neither presurgical orthopedics nor lip/nose revisions. Compared with centers that performed traditional presurgical orthopedics and surgical revisions (B and D), center G was not consistently better in all categories. As with other uncontrolled, retrospective intercenter studies, the outcomes cannot be attributed to a specific protocol component.
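
Weighted kappa suits ordinal ratings because it gives partial credit to near misses: a one-point disagreement costs less than a disagreement across the whole scale. The abstract does not state the weighting scheme, so the sketch below assumes integer scores coded 1 to `n_categories` (for example, a 5-point ordinal scale) and lets the caller choose linear or quadratic weights. The function name, score coding, and weighting options are assumptions for illustration, not the authors' analysis.

```python
from typing import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    n_categories: int,
    weighting: str = "linear",
) -> float:
    """Weighted Cohen's kappa for two raters using ordinal scores 1..n_categories."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")
    if n_categories < 2:
        raise ValueError("need at least 2 categories")
    if any(not 1 <= s <= n_categories for s in (*rater_a, *rater_b)):
        raise ValueError("scores must lie in 1..n_categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    power = 1 if weighting == "linear" else 2
    k = n_categories

    # Observed joint proportions of (rater_a score, rater_b score).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - 1][b - 1] += 1 / n

    # Marginal proportions; their products give the chance-expected table.
    row_marg = [sum(observed[i]) for i in range(k)]
    col_marg = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i: int, j: int) -> float:
        # Disagreement weight: 0 on the diagonal, 1 for maximally distant scores.
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    obs_dis = sum(weight(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(weight(i, j) * row_marg[i] * col_marg[j] for i, j in cells)
    if exp_dis == 0:
        raise ValueError("kappa is undefined when expected disagreement is 0")
    return 1 - obs_dis / exp_dis
```

Linear weights charge a disagreement in proportion to the distance between the two scores; quadratic weights charge it in proportion to the squared distance, so adjacent-category disagreements count relatively less. The two schemes can give noticeably different values on the same ratings, which is why the choice matters when comparing reported kappas.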

Self-Managed Surveillance for Breast Cancer–Related Upper Body Issues: A Feasibility and Reliability Study

Physical Therapy, 2020, Vol 100 (3), pp. 468-476. DOI: 10.1093/ptj/pzz181

Author(s): Bolette S Rafn, Chiara A Singh, Julie Midtgaard, Pat G Camp, Margaret L McNeely, ...

Keyword(s): Breast Cancer, Interrater Reliability, Physical Therapist, Retention Rates, Upper Body, Intrarater Reliability, Prospective Surveillance, Reliability Study, Shoulder Flexion, Arm Circumference

Background: Early identification of breast cancer–related upper body issues is important to enable timely physical therapist treatment. Objective: This study evaluated the feasibility and reliability of women performing self-managed prospective surveillance for upper body issues in the early postoperative phase as part of a hospital-based physical therapy program. Design: This was a prospective, single-site, single-group feasibility and reliability study. Methods: Presurgery arm circumference measurements were completed at home and at the hospital by participants and by a physical therapist. Instruction in self-measurement was provided using a video guide. After surgery, all circumference measurements were repeated, along with self-assessment and therapist assessment of shoulder flexion and abduction active range of motion. Feasibility was determined by recruitment and retention rates and by participant-reported ease of performing self-measurements (1 [very difficult] to 10 [very easy]). Reliability was determined as intrarater reliability, interrater reliability, and agreement. Results: Thirty-three women with a mean age of 53.4 (SD = 11.4) years participated, with recruitment and retention rates of 79% and 94%, respectively. Participant-reported ease of measurement was 8.2 (SD = 2.2) before surgery and 8.0 (SD = 1.9) after surgery. Intrarater and interrater reliability were excellent before surgery (intraclass correlation coefficient [ICC] ≥ 0.94; 95% confidence interval = 0.87–0.97) and after surgery (ICC ≥ 0.91; 95% confidence interval = 0.76–0.96). Agreement between self-assessed and therapist-assessed active shoulder flexion (κ = 0.79) and abduction (κ = 0.71) was good.
Limitations: Further testing using a prospective design with longer follow-up is needed to determine whether self-managed prospective surveillance and timely treatment can hinder the development of chronic breast cancer–related upper body issues. Conclusions: Self-measured arm circumference and shoulder range of motion are reliable, and including them in a hospital-based program of prospective surveillance for upper body issues appears feasible. This approach may improve early detection and treatment.
