Inter-rater Reliability of Ward Rating Scales

1974, Vol 125 (586), pp. 248-255
Author(s): John N. Hall

Psychiatrists, psychologists, and nursing staff are increasingly making direct observations and ratings of ward behaviour. Typically, a nurse may be asked to complete a multi-item rating scale on a group of patients during a drug trial. Several factors bear on the choice of an appropriate scale for a particular purpose. Among them are the number of points per item, which determines the item's sensitivity to change, and the total number of items in the scale, which affects the time taken to complete the scale and hence how often ratings can be made within an assessment schedule.

2019, Vol 5 (1), pp. e000541
Author(s): John Ressman, Wilhelmus Johannes Andreas Grooten, Eva Rasmussen Barr

Single leg squat (SLS) is a common tool used in clinical examination to set and evaluate rehabilitation goals, and also to assess lower extremity function in active people. Objectives: To conduct a review and meta-analysis of the inter-rater and intrarater reliability of the SLS, including the lateral step-down (LSD) and forward step-down (FSD) tests. Design: Review with meta-analysis. Data sources: CINAHL, Cochrane Library, Embase, Medline (OVID) and Web of Science were searched up to December 2018. Eligibility criteria: Studies were eligible for inclusion if they were methodological studies that assessed the inter-rater and/or intrarater reliability of the SLS, FSD and LSD through observation of movement quality. Results: Thirty-one studies were included. Reliability varied widely between studies (inter-rater: kappa/intraclass correlation coefficients (ICC) = 0.00–0.95; intrarater: kappa/ICC = 0.13–1.00), but most studies reached ‘moderate’ agreement. The pooled ICC/kappa results showed ‘moderate’ agreement for inter-rater reliability, 0.58 (95% CI 0.50 to 0.65), and ‘substantial’ agreement for intrarater reliability, 0.68 (95% CI 0.60 to 0.74). Subgroup analyses showed higher pooled inter-rater agreement for ≤3-point rating scales, while no difference was found across different numbers of segmental assessments. Conclusion: Our findings indicate that the SLS test, including the FSD and LSD tests, can be suitable for clinical use regardless of the number of observed segments, particularly with a ≤3-point rating scale. Since most of the included studies were affected by some form of methodological bias, our findings must be interpreted with caution. PROSPERO registration number: CRD42018077822.
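Several of the pooled coefficients above are Cohen's kappa values. As a minimal illustrative sketch (not code from the review), unweighted kappa for two raters who classify the same SLS trials can be computed from observed agreement and chance-expected agreement; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same subjects.

    Hypothetical helper for illustration: kappa = (p_o - p_e) / (1 - p_e).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of subjects placed in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from the product of each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one and the same category throughout; kappa is
        # undefined, and returning 1.0 here is only a convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 3-point movement-quality ratings of eight SLS trials.
rater_1 = [1, 2, 3, 1, 2, 3, 1, 2]
rater_2 = [1, 2, 3, 1, 3, 3, 1, 1]
print(round(cohen_kappa(rater_1, rater_2), 3))  # 0.628
```

In this toy example the raters agree on 75% of trials, but kappa is lower because part of that agreement would be expected by chance from the raters' category frequencies alone.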


Author(s): Earl S. Stein, Randy L. Sollenberger

This paper describes a study that evaluated the reliability of a recently developed rating form designed to assess air traffic controller performance. Six supervisors from different radar approach control facilities nationwide viewed 20 videotapes of controllers working traffic from a previously recorded simulation study. The observer/raters used a new evaluation form consisting of 24 rating scales, each measuring a specific area of controller performance. An important part of this study was observer training, which consisted of practice rating sessions followed by group discussions in which observers established mutual evaluation criteria for each performance area. Inter-rater reliability was assessed using intraclass correlations, and intra-rater reliability using Pearson product-moment correlations on repeated videotapes. In general, the reliability of the form was quite good; however, a few rating scales were much less reliable than the others. Reasons for these differences in rating scale reliability are discussed.
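The abstract does not say which form of intraclass correlation was used. Assuming, for illustration only, the two-way random-effects, absolute-agreement, single-rater form ICC(2,1) described by Shrout and Fleiss, a sketch for an n × k table (targets such as videotapes × raters) follows; the function name is hypothetical and the study's own ICC form may differ.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the rating of target i (e.g. a videotape) by rater j.
    Hypothetical helper; the study's own ICC form may differ.
    """
    n = len(scores)
    k = len(scores[0]) if scores else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need an n x k table with n >= 2 and k >= 2")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Two-way ANOVA sums of squares: targets (rows), raters (columns), residual.
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_targets - ss_raters
    # Mean squares from their degrees of freedom.
    ms_targets = ss_targets / (n - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_targets - ms_error) / (
        ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# The 6-target, 4-rater example table from Shrout and Fleiss (1979).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(ratings), 2))  # 0.29
```

Because this form counts systematic differences between raters (the rater mean square) as error, it is lower here than a consistency-only form would be; the example raters rank the targets similarly but differ in overall severity.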


2002, Vol 180 (1), pp. 45-50
Author(s): Peter F. Liddle, Elton T. C. Ngan, Gary Duffield, King Kho, Anthony J. Warren

Background: In the rating scales commonly used for assessing response to antipsychotic treatment, individual items embrace symptoms that apparently arise from distinguishable pathophysiological processes and might be expected to respond differently to treatment. Aims: To test the reliability, sensitivity to change and factor structure of a new scale for the assessment of the Signs and Symptoms of Psychotic Illness (the SSPI). Method: Interrater reliability was evaluated by determining the intraclass correlation for the ratings of 63 patients. Sensitivity to change was assessed in a longitudinal study of 33 patients. Factor structure was determined from the scores of 155 patients. Results: The intraclass correlation was satisfactory for all individual items and excellent for the total score. Scores were sensitive to change: a change of one unit in Clinical Global Impression corresponded to a 31% change in SSPI total score. Factor analysis revealed five clusters of symptoms. Conclusions: The SSPI provides a sensitive and reliable measure of the five major clusters of symptoms that commonly occur in psychotic illness.


1976, Vol 129 (5), pp. 452-456
Author(s): Domenic V. Cicchetti

Summary: This paper extends the recent work of Hall (1974) by presenting the minimal sample sizes and the specific linear agreement weights required for assessing the reliability of rating scales commonly used in neuropsychiatric and other clinico-medical settings. The weights are shown to vary as a function of (a) whether or not the rating scale contains a point of ‘absence’, and (b) the number of ordinal points on the scale.
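The abstract does not reproduce the paper's weight tables, so the sketch below shows only the generic linear agreement weights for a k-point ordinal scale, w(i, j) = 1 − |i − j| / (k − 1), used inside a weighted kappa. It does not implement any adjustment for scales with an ‘absence’ point, and the function names are hypothetical.

```python
def linear_weights(k):
    """Linear agreement weights for a k-point ordinal scale (categories 0..k-1).

    Exact agreement gets weight 1, the largest possible disagreement gets 0,
    and intermediate disagreements receive proportional partial credit.
    """
    if k < 2:
        raise ValueError("a rating scale needs at least two points")
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(rater_a, rater_b, k):
    """Linearly weighted kappa for two raters using integer ratings 0..k-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two non-empty rating lists of equal length")
    if not all(0 <= r < k for r in list(rater_a) + list(rater_b)):
        raise ValueError("ratings must be integers in 0..k-1")
    w = linear_weights(k)
    # Cross-tabulate the two raters' scores as counts.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[a][b] += 1
    row_tot = [sum(counts[i]) for i in range(k)]
    col_tot = [sum(counts[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected agreement, as proportions.
    p_o = sum(w[i][j] * counts[i][j] for i in range(k) for j in range(k)) / n
    p_e = sum(
        w[i][j] * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)
    ) / (n * n)
    return (p_o - p_e) / (1 - p_e)


print(linear_weights(3))  # [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```

With k = 2 the weights reduce to the identity matrix and the statistic equals unweighted kappa; as k grows, near-miss disagreements earn more partial credit, which illustrates why appropriate weights depend on the number of ordinal points.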


2012, Vol 21 (4), pp. 136-143
Author(s): Lynn E. Fox

Abstract: The self-anchored rating scale (SARS) is a technique that augments collaboration between Augmentative and Alternative Communication (AAC) interventionists, their clients, and their clients' support networks. SARS is a technique used in Solution-Focused Brief Therapy, a branch of systemic family counseling. It has been applied to treating speech and language disorders across the life span, and recent case studies show it has promise for promoting the adoption and long-term use of high- and low-tech AAC. I will describe two key principles of solution-focused therapy and present seven steps in the SARS process that illustrate how clinicians can use the SARS to involve a person with aphasia and his or her family in all aspects of the therapeutic process. I will use a case study to illustrate the SARS process and present outcomes for one individual living with aphasia.


2006, Vol 22 (4), pp. 259-267
Author(s): Eelco Olde, Rolf J. Kleber, Onno van der Hart, Victor J.M. Pop

Childbirth has been identified as a possible traumatic experience, leading to traumatic stress responses and even to the development of posttraumatic stress disorder (PTSD). The current study investigated the psychometric properties of the Dutch version of the Impact of Event Scale-Revised (IES-R) in a group of women who had recently given birth (N = 435). In addition, the original IES was compared with the IES-R. The scale showed high internal consistency (α = 0.88). Confirmatory factor analysis provided no support for a three-factor structure of intrusion, avoidance, and hyperarousal. Goodness of fit was only reasonable, even after fitting one intrusion item on the hyperarousal scale. The IES-R correlated significantly with scores on depression and anxiety self-rating scales, as well as with scores on a self-rating scale of posttraumatic stress disorder. Although the IES-R can be used for studying posttraumatic stress reactions in women who have recently given birth, the original IES proved to be a better instrument than the IES-R. It is concluded that adding the hyperarousal scale, which distinguishes the IES-R from the original IES, did not make the scale stronger.
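The internal consistency reported above (α = 0.88) is Cronbach's alpha. A minimal sketch of the usual formula, α = k / (k − 1) · (1 − sum of item variances / variance of total scores), is below; the function name and toy data are hypothetical and unrelated to the study's sample.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[r][i] is respondent r's score on item i."""
    if len(item_scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("need the same number (>= 2) of items per respondent")
    # Sample variance of each item across respondents.
    item_var_sum = sum(variance(item) for item in zip(*item_scores))
    # Sample variance of respondents' total scores.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: three respondents answering two items.
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

Alpha rises when items covary, because covariance inflates the variance of the total score relative to the sum of the individual item variances.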


Methodology, 2011, Vol 7 (3), pp. 88-95
Author(s): Jose A. Martínez, Manuel Ruiz Marín

The aim of this study is to improve measurement in marketing research by constructing a new, simple, nonparametric, consistent, and powerful test of scale invariance, called the D-test. The D-test uses symbolic dynamics and symbolic entropy to measure the difference between the response patterns produced by two measurement scales. We also give the standard asymptotic distribution of the statistic. Because the test is based on entropy measures, it avoids smoothed nonparametric estimation. We applied the D-test to a real marketing research study to examine whether scale invariance holds when measuring service quality in a sports service. We took a free scale as the reference scale and compared it with three widely used rating scales: Likert-type scales from 1 to 5 and from 1 to 7, and a semantic-differential scale from −3 to +3. Scale invariance holds for the latter two scales. The test overcomes the shortcomings of other procedures for analyzing scale invariance, and it gives researchers a tool for choosing the appropriate rating scale for specific marketing problems and for judging how far the results of prior studies can be questioned.


2021, pp. 001698622098594
Author(s): Nielsen Pereira

The purpose of this study was to investigate the validity of the HOPE Scale for identifying gifted English language learners (ELs) and how classroom and English as a second language (ESL) teacher HOPE Scale scores differ. Seventy teachers completed the HOPE Scale on 1,467 students in grades K-5 and four ESL teachers completed the scale on 131 ELs. Measurement invariance tests indicated that the HOPE Scale yields noninvariant latent means across EL and English proficient (EP) samples. However, confirmatory factor analysis results support the use of the scale with ELs or EP students separately. Results also indicate that the rating patterns of classroom and ESL teachers were different and that the HOPE Scale does not yield valid data when used by ESL teachers. Caution is recommended when using the HOPE Scale and other teacher rating scales to compare ELs to EP students. The importance of invariance testing before using an instrument with a population that is different from the one(s) for which the instrument was developed is discussed.


Assessment, 2021, pp. 107319112199646
Author(s): Olivia Gratz, Duncan Vos, Megan Burke, Neelkamal Soares

To date, there is little research applying natural language processing (NLP) to the open-ended responses of behavior rating scales. Using three NLP lexicons for sentiment analysis of the open-ended responses of the Behavior Assessment System for Children-Third Edition, the researchers found a moderately positive correlation between the human composite rating and the sentiment score from each lexicon for strengths comments, and a slightly positive correlation for the concerns comments made by guardians and teachers. In addition, as the word count of open-ended responses about the child's strengths increased, the sentiment rating became more positive. Conversely, as the word count of open-ended responses about the child's concerns increased, the human raters scored the comments more negatively. The authors offer a proof of concept for using NLP-based sentiment analysis of open-ended comments to complement other data in clinical decision making.
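Lexicon-based sentiment analysis of the kind described scores a comment by summing the valences of the words it contains, and those scores can then be correlated with human ratings. The sketch below assumes a tiny invented lexicon (the study's three lexicons are not named in this abstract) and hypothetical comments and ratings; it is not the authors' pipeline.

```python
import math
import re

# Hypothetical toy lexicon mapping words to valence scores.
TOY_LEXICON = {
    "kind": 1, "helpful": 1, "happy": 1, "creative": 1,
    "angry": -1, "anxious": -1, "struggles": -1, "hits": -1,
}


def sentiment_score(comment, lexicon=TOY_LEXICON):
    """Sum the valences of lexicon words in a comment; other words count 0."""
    words = re.findall(r"[a-z']+", comment.lower())
    return sum(lexicon.get(word, 0) for word in words)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


comments = [
    "Kind and helpful with classmates",
    "Happy most days",
    "Angry and hits peers",
    "Struggles with reading",
]
human_ratings = [3, 2, -2, -1]  # hypothetical composite ratings
scores = [sentiment_score(c) for c in comments]
print(scores)  # [2, 1, -2, -1]
print(round(pearson_r(scores, human_ratings), 3))  # 0.997
```

Real sentiment lexicons are far larger and may handle negation and intensifiers; this sketch also ignores word count, which the study found to be related to sentiment ratings.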


2021, Vol 10 (14), pp. 3056
Author(s): Ada Holak, Michał Czapla, Marzena Zielińska

Background: The all-too-frequent failure to rate pain intensity, resulting in absent or inadequate pain management, has long ceased to be a problem confined to young patients and has become a major public health concern. This study aimed to evaluate the methods used for reducing post-traumatic pain in children and how often those methods were used. Additionally, the methods of pain assessment and the frequency of their application in this age group were analysed. Methods: A retrospective analysis of 2452 medical records of emergency medical teams dispatched to injured children aged 0–18 years in the area around Warsaw (Poland). Results: Of all injured children, 1% (20 out of 2432) had their pain intensity rated, and the only tool used for this assessment was the numeric rating scale (NRS). Children with burns most frequently received a single analgesic drug or cooling (56.2%), whereas the least frequently used method was multimodal treatment combining pharmacotherapy and cooling (13.5%). Toddlers constituted the largest percentage of patients who were provided with cooling (12%). Immobilisation was most commonly used in adolescents (29%) and school-age children (n = 186; 24%). Conclusions: The low frequency of pain assessment underlines the need for better training in the use of various pain rating scales and protocols. Moreover, non-pharmacological methods (cooling and immobilisation) for reducing pain in injured children remain underutilised.

