Measuring functional outcomes in stroke trials: Improving inter-rater reliability of the Modified Rankin Scale using a structured interview

P227 Background & aims: The Modified Rankin Scale (MRS) (van Swieten et al, 1988) is widely used in clinical trials to rate disability and handicap after stroke. Although the MRS is a popular measure of functional outcome, the categories of the scale are very broadly defined and open to interpretation by raters. Previous work with the Glasgow Outcome Scale indicates that the reliability of functional rating scales may be improved by use of a structured interview (Wilson et al, 1998). The purpose of the present study was to compare the inter-rater reliability of the conventional MRS with the inter-rater reliability of a newly developed structured interview for the MRS (MRS-SI). Methods: A structured interview was devised for the MRS covering five areas of everyday function. Sixty-three patients with stable functional state after stroke (stroke 6 to 24 months previously) were recruited to the study and scored on the conventional MRS by two independent observers. These observers then underwent training in use of the MRS-SI. Eight weeks after the first assessment, the same observers reassessed 58 of these patients using the MRS-SI. Results: To allow comparison between the assessments, the analysis of results was restricted to the 58 patients who were rated on both the MRS and MRS-SI. Inter-rater reliability was measured using the kappa statistic (unweighted and weighted using quadratic weights). For the MRS, overall agreement between the two raters was 57% (unweighted kappa 0.44, weighted kappa 0.78); using the MRS-SI, overall agreement was 78% (unweighted kappa 0.70, weighted kappa 0.93). Conclusions: Variability between raters in assigning patients to Rankin grades appears to be reduced when using a structured interview for the Modified Rankin Scale. The use of the MRS-SI could potentially improve the quality of results from clinical studies in stroke. A multi-centre study to further establish the improvement in inter-rater reliability is ongoing.
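
To make the two kappa variants reported above concrete, here is a minimal sketch of Cohen's kappa for two raters, unweighted and with quadratic weights. The function name `cohen_kappa`, the three-grade scale, and the toy ratings are assumptions for illustration only; they are not data or code from the study.

```python
def cohen_kappa(rater_a, rater_b, n_grades, quadratic=False):
    """Cohen's kappa for two raters scoring the same patients on grades 0..n_grades-1.

    Illustrative helper (hypothetical name), written in the disagreement form:
    kappa = 1 - observed_disagreement / chance_expected_disagreement.
    """
    n = len(rater_a)
    # counts[i][j] = number of patients graded i by rater A and j by rater B.
    counts = [[0] * n_grades for _ in range(n_grades)]
    for a, b in zip(rater_a, rater_b):
        counts[a][b] += 1
    row = [sum(counts[i]) for i in range(n_grades)]
    col = [sum(counts[i][j] for i in range(n_grades)) for j in range(n_grades)]

    def weight(i, j):
        # Disagreement weight: 0/1 for unweighted kappa, squared distance for quadratic.
        if quadratic:
            return ((i - j) / (n_grades - 1)) ** 2
        return 0.0 if i == j else 1.0

    grades = range(n_grades)
    observed = sum(weight(i, j) * counts[i][j] for i in grades for j in grades) / n
    expected = sum(weight(i, j) * row[i] * col[j] for i in grades for j in grades) / (n * n)
    return 1.0 - observed / expected


# Toy ratings (made up): six patients, grades 0-2.
ratings_a = [0, 0, 1, 1, 2, 2]
ratings_b = [0, 1, 1, 1, 2, 0]
kappa = cohen_kappa(ratings_a, ratings_b, 3)
kappa_w = cohen_kappa(ratings_a, ratings_b, 3, quadratic=True)
print(round(kappa, 3), round(kappa_w, 3))
```

Quadratic weights penalise a disagreement by the squared distance between grades, so near-misses cost little and distant disagreements cost a lot. In this toy data one patient is rated two grades apart, which is why the weighted value comes out lower than the unweighted one; in the study above, where most disagreements were presumably between adjacent grades, the weighted kappa was the higher of the two.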


Are the Naranjo Criteria Reliable and Valid for Determination of Adverse Drug Reactions in the Intensive Care Unit?

Annals of Pharmacotherapy ◽

10.1345/aph.1g177 ◽

2005 ◽

Vol 39 (11) ◽

pp. 1823-1827 ◽

Cited By ~ 32

Author(s):

Sandra L Kane-Gill ◽

Levent Kirisci ◽

Dev S Pathak

Keyword(s):

Intensive Care Unit ◽

Intensive Care ◽

Adverse Drug Reactions ◽

Reliability And Validity ◽

Kappa Statistic ◽

Weighted Kappa ◽

Drug Reactions ◽

Rater Reliability ◽

Cronbach Alpha

BACKGROUND The Naranjo criteria are frequently used for determination of causality for suspected adverse drug reactions (ADRs); however, the psychometric properties have not been studied in the critically ill. OBJECTIVE To evaluate the reliability and validity of the Naranjo criteria for ADR determination in the intensive care unit (ICU). METHODS All patients admitted to a surgical ICU during a 3-month period were enrolled. Four raters independently reviewed 142 suspected ADRs using the Naranjo criteria (review 1). Raters evaluated the 142 suspected ADRs 3–4 weeks later, again using the Naranjo criteria (review 2). Inter-rater reliability was tested using the kappa statistic. The weighted kappa statistic was calculated between reviews 1 and 2 for the intra-rater reliability of each rater. Cronbach alpha was computed to assess the inter-item consistency correlation. The Naranjo criteria were compared with expert opinion for criterion validity for each rater and reported as a Spearman rank (rs) coefficient. RESULTS The kappa statistic ranged from 0.14 to 0.33, reflecting poor inter-rater agreement. The weighted kappa within raters was 0.5402–0.9371. The Cronbach alpha ranged from 0.443 to 0.660, which is considered moderate to good. The rs coefficient range was 0.385–0.545; all rs coefficients were statistically significant (p < 0.05). CONCLUSIONS Inter-rater reliability is marginal; however, within-rater evaluation appears to be consistent. The inter-item correlation is expected to be higher since all questions pertain to ADRs. Overall, the Naranjo criteria need modification for use in the ICU to improve reliability, validity, and clinical usefulness.
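
As a rough illustration of the inter-item consistency measure mentioned above, the following sketch computes Cronbach's alpha from a small score matrix. The function name `cronbach_alpha` and the toy scores are assumptions for illustration; the study's own data and software are not described in the abstract.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per rated case, one column per item).

    Uses the standard formula k / (k - 1) * (1 - sum(item variances) / variance(total)).
    """
    n_items = len(scores[0])
    # Sample variance of each item (column) across cases.
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    # Sample variance of the summed score per case.
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Toy data (made up): four cases scored on two items.
toy_scores = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

Alpha approaches 1 when the items rise and fall together across cases, and drops as the items vary independently of one another.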


Cross-cultural validation of the Brief Social Phobia Scale for use in Portuguese and the development of a structured interview guide

Brazilian Journal of Psychiatry ◽

10.1590/s1516-44462006000300014 ◽

2006 ◽

Vol 28 (3) ◽

pp. 212-217 ◽

Cited By ~ 9

Author(s):

Flávia de Lima Osório ◽

José Alexandre de Souza Crippa ◽

Sonia Regina Loureiro

Keyword(s):

Anxiety Disorders ◽

Social Phobia ◽

Rating Scales ◽

Structured Interview ◽

Rater Reliability ◽

Language Version ◽

General Evaluation ◽

Reliability Method ◽

Interview Guide ◽

Translation And Validation

OBJECTIVE: To present the translation and validation of the Brief Social Phobia Scale for use in Brazilian Portuguese, to develop a structured interview guide in order to systemize its use and to perform a preliminary study of inter-rater reliability. METHOD: The instrument was translated and adapted to Portuguese by specialists in anxiety disorders and rating scales. A structured interview guide was created with the aim of covering all of the items of the instrument and grouping them into six categories. Specialists in mental health evaluated the guide. These professionals also watched the videotaped interviews of patients with and without social anxiety disorders, and, based on the interview guide, they rated the scale to evaluate its reliability. RESULTS: No semantic or linguistic adjustments were needed. For the complete scale, the general evaluation showed a percentage of agreement of 0.84 and intraclass coefficient of 0.91. The mean inter-rater correlation was 0.84. CONCLUSIONS: The Portuguese-language version of the Brief Social Phobia Scale is available for use in the Brazilian population, with rather acceptable indicators of inter-rater reliability. The interview guide was useful in providing these values. Further studies are needed in order to improve the reliability and to study other psychometric properties of the instrument.


A Comparison of Reliability Coefficients for Ordinal Rating Scales

Journal of Classification ◽

10.1007/s00357-021-09386-5 ◽

2021 ◽

Author(s):

Alexandra de Raadt ◽

Matthijs J. Warrens ◽

Roel J. Bosker ◽

Henk A. L. Kiers

Keyword(s):

Empirical Data ◽

Rating Scales ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Weighted Kappa ◽

Rater Reliability ◽

Intraclass Correlations ◽

Applied Researcher ◽

Highly Correlated ◽

Reliability Coefficients

Abstract Kappa coefficients are commonly used for quantifying reliability on a categorical scale, whereas correlation coefficients are commonly applied to assess reliability on an interval scale. Both types of coefficients can be used to assess the reliability of ordinal rating scales. In this study, we compare seven reliability coefficients for ordinal rating scales: the kappa coefficients included are Cohen’s kappa, linearly weighted kappa, and quadratically weighted kappa; the correlation coefficients included are intraclass correlation ICC(3,1), Pearson’s correlation, Spearman’s rho, and Kendall’s tau-b. The primary goal is to provide a thorough understanding of these coefficients such that the applied researcher can make a sensible choice for ordinal rating scales. A second aim is to find out whether the choice of the coefficient matters. We studied to what extent we reach the same conclusions about inter-rater reliability with different coefficients, and to what extent the coefficients measure agreement in a similar way, using analytic methods, and simulated and empirical data. Using analytical methods, it is shown that differences between quadratic kappa and the Pearson and intraclass correlations increase if agreement becomes larger. Differences between the three coefficients are generally small if differences between rater means and variances are small. Furthermore, using simulated and empirical data, it is shown that differences between all reliability coefficients tend to increase if agreement between the raters increases. Moreover, for the data in this study, the same conclusion about inter-rater reliability was reached in virtually all cases with the four correlation coefficients. In addition, using quadratically weighted kappa, we reached a similar conclusion as with any correlation coefficient a great number of times. Hence, for the data in this study, it does not really matter which of these five coefficients is used. Moreover, the four correlation coefficients and quadratically weighted kappa tend to measure agreement in a similar way: their values are very highly correlated for the data in this study.
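
The remark above that quadratic kappa and Pearson's correlation stay close when rater means and variances are similar can be illustrated with a small sketch. It relies on a moment form of quadratically weighted kappa for integer-coded grades, kappa = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), which follows from expanding the squared-difference weights. This rewriting, the function names, and the toy ratings are assumptions of the sketch, not the authors' code or data.

```python
from math import sqrt


def moments(x, y):
    """Population means, variances and covariance of two equally long rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, vx, vy, cov


def pearson_r(x, y):
    _, _, vx, vy, cov = moments(x, y)
    return cov / sqrt(vx * vy)


def quadratic_kappa(x, y):
    # Moment form of quadratically weighted kappa (assumes integer-coded grades).
    mx, my, vx, vy, cov = moments(x, y)
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Toy ratings (made up): rater B scores every case exactly one grade higher than rater A.
rater_a = [1, 2, 3, 4, 5]
rater_b = [2, 3, 4, 5, 6]
r = pearson_r(rater_a, rater_b)
kq = quadratic_kappa(rater_a, rater_b)
print(round(r, 3), round(kq, 3))
```

Pearson's correlation ignores the constant one-grade shift between the raters, whereas quadratic kappa is reduced by the squared difference in means. When means and variances match, the two expressions coincide.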


The reliability and validity of a slightly revised Chinese version simplified modified Rankin scale questionnaire

10.21203/rs.2.13565/v3 ◽

2020 ◽

Author(s):

Junliang Yuan ◽

Yunxiao Wang ◽

Wenli Hu ◽

Askiel Bruno

Keyword(s):

Ischemic Stroke ◽

Reliability And Validity ◽

Weighted Kappa ◽

Chinese Version ◽

The Novel ◽

Modified Rankin Scale ◽

English Version ◽

Stroke Patients ◽

Rater Reliability ◽

Percent Agreement

Abstract Background The slightly revised English version simplified modified Rankin scale questionnaire smRSq(2011) was shown to be reliable, valid, and useful in scoring the modified Rankin scale (mRS) after stroke. Our aim was to assess the inter-rater reliability and validity of a novel Chinese version smRSq(2011). Methods The English version smRSq(2011) was translated into Chinese by a standard process. We recruited 300 consecutive hospitalized ischemic stroke patients in the department of neurology, Beijing Chaoyang Hospital. Six randomly paired raters scored the conventional mRS, the novel Chinese version smRSq(2011), the National Institutes of Health Stroke Scale (NIHSS), and the Barthel index (BI) in-person. Inter-rater reliability and validity were assessed. Results Among the 300 ischemic stroke patients, mean age was 64.9±12.1 years, and 220 (73%) were male. For inter-rater reliability of the smRSq(2011), the percent agreement among the paired raters was 87%, the kappa (κ) was 0.84 (95% CI, 0.79-0.88), and the weighted kappa (κw) was 0.96 (95% CI, 0.95-0.98). The percent agreement between the smRSq(2011) scores by the first rater and the conventional mRS scores by the second rater in each pair was 55%, κ=0.47 (95% CI, 0.40-0.54), and κw=0.91 (95% CI, 0.89-0.93). In construct validity testing, the Spearman’s correlation coefficients comparing the smRSq(2011) scores by the first rater with the NIHSS and the BI scores by the second rater were 0.83 (P<0.001) and -0.86 (P<0.001), respectively. Conclusions Our results show good clinimetric properties of the novel Chinese version smRSq(2011) in scoring the mRS in Chinese stroke patients. Further validation in other clinical settings, including in communities and by remote methods in China is warranted.
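
The construct-validity step above rests on Spearman's rank correlation between ordinal scores. A minimal sketch follows, assuming average ranks for ties; the function names and the toy scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))


# Toy scores (made up): a disability grade that rises as an independence index falls.
disability = [0, 1, 2, 3, 4]
independence = [100, 80, 60, 40, 20]
rho = spearman_rho(disability, independence)
print(round(rho, 3))
```

A negative coefficient is expected whenever the two scales run in opposite directions, as with the negative correlation against the Barthel index reported above.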


Inter-rater reliability of two depression rating scales, MADRS and DRRS, based on videotape records of structured interviews

European Psychiatry ◽

10.1016/s0924-9338(98)80032-1 ◽

1998 ◽

Vol 13 (5) ◽

pp. 264-266 ◽

Cited By ~ 5

Author(s):

E Corruble ◽

D Purper ◽

C Payan ◽

JD Guelfi

Keyword(s):

Rating Scales ◽

Antidepressant Treatment ◽

Correlation Coefficients ◽

Structured Interview ◽

Structured Interviews ◽

Rater Reliability ◽

Intra Class Correlation ◽

Videotape Recording

Summary The inter-rater reliability of the French versions of the MADRS and the DRRS was studied on the basis of 58 videotape records of structured standardised interviews of depressed inpatients under antidepressant treatment. Each patient was assessed by two trained raters, from the same videotape recording. The inter-rater reliability of total scores was high with both scales (intra-class correlation coefficients: 0.86 for MADRS and 0.77 for DRRS). However, the inter-rater reliability for individual items was higher and more homogeneous for the MADRS than for the DRRS. Finally, the structured interview in French appears to be relevant for the MADRS, but it should be improved for the DRRS.


For Non-expert Clinical Searches, Google Scholar Results are Older with Higher Impact while PubMed Results Offer More Breadth

Evidence Based Library and Information Practice ◽

10.18438/b88609 ◽

2013 ◽

Vol 8 (2) ◽

pp. 254

Author(s):

Carol Perryman

Keyword(s):

Kappa Statistic ◽

Weighted Kappa ◽

Subjective Assessment ◽

Google Scholar ◽

Weighted Kappa Statistic ◽

Ranking Algorithms ◽

Content Relevance ◽

Texas Tech University ◽

Article Quality

Objectives – To compare PubMed and Google Scholar results for content relevance and article quality. Design – Bibliometric study. Setting – Department of Internal Medicine at Texas Tech University Health Sciences Center. Methods – Four clinical searches were conducted in both PubMed and Google Scholar. Search methods were described as “real world” (p. 216) behaviour, with the searchers familiar with content, though not expert at retrieval techniques. The first 20 results from each search were evaluated for relevance to the initial question, as well as for quality. Relevance was determined based on one author’s subjective assessment of information in the title and abstract, when available, and then tested by two other authors, with discrepancies discussed and resolved. Items were assigned to one of three categories: relevant, possibly relevant, and not relevant to the question, with reviewer agreement measured using a weighted kappa statistic. The quality of items found to be ‘relevant’ and ‘possibly relevant’ was measured by impact factor ratings from Thomson Reuters (ISI) Web of Knowledge, when available, as well as information obtained by SCOPUS on the number of times items were cited. Main Results – Google Scholar results were judged to be more relevant and of higher quality than results obtained from PubMed. Google Scholar results are also older on average, while PubMed retrieved items from a larger number of unique journals. Conclusion – In agreement with earlier research, the authors recommended that searchers use both PubMed and Google Scholar to improve on the quality and relevance of results. Searches in the two resources identify unique items based upon the ranking algorithms involved.


Inter-method reliability of the modified Rankin Scale in patients with subarachnoid hemorrhage

Journal of Neurology ◽

10.1007/s00415-021-10880-4 ◽

2021 ◽

Author(s):

E. Nobels-Janssen ◽

E. N. Postma ◽

I. L. Abma ◽

J. M. C. van Dijk ◽

R. Haeren ◽

...

Keyword(s):

Subarachnoid Hemorrhage ◽

Aneurysmal Subarachnoid Hemorrhage ◽

Weighted Kappa ◽

Structured Interview ◽

Assessment Methods ◽

The Self ◽

Modified Rankin Scale ◽

Self Assessment ◽

Unique Identifier ◽

Kappa Score

Abstract Background and objectives The modified Rankin Scale (mRS) is one of the most frequently used outcome measures in trials in patients with an aneurysmal subarachnoid hemorrhage (aSAH). The assessment method of the mRS is often not clearly described in trials, while the method used might influence the mRS score. The aim of this study is to evaluate the inter-method reliability of different assessment methods of the mRS. Methods This is a prospective, randomized, multicenter study with follow-up at 6 weeks and 6 months. Patients aged ≥ 18 years with aSAH were randomized to either a structured interview or a self-assessment of the mRS. Patients were seen by a physician who assigned an mRS score, followed by either the structured interview or the self-assessment. Inter-method reliability was assessed with the quadratic weighted kappa score and percentage of agreement. Assessment of feasibility of the self-assessment was done by a feasibility questionnaire. Results The quadratic weighted kappa was 0.60 between the assessment of the physician and structured interview and 0.56 between assessment of the physician and self-assessment. Percentage agreement was, respectively, 50.8 and 19.6%. The assessment of the mRS through a structured interview and by self-assessment resulted in systematically higher mRS scores than the mRS scored by the physician. Self-assessment of the mRS was proven feasible. Discussion The mRS scores obtained with different assessment methods differ significantly. The agreement between the scores is low, although the reliability between the assessment methods is good. This should be considered when using the mRS in clinical trials. Trial registration www.trialregister.nl; Unique identifier: NL7859.


The structured interview for anorexic and bulimic disorders for DSM-IV and ICD-10 (SIAB-EX): reliability and validity

European Psychiatry ◽

10.1016/s0924-9338(00)00534-4 ◽

2001 ◽

Vol 16 (1) ◽

pp. 38-48 ◽

Cited By ~ 73

Author(s):

M. Fichter ◽

N. Quadflieg

Keyword(s):

Eating Disorders ◽

Construct Validity ◽

Rating Scales ◽

Reliability And Validity ◽

Structured Interview ◽

Study Data ◽

Self Report ◽

Rater Reliability ◽

Valid Assessment ◽

Demoralization Scale

Objective. For reliable and valid assessment and diagnostic categorization of eating disorders, self-report measures have considerable limitations. A semi-structured interview – the SIAB-EX – was developed for a more reliable and valid assessment of eating disorders. Methods. One study (videotapes of 31 inpatients, seven raters) was made to establish inter-rater reliability; in another study with 80 patients the SIAB-EX was compared to another semi-structured interview designed for comparable purposes (EDE). In a third study data was obtained on 377 eating disorder patients seeking treatment to explore discriminant and convergent (construct) validity using the following self-rating scales: EDI, TFEQ, SCL-90, BDI, and the PERI Demoralization Scale. Results. Inter-rater reliability of dichotomous ratings was good with mean kappa values of .81 (current) and .85 (past). Comparison of the SIAB-EX with the EDE generally showed quite similar results and higher intercorrelation of the total scale (.77). There are, however, a number of differences between the two scales, which are discussed in detail. Construct validity of the SIAB-EX was established. Conclusion. Inter-rater reliability was good. Convergent and discriminant (construct) validity of the SIAB-EX was demonstrated. The constructs assessed by the SIAB and its subscales and items are discussed in the context of their correlations with other well-known scales.


Abstract 3148: Self Reported Quality of Life After Intracerebral Hemorrhage: Is a Modified Rankin Scale Score of 4 Worth it?

Stroke ◽

10.1161/str.43.suppl_1.a3148 ◽

2012 ◽

Vol 43 (suppl_1) ◽

Author(s):

Jonathan T Kleinman ◽

Ryan W Snider ◽

Irina Eyngorn ◽

Demi Thai ◽

Sevan R Komshian ◽

...

Keyword(s):

Quality Of Life ◽

Intracerebral Hemorrhage ◽

Scale Score ◽

Structured Interview ◽

Modified Rankin Scale ◽

Spontaneous Intracerebral Hemorrhage ◽

Personal Bias ◽

The Individual ◽

Time Point

Background: Intracerebral hemorrhage (ICH) trials often define poor outcome as a modified Rankin Scale Score (mRS) ≥4. While mRS score thresholds are important for demonstrating treatment effect, they do not tell physicians if a treatment outcome is “worth it.” Little self-reported quality of life (QOL) data exists to guide physicians, so opinions during academic discussions and/or family meetings may be driven by personal bias. We sought to describe both self and surrogate reported QOL in ICH survivors in relation to mRS score. Methods: Consecutive ICH patients were prospectively enrolled in the NIH-funded DiAgnostic Utility of MRI in Spontaneous Intracerebral Hemorrhage (DASH) study. Survivors were followed up at 3 months in clinic and at 12 months by telephone. At each time point, patients or surrogates were asked to rate the patient’s QOL as: excellent, good, fair, or poor. mRS scores were determined by an investigator through a semi-structured interview. Results: Self reported QOL was available in 95 patients with 143 QOL ratings, and surrogate reported QOL in 66 patients with 84 QOL ratings. Of self-reporters with a mRS of 4, 29% reported at least a good QOL, and 93% rated at least a fair QOL (Figure 1). Of self-reporters with a mRS of 3, 58% reported at least a good QOL, and 97% rated at least a fair QOL. Patients with a mRS of 4 were less likely to report a poor QOL than surrogate raters (χ² = 3.9, p = 0.05, Figure 2). In all patients, both self-reported and surrogate reported QOL were only loosely associated with mRS (R² = 0.25 and R² = 0.12, respectively). Forty-eight patients had self-reported QOL at 3 and 12 months. In these patients mRS improved in 16 (33%) patients without an associated improvement in QOL. Seven patients (15%) reported an improvement in QOL, but only 3 had an improvement in their mRS between 3 and 12 months. In 3 (6%) patients, the mRS worsened while QOL remained unchanged. No change in mRS was seen in 8 (17%) patients who reported worse QOL at 12 than at 3 months. Conclusions: Self reported QOL is only loosely correlated with mRS for the individual patient. Patient surrogates are more prone to rate QOL of patients with a mRS of 4 as poor than patients themselves. These data are clinically relevant as mRS alone may not capture the satisfaction of the individual patient with their outcome.


A Mixed-Methods Evaluation of Parent-Assisted Children’s Friendship Training to Improve Social Skills and Friendship Quality in Children with Autism in Malaysia

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18052566 ◽

2021 ◽

Vol 18 (5) ◽

pp. 2566

Author(s):

Sing Yee Ong ◽

Samsilah Roslan ◽

Nor Aniza Ahmad ◽

Ahmad Fauzi Mohd Ayub ◽

Chen Lee Ping ◽

...

Keyword(s):

Social Skills ◽

Rating Scales ◽

Friendship Quality ◽

Children With Autism ◽

Structured Interview ◽

Autism Spectrum ◽

Children With Asd ◽

The Social ◽

Friendship Training

Background: This study evaluates the effectiveness of parent-assisted children’s friendship training intervention for enhancing friendship quality and social skills among children with autism spectrum disorders (ASD). We conducted a quasi-experimental study to investigate the effective outcomes of social skills and friendship quality in the pre- and post-parent-assisted CFT intervention phases; Methods: to conduct a 12-week field session, 30 children with their parents were selected. The Social Skills Improvement System Rating Scales and the Quality of Play Questionnaire-Parent were used to assess the effectiveness of the parent-assisted children’s friendship training during pre- and post-intervention. A semi-structured interview with parents was conducted at the end of the session; Results: findings revealed that intervention improved the social skills of these children. Additionally, the friendship quality of children with ASD improved from pre- to post-intervention, although engagement remained unchanged. Parents also showed some sort of improvement after the session as they reported a heightened sense of fear and resistance, awareness, learning and adjustment, change is not easy, and identifying support; Conclusions: there was clear evidence that children with ASD benefitted from parent-assisted CFTs in terms of social skills and friendship quality. However, larger and controlled studies are required to draw firm conclusions about this kind of intervention.
