New Interpretations of Cohen’s Kappa

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Matthijs J. Warrens

Cohen’s kappa is a widely used association coefficient for summarizing interrater agreement on a nominal scale. Kappa reduces the ratings of the two observers to a single number. With three or more categories it is more informative to summarize the ratings by category coefficients that describe the information for each category separately. Examples of category coefficients are the sensitivity or specificity of a category and the Bloch-Kraemer weighted kappa. However, in many research studies one is interested only in a single overall number that roughly summarizes the agreement. It is shown that both the overall observed agreement and Cohen’s kappa are weighted averages of various category coefficients and can thus be used to summarize these category coefficients.
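
One such decomposition is easy to check numerically. Below is a minimal sketch, assuming a hypothetical 3×3 joint-proportion table for two raters: each category coefficient is the kappa of the 2×2 table obtained by collapsing to "category i versus the rest", and the weights are the denominators of those 2×2 kappas. The table and function names are illustrative, not taken from the paper.

```python
# A minimal sketch (not code from the paper): Cohen's kappa recovered as a
# weighted average of per-category 2x2 kappas. The confusion matrix P of
# joint proportions is hypothetical.
import numpy as np

P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.25, 0.05],
              [0.00, 0.05, 0.30]])   # joint proportions for two raters

def cohen_kappa(P):
    po = np.trace(P)                         # observed agreement
    pe = P.sum(axis=1) @ P.sum(axis=0)       # chance agreement from margins
    return (po - pe) / (1 - pe)

def category_kappa(P, i):
    """Kappa of the 2x2 table 'category i vs. the rest', plus its weight."""
    r, c = P.sum(axis=1)[i], P.sum(axis=0)[i]
    po = P[i, i] + (1 - r - c + P[i, i])     # agreement on i plus agreement on not-i
    pe = r * c + (1 - r) * (1 - c)
    return (po - pe) / (1 - pe), 1 - pe      # kappa_i and its denominator as weight

kappas, weights = zip(*(category_kappa(P, i) for i in range(len(P))))
print(cohen_kappa(P), np.average(kappas, weights=weights))  # the two numbers coincide
```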

Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Jelle Demeestere ◽  
Carlos Garcia-Esperon ◽  
Longting Lin ◽  
Allan Loudfoot ◽  
Andrew Bivard ◽  
...  

Objective: To assess whether simplifying a prehospital 8-item NIHSS scale (NIHSS-8, fig 1) to a 0 (symptom absent) – 1 (symptom present) scoring system increases interrater agreement between emergency medical services (EMS) and stroke specialists. Methods: We analysed interrater agreement between EMS and stroke specialists of a single centre in a prospectively collected cohort of 64 suspected acute ischemic stroke patients. EMS performed the NIHSS-8 scoring upon patient arrival at the emergency department. The stroke specialist scored the full 15-item NIHSS blind to the EMS scores and within 5 minutes of patient arrival. The linear-weighted Cohen’s kappa statistic was used to assess agreement between EMS and the stroke specialist on the total NIHSS-8 score and on each NIHSS-8 scale item. We then simplified each item to a 0-1 score and reassessed interrater agreement, using the linear-weighted Cohen’s kappa for the overall NIHSS-8 scale and Cohen’s kappa for each NIHSS-8 item. We also used Cohen’s kappa to assess agreement for original and simplified NIHSS-8 cut-off scores. Results: EMS and the stroke specialist reached substantial agreement on overall NIHSS-8 scoring (linear-weighted kappa 0.69). Optimum agreement was reached for right arm weakness (linear-weighted kappa 0.79; Table 1) and for cut-off scores of 2 and 5 (Cohen’s kappa 0.78; Table 2). When the score was simplified to 0-1 items, overall agreement between EMS and stroke specialists was substantial (linear-weighted kappa 0.65). Optimum agreement was seen for the LOC questions (Cohen’s kappa 0.78; Table 1) and for a cut-off score of 2 (Cohen’s kappa 0.77; Table 2). Conclusion: Simplifying an 8-item prehospital NIHSS stroke scale does not increase interrater agreement between emergency medical services and stroke specialists.
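
The study's two agreement statistics are both standard. Below is a minimal sketch on synthetic ratings (the trial data are not public); the score vectors and the cut-off of 5 are purely illustrative.

```python
# A minimal sketch of the agreement statistics used in the study, run on
# hypothetical paired scores. weights='linear' gives the linear-weighted
# kappa used for ordinal NIHSS-8 totals; weights=None (default) gives plain
# Cohen's kappa, as used for the dichotomised cut-off scores.
from sklearn.metrics import cohen_kappa_score

ems        = [4, 7, 2, 10, 5, 3, 8, 6, 1, 9]   # hypothetical EMS NIHSS-8 totals
specialist = [5, 7, 3, 9, 5, 2, 8, 7, 1, 10]   # hypothetical specialist totals

kw = cohen_kappa_score(ems, specialist, weights="linear")
print(f"linear-weighted kappa: {kw:.2f}")

# Agreement on a dichotomised cut-off (>= 5 here, purely illustrative):
k_cut = cohen_kappa_score([s >= 5 for s in ems], [s >= 5 for s in specialist])
print(f"kappa at cut-off 5: {k_cut:.2f}")
```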


2012 ◽  
Vol 75 (8) ◽  
pp. 1536-1541 ◽  
Author(s):  
E. BOZZETTA ◽  
M. PEZZOLATO ◽  
E. CENCETTI ◽  
K. VARELLO ◽  
F. ABRAMO ◽  
...  

Selling fish products as fresh when they have actually been frozen and thawed is a common fraudulent practice in seafood retailing. Unlike fish products frozen to protect them against degenerative changes during transportation and to extend the product's storage life, fish intended for raw consumption in European countries must be previously frozen at −20°C for at least 24 h to kill parasites. The aim of this study was to use histological analysis to distinguish between fresh and frozen-thawed fish and to evaluate this method for use as a routine screening technique in compliance with the requirements of European Commission Regulation No. 882/2004 on official food and feed controls. Method performance (i.e., accuracy and precision) was evaluated on tissue samples from three common Mediterranean fish species; the evaluation was subsequently extended to include samples from 35 fish species in a second experiment to test for method robustness. Method accuracy was tested by comparing histological results against a “gold standard” obtained from the analysis of frozen and unfrozen fish samples prepared for the study. Method precision was evaluated according to interrater agreement (i.e., three laboratories with expertise in histopathology in the first experiment and three expert analysts in the second experiment) by estimating Cohen's kappa (and corresponding 95% confidence intervals) for each pair of laboratories and experts and the combined Cohen's kappa for all three experts and laboratories. The observed interrater agreement among the three laboratories and the three experts indicated high levels of method accuracy and precision (high sensitivity and specificity) and method reproducibility. Our results suggest that histology is a rapid, simple, and highly accurate method for distinguishing between fresh and frozen-thawed fish, regardless of the fish species analyzed.
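
The precision analysis pairs up raters and estimates a kappa with a confidence interval for each pair. Here is a hedged sketch of that scheme, not the authors' code: three simulated raters make binary fresh/frozen-thawed calls against a simulated gold standard, and each pairwise kappa gets a bootstrap 95% CI (the paper's CI method is not specified in the abstract, so the bootstrap is an assumption).

```python
# A hedged sketch of pairwise interrater agreement among three raters:
# Cohen's kappa per pair with a percentile-bootstrap 95% CI.
# All data are simulated (fresh = 0, frozen-thawed = 1).
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=80)                  # simulated gold standard
raters = {f"lab{j}": np.where(rng.random(80) < 0.9,  # each lab ~90% accurate
                              truth, 1 - truth)
          for j in range(1, 4)}

for a, b in combinations(raters, 2):
    x, y = raters[a], raters[b]
    boot = [cohen_kappa_score(x[idx], y[idx])
            for idx in (rng.integers(0, len(x), len(x)) for _ in range(2000))]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"{a} vs {b}: kappa={cohen_kappa_score(x, y):.2f} "
          f"(95% CI {lo:.2f}-{hi:.2f})")
```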


1995 ◽  
Vol 40 (2) ◽  
pp. 60-66 ◽  
Author(s):  
David L. Streiner

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of available techniques. This paper reviews some of the more commonly used ones, such as raw agreement, Cohen's kappa, and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). It also shows how the ICC can be used in situations where the other statistics cannot, and how to select the best subset of raters.
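
As a companion to the review, here is a minimal sketch of one common ICC variant, Shrout and Fleiss's ICC(2,1) (two-way random effects, absolute agreement, single rater), computed directly from the two-way ANOVA mean squares. The rating matrix is hypothetical, and the paper itself may emphasise other ICC forms.

```python
# A minimal sketch of ICC(2,1) from the two-way ANOVA decomposition.
# Rows of X are subjects, columns are raters; the values are hypothetical.
import numpy as np

X = np.array([[ 9, 2, 5, 8],
              [ 6, 1, 3, 2],
              [ 8, 4, 6, 8],
              [ 7, 1, 2, 6],
              [10, 5, 6, 9],
              [ 6, 2, 4, 7]], dtype=float)
n, k = X.shape
grand = X.mean()

ms_rows = k * np.sum((X.mean(axis=1) - grand) ** 2) / (n - 1)   # between subjects
ms_cols = n * np.sum((X.mean(axis=0) - grand) ** 2) / (k - 1)   # between raters
ss_err = np.sum((X - X.mean(axis=1, keepdims=True)
                   - X.mean(axis=0) + grand) ** 2)               # residual SS
ms_err = ss_err / ((n - 1) * (k - 1))

icc21 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                              + k * (ms_cols - ms_err) / n)
print(f"ICC(2,1) = {icc21:.2f}")
```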


2012 ◽  
Vol 2012 ◽  
pp. 1-11
Author(s):  
Matthijs J. Warrens

Cohen’s kappa is a popular descriptive statistic for summarizing agreement between the classifications of two raters on a nominal scale. With m≥3 raters there are several views in the literature on how to define agreement. The concept of g-agreement (g∈{2,3,…,m}) refers to the situation in which it is decided that there is agreement if g out of m raters assign an object to the same category. Given m≥2 raters we can formulate m−1 multirater kappas, one based on 2-agreement, one based on 3-agreement, and so on, up to one based on m-agreement. It is shown that if the scale consists of only two categories the multirater kappas based on 2-agreement and 3-agreement are identical.
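
To make g-agreement concrete, here is a heavily hedged sketch: for each object, the observed g-agreement is the fraction of g-subsets of raters that all chose the same category, averaged over objects, and chance agreement uses overall category proportions (a Fleiss-style simplification; the paper's exact chance correction may differ, so this illustrates the counting, not the theorem).

```python
# A hedged sketch of g-agreement kappas for m raters on a nominal scale.
# P_o(g): fraction of agreeing g-subsets of raters, averaged over objects.
# P_e(g): chance that g independent raters agree, using pooled category
# proportions (a simplification relative to the paper). Data hypothetical.
from math import comb
import numpy as np

ratings = np.array([[0, 0, 1, 0],    # rows: objects, columns: m = 4 raters
                    [1, 1, 1, 1],
                    [0, 1, 0, 0],
                    [1, 1, 0, 1],
                    [0, 0, 0, 0]])
m = ratings.shape[1]
cats, counts = np.unique(ratings, return_counts=True)
p = counts / counts.sum()                        # pooled category proportions

def kappa_g(g):
    po = np.mean([sum(comb(int(np.sum(row == c)), g) for c in cats) / comb(m, g)
                  for row in ratings])           # agreeing g-subsets per object
    pe = np.sum(p ** g)                          # chance of g raters agreeing
    return (po - pe) / (1 - pe)

for g in range(2, m + 1):
    print(f"{g}-agreement kappa: {kappa_g(g):.3f}")
```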


2019 ◽  
Vol 79 (3) ◽  
pp. 558-576 ◽  
Author(s):  
Alexandra De Raadt ◽  
Matthijs J. Warrens ◽  
Roel J. Bosker ◽  
Henk A. L. Kiers

Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms, namely missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
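
Two of the compared variants are straightforward to implement. The sketch below, on hypothetical ratings with None marking a missing value, contrasts listwise deletion (the recommended option) with treating "missing" as a regular extra category (the heavily biased option); the third variant from Gwet's handbook is not reproduced here.

```python
# A minimal sketch of two missing-data kappa variants compared in the paper.
from sklearn.metrics import cohen_kappa_score

r1 = [0, 1, 2, None, 1, 2, 0, 1, None, 2, 0, 1]   # hypothetical ratings, rater 1
r2 = [0, 1, 1, 2, None, 2, 0, 0, 1, 2, 0, 1]      # hypothetical ratings, rater 2

# Variant 1: listwise deletion drops every unit with at least one missing rating.
x, y = zip(*[(a, b) for a, b in zip(r1, r2)
             if a is not None and b is not None])
print("listwise deletion:  ", cohen_kappa_score(x, y))

# Variant 2: treat a missing rating as a regular category (biased per the paper).
def fill(r, code=9):
    # replace None by an explicit 'missing' category code
    return [code if v is None else v for v in r]

print("missing as category:", cohen_kappa_score(fill(r1), fill(r2)))
```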


2011 ◽  
Vol 2011 ◽  
pp. 1-14
Author(s):  
Jingyun Yang ◽  
Vernon M. Chinchilli

Cohen's kappa and weighted kappa statistics are the conventional methods frequently used for measuring agreement on categorical responses. In this paper we take the perspective of a generalized inverse to propose an alternative general framework for the fixed-effects modeling of Cohen's weighted kappa introduced by Yang and Chinchilli (2011). Properties of the proposed method are provided. Small-sample performance is investigated through bootstrap simulation studies, which demonstrate good performance of the proposed method. When there are only two categories, the proposed method reduces to Cohen's kappa.
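
For reference, the weighted kappa that this framework builds on can be written with an explicit disagreement-weight matrix. The sketch below shows that general form on a hypothetical table; the fixed-effects and generalized-inverse machinery of the paper itself is not reproduced.

```python
# A minimal sketch of Cohen's weighted kappa with an explicit matrix W of
# disagreement weights: kappa_w = 1 - sum(W*P) / sum(W*E). Joint-proportion
# table P is hypothetical; W here uses linear (absolute-distance) weights.
import numpy as np

P = np.array([[0.25, 0.05, 0.00],
              [0.05, 0.30, 0.05],
              [0.00, 0.05, 0.25]])            # hypothetical joint proportions

c = len(P)
W = np.abs(np.subtract.outer(np.arange(c), np.arange(c)))  # |i - j| weights

E = np.outer(P.sum(axis=1), P.sum(axis=0))    # expected proportions under chance
kappa_w = 1 - np.sum(W * P) / np.sum(W * E)
print(f"weighted kappa (linear weights): {kappa_w:.3f}")
# With W = 0 on the diagonal and 1 elsewhere, this reduces to Cohen's kappa.
```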


2020 ◽  
Vol 83 (5) ◽  
pp. 493-499
Author(s):  
Eva Isaksson ◽  
Per Wester ◽  
Ann Charlotte Laska ◽  
Per Näsman ◽  
Erik Lundström

Introduction: The modified Rankin scale (mRS) is the most common assessment tool for measuring overall functional outcome in stroke studies. The traditional face-to-face administration of the mRS is time-consuming and costly. The aim of this study was to test the validity of the Swedish translation of the simplified modified Rankin scale questionnaire (smRSq) against the mRS assessed face-to-face 6 months after a stroke. Methods: Within the ongoing EFFECTS trial, the smRSq was sent out to 108 consecutive stroke patients 6 months after a stroke. The majority of patients, 90% (97/108), answered the questionnaire themselves; for the remaining 10% it was answered by the next of kin. The patients were assessed face-to-face with the mRS by 7 certified healthcare professionals at 4 Swedish stroke centres. The primary outcome was assessed by Cohen’s kappa and weighted kappa. Results: There was good agreement between the postal smRSq, answered by the patients, and the face-to-face mRS; Cohen’s kappa was 0.43 (95% CI 0.31–0.55), weighted kappa was 0.64 (95% CI 0.55–0.73), and the Spearman rank correlation was 0.82 (p < 0.0001). In 55% (59/108) there was full agreement; of the 49 patients not showing exact agreement, 44 differed by 1 grade and 5 by 2 grades. Discussion/Conclusion: Our results show good validity of the postal smRSq, answered by the patients, compared with the mRS carried out face-to-face at 6 months after a stroke. This could help trialists simplify study designs and make multicentre trials and quality registers with large numbers of patients more feasible and time-saving.
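
The validity statistics reported here combine several of the measures discussed above. A minimal sketch on hypothetical paired grades (the trial data are not public) shows how all four numbers are obtained from one pair of rating vectors:

```python
# A minimal sketch of the reported validity statistics on hypothetical
# paired mRS/smRSq grades (0-5 scale): kappa, linear-weighted kappa,
# Spearman rank correlation, and the exact-agreement proportion.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

smrsq = np.array([0, 1, 2, 3, 2, 4, 1, 0, 3, 5, 2, 1])  # hypothetical postal scores
mrs   = np.array([0, 1, 2, 2, 2, 4, 2, 0, 3, 4, 2, 1])  # hypothetical face-to-face mRS

print("kappa:          ", cohen_kappa_score(smrsq, mrs))
print("weighted kappa: ", cohen_kappa_score(smrsq, mrs, weights="linear"))
print("spearman rho:   ", spearmanr(smrsq, mrs).correlation)
print("exact agreement:", np.mean(smrsq == mrs))
```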

