New Interpretations of Cohen’s Kappa

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Matthijs J. Warrens

Cohen’s kappa is a widely used association coefficient for summarizing interrater agreement on a nominal scale. Kappa reduces the ratings of the two observers to a single number. With three or more categories it is more informative to summarize the ratings by category coefficients that describe the information for each category separately. Examples of category coefficients are the sensitivity or specificity of a category and the Bloch-Kraemer weighted kappa. However, in many research studies one is interested only in a single overall number that roughly summarizes the agreement. It is shown that both the overall observed agreement and Cohen’s kappa are weighted averages of various category coefficients and can thus be used to summarize these category coefficients.
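
One such decomposition is easy to check numerically. Below is a minimal sketch, assuming a hypothetical 3×3 joint-proportion table for two raters: each category coefficient is the kappa of the 2×2 table obtained by collapsing to "category i versus the rest", and the weights are the denominators of those 2×2 kappas. The table and function names are illustrative, not taken from the paper.

```python
# A minimal sketch (not code from the paper): Cohen's kappa recovered as a
# weighted average of per-category 2x2 kappas. The confusion matrix P of
# joint proportions is hypothetical.
import numpy as np

P = np.array([[0.20, 0.05, 0.05],
              [0.05, 0.25, 0.05],
              [0.00, 0.05, 0.30]])   # joint proportions for two raters

def cohen_kappa(P):
    po = np.trace(P)                         # observed agreement
    pe = P.sum(axis=1) @ P.sum(axis=0)       # chance agreement from margins
    return (po - pe) / (1 - pe)

def category_kappa(P, i):
    """Kappa of the 2x2 table 'category i vs. the rest', plus its weight."""
    r, c = P.sum(axis=1)[i], P.sum(axis=0)[i]
    po = P[i, i] + (1 - r - c + P[i, i])     # agreement on i plus agreement on not-i
    pe = r * c + (1 - r) * (1 - c)
    return (po - pe) / (1 - pe), 1 - pe      # kappa_i and its denominator as weight

kappas, weights = zip(*(category_kappa(P, i) for i in range(len(P))))
print(cohen_kappa(P), np.average(kappas, weights=weights))  # the two numbers coincide
```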

Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Jelle Demeestere ◽  
Carlos Garcia-Esperon ◽  
Longting Lin ◽  
Allan Loudfoot ◽  
Andrew Bivard ◽  
...  

Objective: To assess whether simplifying a prehospital 8-item NIHSS scale (NIHSS-8, fig 1) to a 0 (symptom absent) – 1 (symptom present) scoring system increases interrater agreement between emergency medical services (EMS) and stroke specialists. Methods: We analysed interrater agreement between EMS and stroke specialists of a single centre in a prospectively collected cohort of 64 suspected acute ischemic stroke patients. EMS performed the NIHSS-8 scoring upon patient arrival at the emergency department. The stroke specialist scored the full 15-item NIHSS blind to the EMS scores and within 5 minutes of patient arrival. The linear-weighted Cohen’s kappa statistic was used to assess agreement between EMS and the stroke specialist on the total NIHSS-8 score and on each NIHSS-8 scale item. We then simplified each item to a 0-1 score and reassessed interrater agreement, using the linear-weighted Cohen’s kappa for the overall NIHSS-8 scale and Cohen’s kappa for each NIHSS-8 item. We also used Cohen’s kappa to assess agreement for original and simplified NIHSS-8 cut-off scores. Results: EMS and the stroke specialist reached substantial agreement on overall NIHSS-8 scoring (linear-weighted kappa 0.69). Optimum agreement was reached for right arm weakness (linear-weighted kappa 0.79; Table 1) and for cut-off scores of 2 and 5 (Cohen’s kappa 0.78; Table 2). When the score was simplified to 0-1 items, overall agreement between EMS and stroke specialists was substantial (linear-weighted kappa 0.65). Optimum agreement was seen for the LOC questions (Cohen’s kappa 0.78; Table 1) and for a cut-off score of 2 (Cohen’s kappa 0.77; Table 2). Conclusion: Simplifying an 8-item prehospital NIHSS stroke scale does not increase interrater agreement between emergency medical services and stroke specialists.
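
The study's two agreement statistics are both standard. Below is a minimal sketch on synthetic ratings (the trial data are not public); the score vectors and the cut-off of 5 are purely illustrative.

```python
# A minimal sketch of the agreement statistics used in the study, run on
# hypothetical paired scores. weights='linear' gives the linear-weighted
# kappa used for ordinal NIHSS-8 totals; weights=None (default) gives plain
# Cohen's kappa, as used for the dichotomised cut-off scores.
from sklearn.metrics import cohen_kappa_score

ems        = [4, 7, 2, 10, 5, 3, 8, 6, 1, 9]   # hypothetical EMS NIHSS-8 totals
specialist = [5, 7, 3, 9, 5, 2, 8, 7, 1, 10]   # hypothetical specialist totals

kw = cohen_kappa_score(ems, specialist, weights="linear")
print(f"linear-weighted kappa: {kw:.2f}")

# Agreement on a dichotomised cut-off (>= 5 here, purely illustrative):
k_cut = cohen_kappa_score([s >= 5 for s in ems], [s >= 5 for s in specialist])
print(f"kappa at cut-off 5: {k_cut:.2f}")
```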


2012 ◽  
Vol 75 (8) ◽  
pp. 1536-1541 ◽  
Author(s):  
E. BOZZETTA ◽  
M. PEZZOLATO ◽  
E. CENCETTI ◽  
K. VARELLO ◽  
F. ABRAMO ◽  
...  

Selling fish products as fresh when they have actually been frozen and thawed is a common fraudulent practice in seafood retailing. Unlike fish products frozen to protect them against degenerative changes during transportation and to extend the product's storage life, fish intended for raw consumption in European countries must be previously frozen at −20°C for at least 24 h to kill parasites. The aim of this study was to use histological analysis to distinguish between fresh and frozen-thawed fish and to evaluate this method for use as a routine screening technique in compliance with the requirements of European Commission Regulation No. 882/2004 on official food and feed controls. Method performance (i.e., accuracy and precision) was evaluated on tissue samples from three common Mediterranean fish species; the evaluation was subsequently extended to include samples from 35 fish species in a second experiment to test for method robustness. Method accuracy was tested by comparing histological results against a “gold standard” obtained from the analysis of frozen and unfrozen fish samples prepared for the study. Method precision was evaluated according to interrater agreement (i.e., three laboratories with expertise in histopathology in the first experiment and three expert analysts in the second experiment) by estimating Cohen's kappa (and corresponding 95% confidence intervals) for each pair of laboratories and experts and the combined Cohen's kappa for all three experts and laboratories. The observed interrater agreement among the three laboratories and the three experts indicated high levels of method accuracy and precision (high sensitivity and specificity) and method reproducibility. Our results suggest that histology is a rapid, simple, and highly accurate method for distinguishing between fresh and frozen-thawed fish, regardless of the fish species analyzed.
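
The precision analysis pairs up raters and estimates a kappa with a confidence interval for each pair. Here is a hedged sketch of that scheme, not the authors' code: three simulated raters make binary fresh/frozen-thawed calls against a simulated gold standard, and each pairwise kappa gets a bootstrap 95% CI (the paper's CI method is not specified in the abstract, so the bootstrap is an assumption).

```python
# A hedged sketch of pairwise interrater agreement among three raters:
# Cohen's kappa per pair with a percentile-bootstrap 95% CI.
# All data are simulated (fresh = 0, frozen-thawed = 1).
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=80)                  # simulated gold standard
raters = {f"lab{j}": np.where(rng.random(80) < 0.9,  # each lab ~90% accurate
                              truth, 1 - truth)
          for j in range(1, 4)}

for a, b in combinations(raters, 2):
    x, y = raters[a], raters[b]
    boot = [cohen_kappa_score(x[idx], y[idx])
            for idx in (rng.integers(0, len(x), len(x)) for _ in range(2000))]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"{a} vs {b}: kappa={cohen_kappa_score(x, y):.2f} "
          f"(95% CI {lo:.2f}-{hi:.2f})")
```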


1995 ◽  
Vol 40 (2) ◽  
pp. 60-66 ◽  
Author(s):  
David L. Streiner

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of available techniques. This paper reviews some of the more commonly used ones, such as raw agreement, Cohen's kappa, and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). It also shows how the ICC can be used in situations where the other statistics cannot, and how to select the best subset of raters.
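
As a companion to the review, here is a minimal sketch of one common ICC variant, Shrout and Fleiss's ICC(2,1) (two-way random effects, absolute agreement, single rater), computed directly from the two-way ANOVA mean squares. The rating matrix is hypothetical, and the paper itself may emphasise other ICC forms.

```python
# A minimal sketch of ICC(2,1) from the two-way ANOVA decomposition.
# Rows of X are subjects, columns are raters; the values are hypothetical.
import numpy as np

X = np.array([[ 9, 2, 5, 8],
              [ 6, 1, 3, 2],
              [ 8, 4, 6, 8],
              [ 7, 1, 2, 6],
              [10, 5, 6, 9],
              [ 6, 2, 4, 7]], dtype=float)
n, k = X.shape
grand = X.mean()

ms_rows = k * np.sum((X.mean(axis=1) - grand) ** 2) / (n - 1)   # between subjects
ms_cols = n * np.sum((X.mean(axis=0) - grand) ** 2) / (k - 1)   # between raters
ss_err = np.sum((X - X.mean(axis=1, keepdims=True)
                   - X.mean(axis=0) + grand) ** 2)               # residual SS
ms_err = ss_err / ((n - 1) * (k - 1))

icc21 = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                              + k * (ms_cols - ms_err) / n)
print(f"ICC(2,1) = {icc21:.2f}")
```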


2012 ◽  
Vol 2012 ◽  
pp. 1-11
Author(s):  
Matthijs J. Warrens

Cohen’s kappa is a popular descriptive statistic for summarizing agreement between the classifications of two raters on a nominal scale. With m≥3 raters there are several views in the literature on how to define agreement. The concept of g-agreement (g∈{2,3,…,m}) refers to the situation in which it is decided that there is agreement if g out of m raters assign an object to the same category. Given m≥2 raters we can formulate m−1 multirater kappas, one based on 2-agreement, one based on 3-agreement, and so on, up to one based on m-agreement. It is shown that if the scale consists of only two categories the multirater kappas based on 2-agreement and 3-agreement are identical.
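
To make g-agreement concrete, here is a heavily hedged sketch: for each object, the observed g-agreement is the fraction of g-subsets of raters that all chose the same category, averaged over objects, and chance agreement uses overall category proportions (a Fleiss-style simplification; the paper's exact chance correction may differ, so this illustrates the counting, not the theorem).

```python
# A hedged sketch of g-agreement kappas for m raters on a nominal scale.
# P_o(g): fraction of agreeing g-subsets of raters, averaged over objects.
# P_e(g): chance that g independent raters agree, using pooled category
# proportions (a simplification relative to the paper). Data hypothetical.
from math import comb
import numpy as np

ratings = np.array([[0, 0, 1, 0],    # rows: objects, columns: m = 4 raters
                    [1, 1, 1, 1],
                    [0, 1, 0, 0],
                    [1, 1, 0, 1],
                    [0, 0, 0, 0]])
m = ratings.shape[1]
cats, counts = np.unique(ratings, return_counts=True)
p = counts / counts.sum()                        # pooled category proportions

def kappa_g(g):
    po = np.mean([sum(comb(int(np.sum(row == c)), g) for c in cats) / comb(m, g)
                  for row in ratings])           # agreeing g-subsets per object
    pe = np.sum(p ** g)                          # chance of g raters agreeing
    return (po - pe) / (1 - pe)

for g in range(2, m + 1):
    print(f"{g}-agreement kappa: {kappa_g(g):.3f}")
```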


2019 ◽  
Vol 79 (3) ◽  
pp. 558-576 ◽  
Author(s):  
Alexandra De Raadt ◽  
Matthijs J. Warrens ◽  
Roel J. Bosker ◽  
Henk A. L. Kiers

Cohen’s kappa coefficient is commonly used for assessing agreement between classifications of two raters on a nominal scale. Three variants of Cohen’s kappa that can handle missing data are presented. Data are considered missing if one or both ratings of a unit are missing. We study how well the variants estimate the kappa value for complete data under two missing data mechanisms, namely missingness completely at random and a form of missingness not at random. The kappa coefficient considered in Gwet (Handbook of Inter-rater Reliability, 4th ed.) and the kappa coefficient based on listwise deletion of units with missing ratings were found to have virtually no bias and mean squared error if missingness is completely at random, and small bias and mean squared error if missingness is not at random. Furthermore, the kappa coefficient that treats missing ratings as a regular category appears to be rather heavily biased and has a substantial mean squared error in many of the simulations. Because it performs well and is easy to compute, we recommend using the kappa coefficient based on listwise deletion of missing ratings if it can be assumed that missingness is completely at random or not at random.
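
Two of the compared variants are straightforward to implement. The sketch below, on hypothetical ratings with None marking a missing value, contrasts listwise deletion (the recommended option) with treating "missing" as a regular extra category (the heavily biased option); the third variant from Gwet's handbook is not reproduced here.

```python
# A minimal sketch of two missing-data kappa variants compared in the paper.
from sklearn.metrics import cohen_kappa_score

r1 = [0, 1, 2, None, 1, 2, 0, 1, None, 2, 0, 1]   # hypothetical ratings, rater 1
r2 = [0, 1, 1, 2, None, 2, 0, 0, 1, 2, 0, 1]      # hypothetical ratings, rater 2

# Variant 1: listwise deletion drops every unit with at least one missing rating.
x, y = zip(*[(a, b) for a, b in zip(r1, r2)
             if a is not None and b is not None])
print("listwise deletion:  ", cohen_kappa_score(x, y))

# Variant 2: treat a missing rating as a regular category (biased per the paper).
def fill(r, code=9):
    # replace None by an explicit 'missing' category code
    return [code if v is None else v for v in r]

print("missing as category:", cohen_kappa_score(fill(r1), fill(r2)))
```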


2011 ◽  
Vol 2011 ◽  
pp. 1-14
Author(s):  
Jingyun Yang ◽  
Vernon M. Chinchilli

Cohen's kappa and weighted kappa statistics are the conventional methods frequently used for measuring agreement on categorical responses. In this paper we take the perspective of a generalized inverse to propose an alternative general framework for the fixed-effects modeling of Cohen's weighted kappa introduced by Yang and Chinchilli (2011). Properties of the proposed method are provided. Small-sample performance is investigated through bootstrap simulation studies, which demonstrate good performance of the proposed method. When there are only two categories, the proposed method reduces to Cohen's kappa.
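
For reference, the weighted kappa that this framework builds on can be written with an explicit disagreement-weight matrix. The sketch below shows that general form on a hypothetical table; the fixed-effects and generalized-inverse machinery of the paper itself is not reproduced.

```python
# A minimal sketch of Cohen's weighted kappa with an explicit matrix W of
# disagreement weights: kappa_w = 1 - sum(W*P) / sum(W*E). Joint-proportion
# table P is hypothetical; W here uses linear (absolute-distance) weights.
import numpy as np

P = np.array([[0.25, 0.05, 0.00],
              [0.05, 0.30, 0.05],
              [0.00, 0.05, 0.25]])            # hypothetical joint proportions

c = len(P)
W = np.abs(np.subtract.outer(np.arange(c), np.arange(c)))  # |i - j| weights

E = np.outer(P.sum(axis=1), P.sum(axis=0))    # expected proportions under chance
kappa_w = 1 - np.sum(W * P) / np.sum(W * E)
print(f"weighted kappa (linear weights): {kappa_w:.3f}")
# With W = 0 on the diagonal and 1 elsewhere, this reduces to Cohen's kappa.
```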


2020 ◽  
Vol 83 (5) ◽  
pp. 493-499
Author(s):  
Eva Isaksson ◽  
Per Wester ◽  
Ann Charlotte Laska ◽  
Per Näsman ◽  
Erik Lundström

Introduction: The modified Rankin scale (mRS) is the most common assessment tool for measuring overall functional outcome in stroke studies. The traditional face-to-face administration of the mRS is time-consuming and costly. The aim of this study was to test the validity of the Swedish translation of the simplified modified Rankin scale questionnaire (smRSq) against the mRS assessed face-to-face 6 months after a stroke. Methods: Within the ongoing EFFECTS trial, the smRSq was sent out to 108 consecutive stroke patients 6 months after a stroke. The majority of patients, 90% (97/108), answered the questionnaire themselves; for the remaining 10% it was answered by the next of kin. The patients were assessed face-to-face with the mRS by 7 certified healthcare professionals at 4 Swedish stroke centres. The primary outcome was assessed by Cohen’s kappa and weighted kappa. Results: There was good agreement between the postal smRSq, answered by the patients, and the face-to-face mRS; Cohen’s kappa was 0.43 (95% CI 0.31–0.55), weighted kappa was 0.64 (95% CI 0.55–0.73), and the Spearman rank correlation was 0.82 (p < 0.0001). In 55% (59/108) there was full agreement; of the 49 patients not showing exact agreement, 44 differed by 1 grade and 5 by 2 grades. Discussion/Conclusion: Our results show good validity of the postal smRSq, answered by the patients, compared with the mRS carried out face-to-face at 6 months after a stroke. This could help trialists simplify study designs and make multicentre trials and quality registers with large numbers of patients more feasible and time-saving.
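
The validity statistics reported here combine several of the measures discussed above. A minimal sketch on hypothetical paired grades (the trial data are not public) shows how all four numbers are obtained from one pair of rating vectors:

```python
# A minimal sketch of the reported validity statistics on hypothetical
# paired mRS/smRSq grades (0-5 scale): kappa, linear-weighted kappa,
# Spearman rank correlation, and the exact-agreement proportion.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

smrsq = np.array([0, 1, 2, 3, 2, 4, 1, 0, 3, 5, 2, 1])  # hypothetical postal scores
mrs   = np.array([0, 1, 2, 2, 2, 4, 2, 0, 3, 4, 2, 1])  # hypothetical face-to-face mRS

print("kappa:          ", cohen_kappa_score(smrsq, mrs))
print("weighted kappa: ", cohen_kappa_score(smrsq, mrs, weights="linear"))
print("spearman rho:   ", spearmanr(smrsq, mrs).correlation)
print("exact agreement:", np.mean(smrsq == mrs))
```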

