Fixed-Effects Modeling of Cohen's Weighted Kappa for Bivariate Multinomial Data: A Perspective of Generalized Inverse

Cohen's kappa and weighted kappa statistics are the conventional methods used frequently in measuring agreement for categorical responses. In this paper, through the perspective of a generalized inverse, we propose an alternative general framework of the fixed-effects modeling of Cohen's weighted kappa, proposed by Yang and Chinchilli (2011). Properties of the proposed method are provided. Small sample performance is investigated through bootstrap simulation studies, which demonstrate good performance of the proposed method. When there are only two categories, the proposed method reduces to Cohen's kappa.

Download Full-text

Fixed-Effects Modeling of Cohen's Kappa for Bivariate Multinomial Data

Communication in Statistics- Theory and Methods ◽

10.1080/03610920802618426 ◽

2009 ◽

Vol 38 (20) ◽

pp. 3634-3653 ◽

Cited By ~ 4

Author(s):

Jingyun Yang ◽

Vernon M. Chinchilli

Keyword(s):

Fixed Effects ◽

Cohen’S Kappa ◽

Multinomial Data ◽

Cohen's Kappa

Download Full-text

Abstract WP276: Simplification of a Prehospital Short NIHSS Scale Does not Increase Interrater Agreement Between Emergency Medical Services and Stroke Specialists

Stroke ◽

10.1161/str.48.suppl_1.wp276 ◽

2017 ◽

Vol 48 (suppl_1) ◽

Author(s):

Jelle Demeestere ◽

Carlos Garcia-Esperon ◽

Longting Lin ◽

Allan Loudfoot ◽

Andrew Bivard ◽

...

Keyword(s):

Emergency Medical Services ◽

Weighted Kappa ◽

Interrater Agreement ◽

Medical Services ◽

Kappa Statistics ◽

Single Centre ◽

Cohen’S Kappa ◽

Cohen's Kappa ◽

Emergency Medical ◽

Patient Arrival

Objective: To assess if simplifying a prehospital 8-item NIHSS scale (NIHSS-8, fig 1) to a 0 (symptom absent) – 1 (symptom present) scoring system increases interrater agreement between emergency medical services (EMS) and stroke specialists. Methods: We analysed interrater agreement between EMS and stroke specialists of a single centre on a prospectively collected cohort of 64 suspected acute ischemic stroke patients. EMS performed NIHSS-8 score upon patient arrival at the emergency department. The stroke specialist scored the full 15-item NIHSS blind to the EMS scores and within 5 minutes of patient arrival. Linear-weighted Cohen’s kappa statistics was used to assess agreement between EMS and stroke specialist on the total NIHSS-8 score and each NIHSS-8 scale item. We then simplified each item to a 0-1 score and reassessed interrater agreement for the overall NIHSS-8 scale using linear-weighted Cohen’s kappa statistics and for each NIHSS-8 item using Cohen’s kappa statistics. We used Cohen’s kappa statistics to assess agreement for original and simplified NIHSS-8 cut-off scores. Results: EMS and stroke specialist reached substantial agreement on overall NIHSS-8 scoring (linear-weighted kappa 0.69). Optimum agreement was reached for right arm weakness (linear-weighted kappa 0.79; Table 1) and a cut-off score of 2 and 5 (Cohen’s kappa 0.78; Table 2). When the score was simplified to a 0-1, overall agreement between EMS and stroke specialists was substantial (linear-weighted kappa 0.65). Optimum agreement was seen for LOC questions (Cohen’s kappa 0.78; Table 1) and a cut-off score of 2 (Cohen’s kappa 0.77; Table 2). Conclusion: Simplifying an 8-item prehospital NIHSS stroke scale does not increase interrater agreement between emergency medical services and stroke specialists.

Download Full-text

New Interpretations of Cohen’s Kappa

Journal of Mathematics ◽

10.1155/2014/203907 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 7

Author(s):

Matthijs J. Warrens

Keyword(s):

Weighted Kappa ◽

Interrater Agreement ◽

Cohen’S Kappa ◽

Association Coefficient ◽

Nominal Scale ◽

Weighted Averages ◽

Cohen's Kappa ◽

Single Number ◽

Research Studies

Cohen’s kappa is a widely used association coefficient for summarizing interrater agreement on a nominal scale. Kappa reduces the ratings of the two observers to a single number. With three or more categories it is more informative to summarize the ratings by category coefficients that describe the information for each category separately. Examples of category coefficients are the sensitivity or specificity of a category or the Bloch-Kraemer weighted kappa. However, in many research studies one is often only interested in a single overall number that roughly summarizes the agreement. It is shown that both the overall observed agreement and Cohen’s kappa are weighted averages of various category coefficients and thus can be used to summarize these category coefficients.

Download Full-text

Learning how to Differ: Agreement and Reliability Statistics in Psychiatry

The Canadian Journal of Psychiatry ◽

10.1177/070674379504000202 ◽

1995 ◽

Vol 40 (2) ◽

pp. 60-66 ◽

Cited By ~ 46

Author(s):

L. Streiner David

Keyword(s):

Correlation Coefficient ◽

Intraclass Correlation Coefficient ◽

Intraclass Correlation ◽

Weighted Kappa ◽

The Other ◽

Cohen’S Kappa ◽

Cohen's Kappa ◽

The Subject

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of different techniques which are available. This paper reviews some of the more commonly used techniques, such as Raw Agreement, Cohen's kappa and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). This paper also shows how the ICC can be used in situations where the other statistics cannot be used and how to select the best subset of raters.

Download Full-text

Adherence to Antiseizure vs Other Medications Among US Medicare Beneficiaries With and Without Epilepsy

Neurology ◽

10.1212/wnl.0000000000013119 ◽

2021 ◽

pp. 10.1212/WNL.0000000000013119

Author(s):

Samuel Waller Terman ◽

Wesley T Kerr ◽

Carole E Aubert ◽

Chloe E Hill ◽

Zachary A Marcum ◽

...

Keyword(s):

Selective Serotonin Reuptake Inhibitors ◽

Correlation Coefficients ◽

International Classification Of Diseases ◽

Kappa Statistics ◽

Spearman Correlation ◽

Medicare Beneficiaries ◽

Cohen’S Kappa ◽

Diagnostic Codes ◽

Cohen's Kappa ◽

Classification Of Diseases

Objective:To 1) compare adherence to antiseizure medications (ASMs) versus non-ASMs among individuals with epilepsy, 2) assess the degree to which variation in adherence is due to differences between individuals versus between medication classes among individuals with epilepsy, and 3) compare adherence in individuals with versus without epilepsy.Methods:This was a retrospective cohort study using Medicare. We included beneficiaries with epilepsy (≥1 ASM, plus International Classification of Diseases, Ninth Revision, Clinical Modification diagnostic codes), and a 20% random sample without epilepsy. Adherence for each medication class was measured by the proportion of days covered (PDC) in 2013-2015. We used Spearman correlation coefficients, Cohen’s kappa statistics, and multilevel logistic regressions.Results:There were 83,819 beneficiaries with epilepsy. Spearman correlation coefficients between ASM PDCs and each of the 5 non-ASM PDCs ranged 0.44-0.50, Cohen’s kappa ranged 0.33-0.38, and within-person differences between each ASM’s PDC minus each non-ASM’s PDC were all statistically significant (p<0.01) though median differences were all very close to 0. Fifty-four percent of variation in adherence across medications was due to differences between individuals. Adjusted predicted probabilities of adherence were: ASMs 74% (95% confidence interval [CI] 73%-74%), proton pump inhibitors 74% (95% CI 74%-74%), antihypertensives 77% (95% CI 77%-78%), selective serotonin reuptake inhibitors 77% (95% CI 77%-78%), statins 78% (95% CI 78%-79%), and levothyroxine 82% (95% CI 81%-82%). Adjusted predicted probabilities of adherence to non-ASMs were 80% (95% CI 80%-81%) for beneficiaries with epilepsy versus 77% (77%-77%) for beneficiaries without epilepsy.Conclusion:Among individuals with epilepsy, ASM and non-ASM adherence were moderately correlated, half of variation in adherence was due to between-person rather than between-medication differences, adjusted adherence was slightly lower for ASMs than several non-ASMs, and epilepsy was associated with a quite small increase in adherence to non-ASMs. Nonadherence to ASMs may provide an important cue to the clinician to inquire about adherence to other potentially life-prolonging medications as well. Although efforts should focus on improving ASM adherence, patient-level rather than purely medication-specific behaviors are also critical to consider when developing interventions to optimize adherence.

Download Full-text

Estimating Scorer Agreement for Nominal Categorization Systems

Educational and Psychological Measurement ◽

10.1177/001316448104100403 ◽

1981 ◽

Vol 41 (4) ◽

pp. 953-962 ◽

Cited By ~ 5

Author(s):

Nancy W. Burton

Keyword(s):

Control Program ◽

Sufficient Information ◽

Educational Objective ◽

National Assessment ◽

Cohen’S Kappa ◽

Quality Control Program ◽

Cohen's Kappa ◽

Potential Problems ◽

Overall Evaluation ◽

Measuring Agreement

National Assessment has devised a scoring system that involves using nominal classifications of responses as well as an overall evaluation of whether responses provide acceptable evidence that an educational objective has been met. This study was concerned with selecting a measure of scorer agreement for an ongoing quality-control program. The purpose of the agreement statistic was to screen out potential problems for further analysis of disagreements. Based on a review of various statistics proposed for measuring agreement, two were chosen for further study: the simple percent of agreement and Cohen's kappa. The analysis showed that Cohen's kappa is extremely sensitive to very easy or very difficult items which comprised more than a third of the measures in this study. The percent of agreement statistic, on the other hand, is overly sensitive to high variance items—those of medium difficulty with many scoring categories. Since kappa is inappropriate for so many items, and since National Assessment can use the disagreement information on many-category items in later analysis, staff concluded that Cohen's kappa does not add sufficient information to make its calculation worthwhile.

Download Full-text

Inter-Coder Agreement for Computational Linguistics

Computational Linguistics ◽

10.1162/coli.07-034-r2 ◽

2008 ◽

Vol 34 (4) ◽

pp. 555-596 ◽

Cited By ~ 357

Author(s):

Ron Artstein ◽

Massimo Poesio

Keyword(s):

Computational Linguistics ◽

Corpus Annotation ◽

Cohen’S Kappa ◽

Cohen's Kappa ◽

Measuring Agreement

This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff's alpha as well as Scott's pi and Cohen's kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks—but that their use makes the interpretation of the value of the coefficient even harder.

Download Full-text

Fixed-effects modeling of Cohen’s weighted kappa for bivariate multinomial data

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2010.08.014 ◽

2011 ◽

Vol 55 (2) ◽

pp. 1061-1070 ◽

Cited By ~ 7

Author(s):

Jingyun Yang ◽

Vernon M. Chinchilli

Keyword(s):

Fixed Effects ◽

Weighted Kappa ◽

Multinomial Data

Download Full-text

Spinal Instability Neoplastic Score (SINS): Reliability Among Spine Fellows and Resident Physicians in Orthopedic Surgery and Neurosurgery

Global Spine Journal ◽

10.1177/2192568217697691 ◽

2017 ◽

Vol 7 (8) ◽

pp. 744-748 ◽

Cited By ~ 24

Author(s):

Shandy Fox ◽

Michael Spiess ◽

Luke Hnenny ◽

Daryl. R. Fourney

Keyword(s):

Bone Quality ◽

Vertebral Body ◽

Bone Lesion ◽

Spinal Instability ◽

Kappa Statistics ◽

Educational Tool ◽

Intraobserver Agreement ◽

Cohen’S Kappa ◽

Perfect Agreement ◽

Cohen's Kappa

Study Design: Reliability analysis. Objectives: The Spinal Instability Neoplastic Score (SINS) was developed for assessing patients with spinal neoplasia. It identifies patients who may benefit from surgical consultation or intervention. It also acts as a prognostic tool for surgical decision making. Reliability of SINS has been established for spine surgeons, radiologists, and radiation oncologists, but not yet among spine surgery trainees. The purpose of our study is to determine the reliability of SINS among spine residents and fellows, and its role as an educational tool. Methods: Twenty-three residents and 2 spine fellows independently scored 30 de-identified spine tumor cases on 2 occasions, at least 6 weeks apart. Intraclass correlation coefficient (ICC) measured interobserver and intraobserver agreement for total SINS scores. Fleiss’s kappa and Cohen’s kappa analysis evaluated interobserver and intraobserver agreement of 6 component subscores (location, pain, bone lesion quality, spinal alignment, vertebral body collapse, and posterolateral involvement of spinal elements). Results: Total SINS scores showed near perfect interobserver (0.990) and intraobserver (0.907) agreement. Fleiss’s kappa statistics revealed near perfect agreement for location; substantial for pain; moderate for alignment, vertebral body collapse, and posterolateral involvement; and fair for bone quality (0.948, 0.739, 0.427, 0.550, 0.435, and 0.382). Cohen’s kappa statistics revealed near perfect agreement for location and pain, substantial for alignment and vertebral body collapse, and moderate for bone quality and posterolateral involvement (0.954, 0.814, 0.610, 0.671, 0.576, and 0.561, respectively). Conclusions: The SINS is a reliable and valuable educational tool for spine fellows and residents learning to judge spinal instability.

Download Full-text

Measuring Agreement for Ordered Ratings in 3 x 3 Tables

Methods of Information in Medicine ◽

10.1055/s-0038-1634116 ◽

2006 ◽

Vol 45 (05) ◽

pp. 541-547 ◽

Cited By ~ 3

Author(s):

P. Aubas ◽

F. Seguret ◽

A. Kramar ◽

P. Dujols ◽

D. Neveu

Keyword(s):

Qualitative Agreement ◽

Kappa Statistic ◽

Weighted Kappa ◽

Kappa Statistics ◽

Data Sets ◽

Kappa Index ◽

Qualitative Variable ◽

Weighted Kappa Statistic ◽

The Difference ◽

Measuring Agreement

Summary Objectives: When two raters consider a qualitative variable ordered according to three categories, the qualitative agreement is commonly assessed with a symmetrically weighted kappa statistic. However, these statistics can present paradoxes, since they may be insensitive to variations of either complete agreements or disagreements. Methods: Agreement may be summarized by the relative amounts of complete agreements, partial and maximal disagreements beyond chance. Fixing the marginal totals and the trace, we computed symmetrically weighted kappa statistics and we developed a new statistic for qualitative agreements. Data sets from the literature were used to illustrate the methods. Results: We show that agreement may be better assessed with the unweighted kappa index, κc, and a new statistic ζ, which assesses the excess of maximal disagreements with respect to the partial ones, and does not depend on a particular weighting system. When ζis equal to zero, maximal and partial disagreements beyond chance are equal. With its estimated large sample variance, we compared the values of two contingency tables. Conclusions: The (κc, ζ) pair is sensitive to variations in agreements and/or disagreements and enables locating the difference between two qualitative agreements. The qualitative agreement is better with increasing values of κc and ζ.

Download Full-text