Interrater Agreement Measures: Comments on Kappan, Cohen's Kappa, Scott's π, and Aickin's α

Cohen’s kappa is a widely used association coefficient for summarizing interrater agreement on a nominal scale. Kappa reduces the ratings of the two observers to a single number. With three or more categories it is more informative to summarize the ratings by category coefficients that describe the information for each category separately. Examples of category coefficients are the sensitivity or specificity of a category or the Bloch-Kraemer weighted kappa. However, in many research studies one is often only interested in a single overall number that roughly summarizes the agreement. It is shown that both the overall observed agreement and Cohen’s kappa are weighted averages of various category coefficients and thus can be used to summarize these category coefficients.
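The weighted-average claim can be checked numerically. What follows is a minimal sketch, not the paper's own derivation: it assumes one particular category coefficient (the kappa of the 2×2 table obtained by collapsing all other categories into "rest") and uses that coefficient's denominator as its weight. The function names and the example table are hypothetical.

```python
def marginals(table):
    """Return total count, row proportions, and column proportions."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    return n, row_p, col_p


def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    n, row_p, col_p = marginals(table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_p[i] * col_p[i] for i in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


def category_kappas(table):
    """Return (kappa_i, weight_i) for each category i.

    kappa_i is the kappa of the 2x2 table "category i vs. rest";
    weight_i is the denominator of kappa_i (an assumption of this sketch).
    """
    n, row_p, col_p = marginals(table)
    result = []
    for i in range(len(table)):
        p_ii = table[i][i] / n
        numerator = 2 * (p_ii - row_p[i] * col_p[i])
        denominator = row_p[i] + col_p[i] - 2 * row_p[i] * col_p[i]
        result.append((numerator / denominator, denominator))
    return result


# Hypothetical agreement table for two raters and three categories.
table = [[20, 5, 0],
         [3, 15, 2],
         [1, 4, 10]]

overall = cohen_kappa(table)
pairs = category_kappas(table)
weighted = sum(k_i * w_i for k_i, w_i in pairs) / sum(w_i for _, w_i in pairs)
print(overall, weighted)
```

Under these assumptions the two printed values coincide, because the category numerators sum to twice the overall numerator and the category denominators sum to twice the overall denominator.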

Histology as a Valid and Reliable Tool To Differentiate Fresh from Frozen-Thawed Fish

Journal of Food Protection ◽

10.4315/0362-028x.jfp-12-035 ◽

2012 ◽

Vol 75 (8) ◽

pp. 1536-1541 ◽

Cited By ~ 9

Author(s):

E. BOZZETTA ◽

M. PEZZOLATO ◽

E. CENCETTI ◽

K. VARELLO ◽

F. ABRAMO ◽

...

Keyword(s):

Fish Species ◽

Interrater Agreement ◽

Method Performance ◽

Tissue Samples ◽

Fish Products ◽

Cohen's Kappa ◽

Accuracy And Precision ◽

Fish Samples ◽

Method Accuracy

Selling fish products as fresh when they have actually been frozen and thawed is a common fraudulent practice in seafood retailing. Unlike fish products frozen to protect them against degenerative changes during transportation and to extend the product's storage life, fish intended for raw consumption in European countries must be previously frozen at −20°C for at least 24 h to kill parasites. The aim of this study was to use histological analysis to distinguish between fresh and frozen-thawed fish and to evaluate this method for use as a routine screening technique in compliance with the requirements of European Commission Regulation No. 882/2004 on official food and feed controls. Method performance (i.e., accuracy and precision) was evaluated on tissue samples from three common Mediterranean fish species; the evaluation was subsequently extended to include samples from 35 fish species in a second experiment to test for method robustness. Method accuracy was tested by comparing histological results against a “gold standard” obtained from the analysis of frozen and unfrozen fish samples prepared for the study. Method precision was evaluated according to interrater agreement (i.e., three laboratories with expertise in histopathology in the first experiment and three expert analysts in the second experiment) by estimating Cohen's kappa (and corresponding 95% confidence intervals) for each pair of laboratories and experts and the combined Cohen's kappa for all three experts and laboratories. The observed interrater agreement among the three laboratories and the three experts indicated high levels of method accuracy and precision (high sensitivity and specificity) and method reproducibility. Our results suggest that histology is a rapid, simple, and highly accurate method for distinguishing between fresh and frozen-thawed fish, regardless of the fish species analyzed.

Interrater agreement statistics with skewed data: Evaluation of alternatives to Cohen’s kappa.

Journal of Consulting and Clinical Psychology ◽

10.1037/a0037489 ◽

2014 ◽

Vol 82 (6) ◽

pp. 1219-1227 ◽

Cited By ~ 37

Author(s):

Shu Xu ◽

Michael F. Lorber

Keyword(s):

Data Evaluation ◽

Interrater Agreement ◽

Skewed Data ◽

Cohen's Kappa

Supplemental Material for Interrater Agreement Statistics With Skewed Data: Evaluation of Alternatives to Cohen’s Kappa

Journal of Consulting and Clinical Psychology ◽

10.1037/a0037489.supp ◽

2014 ◽

Keyword(s):

Data Evaluation ◽

Interrater Agreement ◽

Skewed Data ◽

Cohen's Kappa

Abstract WP276: Simplification of a Prehospital Short NIHSS Scale Does not Increase Interrater Agreement Between Emergency Medical Services and Stroke Specialists

Stroke ◽

10.1161/str.48.suppl_1.wp276 ◽

2017 ◽

Vol 48 (suppl_1) ◽

Author(s):

Jelle Demeestere ◽

Carlos Garcia-Esperon ◽

Longting Lin ◽

Allan Loudfoot ◽

Andrew Bivard ◽

...

Keyword(s):

Emergency Medical Services ◽

Weighted Kappa ◽

Interrater Agreement ◽

Medical Services ◽

Kappa Statistics ◽

Single Centre ◽

Cohen's Kappa ◽

Emergency Medical ◽

Patient Arrival

Objective: To assess whether simplifying a prehospital 8-item NIHSS scale (NIHSS-8, fig 1) to a 0 (symptom absent) – 1 (symptom present) scoring system increases interrater agreement between emergency medical services (EMS) and stroke specialists. Methods: We analysed interrater agreement between EMS and stroke specialists of a single centre on a prospectively collected cohort of 64 suspected acute ischemic stroke patients. EMS performed NIHSS-8 scoring upon patient arrival at the emergency department. The stroke specialist scored the full 15-item NIHSS blind to the EMS scores and within 5 minutes of patient arrival. Linear-weighted Cohen’s kappa statistics were used to assess agreement between EMS and stroke specialist on the total NIHSS-8 score and each NIHSS-8 scale item. We then simplified each item to a 0-1 score and reassessed interrater agreement for the overall NIHSS-8 scale using linear-weighted Cohen’s kappa statistics and for each NIHSS-8 item using Cohen’s kappa statistics. We used Cohen’s kappa statistics to assess agreement for original and simplified NIHSS-8 cut-off scores. Results: EMS and stroke specialists reached substantial agreement on overall NIHSS-8 scoring (linear-weighted kappa 0.69). Optimum agreement was reached for right arm weakness (linear-weighted kappa 0.79; Table 1) and for cut-off scores of 2 and 5 (Cohen’s kappa 0.78; Table 2). When the score was simplified to a 0-1 system, overall agreement between EMS and stroke specialists was substantial (linear-weighted kappa 0.65). Optimum agreement was seen for LOC questions (Cohen’s kappa 0.78; Table 1) and a cut-off score of 2 (Cohen’s kappa 0.77; Table 2). Conclusion: Simplifying an 8-item prehospital NIHSS stroke scale does not increase interrater agreement between emergency medical services and stroke specialists.
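The difference between the linear-weighted kappa used for graded item scores and the unweighted kappa used after dichotomization can be sketched as follows. This is an illustrative sketch only: the counts are hypothetical and are not the study's data, the function names are made up for this example, and the weights are the usual disagreement weights |i − j|.

```python
def weighted_kappa(table, weight="linear"):
    """Weighted kappa from a square table of counts using disagreement weights.

    weight="linear" penalizes a disagreement by the distance |i - j|;
    weight="unweighted" penalizes every disagreement equally (plain Cohen's kappa).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        if weight == "linear":
            return abs(i - j)
        if weight == "unweighted":
            return 0 if i == j else 1
        raise ValueError("unknown weight scheme: " + weight)

    observed = sum(w(i, j) * table[i][j] for i in range(k) for j in range(k)) / n
    expected = sum(w(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1 - observed / expected


def dichotomize(table):
    """Collapse all non-zero scores into one 'symptom present' category."""
    a = table[0][0]
    b = sum(table[0][1:])
    c = sum(row[0] for row in table[1:])
    d = sum(sum(row[1:]) for row in table[1:])
    return [[a, b], [c, d]]


# Hypothetical counts for one item scored 0, 1, or 2 by two raters.
scores = [[30, 4, 1],
          [5, 12, 3],
          [0, 4, 6]]

binary = dichotomize(scores)
print("linear-weighted, graded item:", weighted_kappa(scores, "linear"))
print("unweighted, dichotomized item:", weighted_kappa(binary, "unweighted"))
```

For a two-category table the linear and unweighted schemes give the same value, which is why the plain Cohen's kappa suffices once an item has been reduced to a 0-1 score.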

An update on image forming methods: structure analysis and Gestalt evaluation of images from rocket lettuce with shading, N supply, organic or mineral fertilization, and biodynamic preparations

Organic Agriculture ◽

10.1007/s13165-021-00347-1 ◽

2021 ◽

Author(s):

Miriam Athmann ◽

Roya Bornhütter ◽

Nicolaas Busscher ◽

Paul Doesburg ◽

Uwe Geier ◽

...

Keyword(s):

Copper Chloride ◽

Organic Fertilizer ◽

Structural Features ◽

Fertilizer Treatment ◽

Friedman Test ◽

Mineral Fertilization ◽

N Supply ◽

Cohen's Kappa ◽

Forming Methods

Abstract: In the image forming methods, copper chloride crystallization (CCCryst), capillary dynamolysis (CapDyn), and circular chromatography (CChrom), characteristic patterns emerge in response to different food extracts. These patterns reflect the resistance to decomposition as an aspect of resilience and are therefore used in product quality assessment complementary to chemical analyses. In the presented study, rocket lettuce from a field trial with different radiation intensities, nitrogen supply, biodynamic, organic and mineral fertilization, and with or without horn silica application was investigated with all three image forming methods. The main objective was to compare two different evaluation approaches, differing in the type of image forming method leading the evaluation, the number of factors analyzed, and the deployed perceptual strategy: Firstly, image evaluation of samples from all four experimental factors simultaneously by two individual evaluators was based mainly on analyzing structural features in CapDyn (analytical perception). Secondly, a panel of eight evaluators applied a Gestalt evaluation imbued with a kinesthetic engagement of CCCryst patterns from either fertilization treatments or horn silica treatments, followed by a confirmatory analysis of individual structural features. With the analytical approach, samples from different radiation intensities and N supply levels were identified correctly in two out of two sample sets with groups of five samples per treatment each (Cohen’s kappa, p = 0.0079), and the two organic fertilizer treatments were differentiated from the mineral fertilizer treatment in eight out of eight sample sets with groups of three manure and two minerally fertilized samples each (Cohen’s kappa, p = 0.0048). With the panel approach based on Gestalt evaluation, biodynamic fertilization was differentiated from organic and mineral fertilization in two out of two exams with 16 comparisons each (Friedman test, p < 0.001), and samples with horn silica application were successfully identified in two out of two exams with 32 comparisons each (Friedman test, p < 0.001). Further research will show which properties of the food decisive for resistance to decomposition are reflected by analytical and Gestalt criteria, respectively, in CCCryst and CapDyn images.

Prediction of Streptococcus uberis clinical mastitis treatment success in dairy herds by means of mass spectrometry and machine-learning

Scientific Reports ◽

10.1038/s41598-021-87300-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Alexandre Maciel-Guerra ◽

Necati Esener ◽

Katharina Giebel ◽

Daniel Lea ◽

Martin J. Green ◽

...

Keyword(s):

Machine Learning ◽

Mass Spectrometry ◽

Treatment Success ◽

Clinical Mastitis ◽

Supervised Machine Learning ◽

Structural Protein ◽

MALDI-TOF ◽

Streptococcus Uberis ◽

Cohen's Kappa

Abstract: Streptococcus uberis is one of the leading pathogens causing mastitis worldwide. Identification of S. uberis strains that fail to respond to treatment with antibiotics is essential for better decision making and treatment selection. We demonstrate that the combination of supervised machine learning and matrix-assisted laser desorption ionization/time of flight (MALDI-TOF) mass spectrometry can discriminate strains of S. uberis causing clinical mastitis that are likely to be responsive or unresponsive to treatment. Diagnostic prediction systems trained on 90 individuals from 26 different farms achieved an accuracy of up to 86.2% and a Cohen’s kappa of up to 71.5%. The performance was further increased by adding metadata (parity, somatic cell count of previous lactation and count of positive mastitis cases) to encoded MALDI-TOF spectra, which increased accuracy and Cohen’s kappa to 92.2% and 84.1%, respectively. A computational framework integrating protein–protein networks and structural protein information with the machine learning results unveiled the molecular determinants underlying the responsive and unresponsive phenotypes.

The accuracy of initial diagnoses in coma: an observational study in 835 patients with non-traumatic disorder of consciousness

Scandinavian Journal of Trauma Resuscitation and Emergency Medicine ◽

10.1186/s13049-020-00822-w ◽

2021 ◽

Vol 29 (1) ◽

Author(s):

Maximilian Lutz ◽

Martin Möckel ◽

Tobias Lindner ◽

Christoph J. Ploner ◽

Mischa Braun ◽

...

Keyword(s):

Emergency Care ◽

Final Diagnosis ◽

Class I ◽

Prehospital Emergency Care ◽

Unknown Etiology ◽

Care Providers ◽

Disorder Of Consciousness ◽

Cohen's Kappa ◽

Suspected Diagnosis

Abstract. Background: Management of patients with coma of unknown etiology (CUE) is a major challenge in most emergency departments (EDs). CUE is associated with a high mortality and a wide variety of pathologies that require differential therapies. A suspected diagnosis issued by pre-hospital emergency care providers often drives the first approach to these patients. We aim to determine the accuracy and value of the initial diagnostic hypothesis in patients with CUE. Methods: Consecutive ED patients presenting with CUE were prospectively enrolled. We obtained the suspected diagnoses or working hypotheses from standardized reports given by prehospital emergency care providers, both paramedics and emergency physicians. Suspected and final diagnoses were classified into I) acute primary brain lesions, II) primary brain pathologies without acute lesions and III) pathologies that affected the brain secondarily. We compared suspected and final diagnosis with percent agreement and Cohen’s Kappa including sub-group analyses for paramedics and physicians. Furthermore, we tested the value of suspected and final diagnoses as predictors for mortality with binary logistic regression models. Results: Overall, suspected and final diagnoses matched in 62% of 835 enrolled patients. Cohen’s Kappa showed a value of κ = .415 (95% CI .361–.469, p < .005). There was no relevant difference in diagnostic accuracy between paramedics and physicians. Suspected diagnoses did not significantly interact with in-hospital mortality (e.g., suspected class I: OR .982, 95% CI .518–1.836) while final diagnoses interacted strongly (e.g., final class I: OR 5.425, 95% CI 3.409–8.633). Conclusion: In cases of CUE, the suspected diagnosis is unreliable, regardless of different pre-hospital care providers’ qualifications. It is not an appropriate decision-making tool as it neither sufficiently predicts the final diagnosis nor detects the especially critical comatose patient. To avoid the risk of mistriage and unnecessarily delayed therapy, we advocate for a standardized diagnostic work-up for all CUE patients that should be triggered by the emergency symptom alone and not by any suspected diagnosis.
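How percent agreement, kappa, and its confidence interval relate can be illustrated with a rough plausibility check. The sketch below is an assumption-laden approximation, not the authors' method: the abstract does not report the chance-expected agreement, so it is back-calculated here from the rounded values 62% and κ = .415, and the interval uses a simple large-sample standard error. The function name is hypothetical.

```python
import math


def kappa_with_ci(p_obs, p_exp, n, z=1.96):
    """Cohen's kappa with an approximate large-sample confidence interval.

    Uses the simple standard error sqrt(p_obs * (1 - p_obs) / n) / (1 - p_exp),
    which is a simplification chosen for this sketch.
    """
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se = math.sqrt(p_obs * (1 - p_obs) / n) / (1 - p_exp)
    return kappa, kappa - z * se, kappa + z * se


# Values taken from the abstract (rounded as reported there).
p_obs = 0.62            # percent agreement
kappa_reported = 0.415  # reported Cohen's kappa
n = 835                 # enrolled patients

# Assumption: chance agreement is not reported, so solve
# kappa = (p_obs - p_exp) / (1 - p_exp) for p_exp.
p_exp = (p_obs - kappa_reported) / (1 - kappa_reported)

kappa, lower, upper = kappa_with_ci(p_obs, p_exp, n)
print(f"kappa = {kappa:.3f}, approximate 95% CI ({lower:.3f}, {upper:.3f})")
```

Under these assumptions the approximate interval lands close to, though not exactly on, the reported .361–.469; the remaining gap is what one would expect from input rounding and from a different variance estimator.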

Classification of Shoulder X-ray Images with Deep Learning Ensemble Models

Applied Sciences ◽

10.3390/app11062723 ◽

2021 ◽

Vol 11 (6) ◽

pp. 2723

Author(s):

Fatih Uysal ◽

Fırat Hardalaç ◽

Ozan Peker ◽

Tolga Tolunay ◽

Nil Tokgöz

Keyword(s):

Deep Learning ◽

Performance Test ◽

The Body ◽

Test Accuracy ◽

X-ray ◽

Cohen's Kappa ◽

AUC Value ◽

Magnetic Resonance Imaging (MRI) ◽

Fully Connected

Fractures occur for various reasons in the shoulder area, which has a wider range of motion than other joints in the body. To diagnose these fractures, data gathered from X-radiation (X-ray), magnetic resonance imaging (MRI), or computed tomography (CT) are used. This study aims to help physicians by classifying shoulder images taken from X-ray devices as fracture/non-fracture with artificial intelligence. For this purpose, the performances of 26 deep learning-based pre-trained models in the detection of shoulder fractures were evaluated on the musculoskeletal radiographs (MURA) dataset, and two ensemble learning models (EL1 and EL2) were developed. The pre-trained models used are ResNet, ResNeXt, DenseNet, VGG, Inception, MobileNet, and their spinal fully connected (Spinal FC) versions. In the EL1 and EL2 models, developed using the best-performing pre-trained models, test accuracy was 0.8455 and 0.8472, Cohen’s kappa was 0.6907 and 0.6942, and the area under the receiver operating characteristic (ROC) curve (AUC) for the fracture class was 0.8862 and 0.8695, respectively. Across 28 different classifications in total, the highest test accuracy and Cohen’s kappa values were obtained with the EL2 model, and the highest AUC value was obtained with the EL1 model.

The South African dysphagia screening tool (SADS): A screening tool for a developing context

South African Journal of Communication Disorders ◽

10.4102/sajcd.v63i1.117 ◽

2016 ◽

Vol 63 (1) ◽

Cited By ~ 2

Author(s):

Calli Ostrofsky ◽

Jaishika Seedat

Keyword(s):

At Risk ◽

Acute Stroke ◽

South African ◽

Screening Tool ◽

Diagnostic Assessment ◽

The South ◽

Stroke Patients ◽

Cohen's Kappa ◽

Government Hospitals

Background: Notwithstanding its value, there are challenges and limitations to implementing a dysphagia screening tool from a developed context in a developing context. The need for a reliable and valid screening tool for dysphagia that considers context, systemic rules and resources was identified to prevent further medical compromise, optimise dysphagia prognosis and ultimately hasten patients’ return to home or work. Methodology: To establish the validity and reliability of the South African dysphagia screening tool (SADS) for acute stroke patients accessing government hospital services. The study was a quantitative, non-experimental, correlational cross-sectional design with a retrospective component. Convenience sampling was used to recruit 18 speech-language therapists and 63 acute stroke patients from three South African government hospitals. The SADS consists of 20 test items and was administered by speech-language therapists. Screening was followed by a diagnostic dysphagia assessment. The administrator of the tool was not involved in completing the diagnostic assessment, to eliminate bias and prevent contamination of results from screener to diagnostic assessment. Sensitivity, validity and efficacy of the screening tool were evaluated against the results of the diagnostic dysphagia assessment. Cohen’s kappa measures determined inter-rater agreement between the results of the SADS and the diagnostic assessment. Results and conclusion: The SADS was proven to be valid and reliable. Cohen’s kappa indicated high inter-rater reliability, and the tool showed high sensitivity and adequate specificity in detecting dysphagia amongst acute stroke patients who were at risk for dysphagia. The SADS was characterised by concurrent, content and face validity. As a first step in establishing contextual appropriateness, the SADS is a valid and reliable screening tool that is sensitive in identifying stroke patients at risk for dysphagia within government hospitals in South Africa.
