A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples

2013 ◽  
Vol 13 (1) ◽  
Author(s):  
Nahathai Wongpakaran ◽  
Tinakon Wongpakaran ◽  
Danny Wedding ◽  
Kilem L Gwet
2019 ◽  
Author(s):  
Peter Tallberg ◽  
Randi Ulberg ◽  
Hanne-Sofie Johnsen Dahl ◽  
Per Andreas Høglend

Abstract Background: Creating a case formulation is an important and basic skill in psychotherapy, meant to guide treatment. A patient’s interpersonal pattern is an essential part of a case formulation. The Core Conflictual Relationship Theme (CCRT) is a well-known structured method for describing interpersonal patterns. The CCRT method is based on the assumption that humans display a central relationship theme, which is shown in most relationships as well as in the patient-therapist relation. CCRT scoring is based on how the patient describes interactions with others, in therapy sessions or in a specific interview. These descriptions are transcribed. Raters then score the identified relational episodes by choosing elements from the clustered categories of Wishes, Response from Others and Response from Self. The method has shown high validity and reliability. Inter-rater reliability is generally good, with Cohen’s kappa ranging from 0.55 to 0.70. Determining the CCRT pattern from transcribed material is time-consuming and labour-intensive. This study investigates a labour- and time-saving version of the method. Methods: This study aimed to investigate rater agreement in a simplified method of scoring the CCRT, based directly on live semi-structured dynamic interviews without transcribing the material. Fifty-two patients referred for psychotherapy in a clinical trial were scored for CCRT pattern. Based on information that came forth during the two-hour interview, raters scored the patients by choosing elements from the clustered categories of Wishes, Response from Others and Response from Self. More than one category in each component could be chosen, without ranking. Five raters were compared pairwise. Inter-rater reliability was measured by Cohen’s kappa. Results: Mean kappa for Wishes, Response from Others and Response from Self was .33, .44 and .45, respectively. Mean kappa for the CCRT in total was .44 among the five raters. Conclusion: With this simplified method of scoring the CCRT based on oral dynamic interviews, fair to moderate inter-rater reliability (IRR) was obtained.
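
The pairwise comparison described in this abstract (several raters compared two by two, then averaged) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it assumes each rater assigns exactly one category per patient, whereas the study allowed more than one category per component, and the function names, rater names and labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' labels over the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every unordered pair of raters."""
    pairs = list(combinations(sorted(ratings), 2))
    return sum(cohen_kappa(ratings[r], ratings[s]) for r, s in pairs) / len(pairs)


# Hypothetical labels for six patients from three raters (not study data).
ratings = {
    "rater_1": ["A", "A", "B", "B", "C", "C"],
    "rater_2": ["A", "A", "B", "B", "C", "C"],
    "rater_3": ["A", "B", "B", "C", "C", "A"],
}
print(round(mean_pairwise_kappa(ratings), 3))
```

With five raters the same function would average over ten pairs. Handling multiple categories per component would need a different agreement definition, which this sketch does not attempt.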


2020 ◽  
pp. 084456212092051
Author(s):  
Veronique Boscart ◽  
Linda Sheiban Taucar ◽  
Michelle Heyer ◽  
Tabitha Kellendonk ◽  
Keia Johnson ◽  
...  

Background Older adults are the biggest users of emergency departments and hospitals. However, healthcare professionals are often ill equipped to conduct comprehensive geriatric assessments causing missed opportunities for preventing adverse outcomes. Purpose To evaluate the inter-rater reliability of the interRAI Acute Care (AC) instrument for hospitalized older adults in two acute care hospitals in Ontario, Canada. Methods This descriptive study focused on evaluating the interRAI AC instrument, which was designed to facilitate a comprehensive nursing assessment for hospitalized seniors. Sample characteristics were described, and Cohen’s Kappa was calculated to derive the inter-rater reliability. Assessment times to complete the instrument were collected as well. Results The Cohen’s Kappa score for the instrument was 0.96. Many older adults who were interviewed had several challenges, including multimorbidity, polypharmacy, and lack of home support. The average time required for nurses to complete the interRAI AC instrument was 22 min. Conclusions The interRAI AC instrument is reliable for use by trained nurses to conduct a comprehensive assessment. This instrument offers a standardized and efficient approach to assess for care and intervention priorities and could prevent adverse outcomes in hospitalized older adults.


2020 ◽  
Vol 3 ◽  
pp. 31
Author(s):  
Cathal A. Cadogan ◽  
Audrey Rankin ◽  
Simon Lewin ◽  
Carmel M. Hughes

Background: The intervention Complexity Assessment Tool for Systematic Reviews (iCAT_SR) has been developed to facilitate detailed assessments of intervention complexity in systematic reviews. Worked examples of the tool’s application are needed to promote its use and refinement. The aim of this case study was to apply the iCAT_SR to a subset of 20 studies included in a Cochrane review of interventions aimed at improving appropriate polypharmacy in older people. Methods: Interventions were assessed independently by two authors using the six core iCAT_SR dimensions: (1) ‘Target organisational levels/categories’; (2) ‘Target behaviour/actions’; (3) ‘Active intervention components’; (4) ‘Degree of tailoring’; (5) ‘Level of skill required by intervention deliverers’; (6) ‘Level of skill required by intervention recipients’. Attempts were made to apply four optional dimensions: ‘Interaction between intervention components’; ‘Context/setting’; ‘Recipient/provider factors’; ‘Nature of causal pathway’. Inter-rater reliability was assessed using Cohen’s Kappa coefficient. Disagreements were resolved by consensus discussion. The findings are presented narratively. Results: Assessments involving the core iCAT_SR dimensions showed limited consistency in intervention complexity across included studies, even when categorised according to clinical setting. Interventions were delivered across various organisational levels and categories (i.e. healthcare professionals and patients) and typically comprised multiple components. Intermediate skill levels were required by those delivering and receiving the interventions across all studies. A lack of detail in study reports precluded application of the iCAT_SR’s optional dimensions. The inter-rater reliability was substantial (Cohen’s Kappa = 0.75). Conclusions: This study describes the application of the iCAT_SR to studies included in a Cochrane systematic review. 
Future intervention studies need to ensure more detailed reporting of interventions, context and the causal pathways underlying intervention effects to allow a more holistic understanding of intervention complexity and facilitate replication in other settings. The experience gained has helped to refine the original guidance document relating to the application of iCAT_SR.


2020 ◽  
Author(s):  
Peter Tallberg ◽  
Randi Ulberg ◽  
Hanne-Sofie Johnsen Dahl ◽  
Per Andreas Høglend

Abstract Background: Creating a case formulation is an important and basic skill in psychotherapy, meant to guide treatment. A patient’s interpersonal pattern is an essential part of a case formulation. The Core Conflictual Relationship Theme (CCRT) is a well-known structured method for describing interpersonal patterns. The CCRT method is based on the assumption that humans display a central relationship theme, which is shown in most relationships as well as in the patient-therapist relation. CCRT scoring is based on how the patient describes interactions with others, in therapy sessions or in a specific interview. These descriptions are transcribed. Raters then score the identified relational episodes by choosing elements from the clustered categories of Wishes, Response from Others and Response from Self. The method has shown high validity and reliability. Inter-rater reliability is generally good, with Cohen’s kappa ranging from 0.55 to 0.70. Determining the CCRT pattern from transcribed material is time-consuming and labour-intensive. This study investigates a labour- and time-saving version of the method. Methods: This study aimed to investigate rater agreement in a simplified method of scoring the CCRT, based directly on live semi-structured dynamic interviews without transcribing the material. Fifty-two patients referred for psychotherapy in a clinical trial were scored for CCRT pattern. Based on information that came forth during the two-hour interview, raters scored the patients by choosing elements from the clustered categories of Wishes, Response from Others and Response from Self. More than one category in each component could be chosen, without ranking. Five raters were compared pairwise. Inter-rater reliability was measured by Cohen’s kappa. Results: Mean kappa for Wishes, Response from Others and Response from Self was .33, .44 and .45, respectively. Mean kappa for the CCRT in total was .44 among the five raters. Conclusion: With this simplified method of scoring the CCRT based on oral dynamic interviews, fair to moderate inter-rater reliability (IRR) was obtained. Trial Registration: First Experimental Study of Transference-interpretations (FEST307/95). Registration number: ClinicalTrials.gov Identifier: NCT00423462.


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 747-747
Author(s):  
Sohyun Kim

Abstract Understanding communication behaviors between persons living with dementia and family caregivers is essential for meaningful social interaction and for decreasing problematic behaviors and caregiving burden. The purpose of this study was to develop and test the psychometric properties of a coding scheme for dementia care interactions. The coding scheme items were developed from the literature, expert review, and pilot testing on 16 video-recorded interactions. A secondary analysis was conducted using 77 videos from 21 dyads of dementia family interactions that occurred naturally in the participants’ homes. The final coding scheme consists of 11 codes for persons living with dementia (6 nonverbal and 5 verbal) and 12 codes for family caregivers (7 nonverbal and 5 verbal). Content validity was excellent (I-CVI = .93, S-CVI/UA = .71, S-CVI/Ave = .93 with 6 experts). Inter-item correlation was acceptable for both caregiver codes (positive nonverbal = .21, positive verbal = .15, negative nonverbal = .36, negative verbal = .29) and patient codes (positive nonverbal = .13, positive verbal = .27, negative nonverbal = .15, negative verbal = .18). Intra-rater reliability (Cohen’s Kappa = .83, percentage of agreement = 83.88%) and inter-rater reliability (Cohen’s Kappa = .81, percentage of agreement = 81.75%) were excellent. Findings support the preliminary psychometric properties of the newly developed coding scheme for assessing dyadic interactions between persons living with dementia and their informal caregivers in in-home care situations. Future testing of the coding scheme for application in communication interventions to improve the quality of social interaction in dementia care is discussed.


2021 ◽  
Vol 55 (3) ◽  
Author(s):  
Jose Ma D. Bautista ◽  
Peter B. Bernardo ◽  
Mark Anthony R. Ruanto

Objective. The study aims to assess the similarity between the results of the evaluation of students during an Objective Structured Clinical Examination (OSCE) and a video recording of the same OSCE (VOSCE). Methods. All orthopedic surgeon preceptors in the actual OSCE were recruited to the study. Video recordings of the students taking the OSCE were collected and later reviewed and re-evaluated by the same preceptor after at least four weeks. The grades of the actual OSCE and the VOSCE were collected and analyzed using Cohen’s kappa coefficient. Results. Intra-rater reliability varied widely across preceptors and stations (from slight agreement to perfect agreement). Overall intra-rater reliability between the actual and video OSCE showed moderate agreement, with a Cohen’s kappa coefficient of 0.43 (n = 219). Conclusion. Video OSCE is a reliable tool for assessing student clinical skills and knowledge in the musculoskeletal examination. Some factors have been suggested to further improve reliability.


Author(s):  
Miriam Athmann ◽  
Roya Bornhütter ◽  
Nicolaas Busscher ◽  
Paul Doesburg ◽  
Uwe Geier ◽  
...  

Abstract In the image forming methods copper chloride crystallization (CCCryst), capillary dynamolysis (CapDyn), and circular chromatography (CChrom), characteristic patterns emerge in response to different food extracts. These patterns reflect the resistance to decomposition as an aspect of resilience and are therefore used in product quality assessment complementary to chemical analyses. In the presented study, rocket lettuce from a field trial with different radiation intensities, nitrogen supply, biodynamic, organic and mineral fertilization, and with or without horn silica application was investigated with all three image forming methods. The main objective was to compare two different evaluation approaches, differing in the type of image forming method leading the evaluation, the number of factors analyzed, and the deployed perceptual strategy: Firstly, image evaluation of samples from all four experimental factors simultaneously by two individual evaluators was based mainly on analyzing structural features in CapDyn (analytical perception). Secondly, a panel of eight evaluators applied a Gestalt evaluation imbued with a kinesthetic engagement of CCCryst patterns from either fertilization treatments or horn silica treatments, followed by a confirmatory analysis of individual structural features. With the analytical approach, samples from different radiation intensities and N supply levels were identified correctly in two out of two sample sets with groups of five samples per treatment each (Cohen’s kappa, p = 0.0079), and the two organic fertilizer treatments were differentiated from the mineral fertilizer treatment in eight out of eight sample sets with groups of three manure and two minerally fertilized samples each (Cohen’s kappa, p = 0.0048). 
With the panel approach based on Gestalt evaluation, biodynamic fertilization was differentiated from organic and mineral fertilization in two out of two exams with 16 comparisons each (Friedman test, p < 0.001), and samples with horn silica application were successfully identified in two out of two exams with 32 comparisons each (Friedman test, p < 0.001). Further research will show which properties of the food decisive for resistance to decomposition are reflected by analytical and Gestalt criteria, respectively, in CCCryst and CapDyn images.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Alexandre Maciel-Guerra ◽  
Necati Esener ◽  
Katharina Giebel ◽  
Daniel Lea ◽  
Martin J. Green ◽  
...  

Abstract Streptococcus uberis is one of the leading pathogens causing mastitis worldwide. Identification of S. uberis strains that fail to respond to treatment with antibiotics is essential for better decision making and treatment selection. We demonstrate that the combination of supervised machine learning and matrix-assisted laser desorption ionization/time of flight (MALDI-TOF) mass spectrometry can discriminate strains of S. uberis causing clinical mastitis that are likely to be responsive or unresponsive to treatment. Diagnostic prediction systems trained on 90 individuals from 26 different farms achieved up to 86.2% and 71.5% in terms of accuracy and Cohen’s kappa, respectively. The performance was further increased by adding metadata (parity, somatic cell count of previous lactation and count of positive mastitis cases) to encoded MALDI-TOF spectra, which increased accuracy and Cohen’s kappa to 92.2% and 84.1%, respectively. A computational framework integrating protein–protein networks and structural protein information with the machine learning results unveiled the molecular determinants underlying the responsive and unresponsive phenotypes.


Author(s):  
Maximilian Lutz ◽  
Martin Möckel ◽  
Tobias Lindner ◽  
Christoph J. Ploner ◽  
Mischa Braun ◽  
...  

Abstract Background Management of patients with coma of unknown etiology (CUE) is a major challenge in most emergency departments (EDs). CUE is associated with a high mortality and a wide variety of pathologies that require differential therapies. A suspected diagnosis issued by pre-hospital emergency care providers often drives the first approach to these patients. We aim to determine the accuracy and value of the initial diagnostic hypothesis in patients with CUE. Methods Consecutive ED patients presenting with CUE were prospectively enrolled. We obtained the suspected diagnoses or working hypotheses from standardized reports given by prehospital emergency care providers, both paramedics and emergency physicians. Suspected and final diagnoses were classified into I) acute primary brain lesions, II) primary brain pathologies without acute lesions and III) pathologies that affected the brain secondarily. We compared suspected and final diagnosis with percent agreement and Cohen’s Kappa including sub-group analyses for paramedics and physicians. Furthermore, we tested the value of suspected and final diagnoses as predictors for mortality with binary logistic regression models. Results Overall, suspected and final diagnoses matched in 62% of 835 enrolled patients. Cohen’s Kappa showed a value of κ = .415 (95% CI .361–.469, p < .005). There was no relevant difference in diagnostic accuracy between paramedics and physicians. Suspected diagnoses did not significantly interact with in-hospital mortality (e.g., suspected class I: OR .982, 95% CI .518–1.836) while final diagnoses interacted strongly (e.g., final class I: OR 5.425, 95% CI 3.409–8.633). Conclusion In cases of CUE, the suspected diagnosis is unreliable, regardless of different pre-hospital care providers’ qualifications. It is not an appropriate decision-making tool as it neither sufficiently predicts the final diagnosis nor detects the especially critical comatose patient. 
To avoid the risk of mistriage and unnecessarily delayed therapy, we advocate for a standardized diagnostic work-up for all CUE patients that should be triggered by the emergency symptom alone and not by any suspected diagnosis.
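
The two agreement measures reported above, percent agreement and Cohen’s Kappa, can both be read off a cross-tabulation of suspected against final diagnostic class. The sketch below is a minimal illustration with invented counts for three classes. It is not the study's data or analysis code, and the function name is hypothetical.

```python
def agreement_from_table(table):
    """Return (percent agreement, Cohen's kappa) for a square contingency table.

    table[i][j] counts cases placed in class i by one source (for example
    the suspected diagnosis) and in class j by the other (the final one).
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one case")
    # Observed agreement: cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical counts for three diagnostic classes (not the study's data).
table = [
    [30, 10, 10],
    [5, 20, 5],
    [5, 5, 10],
]
p_o, kappa = agreement_from_table(table)
print(f"percent agreement = {p_o:.1%}, kappa = {kappa:.3f}")
```

In this invented table the raw agreement is 60% while kappa is noticeably lower, because part of the raw agreement is what the marginal class frequencies alone would produce by chance.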


2021 ◽  
Vol 11 (6) ◽  
pp. 2723
Author(s):  
Fatih Uysal ◽  
Fırat Hardalaç ◽  
Ozan Peker ◽  
Tolga Tolunay ◽  
Nil Tokgöz

Fractures occur in the shoulder area, which has a wider range of motion than other joints in the body, for various reasons. To diagnose these fractures, data gathered from X-radiation (X-ray), magnetic resonance imaging (MRI), or computed tomography (CT) are used. This study aims to help physicians by classifying shoulder images taken from X-ray devices as fracture/non-fracture with artificial intelligence. For this purpose, the performances of 26 deep learning-based pre-trained models in the detection of shoulder fractures were evaluated on the musculoskeletal radiographs (MURA) dataset, and two ensemble learning models (EL1 and EL2) were developed. The pre-trained models used are ResNet, ResNeXt, DenseNet, VGG, Inception, MobileNet, and their spinal fully connected (Spinal FC) versions. In the EL1 and EL2 models, developed using the best-performing pre-trained models, test accuracy was 0.8455 and 0.8472, Cohen’s kappa was 0.6907 and 0.6942, and the area under the receiver operating characteristic (ROC) curve (AUC) for the fracture class was 0.8862 and 0.8695, respectively. Across the 28 classifications in total, the highest test accuracy and Cohen’s kappa values were obtained with the EL2 model, and the highest AUC value was obtained with the EL1 model.

