The Equivalence of Weighted Kappa and the Intraclass Correlation Coefficient as Measures of Reliability

1973 ◽  
Vol 33 (3) ◽  
pp. 613-619 ◽  
Author(s):  
Joseph L. Fleiss ◽  
Jacob Cohen
1995 ◽  
Vol 40 (2) ◽  
pp. 60-66 ◽  
Author(s):  
David L. Streiner

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The major problem in deciding which statistic to use is the plethora of different techniques which are available. This paper reviews some of the more commonly used techniques, such as Raw Agreement, Cohen's kappa and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). This paper also shows how the ICC can be used in situations where the other statistics cannot be used and how to select the best subset of raters.
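
As an illustration of the statistics this abstract compares, the following is a minimal sketch of Cohen's kappa and weighted kappa for two raters, written in terms of disagreement weights. The function name `weighted_kappa`, its parameters, and the toy ratings are assumptions made for this example and do not come from the paper. With quadratic weights, the statistic is the form that Fleiss and Cohen relate to the intraclass correlation coefficient.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="quadratic"):
    """Cohen's kappa (optionally weighted) for two raters on ordered categories.

    Hypothetical helper for illustration; `categories` lists the ordered labels.
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: obs[i][j] is the share of subjects
    # placed in category i by rater A and category j by rater B.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        obs[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Disagreement weight between categories i and j.
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        return float((i - j) ** 2)  # quadratic weights

    cells = [(i, j) for i in range(k) for j in range(k)]
    d_obs = sum(disagreement(i, j) * obs[i][j] for i, j in cells)
    d_exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    # Kappa is one minus observed over chance-expected weighted disagreement.
    return 1.0 - d_obs / d_exp


# Toy data (invented): six subjects rated on a 3-point ordinal scale.
ratings_a = [1, 1, 2, 2, 3, 3]
ratings_b = [1, 2, 2, 2, 3, 2]
for scheme in ("unweighted", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(ratings_a, ratings_b, [1, 2, 3], scheme), 4))
```

Because every disagreement in the toy data is off by only one category, the weighted variants penalise it less than the unweighted statistic does.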


2017 ◽  
Vol 48 (01) ◽  
pp. 24-38 ◽  
Author(s):  
Courtney Ryder ◽  
Tamara Mackean ◽  
Shahid Ullah ◽  
Heather Burton ◽  
Heather Halls ◽  
...  

Socially accountable health curricula, designed to decrease Aboriginal health inequities through the transformation of health professional students into culturally safe practitioners, have become a focal point for health professional programmes. Despite this inclusion in health curricula, there remains the question of how best to assess students in this area in relation to the concepts of cultural safety and transformative unlearning, to facilitate attitudinal change. To address this question, this study developed a research questionnaire to measure thematic areas of transformative unlearning, cultural safety and critical thinking in Aboriginal health for application to undergraduate and postgraduate students and faculty staff. The Likert-scale questionnaire was developed and validated through face and content validity. Test–retest methodology was utilised to determine stability and reliability of the questionnaire with 40 participants. The extent of agreement and reliability were determined through weighted kappa and the intraclass correlation coefficient. Exploratory factor analysis was conducted to determine construct validity for questionnaire items. For the overall population subset the tool met good standards of reliability and validity, with 11 of the 15 items reaching moderate agreement (κ > 0.6) and an intraclass correlation coefficient of 0.72, suggesting substantial agreement. Cronbach's alpha was above 0.7 for the thematic areas. The majority of items provided high factor loadings; low-loading items will be reviewed to strengthen the tool, and validation of the revised tool with a larger cohort will allow future use to compare and determine effective teaching methodologies in Aboriginal health and cultural safety curricula.
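
The internal-consistency figure reported above (Cronbach's alpha) can be sketched as follows. This is only an illustration of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name `cronbach_alpha` and the toy responses are assumptions and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Hypothetical helper; uses sample variances (n - 1 denominator).
    """
    k = len(scores[0])  # number of items
    # Variance of each item across respondents.
    item_vars = [float(variance([row[j] for row in scores])) for j in range(k)]
    # Variance of each respondent's total score.
    total_var = float(variance([sum(row) for row in scores]))
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Toy data (invented): three respondents answering two Likert items.
responses = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(responses), 4))
```

Alpha rises as the items covary more strongly relative to their individual variances; perfectly parallel items give a value of 1.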


2017 ◽  
Vol 27 (5) ◽  
pp. 365-372 ◽  
Author(s):  
Julian Edbrooke-Childs ◽  
Jacqueline Hayes ◽  
Evelyn Sharples ◽  
Dawid Gondek ◽  
Emily Stapley ◽  
...  

Background. ‘Situation Awareness For Everyone’ (SAFE) was a 3-year project which aimed to improve situation awareness in clinical teams in order to detect potential deterioration and other potential risks to children on hospital wards. The key intervention was the ‘huddle’, a structured case management discussion which is central to facilitating situation awareness. This study aimed to develop an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle. Methods. A cross-sectional observational design was used to psychometrically develop the ‘Huddle Observation Tool’ (HOT) over three phases using standardised psychometric methodology. Huddles were observed across four NHS paediatric wards participating in SAFE by five researchers; two wards within specialist children hospitals and two within district general hospitals, with location, number of beds and length of stay considered to make the sample as heterogeneous as possible. Inter-rater reliability was calculated using the weighted kappa and intraclass correlation coefficient. Results. Inter-rater reliability was acceptable for the collaborative culture (weighted kappa=0.32, 95% CI 0.17 to 0.42), environment items (weighted kappa=0.78, 95% CI 0.52 to 1) and total score (intraclass correlation coefficient=0.87, 95% CI 0.68 to 0.95). It was lower for the structure and risk management items, suggesting that these were more variable in how observers rated them. However, agreement on the global score for huddles was acceptable. Conclusion. We developed an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle.
Future research should examine whether observational evaluations of huddles are associated with other indicators of safety on clinical wards (eg, safety climate and incidents of patient harm), and whether scores on the HOT are associated with improved situation awareness and reductions in deterioration and adverse events in clinical settings, such as inpatient wards.


2021 ◽  
Vol 34 ◽  
Author(s):  
Graciella CHIARELLI ◽  
Doroteia Aparecida HÖFELMANN ◽  
João Luiz Gurgel Calvet da SILVEIRA ◽  
Maria Urania ALVES ◽  
Luciane Coutinho de AZEVEDO

Objective. This study evaluated the reproducibility and relative validity of a food frequency questionnaire adapted for use with German descendants living in Brazil, using a 24-hour recall questionnaire as the reference standard, and estimated calibration factors for it. Methods. The target population consisted of 50 volunteers, of both genders, aged over 20 years, living in a city of German colonization in southern Brazil. The food frequency questionnaire was applied twice, in the first and third months of the investigation. During this period, three 24-hour recalls were applied, with an interval of one month between them. Reproducibility was estimated by the intraclass correlation coefficient. Validity was tested by the intraclass correlation coefficient, weighted kappa test and Bland-Altman method. Calibration factors were estimated using linear regression. Results. Between the two applications of the food frequency questionnaire, there was a strong correlation for energy and most of the nutrients corrected for energy. There was a weak correlation between the food frequency questionnaire and the 24-hour dietary recall. However, the exact concordance in the categorization in tertiles among the instruments ranged from 28% (vitamin A) to 52% (fiber and potassium). Gross values of the food frequency questionnaire were reduced with the calibration and approached the consumption data estimated by the 24-hour dietary recall. Conclusions. The food frequency questionnaire showed good reproducibility but weak correlation with the 24-hour dietary recall. The calibration of the data obtained by the food frequency questionnaire brought them closer to the reference method.
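
The Bland-Altman method mentioned in the abstract above summarises agreement between two instruments by the mean difference (bias) and limits of agreement around it. The sketch below is a minimal illustration under the usual convention of bias ± 1.96 standard deviations of the paired differences; the function name `bland_altman` and the intake values are invented for this example and are not the study's data.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Hypothetical helper; limits are bias +/- z sample standard deviations.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)   # systematic difference between methods
    sd = stdev(diffs)    # spread of the paired differences
    return bias, bias - z * sd, bias + z * sd


# Toy data (invented): the same nutrient estimated by two instruments.
ffq = [10.0, 12.0, 14.0, 16.0]
recall = [9.0, 13.0, 13.0, 17.0]
bias, lower, upper = bland_altman(ffq, recall)
print(bias, round(lower, 3), round(upper, 3))
```

A bias near zero with narrow limits suggests the two instruments can be used interchangeably; wide limits indicate poor agreement for individuals even when the average difference is small.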


2016 ◽  
Vol 2016 ◽  
pp. 1-6
Author(s):  
Bo Zhang ◽  
Jianjun Gu ◽  
Xiaoxiao Zhang ◽  
Bin Yang ◽  
Zheng Wang ◽  
...  

Purpose. To explore the probability and variation in biomechanical measurements of rabbit cornea by a modified Scheimpflug device. Methods. A modified Scheimpflug device was developed by imaging the anterior segment of a model imitating the intact eye at various posterior pressures. Eight isolated rabbit corneas were mounted on the Barron artificial chamber and images of the anterior segment were taken at posterior pressures of 15, 30, 45, 60, and 75 mmHg by the device. The repeatability and reliability of the parameters including CCT, ACD, ACV, and CV were evaluated at each posterior pressure. All the variations of the parameters at the different posterior pressures were calculated. Results. All parameters showed good intraobserver reliability (Cronbach’s alpha, α, and intraclass correlation coefficient, ICC, > 0.96) and repeatability in the modified Scheimpflug device. With the increase of posterior pressure, the ratio of CCT decreased linearly and the bulk modulus gradually reduced to a plateau. ACD increased almost linearly as the posterior pressure was elevated. Conclusions. The modified Scheimpflug device was a valuable tool to investigate the biomechanics of the cornea. The posterior pressure range of 15–75 mmHg produced small viscoelastic deformations and a nearly linear pressure-deformation response in the rabbit cornea.


Nutrients ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 1163
Author(s):  
Suzana Shahar ◽  
Mohd Razif Shahril ◽  
Noraidatulakma Abdullah ◽  
Boekhtiar Borhanuddin ◽  
Mohd Arman Kamaruddin ◽  
...  

Measuring dietary intakes in a multi-ethnic and multicultural setting, such as Malaysia, remains a challenge due to its diversity. This study aims to develop and evaluate the relative validity of an interviewer-administered food frequency questionnaire (FFQ) in assessing the habitual dietary exposure of The Malaysian Cohort (TMC) participants. We developed a nutrient database (with 203 items) based on various food consumption tables, and 803 participants were involved in this study. The output of the FFQ was then validated against three-day 24-h dietary recalls (n = 64). We assessed the relative validity and its agreement using various methods, such as Spearman’s correlation, weighted Kappa, intraclass correlation coefficient (ICC), and Bland–Altman analysis. Spearman’s correlation coefficient ranged from 0.24 (vitamin C) to 0.46 (carbohydrate), and almost all nutrients had correlation coefficients above 0.3, except for vitamin C and sodium. Intraclass correlation coefficients ranged from −0.01 (calcium) to 0.59 (carbohydrate), and weighted Kappa exceeded 0.4 for 50% of nutrients. In short, TMC’s FFQ appears to have good relative validity for the assessment of nutrient intake among its participants, as compared to the three-day 24-h dietary recalls. However, estimates for iron, vitamin A, and vitamin C should be interpreted with caution.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Lloyd Roberts ◽  
Tom Rozen ◽  
Deirdre Murphy ◽  
Adam Lawler ◽  
Mark Fitzgerald ◽  
...  

Background. Multiple screening Duplex ultrasound scans (DUS) are performed in trauma patients at high risk of deep vein thrombosis (DVT) in the intensive care unit (ICU). Intensive care physician-performed compression ultrasound (IP-CUS) has shown promise as a diagnostic test for DVT in a non-trauma setting. Whether IP-CUS can be used as a screening test in trauma patients is unknown. Our study aimed to assess the agreement between IP-CUS and vascular sonographer-performed DUS for proximal lower extremity deep vein thrombosis (PLEDVT) screening in high-risk trauma patients in ICU. Methods. A prospective observational study was conducted at the ICU of Alfred Hospital, a major trauma center in Melbourne, Australia, between Feb and Nov 2015. All adult major trauma patients admitted with high risk for DVT were eligible for inclusion. IP-CUS was performed immediately before or after DUS for PLEDVT screening. The paired studies were repeated twice weekly until the DVT diagnosis, death or ICU discharge. Written informed consent from the patient, or person responsible, or procedural authorisation, was obtained. The individuals performing the scans were blinded to the others’ results. The agreement analysis was performed using Cohen’s Kappa statistics and intraclass correlation coefficient for repeated binary measurements. Results. During the study period, 117 patients had 193 pairs of scans, and 45 (39%) patients had more than one pair of scans. The median age (IQR) was 47 (28–68) years with 77% males, mean (SD) injury severity score 27.5 (9.53), and a median (IQR) ICU length of stay 7 (3.2–11.6) days. There were 16 cases (13.6%) of PLEDVT with an incidence rate of 2.6 (1.6–4.2) cases per 100 patient-days in ICU. The overall agreement was 96.7% (95% CI 94.15–99.33). The Cohen’s Kappa between the IP-CUS and DUS was 0.77 (95% CI 0.59–0.95), and the intraclass correlation coefficient for repeated binary measures was 0.75 (95% CI 0.67–0.81).
Conclusions. There is substantial agreement between IP-CUS and DUS for PLEDVT screening in trauma patients in ICU with high risk for DVT. Large multicentre studies are needed to confirm this finding.


Author(s):  
Daniela Claessens ◽  
Alexander K. Schuster ◽  
Ronald V. Krüger ◽  
Marian Liegl ◽  
Laila Singh ◽  
...  

In this study, test-retest reliability, as one aspect of the reliability of metamorphopsia measurements using a computer-based measuring method, was determined in patients with macular diseases. Metamorphopsia amplitude, position, and area were quantified using AMD – A Metamorphopsia Detector software (app4eyes GmbH & Co. KG, Germany) in patients with diabetic, myopic, or uveitic macular edema, intermediate or neovascular age-associated macular degeneration, epiretinal membrane, vitelliform maculopathy, Irvine-Gass syndrome, or macular edema due to venous retinal occlusion. The intraclass correlation coefficient (ICC) was calculated in order to determine the repeatability of two repeated measurements and was used as an indicator of the reliability of the measurements. Metamorphopsia measurements were conducted on 36 eyes with macular diseases. Metamorphopsia measurements made using AMD – A Metamorphopsia Detector software were highly reliable and repeatable in patients with maculopathies. The intraclass correlation coefficient of all indices was excellent (0.95–0.97). For diseases of the vitreoretinal interface or macular diseases with intra- or subretinal edema, this metamorphopsia measurement represents a supplement for visual function testing in the clinic, as well as in clinical studies.
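
For readers unfamiliar with how an ICC is obtained from repeated measurements, the following is a minimal sketch of the one-way random-effects form, often written ICC(1,1). The abstract does not state which ICC model was used, so this choice is an assumption made purely for illustration; the function name `icc_oneway` and the toy values are likewise invented.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1) from a subjects-by-measurements table.

    Hypothetical helper; `data` is a list of rows, one per subject (or eye),
    each holding the same number k of repeated measurements.
    """
    n = len(data)       # number of subjects
    k = len(data[0])    # measurements per subject
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]

    # Between-subject and within-subject sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(data, subject_means) for x in row)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data (invented): three eyes, each measured twice.
measurements = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(round(icc_oneway(measurements), 4))
```

The coefficient approaches 1 when differences between subjects dominate the differences between repeated measurements of the same subject.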

