Learning how to Differ: Agreement and Reliability Statistics in Psychiatry

1995 ◽  
Vol 40 (2) ◽  
pp. 60-66 ◽  
Author(s):  
David L. Streiner

Whenever two or more raters evaluate a patient or student, it may be necessary to determine the degree to which they assign the same label or rating to the subject. The main difficulty in choosing a statistic is the sheer number of techniques available. This paper reviews some of the more commonly used techniques, such as raw agreement, Cohen's kappa and weighted kappa, and shows that, in most circumstances, they can all be replaced by the intraclass correlation coefficient (ICC). It also shows how the ICC can be used in situations where the other statistics cannot, and how to select the best subset of raters.
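To make the distinction between raw agreement and Cohen's kappa concrete, here is a minimal sketch for two raters assigning nominal labels. The function names and the ratings are hypothetical illustrations, not material from the paper; kappa is taken in its usual form, (p_o - p_e) / (1 - p_e), where p_e is the agreement expected by chance from each rater's label frequencies.

```python
from collections import Counter

def raw_agreement(r1, r2):
    """Proportion of subjects given the same label by both raters."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters: agreement corrected for chance."""
    n = len(r1)
    p_o = raw_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement expected from each rater's marginal label frequencies.
    p_e = sum(c1[label] * c2[label] for label in c1) / (n * n)
    # Undefined (division by zero) if both raters use a single identical label.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for 10 subjects (illustrative data only).
rater_a = ["dep", "dep", "anx", "anx", "dep", "none", "none", "anx", "dep", "none"]
rater_b = ["dep", "anx", "anx", "anx", "dep", "none", "dep", "anx", "dep", "none"]

agreement = raw_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
```

On these made-up ratings the raters agree on 8 of 10 subjects, yet kappa comes out near 0.70 because roughly a third of that agreement would be expected by chance alone.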

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Tita Mensah ◽  
Sofia Tranæus ◽  
Andreas Cederlund ◽  
Aron Naimi-Akbar ◽  
Gunilla Klingberg

Background: The Swedish Quality Registry for caries and periodontal disease (SKaPa) automatically collects data on caries and periodontitis from patients’ electronic dental records. Provided the data entries are reliable and accurate, the registry has potential value as a data source for registry-based research. The aim of this study was to evaluate the reliability and accuracy of the SKaPa registry information on dental caries in 6- and 12-year-old children. Method: This diagnostic accuracy study compared dental caries data registered at an examination with dental health status registered in the patient’s electronic dental records, and with corresponding data retrieved from the SKaPa registry. Clinical examinations of 170 6- and 12-year-old children were undertaken by one of the researchers in conjunction with the children’s regular annual dental examinations, where the number of teeth was registered and dental caries was diagnosed using ICDAS II. Teeth with fillings were defined as filled and were added to the ICDAS II score, and dft/DFT was subsequently calculated for each individual. Cohen’s kappa, the intraclass correlation coefficient (ICC), and sensitivity and specificity were calculated to test the agreement of the ‘decayed and filled teeth’ in deciduous and permanent teeth (dft/DFT) from the three sources. Results: Cohen’s kappa for the dft/DFT values was 0.79 between the researcher and the patient record, 0.95 between the patient record and SKaPa, and 0.76 between the researcher and SKaPa. The ICC was 0.96 between the researcher and the patient record, 0.99 between the patient record and SKaPa, and 0.95 between the researcher and SKaPa. Conclusion: The SKaPa registry information demonstrated satisfactory reliability and accuracy on dental caries in 6- and 12-year-old children and is a reliable source for registry-based research.
Trial registration: The study was registered at ClinicalTrials.gov (www.ClinicalTrials.gov, NCT03039010).
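As an illustration of how an ICC between sources can be computed, the sketch below implements one common form, ICC(2,1) (two-way random effects, absolute agreement, single rating), from the mean squares of a two-way layout. The abstract does not say which ICC form the authors used, so this choice, the function name and the input layout are all assumptions.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    `ratings` holds one row per subject and one column per rater (or source).
    Assumed form; the study does not report which ICC variant it used.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects, raters, and the residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because this form penalises systematic differences between columns, a source that consistently records one more affected tooth than another lowers the coefficient even when the two rank children identically.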


2018 ◽  
Vol 29 (6) ◽  
pp. 585-592 ◽  
Author(s):  
Ana B Plaza-Puche ◽  
Liberdade C Salerno ◽  
Francesco Versaci ◽  
Daniel Romero ◽  
Jorge L Alio

Purpose: To evaluate the intrasubject repeatability of ocular aberrometry obtained with a new pyramidal aberrometer technology in a sample of normal eyes. Methods: A total of 53 healthy eyes of 53 subjects aged 18 to 45 years were included in this study. In all cases, three consecutive acquisitions were obtained. Intrasubject repeatability of the measurements for 4.0- and 6.0-mm pupils was evaluated with the within-subject standard deviation (Sw) and the intraclass correlation coefficient. Results: Low Sw values and intraclass correlation coefficients close to 1 were observed for the sphere and cylinder at 3.0-mm pupil size. At 4.0-mm pupil size, mostly low Sw values and intraclass correlation coefficients close to 1 were observed for total, low-order and higher-order aberration root mean square and for each Zernike coefficient (intraclass correlation coefficient ⩾0.798), with a more limited outcome for Z(4, 4), which had an intraclass correlation coefficient of 0.683. For a 6.0-mm pupil diameter, low Sw values and intraclass correlation coefficients close to 1 were observed for all aberrometric parameters and Zernike coefficients analyzed (intraclass correlation coefficient ⩾0.850). Conclusion: The new pyramidal aberrometer Osiris provides repeatable and consistent measurements of ocular aberrometry in normal eyes.
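The within-subject standard deviation (Sw) used above is commonly taken as the square root of the mean of the per-subject variances across repeated acquisitions. The sketch below assumes that definition and an equal number of repeats per eye; the function name and data layout are hypothetical.

```python
from math import sqrt
from statistics import variance

def within_subject_sd(measurements):
    """Sw: square root of the mean within-subject (sample) variance.

    `measurements` holds one list of repeated readings per subject.
    Assumes every subject has the same number of repeats (at least two).
    """
    variances = [variance(repeats) for repeats in measurements]
    return sqrt(sum(variances) / len(variances))
```

A small Sw means the repeated readings of the same eye cluster tightly, which is why the abstract pairs low Sw values with intraclass correlation coefficients close to 1.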


2017 ◽  
Vol 48 (01) ◽  
pp. 24-38 ◽  
Author(s):  
Courtney Ryder ◽  
Tamara Mackean ◽  
Shahid Ullah ◽  
Heather Burton ◽  
Heather Halls ◽  
...  

Socially accountable health curricula, designed to decrease Aboriginal health inequities through the transformation of health professional students into culturally safe practitioners, have become a focal point for health professional programmes. Despite this inclusion in health curricula, there remains the question of how best to assess students in this area in relation to the concepts of cultural safety and transformative unlearning, to facilitate attitudinal change. To address this question, this study developed a research questionnaire to measure thematic areas of transformative unlearning, cultural safety and critical thinking in Aboriginal Health for application to undergraduate and postgraduate students and faculty staff. The Likert-scale questionnaire was developed and validated through face and content validity. Test–retest methodology was used to determine the stability and reliability of the questionnaire with 40 participants. The extent of agreement and reliability was determined through weighted kappa and the intraclass correlation coefficient. Exploratory factor analysis was used to determine construct validity for questionnaire items. For the overall population subset the tool met good standards of reliability and validity, with 11 of the 15 items reaching moderate agreement (κ > 0.6) and an intraclass correlation coefficient of 0.72, suggesting substantial agreement. Cronbach's alpha was above 0.7 for the thematic areas. The majority of items had high factor loadings; low-loading items will be reviewed to strengthen the tool, and validation of the revised tool with a larger cohort will allow it to be used in future to compare and determine effective teaching methodologies in Aboriginal health and cultural safety curricula.
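For ordinal Likert responses, weighted kappa gives partial credit when test and retest answers are close rather than identical. The following is a minimal sketch, assuming disagreement weights of |i - j| (linear) or (i - j)^2 (quadratic); the abstract does not say which weighting the authors chose, and the function name and arguments are hypothetical.

```python
def weighted_kappa(r1, r2, categories, power=1):
    """Weighted kappa for two sets of ordinal ratings.

    `categories` lists the scale points in order.
    power=1 uses linear disagreement weights, power=2 quadratic ones.
    """
    idx = {c: i for i, c in enumerate(categories)}
    n, m = len(r1), len(categories)

    # Observed proportions for each (first rating, second rating) cell.
    observed = [[0.0] * m for _ in range(m)]
    for a, b in zip(r1, r2):
        observed[idx[a]][idx[b]] += 1 / n

    row = [sum(observed[i]) for i in range(m)]
    col = [sum(observed[i][j] for i in range(m)) for j in range(m)]

    num = den = 0.0
    for i in range(m):
        for j in range(m):
            w = abs(i - j) ** power
            num += w * observed[i][j]   # weighted observed disagreement
            den += w * row[i] * col[j]  # weighted chance disagreement
    # Undefined (division by zero) if all ratings fall in one category.
    return 1 - num / den
```

With only two categories every disagreement carries the same weight, so the statistic reduces to the unweighted Cohen's kappa.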


2017 ◽  
Vol 27 (5) ◽  
pp. 365-372 ◽  
Author(s):  
Julian Edbrooke-Childs ◽  
Jacqueline Hayes ◽  
Evelyn Sharples ◽  
Dawid Gondek ◽  
Emily Stapley ◽  
...  

Background: ‘Situation Awareness For Everyone’ (SAFE) was a 3-year project which aimed to improve situation awareness in clinical teams in order to detect potential deterioration and other potential risks to children on hospital wards. The key intervention was the ‘huddle’, a structured case management discussion which is central to facilitating situation awareness. This study aimed to develop an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle. Methods: A cross-sectional observational design was used to psychometrically develop the ‘Huddle Observation Tool’ (HOT) over three phases using standardised psychometric methodology. Huddles were observed across four NHS paediatric wards participating in SAFE by five researchers; two wards were within specialist children's hospitals and two within district general hospitals, with location, number of beds and length of stay considered to make the sample as heterogeneous as possible. Inter-rater reliability was calculated using the weighted kappa and intraclass correlation coefficient. Results: Inter-rater reliability was acceptable for the collaborative culture items (weighted kappa=0.32, 95% CI 0.17 to 0.42), the environment items (weighted kappa=0.78, 95% CI 0.52 to 1) and the total score (intraclass correlation coefficient=0.87, 95% CI 0.68 to 0.95). It was lower for the structure and risk management items, suggesting that these were more variable in how observers rated them. However, agreement on the global score for huddles was acceptable. Conclusion: We developed an observational assessment tool to assess the team processes occurring during huddles, including the effectiveness of the huddle.
Future research should examine whether observational evaluations of huddles are associated with other indicators of safety on clinical wards (eg, safety climate and incidents of patient harm), and whether scores on the HOT are associated with improved situation awareness and reductions in deterioration and adverse events in clinical settings, such as inpatient wards.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Carmen Lopez de la Fuente ◽  
Ana Sanchez-Cano ◽  
Francisco Segura ◽  
Isabel Pinilla

Purpose. To assess the normal values and the repeatability of the Galilei Dual Scheimpflug Analyzer (GDSA), the biometer IOL Master, and the autokerato/refractometer WAM 5500 in anterior segment examinations. Methods. Eighty-eight eyes from 88 healthy volunteers were prospectively and consecutively recruited. The repeatability was assessed by calculating the intraclass correlation coefficient (ICC). Results. The correlations among the repeated measurements showed nearly perfect reliability (ICC > 0.81) for all of the parameters, except corneal astigmatism measured with the Galilei (0.79) and the WAM (0.68). There were statistically significant differences (P < 0.001) between the values of the flat simulated keratometry (SimK) and the steep SimK measured by GDSA and the other methods; however, there were no statistically significant differences for the values obtained with the IOL Master and WAM 5500 (P = 0.302 and P = 0.172, resp.) or between the values of the ACD (P < 0.001) and WTW (P = 0.007) measured by the IOL Master and GDSA. Conclusions. The anterior segment measurements from the IOL Master and WAM 5500 were highly repeatable, comparable, and well correlated. In healthy young persons, the evaluated parameters had very good repeatability, although significant differences were found between the GDSA and IOL Master and between the GDSA and WAM 5500.


2019 ◽  
Vol 64 (No. 11) ◽  
pp. 476-481
Author(s):  
I De Amicis ◽  
L Stehlik ◽  
F Del Signore ◽  
S Parrillo ◽  
D Robbe ◽  
...  

Radiography is routinely used for pelvimetry, but it is not easily accessible for farm animals, whereas ultrasonographic pelvimetry is more accessible and carries no radiation hazard. Radiographic and ultrasonographic pelvimetry in goats were compared using three diameters of the pelvis: the narrowest transverse pelvic diameter at the level of the acetabula, the distance from the pecten pubis to the sacral promontorium, and the distance from the dorsal edge of the pubis to the coccygeal vertebra. Each measurement was performed three times by one observer with both modalities. Intraclass correlation coefficient (ICC) and Bland-Altman analyses were performed. The intraobserver agreement was excellent for all the measurements and modalities in the study. Between the two modalities, excellent agreement (ICC 0.96) was achieved for the transverse pelvic diameter, while the agreement for the other two diameters was poor. We can conclude that ultrasonographic pelvimetry of a goat is reliable only for the transverse pelvic diameter just cranial to the pecten pubis.
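A Bland-Altman analysis of two measurement methods summarises their paired differences as a mean bias with 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below assumes that standard formulation and one paired value per animal; the function name is hypothetical, and the handling of the three repeated measurements in the study is not reproduced here.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement are bias +/- 1.96 * SD of the paired differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

A bias near zero with narrow limits suggests the two modalities can be used interchangeably for that diameter; wide limits point the other way even when the correlation looks high.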


2021 ◽  
Vol 34 ◽  
Author(s):  
Graciella CHIARELLI ◽  
Doroteia Aparecida HÖFELMANN ◽  
João Luiz Gurgel Calvet da SILVEIRA ◽  
Maria Urania ALVES ◽  
Luciane Coutinho de AZEVEDO

Objective: This study evaluated the reproducibility and relative validity of a food frequency questionnaire adapted for use with German descendants living in Brazil, using a 24-hour recall questionnaire as the reference standard, and estimated calibration factors for it. Methods: The study population consisted of 50 volunteers, of both genders, aged over 20 years, living in a city of German colonization in southern Brazil. The food frequency questionnaire was applied twice, in the first and third months of the investigation. During this period, three 24-hour recalls were applied, with an interval of one month between them. Reproducibility was estimated by the intraclass correlation coefficient. Validity was tested by the intraclass correlation coefficient, weighted kappa test and Bland-Altman method. Calibration factors were estimated using linear regression. Results: Between the two applications of the food frequency questionnaire, there was a strong correlation for energy and most of the energy-adjusted nutrients. There was a weak correlation between the food frequency questionnaire and the 24-hour dietary recall. However, the exact concordance in the categorization into tertiles between the instruments ranged from 28% (vitamin A) to 52% (fiber and potassium). Crude values from the food frequency questionnaire were reduced by the calibration and approached the consumption data estimated by the 24-hour dietary recall. Conclusions: The food frequency questionnaire showed good reproducibility but only a weak correlation with the 24-hour dietary recall. Calibration of the data obtained by the food frequency questionnaire brought them closer to the reference method.
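One common way to estimate calibration factors by linear regression is to regress the reference values (24-hour recall) on the questionnaire values and use the fitted intercept and slope to rescale the questionnaire. The sketch below assumes that approach with ordinary least squares and a single nutrient; the function names are hypothetical and the abstract does not give the authors' exact model.

```python
def calibration_factors(ffq, recall):
    """Least-squares intercept and slope for recall = intercept + slope * ffq."""
    n = len(ffq)
    mean_x = sum(ffq) / n
    mean_y = sum(recall) / n
    sxx = sum((x - mean_x) ** 2 for x in ffq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ffq, recall))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def calibrate(ffq_value, intercept, slope):
    """Apply the calibration factors to one questionnaire value."""
    return intercept + slope * ffq_value
```

A slope below 1 pulls high questionnaire values down toward the recall-based estimate, which matches the reduction after calibration described in the results.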


2015 ◽  
Vol 23 (3) ◽  
pp. 450-457 ◽  
Author(s):  
Ana Railka de Souza Oliveira ◽  
Thelma Leite de Araujo ◽  
Emilia Campos de Carvalho ◽  
Alice Gabrielle de Sousa Costa ◽  
Tahissa Frota Cavalcante ◽  
...  

OBJECTIVE: to develop indicators for the nursing outcome Swallowing Status, with their respective conceptual and operational definitions, validated by experts and in a clinical setting among patients who had experienced a stroke. METHOD: methodological study with concept analysis and content and clinical validations. The Content Validation Index was calculated from the scores assigned to the indicators by 11 experts. Two pairs of nurses assessed 81 patients during the clinical validation: one pair used an instrument with definitions and the other used an instrument without definitions. The resulting assessments were compared using the Intraclass Correlation Coefficient, Friedman's test, and the Minimal Important Difference calculation. RESULTS: all the indicators, with the exception of the indicator Ability to bring food to mouth, presented a Content Validation Index above 0.80. The pair using the instrument with definitions presented an Intraclass Correlation Coefficient above 0.80 for all the indicators, and similarity was found in all the assessments according to the Minimal Important Difference calculation. The pair using the instrument without definitions presented a low coefficient (ρ<0.75) for all the indicators. CONCLUSION: the results showed that greater uniformity and accuracy were achieved by the pair of nurses using the conceptual and operational definitions for the indicators of the nursing outcome Swallowing Status.

