Evaluating Random Error in Clinician-Administered Surveys: Theoretical Considerations and Clinical Applications of Interobserver Reliability and Agreement
Purpose
The purpose of this study is to raise awareness of interobserver concordance and the differences between interobserver reliability and agreement when evaluating the responsiveness of a clinician-administered survey and, specifically, to demonstrate the clinical implications of data types (nominal/categorical, ordinal, interval, or ratio) and statistical index selection (for example, Cohen's kappa, Krippendorff's alpha, or intraclass correlation).

Methods
In this prospective cohort study, 3 clinical audiologists, who were masked to each other's scores, administered the Practical Hearing Aid Skills Test–Revised to 18 adult owners of hearing aids. Interobserver concordance was examined using a range of reliability and agreement statistical indices.

Results
The importance of selecting statistical measures of concordance was demonstrated with a worked example, wherein the level of interobserver concordance achieved varied from “no agreement” to “almost perfect agreement” depending on the data type and statistical index selected.

Conclusions
This study demonstrates that the methodology used to evaluate survey score concordance can influence the statistical results obtained and thus affect clinical interpretations.
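The dependence of concordance on data type and index can be illustrated with a minimal sketch. The ratings below are hypothetical and are not data from this study; the 0 to 2 scoring scale, the two-rater design, and the function names are assumptions made for illustration only. The sketch computes raw percent agreement, unweighted Cohen's kappa (scores treated as nominal), and linearly weighted Cohen's kappa (scores treated as ordinal) for the same pair of raters.

```python
def percent_agreement(a, b):
    """Proportion of items on which the two raters give the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b, categories, weights=None):
    """Cohen's kappa for two raters.

    weights=None     -> unweighted kappa (nominal data: any mismatch counts fully)
    weights="linear" -> linearly weighted kappa (ordinal data: near misses
                        count less than distant ones)
    """
    n = len(a)
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rater A score, rater B score).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[idx[x]][idx[y]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "linear":
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    # Observed vs. chance-expected weighted disagreement.
    d_obs = sum(disagreement(i, j) * obs[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical scores (0 = unable, 1 = partial, 2 = competent) for 10 items.
categories = [0, 1, 2]
rater_a = [0, 0, 1, 1, 1, 2, 2, 2, 2, 1]
rater_b = [0, 1, 1, 2, 1, 2, 2, 1, 2, 1]

print("percent agreement:    ", round(percent_agreement(rater_a, rater_b), 3))
print("unweighted kappa:     ", round(cohen_kappa(rater_a, rater_b, categories), 3))
print("linear weighted kappa:", round(cohen_kappa(rater_a, rater_b, categories, "linear"), 3))
```

The three indices return different values for the same ratings: percent agreement does not correct for chance, unweighted kappa penalizes every mismatch equally, and weighted kappa gives partial credit when raters differ by only one category. Which value is appropriate depends on whether the scores are treated as nominal or ordinal.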