Evaluating Random Error in Clinician-Administered Surveys: Theoretical Considerations and Clinical Applications of Interobserver Reliability and Agreement

Purpose: The purpose of this study is to raise awareness of interobserver concordance and the differences between interobserver reliability and agreement when evaluating the responsiveness of a clinician-administered survey and, specifically, to demonstrate the clinical implications of data types (nominal/categorical, ordinal, interval, or ratio) and statistical index selection (for example, Cohen's kappa, Krippendorff's alpha, or interclass correlation). Methods: In this prospective cohort study, 3 clinical audiologists, who were masked to each other's scores, administered the Practical Hearing Aid Skills Test–Revised to 18 adult owners of hearing aids. Interobserver concordance was examined using a range of reliability and agreement statistical indices. Results: The importance of selecting statistical measures of concordance was demonstrated with a worked example, wherein the level of interobserver concordance achieved varied from “no agreement” to “almost perfect agreement” depending on data types and statistical index selected. Conclusions: This study demonstrates that the methodology used to evaluate survey score concordance can influence the statistical results obtained and thus affect clinical interpretations.
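
To illustrate how the choice of index interacts with data type, the following minimal sketch computes Cohen's kappa twice on the same ratings: once treating the scores as nominal (unweighted) and once treating them as ordinal (linear weights). The function name `cohen_kappa` and the ten paired scores are hypothetical and are not taken from the study.

```python
def cohen_kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters.

    weighted=False treats categories as nominal (any disagreement counts fully).
    weighted=True applies linear weights, so near misses on an ordinal scale
    are penalised less than distant ones.
    """
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(disagreement(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(disagreement(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    return 1 - d_obs / d_exp


# Hypothetical ordinal scores (0, 1, 2) from two observers on ten items.
scores_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
scores_b = [0, 1, 1, 2, 2, 1, 0, 1, 2, 2]

kappa_nominal = cohen_kappa(scores_a, scores_b, [0, 1, 2], weighted=False)
kappa_ordinal = cohen_kappa(scores_a, scores_b, [0, 1, 2], weighted=True)
print(round(kappa_nominal, 3), round(kappa_ordinal, 3))
```

On these made-up scores every disagreement falls between adjacent categories, so the weighted index comes out higher than the unweighted one even though the ratings are identical.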


An Electroencephalographic Classification for Coma

Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques ◽

10.1017/s0317167100032996 ◽

1997 ◽

Vol 24 (04) ◽

pp. 320-325 ◽

Cited By ~ 86

Author(s):

G.B. Young ◽

R.S. McLachlan ◽

J.H. Kreeft ◽

J.D. Demelo

Keyword(s):

Interobserver Agreement ◽

Classification Scheme ◽

Interobserver Reliability ◽

Rater Agreement ◽

Substantial Agreement ◽

Kappa Score ◽

Perfect Agreement ◽

ICU Patients ◽

EEG Classification ◽

Electroencephalogram EEG

ABSTRACT: Background: Assessing thalamocortical function in comatose patients in the intensive care unit (ICU) can be difficult. Since the electroencephalogram (EEG) affords such assessment, we have developed an EEG classification for comatose patients in our general ICU. Methods: One hundred EEGs were classified in a blinded fashion by two EEGers, using our method and that of Synek. Interobserver agreement was assessed using kappa score determination. Results: Kappa scores were 0.90 for our system and 0.75 for the Synek system. (The kappa score represents the inter-rater agreement that is beyond chance; 0.90 is almost perfect agreement, while 0.75 is substantial agreement.) Conclusion: Our system for classifying EEGs in comatose patients has a higher interobserver reliability than one that was previously published. This EEG classification scheme should be useful in clinical electrophysiological research involving ICU patients, allowing for internal consistency and comparisons among centres.
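
For readers unfamiliar with the statistic, "agreement beyond chance" is conventionally written as below, where p_o is the observed proportion of agreement and p_e is the proportion expected by chance from the raters' marginal frequencies. The numbers in the second line are hypothetical and chosen only to show the arithmetic; they are not the study's data.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\qquad\text{e.g.}\qquad
\kappa = \frac{0.93 - 0.30}{1 - 0.30} = \frac{0.63}{0.70} = 0.90
```

The verbal labels quoted in the abstract match the widely used Landis and Koch bands (0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect), although the abstract does not say which benchmark the authors applied.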


Preliminary Results of Relationship between Preoperative Walking Ability and Magnetic Resonance Imaging Morphology in Patients with Lumbar Canal Stenosis: Comparison between Trefoil and Triangle Types of Spinal Stenosis

Asian Spine Journal ◽

10.4184/asj.2017.11.4.580 ◽

2017 ◽

Vol 11 (4) ◽

pp. 580-585

Author(s):

Parisa Azimi ◽

Taravat Yazdanian ◽

Edward C. Benzel

Keyword(s):

Magnetic Resonance Imaging ◽

Back Pain ◽

Magnetic Resonance ◽

Interobserver Reliability ◽

Walking Ability ◽

Resonance Imaging ◽

Perfect Agreement ◽

Duration Of Symptoms ◽

Lumbar Canal Stenosis ◽

Canal Stenosis

Study Design: Cross-sectional. Purpose: To examine the relationship between magnetic resonance imaging (MRI) morphology stenosis grades and preoperative walking ability in patients with lumbar canal stenosis (LCS). Overview of Literature: No previous study has analyzed the correlation between MRI morphology stenosis grades and walking ability in patients with LCS. Methods: This prospective study included 98 consecutive patients with LCS who were candidates for surgery. Using features identified in T2-weighted axial magnetic resonance images, stenosis type was determined at the maximal stenosis level, and only trefoil and triangle stenosis grade types were considered because of sufficient sample size. Intraobserver and interobserver reliability were assessed by calculating weighted kappa coefficients. Symptom severity was evaluated via the Japanese Orthopedic Association Back Pain Evaluation Questionnaire (JOABPEQ). Walking ability was assessed using the Self-Paced Walking Test (SPWT) and JOABPEQ subscales. Demographic characteristics, SPWT scores, and JOABPEQ scores were compared between patients with trefoil and triangle stenosis types. Results: The mean patient age was 58.1 (standard deviation, 8.4) years. The kappa values of the MRI morphology stenosis grade types showed a perfect agreement between the stenosis grade types. The trefoil group (n=53) and triangle group (n=45) showed similar preoperative JOABPEQ subscale scores (e.g., low back pain, lumbar function, and mental health) and were not significantly different in age, BMI, duration of symptoms, or lumbar stenosis levels (all p > 0.05); however, trefoil stenosis grade type was associated with a decreased walking ability according to the SPWT and JOABPEQ subscale scores. Conclusions: These findings suggest preoperative walking ability is more profoundly affected in patients with trefoil type stenosis than in those with triangle type stenosis.


Toronto Facial Grading System: Interobserver reliability

Otolaryngology ◽

10.1016/s0194-5998(00)70241-5 ◽

2000 ◽

Vol 122 (2) ◽

pp. 212-215 ◽

Cited By ~ 40

Author(s):

Fatma Tulin Kayhan ◽

David Zurakowski ◽

Steven D. Rauch

Keyword(s):

Facial Nerve ◽

Interobserver Reliability ◽

Intraclass Correlation ◽

Correlation Coefficients ◽

Ease Of Use ◽

Composite Score ◽

Grading System ◽

Perfect Agreement ◽

Intraclass Correlation Coefficients ◽

Facial Function

The Toronto Facial Grading System (TFGS) is an observer scale for rating facial nerve dysfunction. The TFGS scores aspects of resting symmetry, symmetry of voluntary movement, and synkinesis for each division of the face (subscores) and then provides calculated total scores and an overall composite score of facial function. The developers of the scale have validated its sensitivity for identifying small changes in facial dysfunction and the independence of the different components measured. Herein we report our results in a study of interobserver reliability using the TFGS. Twenty-five patients from the Massachusetts Eye and Ear Infirmary Facial Nerve Center with varying degrees of facial paresis, paralysis, and synkinesis were videotaped, and the video recordings were scored by 5 independent observers using the TFGS. Intraclass correlation coefficients (κ) and 95% confidence intervals were calculated for subscores and for each total and composite score. Intraclass correlation coefficients ranged from 0.59 to 0.85, all considered substantial to near-perfect agreement between observers. We believe the TFGS is superior to other scales by virtue of its sensitivity, comprehensiveness, ease of use, and interobserver reliability. The TFGS presently appears to be the best option in those situations in which accurate and precise documentation of facial function is required.
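
As a rough illustration of how an intraclass correlation coefficient is obtained from a subjects-by-observers table, the sketch below implements one common form, ICC(2,1) in Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater). The abstract does not state which ICC form the authors used, so this choice is an assumption, and the function name `icc_2_1` and the small rating tables are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings is a list of rows, one per subject, each holding one score
    per observer (all rows the same length).
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-observer mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores: identical ratings versus a constant 2-point offset.
identical = [[1, 1], [2, 2], [3, 3]]
offset = [[1, 3], [2, 4], [3, 5]]

icc_identical = icc_2_1(identical)
icc_offset = icc_2_1(offset)
print(round(icc_identical, 3), round(icc_offset, 3))
```

The second table ranks the subjects identically but with a systematic difference between observers, which an absolute-agreement ICC penalises; a consistency-type index would not.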


Statistical measures of the central tendency for H+ activity and pH

Soil Science Annual ◽

10.1515/ssa-2017-0022 ◽

2017 ◽

Vol 68 (4) ◽

pp. 174-181 ◽

Cited By ~ 1

Author(s):

Izabela Kuna-Broniowska ◽

Halina Smal

Keyword(s):

Probability Distributions ◽

Geometric Mean ◽

Arithmetic Mean ◽

Central Tendency ◽

Statistical Measures ◽

Normal Probability ◽

Explicit Opinion ◽

Normal Probability Distribution ◽

Theoretical Considerations

Abstract: Despite the numerous papers on the statistical analysis of pH, there is no explicit opinion on the use of the arithmetic mean as a measure of central tendency for pH and H+ activity. The problem arises because transforming the arithmetic mean of one does not give the arithmetic mean of the other. The paper presents 1) theoretical considerations on the distributions of pH and H+ activity and the relation between them, the properties of these distributions, the choice of mutually consistent distributions for pH and for H+ activity, and the measures of central tendency appropriate to such distributions, and 2) examples of calculating measures of central tendency for pH and H+ activity, based on literature data on soil and lake water pH. These data analyses covered the distributions of pH and H+ activity, the properties of those distributions, descriptive statistics for pH and for H+ activity, and a comparison of the arithmetic mean with the geometric mean. From the results, it could be concluded that a uniform approach to choosing a measure of central tendency for pH and H+ activity requires determining the type of measure (mean) for one of them and then transforming that measure consistently. The choice of a measure of central tendency for a variable should be preceded by determination of its distribution. A normal probability distribution of pH, and thus a lognormal distribution of H+ activity, indicates that the arithmetic mean and its corresponding geometric mean should be used as the proper measures of central tendency for pH and for H+ activity, respectively. In addition, the median, a positional statistic, can be used for each of these variables irrespective of their probability distributions.
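
The relationship described here follows from pH = -log10(H+ activity): averaging pH values arithmetically corresponds to taking the geometric mean of the activities, not their arithmetic mean. A minimal sketch with three hypothetical pH readings (not data from the paper) shows the difference; the function name `summarize_ph` is likewise illustrative.

```python
import math


def summarize_ph(ph_values):
    """Compare central-tendency measures for pH and for H+ activity."""
    activities = [10 ** (-ph) for ph in ph_values]
    n = len(ph_values)

    arithmetic_mean_ph = sum(ph_values) / n
    arithmetic_mean_activity = sum(activities) / n
    # Geometric mean computed through logarithms for numerical stability.
    geometric_mean_activity = math.exp(sum(math.log(a) for a in activities) / n)

    return {
        "arithmetic_mean_ph": arithmetic_mean_ph,
        # pH equivalent of the geometric mean of activities.
        "ph_from_geometric_activity": -math.log10(geometric_mean_activity),
        # pH equivalent of the arithmetic mean of activities.
        "ph_from_arithmetic_activity": -math.log10(arithmetic_mean_activity),
    }


# Hypothetical readings, one pH unit apart.
ph_values = [4.0, 5.0, 6.0]
summary = summarize_ph(ph_values)
for name, value in summary.items():
    print(name, round(value, 3))
```

Back-transforming the arithmetic mean of the activities gives a lower pH than the arithmetic mean of the pH values, because the most acidic reading dominates the sum of activities.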


Estimation of intra-arterial chemotherapy distribution to the retina in pediatric retinoblastoma patients using quantitative digital subtraction angiography

Interventional Neuroradiology ◽

10.1177/1591019917749825 ◽

2018 ◽

Vol 24 (2) ◽

pp. 214-219 ◽

Cited By ~ 5

Author(s):

Sravani Kondapavulur ◽

Daniel L Cooke ◽

Andrew Kao ◽

Matthew R Amans ◽

Matthew Alexander ◽

...

Keyword(s):

Digital Subtraction Angiography ◽

Ophthalmic Artery ◽

Interobserver Reliability ◽

Regions Of Interest ◽

Patient Specific ◽

Target Tissue ◽

Digital Subtraction ◽

Distal Catheter ◽

Interclass Correlation ◽

Dosing Strategies

Background and purpose: The purpose of this article is to estimate the distribution of superselective intra-arterial chemotherapy (IAC) delivery to ocular target tissue using quantitative digital subtraction angiography (qDSA). Materials and methods: From March 2010 to January 2016, 50 ophthalmic artery contrast DSAs obtained immediately prior to IAC infusions in 22 patients were analyzed. This study was conducted under a retrospective review IRB (no. 10-01862). Parametric color-coded DSAs (iFlow, Siemens Medical) were post-processed (MATLAB, The Mathworks Inc.) using two methods: two box regions of interest (pre-retina and globe) and four custom regions of interest (ROIs: ophthalmic artery, choroid, supraclinoid internal carotid artery (ICA), cavernous ICA). Mean interobserver reliability of custom ROI selection is presented as a 95% confidence interval of interclass correlation, and fractional chemotherapy delivery to selected ROIs as means ± standard deviation in this study. Results: The estimated fraction of chemotherapy delivered to the globe with the first method was 79.5%. Percentage regional delivery using the second method was as follows: ophthalmic artery, 85.8%; choroid, 60.5%; supraclinoid ICA, 14.2%. The cavernous ICA ROI (encompassing distal catheter and potential reflux) gave a signal equivalent to 9.3% of total delivery. Conclusion: Parametric color-coded qDSA can estimate the fraction of IAC delivered to the retina and other orbital structures in ocular retinoblastoma patients. This information can inform delivery location and dosing strategies on a patient-specific basis.


The Reliability of Classifying the Morphology of Anterior Cruciate Ligament Remnants during Surgery

The Journal of Knee Surgery ◽

10.1055/s-0039-1700810 ◽

2019 ◽

Author(s):

Barak Haviv ◽

Shai Shemesh ◽

Mohamed Kittani ◽

Mustafa Yassin ◽

Lee Yaari

Keyword(s):

Anterior Cruciate Ligament ◽

ACL Reconstruction ◽

Interobserver Reliability ◽

Cruciate Ligament ◽

Scar Formation ◽

Kappa Statistics ◽

Morphological Pattern ◽

Perfect Agreement ◽

Anterior Cruciate

Abstract: Arthroscopic classification of the torn anterior cruciate ligament (ACL) morphology is fundamental for clinical studies on emerging techniques such as repair and preservation. At present, the most acknowledged classification is the Crain description of four morphological patterns. The purpose of the study was to analyze the intra- and interobserver reliability of the Crain classification in patients undergoing ACL reconstruction surgeries. The study included 101 patients who had ACL reconstruction surgery between the years 2014 and 2017. The morphological pattern of ACL remnant scar formation during surgery was observed and classified according to Crain by three orthopaedic surgeons. Inter- and intraobserver reliabilities were measured using kappa statistics. Intraobserver reliability for the Crain classification ranged from 0.63 to 0.83 (substantial to almost perfect agreement). Interobserver reliability was 0.51 (moderate agreement). In almost a third of the cases, observers reported an additional morphological pattern of scar formation that was not well defined by Crain. A modified classification of four patterns was suggested: (A) without scar tissue, (B) with adhesion to the femoral notch (wall or roof), (C) with adhesion to the notch and posterior cruciate ligament (PCL), and (D) with adhesion to the PCL. Reanalysis of these four morphological configurations resulted in interobserver reliability of 0.82 (almost perfect agreement). In conclusion, the Crain classification of torn ACL remnant morphology has moderate interobserver reliability; however, a suggested classification with modified and additional configurations has almost perfect reliability and may be useful for studies on ACL repair and preservation.


The MISDEF2 algorithm: an updated algorithm for patient selection in minimally invasive deformity surgery

Journal of Neurosurgery Spine ◽

10.3171/2019.7.spine181104 ◽

2020 ◽

Vol 32 (2) ◽

pp. 221-228 ◽

Cited By ~ 6

Author(s):

Praveen V. Mummaneni ◽

Paul Park ◽

Christopher I. Shaffrey ◽

Michael Y. Wang ◽

Juan S. Uribe ◽

...

Keyword(s):

Minimally Invasive ◽

Spinal Deformity ◽

Adult Spinal Deformity ◽

Interobserver Reliability ◽

Open Approach ◽

Case Review ◽

Perfect Agreement ◽

Recent Advances ◽

Spinal Deformity Surgery ◽

SPSS Software

OBJECTIVE: Minimally invasive surgery (MIS) can be used as an alternative or adjunct to traditional open techniques for the treatment of patients with adult spinal deformity. Recent advances in MIS techniques, including advanced anterior approaches, have increased the range of candidates for MIS deformity surgery. The minimally invasive spinal deformity surgery (MISDEF2) algorithm was created to provide an updated framework for decision-making when considering MIS techniques in correction of adult spinal deformity. METHODS: A modified algorithm was developed that incorporates a patient’s preoperative radiographic parameters and leads to one of 4 general plans ranging from basic to advanced MIS techniques to open deformity surgery with osteotomies. The authors surveyed 14 fellowship-trained spine surgeons experienced with spinal deformity surgery to validate the algorithm using a set of 24 cases to establish interobserver reliability. They then re-surveyed the same surgeons 2 months later with the same cases presented in a different sequence to establish intraobserver reliability. Responses were collected and analyzed. Correlation values were determined using SPSS software. RESULTS: Over a 3-month period, 14 fellowship-trained deformity surgeons completed the surveys. Responses for MISDEF2 algorithm case review demonstrated an interobserver kappa of 0.85 for the first round of surveys and an interobserver kappa of 0.82 for the second round of surveys, consistent with substantial agreement. In at least 7 cases, there was perfect agreement between the reviewing surgeons. The mean intraobserver kappa for the 2 surveys was 0.8. CONCLUSIONS: The MISDEF2 algorithm was found to have substantial inter- and intraobserver agreement. The MISDEF2 algorithm incorporates recent advances in MIS surgery. The use of the MISDEF2 algorithm provides reliable guidance for surgeons who are considering either an MIS or an open approach for the treatment of patients with adult spinal deformity.


Interobserver and Intraobserver Reliability of an MRI-Based Classification System for Injuries to the Ulnar Collateral Ligament

The American Journal of Sports Medicine ◽

10.1177/0363546518786970 ◽

2018 ◽

Vol 46 (11) ◽

pp. 2755-2760 ◽

Cited By ~ 8

Author(s):

Prem N. Ramkumar ◽

Salvatore J. Frangiamore ◽

Sergio M. Navarro ◽

T. Sean Lynch ◽

Michael C. Forney ◽

...

Keyword(s):

Classification System ◽

Clinical Decision Making ◽

Interobserver Reliability ◽

Ulnar Collateral Ligament ◽

Random Order ◽

Weighted Kappa ◽

Intraobserver Variability ◽

Collateral Ligament ◽

Level Of Evidence ◽

Perfect Agreement

Background: Despite improvements in understanding biomechanics and surgical options for ulnar collateral ligament (UCL) tears, there remains a need for a reliable classification of UCL tears that has the potential to guide clinical decision making. Purpose: To assess the intra- and interobserver reliability of the newly proposed magnetic resonance imaging (MRI)–based classification for UCL tears. Secondary objectives included assessing the effect of additional views, discrimination between distal and nondistal tears, and correlation of imaging reads with intraoperative findings of the UCL. Study Design: Cohort study (diagnosis); Level of evidence, 2. Methods: Nine fellowship-trained specialists from 7 institutions independently completed 4 surveys consisting of 60 elbow MRI scans with UCL tears using a newly proposed 6-stage classification system. The first and third surveys contained 60 coronal images, while the second and fourth contained the same images with coronal and axial views presented in a random order to assess intraobserver variability via the weighted kappa value and the effect of additional imaging views. Weighted kappa values were also calculated for each of the 4 surveys to acquire interobserver reliability. Reliability analysis was repeated through a 2-group classification analysis for distal and nondistal tears. Observer readings were compared with intraoperative UCL findings. Results: For the newly proposed 6-stage MRI-based classification, intra- and interobserver reliability demonstrated near perfect and substantial agreement, respectively. These values increased only when substratified into the 2-group distal and nondistal tear classification (P < .05). The additional axial view did not statistically improve the agreement within and among readers. When compared with intraoperative findings from 30 elbows, observer readings were accurate for tear grade (partial and complete), proximal location, and distal location but not midsubstance tears. Conclusion: The newly proposed 6-stage MRI-based classification utilizing grade and location of the injury had substantial to near perfect agreement among and within fellowship-trained observers.


Assessment of Basic Ankle Arthroscopy Skills in Orthopedic Trainees

Foot & Ankle International ◽

10.1177/1071100719891418 ◽

2019 ◽

Vol 41 (2) ◽

pp. 193-199 ◽

Cited By ~ 2

Author(s):

Jeremiah D. Johnson ◽

Christopher Cheng ◽

Brian Schmidtberg ◽

Mark Cote ◽

Lauren E. Geaney

Keyword(s):

Medical Students ◽

Scoring System ◽

Rating Scale ◽

Interobserver Reliability ◽

Ankle Arthroscopy ◽

Assessment Tools ◽

Residency Programs ◽

Interclass Correlation ◽

Objective Model ◽

Junior Residents

Background: There is increasing emphasis on assessing resident competency, but little has been published on how to best evaluate trainee competency for ankle arthroscopy. The purpose of this study was to validate an objective model for assessing basic ankle arthroscopy knowledge and operative skills on a cadaveric ankle. Methods: The Diagnostic Ankle Arthroscopy Skills Scoring System was adapted from previously validated assessment tools for knee arthroscopy. The scoring system included (1) an oral questionnaire (0-23 points), (2) an operative task-specific checklist (0-19 points), and (3) a global operative skills rating (12-60 points). Thirty-three trainees consisting of orthopedic residents and medical students performed a diagnostic ankle arthroscopy on a cadaveric ankle and were assessed by a single observer, while a subset were tested by 2 evaluators to determine interobserver reliability. Results: There was strong correlation between educational level and scores on the global operative skills rating scale (r = 0.967, P < .0001), task-specific checklist (r = 0.815, P < .815), and oral questionnaire (r = 0.896, P < .0001). The global operative skills scores significantly improved with training level, and the largest difference was between medical students and senior residents. The most notable year-to-year increases in skill were between postgraduate year (PGY) 1 and 2 (P < .01) and between PGY2 and PGY3 (P < .05). Oral questionnaire and task-specific checklist scores were significantly lower for medical students than for PGY1 residents (P < .001). There was also significant improvement in the oral questionnaire between senior and junior residents (P < .05). There was a moderate correlation between number of self-reported ankle arthroscopy cases and scores on the global operative skills score (r = 0.7019, P < .0001). Interobserver reliability was high for the global operative skills scores (interclass correlation coefficient = 0.89). Conclusion: The study revealed a valid measure to objectively assess trainees’ ankle arthroscopy clinical knowledge and operative skills in a bioskills laboratory. Clinical Relevance: This tool should enable residency programs to evaluate competency and track individual trainee progress over time.


Measurement of finger joint motion after flexor tendon repair: smartphone photography compared with traditional goniometry

Journal of Hand Surgery (European Volume) ◽

10.1177/1753193421991062 ◽

2021 ◽

pp. 175319342199106

Author(s):

Jing Chen ◽

Ai Xian Zhang ◽

Si Jia Qian ◽

Yu Jing Wang

Keyword(s):

Flexor Tendon ◽

Interobserver Reliability ◽

Tendon Repair ◽

Correlation Coefficients ◽

Joint Motion ◽

Altman Analysis ◽

Flexor Tendon Repair ◽

Joint Range Of Motion ◽

Interclass Correlation ◽

Joint Range

The purpose of our study was to determine whether smartphone photography is as reliable and valid as clinical goniometry for measuring interphalangeal joint range of motion. We conducted a retrospective review of 37 fingers in 33 patients after flexor tendon repair. The measurements on photographs taken with a smartphone by a surgeon were compared with manual measurements with goniometry by the same surgeon. Pearson coefficients and interclass correlation coefficients were all above 0.85, and Bland–Altman analysis demonstrated that at least 35 of 37 measurements were within the 95% confidence interval in all variables. According to the Tang criteria, the total numbers of excellent and good results were equivalent for both methods. There was high interobserver reliability between measurements by surgeons and a therapist. We conclude that, if the pictures are properly taken, measuring the angles in smartphone pictures is as reliable as measuring them with goniometry, and that grading the results according to the two methods gives identical results.
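
A Bland–Altman comparison of two measurement methods can be sketched as follows, assuming the interval referred to above is the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). The function name `bland_altman` and the six paired angle readings are hypothetical and are not the study's measurements.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return bias, lower and upper 95% limits of agreement, and the
    number of paired differences that fall inside those limits."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    inside = sum(1 for d in diffs if lower <= d <= upper)
    return bias, lower, upper, inside


# Hypothetical joint angles in degrees for the same six fingers.
photo_angles = [62, 74, 88, 47, 81, 69]
goniometer_angles = [60, 75, 90, 45, 80, 70]

bias, lower, upper, inside = bland_altman(photo_angles, goniometer_angles)
print(round(bias, 2), round(lower, 2), round(upper, 2), inside)
```

A bias near zero with narrow limits suggests the two methods can be used interchangeably for the purpose at hand; how narrow is narrow enough is a clinical judgment rather than a statistical one.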
