Reliability of Simulation-Based Assessment for Practicing Physicians: Performance is Context-Specific
Abstract

Introduction: Even physicians who routinely work in complex, dynamic practices may be unprepared to optimally manage challenging critical events. High-fidelity simulation can realistically mimic clinically relevant critical events; however, the reliability and validity of simulation-based assessment scores for practicing physicians have not been established.

Methods: Standardized complex simulation scenarios were developed and administered to board-certified, practicing anesthesiologists who volunteered to participate in an assessment study during formative maintenance of certification activities. A subset of the study population agreed to participate as the primary responder in a second scenario for this study. The physicians were assessed independently by trained raters on both teamwork/behavioral and technical performance measures. Generalizability and Decision studies were completed for the two scenarios with two raters.

Results: The technical score was no more reliable than the behavioral score. With two raters, more than 20 scenarios would be required to achieve a reliability estimate of 0.7. Increasing the number of raters would have little effect on reliability.

Discussion: The performance of practicing physicians on simulated critical events may be highly context-specific. Realistic simulation-based assessment for practicing physicians is resource-intensive and may be best suited for individualized formative feedback. Moreover, aggregate data from a population of participants may have even higher impact if used to identify skill or knowledge gaps to be addressed by training programs and continuing-education improvements across the profession.
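The Results claim (more than 20 scenarios needed with two raters, and little gain from adding raters) follows from a Decision-study projection over a crossed person x scenario x rater design. The sketch below illustrates the arithmetic with purely hypothetical variance components chosen so that the person-by-scenario interaction dominates, mimicking context-specific performance; the component values and function name are illustrative assumptions, not estimates from the study.

```python
# Hypothetical D-study projection for a crossed person x scenario x rater
# (p x s x r) design. Variance components are illustrative assumptions,
# NOT values estimated in the study.

def projected_g_coefficient(var_p, var_ps, var_pr, var_psr_e,
                            n_scenarios, n_raters):
    """Projected generalizability (relative) coefficient.

    Relative error variance for a crossed p x s x r design:
        sigma2_rel = var_ps/n_s + var_pr/n_r + var_psr_e/(n_s * n_r)
    and the coefficient is var_p / (var_p + sigma2_rel).
    """
    rel_error = (var_ps / n_scenarios
                 + var_pr / n_raters
                 + var_psr_e / (n_scenarios * n_raters))
    return var_p / (var_p + rel_error)

# Illustrative components: most variance sits in the person-by-scenario
# interaction, which is what makes performance context-specific.
components = dict(var_p=0.10, var_ps=0.60, var_pr=0.02, var_psr_e=0.28)

for n_s in (1, 5, 20, 25):
    g = projected_g_coefficient(n_scenarios=n_s, n_raters=2, **components)
    print(f"scenarios={n_s:2d}, raters=2 -> G = {g:.2f}")
```

With these assumed components the projected coefficient stays below 0.7 at 20 scenarios and crosses it only beyond that, while doubling the raters at a fixed number of scenarios changes the estimate very little, because the rater terms contribute only a small share of the error variance.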