Development and Testing of Screen-Based and Psychometric Instruments for Assessing Resident Performance in an Operating Room Simulator

Introduction. Medical simulators are used for assessing clinical skills and increasingly for testing hypotheses. We developed and tested an approach for assessing performance in anesthesia residents using screen-based simulation that ensures expert raters remain blinded to subject identity and experimental condition. Methods. Twenty anesthesia residents managed emergencies in an operating room simulator by logging actions through a custom graphical user interface. Two expert raters rated performance based on these entries using custom Global Rating Scale (GRS) and Crisis Management Checklist (CMC) instruments. Interrater reliability was measured by calculating intraclass correlation coefficients (ICC), and internal consistency of the instruments was assessed with Cronbach’s alpha. Agreement between GRS and CMC was measured using Spearman rank correlation (SRC). Results. Interrater agreement (GRS: ICC = 0.825, CMC: ICC = 0.878) and internal consistency (GRS: alpha = 0.838, CMC: alpha = 0.886) were good for both instruments. Subscale analysis indicated that several instrument items can be discarded. GRS and CMC scores were highly correlated (SRC = 0.948). Conclusions. In this pilot study, we demonstrated that screen-based simulation can allow blinded assessment of performance. GRS and CMC instruments demonstrated good rater agreement and internal consistency. We plan to further test construct validity of our instruments by measuring performance in our simulator as a function of training level.
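
The internal-consistency statistic named in the abstract above, Cronbach's alpha, can be sketched in a few lines. This is a minimal illustration of the standard formula, not the authors' analysis code; the function name `cronbach_alpha` and the example scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects-by-items score table.

    scores: list of rows, one per subject; each row holds k item scores.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across subjects.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: 3 subjects x 2 items (illustrative only).
example_scores = [
    [1, 2],
    [2, 1],
    [3, 3],
]
alpha = cronbach_alpha(example_scores)
```

Alpha rises when items vary together across subjects, which is why an item that lowers alpha is a candidate for removal in a subscale analysis.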

Evaluation of Patient Simulator Performance as an Adjunct to the Oral Examination for Senior Anesthesia Residents

Anesthesiology ◽ 10.1097/00000542-200603000-00014 ◽ 2006 ◽ Vol 104 (3) ◽ pp. 475-481 ◽ Cited By ~ 56

Author(s): Georges L. Savoldelli ◽ Viren N. Naik ◽ Hwan S. Joo ◽ Patricia L. Houston ◽ Marianne Graham ◽ ...

Keyword(s): Concurrent Validity ◽ Rating Scale ◽ Intraclass Correlation ◽ Oral Examination ◽ Added Value ◽ Laboratory Simulation ◽ Global Rating Scale ◽ Average Score ◽ Global Rating ◽ Anesthesia Residents

Background: Patient simulators possess features for performance assessment. However, the concurrent validity and the "added value" of simulator-based examinations over traditional examinations have not been adequately addressed. The current study compared a simulator-based examination with an oral examination for assessing the management skills of senior anesthesia residents. Methods: Twenty senior anesthesia residents were assessed sequentially in resuscitation and trauma scenarios using two assessment modalities: an oral examination, followed by a simulator-based examination. Two independent examiners scored the performances with a previously validated global rating scale developed by the Anesthesia Oral Examination Board of the Royal College of Physicians and Surgeons of Canada. Different examiners were used to rate the oral and simulation performances. Results: Interrater reliability was good to excellent across scenarios and modalities: intraclass correlation coefficients ranged from 0.77 to 0.87. The within-scenario between-modality score correlations (concurrent validity) were moderate: r = 0.52 (resuscitation) and r = 0.53 (trauma) (P < 0.05). Forty percent of the average score variance was accounted for by the participants, and 30% was accounted for by the participant-by-modality interaction. Conclusions: Variance in participant scores suggests that the examination is able to perform as expected in terms of discriminating among test takers. The rather large participant-by-modality interaction, along with the pattern of correlations, suggests that an examinee's performance varies based on the testing modality and a trainee who "knows how" in an oral examination may not necessarily be able to "show how" in a simulation laboratory. Simulation may therefore be considered a useful adjunct to the oral examination.

Evaluating Teamwork in a Simulated Obstetric Environment

Anesthesiology ◽ 10.1097/01.anes.0000265149.94190.04 ◽ 2007 ◽ Vol 106 (5) ◽ pp. 907-915 ◽ Cited By ~ 63

Author(s): Pamela J. Morgan ◽ Richard Pittini ◽ Glenn Regehr ◽ Carol Marrs ◽ Michèle F. Haley

Keyword(s): Rating Scale ◽ Assessment Tool ◽ Intraclass Correlation ◽ Correlation Coefficients ◽ Obstetric Care ◽ Global Rating Scale ◽ Global Rating ◽ Good Reliability ◽ Domain Specific ◽ Team Assessment

Background: The National Confidential Enquiry into Maternal Deaths identified "lack of communication and teamwork" as a leading cause of substandard obstetric care. The authors used high-fidelity simulation to present obstetric scenarios for team assessment. Methods: Obstetric nurses, physicians, and resident physicians were repeatedly assigned to teams of five or six, each team managing one of four scenarios. Each person participated in two or three scenarios with differently constructed teams. Participants and nine external raters rated the teams' performances using a Human Factors Rating Scale (HFRS) and a Global Rating Scale (GRS). Interrater reliability was determined using intraclass correlations and the Cronbach alpha. Analyses of variance were used to determine the reliability of the two measures, and effects of both scenario and rater profession (R.N. vs. M.D.) on scores. Pearson product-moment correlations were used to compare external with self-generated assessments. Results: The average of nine external rater scores showed good reliability for both HFRS and GRS; however, the intraclass correlation coefficient for a single rater was low. There was some effect of rater profession on self-generated HFRS but not on GRS. An analysis of profession-specific subscores on the HFRS revealed no interaction between profession of rater and profession being rated. There was low correlation between externally and self-generated team assessments. Conclusions: This study does not support the use of the HFRS for assessment of obstetric teams. The GRS shows promise as a summative but not a formative assessment tool. It is necessary to develop a domain specific behavioral marking system for obstetric teams.
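
The gap between a low single-rater reliability and a good nine-rater average, as reported above, is what the Spearman-Brown relation would predict. The sketch below shows that relation only; the abstract does not state that the authors used it, and the single-rater value of 0.3 is a hypothetical number chosen for illustration, not a result from the study.

```python
def average_rater_reliability(single_rater_icc, k):
    """Spearman-Brown: reliability of the mean of k raters,
    given the reliability of a single rater."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_rater_icc / (1 + (k - 1) * single_rater_icc)


# Hypothetical single-rater ICC (not taken from the study).
single = 0.3
nine_rater = average_rater_reliability(single, 9)
```

Under this relation, averaging more raters always raises reliability, with diminishing returns as k grows.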

Validation of a Dry Model for Assessing the Performance of Arthroscopic Hip Labral Repair

The American Journal of Sports Medicine ◽ 10.1177/0363546517696316 ◽ 2017 ◽ Vol 45 (9) ◽ pp. 2125-2130 ◽ Cited By ~ 11

Author(s): Lisa Phillips ◽ Jeffrey J.H. Cheung ◽ Daniel B. Whelan ◽ Michael Lucas Murnaghan ◽ Jas Chahal ◽ ...

Keyword(s): Rating Scale ◽ Intraclass Correlation ◽ Total Asset ◽ Labral Repair ◽ Global Rating Scale ◽ Task Completion ◽ Global Rating ◽ Evaluation Tool ◽ Level Of Evidence ◽ Level Of Training

Background: Arthroscopic hip labral repair is a technically challenging and demanding surgical technique with a steep learning curve. Arthroscopic simulation allows trainees to develop these skills in a safe environment. Purpose: The purpose of this study was to evaluate the use of a combination of assessment ratings for the performance of arthroscopic hip labral repair on a dry model. Study Design: Cross-sectional study; Level of evidence, 3. Methods: A total of 47 participants including orthopaedic surgery residents (n = 37), sports medicine fellows (n = 5), and staff surgeons (n = 5) performed arthroscopic hip labral repair on a dry model. Prior arthroscopic experience was noted. Participants were evaluated by 2 orthopaedic surgeons using a task-specific checklist, the Arthroscopic Surgical Skill Evaluation Tool (ASSET), task completion time, and a final global rating scale. All procedures were video-recorded and scored by an orthopaedic fellow blinded to the level of training of each participant. Results: The internal consistency/reliability (Cronbach alpha) using the total ASSET score for the procedure was high (intraclass correlation coefficient > 0.9). One-way analysis of variance for the total ASSET score demonstrated a difference between participants based on the level of training (F(3,43) = 27.8, P < .001). A good correlation was seen between the ASSET score and previous exposure to arthroscopic procedures (r = 0.52-0.73, P < .001). The interrater reliability for the ASSET score was excellent (>0.9). Conclusion: The results of this study demonstrate that the use of dry models to assess the performance of arthroscopic hip labral repair by trainees is both valid and reliable. Further research will be required to demonstrate a correlation with performance on cadaveric specimens or in the operating room.

Advanced closed-loop communication training: the blindfolded resuscitation

BMJ Simulation and Technology Enhanced Learning ◽ 10.1136/bmjstel-2019-000498 ◽ 2019 ◽ Vol 6 (4) ◽ pp. 235-238

Author(s): Kate E Hughes ◽ Patrick G Hughes ◽ Thomas Cahir ◽ Jennifer Plitt ◽ Vivienne Ng ◽ ...

Keyword(s): Closed Loop ◽ Rating Scale ◽ Intraclass Correlation ◽ Calculated Data ◽ Global Rating Scale ◽ Global Rating ◽ Control Groups ◽ Advanced Technique ◽ Task Load ◽ And Control

Closed-loop communication (CLC) improves task efficiency and decreases medical errors; however, limited literature exists on strategies to improve its real-time use. The primary objective was to determine whether blindfolding a resuscitation leader was effective in improving crisis resource management (CRM) skills, as measured by increased frequency of CLC. Secondary objectives included whether blindfolding affected overall CRM performance or perceived task load. Participants were emergency medicine (EM) or EM/paediatric dual resident physicians. Participants completed presurveys, were block randomised into intervention (blindfolded) or control groups, led both adult and paediatric resuscitations and completed postsurveys before debriefing. Video recordings of the simulations were reviewed by simulation fellowship-trained EM physicians and rated using the Ottawa CRM Global Rating Scale (GRS). Frequency of CLC was assessed by one rater via video review. Summary statistics were calculated. The intraclass correlation coefficient was calculated. Data were analysed in R using analysis of variance and regression analysis. There were no significant differences between intervention and control groups in any Ottawa CRM GRS category. Postgraduate year (PGY) significantly impacted all Ottawa GRS categories. Frequency of CLC use significantly increased in the blindfolded group (31.7, 95% CI 29.34 to 34.1) vs the non-blindfolded group (24.6, 95% CI 21.5 to 27.7). Participants' self-rated perceived NASA Task Load Index scores demonstrated no difference between intervention and control groups via a Wilcoxon rank sum test. Blindfolding the resuscitation leader significantly increases frequency of CLC. The blindfold code training exercise is an advanced technique that may increase the use of CLC.

Validity Evidence of Non-Technical Skills Assessment Instruments in Simulated Anaesthesia Crisis Management

Anaesthesia and Intensive Care ◽ 10.1177/0310057x1704500410 ◽ 2017 ◽ Vol 45 (4) ◽ pp. 469-475 ◽ Cited By ~ 8

Author(s): T. Jirativanont ◽ K. Raksamani ◽ N. Aroonpruksakul ◽ P. Apidechakul ◽ S. Suraseranivongse

Keyword(s): Crisis Management ◽ Situation Awareness ◽ Intraclass Correlation ◽ Correlation Coefficients ◽ Technical Skills ◽ Sufficient Evidence ◽ Global Rating Scale ◽ Global Rating ◽ Validity Evidence ◽ Intraclass Correlation Coefficients

We sought to evaluate the validity of two non-technical skills evaluation instruments, the Anaesthetists’ Non-Technical Skills (ANTS) behavioural marker system and the Ottawa Global Rating Scale (GRS), to apply them to anaesthesia training. The content validity, response process, internal structure, relations with other variables and consequences were described for validity evidence. Simulated crisis management sessions were initiated during which two trained raters evaluated the performance of postgraduate first-, second- and third-year (PGY-1, PGY-2 and PGY-3) anaesthesia residents. The study included 70 participants, composed of 24 PGY-1, 24 PGY-2 and 22 PGY-3 residents. Both instruments differentiated the non-technical skills of PGY-1 from PGY-3 residents (P < 0.05). Inter-rater agreement was measured using the intraclass correlation coefficient. For the ANTS instrument, the intraclass correlation coefficients for task management, team-working, situation awareness and decision-making were 0.79, 0.34, 0.81 and 0.70, respectively. For the Ottawa GRS, the intraclass correlation coefficients for overall performance, leadership, problem-solving, situation awareness, resource utilisation and communication skills were 0.86, 0.83, 0.84, 0.87, 0.80 and 0.86, respectively. The Cronbach's alpha for internal consistency of the ANTS instrument was 0.93, and was 0.96 for the Ottawa GRS. There was a high correlation between the ANTS and Ottawa GRS. The raters reported that the Ottawa GRS was easier to use than the ANTS. We found sufficient evidence of validity in the ANTS instrument and the Ottawa GRS for the evaluation of non-technical skills in a simulated anaesthesia setting, but the Ottawa GRS was more practical and had higher reliability.
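
For readers unfamiliar with how an inter-rater intraclass correlation coefficient is obtained, the sketch below computes one common form, the two-way random-effects, absolute-agreement, single-rater coefficient usually written ICC(2,1), from ANOVA mean squares. The abstract above does not say which ICC form was used, so this choice is an assumption, and the ratings table is illustrative rather than data from the study.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one per subject; each row holds one score per rater.
    """
    n = len(ratings)         # number of subjects
    k = len(ratings[0])      # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into subjects, raters and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # between-subjects mean square
    msc = ss_cols / (k - 1)                # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Illustrative ratings: 6 subjects scored by 4 raters (not study data).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example_ratings)
```

Because this form penalises systematic differences between raters, a rater who scores consistently higher or lower than the others pulls the coefficient down even when the rank ordering of subjects agrees.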

Cross-cultural adaptation and validation of the International Cooperative Ataxia Rating Scale (ICARS) to Brazilian Portuguese

Arquivos de Neuro-Psiquiatria ◽ 10.1590/0004-282x20180098 ◽ 2018 ◽ Vol 76 (10) ◽ pp. 674-684

Author(s): Fernanda Aparecida Maggi ◽ Pedro Braga-Neto ◽ Hsin Fen Chien ◽ Maria Thereza Drumond Gama ◽ Flávio Moura Rezende Filho ◽ ...

Keyword(s): Internal Consistency ◽ Rating Scale ◽ Intraclass Correlation ◽ Correlation Coefficients ◽ Brazilian Portuguese ◽ Expert Committee ◽ Pilot Testing ◽ Consistency Results ◽ Forward Translation ◽ Brazilian Culture

ABSTRACT Introduction: The clinical assessment of patients with ataxias requires reliable scales. We aimed to translate, adapt and validate the International Cooperative Ataxia Rating Scale (ICARS) into Brazilian Portuguese. Methods: The steps of this study were forward translation, translation synthesis, backward translation, expert committee meeting, preliminary pilot testing and final assessment. Thirty patients were enrolled in the preliminary pilot testing and 61 patients were evaluated for construct validity, internal consistency, intra- and inter-rater reliability and external consistency. Results: This study showed good validity of the construct and high internal consistency for the full scale, except for the oculomotor domain (Cronbach's alpha = 0.316, intraclass correlation coefficients intra- = 82.4% and inter- = 79.2%). A high correlation with the Scale for the Assessment and Rating of Ataxia was observed. We found good intra-rater agreement and relative inter-rater disagreement, except in the posture and gait domain. Conclusion: The present ICARS version is adapted for the Brazilian culture and can be used to assess our ataxic patients.

A universal global rating scale for the evaluation of technical skills in the operating room

The American Journal of Surgery ◽ 10.1016/j.amjsurg.2007.02.003 ◽ 2007 ◽ Vol 193 (5) ◽ pp. 551-555 ◽ Cited By ~ 113

Author(s): Jeffrey D. Doyle ◽ Eric M. Webber ◽ Ravi S. Sidhu

Keyword(s): Operating Room ◽ Rating Scale ◽ Technical Skills ◽ Global Rating Scale ◽ Global Rating

Using the Objective Structured Assessment of Technical Skills (OSATS) global rating scale to evaluate the skills of surgical trainees in the operating room

Surgery Today ◽ 10.1007/s00595-012-0313-7 ◽ 2012 ◽ Vol 43 (3) ◽ pp. 271-275 ◽ Cited By ~ 86

Author(s): Hiroaki Niitsu ◽ Naoki Hirabayashi ◽ Masanori Yoshimitsu ◽ Takeshi Mimura ◽ Junya Taomoto ◽ ...

Keyword(s): Operating Room ◽ Rating Scale ◽ Technical Skills ◽ Global Rating Scale ◽ Global Rating ◽ Surgical Trainees

Psychometrics of the Wrist Stability and Hand Mobility Subscales of the Fugl-Meyer Assessment in Moderately Impaired Stroke

Physical Therapy ◽ 10.2522/ptj.20130235 ◽ 2015 ◽ Vol 95 (1) ◽ pp. 103-108 ◽ Cited By ~ 19

Author(s): Stephen J. Page ◽ Erinn Hade ◽ Andrew Persch

Keyword(s): Upper Extremity ◽ Internal Consistency ◽ Concurrent Validity ◽ Intraclass Correlation ◽ Rank Correlation ◽ Correlation Coefficients ◽ Outpatient Rehabilitation ◽ Finger Movement ◽ Intraclass Correlation Coefficients ◽ Upper Extremity Movement

Background: There remains a need for a quickly administered, stroke-specific, bedside measure of active wrist and finger movement for the expanding stroke population. The wrist stability and hand mobility scales of the upper extremity Fugl-Meyer Assessment (w/h UE FM) constitute a valid, reliable measure of paretic UE impairment in patients with active wrist and finger movement. Objective: The aim of this study was to determine performance on the w/h UE FM in a stable cohort of survivors of stroke with only palpable movement in their paretic wrist flexors. Design: A single-center cohort study was conducted. Method: Thirty-two individuals exhibiting stable, moderate upper extremity hemiparesis (15 male, 17 female; mean age=56.6 years, SD=10.1; mean time since stroke=4.6 years, SD=5.8) participated in the study, which was conducted at an outpatient rehabilitation clinic in the midwestern United States. The w/h UE FM and Action Research Arm Test (ARAT) were administered twice. Intraclass correlation coefficients (ICCs), Cronbach alpha, and ordinal alpha were computed to determine reliability, and Spearman rank correlation coefficients and Bland-Altman plots were computed to establish validity. Results: Intraclass correlation coefficients for the w/h UE FM and ARAT were .95 and .99, respectively. The w/h UE FM intrarater reliability and internal consistency were greater than .80, and concurrent validity was greater than .70. This also was the first stroke rehabilitative study to apply ordinal alpha to examine internal consistency values, revealing w/h UE FM levels greater than .85. Concurrent validity findings were corroborated by Bland-Altman plots. Conclusions: It appears that the w/h UE FM is a promising tool to measure distal upper extremity movement in patients with little active paretic wrist and finger movement. This finding widens the segment of patients on whom the w/h UE FM can be effectively used and addresses a gap, as commonly used measures necessitate active distal upper extremity movement.
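
The quantities plotted in a Bland-Altman analysis, such as the one mentioned in the abstract above, are the mean difference (bias) between two paired sets of measurements and the 95% limits of agreement around it. The sketch below computes those numbers under the usual assumption that both sets are on a common scale; the function name and the paired values are hypothetical, and the abstract does not describe how the authors prepared their scores for this comparison.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(first) != len(second):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    # 95% limits of agreement: bias plus or minus 1.96 standard deviations.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired scores from two administrations (illustrative only).
first_scores = [10, 12, 14, 16]
second_scores = [9, 12, 13, 18]
bias, lower, upper = bland_altman(first_scores, second_scores)
```

A bias near zero with narrow limits suggests the two sets of measurements can be used interchangeably; wide limits indicate disagreement that a correlation coefficient alone can hide.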

Translation, linguistic and cultural adaptation, reliability and validity of the Radboud Oral Motor Inventory for Parkinson's Disease – ROMP questionnaire

Arquivos de Neuro-Psiquiatria ◽ 10.1590/0004-282x20180033 ◽ 2018 ◽ Vol 76 (5) ◽ pp. 316-323 ◽ Cited By ~ 2

Author(s): Monia Presotto ◽ Maira Rozenfeld Olchik ◽ Johanna G. Kalf ◽ Carlos R.M. Rieder

Keyword(s): Parkinson's Disease ◽ Internal Consistency ◽ Rating Scale ◽ Intraclass Correlation ◽ Reliability And Validity ◽ Correlation Coefficients ◽ Brazilian Portuguese ◽ Intraclass Correlation Coefficients ◽ Oral Motor

ABSTRACT Objective: To translate and linguistically and culturally adapt to Brazilian Portuguese, and verify the reliability and validity of the Radboud Oral Motor Inventory for Parkinson's Disease (ROMP). Methods: The ROMP was translated and retranslated, and the instrument reliability was verified by analyzing the internal consistency and the reproducibility of the intra-examiner retest. The final version was applied to 27 participants with Parkinson's disease. Results: Internal consistency was 0.99 for the total ROMP and 0.96 to 0.99 for the three domains. Intraclass correlation coefficients for reproducibility were 0.99 for the total ROMP and 0.93 to 0.99 for the subscales. The ROMP and its subscales correlated substantially with the Likert-type scale, as well as with the Unified Parkinson's Disease Rating Scale II and III items. Conclusion: A linguistically and culturally equivalent version of the ROMP is now available in Brazilian Portuguese, with excellent reliability and validity.
