Consistency, Inter-Rater Reliability, and Validity of 441 Consecutive Mock Oral Examinations in Anesthesiology: Implications for Use as a Tool for Assessment of Residents

2000 ◽  
Vol 44 (2) ◽  
pp. 72-73
Author(s):  
ARMIN SCHUBERT ◽  
JOHN E. TETZLAFF ◽  
MING TAN ◽  
VICTOR RYCKMAN ◽  
EDWARD MASCHA
1999 ◽  
Vol 91 (1) ◽  
pp. 288-298 ◽  
Author(s):  
Armin Schubert ◽  
John E. Tetzlaff ◽  
Ming Tan ◽  
Victor J. Ryckman ◽  
Edward Mascha

Background: Oral practice examinations (OPEs) are used extensively in many anesthesiology programs for various reasons, including assessment of clinical judgment. Yet oral examinations have been criticized for their subjectivity. The authors studied the reliability, consistency, and validity of their OPE program to determine if it was a useful assessment tool. Methods: From 1989 through 1993, we prospectively studied 441 OPEs given to 190 residents. The examination format closely approximated that used by the American Board of Anesthesiology. Pass-fail grade and an overall numerical score were the OPE results of interest. Internal consistency and inter-rater reliability were determined using agreement measures. To assess their validity in describing competence, OPE results were correlated with in-training examination results and faculty evaluations. Furthermore, we analyzed the relationship of OPE with implicit indicators of resident preparation such as length of training. Results: The internal consistency coefficient for the overall numerical score was 0.82, indicating good correlation among component scores. The interexaminer agreement was 0.68, indicating moderate or good agreement beyond that expected by chance. The actual agreement among examiners on pass-fail was 84%. Correlation of overall numerical score with in-training examination scores and faculty evaluations was moderate (r = 0.47 and 0.41, respectively; P < 0.01). OPE results were significantly (P < 0.01) associated with training duration, previous OPE experience, trainee preparedness, and trainee anxiety. Conclusion: Our results show the substantial internal consistency and reliability of OPE results at a single institution. The positive correlation of OPE scores with in-training examination scores, faculty evaluations, and other indicators of preparation suggests that OPEs are a reasonably valid tool for assessment of resident performance.
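The abstract above distinguishes raw pass-fail agreement from agreement beyond that expected by chance. As a minimal sketch of that distinction, the Python below computes Cohen's kappa for two raters. The abstract does not name the exact agreement statistic the authors used, so kappa is an assumption here, and the function name `cohen_kappa` and the pass-fail grades are hypothetical illustration data, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who graded the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters gave the same grade.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal rates, summed over grades.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical grade throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical pass-fail grades from two examiners (not data from the study).
examiner_1 = ["pass"] * 5 + ["fail"] * 5
examiner_2 = ["pass"] * 4 + ["fail"] * 5 + ["pass"]

kappa = cohen_kappa(examiner_1, examiner_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the examiners agree on 8 of 10 cases (80%), but half of that agreement would be expected by chance, so kappa comes out lower than the raw agreement rate.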


2020 ◽  
Vol 41 (5) ◽  
pp. e597-e602
Author(s):  
Yazeed Al-shawi ◽  
Tamer A. Mesallam ◽  
Rayan Alfallaj ◽  
Turki Aldrees ◽  
Nouf Albakheet ◽  
...  

2016 ◽  
Vol 77 (1) ◽  
pp. 17-24 ◽  
Author(s):  
Brian K.C. Lo ◽  
Leia Minaker ◽  
Alicia N.T. Chan ◽  
Jessica Hrgetic ◽  
Catherine L. Mah

Purpose: To adapt and validate a survey instrument to assess the nutrition environment of grab-and-go establishments at a university campus. Methods: A version of the Nutrition Environment Measures Survey for grab-and-go establishments (NEMS-GG) was adapted from existing NEMS instruments and tested for reliability and validity through a cross-sectional assessment of the grab-and-go establishments at the University of Toronto. Product availability, price, and presence of nutrition information were evaluated. Cohen’s kappa coefficient and intra-class correlation coefficients (ICC) were assessed for inter-rater reliability, and construct validity was assessed using the known-groups comparison method (via store scores). Results: Fifteen grab-and-go establishments were assessed. Inter-rater reliability was high with an almost perfect agreement for availability (mean κ = 0.995) and store scores (ICC = 0.999). The tool demonstrated good face and construct validity. About half of the venues carried fruit and vegetables (46.7% and 53.3%, respectively). Regular and healthier entrée items were generally the same price. Healthier grains were cheaper than regular options. Six establishments displayed nutrition information. Establishments operated by the university’s Food Services consistently scored the highest across all food premise types for nutrition signage, availability, and cost of healthier options. Conclusions: Health promotion strategies are needed to address availability and variety of healthier grab-and-go options in university settings.


2005 ◽  
Vol 39 (11) ◽  
pp. 1823-1827 ◽  
Author(s):  
Sandra L Kane-Gill ◽  
Levent Kirisci ◽  
Dev S Pathak

BACKGROUND The Naranjo criteria are frequently used for determination of causality for suspected adverse drug reactions (ADRs); however, the psychometric properties have not been studied in the critically ill. OBJECTIVE To evaluate the reliability and validity of the Naranjo criteria for ADR determination in the intensive care unit (ICU). METHODS All patients admitted to a surgical ICU during a 3-month period were enrolled. Four raters independently reviewed 142 suspected ADRs using the Naranjo criteria (review 1). Raters evaluated the 142 suspected ADRs 3–4 weeks later, again using the Naranjo criteria (review 2). Inter-rater reliability was tested using the kappa statistic. The weighted kappa statistic was calculated between reviews 1 and 2 for the intra-rater reliability of each rater. Cronbach alpha was computed to assess the inter-item consistency correlation. The Naranjo criteria were compared with expert opinion for criterion validity for each rater and reported as a Spearman rank (rs) coefficient. RESULTS The kappa statistic ranged from 0.14 to 0.33, reflecting poor inter-rater agreement. The weighted kappa within raters was 0.5402–0.9371. The Cronbach alpha ranged from 0.443 to 0.660, which is considered moderate to good. The rs coefficient range was 0.385–0.545; all rs coefficients were statistically significant (p < 0.05). CONCLUSIONS Inter-rater reliability is marginal; however, within-rater evaluation appears to be consistent. The inter-item correlation is expected to be higher since all questions pertain to ADRs. Overall, the Naranjo criteria need modification for use in the ICU to improve reliability, validity, and clinical usefulness.
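The abstract above reports Cronbach alpha as its measure of inter-item consistency. The sketch below shows one common way to compute it, from the item variances and the variance of the total score. The function name `cronbach_alpha` and the item scores are hypothetical, and sample variance is used throughout; none of this is taken from the study itself.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per questionnaire item, with every list
    covering the same cases in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per case, summed across items.
    totals = [sum(case_scores) for case_scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical scores for three items across four cases (not study data).
item_scores = [
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [1, 3, 3, 5],
]

alpha = cronbach_alpha(item_scores)
print(f"alpha = {alpha:.3f}")
```

Alpha approaches 1 when the items rise and fall together across cases, and drops toward 0 when the item scores are unrelated.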


2017 ◽  
Vol 20 (3) ◽  
pp. 135-140 ◽  
Author(s):  
Soichiro Koyama ◽  
Shigeo Tanabe ◽  
Norihide Itoh ◽  
Eiichi Saitoh ◽  
Kazuya Takeda ◽  
...  

2011 ◽  
Vol 27 (8) ◽  
pp. 582-590
Author(s):  
L.J. Irastorza ◽  
P. Rojano ◽  
T. Gonzalez-Salvador ◽  
J. Cotobal ◽  
M. Leira ◽  
...  

Abstract The aim of this study was to evaluate the reliability and validity of the Spanish-language version of the Diagnostic Interview for Depressive Personality (DIDP). The DIDP was administered to 328 consecutive outpatients and the test–retest and inter-rater reliability were assessed. Factor analysis was used in search of factors capable of explaining the scale and a cutoff point was established. The DIDP scales showed adequate Cronbach's α values and acceptable test–retest and inter-rater reliability coefficients. Convergent and discriminant validity were explored, the latter with respect to avoidant and borderline personality disorders. The results of the factor analysis were consistent with the four-factor structure of the DIDP scales. The receiver operating characteristic (ROC) analysis revealed the area under the curve to be 0.848. We found 30 to be a good cutoff point, with a sensitivity of 74.5% and a specificity of 78.5%. The DIDP proved to be a reliable and valid instrument for assessing depressive personality disorder, at least among our outpatients. The psychometric properties of the DIDP support its clinical usefulness in assessing depressive personality.
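The cutoff point in the abstract above is described by its sensitivity and specificity. As a rough illustration of how those two figures follow from a chosen cutoff, the sketch below classifies hypothetical scores against a reference diagnosis. It assumes that a score at or above the cutoff counts as a positive screen, which the abstract does not state; the function name and all data are invented for the example.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff`.

    has_condition: reference diagnosis for each case (True or False).
    """
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, has_condition):
        screened_positive = score >= cutoff  # assumed direction of the cutoff
        if positive and screened_positive:
            tp += 1
        elif positive:
            fn += 1
        elif screened_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case with and one without the condition")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical interview scores and reference diagnoses (not study data).
scores = [35, 31, 28, 40, 22, 30, 18, 33]
diagnosed = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(scores, diagnosed, cutoff=30)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Raising the cutoff generally trades sensitivity for specificity; sweeping it across all observed scores yields the points of an ROC curve.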


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Vedasri Dasoju ◽  
Rakesh Krishna Kovela ◽  
Jaya Shanker Tedla ◽  
Devika Rani Sangadala ◽  
Ravi Shankar Reddy

Abstract The Trunk Impairment Scale (TIS) is a valid and reliable tool to assess trunk impairment in children with heterogeneous cerebral palsy. The purpose of this study was to determine the reliability and validity of the TIS in assessing children with spastic diplegic cerebral palsy. The sample was a total of 30 subjects (15 boys, 15 girls). All subjects underwent an assessment of the sitting component of the Gross Motor Function Measure-88 and TIS by rater 1. Rater 1 observed video recordings within 24 h and scored TIS for intra-rater reliability, while rater 2 did likewise after 48 h for inter-rater reliability. The mean and standard deviation of the TIS and sitting components of the Gross Motor Function Measure-88 were 15.66 ± 4.20 and 52.36 ± 6.26, respectively. We established intra-rater and inter-rater reliability of the TIS with intraclass correlation coefficients of 0.991 and 0.972, respectively. The concurrent validity of the TIS with the sitting component of the Gross Motor Function Measure-88 was good, with an r-value of 0.844 (p < 0.001). This study showed the excellent intra-rater and inter-rater reliability and high concurrent validity of the TIS in assessing children with spastic diplegic cerebral palsy.


2020 ◽  
Author(s):  
Yuen Yee Alice Chiu ◽  
Chun Wai Lo ◽  
Chi Kuk Connie Hui ◽  
Wai Chong Susanna Choi ◽  
So Lun Lee ◽  
...  

Abstract Background Duchenne muscular dystrophy is a genetic disease leading to progressive muscle weakness and degeneration. An effective assessment tool is needed to monitor progress and guide management. This study assessed the reliability and validity of the Performance of Upper Limb (PUL) Module when used in patients with Duchenne Muscular Dystrophy (DMD). Methods A total of thirty-three Chinese DMD patients were included. Twenty-five video-recorded PUL Module version 1.3 assessments were performed for the recruited patients; three raters evaluated the same recorded videos for inter-rater reliability and evaluated the same performances again one month later for intra-rater reliability. Construct validity was assessed by correlating the PUL Module scores with the patients’ age, their forced vital capacity (N=25) and their Hammersmith motor scale scores (N=25) performed on the same day. Results The intra-rater and inter-rater reliability (ICC 0.92 - 0.99), internal consistency (Cronbach’s alpha 0.97 - 0.99) and known groups validity (AUC 0.97) of the PUL module were excellent. PUL was negatively correlated with age (r = -0.912), and positively correlated with the forced vital capacity (r = 0.87) and the Hammersmith motor scale (r = 0.84). The findings confirm the high reliability and validity of the PUL module, and its high clinical relevance in monitoring the deteriorating upper limb motor performance, which correlated strongly with lung function and generalized motor performance as age increased in DMD. Conclusion This first study of the PUL module in Chinese patients with DMD confirmed that it is a reliable and valid tool to monitor clinical progress and outcome for DMD.

