Cross-institutional OSCE Quality Assurance in Europe using a mutual assessment strategy; are we equipped for it?

2020 ◽  
Author(s):  
Thomas JB Kropmans ◽  
Eirik Søfteland ◽  
Anneke Wijnalda ◽  
Marie Thoresen ◽  
Magnus Hultin ◽  
...  

Abstract. Background: Until 2008, the Objective Structured Clinical Examination (OSCE) was a well-researched but laborious and costly paper-based method of exam delivery, which restricted international comparison. A cross-institutional comparison of OSCE quality assurance in Europe has never been carried out; with the widespread adoption of electronic assessment, such an analysis is now possible. Methods: Twenty educational institutions across Europe using an electronic OSCE Management Information System were invited, of which 8 confirmed that they would join a mutual comparison of quality assurance outcomes. Two theories were used to evaluate the quality of the observed test scores: classical test theory (Cronbach’s alpha) and generalizability (G) theory. Outcomes of both were compared across all universities, including the standard error of measurement (SEM), cut-scores, pass/fail scores and global rating scores, Cronbach’s alpha with its related SEM (68% and 95% CI), and G-theory coefficients with their related absolute and relative SEM (68% and 95% CI). Results: Outcomes differed between the participating universities, and observed marks contradicted global ratings of fail, borderline and excellent performance. G-theory coefficients and SEMs were lower and smaller than those obtained with the classical approach using Cronbach’s alpha as the measure of reliability. The classical SEM varied from 2.8% to 11.2%, whereas the equivalent 95% CI varied from 9.2% up to 22% (on a 0–100% scale). The relative SEM from the G-theory analysis varied from 3.1% to 7.0% for criterion-referenced marks, and the absolute SEM for norm-referenced marks varied from 3.8% to 7.8%. The 95% CIs around the relative and absolute SEM values varied from 7.3% to 15.3%. More students failed the examination when the 95% CI was applied to the observed scores. Conclusion: To protect society and to improve educational decision making, the standard error of measurement and its associated confidence intervals need to be embedded in EU assessment strategies to rule out ‘false positive’ pass decisions.
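
The classical SEM the abstract refers to is conventionally computed as SEM = SD·√(1 − α), with a 95% CI of ±1.96·SEM around an observed score. The sketch below illustrates that convention and the resulting 'false positive pass' check; the standard deviation, alpha, cut-score and observed mark are hypothetical and not taken from the study.

```python
import math

def classical_sem(sd: float, alpha: float) -> float:
    """Classical-test-theory SEM from the score SD and Cronbach's alpha."""
    return sd * math.sqrt(1.0 - alpha)

def confidence_interval(observed: float, sem: float, z: float = 1.96) -> tuple:
    """Symmetric CI around an observed score (z = 1.96 for 95%, 1.0 for 68%)."""
    return observed - z * sem, observed + z * sem

# Hypothetical OSCE: scores on a 0-100% scale.
sd, alpha, cut_score, observed = 8.0, 0.78, 60.0, 63.0
sem = classical_sem(sd, alpha)
low, high = confidence_interval(observed, sem)

# A 'false positive pass': the observed score clears the cut-score,
# but the lower bound of the 95% CI does not.
false_positive_pass = (observed >= cut_score) and (low < cut_score)
print(f"SEM = {sem:.1f}%, 95% CI = ({low:.1f}%, {high:.1f}%), "
      f"false positive pass: {false_positive_pass}")
```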

2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Marcos Serrano-Dueñas ◽  
Luis Masabanda ◽  
Maria-Rosario Luquin

Objective. This study was designed to use optimal scaling to assign scores and thereby construct the Parkinson’s Disease Gravity Index. Scores were assigned to interrelated dimensions that share information about the patient’s situation, yielding an objective, holistic tool that integrates them so that clinicians can form a comprehensive picture of the patient. Patients and Methods. 120 consecutive patients with a diagnosis of Parkinson’s disease were selected according to the United Kingdom Parkinson’s Disease Society Brain Bank criteria. All the chosen dimensions were then transformed into interval variables using the formula proposed by Sturges. Once the dimensions had been transformed into interval variables, optimal scaling was carried out. The following attributes were then analyzed: quality and acceptability of the data; reliability (internal consistency, reliability index, Cronbach’s alpha, and standard error of measurement); and validity (convergent validity and validity for known groups). Results. There were no missing data. An adequate Cronbach’s alpha of 0.71 was obtained, and all items were found to be pertinent to the scale. The item homogeneity index was 0.36. Precision, evaluated with the standard error of measurement, was 7.8. The discriminant validity of the Parkinson’s Disease Gravity Index (validity for known groups), assessed across the stages of the Hoehn and Yahr scale with the Kruskal–Wallis test, was highly significant (χ² = 32.7, p ≤ 0.001). Conclusions. The Parkinson’s Disease Gravity Index has shown adequate metric properties.
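
Sturges' rule, used above to turn raw dimension scores into interval variables, sets the number of class intervals at k = 1 + log2(n). A minimal sketch of that binning step follows; the sample scores, rounding convention and dimension values are hypothetical and serve only to illustrate the rule.

```python
import math

def sturges_bins(n_observations: int) -> int:
    """Number of class intervals suggested by Sturges' rule: k = 1 + log2(n)."""
    return int(round(1 + math.log2(n_observations)))

def to_intervals(scores: list) -> list:
    """Map raw scores onto equal-width interval categories (1..k)."""
    k = sturges_bins(len(scores))
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / k or 1.0
    return [min(int((s - lo) / width) + 1, k) for s in scores]

# Hypothetical raw scores for one dimension (the study enrolled 120 patients;
# only a handful of made-up values are shown here).
raw = [12, 25, 7, 33, 18, 29, 41, 15]
print(sturges_bins(120))   # k = 8 intervals for 120 observations
print(to_intervals(raw))   # interval category assigned to each raw score
```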


2014 ◽  
Vol 49 (3) ◽  
pp. 373-380 ◽  
Author(s):  
Mark R. Lafave ◽  
Larry Katz

Context: Health care professions have replaced traditional multiple-choice tests and essays with structured, practical, performance-based examinations in the hope of eliminating rater bias and measuring clinical competence. Objective: To establish the validity and reliability of the Standardized Orthopedic Assessment Tool (SOAT) as a measure of clinical competence in orthopaedic injury evaluation. Design: Descriptive laboratory study. Setting: University. Patients or Other Participants: A total of 60 undergraduate students and 11 raters from 3 Canadian universities and 1 standardized patient. Intervention(s): Students were required to complete a 30-minute musculoskeletal evaluation in 1 of 2 randomly assigned mock scenarios involving the knee (second-degree medial collateral ligament sprain) or the shoulder (third-degree supraspinatus muscle strain). Main Outcome Measure(s): We measured interrater reliability with an intraclass correlation coefficient (ICC [2,k]) and the stability of the tool with the standard error of measurement and confidence intervals. Agreement was measured using Bland-Altman plots. Concurrent validity was measured using a Pearson product moment correlation coefficient, whereby each rater's global rating of a student was matched to the cumulative mean grade score. Results: The ICCs were 0.75 and 0.82 for the shoulder and knee cases, respectively. Bland-Altman plots indicated no systematic bias between raters. In addition, Pearson product moment correlation analysis demonstrated a strong relationship between the overall cumulative mean grade score and the global rating score of the examinees' performances. Conclusions: This study demonstrated good interrater reliability of the SOAT, with a standard error of measurement indicating very modest stability, strong agreement between raters, and a correlation indicative of concurrent validity.
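
The standard error of measurement reported alongside an ICC in studies like this one is commonly derived as SEM = SD·√(1 − ICC), with ±1.96·SEM giving a confidence interval around an observed grade. The sketch below assumes that formulation; the grade standard deviation is made up, while the ICC values are those quoted in the abstract.

```python
import math

def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement derived from an ICC: SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)

# Hypothetical SD of cumulative mean grade scores (0-100 scale).
sd_scores = 9.0
icc_by_case = {"knee": 0.82, "shoulder": 0.75}   # ICC(2,k) values from the abstract

for case, icc in icc_by_case.items():
    sem = sem_from_icc(sd_scores, icc)
    print(f"{case}: SEM = {sem:.2f}, 95% CI half-width = {1.96 * sem:.2f}")
```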


2020 ◽  
Vol 9 ◽  
pp. 1754
Author(s):  
Masoumeh Fazeli Tarmazdi ◽  
Zahra Tagharrobi ◽  
Zahra Sooki ◽  
Khadijeh Sharifi

Background: The first step in planning for successful aging is to assess the current status using valid instruments. This study aimed to evaluate the psychometric properties of the Persian version of the Successful Aging Inventory (SAI). Materials and Methods: In the first step, the SAI was translated through forward-backward translation, and its face and content validity were assessed qualitatively and quantitatively. For the construct validity assessment, 300 elderly people were recruited through multi-stage random sampling, and exploratory factor analysis and known-group comparison were used. The reliability of the SAI was assessed in terms of internal consistency and stability using Cronbach's alpha and the intraclass correlation coefficient (ICC), respectively. The standard error of measurement, smallest detectable change, and floor and ceiling effects were also calculated. Results: The impact scores, content validity ratios, and content validity indices of all items were greater than 1.5, 0.62, and 0.8, respectively. The scale-level content validity index was 0.94. Factor analysis identified four factors for the inventory, which explained 58.17% of the total variance of the SAI score. The mean SAI score of mentally healthy participants was significantly higher (P < 0.001). The relative frequencies of the lowest and highest possible SAI scores were 0% and 3.7%, respectively. The Cronbach's alpha, ICC, standard error of measurement, and smallest detectable change of the SAI were 0.835, 0.999, ±0.47, and 1.9, respectively. Conclusion: As a valid and reliable instrument, the Persian version of the SAI can be used for successful aging assessment. [GMJ.2020;9:e1754]
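
Two of the reliability quantities reported here are commonly computed as SEM = SD·√(1 − ICC) and smallest detectable change SDC = 1.96·√2·SEM, while floor and ceiling effects are the proportions of respondents at the scale's minimum and maximum. The study's exact computations may differ, so the sketch below uses hypothetical inputs purely to illustrate these conventional formulas.

```python
import math

def sem(sd: float, icc: float) -> float:
    """Standard error of measurement from a test-retest ICC."""
    return sd * math.sqrt(1.0 - icc)

def smallest_detectable_change(sem_value: float) -> float:
    """SDC at the individual level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem_value

def floor_ceiling(scores: list, minimum: float, maximum: float) -> tuple:
    """Proportions of respondents at the lowest and highest possible score."""
    n = len(scores)
    return (sum(s == minimum for s in scores) / n,
            sum(s == maximum for s in scores) / n)

# Hypothetical inventory data: score range 0-20; SD and ICC are made up.
scores = [14, 17, 9, 20, 11, 16, 13, 0, 18, 15]
s = sem(sd=4.1, icc=0.95)
print(f"SEM = {s:.2f}, SDC = {smallest_detectable_change(s):.2f}")
print("floor, ceiling =", floor_ceiling(scores, minimum=0, maximum=20))
```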


1966 ◽  
Vol 19 (2) ◽  
pp. 611-617 ◽  
Author(s):  
Donald W. Zimmerman ◽  
Richard H. Williams

It is shown that, for the case of non-independence of true scores and error scores, the interpretation of the standard error of measurement is modified in two ways. First, the standard deviation of the distribution of error scores is given by a modified equation. Second, the confidence interval for the true score varies with the individual's observed score. It is shown that the equation $s_e = \sqrt{\dfrac{N-O}{a} + \dfrac{s_o^2\,(r_{o\bar{o}} - r_{oo})}{r_{o\bar{o}}}}$, where N is the number of items, O is the individual's observed score, a is the number of choices per item, $s_o^2$ is the observed variance, $r_{oo}$ is the test reliability as empirically determined, and $r_{o\bar{o}}$ is the reliability for the case where only non-independent error is present, provides a more accurate interpretation of the test score of an individual.
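
A minimal numerical sketch of this modified error standard deviation follows, using the equation as reconstructed above. The test length, number of choices, observed variance and reliabilities are hypothetical; the example simply shows how the error term shrinks as the observed score rises, which is why the confidence interval for the true score depends on O.

```python
import math

def modified_error_sd(n_items: int, observed: float, choices: int,
                      var_obs: float, r_oo: float, r_oobar: float) -> float:
    """Modified SD of error scores when true and error scores are not independent,
    following the reconstructed equation from the abstract (illustrative only)."""
    guessing_term = (n_items - observed) / choices
    dependence_term = var_obs * (r_oobar - r_oo) / r_oobar
    return math.sqrt(guessing_term + dependence_term)

# Hypothetical 50-item, 4-choice test: observed variance 36, reliabilities 0.80 / 0.88.
for observed_score in (20, 30, 40):
    se = modified_error_sd(50, observed_score, 4, 36.0, 0.80, 0.88)
    lo, hi = observed_score - 1.96 * se, observed_score + 1.96 * se
    print(f"O = {observed_score}: s_e = {se:.2f}, 95% CI for true score ≈ ({lo:.1f}, {hi:.1f})")
```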


1993 ◽  
Vol 2 (2) ◽  
pp. 97-103 ◽  
Author(s):  
Kelly R. Holcomb ◽  
Cheryl A. Skaggs ◽  
Teddy W. Worrell ◽  
Mark DeCarlo ◽  
K. Donald Shelbourne

A paucity of information exists concerning reliability of the KT-1000 knee arthrometer (MEDmetric Corp., San Diego, CA) when used by different clinicians to assess the same anterior cruciate ligament-deficient patient. The purpose of this study was to determine the reliability and standard error of measurement of four clinicians who routinely report KT-1000 arthrometer values to referring orthopedic surgeons. Two physical therapists and two athletic trainers performed anterior laxity tests using the KT-1000 on 19 subjects. Intraclass correlation coefficients (ICC) and standard error of measurement (SEM) were used to determine reliability. Intratester ICC ranged from .98 to 1.0, and intratester SEM ranged from 0.0 to .28 mm. Intertester ICC and SEM for all four testers were .53 and 1.2 mm, respectively. A 95% confidence interval (M ± 1.96 × SEM) of the intertester variability ranged from −0.18 to 4.52 mm. Therefore, large intertester variation existed in KT-1000 values. Each facility should standardize testing procedures and establish intratester and intertester reliability for all clinicians reporting KT-1000 values.
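
The intertester interval above follows the familiar M ± 1.96 × SEM construction; with SEM = 1.2 mm, the published bounds of −0.18 to 4.52 mm imply a mean laxity value of roughly 2.2 mm. The sketch below reproduces that arithmetic; the mean is back-calculated from the bounds, not stated in the abstract.

```python
def ci_95(mean: float, sem: float) -> tuple:
    """95% confidence interval as M ± 1.96 × SEM."""
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width

# SEM of 1.2 mm as reported; the mean (~2.17 mm) is back-calculated from the
# published bounds of -0.18 to 4.52 mm and is therefore an approximation.
low, high = ci_95(mean=2.17, sem=1.2)
print(f"95% CI: {low:.2f} mm to {high:.2f} mm")   # approx. -0.18 to 4.52 mm
```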


Author(s):  
Víctor Rodríguez-Rielves ◽  
Alejandro Martínez-Cava ◽  
Ángel Buendía-Romero ◽  
José Ramón Lillo-Beviá ◽  
Javier Courel-Ibáñez ◽  
...  

Purpose: To examine the reproducibility (intradevice and interdevice agreement) of the Rotor 2INpower device under a wide range of cycling conditions. Methods: Twelve highly trained male cyclists and triathletes completed 5 cycling tests, including graded exercise tests at different cadences (70–100 rpm), workloads (100–650 W), pedaling positions (seated and standing), and vibration conditions (20–40 Hz), as well as an 8-second maximal sprint (>1000 W). The intradevice analysis compared the power output registered by 3 units of the Rotor 2INpower, whereas for the interdevice analysis the power output provided by each of these units was compared with the gold-standard SRM crankset. Statistical calculations included, among others, the standard error of measurement, expressed in absolute terms (in watts) and in relative terms as the coefficient of variation (CV). Results: Except for the graded exercise test seated at 100 rpm/100 W (CV = 10.2%), the intradevice analysis showed an acceptable magnitude of error (CV ≤ 6.9%, standard error of measurement ≤ 12.3 W) between the 3 Rotor 2INpower units. Similarly, these 3 units showed acceptable agreement with the gold standard in all graded exercise test situations (CV ≤ 4.0%, standard error of measurement ≤ 13.1 W). On the other hand, both intradevice and interdevice agreement proved to be slightly reduced under high cadences (intradevice: CV ≤ 10.2%; interdevice: CV ≤ 4.0%) and vibration (intradevice: CV ≤ 4.0%; interdevice: CV ≤ 3.6%), as well as during standing pedaling (intradevice: CV ≤ 4.1%; interdevice: CV ≤ 2.5%). Although within the limits of acceptable agreement, measurement errors increased during the sprint tests (CV ≤ 7.4%). Conclusions: Based on these results, the Rotor 2INpower can be considered a reproducible tool for monitoring power output in most cycling situations.
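
Absolute and relative reproducibility statistics of the kind reported here are often derived from paired readings: an absolute SEM (typical error) of SD of the differences divided by √2, and a CV of 100·SEM divided by the grand mean. The sketch below shows that common formulation with made-up power readings; the study's exact procedure may differ.

```python
import statistics

def typical_error(trial1: list, trial2: list) -> float:
    """Absolute SEM (typical error) from paired readings: SD of differences / sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / (2 ** 0.5)

def coefficient_of_variation(trial1: list, trial2: list) -> float:
    """Relative SEM expressed as a percentage of the grand mean."""
    grand_mean = statistics.mean(trial1 + trial2)
    return 100.0 * typical_error(trial1, trial2) / grand_mean

# Hypothetical power output (W) recorded simultaneously by two devices.
rotor = [248, 305, 412, 356, 290, 501]
srm   = [252, 300, 420, 349, 296, 495]
print(f"SEM = {typical_error(rotor, srm):.1f} W, "
      f"CV = {coefficient_of_variation(rotor, srm):.1f}%")
```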


Author(s):  
Andrea Berger

Inhibition of Return (IOR) is a mechanism whereby the attentional system favors novel locations by inhibiting already scanned ones. In spatial attention tasks, it commonly occurs when the interval between cue onset and target onset is longer than 300 ms. A positive difference between reaction times in the valid condition and those in the invalid condition indicates that responses to targets are slower following a valid cue than following an invalid one. IOR is a very robust phenomenon at the group-mean level; however, this study demonstrates that its standard error of measurement is extremely high, which seriously challenges any attempt to interpret an individual score as representing the characteristics of a subject's attention system. Furthermore, this reliability problem might diminish the likelihood of finding differences between groups and conditions. The study shows that these problems may be partially corrected by employing the back-to-center paradigm.
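
An individual IOR score is simply the mean reaction-time difference between validly and invalidly cued targets, and the individual-level uncertainty highlighted above can be expressed as SEM = SD·√(1 − reliability). The sketch below uses hypothetical reaction times and a hypothetical (deliberately low) reliability value to show how the SEM can approach the size of a typical individual score.

```python
import math
import statistics

def ior_score(valid_rts: list, invalid_rts: list) -> float:
    """IOR effect for one subject: mean RT to validly cued targets minus
    mean RT to invalidly cued targets (positive under IOR)."""
    return statistics.mean(valid_rts) - statistics.mean(invalid_rts)

# One hypothetical subject's trial-level reaction times (ms).
print(f"example subject IOR = {ior_score([412, 398, 430, 405], [380, 372, 395, 368]):.1f} ms")

# Hypothetical per-subject IOR scores (ms) and a hypothetical low reliability.
ior_scores = [34, 28, 41, 19, 37, 25, 30, 22]
reliability = 0.35                       # illustrative value, not from the study
sd = statistics.stdev(ior_scores)
sem = sd * math.sqrt(1.0 - reliability)  # SEM of an individual IOR score

print(f"mean IOR = {statistics.mean(ior_scores):.1f} ms, "
      f"SD = {sd:.1f} ms, SEM = {sem:.1f} ms")
# With an SEM of this size, a single subject's IOR score is a noisy estimate
# of that subject's 'true' IOR, even though the group mean is robust.
```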

