Reconsidering the Cut Score of Korean National Medical Licensing Examination

Author(s):  
Duck Sun Ahn ◽  
Sowon Ahn

After briefly reviewing theories of standard setting, we analyzed the problems of the current cut scores. We then reported the results of a needs assessment on standard setting among medical educators and psychometricians, along with analyses of the standard-setting methods used in developed countries. Based on these findings, we suggested the Bookmark and modified Angoff methods as alternative standard-setting methods, and discussed the possible problems and challenges of applying these methods to the National Medical Licensing Examination.

Author(s):  
Janghee Park ◽  
Mi Kyoung Yim ◽  
Na Jin Kim ◽  
Duck Sun Ahn ◽  
Young-Min Kim

Purpose: The Korea Medical Licensing Examination (KMLE) typically contains a large number of items. The purpose of this study was to investigate whether there is a difference in the cut score between evaluating all items of the exam and evaluating only some items when conducting standard setting.

Methods: We divided the item sets from the 3 most recent KMLEs (2017-2019) into 4 subsets per year, each containing 25% of the items, based on item content categories, discrimination index, and difficulty index. The entire panel of 15 members assessed all 360 items (100%) of the 2017 exam. For the 2018 exam, each item set in split-half set 1 contained 184 items (51%); for the 2019 exam, each item set in split-half set 2 contained 182 items (51%), constructed using the same method. We used the modified Angoff, modified Ebel, and Hofstee methods in the standard-setting process.

Results: Less than a 1% difference in cut scores was observed when the same standard-setting method was applied to stratified item subsets containing 25%, 51%, or 100% of the entire set. Higher rater reliability was observed when fewer items were rated.

Conclusion: When the entire item set was divided into equivalent subsets, assessing the exam using a portion of the item set (90 out of 360 items) yielded cut scores similar to those derived using the entire item set. There was also a higher correlation between panelists' individual assessments and the overall assessments.
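A minimal sketch of the kind of comparison described above: items are stratified by content category, difficulty, and discrimination, a 25% subset is drawn within each stratum, and Angoff-style cut scores from the subset and the full set are compared. The item pool, strata, and panel ratings below are randomly generated stand-ins, not the KMLE item bank or the study's data.

```python
import numpy as np
import pandas as pd

# Minimal sketch of comparing a cut score from a stratified 25% item
# subset against the cut score from the full item set. The item pool,
# panel ratings, and strata below are hypothetical stand-ins.

rng = np.random.default_rng(3)
n_items, n_panelists = 360, 15

items = pd.DataFrame({
    "content": rng.choice(["A", "B", "C", "D"], size=n_items),
    "difficulty": pd.cut(rng.uniform(0, 1, n_items), 3, labels=["hard", "mid", "easy"]),
    "discrimination": pd.cut(rng.uniform(0, 0.6, n_items), 2, labels=["low", "high"]),
})
ratings = rng.uniform(0.4, 0.95, size=(n_panelists, n_items))  # Angoff-style ratings

# Draw 25% of the items within each content x difficulty x discrimination stratum.
subset_idx = (
    items.groupby(["content", "difficulty", "discrimination"], observed=True)
    .sample(frac=0.25, random_state=0)
    .index.to_numpy()
)

full_cut = ratings.mean() * 100                   # cut score from all items
subset_cut = ratings[:, subset_idx].mean() * 100  # cut score from the 25% subset

print(f"full-set cut: {full_cut:.1f}%, 25% subset cut: {subset_cut:.1f}%")
```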


Author(s):  
Guemin Lee

The National Health Personnel Licensing Examination Board (hereafter NHPLEB) has used 60% correct responses on the overall test and 40% correct responses on each subject-area test as the criterion for granting physician licenses to successful candidates. The 60%-40% criterion seems reasonable to laypersons without psychometric or measurement knowledge, but it may cause several severe problems from a psychometrician's perspective. This paper pointed out several problematic cases that can arise from using the 60%-40% criterion and provided several psychometric alternatives that could overcome these problems. A fairly new approach, the Bookmark standard-setting method, was introduced and explained in detail as an example. The paper concluded with five considerations for when the NHPLEB decides to adopt a psychometric standard-setting approach to set a cut score for a licensure test such as the medical licensing examination.
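To make the decision rule concrete, here is a minimal sketch of the 60%-40% criterion described above; the subject names and scores are hypothetical examples, not actual examination data.

```python
# Minimal sketch of the 60%-40% criterion: at least 60% correct overall
# and at least 40% correct in every subject area. Subject names and
# scores below are hypothetical examples.

def passes_60_40(subject_scores: dict[str, tuple[int, int]]) -> bool:
    """Return True if the candidate passes under the 60%-40% rule.

    subject_scores maps each subject area to (items_correct, items_total).
    """
    total_correct = sum(correct for correct, _ in subject_scores.values())
    total_items = sum(total for _, total in subject_scores.values())

    overall_ok = total_correct / total_items >= 0.60
    per_subject_ok = all(
        correct / total >= 0.40 for correct, total in subject_scores.values()
    )
    return overall_ok and per_subject_ok


# Example: strong overall score but one weak subject area fails the rule.
candidate = {
    "internal_medicine": (150, 200),   # 75%
    "preventive_medicine": (7, 20),    # 35% -> below the 40% subject floor
    "medical_law": (16, 20),           # 80%
}
print(passes_60_40(candidate))  # False: overall 72%, but one subject < 40%
```

One problematic case the abstract alludes to follows directly from this rule: a candidate with high overall ability can fail because of a single short subject test, even though the 60% and 40% thresholds themselves have no criterion-referenced justification.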


Author(s):  
Mi Kyoung Yim ◽  
Sujin Shin

Purpose: This study explored the possibility of using the Angoff method, in which an expert panel determines the cut score of an exam, for the Korean Nursing Licensing Examination (KNLE). Two mock exams for the KNLE were analyzed: the Angoff standard-setting procedure was conducted and the results were analyzed. We also aimed to examine the procedural validity of applying the Angoff method in this context.

Methods: For both mock exams, we set a pass-fail cut score using the Angoff method. The standard-setting panel consisted of 16 nursing professors. After the Angoff procedure, the procedural validity of establishing the standard was evaluated by surveying the standard setters.

Results: Descriptions of the minimally competent person for the KNLE were presented at the general and subject performance levels. The cut scores of the first and second mock exams were 74.4 and 76.8, respectively; both were higher than the traditional cut score (60% of the total score of the KNLE). The panel survey showed very positive responses, with scores higher than 4 out of 5 points on a Likert scale.

Conclusion: The cut scores calculated for the two mock exams were similar and were much higher than the existing cut score. In the second mock exam, the standard deviation of the Angoff ratings was lower than in the first. According to the survey results, procedural validity was acceptable, as shown by a high level of confidence among the panelists. These results show that determining cut scores through an expert panel is an applicable method.
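The core computation behind Angoff ratings such as those reported above can be sketched as follows. The rating matrix, panel size, and item count here are hypothetical stand-ins rather than the study's data; the point is only how panelist ratings aggregate into a cut score and a measure of panel agreement.

```python
import numpy as np

# Minimal sketch of a modified Angoff cut-score computation, assuming
# each panelist rates every item with the probability that a minimally
# competent candidate answers it correctly. Ratings below are randomly
# generated stand-ins, not data from the study.

rng = np.random.default_rng(0)
n_panelists, n_items = 16, 100
ratings = rng.uniform(0.4, 0.95, size=(n_panelists, n_items))

# Each panelist's implied cut score: the expected percent-correct score
# of the minimally competent candidate, i.e., the mean rating * 100.
panelist_cut_scores = ratings.mean(axis=1) * 100

cut_score = panelist_cut_scores.mean()    # panel cut score (percent correct)
spread = panelist_cut_scores.std(ddof=1)  # agreement among panelists

print(f"cut score: {cut_score:.1f}%, panelist SD: {spread:.2f}")
```

A lower standard deviation across panelists, as reported for the second mock exam, indicates closer agreement on where the minimally competent candidate would score.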


2012 ◽  
Vol 35 (2) ◽  
pp. 173-173 ◽  
Author(s):  
Keh-Min Liu ◽  
Tsuen-Chiuan Tsai ◽  
Shih-Li Tsai

2018 ◽  
Vol 12 (4) ◽  
pp. 15
Author(s):  
Eli Moe ◽  
Hildegunn Lahlum Helness ◽  
Craig Grocott ◽  
Norman Verhelst

Standard setting for English tests for 11th grade students in Norway

This article presents the process used to determine the cut scores between three levels of the Common European Framework of Reference for Languages (A2, B1, and B2) for two formative English listening tests taken by Norwegian pupils in the 11th grade (Vg1). The aim was to establish whether agreement could be reached on the cut scores and whether the standard setters received sufficient preparation before the event. A further aim was to examine the consequences the cut scores would have for the distribution of pupils across the framework levels. The standard setting was carried out using pilot data from 3,199 Vg1 pupils, the Cito method, and 16 panel members with good knowledge of the framework levels. Several panel members were or had been English teachers at the 10th grade or Vg1 level. The Cito method worked well for establishing cut scores on which the panel members largely agreed, and the final results showed a relatively small measurement error. There was a higher level of agreement on the cut score between B1 and B2 than between A2 and B1, possibly because more preparation time was devoted to B1 and B2. Teachers on the panel who know the pupil group well believe that the consequences these cut scores have for the distribution of pupils across the framework levels match their own assessment of the pupils' listening skills.

Keywords: standard setting, test-centered method, the Cito method, standard, cut score, borderline person / minimally competent user


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0257871
Author(s):  
Tabea Feseker ◽  
Timo Gnambs ◽  
Cordula Artelt

In order to draw pertinent conclusions about persons with low reading skills, it is essential to use validated standard-setting procedures by which they can be assigned to their appropriate level of proficiency. Since no standard-setting procedure is without weaknesses, external validity studies are essential. Traditionally, studies have assessed validity by comparing different judgement-based standard-setting procedures; only a few have used model-based approaches to validate judgement-based procedures. The present study addressed this shortcoming and compared the agreement of cut score placements between a judgement-based approach (the Bookmark procedure) and a model-based one (a constrained mixture Rasch model). This was done by differentiating between individuals with low reading proficiency and those with a functional level of reading proficiency in three independent samples of the German National Educational Panel Study: ninth-grade students (N = 13,897) and adults (Ns = 5,335 and 3,145). The analyses showed quite similar mean cut scores for the two standard-setting procedures in two of the samples, whereas the third sample showed more pronounced differences. Importantly, these findings demonstrate that model-based approaches provide a valid and resource-efficient alternative for external validation, although they can be sensitive to the ability distribution within a sample.
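To make the judgement-based side of this comparison concrete, below is a minimal sketch of how a Bookmark cut score can be derived from an ordered item booklet under a Rasch model. The item difficulties, the RP67 response-probability criterion, the bookmark convention, and the panel placements are illustrative assumptions, not the study's data or exact procedure.

```python
import numpy as np

# Minimal sketch of the Bookmark procedure under a Rasch model.
# Item difficulties and bookmark placements below are hypothetical.

rng = np.random.default_rng(1)
item_difficulty = rng.normal(0.0, 1.0, size=60)   # Rasch b-parameters

# Response-probability criterion RP67: the ability at which an item is
# answered correctly with probability 0.67 is b + ln(0.67 / 0.33).
RP = 0.67
rp_location = item_difficulty + np.log(RP / (1 - RP))

# Ordered item booklet: items sorted from easiest to hardest RP location.
order = np.argsort(rp_location)
booklet = rp_location[order]

# One common convention: each panelist bookmarks the last item that a
# minimally competent examinee would answer correctly with at least RP
# probability; the ability cut is the RP location of that bookmarked item.
bookmark_pages = np.array([32, 35, 30, 33, 34, 31])  # hypothetical panel
theta_cut = booklet[bookmark_pages - 1].mean()       # ability-scale cut

# Translate the ability cut into an expected raw score on the full test.
expected_raw_cut = (1 / (1 + np.exp(-(theta_cut - item_difficulty)))).sum()

print(f"ability cut: {theta_cut:.2f}, expected raw-score cut: {expected_raw_cut:.1f}/60")
```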


2021 ◽  
pp. 014662162110468
Author(s):  
Irina Grabovsky ◽  
Jesse Pace ◽  
Christopher Runyon

We model pass/fail examinations with the aim of providing a systematic tool for minimizing classification errors. We use the method of cut-score operating functions to generate specific cut scores by minimizing several important misclassification measures. The goal of this research is to examine the combined effects of a known distribution of examinee abilities and uncertainty in the standard setting on the optimal choice of cut score. In addition, we describe an online application that allows others to apply the cut-score operating function to their own standard settings.
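A minimal sketch of the kind of computation such an optimization involves is shown below: it evaluates weighted false-pass and false-fail rates over a grid of candidate cut scores and selects the minimizer. The ability distribution, error model, assumed true standard, and loss weights are illustrative assumptions, not the authors' formulation of the cut-score operating function.

```python
import numpy as np

# Minimal sketch of choosing a cut score by minimizing expected
# misclassification. The ability distribution, measurement-error SD,
# true standard, and loss weights are hypothetical assumptions.

rng = np.random.default_rng(2)

true_ability = rng.normal(70, 10, size=100_000)                       # true % scores
observed = true_ability + rng.normal(0, 4, size=true_ability.size)    # + measurement error

true_cut = 65.0                      # assumed "true" standard
truly_competent = true_ability >= true_cut

candidate_cuts = np.arange(55.0, 75.0, 0.5)
weight_false_pass = 2.0              # passing a non-competent candidate costs more
weight_false_fail = 1.0

def expected_loss(cut: float) -> float:
    passed = observed >= cut
    false_pass = np.mean(passed & ~truly_competent)
    false_fail = np.mean(~passed & truly_competent)
    return weight_false_pass * false_pass + weight_false_fail * false_fail

losses = np.array([expected_loss(c) for c in candidate_cuts])
best = candidate_cuts[losses.argmin()]
print(f"cut score minimizing weighted misclassification: {best:.1f}")
```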

