Implementation of the ACS/APDS Resident Skills Curriculum reveals a need for rater training: An analysis using generalizability theory

Author(s):  
Kristen M. Jogerst ◽  
Chalerm Eurboonyanun ◽  
Yoon Soo Park ◽  
Douglas Cassidy ◽  
Sophia K. McKinley ◽  
...


Author(s):  
Alice A. Edler ◽  
Ruth G. Fanning ◽  
Michael I. Chen ◽  
Rebecca Claure ◽  
Dondee Almazan ◽  
...  

High-fidelity patient simulation (HFPS) has been hypothesized as a modality for assessing competency of knowledge and skill in patient simulation, but uniform methods for HFPS performance assessment (PA) have not yet been fully established. Anesthesiology founded the HFPS discipline and also leads in its PA. This project reviews the types, quality, and designated purpose of HFPS PA tools in anesthesiology. We systematically reviewed the anesthesiology literature referenced in PubMed to assess the quality and reliability of available PA tools in HFPS. Of 412 articles identified, 50 met our inclusion criteria. Seventy-seven percent of the studies have been published since 2000, and more recent studies demonstrated higher quality. Investigators reported a variety of test construction and validation methods. The most commonly reported test construction methods included "modified Delphi techniques" for item selection and reliability measurement using inter-rater agreement and intra-class correlations between test items or subtests. Modern test theory, in particular generalizability theory, was used in nine (18%) of the studies. Test score validity was addressed in multiple investigations, with significant improvement in reporting accuracy over time. However, the assessment of predictive validity was low across the majority of studies. Usability and practicality of testing occasions and tools were reported only anecdotally. Fuller compliance with the gold standards for PA design requires both the shared experience of experts and adherence to test construction standards, including reliability and validity measurement, instrument piloting, rater training, and explicit identification of the purpose and proposed use of the assessment tool.
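Of the reliability methods tallied above, inter-rater agreement via intra-class correlation is the one most amenable to a compact illustration. Below is a minimal Python sketch of the two-way random-effects, single-rater form, ICC(2,1), computed from a subjects × raters score matrix; the `ratings` data are hypothetical, not drawn from any reviewed study.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random-effects, absolute-agreement, single-rater
    intra-class correlation for an (n subjects x k raters) score matrix."""
    n, k = ratings.shape
    grand = ratings.mean()
    ss_rows = k * ((ratings.mean(axis=1) - grand) ** 2).sum()   # subjects
    ss_cols = n * ((ratings.mean(axis=0) - grand) ** 2).sum()   # raters
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Hypothetical ratings: 6 simulated performances scored by 4 raters
ratings = np.array([[9, 2, 5, 8],
                    [6, 1, 3, 2],
                    [8, 4, 6, 8],
                    [7, 1, 2, 6],
                    [10, 5, 6, 9],
                    [6, 2, 4, 7]], dtype=float)
print(round(icc_2_1(ratings), 2))
```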


2020 ◽  
Vol 63 (6) ◽  
pp. 1947-1957
Author(s):  
Alexandra Hollo ◽  
Johanna L. Staubitz ◽  
Jason C. Chow

Purpose Although sampling teachers' child-directed speech in school settings is needed to understand the influence of linguistic input on child outcomes, empirical guidance for the measurement procedures needed to obtain representative samples is lacking. To optimize the resources needed to transcribe, code, and analyze classroom samples, this exploratory study assessed the minimum number and duration of samples needed for a reliable analysis of conventional and researcher-developed measures of teacher talk in elementary classrooms. Method This study applied fully crossed Person (teacher) × Session (samples obtained on three separate occasions) generalizability studies to analyze an extant data set of three 10-min language samples provided by 28 general and special education teachers, recorded during large-group instruction across the school year. Subsequently, a series of decision studies estimated the number and duration of sessions needed to reach the criterion g coefficient (g > .70). Results The most stable variables were total number of words and mazes, requiring only a single 10-min sample, two 6-min samples, or three 3-min samples to reach criterion. No measured variables related to content or complexity were adequately stable, regardless of the number and duration of samples. Conclusions The generalizability studies confirmed that a large proportion of variance was attributable to individuals rather than to the sampling occasion when analyzing the amount and fluency of spontaneous teacher talk. In general, conventionally reported outcomes were more stable than researcher-developed codes, which suggests that some categories of teacher talk are more context dependent than others and thus require more intensive data collection to measure reliably.
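The decision-study projection in this design follows directly from the two estimated variance components: the person variance stays fixed while the residual (person × session) variance is divided by the number of sessions averaged over. Below is a minimal Python sketch of a fully crossed Person × Session G study and its D-study projection; the score matrix is simulated and all effect sizes are hypothetical, not the study's data.

```python
import numpy as np

def g_study(scores):
    """Variance components for a fully crossed person x session design
    with one score per cell: returns (var_person, var_session, var_residual)."""
    n_p, n_s = scores.shape
    grand = scores.mean()
    ms_p = n_s * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
    ms_s = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_s - 1)
    resid = (scores - scores.mean(axis=1, keepdims=True)
             - scores.mean(axis=0, keepdims=True) + grand)
    ms_ps = (resid ** 2).sum() / ((n_p - 1) * (n_s - 1))
    var_p = max((ms_p - ms_ps) / n_s, 0.0)   # stable person differences
    var_s = max((ms_s - ms_ps) / n_p, 0.0)   # occasion effect
    return var_p, var_s, ms_ps               # ms_ps estimates residual (ps,e)

def d_study_g(var_p, var_res, n_sessions):
    """Projected relative g coefficient when scores are averaged over n_sessions."""
    return var_p / (var_p + var_res / n_sessions)

# Simulated data: 28 teachers x 3 sessions (hypothetical component sizes)
rng = np.random.default_rng(0)
scores = (rng.normal(0, 1.0, (28, 1))       # teacher effect
          + rng.normal(0, 0.3, (1, 3))      # session effect
          + rng.normal(0, 0.8, (28, 3)))    # residual noise

var_p, var_s, var_res = g_study(scores)
# Smallest number of sessions projected to reach the criterion g > .70
n_needed = next(n for n in range(1, 50) if d_study_g(var_p, var_res, n) > 0.70)
print(n_needed)
```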


2009 ◽  
Vol 31 (1) ◽  
pp. 81
Author(s):  
Takeaki Kumazawa

Classical test theory (CTT) has been widely used to estimate the reliability of measurements. Generalizability theory (G theory), an extension of CTT, is a powerful statistical procedure that is particularly useful for performance testing because it enables estimating the percentage of person variance and of multiple sources of error variance. This study focuses on a generalizability study (G study) conducted to investigate such variance components for a paper-and-pencil multiple-choice vocabulary test used as a diagnostic pretest. Further, a decision study (D study) was conducted to compute the generalizability coefficient (G coefficient) for absolute decisions. The results of the G and D studies indicated that 46% of the total variance was due to the item effect; further, the G coefficient for absolute decisions was low.
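For the crossed persons × items design used here, both reported quantities (the share of variance due to the item effect and the absolute-decision coefficient, often written Φ) fall out of the same variance decomposition. Below is a minimal Python sketch on simulated dichotomous responses; the sample sizes and effect sizes are hypothetical, not the study's data.

```python
import numpy as np

# Simulated 0/1 responses: 200 examinees x 40 items (hypothetical Rasch-style data)
rng = np.random.default_rng(1)
ability = rng.normal(0, 1, 200)          # examinee proficiency
difficulty = rng.normal(0, 1.5, 40)      # widely spread item difficulties
p_correct = 1 / (1 + np.exp(difficulty[None, :] - ability[:, None]))
scores = (rng.random((200, 40)) < p_correct).astype(float)

n_p, n_i = scores.shape
grand = scores.mean()
ms_p = n_i * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_p - 1)
ms_i = n_p * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_i - 1)
resid = (scores - scores.mean(axis=1, keepdims=True)
         - scores.mean(axis=0, keepdims=True) + grand)
ms_pi = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

var_p = max((ms_p - ms_pi) / n_i, 0.0)   # person (universe-score) variance
var_i = max((ms_i - ms_pi) / n_p, 0.0)   # item-difficulty variance
var_pi = ms_pi                           # person x item interaction, confounded with error

total = var_p + var_i + var_pi
print(f"item effect: {var_i / total:.0%} of total variance")

# Dependability (phi) coefficient for absolute decisions over an n_i-item test
phi = var_p / (var_p + (var_i + var_pi) / n_i)
print(f"phi = {phi:.2f}")
```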

