A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing

2004 ◽  
Vol 9 (3) ◽  
pp. 239-261 ◽  
Author(s):  
Richard R Sudweeks ◽  
Suzanne Reeve ◽  
William S. Bradshaw

2021 ◽
Vol 12 (1) ◽  
pp. 18
Author(s):  
Jennifer S Byrd ◽  
Michael J Peeters

Objective: There is a paucity of validation evidence for assessing clinical case-presentations by Doctor of Pharmacy (PharmD) students. Within Kane’s Framework for Validation, evidence for the inferences of scoring and generalization should be generated first. Our objectives were therefore to characterize and improve scoring, and to build initial generalization evidence, for a performance-based assessment of clinical case-presentations. Design: Third-year PharmD students worked up patient cases from a local hospital. Students orally presented and defended their therapeutic care-plan to pharmacist preceptors (evaluators) and fellow students. Evaluators scored each presentation using an 11-item instrument with a 6-point rating-scale, and also scored a single global-item with a 4-point rating-scale. Rasch measurement was used for the scoring analysis, while generalizability theory was used for the generalization analysis. Findings: Thirty students each presented five cases, which were evaluated by 15 preceptors. In the Rasch analysis, the 11-item instrument’s 6-point rating-scale did not function; it functioned only after being collapsed to a 4-point scale. The revised 11-item instrument also showed item redundancy. In contrast, the global-item performed reasonably on its own. Using multivariate generalizability theory, the g-coefficient (reliability) for the series of five case-presentations was 0.76 with the 11-item instrument and 0.78 with the global-item. Reliability depended mainly on the number of case-presentations and, to a lesser extent, on the number of evaluators per case-presentation. Conclusions: Our pilot results confirm that scoring should be kept simple, in both the rating-scale and the instrument. More specifically, the 11-item instrument measured the construct but contained redundant items, whereas the single global-item provided adequate measurement across multiple case-presentations. Further, acceptable reliability can be achieved by trading off the number of case-presentations against the number of evaluators.
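The trade-off described in the conclusions follows from the structure of a decision study. As a minimal illustration only, the Python sketch below projects a relative g-coefficient for a student × case × evaluator design under a univariate model; the study itself used multivariate generalizability theory, and the variance components here are hypothetical placeholders, not the study's estimates.

    # Minimal D-study sketch for a fully crossed person (student) x case x evaluator design.
    # Variance components below are hypothetical placeholders, not the study's estimates.

    def g_coefficient(var_p, var_pc, var_pr, var_pcr_e, n_cases, n_raters):
        """Relative g-coefficient when averaging over n_cases and n_raters."""
        error = (var_pc / n_cases
                 + var_pr / n_raters
                 + var_pcr_e / (n_cases * n_raters))
        return var_p / (var_p + error)

    # Hypothetical components: person, person x case, person x rater, residual
    components = dict(var_p=0.50, var_pc=0.60, var_pr=0.20, var_pcr_e=0.90)

    # Reliability as cases are traded against evaluators per case
    for n_cases in (3, 5, 7):
        for n_raters in (1, 2, 3):
            g = g_coefficient(n_cases=n_cases, n_raters=n_raters, **components)
            print(f"{n_cases} cases, {n_raters} evaluators per case: g = {g:.2f}")

With fixed variance components, the table this prints makes the balancing act explicit: adding cases and adding evaluators both shrink the error term, but through different interaction components.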


2007 ◽  
Vol 13 (4) ◽  
pp. 479-493 ◽  
Author(s):  
Cherdsak Iramaneerat ◽  
Rachel Yudkowsky ◽  
Carol M. Myford ◽  
Steven M. Downing

2020 ◽  
Vol 63 (6) ◽  
pp. 1947-1957
Author(s):  
Alexandra Hollo ◽  
Johanna L. Staubitz ◽  
Jason C. Chow

Purpose: Although sampling teachers' child-directed speech in school settings is needed to understand the influence of linguistic input on child outcomes, empirical guidance on the measurement procedures needed to obtain representative samples is lacking. To optimize the resources needed to transcribe, code, and analyze classroom samples, this exploratory study assessed the minimum number and duration of samples needed for a reliable analysis of conventional and researcher-developed measures of teacher talk in elementary classrooms. Method: This study applied fully crossed Person (teacher) × Session (samples obtained on 3 separate occasions) generalizability studies to analyze an extant data set of three 10-min language samples provided by 28 general and special education teachers, recorded during large-group instruction across the school year. Subsequently, a series of decision studies estimated the number and duration of sessions needed to reach the criterion g coefficient (g > .70). Results: The most stable variables were total number of words and mazes, requiring only a single 10-min sample, two 6-min samples, or three 3-min samples to reach criterion. No measured variables related to content or complexity were adequately stable, regardless of the number and duration of samples. Conclusions: The generalizability studies confirmed that a large proportion of variance in the amount and fluency of spontaneous teacher talk was attributable to individuals rather than to the sampling occasion. In general, conventionally reported outcomes were more stable than researcher-developed codes, which suggests that some categories of teacher talk are more context-dependent than others and thus require more intensive data collection to measure reliably.
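As a rough illustration of the decision-study logic described in the Method, the following Python sketch projects the relative g coefficient for a Person × Session design as the number of sessions increases; the variance components are hypothetical, not the values estimated from the 28-teacher data set.

    # Decision-study sketch for a fully crossed person (teacher) x session design.
    # Variance components are hypothetical, not estimates from this study.

    def relative_g(var_person, var_ps_e, n_sessions):
        """Relative g-coefficient when averaging over n_sessions samples."""
        return var_person / (var_person + var_ps_e / n_sessions)

    var_person, var_ps_e = 0.70, 0.50   # hypothetical person and residual components
    criterion = 0.70

    for n_sessions in range(1, 6):
        g = relative_g(var_person, var_ps_e, n_sessions)
        flag = "meets criterion" if g > criterion else "below criterion"
        print(f"{n_sessions} session(s): g = {g:.2f} ({flag})")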


Diagnostica ◽  
2004 ◽  
Vol 50 (2) ◽  
pp. 65-77 ◽  
Author(s):  
Thomas Eckes

Abstract. Performance ratings are subject to a number of rater errors that can substantially reduce their accuracy and validity. A particularly critical rater error is the tendency toward severity or leniency. The present article introduces many-facet Rasch measurement (Linacre, 1989; Linacre & Wright, 2002), an item-response model that yields a severity (or leniency) measure for each rater and places these severity measures in a common frame of reference together with the ability measures of the rated persons and the difficulty measures of the tasks or rating criteria. The model also allows performance to be measured with a correction for rater severity. Using this approach, ratings from the Test of German as a Foreign Language (Test Deutsch als Fremdsprache, TestDaF) are analyzed: the written performances of 1,359 examinees were each rated on 3 criteria by 2 of a total of 29 raters. The group of raters proves to be highly heterogeneous, so that a severity correction of the ratings is warranted. Finally, several implications of the many-facet Rasch model for the evaluation of performance ratings are discussed.
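For reference, the rating-scale form of the many-facet Rasch model underlying this kind of analysis is commonly written as follows; the notation is a standard presentation after Linacre, added here for orientation rather than reproduced from the article.

    \[
    \ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
    \]

Here \(\theta_n\) is the ability of examinee \(n\), \(\delta_i\) the difficulty of task or criterion \(i\), \(\alpha_j\) the severity of rater \(j\), and \(\tau_k\) the threshold between rating-scale categories \(k-1\) and \(k\); all parameters are expressed on one common logit scale, which is what allows severity-corrected person measures.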


Author(s):  
Pasquale Anselmi ◽  
Michelangelo Vianello ◽  
Egidio Robusto

Two studies investigated the differential contributions of positive and negative associations to the size of the Implicit Association Test (IAT) effect. A many-facet Rasch measurement analysis was applied for this purpose. Across different IATs (Race and Weight) and different groups of respondents (White, normal-weight, and obese people), we observed that positive words increase the IAT effect, whereas negative words tend to decrease it. The results suggest that the IAT is influenced by a positive-associations primacy effect. Consequently, we argue that researchers should be cautious when interpreting IAT effects as a measure of implicit prejudice.
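The analysis in this article is Rasch-based, not a conventional scoring of latencies. Purely as an illustration of the quantity being decomposed, the Python sketch below contrasts a D-style IAT effect computed separately on positive-word and negative-word trials; all data and names are hypothetical and this is not the authors' method.

    # Illustrative only: a D-style latency contrast split by word valence,
    # not the many-facet Rasch analysis reported in the article.
    import statistics

    def iat_effect(compatible_rts, incompatible_rts):
        """Latency difference (incompatible - compatible) scaled by the pooled SD."""
        pooled_sd = statistics.stdev(compatible_rts + incompatible_rts)
        return (statistics.mean(incompatible_rts)
                - statistics.mean(compatible_rts)) / pooled_sd

    # Hypothetical reaction times (ms) on positive-word and negative-word trials
    pos = {"compatible": [620, 650, 640], "incompatible": [780, 760, 800]}
    neg = {"compatible": [700, 690, 710], "incompatible": [720, 730, 715]}

    print("effect on positive-word trials:",
          round(iat_effect(pos["compatible"], pos["incompatible"]), 2))
    print("effect on negative-word trials:",
          round(iat_effect(neg["compatible"], neg["incompatible"]), 2))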

