Angoff and Nedelsky Standard Setting Procedures: Implications for the Validity of Proficiency Test Score Interpretation

1982 ◽ Vol 42 (1) ◽ pp. 247-255
Author(s): Peter Behuniak, Francis X. Archambault, Robert K. Gable

2016 ◽ Vol 41 (6, Suppl. 2) ◽ pp. S74-S82
Author(s): Bruno D. Zumbo

A critical step in the development and use of tests of physical fitness for employment purposes (e.g., fitness for duty) is to establish 1 or more cut points, dividing the test score range into 2 or more ordered categories reflecting, for example, fail/pass decisions. Over the last 3 decades, elaborated theories and methods have evolved that focus on the process of establishing 1 or more cut-scores on a test. This process is widely referred to as “standard-setting”. As such, the validity of the test score interpretation hinges on the standard-setting, which embodies the purpose and rules according to which the test results are interpreted. The purpose of this paper is to provide an overview of standard-setting methodology. The essential features, key definitions and concepts, and various novel methods of informing standard-setting are described. The focus is on foundational issues, with an eye toward informing best practices with new methodology. Throughout, a case is made that, in terms of best practices, establishing a test standard involves, in good part, setting a cut-score and can be conceptualized as evidence/data-based policy making that is essentially tied to test validity and an evidential trail.
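The Angoff procedure named in the article title above reduces to simple arithmetic: each panelist estimates, for every item, the probability that a minimally competent candidate answers it correctly, and the cut-score is the sum of the per-item mean estimates. A minimal sketch, with invented ratings purely for illustration (not data from any of the studies listed here):

```python
# Angoff-style cut-score from panelist item ratings (illustrative values).
# Each rater estimates, per item, the probability that a minimally
# competent candidate answers correctly.
ratings = {
    "rater_1": [0.9, 0.7, 0.6, 0.8, 0.5],
    "rater_2": [0.8, 0.6, 0.7, 0.9, 0.4],
    "rater_3": [0.85, 0.65, 0.55, 0.8, 0.5],
}

n_items = len(next(iter(ratings.values())))
# Mean estimated probability per item, averaged over raters.
item_means = [
    sum(r[i] for r in ratings.values()) / len(ratings)
    for i in range(n_items)
]
# The cut-score is the expected raw score of a borderline candidate.
cut_score = sum(item_means)
print(round(cut_score, 2))
```

The Nedelsky procedure differs in how the per-item probability is obtained (raters eliminate distractors a borderline candidate could rule out), but the aggregation into a cut-score follows the same pattern.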


2021 ◽ pp. 026553222110107
Author(s): Simon Davidson

This paper investigates what matters to medical domain experts when setting standards on a language for specific purposes (LSP) English proficiency test: the Occupational English Test’s (OET) writing sub-test. The study explores what standard-setting participants value when making performance judgements about test candidates’ writing responses, and the extent to which their decisions are language-based and align with the OET writing sub-test criteria. Qualitative data are a relatively under-utilized component of standard-setting, and this type of commentary was garnered to gain a better understanding of the basis for performance decisions. Eighteen doctors were recruited for standard-setting workshops. To gain further insight, verbal reports in the form of a think-aloud protocol (TAP) were collected from five of the 18 participants. The doctors’ comments were thematically coded, and the analysis showed that participants’ standard-setting judgements often aligned with the OET writing sub-test criteria. An overarching theme, ‘Audience Recognition’, was also identified as valuable to participants. A minority of decisions were swayed by features outside the OET’s communicative construct (e.g., clinical competency). Overall, however, findings indicated that domain experts focused on textual features associated with what the test is designed to assess, and their views were vitally important in the standard-setting process.


1996 ◽ Vol 39 (4) ◽ pp. 697-713
Author(s): Marilyn E. Demorest, Lynne E. Bernstein, Gale P. DeHaven

Ninety-six adults with normal hearing viewed three types of recorded speechreading materials (consonant-vowel nonsense syllables, isolated words, and sentences) on 2 days. Responses to nonsense syllables were scored for syllables correct and syllable groups correct; responses to words and sentences were scored in terms of words correct, phonemes correct, and an estimate of visual distance between the stimulus and the response. Generalizability analysis was used to quantify sources of variability in performance. Subjects and test items were important sources of variability for all three types of materials; effects of talker and day of testing varied but were comparatively small. For each type of material, alternative models of test construction and test-score interpretation were evaluated through estimation of generalizability coefficients as a function of test length. Performance on nonsense syllables correlated about .50 with both word and sentence measures, whereas correlations between words and sentences typically exceeded .80.
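The abstract’s evaluation of generalizability coefficients as a function of test length can be sketched with the standard persons-by-items formula, in which the residual variance shrinks as more items are averaged. The variance components below are hypothetical placeholders, not estimates from the study:

```python
# Generalizability coefficient for a simple persons-by-items design.
# var_person: universe-score (person) variance; var_residual: the
# person-by-item plus error component, which averages down over items.
def g_coefficient(var_person, var_residual, n_items):
    """E-rho^2 for relative decisions with n_items averaged items."""
    return var_person / (var_person + var_residual / n_items)

# Hypothetical components, chosen only to show the trend with length.
var_person, var_residual = 0.04, 0.12
for n in (10, 20, 40):
    print(n, round(g_coefficient(var_person, var_residual, n), 2))
```

Doubling test length raises the coefficient with diminishing returns, which is why the abstract frames generalizability as a function of test length rather than a single number per instrument.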


1999 ◽ Vol 15 (3) ◽ pp. 258-269
Author(s): Norbert K. Tanzer, Catherine Q.E. Sim

Summary: To facilitate the development of valid multicultural/multilingual tests, the International Test Commission (ITC) prepared the ITC Guidelines on Test Adaptations. This paper reviews the current version (cf. Van de Vijver & Hambleton, 1996), which consists of 22 guidelines on recommended practices pertaining to context, development, administration, documentation, and test-score interpretation, by identifying key principles in test adaptations and comparing them to a content analysis of the ITC Guidelines. The content analysis revealed a number of inconsistencies and ambiguities in a few guidelines, and proposals for reformulating them are given. A checklist to supplement the more narrative guidelines would also be helpful. Nevertheless, the review clearly demonstrates that the ITC Guidelines on Test Adaptations address key principles in test adaptations and constitute a significant standard or “code of conduct” in this field.


2014
Author(s): Keith A. Markus, Denny Borsboom
