Developing rating scales for the assessment of second language performance
Abstract The two most common approaches to rating second language performance pose problems of reliability and validity. An alternative method uses rating scales empirically derived from samples of learner performance. These scales define boundaries between adjacent score levels rather than providing normative descriptions of ideal performances, and the rating process requires making two or three binary choices about the performance being rated. A procedure consisting of a series of five explicit tasks is used to construct a rating scale designed for use with a specific population and a specific test task. A group of primary school ESL teachers used this procedure to develop two speaking tests, including elicitation tasks and rating scales, for use in their school district. The tests were administered to 255 sixth-grade learners. The scales proved highly accurate for scoring short speech samples, were efficient in the time required for scale development and rater training, and exhibit content relevance in the instructional setting. Development of this type of scale is recommended for high-stakes assessment.