Developing rating scales for the assessment of second language performance
Abstract The two most common approaches to rating second language performance pose problems of reliability and validity. An alternative method uses rating scales empirically derived from samples of learner performance. These scales define boundaries between adjacent score levels rather than providing normative descriptions of ideal performances, and the rating process requires making two or three binary choices about the performance being rated. A procedure consisting of a series of five explicit tasks is used to construct a rating scale designed for use with a specific population and a specific test task. A group of primary school ESL teachers used this procedure to develop two speaking tests, including elicitation tasks and rating scales, for use in their school district. The tests were administered to 255 sixth-grade learners. The scales proved highly accurate for scoring short speech samples, were efficient in the time required for scale development and rater training, and exhibit content relevance in the instructional setting. Development of this type of scale is recommended for high-stakes assessment.