Revisiting rating scale development for rater-mediated language performance assessments: Modelling construct and contextual choices made by scale developers

2021 ◽  
pp. 026553222199405
Author(s):  
Ute Knoch ◽  
Bart Deygers ◽  
Apichat Khamboonruang

Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing research for over a decade. In this paper we refine the dominant model of rating scale development by drawing on a corpus of 36 studies identified in a systematic review. We present a model showing the different sources of scale construct in the corpus. In the discussion, we argue that rating scale designers, just like test developers more broadly, need to start by determining the purpose of the test, the relevant policies that guide test development and score use, and the intended score use when considering the design choices available to them. These include considering the impact of such sources on the generalizability of the scores, the precision of the post-test predictions that can be made about test takers’ future performances and scoring reliability. The most important contribution of the model is that it gives rating scale developers a framework to consider before starting scale development and validation activities.

2019 ◽  
Vol 24 (3) ◽  
pp. 242-246
Author(s):  
Alberto R. M. Martinez ◽  
Melina P. Martins ◽  
Carlos R. Martins ◽  
Ingrid Faber ◽  
Thiago J. R. Rezende ◽  
...  

Author(s):  
Jiuliang Li ◽  
Qian Wang

Summary writing is essential for academic success and has attracted renewed interest in academic research and large-scale language tests. However, less attention has been paid to the development and evaluation of scoring scales for summary writing. This study reports on the validation of a summary rubric that represented an approach to scale development with limited resources, adopted for reasons of practicality. Participants were 83 students and three raters. Diagnostic evaluation of the scale components and categories was based on raters’ perceptions of their use and on the scores of students’ summaries, which were analyzed using multifaceted Rasch measurement (MFRM). Correlation analysis revealed significant relationships among the scoring components, but the coefficients among some of the components were excessively high. MFRM analysis provided evidence in support of the usefulness of the scoring rubric, but also suggested the need for refinement of the components and categories. According to the raters, the rubric was ambiguous in addressing some crucial text features. This study has implications for summarization task design and, in particular, for scoring scale development and validation.


2021 ◽  
pp. 003022282110162
Author(s):  
Hakan Cengiz ◽  
Omer Torlak

Although the consumption aspect of death has been widely discussed in the literature, no scale has yet been developed to measure it. This study aims to develop a domain-specific death-related status consumption (DRSC) scale to bridge this gap in the field. Results reveal the following three dimensions of the scale: conspicuousness, planning, and showing respect. In four studies, which collate the views of 1,302 participants, both students and adults, the DRSC demonstrates internal consistency and validity across cultures (Turkey, the U.S., and a culturally diverse sample). The importance of such a scale for the field is discussed.

