Development and Validation of a Cognitive Diagnostic Assessment with Ordered Multiple-Choice Items for Addition of Time

Author(s): Huan Chin, Cheng Meng Chew, Hooi Lian Lim, Lei Mee Thien

2006, Vol. 11 (1), pp. 33-63
Author(s): Derek Briggs, Alicia Alonzo, Cheryl Schwab, Mark Wilson

2021, pp. 026553222199547
Author(s): Shangchao Min, Lianzhen He

In this study, we present the development of individualized feedback for a large-scale listening assessment by combining standard setting and cognitive diagnostic assessment (CDA) approaches. We used performance data from 3358 students' item-level responses to a field test of a national EFL test primarily intended for tertiary-level EFL learners. The results showed that the proficiency classifications and subskill mastery classifications were generally of acceptable reliability, and that the two kinds of classifications aligned with each other at both the individual and group levels. The outcome of the study is a set of descriptors characterizing each test taker's ability to understand oral texts at a given level, together with his or her cognitive performance. By illustrating the feasibility of combining standard setting and CDA approaches to produce individualized feedback, the study contributes to improved score reporting and addresses the long-standing criticism that large-scale language assessments fail to provide individualized feedback linking assessment with instruction.
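
The subskill mastery classifications referred to in this abstract are typically produced by a cognitive diagnostic model. The sketch below is a minimal, hypothetical illustration of that idea using a DINA-style classifier: the Q-matrix, slip and guess values, and response vector are invented for the example, and in an operational CDA the item parameters would be estimated from the data rather than fixed.

# Minimal sketch of DINA-style subskill mastery classification (illustrative only;
# the Q-matrix, slip/guess values, and responses below are hypothetical).
from itertools import product
import numpy as np

# Q-matrix: rows = items, columns = subskills (1 = item requires that subskill)
Q = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
])
slip = np.full(Q.shape[0], 0.10)    # P(incorrect | all required subskills mastered)
guess = np.full(Q.shape[0], 0.20)   # P(correct | some required subskill not mastered)

def classify(responses):
    """Return the most likely subskill mastery profile for one response vector."""
    best_profile, best_loglik = None, -np.inf
    for profile in product([0, 1], repeat=Q.shape[1]):
        alpha = np.array(profile)
        # eta[j] = 1 iff the profile masters every subskill that item j requires
        eta = np.all(alpha >= Q, axis=1).astype(float)
        p_correct = (1 - slip) ** eta * guess ** (1 - eta)
        loglik = np.sum(responses * np.log(p_correct)
                        + (1 - responses) * np.log(1 - p_correct))
        if loglik > best_loglik:
            best_profile, best_loglik = profile, loglik
    return best_profile

# Example: an examinee who answers items 1, 2, and 4 correctly
x = np.array([1, 1, 0, 1, 0, 0])
print(classify(x))   # (1, 1, 0): subskills 1 and 2 classified as mastered

A mastery profile such as (1, 1, 0) is the kind of result that individualized feedback reports translate into descriptors of what a test taker can and cannot yet do.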


Psychometrika, 2021
Author(s): Qian Wu, Monique Vanerum, Anouk Agten, Andrés Christiansen, Frank Vandenabeele, ...

1987, Vol. 47 (2), pp. 513-522
Author(s): Steven V. Owen, Robin D. Froman

1990, Vol. 1990 (1), pp. i-29
Author(s): Randy Elliot Bennett, Donald A. Rock, Minhwei Wang

2021, pp. 001316442098810
Author(s): Stefanie A. Wind, Yuan Ge

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures collect one or two ratings for each performance and rely on overlapping performances across raters, or on linking sets of multiple-choice items, to facilitate model estimation. These incomplete scoring designs present challenges for detecting rater biases, or differential rater functioning (DRF). The purpose of this study is to illustrate and explore the sensitivity of DRF indices in realistic sparse rating designs documented in the literature, which differ in the type and level of connectivity among raters and students. The results indicated that it is possible to detect DRF in sparse rating designs, but that the sensitivity of DRF indices varies across designs. We consider the implications of our findings for practice related to monitoring raters in performance assessments.
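
One design feature this abstract highlights is connectivity among raters and students, since a rater whose ratings are not linked to the rest of the design cannot be compared with the other raters. The sketch below is a hypothetical illustration of checking that property: the rating design is invented, and the check is a generic bipartite-graph connectivity test, not any DRF index used in the study.

# Minimal sketch: check whether a sparse rating design is connected
# (hypothetical design; not data or methods from the study).
from collections import defaultdict, deque

# Each tuple is (rater, student): the rater scored that student's performance.
ratings = [
    ("R1", "S1"), ("R1", "S2"),
    ("R2", "S2"), ("R2", "S3"),
    ("R3", "S4"), ("R3", "S5"),   # R3 shares no students with R1 or R2
]

def connected_components(pairs):
    """Group raters and students into linked subsets of the bipartite rating graph."""
    graph = defaultdict(set)
    for rater, student in pairs:
        graph[("rater", rater)].add(("student", student))
        graph[("student", student)].add(("rater", rater))
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        queue, component = deque([node]), set()
        while queue:
            current = queue.popleft()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            queue.extend(graph[current])
        components.append(component)
    return components

parts = connected_components(ratings)
print(len(parts))   # 2: R3's ratings are not linked to the rest of the design

In a fully linked design the function returns a single component; the more fragmented the design, the harder it is to separate rater effects, including any DRF, from differences among the students each rater happened to score.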

