Examining Rater Performance on the CELBAN Speaking: A Many-Facets Rasch Measurement Analysis

2020, Vol 23 (2), pp. 73-95
Author(s): Peiyu Wang, Karen Coetzee, Andrea Strachan, Sandra Monteiro, Liying Cheng

Internationally educated nurses' (IENs') English language proficiency is critical to professional licensure, as communication is a key competency for safe practice. The Canadian English Language Benchmark Assessment for Nurses (CELBAN) is Canada's only Canadian Language Benchmarks (CLB)-referenced examination used in the context of healthcare regulation. This high-stakes assessment serves as proof of proficiency for IENs seeking licensure in Canada and as a measure of public safety for nursing regulators. Because speaking assessment involves rater judgement, understanding the quality of rater performance is crucial to maintaining test quality when results are used for high-stakes decisions, and such decisions require strong reliability evidence (Koizumi et al., 2017). This study examined rater performance on the CELBAN Speaking component using Many-Facets Rasch Measurement (MFRM). Specifically, it assessed CELBAN rater reliability in terms of consistency and severity, rating bias, and use of the rating scale. The study was based on a sample of 115 raters across eight test sites in Canada and results from 2,698 examinations across four parallel versions. Findings demonstrated relatively high inter-rater and intra-rater reliability and showed that the CLB-based speaking descriptors (CLB 6-9) provided sufficient information for raters to discriminate examinees' oral proficiency. There was no influence of test site or test version, offering validity evidence to support test use for high-stakes purposes. Among the eight speaking criteria, Grammar was identified as the most difficult criterion on the scale and the one demonstrating the most rater bias. This study highlights the value of MFRM analysis in rater performance research, with implications for rater training. It is one of the first studies to apply MFRM to a CLB-referenced high-stakes assessment in the Canadian context.
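For readers unfamiliar with MFRM, the sketch below shows how a many-facets Rasch (rating scale) model turns an examinee ability, a criterion difficulty, a rater severity, and a set of category thresholds into rating-category probabilities. All numbers are invented for illustration; they are not estimates from the study.

```python
import numpy as np

def mfrm_category_probs(ability, criterion_difficulty, rater_severity, thresholds):
    """Category probabilities under a many-facets Rasch (rating scale) model:
    log(P_k / P_{k-1}) = ability - criterion_difficulty - rater_severity - threshold_k.
    Categories run 0..K, where K = len(thresholds)."""
    logits = ability - criterion_difficulty - rater_severity - np.asarray(thresholds)
    # Cumulative sums give each category's log-numerator relative to category 0.
    log_num = np.concatenate(([0.0], np.cumsum(logits)))
    num = np.exp(log_num - log_num.max())  # shift by max for numerical stability
    return num / num.sum()

# Hypothetical values: an able examinee, a hard criterion (e.g., Grammar),
# a relatively severe rater, and a four-category scale.
probs = mfrm_category_probs(ability=1.2, criterion_difficulty=0.5,
                            rater_severity=0.3, thresholds=[-1.5, 0.0, 1.5])
print(probs.round(3))  # probability of each rating category, lowest to highest
```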

2012, Vol 2012, pp. 1-10
Author(s): Hossein Bozorgian

Current English-as-a-second/foreign-language (ESL/EFL) research has tended to treat each communicative macroskill separately, due in part to space constraints, while the interrelationships among these skills (listening, speaking, reading, and writing) have not received due attention. This study examines, first, the relationships among the four dominant skills; second, the potential impact of reading background on overall language proficiency; and finally, the relationship between listening and overall language proficiency, since listening is often treated as an overlooked or passive skill in second/foreign-language pedagogy, even though the language-learning literature shows that listening is of salient importance in both first and second language learning. The purpose of this study is to investigate the role of each of the four skills in EFL learning and their interrelationships in an EFL setting. The results of 701 Iranian applicants who took the International English Language Testing System (IELTS) in Tehran demonstrate that the communicative macroskills correlate with one another at levels ranging from moderate (reading and writing) to high (listening and reading). The findings also show that applicants' reading background helped them perform better on this high-stakes test and, moreover, that listening skill was strongly correlated with overall language proficiency.
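As a minimal illustration of the kind of correlation analysis described, the sketch below computes pairwise Pearson correlations among four skill scores and an overall band. The study's IELTS records are not public, so the data here are synthetic and the correlation structure is induced artificially.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the 701 IELTS records (values are invented).
n = 701
listening = rng.normal(6.0, 1.0, n)
reading = 0.7 * listening + rng.normal(0, 0.8, n)   # induce a high L-R correlation
writing = 0.4 * reading + rng.normal(0, 0.9, n)     # moderate R-W correlation
speaking = 0.3 * listening + rng.normal(6.0, 1.0, n)
overall = (listening + reading + writing + speaking) / 4

scores = pd.DataFrame({
    "listening": listening, "reading": reading,
    "writing": writing, "speaking": speaking, "overall": overall,
})

# Pairwise Pearson correlations among the macroskills and the overall band.
print(scores.corr(method="pearson").round(2))
```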


1993, Vol 2 (3), pp. 31-35
Author(s): Sidney M. Barefoot, Joseph H. Bochner, Barbara Ann Johnson, Beth Ann vom Eigen

The purpose of this study was to investigate the utility of a measure of communication efficacy, one that explicitly encompasses features of both speech and language. Toward this end, the construct of comprehensibility, which has been used in the field of second-language acquisition, was adapted. Comprehensibility, operationally defined as the extent to which a listener understands utterances produced by a speaker in a communication context, was studied in relation to various dimensions of communication efficacy. Four observers evaluated the comprehensibility of utterances produced by 41 deaf young adults, using a nine-point rating scale. The reliability of the comprehensibility ratings was determined, and the ratings were studied in relation to independent assessments of the subjects' speech intelligibility, English language proficiency, speech recognition, reading comprehension, and hearing loss. The results of this investigation indicate that comprehensibility can be evaluated reliably and that comprehensibility is associated with both speech intelligibility and language proficiency. The implications of these findings for the clinical assessment of speech and language are discussed.
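One common way to quantify the reliability of such ratings is to treat the four observers as "items" and compute Cronbach's alpha. The sketch below does this on simulated nine-point ratings, since the original data are not available; the paper's actual reliability procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 4 observers rate 41 speakers on a nine-point scale.
true_level = rng.integers(1, 10, size=41).astype(float)
ratings = np.clip(true_level[:, None] + rng.normal(0, 1.0, size=(41, 4)), 1, 9).round()

def cronbach_alpha(x):
    """Cronbach's alpha, treating raters as 'items' (rows = subjects, cols = raters)."""
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

print(f"alpha across the four observers: {cronbach_alpha(ratings):.2f}")
```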


2008, Vol 78 (2), pp. 260-329
Author(s): Ronald W. Solórzano

This article discusses the issues and implications of high-stakes tests for English language learners (ELLs). As ELLs are included in all high-stakes assessments tied to accountability efforts (e.g., No Child Left Behind), it is crucial that issues related to these tests be critically evaluated relative to their use. In this case, academic achievement tests are analyzed with respect to their norming samples and validity to determine their usefulness for ELLs. Commonly used language proficiency tests are also examined with respect to definitions of proficiency, technical quality, alignment with criteria for language classification and reclassification, and their academic predictive validity. Based on a synthesis of the literature, the author concludes that high-stakes tests as currently constructed are inappropriate for ELLs, and that most disturbing is their continued use for high-stakes decisions with adverse consequences. The author provides recommendations for addressing the issues related to high-stakes tests and ELLs.


Author(s): Talip Karanfil, Steve Neufeld

High-stakes and high-volume English language proficiency tests typically rely on multiple-choice questions (MCQs) to assess reading and listening skills. Due to the Covid-19 pandemic, more institutions are using MCQs via online assessment platforms, which facilitate shuffling the order of options within test items to minimize cheating. There is scant research on the role that the order and sequence of options play in MCQs, so this study examined the results of a paper-based, high-stakes English proficiency test administered in two versions. Each version had identical three-option MCQs but with different ordering of options. The test-takers were chosen to ensure a very similar profile of language ability and level for the groups who took the two versions. The findings indicate that one in four questions exhibited significantly different levels of difficulty and discrimination between the two versions. The study identifies order dominance and sequence priming as two factors that influence the outcomes of MCQs, both of which can accentuate or diminish the power of attraction of the correct and incorrect options. These factors should be carefully considered when designing MCQs for high-stakes language proficiency tests and when shuffling options in either paper-based or computer-based testing.
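The difficulty and discrimination statistics referred to here are the standard classical-test-theory indices (proportion correct and corrected item-total correlation). A minimal sketch of how they might be computed and compared across two versions is below, on simulated 0/1 response matrices; the real response data are not published.

```python
import numpy as np

def item_stats(responses):
    """Classical item analysis on a 0/1 matrix (rows = test takers, cols = items)."""
    difficulty = responses.mean(axis=0)        # proportion correct per item
    total = responses.sum(axis=1)
    discrimination = np.empty(responses.shape[1])
    for j in range(responses.shape[1]):
        rest = total - responses[:, j]         # corrected item-total correlation
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

rng = np.random.default_rng(2)
# Simulated response matrices for the two versions (40 items, 300 takers each).
version_a = (rng.random((300, 40)) < 0.6).astype(int)
version_b = (rng.random((300, 40)) < 0.6).astype(int)

diff_a, disc_a = item_stats(version_a)
diff_b, disc_b = item_stats(version_b)
for j in range(40):
    if abs(diff_a[j] - diff_b[j]) > 0.10:      # flag items whose difficulty shifts
        print(f"item {j}: p {diff_a[j]:.2f} vs {diff_b[j]:.2f}, "
              f"r_pb {disc_a[j]:.2f} vs {disc_b[j]:.2f}")
```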


2016, Vol 9 (5), pp. 147
Author(s): Smitha Dev, Sura Qiqieh

<p class="apa">The present study aims to find out the relationship between English Language proficiency, self-esteem, and academic achievement of the students in Abu Dhabi University (ADU). The variables were analyzed using ‘t’ test, chi-squire and Pearson’s product moment correlation. In addition, Self-rating scale, Self-esteem inventory and Language proficiency tests were used to measure the variables. The data were collected from 200 male and female students from Abu Dhabi University. The study could not find out any positive relationship among the variables. It is also revealed that language fluency (IELTS) has no direct impact on the ADU students’ self-esteem scores and academic achievement (GPA).</p>


2019, Vol 10 (1), pp. 177
Author(s): Asli Lidice Gokturk Saglam, Hossein Farhady

This article reports on a mixed-methods study that examined the washback of a local integrated theme-based high-stakes English language proficiency test used in a university English for Academic Purposes (EAP) program in Turkey. The assumption behind employing an integrated theme-based test, which resembles authentic language use, was that it would bring about positive washback on learning (Leki, Cumming & Silva, 2008; Leki & Carson, 1997). The data were collected from focus-group interviews after the instruction and from pre- and post-instruction proficiency test scores of 147 EFL students in the Preparatory English Language Program (PEP). The Test of Readiness for Academic English (TRACE) was administered at the beginning and end of a four-month English language instruction period. Repeated-measures ANOVA and inductive analysis of the transcribed interview data were used to analyze the quantitative and qualitative data, respectively. The findings indicated that the test had both positive and negative washback on learning. Most students considered that incorporating source-based information and the notes they took during the listening task into their writing raised their awareness of generating, organizing, and linking ideas, as well as of modelling vocabulary and sentence structures. However, the test also exerted negative washback on learning, since students were inclined to prioritize test-oriented practice. The implications of the study suggest that a theme-based integrated proficiency exam may elicit positive washback on learning, which could serve as validity evidence in EAP contexts and lead to more appropriate language assessment. The procedures are detailed, the findings are presented and discussed, the applications and implications for teachers and test designers are explained, and some suggestions are made for further research.
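With only two time points (pre and post) on one within-subjects factor, a repeated-measures ANOVA reduces to a paired t-test (F = t²). The sketch below illustrates this equivalence on invented TRACE scores; the gain distribution is an assumption for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Invented pre- and post-instruction scores for 147 students.
pre = rng.normal(60, 10, 147)
post = pre + rng.normal(5, 6, 147)  # assume an average gain over the 4 months

# With two time points, repeated-measures ANOVA and the paired t-test agree (F = t^2).
t, p = stats.ttest_rel(post, pre)
print(f"t({len(pre) - 1}) = {t:.2f}, p = {p:.4f}, equivalent F = {t ** 2:.2f}")
```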


2020, Vol 37 (4), pp. 523-549
Author(s): You-Min Lin, Michelle Y. Chen

This study examined the writing score and writing feature changes of 562 repeat test takers who took the Canadian English Language Proficiency Index Program–General (CELPIP–General) test at least three times, with a short (30-40 day) interval between the first and second attempts and a longer (90-180 day) interval between the first and third attempts. The analysis examined whether changes occurred at different testing durations (short vs. long) and whether the observed changes varied across repeaters' initial proficiency groups (low, mid, high). The writing scores, measured in CELPIP bands, showed considerable stability over the six-month period, but the developmental trends differed by proficiency group. Low proficiency test takers were more likely to show faster observable score gains than the medium proficiency group, whereas high proficiency repeaters did not always maintain their score levels on later attempts. Writing quality was analyzed using natural language processing (NLP) tools. Results suggested that for all proficiency groups, lexical features were more likely to improve over the six-month period, with some measures showing improvement at one month; features related to cohesion and syntactic sophistication, however, did not change significantly.
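The specific NLP tools and indices are not named here, but lexical features of the kind described are typically simple to compute. The sketch below shows two common stand-ins (type-token ratio and mean word length) on invented writing samples; these are illustrative proxies, not the study's actual measures.

```python
import re

def lexical_features(text):
    """Illustrative proxies for the lexical indices NLP tools report on writing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"tokens": 0, "type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "tokens": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(map(len, tokens)) / len(tokens),
    }

attempt_1 = "I go to work every day and I like my work very much."
attempt_3 = "My daily commute has gradually become an opportunity to reflect on my career."
for label, essay in [("attempt 1", attempt_1), ("attempt 3", attempt_3)]:
    print(label, lexical_features(essay))
```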


2020, Vol 23 (2), pp. 1-19
Author(s): Michelle Chen, Jennifer J. Flasko

Seeking evidence to support content validity is essential to test validation. This is especially the case in contexts where test scores are interpreted in relation to external proficiency standards and where new test content is constantly being produced to meet test administration and security demands. In this paper, we describe a modified scale-anchoring approach to assessing the alignment between the Canadian English Language Proficiency Index Program (CELPIP) test and the Canadian Language Benchmarks (CLB), the proficiency framework to which the test scores are linked. We discuss how proficiency frameworks such as the CLB can be used to support the content validation of large-scale standardized tests through an evaluation of the alignment between the test content and the performance standards. By sharing both the positive implications and challenges of working with the CLB in high-stakes language test validation, we hope to help raise the profile of this national language framework among scholars and practitioners.
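The paper's modified procedure is not reproduced here, but the core scale-anchoring idea can be sketched as follows: an item "anchors" at the lowest proficiency level whose typical examinee answers it correctly with at least some criterion probability (0.67 is one conventional choice). All ability and difficulty values below are hypothetical.

```python
import math

def rasch_p(ability, difficulty):
    """Probability of success under a dichotomous Rasch model."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# Hypothetical typical abilities per CLB level and item difficulties (logits).
clb_abilities = {"CLB 6": -1.0, "CLB 7": 0.0, "CLB 8": 1.0, "CLB 9": 2.0}
item_difficulties = {"item_01": -0.8, "item_02": 0.4, "item_03": 1.6}

for item, d in item_difficulties.items():
    level = next((lvl for lvl, b in clb_abilities.items() if rasch_p(b, d) >= 0.67),
                 "above CLB 9")
    print(f"{item}: anchors at {level}")
```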


2005, Vol 13 (2), pp. 129-146
Author(s): Thomas R. O'Neill, Richard J. Tannenbaum, Jennifer Tiffen

When nurses who are educated internationally immigrate to the United States, they are expected to have sufficient English language proficiency to function as competent nurses. The purpose of this research was to provide the National Council of State Boards of Nursing (NCSBN) with sufficient information to make a defensible recommended passing standard for English proficiency, based on the Test of English as a Foreign Language (TOEFL™). A large panel of nurses and nurse regulators (N = 25) was convened to determine how much English proficiency is required to be minimally competent as an entry-level nurse. Two standard-setting procedures, the Simulated Minimally Competent Candidate (SMCC) procedure and the Examinee Paper Selection Method, were combined to produce recommendations for each panelist. In conjunction with collateral information, these recommendations were reviewed by the NCSBN Examination Committee, which decided upon an NCSBN-recommended standard: a TOEFL score of 220. Because adoption of this standard rests entirely with the individual states, NCSBN's remaining role in implementing it is limited to answering questions and providing documentation about the standard.
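The final standard reflected committee judgment over the panelists' recommendations and collateral information, but a common first step in such standard setting is simply aggregating the panelist cut scores. The sketch below shows that step on hypothetical values; the real panelist recommendations are not published.

```python
import statistics

# Hypothetical panelist cut-score recommendations on the TOEFL scale (N = 25).
panelist_cuts = [210, 215, 220, 220, 223, 225, 218, 220, 213, 227,
                 220, 216, 222, 219, 221, 224, 217, 220, 215, 226,
                 218, 221, 223, 214, 220]

# The median is robust to a few unusually lenient or severe panelists.
print(f"median of panelist recommendations: {statistics.median(panelist_cuts)}")
```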

