Statistics for Classroom Language Assessment: Using Numbers Meaningfully

HOW ◽  
2020 ◽  
Vol 27 (2) ◽  
pp. 135-155
Author(s):  
Frank Giraldo

Large-scale language testing uses statistical information to account for the quality of an assessment system. In this reflection article, I explain how basic statistics can be used meaningfully in the context of classroom language assessment. The paper explores a series of statistical calculations that can be used to examine test scores and assessment decisions in the language classroom. Therefore, interpretations for criterion-referenced assessment underlie the paper. Finally, I discuss limitations and include recommendations for teachers to use statistics.

2021 ◽  
Vol 3 (1) ◽  
pp. 120-130
Author(s):  
Ildikó Csépes

Language teachers’ assessment knowledge and skills have received considerable attention from language assessment researchers over the past few decades (Davison & Leung, 2009; Hill & McNamara, 2012; Rea-Dickins, 2001; Taylor, 2013). This seems to be linked to the increased professionalism expected of them in classroom-based assessments. However, teachers seem to face a number of challenges, including how large-scale standardized language exams influence their classroom assessment practices. Teachers’ assessment literacy, therefore, needs to be examined in order to explain their assessment decisions. In this paper, we review the concept of (language) assessment literacy, how it has evolved and how it is conceptualized currently. Recent interpretations seem to reflect a multidimensional, dynamic and situated view of (language) assessment literacy. Implications for teacher education are also highlighted by presenting research findings from studies that explored teachers’ and teacher candidates’ assessment literacy in various educational contexts. As a result, we can identify some common patterns in classroom assessment practices as well as context-specific training needs. Finally, we make a recommendation for tackling some of the challenges language teachers are facing in relation to classroom-based assessment in the Hungarian context.


2021 ◽  
Vol 12 ◽  
Author(s):  
Don Yao ◽  
Matthew P. Wallace

It is not uncommon for immigration-seekers to be actively involved in taking various language tests for immigration purposes. Given the large-scale and high-stakes nature those language tests possess, the validity issues (e.g., appropriate score-based interpretations and decisions) associated with them are of great importance as test scores may play a gate-keeping role in immigration. Though interest in investigating the validity of language tests for immigration purposes is becoming prevalent, there has yet to be a systematic review of the research foci and results of this body of research. To address this need, the current paper critically reviewed 11 validation studies on language assessment for immigration over the last two decades to identify what has been focused on and what has been overlooked in the empirical research and to discuss current research interests and future research trends. The Assessment Use Argument (AUA) framework of Bachman and Palmer (2010), comprising four inferences (i.e., assessment records, interpretations, decisions, and consequences), was adopted to collect and examine evidence of test validity. Results showed that the consequences inference received the most investigation, with studies focusing on immigration-seekers’ and policymakers’ perceptions of test consequences, while the decisions inference was the least probed, with studies stressing immigration-seekers’ attitudes towards the impartiality of decision-making. It is recommended that further studies explore more kinds of stakeholders (e.g., test developers) in terms of their perceptions of the test and investigate the fairness of decision-making based on test scores more closely. Additionally, the current AUA framework includes only positive and negative consequences that an assessment may engender but does not take compounded consequences into account. It is suggested that further research could enrich the framework. The paper sheds some light on the field of language assessment for immigration and brings about theoretical, practical, and political implications for different kinds of stakeholders (e.g., researchers, test developers, and policymakers).


Author(s):  
Umed Bokiev ◽  
Arshad Abd. Samad

Washback refers to the influence of language assessment on teaching and learning. In contrast to the wealth of studies involving external large-scale language examinations, scant research has been conducted to explore the influence of internal language assessment on instruction, particularly in the context of a university foundation programme. This qualitative study investigated the washback effects of an English language assessment system (ELAS) on the teaching and learning of English in a Malaysian university foundation programme. Apart from an in-depth analysis of official documents on the ELAS, we conducted individual semi-structured interviews with three curriculum and assessment developers, three English language instructors, four students and four alumni of the foundation programme and analysed the collected data using Miles and Huberman’s (1994) framework for qualitative data analysis. Findings indicated that the ELAS, with its different assessment forms, exerted an overall positive washback on various aspects of English teaching and learning. Yet, a number of factors related to the assessment, teachers, students as well as context mediated the extent of washback experienced. Based on the findings of the study, we put forward a few recommendations on how to encourage positive washback.


2019 ◽  
Vol 79 (4) ◽  
pp. 773-795
Author(s):  
Jue Wang ◽  
George Engelhard

The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and these ratings reflect a cumulative response process. However, raters may also be influenced by their personal preferences in providing ratings, and these ratings may reflect a noncumulative or unfolding response process. The goal of rater training in rater-mediated assessments is to stress impersonal judgments represented by scoring rubrics and to minimize the personal preferences that may represent construct-irrelevant variance in the assessment system. In this study, we explore the use of unfolding models as a framework for evaluating the quality of ratings in rater-mediated assessments. Data from a large-scale assessment of writing in the United States are used to illustrate our approach. The results suggest that unfolding models offer a useful way to evaluate rater-mediated assessments in order to initially explore the judgmental processes underlying the ratings. The data also indicate that there are significant relationships between some essay features (e.g., word count, syntactic simplicity, word concreteness, and verb cohesion) and essay orderings based on the personal preferences of raters. The implications of unfolding models for theory and practice in rater-mediated assessments are discussed.


Author(s):  
A. Babirad

Cerebrovascular diseases are a major problem worldwide today and, according to forecasts, will remain one in the near future. The main risk factors for the development of ischemic disorders of the cerebral circulation include old age and aging, arterial hypertension, smoking, diabetes mellitus and heart disease. An effective strategy for the prevention of cerebrovascular events is based on the implementation of large-scale risk control measures, including the use of antiplatelet and anticoagulant therapy and invasive interventions such as atherectomy, angioplasty and stenting. In this connection, the combined efforts of neurologists, cardiologists, vascular surgeons, endocrinologists and other specialists are the basis for achieving an acceptable clinical outcome. A review of the SF-36 method for assessing the quality of life in patients with the effects of transient ischemic stroke is presented. Quality-of-life assessment is recognized in world medical practice and research as an indicator that is also used to assess the quality of the health system and in general sociological research.


2018 ◽  
Vol 3 (1) ◽  
pp. 1-8
Author(s):  
Desy Damayanti ◽  
Adin Fauzi ◽  
Azizatul Mahfida Inayati

Among the components of an effective language classroom, learning materials indisputably play a focal role. They improve the quality of language teaching, help teachers carry out their duties, and lead students to a higher level of understanding in learning. This research aims to discuss the notion of materials in language teaching. It drew on the literature to outline the importance of materials in language teaching and to analyze the kinds of materials that are relevant to language teaching. The analysis resulted in the classification of materials into two broad categories, namely (1) created materials, which include course books, audio materials, and video materials; and (2) authentic materials, which cover authentic texts, movies/films, radio broadcasts, television programs, graphs, maps, tables, and charts. This paper serves as an invaluable resource to help language teachers select appropriate materials for effective language teaching.


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation into the information processing process are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor; for example, in order to penalize incompetent or inaccurate ones and to promote diligent ones. Purpose: To develop a method of assessing the expected contributor’s quality in community tagging systems. This method should only use generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including the model of a contributor), along with a method of assessing the expected contributor’s quality. The method is based on comparing tag sets provided by different contributors for the same images, being a modification of pairwise comparison method with preference relation replaced by a special domination characteristic. Expected contributors’ quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method allows you to adequately estimate the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on coordinated efforts of community (primarily, community tagging systems).
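
The eigenvector step mentioned in this abstract can be illustrated with a small sketch. The abstract does not define the domination characteristic, so the matrix below is made up purely for illustration, and the function name `principal_eigenvector` is hypothetical. The sketch uses plain power iteration, which for a matrix with strictly positive entries converges to the positive (Perron) eigenvector; it is not claimed to be the author's procedure.

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the positive eigenvector of a strictly positive square
    matrix by power iteration, normalized so its entries sum to 1."""
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform guess
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        w = [x / total for x in w]  # renormalize to sum to 1
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


# Hypothetical pairwise matrix: entry [i][j] says how strongly
# contributor i "dominates" contributor j (values invented here).
domination = [
    [1.0, 2.0],
    [0.5, 1.0],
]
quality = principal_eigenvector(domination)
print(quality)
```

Under these assumptions, a larger entry in `quality` would be read as a higher expected quality for that contributor.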


2020 ◽  
Vol 103 (11) ◽  
pp. 1194-1199

Objective: To develop and validate a Thai version of the Wisconsin Quality of Life (TH WISQoL) Questionnaire. Materials and Methods: The authors developed the TH WISQoL Questionnaire based on a standard multi-step process. Subsequently, the authors recruited patients with kidney stones and requested them to complete the TH WISQoL and a validated Thai version of the 36-Item Short Form Survey (TH SF-36). The authors calculated the internal consistency and interdomain correlation of TH WISQoL and compared the convergent validity between the two instruments. Results: Thirty kidney stone patients completed the TH WISQoL and the TH SF-36. The TH WISQoL showed acceptable internal consistency for all domains (Cronbach’s alpha 0.768 to 0.909). Interdomain correlation was high for most domains (r=0.698 to 0.779), except for the correlation between Vitality and Disease domains, which showed a moderate correlation (r=0.575). For convergent validity, TH WISQoL demonstrated a good overall correlation to TH SF-36 (r=0.796, p<0.05). Conclusion: The TH WISQoL is valid and reliable for evaluating the quality of life of Thai patients with kidney stones. A further large-scale multi-center study is warranted to confirm its applicability in Thailand. Keywords: Quality of life, Kidney stone, Validation, Outcome measurement
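
For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response data are invented for illustration; the study's own data and software are not given in the abstract.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one list per respondent, each holding k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 respondents answering 3 items of one domain.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
print(round(cronbach_alpha(responses), 3))
```

Population variance is used for both the items and the totals; using sample variance for both would give the same alpha, because the scaling factor cancels in the ratio.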


Author(s):  
Jeasik Cho

This book provides the qualitative research community with some insight on how to evaluate the quality of qualitative research. This topic has gained little attention during the past few decades. We, qualitative researchers, read journal articles, serve on masters’ and doctoral committees, and also make decisions on whether conference proposals, manuscripts, or large-scale grant proposals should be accepted or rejected. It is assumed that various perspectives or criteria, depending on various paradigms, theories, or fields of discipline, have been used in assessing the quality of qualitative research. Nonetheless, until now, no textbook has been specifically devoted to exploring theories, practices, and reflections associated with the evaluation of qualitative research. This book constructs a typology of evaluating qualitative research, examines actual information from websites and qualitative journal editors, and reflects on some challenges that are currently encountered by the qualitative research community. Many different kinds of journals’ review guidelines and available assessment tools are collected and analyzed. Consequently, core criteria that stand out among these evaluation tools are presented. Readers are invited to join the author to confidently proclaim: “Fortunately, there are commonly agreed, bold standards for evaluating the goodness of qualitative research in the academic research community. These standards are a part of what is generally called ‘scientific research.’ ”


SLEEP ◽  
2020 ◽  
Author(s):  
Luca Menghini ◽  
Nicola Cellini ◽  
Aimee Goldstone ◽  
Fiona C Baker ◽  
Massimiliano de Zambotti

Sleep-tracking devices, particularly within the consumer sleep technology (CST) space, are increasingly used in both research and clinical settings, providing new opportunities for large-scale data collection in highly ecological conditions. Due to the fast pace of the CST industry combined with the lack of a standardized framework to evaluate the performance of sleep trackers, their accuracy and reliability in measuring sleep remain largely unknown. Here, we provide a step-by-step analytical framework for evaluating the performance of sleep trackers (including standard actigraphy), as compared with gold-standard polysomnography (PSG) or other reference methods. The analytical guidelines are based on recent recommendations for evaluating and using CST from our group and others (de Zambotti and colleagues; Depner and colleagues), and include raw data organization as well as critical analytical procedures, including discrepancy analysis, Bland–Altman plots, and epoch-by-epoch analysis. Analytical steps are accompanied by open-source R functions (depicted at https://sri-human-sleep.github.io/sleep-trackers-performance/AnalyticalPipeline_v1.0.0.html). In addition, an empirical sample dataset is used to describe and discuss the main outcomes of the proposed pipeline. The guidelines and the accompanying functions are aimed at standardizing the testing of CST performance, not only to increase the replicability of validation studies, but also to provide ready-to-use tools to researchers and clinicians. All in all, this work can help to increase the efficiency, interpretation, and quality of validation studies, and to improve the informed adoption of CST in research and clinical settings.
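
The numbers behind a Bland–Altman plot can be sketched in a few lines. This is only the classic form (mean difference as bias, limits of agreement at bias ± 1.96 standard deviations of the differences), which assumes roughly normal differences and a constant bias. It does not reproduce the authors' R functions, and the function name and the sleep-time values are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower_loa, upper_loa) for paired measurements."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    # Device-minus-reference discrepancy for each participant.
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)   # systematic over- or underestimation
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical total sleep time in minutes: tracker vs. PSG.
tracker = [412, 385, 430, 398, 441]
psg = [400, 390, 421, 380, 436]
bias, lower, upper = bland_altman(tracker, psg)
print(f"bias={bias:.1f} min, limits of agreement=({lower:.1f}, {upper:.1f})")
```

A positive bias here would mean the tracker overestimates sleep time relative to the reference, on average.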

