Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems

2022 ◽  
Author(s):  
Anubha Kabra ◽  
Mehar Bhatia ◽  
Yaman Kumar Singla ◽  
Junyi Jessy Li ◽  
Rajiv Ratn Shah

2014 ◽  
Vol 22 (2) ◽  
pp. 291-319 ◽  
Author(s):  
SHUDONG HAO ◽  
YANYAN XU ◽  
DENGFENG KE ◽  
KAILE SU ◽  
HENGLI PENG

Abstract: Writing in language tests is regarded as an important indicator for assessing the language skills of test takers. As Chinese language tests become popular, scoring a large number of essays becomes a heavy and expensive task for the organizers of these tests. In the past several years, some efforts have been made to develop automated simplified Chinese essay scoring systems, reducing both costs and evaluation time. In this paper, we introduce a system called SCESS (automated Simplified Chinese Essay Scoring System) based on Weighted Finite State Automata (WFSA) and using Incremental Latent Semantic Analysis (ILSA) to deal with a large number of essays. First, SCESS uses an n-gram language model to construct a WFSA to perform text pre-processing. At this stage, the system integrates a Confusing-Character Table, a Part-Of-Speech Table, beam search and heuristic search to perform automated word segmentation and correction of essays. Experimental results show that this pre-processing procedure is effective, with a Recall Rate of 88.50%, a Detection Precision of 92.31% and a Correction Precision of 88.46%. After text pre-processing, SCESS uses ILSA to perform automated essay scoring. We have carried out experiments comparing the ILSA method with the traditional LSA method on corpora of essays from the MHK test (the Chinese proficiency test for minorities). Experimental results indicate that ILSA has a significant advantage over LSA in terms of both running time and memory usage. Furthermore, experimental results also show that SCESS is quite effective, with a scoring performance of 89.50%.
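
The core scoring idea behind LSA-based systems like this one (represent essays in a latent semantic space and compare them with pre-scored reference essays) can be sketched in a few lines. The sketch below is a minimal illustration, not the SCESS or ILSA implementation: the toy term-document matrix, the number of latent dimensions k, and the similarity-weighted scoring rule are all assumptions for demonstration.

```python
# Minimal sketch of LSA-based essay scoring by similarity to pre-scored
# reference essays. NOT the SCESS/ILSA implementation: the data, k, and the
# scoring rule are illustrative assumptions.
import numpy as np

def build_lsa_space(term_doc: np.ndarray, k: int):
    """SVD of the term-document matrix; keep the top-k latent dimensions."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(term_vec: np.ndarray, U_k: np.ndarray, s_k: np.ndarray) -> np.ndarray:
    """Project a new essay's term-count vector into the latent space."""
    return term_vec @ U_k / s_k

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy data: rows are vocabulary terms, columns are human-scored reference essays.
reference = np.array([[2, 0, 1],
                      [1, 1, 0],
                      [0, 3, 1],
                      [1, 0, 2]], dtype=float)
reference_scores = np.array([85.0, 60.0, 75.0])

U_k, s_k = build_lsa_space(reference, k=2)
ref_latent = [fold_in(reference[:, j], U_k, s_k) for j in range(reference.shape[1])]

new_essay = np.array([1, 1, 1, 0], dtype=float)     # term counts of an unseen essay
new_latent = fold_in(new_essay, U_k, s_k)

# Score the essay as a similarity-weighted average of the reference scores.
sims = np.array([max(cosine(new_latent, r), 0.0) for r in ref_latent])
predicted = float(sims @ reference_scores / (sims.sum() + 1e-12))
print(f"predicted score: {predicted:.1f}")
```

The incremental variant (ILSA) advocated in the paper avoids recomputing the full decomposition as new essays arrive, which is where the reported running-time and memory advantages over plain LSA come from.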


Author(s):  
Dougal Hutchison

This chapter gives a relatively non-technical introduction to computer programs for marking of essays, generally known as Automated Essay Scoring (AES) systems. It identifies four stages in the process, which may be distinguished as training, summarising mechanical and structural aspects, describing content, and scoring, and describes how these are carried out in a number of commercially available programs. It considers how the validity of the process may be assessed, and reviews some of the evidence on how successful they are. It also discusses some of the ways in which they may fall down and describes some research investigating this. The chapter concludes with a discussion of possible future developments, and offers a number of searching questions for administrators considering the possibility of introducing AES in their own schools.
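
As a rough illustration of the four stages identified above (training, summarising mechanical and structural aspects, describing content, and scoring), the sketch below fits a tiny regression on human-scored essays using a few surface features and one content feature, then scores a new essay. It is not any commercial program's actual design; every feature choice and the toy training set are assumptions.

```python
# Hypothetical four-stage AES sketch: featurize, train on scored essays, score
# a new essay. Feature choices and data are assumptions, not a real system.
import numpy as np

def mechanical_features(essay: str) -> list[float]:
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [float(len(words)),                                        # essay length
            float(len(sentences)),                                    # sentence count (structure)
            float(np.mean([len(w) for w in words])) if words else 0.0]  # mean word length

def content_feature(essay: str, prompt_keywords: set[str]) -> float:
    words = {w.lower().strip(".,") for w in essay.split()}
    return len(words & prompt_keywords) / (len(prompt_keywords) or 1)

def featurize(essay: str, prompt_keywords: set[str]) -> np.ndarray:
    return np.array(mechanical_features(essay) + [content_feature(essay, prompt_keywords)])

# Stage 1: "train" on a toy set of human-scored essays.
prompt_kw = {"school", "technology", "learning"}
train_essays = ["Technology helps learning in school. It is useful.",
                "I like dogs. Dogs are nice."]
train_scores = np.array([4.0, 1.0])
X = np.vstack([featurize(e, prompt_kw) for e in train_essays])
X = np.hstack([X, np.ones((X.shape[0], 1))])             # add a bias column
weights, *_ = np.linalg.lstsq(X, train_scores, rcond=None)

# Stage 4: score a new essay with the fitted weights.
new_essay = "Learning with technology changes how school works."
x = np.append(featurize(new_essay, prompt_kw), 1.0)
print(f"predicted score: {float(x @ weights):.2f}")
```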


2019 ◽  
Vol 5 ◽  
pp. e208 ◽  
Author(s):  
Mohamed Abdellatif Hussein ◽  
Hesham Hassan ◽  
Mohammad Nassef

Background: Writing composition is a significant factor for measuring test-takers' ability in any language exam. However, the assessment (scoring) of these writing compositions or essays is a very challenging process in terms of reliability and time. The need for objective and quick scores has raised the need for a computer system that can automatically grade essay questions targeting specific prompts. Automated Essay Scoring (AES) systems are used to overcome the challenges of scoring writing tasks by using Natural Language Processing (NLP) and machine learning techniques. The purpose of this paper is to review the literature on AES systems used for grading essay questions.

Methodology: We have reviewed the existing literature using Google Scholar, EBSCO and ERIC, searching for the terms "AES", "Automated Essay Scoring", "Automated Essay Grading", or "Automatic Essay" for essays written in English. Two categories have been identified: handcrafted-features and automatically featured AES systems. The systems of the former category are closely bound to the quality of the designed features. The systems of the latter category, on the other hand, automatically learn the features and the relations between an essay and its score without any handcrafted features. We reviewed the systems of the two categories in terms of primary focus, technique(s) used, the need for training data, instructional application (feedback system), and the correlation between e-scores and human scores. The paper includes three main sections. First, we present a structured literature review of the available handcrafted-features AES systems. Second, we present a structured literature review of the available automatic-featuring AES systems. Finally, we draw a set of discussions and conclusions.

Results: AES models have been found to utilize a broad range of manually tuned shallow and deep linguistic features. AES systems have many strengths: reducing labor-intensive marking activities, ensuring a consistent application of scoring criteria, and ensuring the objectivity of scoring. Although many techniques have been implemented to improve AES systems, three primary challenges have been identified: the lack of the sense of the rater as a person, the potential for the systems to be deceived into giving an essay a lower or higher score than it deserves, and the limited ability to assess the creativity of ideas and propositions and to evaluate their practicality. So far, techniques have only been used to address the first two challenges.
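
Since the review compares systems partly by the correlation between e-scores and human scores, the following brief sketch shows two agreement statistics commonly reported in this literature: Pearson correlation and quadratically weighted kappa. The score arrays are invented examples, not results from any reviewed system.

```python
# Agreement between automated and human scores on a toy, invented sample.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human   = [3, 4, 2, 5, 4, 3, 1, 4]
machine = [3, 4, 3, 5, 3, 3, 2, 4]

r, _ = pearsonr(human, machine)
qwk = cohen_kappa_score(human, machine, weights="quadratic")
print(f"Pearson r = {r:.2f}, quadratic weighted kappa = {qwk:.2f}")
```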


2019 ◽  
Vol 58 (4) ◽  
pp. 771-790
Author(s):  
Leyi Qian ◽  
Yali Zhao ◽  
Yan Cheng

Automated writing scoring can provide not only holistic scores but also instant, corrective feedback on L2 learners' writing quality, and its use has been increasing in China and internationally. Given these advantages, the past several years have witnessed the emergence and growth of writing evaluation products in China, yet to the best of our knowledge no previous study has examined the validity of China's automated essay scoring systems. Drawing on the four major categories of Kane's argument-based validity framework (scoring, generalization, extrapolation, and implication), this article evaluates the performance of iWrite, one of China's automated essay scoring systems, against human scores. The results show that iWrite is not a valid tool for assessing L2 writing or predicting human scores. Therefore, iWrite should currently be restricted to nonconsequential uses and cannot be employed as an alternative to, or a substitute for, human raters.
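
Validity studies of this kind typically report, alongside correlations, how often the machine score matches the human score exactly or lands within one score band of it. The sketch below shows how those two figures are computed; the numbers are invented and do not come from the article's iWrite data.

```python
# Exact and adjacent (within one band) agreement on invented score pairs.
import numpy as np

human        = np.array([12, 10, 14, 9, 11, 13, 8, 12])
machine_like = np.array([11, 10, 14, 11, 11, 12, 9, 12])   # hypothetical machine scores

diff = np.abs(human - machine_like)
exact = np.mean(diff == 0)
adjacent = np.mean(diff <= 1)
print(f"exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
```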

