A Comprehensive Review of Automated Essay Scoring (AES) Research and Development

2021 ◽  
Vol 29 (3) ◽  
Author(s):  
Chun Then Lim ◽  
Chih How Bong ◽  
Wee Sian Wong ◽  
Nung Kion Lee

Automated Essay Scoring (AES) is a service or software that can predictively grade essays based on a pre-trained computational model. It has gained considerable research interest in educational institutions, as it expedites grading and reduces the effort required of human raters while approximating their decisions. Despite its strong appeal, implementations vary widely according to researchers’ preferences. This critical review examines various AES development milestones, focusing on the different methodologies and attributes used in deriving essay scores. To generalize existing AES systems according to their constructs, we attempted to fit all of them into three frameworks: content similarity, machine learning, and hybrid. In addition, we presented and compared various common evaluation metrics for measuring the efficiency of AES, and proposed Quadratic Weighted Kappa (QWK) as the standard evaluation metric, since it corrects for agreement occurring purely by chance when estimating the degree of agreement between two raters. In conclusion, the paper proposes the hybrid framework as the potential upcoming AES standard, as it is capable of aggregating both style and content to predict essay grades. Thus, the main objective of this study is to discuss various critical issues pertaining to the current development of AES, which yielded our recommendations for future AES development.
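Since the review proposes QWK as the standard evaluation metric, a minimal sketch of how it is computed may help: a NumPy implementation following the standard definition (quadratic disagreement weights, observed vs. chance-expected confusion matrices). The function name and the toy score ranges are illustrative, not from the paper.

```python
import numpy as np

def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Chance-corrected agreement between two raters, with disagreements
    penalized by the squared distance between the assigned scores."""
    rater_a = np.asarray(rater_a) - min_rating
    rater_b = np.asarray(rater_b) - min_rating
    n = max_rating - min_rating + 1

    # Observed agreement matrix O: counts of (score_a, score_b) pairs
    O = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        O[a, b] += 1

    # Expected matrix E: outer product of the raters' marginal histograms,
    # i.e. the agreement expected purely by chance
    hist_a = np.bincount(rater_a, minlength=n)
    hist_b = np.bincount(rater_b, minlength=n)
    E = np.outer(hist_a, hist_b) / len(rater_a)

    # Quadratic weights: zero on the diagonal, growing with score distance
    w = np.array([[(i - j) ** 2 for j in range(n)] for i in range(n)])
    w = w / (n - 1) ** 2

    return 1.0 - (w * O).sum() / (w * E).sum()
```

Identical score vectors give a QWK of 1.0, while two raters whose scores are statistically independent give a QWK near 0, which is the chance correction the review refers to.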

2020 ◽  
pp. 026553222093783
Author(s):  
Jinnie Shin ◽  
Mark J. Gierl

Automated essay scoring (AES) has emerged as a secondary or sole marker for many high-stakes educational assessments, in both native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural algorithms. The purpose of this study is to compare the effectiveness and performance of two AES frameworks: one based on machine learning with deep (complex) language features, and the other based on deep neural algorithms. More specifically, support vector machines (SVMs) in conjunction with Coh-Metrix features were used for traditional AES model development, and the convolutional neural network (CNN) approach was used for more contemporary deep-neural model development. Then, the strengths and weaknesses of the traditional and contemporary models under different circumstances (e.g., type of rubric, length of the essay, and essay type) were tested. The results were evaluated using the quadratic weighted kappa (QWK) score and compared with the agreement between the human raters. The results indicated that the CNN model performed better, meaning that it produced results more comparable to the human raters than the Coh-Metrix + SVM model. Moreover, the CNN model also achieved state-of-the-art performance on most of the essay sets, with a high average QWK score.
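The "traditional" pipeline the study describes can be sketched as SVM regression over handcrafted feature vectors, evaluated with QWK. This is a minimal illustration only: the paper uses Coh-Metrix features of real essays, whereas the random feature matrix and synthetic 0–12 scores below are stand-ins.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                # stand-in for Coh-Metrix feature vectors
y = np.clip(np.round(2 * X[:, 0] + 6), 0, 12).astype(int)  # stand-in essay scores 0-12

# Train an SVM regressor on 150 "essays", score the remaining 50
model = SVR(kernel="rbf").fit(X[:150], y[:150])
pred = np.clip(np.round(model.predict(X[150:])), 0, 12).astype(int)

# Agreement with the reference scores, weighted quadratically
qwk = cohen_kappa_score(y[150:], pred, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```

Rounding and clipping the regressor's continuous output back onto the rubric's integer scale is a common final step, since QWK is defined over discrete score categories.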


2021 ◽  
Author(s):  
Masaki Uto

Abstract Automated essay scoring (AES) is the task of automatically assigning scores to essays as an alternative to grading by humans. Although traditional AES models typically rely on manually designed features, deep neural network (DNN)-based AES models that obviate the need for feature engineering have recently attracted increased attention. Various DNN-AES models with different characteristics have been proposed over the past few years. To our knowledge, however, no study has provided a comprehensive review of DNN-AES models while introducing each model in detail. Therefore, this review presents a comprehensive survey of DNN-AES models, describing the main idea and detailed architecture of each model. We classify the AES task into four types and introduce existing DNN-AES models according to this classification.


2021 ◽  
Author(s):  
Lulu Dong ◽  
Lin Li ◽  
HongChao Ma ◽  
YeLing Liang

Automated Essay Scoring (AES) aims to assign a proper score to an essay written for a given prompt, a significant application of Natural Language Processing (NLP) in education. In this work, we focus on solving the Chinese AES problem with Pre-trained Language Models (PLMs), including the state-of-the-art PLMs BERT and ERNIE. A Chinese essay dataset was built for this work, on which we conduct extensive AES experiments. Our PLM-based AES models achieve 68.70% in Quadratic Weighted Kappa (QWK), outperforming a classic feature-based linear regression AES model. The results show that our methods effectively alleviate the dependence on manual features and improve the portability of AES models. Furthermore, we obtain well-performing AES models with a dataset of limited scale, which addresses the lack of datasets in Chinese AES.


PsycCRITIQUES ◽  
2004 ◽  
Vol 49 (Supplement 14) ◽  
Author(s):  
Steven E. Stemler

2009 ◽  
Author(s):  
Ronald T. Kellogg ◽  
Alison P. Whiteford ◽  
Thomas Quinlan

2018 ◽  
Vol 10 (2) ◽  
pp. 409-434
Author(s):  
Ibnu Chudzaifah

Pondok Pesantren is one of the Islamic educational institutions that aims to form human beings of noble character, creating people who have a balance between the physical and the spiritual. Some educational institutions offer various models of learning to keep pace with current developments so that their existence remains recognized by the community. Boarding schools, in dealing with the changing times, have committed to new innovations by presenting a pattern of education that can produce reliable human resources. Pesantren in particular currently face a weighty challenge in the era of the "demographic bonus". The demographic bonus is a phenomenon in which the structure of the population greatly benefits the community in terms of development across various sectors, because the productive-age population exceeds the non-productive-age population. This means that the dependency burden will decrease, with 64 percent of the population of productive age bearing only 34 percent of non-productive age. With the various scholarships and skills given to students, they are expected to compete in all fields, especially in facing Indonesia's golden generation from 2020 to 2035.


2019 ◽  
Vol 113 (1) ◽  
pp. 9-30
Author(s):  
Kateřina Rysová ◽  
Magdaléna Rysová ◽  
Michal Novák ◽  
Jiří Mírovský ◽  
Eva Hajičová

Abstract In the paper, we present EVALD applications (Evaluator of Discourse) for automated essay scoring. EVALD is the first tool of this type for Czech. It evaluates texts written by both native and non-native speakers of Czech. We first describe the history and current state of automated essay scoring, illustrated by examples of systems for other languages, mainly English. Then we focus on the methodology of creating the EVALD applications and describe the datasets used for testing as well as the supervised training that EVALD builds on. Furthermore, we analyze in detail a sample of newly acquired language data – texts written by non-native speakers at the threshold level of Czech language acquisition required, e.g., for permanent residence in the Czech Republic – and focus on linguistic differences between the available text levels. We present the feature set used by EVALD and, based on the analysis, extend it with new spelling features. Finally, we evaluate the overall performance of various variants of EVALD and provide an analysis of the collected results.


2005 ◽  
Vol 33 (1) ◽  
pp. 101-113 ◽  
Author(s):  
P. Adam Kelly

Powers, Burstein, Chodorow, Fowles, and Kukich (2002) suggested that automated essay scoring (AES) may benefit from the use of “general” scoring models designed to score essays irrespective of the prompt for which an essay was written. They reasoned that such models may enhance score credibility by signifying that an AES system measures the same writing characteristics across all essays. They reported empirical evidence that general scoring models performed nearly as well in agreeing with human readers as did prompt-specific models, the “status quo” for most AES systems. In this study, general and prompt-specific models were again compared, but this time, general models performed as well as or better than prompt-specific models. Moreover, general models measured the same writing characteristics across all essays, while prompt-specific models measured writing characteristics idiosyncratic to the prompt. Further comparison of model performance across two different writing tasks and writing assessment programs bolstered the case for general models.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Wee Sian Wong ◽  
Chih How Bong

Automated Essay Scoring (AES) is the use of specialized computer programs to assign grades to essays written in an educational assessment context. It was developed to overcome time, cost, and reliability issues in writing assessment. Most contemporary AES systems are "western" proprietary products designed for native English speakers, where the source code is not made available to the public and the assessment criteria tend to be associated with the scoring rubrics of a particular English test context. Therefore, such AES systems may not be appropriate for direct adoption in the Malaysian context. No actual software development work has been found on building an AES for the Malaysian English test environment. As such, this work is carried out as a study to formulate the requirements of a local AES targeted at Malaysia's essay assessment environment. In our work, we assessed a well-known AES called LightSide to determine its suitability in our local context. We used various machine learning techniques provided by LightSide to predict the scores of Malaysian University English Test (MUET) essays, and compared its performance, i.e., the percentage of exact agreement between LightSide and the human scores of the essays. Besides this, we review and discuss the theoretical aspects of AES, i.e., its state of the art and its reliability and validity requirements. The findings in this paper will be used as the basis of our future work in developing a local AES, namely the Intelligent Essay Grader (IEG), for the Malaysian English test environment.
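The evaluation metric used here, percentage of exact agreement, is simple to compute; the score vectors below are invented purely for illustration.

```python
import numpy as np

# Hypothetical human and machine scores for six essays (illustrative only)
human = np.array([3, 4, 2, 5, 3, 4])
machine = np.array([3, 4, 3, 5, 2, 4])

# Exact agreement counts only identical scores; a near-miss (off by one)
# counts the same as a wild miss, which is one reason chance-corrected,
# distance-weighted metrics such as QWK are often preferred for ordinal scores.
exact_agreement = (human == machine).mean() * 100
print(f"exact agreement = {exact_agreement:.1f}%")
```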

