Introduction

2006 ◽  
Vol 12 (2) ◽  
pp. 109-113
Author(s):  
JILL BURSTEIN ◽  
CLAUDIA LEACOCK

Researchers and developers of educational software have experimented with natural language processing (NLP) capabilities and related technologies since the 1960s. Automated essay scoring was perhaps the first application of this kind (Page 1966). Over a decade later, Writer's Workbench, a text-editing application, was developed as a tool for classroom teachers (MacDonald, Frase, Gingrich and Keenan 1982). Intelligent tutoring applications, though more in the spirit of artificial intelligence, were also being developed during this time (Carbonell 1970; Brown, Burton and Bell 1974; Stevens and Collins 1977; Burton and Brown 1982; Clancy 1987).

Essay writing examinations are a commonly used learning activity at all levels of education and across disciplines. They are advantageous for evaluating students' learning outcomes because they give students the chance to exhibit their knowledge and skills freely. For these reasons, many researchers have turned their interest to automated essay scoring (AES), one of the most remarkable innovations in text mining, which draws on natural language processing and machine learning algorithms. The purpose of this study is to develop an automated essay scorer that uses an ontology together with natural language processing. Different learning algorithms showed agreeing prediction outcomes, but a regression algorithm with the proper features incorporated may produce more accurate essay scores. This study aims to increase the accuracy, reliability, and validity of AES by implementing gradient boosting regression together with the domain ontology and other features; linear regression, linear lasso regression, and ridge regression were also used in conjunction with the same extracted features. The features extracted are domain concepts, average word length, orthography (spelling mistakes), grammar, and sentiment score. The first dataset, the ASAP dataset from the Kaggle website, was used to train and test the different machine learning algorithms (linear regression, linear lasso regression, ridge regression, and gradient boosting regression) together with the identified features. The second dataset was extracted from students' essay exams in a Human-Computer Interaction course. The results show that gradient boosting regression has the highest variance and kappa scores, while linear, ridge, and lasso regression perform similarly on the ASAP dataset.
Furthermore, the results were evaluated using the Cohen weighted kappa (CWA) score to measure agreement with the human raters. The CWA result of 0.659 can be interpreted as a strong level of agreement between the human grader and the automated essay scorer. The proposed AES therefore has a 64-81% reliability level.
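The pipeline described above can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the feature matrix is synthetic, the feature names and score range (0-5) are assumptions, and scikit-learn stands in for whatever tooling the study used. It compares the four regressors named in the abstract and evaluates each with Cohen's weighted kappa, as the study does.

```python
# Hypothetical AES sketch: four regressors from the abstract, scored with
# Cohen's weighted kappa (the agreement metric the study reports).
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)
# Stand-in for extracted essay features (e.g. domain-concept hits,
# average word length, spelling errors, grammar errors, sentiment).
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, 0.5, -0.3, 0.8, 0.0])
raw = X @ true_w + rng.normal(scale=0.3, size=200)
# Map the continuous target onto an assumed 0-5 holistic score scale.
y = np.round((raw - raw.min()) / (raw.max() - raw.min()) * 5).astype(int)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),
    "ridge": Ridge(alpha=1.0),
    "gbr": GradientBoostingRegressor(random_state=0),
}
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]
kappas = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Round and clip predictions back onto the discrete score scale.
    pred = np.clip(np.round(model.predict(X_te)), 0, 5).astype(int)
    kappas[name] = cohen_kappa_score(y_te, pred, weights="quadratic")
```

The key design point mirrored from the abstract is that all four models share one feature matrix, so differences in kappa reflect the learning algorithm rather than the features.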


Author(s):  
Zixuan Ke ◽  
Vincent Ng

Despite being investigated for over 50 years, the task of automated essay scoring is far from being solved. Nevertheless, it continues to draw a lot of attention in the natural language processing community in part because of its commercial and educational values as well as the associated research challenges. This paper presents an overview of the major milestones made in automated essay scoring research since its inception.


2021 ◽  
pp. 1-13
Author(s):  
Lamiae Benhayoun ◽  
Daniel Lang

BACKGROUND: The renewed advent of Artificial Intelligence (AI) is inducing profound changes in the classic categories of technology professions and is creating the need for new, specific skills. OBJECTIVE: Identify the skill gaps between academic training on AI in French engineering and business schools and the requirements of the labour market. METHOD: Extraction of AI training contents from the schools' websites and scraping of a job-advertisement website, followed by analysis based on a text mining approach with Python code for Natural Language Processing. RESULTS: Categorization of occupations related to AI and characterization of three classes of skills for the AI market: technical, soft, and interdisciplinary. The skill gaps concern certain professional certifications, the mastery of specific tools, research abilities, and awareness of the ethical and regulatory dimensions of AI. CONCLUSIONS: A deep analysis using algorithms for Natural Language Processing, with results that provide a better understanding of the AI capability components at the individual and organizational levels; the study can help shape educational programs to respond to AI market requirements.
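The skill-gap comparison in the METHOD section can be sketched in a few lines. This is a hypothetical toy version, not the authors' code: the skill vocabulary and the two one-document corpora are invented for illustration, and real job-ad scraping and curriculum extraction would feed in much larger text collections.

```python
# Hypothetical sketch: count skill terms in job ads vs. curricula, then
# take the set difference to surface skills the market asks for but the
# training contents do not mention.
import re
from collections import Counter

def skill_counts(docs, vocab):
    """Count occurrences of vocabulary terms across a list of documents."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z+#]+", doc.lower())
        counts.update(t for t in tokens if t in vocab)
    return counts

# Invented skill vocabulary and corpora, for illustration only.
vocab = {"python", "tensorflow", "ethics", "communication", "docker"}
job_ads = ["Seeking ML engineer: Python, TensorFlow, Docker, ethics awareness"]
curricula = ["Course covers Python and TensorFlow fundamentals"]

gaps = set(skill_counts(job_ads, vocab)) - set(skill_counts(curricula, vocab))
# Skills demanded by the ads but absent from the curricula.
```

In this toy example the gap set comes out as the tooling and ethics terms, echoing the paper's finding that gaps concern specific tools and awareness of AI's ethical dimensions.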


AI Magazine ◽  
2013 ◽  
Vol 34 (3) ◽  
pp. 42-54 ◽  
Author(s):  
Vasile Rus ◽  
Sidney D’Mello ◽  
Xiangen Hu ◽  
Arthur Graesser

We report recent advances in intelligent tutoring systems with conversational dialogue, highlighting progress in terms of macro- and microadaptivity. Macroadaptivity refers to a system's capability to select appropriate instructional tasks for the learner to work on; microadaptivity refers to a system's capability to adapt its scaffolding while the learner is working on a particular task. The advances in macro- and microadaptivity presented here were made possible by the use of learning progressions, deeper dialogue and natural language processing techniques, and affect-enabled components. Learning progressions and deeper dialogue and natural language processing techniques are key features of DeepTutor, the first intelligent tutoring system based on learning progressions. These improvements extend the bandwidth of possibilities for tailoring instruction to each individual student, which is needed for maximizing engagement and, ultimately, learning.
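The macro/micro distinction can be made concrete with a minimal sketch. This is a hypothetical illustration of the two adaptivity levels, not DeepTutor's actual logic: the progression topics, mastery threshold, and hint-level policy are all invented.

```python
# Hypothetical sketch of two adaptivity levels in a tutoring system.

def select_task(progression, mastery, threshold=0.8):
    """Macroadaptive step: pick the first task in the learning
    progression the learner has not yet mastered."""
    for task in progression:
        if mastery.get(task, 0.0) < threshold:
            return task
    return None  # progression complete

def next_hint_level(last_answer_correct, current_level, max_level=3):
    """Microadaptive step: deepen scaffolding after an error,
    fade it after a success."""
    if not last_answer_correct:
        return min(current_level + 1, max_level)
    return max(current_level - 1, 0)

# Invented physics progression for illustration.
progression = ["speed", "acceleration", "newton_2nd_law"]
mastery = {"speed": 0.9, "acceleration": 0.6}
task = select_task(progression, mastery)
```

The separation of concerns is the point: task selection consults the learner model across sessions, while hint-level adjustment reacts turn by turn within a task.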


Author(s):  
Seonho Kim ◽  
Jungjoon Kim ◽  
Hong-Woo Chun

Interest in research on health and medical information analysis based on artificial intelligence, especially deep learning techniques, has recently been increasing. Most research in this field has focused on discovering new knowledge for predicting and diagnosing disease by revealing relations between diseases and various information features of the data. These features are extracted by analyzing clinical pathology data, such as EHRs (electronic health records), and academic literature using data analysis, natural language processing, and similar techniques. However, more research and interest are still needed in applying the latest advanced artificial intelligence-based analysis techniques to bio-signal data, i.e., continuous physiological records such as EEG (electroencephalography) and ECG (electrocardiogram). Unlike other types of data, applying deep learning to bio-signal data, which takes the form of real-valued time series, raises many issues in preprocessing, learning, and analysis: feature selection, the black-box nature of the learned models, difficulty in recognizing and identifying effective features, high computational complexity, and so on. In this paper, to address these issues, we present an encoding-based Wave2vec time-series classifier model that combines signal processing with deep-learning-based natural language processing techniques. To demonstrate its advantages, we report the results of three experiments conducted on EEG data from the University of California, Irvine, a real-world benchmark bio-signal dataset. The bio-signals (in the form of waves), a real-valued time series, are converted through encoding into a sequence of symbols, or into a sequence of wavelet patterns that are then converted into symbols, and the proposed model vectorizes the symbols by learning the sequence using deep-learning-based natural language processing.
A model for each class can then be constructed by learning from the vectorized wavelet patterns and the training data, and the resulting models can be used for prediction and diagnosis of disease by classifying new data. The proposed method improves data readability and makes feature selection and the learning process more intuitive by converting the real-valued time series into sequences of symbols, facilitating easy recognition and identification of influential patterns. Furthermore, by drastically reducing computational complexity through data simplification in the encoding process, without deterioration of analysis performance, it enables real-time large-capacity data analysis, which is essential for developing real-time analysis and diagnosis systems.
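The encoding step, turning a real-valued signal into a symbol sequence that NLP-style sequence models can consume, can be sketched as below. This is a hypothetical quantile-binning symbolizer (in the spirit of SAX-style discretization), not the paper's Wave2vec encoder, which additionally works with wavelet patterns and learns embeddings; the bin count and the synthetic sine signal are assumptions for illustration.

```python
# Hypothetical sketch of the encoding idea: discretize a real-valued
# signal into a string of symbols via quantile bins, so sequence-learning
# techniques from NLP can be applied downstream.
import numpy as np

def symbolize(signal, n_symbols=4):
    """Map each sample to a letter 'a'..'d' according to which
    quantile bin of the signal it falls into."""
    # Interior quantile edges split the amplitude range into n_symbols bins.
    edges = np.quantile(signal, np.linspace(0, 1, n_symbols + 1)[1:-1])
    bins = np.digitize(signal, edges)          # bin index 0..n_symbols-1
    return "".join(chr(ord("a") + b) for b in bins)

# Synthetic stand-in for one EEG channel.
t = np.linspace(0, 2 * np.pi, 32)
seq = symbolize(np.sin(t))
```

Once the signal is a string of symbols, the resulting "words" can be fed to embedding learners much as tokens are in text, which is the bridge between signal processing and NLP that the abstract describes.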

