Automatic Question Generation and Evaluation

2021
Vol 23 (05)
pp. 751-761
Author(s):
Parth Panchal
Janak Thakkar
Veerapathiramoorthy Pillai
Shweta Patil
...

Generating questions from a text extract is a tedious task for humans and an even tougher one for machines. In Automatic Question Generation (AQG), it is extremely important to examine how this can be achieved with sufficient accuracy and efficiency. Natural Language Processing (NLP) is used to process the input and prepare it for AQG. Using NLP together with question generation algorithms, the system can generate questions that support a better understanding of the text document. The input is pre-processed, using NLP mechanisms such as tokenization, named entity recognition (NER) tagging, and part-of-speech (POS) tagging, before the question generation stage. The generated questions are then checked against the context of the input to avoid invalid or unanswerable questions. The question generation system consists of a machine-learning, classification-based fill-in-the-blank (FIB) generator that also produces multiple choices, and a rule-based approach that generates Wh-type questions. It also includes a question evaluator through which users can rate the generated questions; the results of these evaluations can help improve the system further. In addition, the Wh questions have been evaluated with the BLEU score to determine whether the automatically generated questions closely resemble human-generated ones. The system can be used in various settings to ease question generation, and in self-assessment systems where students can gauge their conceptual understanding. Apart from educational use, it would also be helpful in building chatbot-based applications. This work can help improve the overall understanding of how well a candidate has grasped a given concept and how it can be understood better. We have taken a simple yet effective approach to generating the questions. Our evaluation results show that our model works well on simpler sentences.
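The abstract does not give implementation details, so the sketch below is only a rough illustration of how an NER-guided fill-in-the-blank item with distractors could be built in Python using spaCy; the model name, blanking strategy, and distractor pool are assumptions, not the authors' classifier.

# A minimal sketch of NER-guided fill-in-the-blank (FIB) generation.
# Assumes the spaCy English model "en_core_web_sm" is installed; the
# distractor strategy (sampling other entities of the same type) is an
# illustrative choice, not the paper's actual classifier.
import random
import spacy

nlp = spacy.load("en_core_web_sm")

def generate_fib(sentence, all_entities):
    doc = nlp(sentence)
    if not doc.ents:
        return None
    answer = random.choice(doc.ents)                  # entity to blank out
    stem = sentence.replace(answer.text, "_____", 1)  # question stem
    # Distractors: other entities of the same label drawn from the document.
    pool = [e for e in all_entities
            if e != answer.text and all_entities[e] == answer.label_]
    choices = random.sample(pool, k=min(3, len(pool))) + [answer.text]
    random.shuffle(choices)
    return {"question": stem, "choices": choices, "answer": answer.text}

text = "Alan Turing was born in London in 1912."
entities = {e.text: e.label_ for e in nlp(text).ents}
print(generate_fib(text, entities))

The BLEU comparison mentioned above could likewise be computed with nltk.translate.bleu_score.sentence_bleu against human-written reference questions.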

Author(s):  
G Deena
K Raja
K Kannan

In this competitive world, education has become part of everyday life. Imparting knowledge to the learner through education is the core idea of the Teaching-Learning Process (TLP). An assessment is one way to identify the learner's weak spots in the area under discussion, and assessment questions carry the most weight in judging the learner's skill. Manually prepared questions are not assured of excellence and fairness in assessing the learner's cognitive skill. Question generation is the most important part of the teaching-learning process, and it is clearly understood that generating test questions is the toughest part. Methods: We propose an Automatic Question Generation (AQG) system that automatically and dynamically generates assessment questions from an input file. Objective: The proposed system generates test questions mapped to Bloom's taxonomy to determine the learner's cognitive level. Cloze-type questions are generated using part-of-speech tags and a random function. Rule-based approaches and Natural Language Processing (NLP) techniques are implemented to generate procedural questions at the lowest of Bloom's cognitive levels. Analysis: The outputs are dynamic in nature, creating a different set of questions at each execution. Input paragraphs are selected from the computer science domain, and output efficiency is measured using precision and recall.
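As a rough, hedged illustration of the cloze-generation step described above (part-of-speech tags plus a random function), the following sketch uses NLTK; choosing nouns as blank candidates is an assumption, not necessarily the authors' rule.

# A minimal sketch of cloze-question generation from POS tags and a
# random choice; nouns as blank candidates and the NLTK tagger are
# assumptions about the method. (NLTK resource names may differ
# slightly across versions.)
import random
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def make_cloze(sentence):
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # Candidate blanks: nouns (NN*) are a common, simple heuristic.
    candidates = [tok for tok, tag in tagged if tag.startswith("NN")]
    if not candidates:
        return None
    answer = random.choice(candidates)
    stem = " ".join("_____" if tok == answer else tok for tok in tokens)
    return stem, answer

print(make_cloze("A compiler translates source code into machine code."))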


Information
2020
Vol 11 (1)
pp. 45
Author(s):
Shardrom Johnson
Sherlock Shen
Yuanchen Liu

Named Entity Recognition (NER), which usually takes Part-Of-Speech (POS) tags as linguistic features, is a major task in Natural Language Processing (NLP). In this paper, we put forward a new comprehensive-embedding that considers three aspects, namely character-embedding, word-embedding, and pos-embedding, stitched together in the given order so as to capture their dependencies; based on this we propose a new Character–Word–Position Combined BiLSTM-Attention (CWPC_BiAtt) model for the Chinese NER task. The comprehensive-embedding passes through a Bidirectional Long Short-Term Memory (BiLSTM) layer to capture the connection between historical and future information, and an attention mechanism then captures the connection between the content at the current position in the sentence and that at any other location. Finally, we utilize a Conditional Random Field (CRF) to decode the entire tagging sequence. Experiments show that the proposed CWPC_BiAtt model is well qualified for the NER task on the Microsoft Research Asia (MSRA) dataset and the Weibo NER corpus. High precision and recall were obtained, which verified the stability of the model. The position-embedding within the comprehensive-embedding compensates for the attention mechanism by providing position information for otherwise unordered sequences, which shows that the comprehensive-embedding is complete. Taken as a whole, our proposed CWPC_BiAtt has three distinct characteristics: completeness, simplicity, and stability. The model achieved the highest F-score, reaching state-of-the-art performance on the MSRA dataset and the Weibo NER corpus.
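For readers who want a concrete picture of the architecture described above, here is a minimal PyTorch sketch of the comprehensive-embedding plus BiLSTM plus self-attention stack; all dimensions are illustrative, the character-averaging step is a simplification, and the CRF decoding layer is omitted, so this is not the authors' implementation.

# A minimal PyTorch sketch: character, word, and POS embeddings are
# concatenated ("stitched"), fed through a BiLSTM, followed by
# self-attention over the sequence; the output is per-token emission
# scores that a CRF layer (omitted here) would decode.
import torch
import torch.nn as nn

class CWPBiLSTMAttention(nn.Module):
    def __init__(self, n_words, n_chars, n_pos, n_tags,
                 word_dim=100, char_dim=30, pos_dim=20, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        self.bilstm = nn.LSTM(word_dim + char_dim + pos_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_tags)  # emission scores for a CRF

    def forward(self, words, chars, pos):
        # words: (B, T), chars: (B, T, L), pos: (B, T)
        char_feat = self.char_emb(chars).mean(dim=2)          # (B, T, char_dim)
        x = torch.cat([self.word_emb(words), char_feat,
                       self.pos_emb(pos)], dim=-1)            # comprehensive embedding
        h, _ = self.bilstm(x)                                 # (B, T, 2H)
        attn = torch.softmax(h @ h.transpose(1, 2) /
                             h.size(-1) ** 0.5, dim=-1)       # self-attention weights
        context = attn @ h                                    # position-to-position context
        return self.proj(context)                             # (B, T, n_tags)

model = CWPBiLSTMAttention(n_words=5000, n_chars=100, n_pos=40, n_tags=9)
scores = model(torch.zeros(2, 12, dtype=torch.long),
               torch.zeros(2, 12, 6, dtype=torch.long),
               torch.zeros(2, 12, dtype=torch.long))
print(scores.shape)  # torch.Size([2, 12, 9])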


2021
Vol 9 (3)
pp. 435
Author(s):
Ni Putu Ayu Sherly Anggita S
Ngurah Agus Sanjaya ER

In Natural Language Processing (NLP), Named Entity Recognition (NER) is a widely researched subtask. Its main goal is to identify and detect named entities in a sentence, such as personal names, locations, organizations, and many other entity types. In this paper, we present a location NER system for Balinese texts using a rule-based approach. NER for Balinese documents is an essential and challenging task, since no prior research addresses this language. The rule-based approach, which uses hand-crafted rules to extract entity names, is, alongside machine learning, one of the most common ways to perform NER. The system aims to identify proper names in the corpus and classify them into the location class. Precision, recall, and F-measure are used for evaluation. Our results show that the proposed model is trustworthy, with average recall, precision, and F-measure values for the location entity of 0.935, 0.936, and 0.92, respectively. These results prove that our system is capable of recognizing named entities in Balinese texts.
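Since the abstract does not list the actual rules, the following is only a hedged sketch of how a gazetteer-and-trigger-word rule plus a precision/recall/F-measure evaluation might look in Python; the word lists and example tokens are invented placeholders, not the paper's rules or data.

# A minimal sketch of rule-based location NER: a small gazetteer plus
# trigger words assumed to precede place names, followed by a simple
# precision/recall/F-measure computation. All lists are illustrative.
GAZETTEER = {"Denpasar", "Ubud", "Singaraja"}
TRIGGERS = {"ring", "saking"}   # assumed location-preceding prepositions

def extract_locations(tokens):
    found = []
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER:
            found.append(tok)
        elif i > 0 and tokens[i - 1].lower() in TRIGGERS and tok[0].isupper():
            found.append(tok)   # capitalized word right after a trigger word
    return found

def prf(predicted, gold):
    tp = len(set(predicted) & set(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

tokens = "Ida meneng ring Ubud".split()
print(extract_locations(tokens), prf(extract_locations(tokens), ["Ubud"]))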


Information
2020
Vol 11 (1)
pp. 41
Author(s):
Melinda Loubser
Martin J. Puttkammer

In this paper, the viability of neural network implementations of core text technologies for ten resource-scarce South African languages is evaluated. Neural networks are increasingly being used in place of other machine learning methods for many natural language processing tasks, with good results. However, in the South African context, where most languages are resource-scarce, very little research has been done on neural network implementations of core language technologies. We address this gap by evaluating neural network implementations of four core technologies for ten South African languages: part-of-speech (POS) tagging, named entity recognition (NER), compound analysis, and lemmatization. Neural architectures that performed well on similar tasks in other settings were implemented for each task, and their performance was assessed against currently used machine learning implementations of each technology. The neural network models evaluated perform better than the baselines for compound analysis, are viable and comparable to the baselines on most languages for POS tagging and NER, and are viable, but not on par with the baseline, for Afrikaans lemmatization.


2021
Vol 11 (23)
pp. 11119
Author(s):
Van-Hai Vu
Quang-Phuoc Nguyen
Ebipatei Victoria Tunyan
Cheol-Young Ock

With the recent evolution of deep learning, machine translation (MT) models and systems are being steadily improved. However, research on MT in low-resource languages such as Vietnamese and Korean is still very limited. In recent years, a state-of-the-art context-based embedding model introduced by Google, Bidirectional Encoder Representations from Transformers (BERT), has begun to appear in neural MT (NMT) models in different ways to enhance the accuracy of MT systems. A BERT model for Vietnamese has been developed and has significantly improved natural language processing (NLP) tasks such as part-of-speech (POS) tagging, named-entity recognition, dependency parsing, and natural language inference. Our research experimented with applying the Vietnamese BERT model to provide POS tagging and morphological analysis (MA) for Vietnamese sentences, and word-sense disambiguation (WSD) for Korean sentences, in our Vietnamese–Korean bilingual corpus. In the Vietnamese–Korean NMT system, with contextual embedding, the Vietnamese BERT model is concurrently connected to both the encoder layers and the decoder layers of the NMT model. Experimental results assessed through the BLEU, METEOR, and TER metrics show that contextual embedding significantly improves the quality of Vietnamese–Korean NMT.
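As a hedged sketch of the first step this abstract describes, the code below pulls contextual embeddings for a Vietnamese sentence from a publicly available Vietnamese BERT via Hugging Face Transformers; the model name and the fusion into the NMT encoder/decoder are assumptions, since the paper's exact setup is not given in this excerpt.

# A minimal sketch of obtaining contextual BERT embeddings for a
# Vietnamese sentence. "vinai/phobert-base" is a publicly available
# Vietnamese BERT used here as an assumption; PhoBERT normally expects
# word-segmented input, which this sketch skips.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
bert = AutoModel.from_pretrained("vinai/phobert-base")

sentence = "Tôi yêu xử lý ngôn ngữ tự nhiên ."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    contextual = bert(**inputs).last_hidden_state   # (1, seq_len, hidden_size)

# These per-token vectors could be concatenated with (or projected into)
# the NMT model's encoder and decoder inputs, as the abstract outlines.
print(contextual.shape)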


2020
Vol 34 (05)
pp. 9090-9097
Author(s):
Niels Van der Heijden
Samira Abnar
Ekaterina Shutova

The lack of annotated data in many languages is a well-known challenge within the field of multilingual natural language processing (NLP). Therefore, many recent studies focus on zero-shot transfer learning and joint training across languages to overcome data scarcity for low-resource languages. In this work we (i) perform a comprehensive comparison of state-of-the-art multilingual word and sentence encoders on the tasks of named entity recognition (NER) and part of speech (POS) tagging; and (ii) propose a new method for creating multilingual contextualized word embeddings, compare it to multiple baselines and show that it performs at or above state-of-the-art level in zero-shot transfer settings. Finally, we show that our method allows for better knowledge sharing across languages in a joint training setting.


Data
2018
Vol 3 (4)
pp. 53
Author(s):
Maria Mitrofan
Verginica Barbu Mititelu
Grigorina Mitrofan

Gold standard corpora (GSCs) are essential for the supervised training and evaluation of systems that perform natural language processing (NLP) tasks. Currently, most of the resources used in biomedical NLP tasks are in English. Little effort has been reported for other languages, including Romanian, and thus access to such language resources is poor. In this paper, we present the construction of the first morphologically and terminologically annotated biomedical corpus of the Romanian language (MoNERo), meant to serve as a gold standard for biomedical part-of-speech (POS) tagging and biomedical named entity recognition (bioNER). It contains 14,012 tokens distributed across three medical subdomains: cardiology, diabetes, and endocrinology, extracted from books, journals, and blog posts. To automatically annotate the corpus with POS tags, we used a Romanian tag set with 715 labels, while labels for diseases, anatomy, procedures, and chemicals and drugs were manually annotated for bioNER, with a Cohen's kappa coefficient of 92.8%, revealing 1877 medical named entities. The automatic annotation of the corpus has been manually checked. The corpus is publicly available and can be used to facilitate the development of NLP algorithms for the Romanian language.
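The inter-annotator agreement figure quoted above is a Cohen's kappa; a minimal sketch of how such a score is computed (with toy labels, not MoNERo data) follows.

# A minimal sketch of computing inter-annotator agreement (Cohen's kappa)
# for manual entity labels; the two label sequences are illustrative only.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["DISEASE", "O", "ANATOMY", "O", "DRUG", "O"]
annotator_b = ["DISEASE", "O", "ANATOMY", "DISEASE", "DRUG", "O"]

print(round(cohen_kappa_score(annotator_a, annotator_b), 3))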


2020
Vol 10 (1)
Author(s):
Nur Rafeeqkha Sulaiman
Maheyzah Md Siraj

The Internet connects everyone to everything globally, and its existence eases people in completing daily tasks. Thanks to the Internet, information is digitized and spread openly to the public. Online news articles not only provide useful and reliable information and reports; they also ease information extraction and gathering for research purposes, especially in Natural Language Processing (NLP) and Machine Learning (ML). Topics regarding the South China Sea have been popular lately due to rising conflicts between several countries' claims on the islands in the sea. Gathering data through the Internet and online sources proves to be easy, but manually processing a huge amount of data and identifying only the useful information takes a long time. Extracting important features from a text document can be done using one or a combination of feature extraction methods. Extraction of relevant information and classification of news articles in relation to the conflicts in the South China Sea therefore need to be done. In this paper, a model is proposed that uses Named Entity Recognition (NER) to search for and classify important information regarding the conflicts. To do so, a combination of Part-of-Speech (POS) tagging and NER is needed to extract the types of conflicts from the news. This study also classifies news articles by training and testing Conditional Random Field (CRF) and Multinomial Naïve Bayes (MNB) classifiers.
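As a hedged illustration of the Multinomial Naive Bayes classification step mentioned above, the sketch below uses scikit-learn with invented toy headlines and labels; it is not the study's data or feature set.

# A minimal sketch of Multinomial Naive Bayes news classification with
# bag-of-words features; the headlines and conflict labels are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "Navy vessels face off near disputed reef",
    "Fishing fleet detained in contested waters",
    "Leaders meet for maritime cooperation talks",
    "Joint statement issued after diplomatic summit",
]
train_labels = ["military", "military", "diplomatic", "diplomatic"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["Patrol ships shadow survey vessel near the islands"]))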


TEM Journal
2021
pp. 82-94
Author(s):
Maganti Syamala
N.J. Nalini

Aspect-based sentiment analysis (ABSA) is identified as one of the current research problems in Natural Language Processing (NLP). Traditional ABSA requires manual aspect assignment for aspect extraction and sentiment analysis. In this paper, to automate the process, a domain-independent dynamic ABSA model is proposed that fuses an Efficient Named Entity Recognition (E-NER) guided dependency parsing technique with Neural Networks (NN). The aspects and sentiment terms extracted by E-NER are used to train a Convolutional Neural Network (CNN) using a word embedding technique. Aspect category-based polarity prediction is evaluated using the NLTK VADER sentiment package. The proposed model was compared to a traditional rule-based approach, and the proposed dynamic model proved to yield better results by 17% when validated in terms of correctly classified instances, accuracy, precision, recall, and F-score using machine learning algorithms.
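A minimal sketch of aspect-level polarity scoring with the NLTK VADER package mentioned above is shown here; the aspect/opinion pairs are invented examples, not output of the E-NER pipeline.

# A minimal sketch of aspect-level polarity scoring with NLTK VADER;
# the aspect/opinion sentences below are illustrative placeholders.
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
sia = SentimentIntensityAnalyzer()

aspect_opinions = {
    "battery life": "The battery life is excellent",
    "screen": "The screen is dim and hard to read",
}
for aspect, sentence in aspect_opinions.items():
    scores = sia.polarity_scores(sentence)       # compound score in [-1, 1]
    label = "positive" if scores["compound"] >= 0.05 else \
            "negative" if scores["compound"] <= -0.05 else "neutral"
    print(aspect, label, scores["compound"])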

