A Pretrained Language Model-Based Data Augmentation Method for Korean Question-Answering Systems

Woojin Cho; Hyukjoon Lee

doi:10.5626/ktcp.2021.27.12.563

Language model-based sentence classification for opinion question answering systems

2009 International Multiconference on Computer Science and Information Technology, 2009.

doi:10.1109/imcsit.2009.5352718

Author(s): Saeedeh Momtazi; Dietrich Klakow

Keyword(s): Question Answering; Language Model; Model Based; Question Answering Systems; Sentence Classification

A word clustering approach for language model-based sentence retrieval in question answering systems

Proceeding of the 18th ACM conference on Information and knowledge management - CIKM '09, 2009. Cited by ~8.

doi:10.1145/1645953.1646263

Author(s): Saeedeh Momtazi; Dietrich Klakow

Keyword(s): Question Answering; Language Model; Model Based; Sentence Retrieval; Question Answering Systems; Clustering Approach; Word Clustering

A Review on Question Generation from Natural Language Text

ACM Transactions on Information Systems, 2022, Vol 40 (1), pp. 1-43.

doi:10.1145/3468889

Author(s): Ruqing Zhang; Jiafeng Guo; Lu Chen; Yixing Fan; Xueqi Cheng

Keyword(s): Natural Language; Question Answering; Data Augmentation; Text Structure; Current Status; Question Generation; Natural Language Text; Question Answering Systems; The Right; Language Text

Question generation is an important yet challenging problem in Artificial Intelligence (AI) that aims to generate natural and relevant questions from various input formats, e.g., natural language text, structured databases, knowledge bases, and images. In this article, we focus on question generation from natural language text, which has received tremendous interest in recent years due to widespread applications such as data augmentation for question answering systems. Over the past decades, many different question generation models have been proposed, from traditional rule-based methods to advanced neural network-based methods. Given the large variety of research that has been proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey we try to provide a more comprehensive taxonomy of question generation tasks from three different perspectives, i.e., the types of the input context text, the target answer, and the generated question. We take a deep look into existing models from different dimensions to analyze their underlying ideas, major design principles, and training strategies. We compare these models through benchmark tasks to obtain an empirical understanding of the existing techniques. Moreover, we discuss what is missing in the current literature and what the promising and desired future directions are.

Self-Supervised Contextual Data Augmentation for Natural Language Processing

Symmetry, 2019, Vol 11 (11), pp. 1393.

doi:10.3390/sym11111393

Author(s): Dongju Park; Chang Wook Ahn

Keyword(s): Supervised Learning; Language Processing; Recurrent Neural Networks; Question Answering; Data Augmentation; Language Model; Contextual Data; External Data; Label Information; Benchmark Datasets

In this paper, we propose a novel data augmentation method with respect to the target context of the data via self-supervised learning. Instead of looking for the exact synonyms of masked words, the proposed method finds words that can replace the original words given the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and predicts the original word. The MLM learns the context of a sentence through asymmetrical inputs and outputs. Rather than using the existing MLM directly, we propose a label-masked language model (LMLM) that includes label information in the mask tokens, so that the MLM can be used effectively on labeled data. The augmentation method performs self-supervised learning using the LMLM and then implements data augmentation through the trained model. Through several experiments, we demonstrate that the proposed method improves the classification accuracy of recurrent and convolutional neural network-based classifiers on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity (Subj), the Multi-Perspective Question Answering (MPQA), the Movie Reviews (MR), and the Text Retrieval Conference (TREC) datasets. In addition, since the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on it.
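The label-aware masking step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the token format or masking rate, so the mask token shape (`[MASK_<label>]`), the function name `make_lmlm_example`, and the default `mask_prob` are assumptions made purely for illustration; this is not the authors' implementation, and it covers only the preparation of training pairs, not the model.

```python
import random


def make_lmlm_example(tokens, label, mask_prob=0.15, rng=None):
    """Build one label-masked training pair from a labeled sentence.

    Each selected position is replaced by a mask token that carries the
    sentence label (hypothetical format: "[MASK_<label>]"). The returned
    targets map each masked position to the original token, which is what
    a masked language model would be trained to recover.
    """
    if rng is None:
        rng = random.Random()
    mask_token = f"[MASK_{label}]"  # assumed token format, not from the paper
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


if __name__ == "__main__":
    sentence = "the movie was surprisingly good".split()
    masked, targets = make_lmlm_example(sentence, "pos", mask_prob=0.3)
    print(masked, targets)
```

At augmentation time, a trained model would fill each label-carrying mask with a context-appropriate word, yielding a new sentence that keeps the original label.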

Exploring Conditional Language Model Based Data Augmentation Approaches for Hate Speech Classification

2021, pp. 135-146.

doi:10.1007/978-3-030-83527-9_12

Author(s): Ashwin Geet D’Sa; Irina Illina; Dominique Fohr; Dietrich Klakow; Dana Ruiter

Keyword(s): Data Augmentation; Hate Speech; Language Model; Model Based; Speech Classification

Neural language model based training data augmentation for weakly supervised early rumor detection

Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2019. Cited by ~1.

doi:10.1145/3341161.3342892

Author(s): Sooji Han; Jie Gao; Fabio Ciravegna

Keyword(s): Data Augmentation; Language Model; Training Data; Model Based; Weakly Supervised; Rumor Detection

Research on Automatic Question Answering of Generative Knowledge Graph Based on Pointer Network

Information, 2021, Vol 12 (3), pp. 136.

doi:10.3390/info12030136

Author(s): Shuang Liu; Nannan Tan; Yaqian Ge; Niko Lukač

Keyword(s): Knowledge Base; Word Frequency; Language Processing; Question Answering; Language Model; Word List; Superior Performance; Knowledge Graph; Semantic Features; Question Answering Systems

Question answering over knowledge graphs is an extremely challenging task in the field of natural language processing. Most existing Chinese Knowledge Base Question Answering (KBQA) systems can only return the knowledge stored in the knowledge base by extractive methods. This approach does not conform to users' reading habits and cannot solve the out-of-vocabulary (OOV) problem. In this paper, a new generative question answering method based on a knowledge graph is proposed, comprising three parts: knowledge vocabulary construction, data pre-processing, and answer generation. In vocabulary construction, BiLSTM-CRF is used to identify the entities in the source text; the triples associated with those entities are then found, word frequencies are counted, and the vocabulary is built. In data pre-processing, the pre-trained language model BERT, combined with word frequency semantic features, is adopted to obtain word vectors. In answer generation, the vocabulary constructed from the knowledge graph is combined with a pointer generator network (PGN) that points to the corresponding entity when generating the answer. The experimental results show that the proposed method achieves better performance than other methods on WebQA datasets.
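To make the answer generation step concrete, the sketch below shows how a pointer generator network typically combines generating from a fixed vocabulary with copying from the source, which is what lets it emit OOV entities. It assumes the standard pointer-generator mixture, in which a generation probability `p_gen` weights the vocabulary distribution and `1 - p_gen` weights the attention over source tokens; the abstract does not state the exact formulation used in the paper, and the function name and toy values are hypothetical.

```python
def pointer_generator_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with a copy distribution.

    p_gen:         probability of generating from the fixed vocabulary
    vocab_dist:    dict mapping vocabulary token -> probability
    attention:     attention weights over source positions (sum to 1)
    source_tokens: source tokens aligned with `attention`

    Tokens that appear only in the source (OOV for the vocabulary) still
    receive probability mass through the copy term.
    """
    final = {tok: p_gen * p for tok, p in vocab_dist.items()}
    for weight, tok in zip(attention, source_tokens):
        final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    # Toy values chosen for illustration only.
    dist = pointer_generator_distribution(
        p_gen=0.6,
        vocab_dist={"the": 0.5, "capital": 0.5},
        attention=[0.8, 0.2],
        source_tokens=["Paris", "capital"],
    )
    print(dist)
```

In this toy case "Paris" is absent from the vocabulary yet still receives probability through the attention weights, which is the mechanism a copy-based decoder relies on for entities taken from a knowledge graph.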

Question Answering with Character-Level LSTM Encoders and Model-Based Data Augmentation

Lecture Notes in Computer Science - Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, 2017, pp. 295-305. Cited by ~1.

doi:10.1007/978-3-319-69005-6_25

Author(s): Run-Ze Wang; Chen-Di Zhan; Zhen-Hua Ling

Keyword(s): Question Answering; Data Augmentation; Model Based

An Efficient Semantic Analysis Technique for the Question Answering Systems

Journal of Engineering and Applied Sciences, 2019, Vol 14 (22), pp. 8289-8292.

doi:10.36478/jeasci.2019.8289.8292

Author(s): Ibrahim Mahmoud Ibrahim Alturani; Mohd Pouzi Bin Hamzah

Keyword(s): Question Answering; Semantic Analysis; Analysis Technique; Question Answering Systems

Language-model-based ranking in entity-relation graphs

Proceedings of the First International Workshop on Keyword Search on Structured Data - KEYS '09, 2009.

doi:10.1145/1557670.1557686

Author(s): Shady Elbassuoni; Maya Ramanath; Gerhard Weikum

Keyword(s): Language Model; Model Based