A Pre-Training Based Personalized Dialogue Generation Model with Persona-Sparse Data

2020 ◽  
Vol 34 (05) ◽  
pp. 9693-9700
Author(s):  
Yinhe Zheng ◽  
Rongsheng Zhang ◽  
Minlie Huang ◽  
Xiaoxi Mao

Endowing dialogue systems with personas is essential to delivering more human-like conversations. However, this problem is still far from well explored, owing to the difficulty of embodying personalities in natural language and to the persona sparsity observed in most dialogue corpora. This paper proposes a pre-training-based personalized dialogue model that can generate coherent responses from persona-sparse dialogue data. In this method, a pre-trained language model is used to initialize an encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers' personas together with dialogue histories. Further, to incorporate the target persona in the decoding process and to balance its contribution, an attention-routing structure is devised in the decoder to merge features extracted from the target persona and the dialogue context using dynamically predicted weights. Our model can utilize persona-sparse dialogues in a unified manner during training and can control the amount of persona-related features exhibited during inference. Both automatic and manual evaluations demonstrate that the proposed model outperforms state-of-the-art methods in generating coherent and persona-consistent responses from persona-sparse data.
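
The attention-routing idea can be illustrated with a short sketch. Below is a minimal PyTorch module, assuming a sigmoid gate over the decoder state as the dynamic weight predictor; the module name, gate design, and all dimensions are hypothetical, not the authors' implementation.

```python
# Minimal sketch of attention routing: two attention routes (target persona,
# dialogue context) are merged with a dynamically predicted weight.
import torch
import torch.nn as nn

class AttentionRouter(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.persona_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.context_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # predicts a scalar merge weight from the decoder state (an assumption)
        self.gate = nn.Sequential(nn.Linear(d_model, 1), nn.Sigmoid())

    def forward(self, dec_state, persona_feats, context_feats):
        p_out, _ = self.persona_attn(dec_state, persona_feats, persona_feats)
        c_out, _ = self.context_attn(dec_state, context_feats, context_feats)
        alpha = self.gate(dec_state)            # (batch, tgt_len, 1), in [0, 1]
        return alpha * p_out + (1 - alpha) * c_out

router = AttentionRouter(d_model=768)
dec = torch.randn(2, 10, 768)       # decoder hidden states
persona = torch.randn(2, 5, 768)    # encoded persona attributes
context = torch.randn(2, 40, 768)   # encoded dialogue history
merged = router(dec, persona, context)  # (2, 10, 768)
```

The gate output alpha plays the role of the dynamically predicted weight: at inference time it can be scaled to control how much persona-related information surfaces in the response.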

2021 ◽  
Vol 11 (16) ◽  
pp. 7415
Author(s):  
So-Eon Kim ◽  
Yeon-Soo Lim ◽  
Seong-Bae Park

The sequence-to-sequence model is widely used for dialogue response generation, but it tends to produce safe responses for most input queries. Since safe responses are unattractive and boring, a number of efforts have been made to make generators produce diverse responses, but this remains an open problem. As a solution, this paper proposes a novel response generator, the Response Generator with Response Weight (RGRW). The proposed generator is a transformer-based sequence-to-sequence model whose encoder is a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model and whose decoder is a variant of the Generative Pre-trained Transformer 2 (GPT-2). Because attention on the response is not reflected strongly enough in a transformer-based sequence-to-sequence model, the proposed generator enhances the influence of the response through a response weight, which determines the importance of each token in a query with respect to the response. The decoder then processes the response weight as well as the query encoding to generate a diverse response. The effectiveness of RGRW is demonstrated by showing that it generates more diverse and informative responses than the baseline generator by focusing more on the tokens that are important for generating the response. The proposed model also substantially outperforms the Commonsense Knowledge-Aware Dialogue generation model (ConKADI), a state-of-the-art model.
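
As a rough illustration of a response weight, the sketch below scores each query token by its maximum similarity to the reference response tokens and normalizes the scores; this scoring rule is an assumption for the example, not the RGRW formulation.

```python
# Hedged sketch: weight each query token by its relevance to the response.
import torch
import torch.nn.functional as F

def response_weights(query_enc, response_enc):
    """query_enc: (q_len, d), response_enc: (r_len, d) -- e.g., BERT outputs."""
    # similarity of every query token to every response token
    sim = query_enc @ response_enc.T                 # (q_len, r_len)
    # each query token's importance = its best response-token match
    scores = sim.max(dim=1).values                   # (q_len,)
    return F.softmax(scores, dim=0)                  # normalized weights

q = torch.randn(12, 768)
r = torch.randn(9, 768)
w = response_weights(q, r)
weighted_query = w.unsqueeze(1) * q  # fed to the decoder alongside the query encoding
```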


Author(s):  
Hengyi Cai ◽  
Hongshen Chen ◽  
Yonghao Song ◽  
Xiaofang Zhao ◽  
Dawei Yin

Humans benefit from previous experience when taking actions. Similarly, related examples from the training data can provide exemplary information for neural dialogue models when responding to a given input message. However, effectively fusing such exemplary information into dialogue generation is non-trivial: useful exemplars must be not only literally similar but also topically related to the given context. Noisy exemplars impair the neural dialogue model's understanding of the conversation topics and can even corrupt response generation. To address these issues, we propose an exemplar-guided neural dialogue generation model in which exemplar responses are retrieved in terms of both text similarity and topic proximity through a two-stage exemplar retrieval model. In the first stage, a small subset of conversations is retrieved from the training set given a dialogue context. These candidate exemplars are then finely ranked by topical proximity to choose the best-matched exemplar response. To further induce the neural dialogue generation model to consult the exemplar response and the conversation topics more faithfully, we introduce a multi-source sampling mechanism that provides the dialogue model with both local exemplary semantics and global topical guidance during decoding. Empirical evaluations on a large-scale conversation dataset show that the proposed approach significantly outperforms the state-of-the-art in terms of both quantitative metrics and human evaluations.
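
The two-stage retrieval could look roughly like the following sketch, which stands in TF-IDF cosine similarity for the literal matcher and a generic topic-vector function for the topical reranker; both choices, and all names and toy data, are assumptions made for illustration.

```python
# Illustrative two-stage exemplar retrieval: coarse lexical retrieval,
# then a fine rerank by topic proximity. Not the paper's exact models.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

train_contexts = ["how is the weather today", "what movie should i watch",
                  "is it going to rain tomorrow"]
train_responses = ["sunny and warm", "try an old noir film", "bring an umbrella"]

vec = TfidfVectorizer().fit(train_contexts)

def retrieve_exemplar(context, topic_vec_fn, k=2):
    # stage 1: coarse lexical retrieval of k candidate conversations
    sims = cosine_similarity(vec.transform([context]), vec.transform(train_contexts))[0]
    candidates = np.argsort(sims)[::-1][:k]
    # stage 2: fine rerank by topic proximity
    ctx_topic = topic_vec_fn(context)
    best = max(candidates,
               key=lambda i: float(cosine_similarity(
                   [ctx_topic], [topic_vec_fn(train_contexts[i])])[0, 0]))
    return train_responses[best]

# topic_vec_fn would come from a topic model (e.g., LDA); a stub for illustration:
print(retrieve_exemplar("will it rain today",
                        lambda t: vec.transform([t]).toarray()[0]))
```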


Author(s):  
Sergiy Pogorilyy ◽  
Artem Kramov

The detection of coreferent pairs within a text is one of the basic tasks in natural language processing (NLP). State-of-the-art methods of coreference resolution are based on machine learning algorithms; their key idea is to detect regularities between the semantic or grammatical features of text entities. In this paper, a comparative analysis of current methods of coreference resolution in English and Ukrainian texts is performed. The key disadvantage of many methods is their interpretation of coreference resolution as a classification problem. Since the result of coreferent pair detection is a set of groups whose elements refer to a common entity, it is more appropriate to treat coreference resolution as a clustering task. A method of coreference resolution using a set of filtering sieves and a convolutional neural network is suggested. The set of filtering sieves for finding candidate coreferent pairs has been implemented, and a multichannel convolutional neural network has been trained on an annotated Ukrainian corpus. The multichannel structure makes it possible to analyze different components of text units: the semantic, lexical, and grammatical features of words and sentences. Furthermore, a convolutional layer makes it possible to process input data of variable size (the words or sentences of a text). The output of the method is a set of clusters. Forming these clusters requires taking the previous steps of the model's workflow into account, which contradicts the traditional methodology of machine learning. The network was therefore trained with the SEARN algorithm, which allows tasks with unfixed output structures to be solved using a classifier model. An experimental examination of the method on a corpus of Ukrainian news was performed, and the corresponding common metrics for clustering tasks were calculated to estimate the method's accuracy. The results obtained indicate that the suggested method can be used to find coreferent pairs within Ukrainian texts, and the method can be easily adapted to other natural languages.
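
The filtering-sieve stage might be sketched as follows; the individual sieve rules here are simplified illustrations, not the paper's actual sieve set.

```python
# Minimal sketch of the filtering-sieve stage: each sieve discards mention
# pairs that cannot be coreferent; survivors go to the neural scorer.

def exact_match_sieve(m1, m2):
    return m1["text"].lower() == m2["text"].lower()

def number_agreement_sieve(m1, m2):
    return m1["number"] == m2["number"]      # e.g., "sg" / "pl"

def gender_agreement_sieve(m1, m2):
    return m1["gender"] in (m2["gender"], "unknown")

SIEVES = [number_agreement_sieve, gender_agreement_sieve]

def candidate_pairs(mentions):
    """Yield mention pairs that survive every filtering sieve."""
    for i, m1 in enumerate(mentions):
        for m2 in mentions[i + 1:]:
            if exact_match_sieve(m1, m2) or all(s(m1, m2) for s in SIEVES):
                yield m1, m2

mentions = [{"text": "Kyiv", "number": "sg", "gender": "unknown"},
            {"text": "the city", "number": "sg", "gender": "unknown"},
            {"text": "they", "number": "pl", "gender": "unknown"}]
print(list(candidate_pairs(mentions)))  # keeps (Kyiv, the city) only
```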


2018 ◽  
Vol 6 ◽  
pp. 373-389 ◽  
Author(s):  
Tong Niu ◽  
Mohit Bansal

Stylistic dialogue response generation, with valuable applications in personality-based conversational agents, is a challenging task because the response needs to be fluent, contextually relevant, and paralinguistically accurate. Moreover, parallel datasets for regular-to-stylistic pairs are usually unavailable. We present three weakly supervised models that can generate diverse, polite (or rude) dialogue responses without parallel data. Our late-fusion model (Fusion) merges the decoder of an encoder-attention-decoder dialogue model with a language model trained on stand-alone polite utterances. Our label-fine-tuning (LFT) model prepends to each source sequence a politeness-score-scaled label (predicted by our state-of-the-art politeness classifier) during training, and at test time it can generate polite, neutral, and rude responses simply by scaling the label embedding by the corresponding score. Our reinforcement learning model (Polite-RL) encourages politeness generation by assigning rewards proportional to the politeness classifier's score for the sampled response. We also present two retrieval-based polite dialogue model baselines. Human evaluation validates that while the Fusion and retrieval-based models achieve politeness at the cost of context relevance, the LFT and Polite-RL models produce significantly more polite responses without sacrificing dialogue quality.
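
The LFT mechanism lends itself to a compact sketch: a label embedding, scaled by the classifier's politeness score, is prepended to the source embeddings. The embedding sizes and label inventory below are assumptions for the example, not the paper's configuration.

```python
# Hedged sketch of label-fine-tuning (LFT): prepend a score-scaled label
# embedding to the source sequence.
import torch
import torch.nn as nn

vocab_size, d_model = 10000, 512
tok_emb = nn.Embedding(vocab_size, d_model)
label_emb = nn.Embedding(3, d_model)    # 0: polite, 1: neutral, 2: rude

def lft_inputs(src_ids, label_id, politeness_score):
    """src_ids: (batch, src_len); politeness_score in [0, 1] from the classifier."""
    src = tok_emb(src_ids)                                  # (batch, src_len, d)
    lbl = label_emb(torch.tensor([label_id])) * politeness_score
    lbl = lbl.unsqueeze(0).expand(src.size(0), -1, -1)      # (batch, 1, d)
    return torch.cat([lbl, src], dim=1)                     # label-prepended sequence

x = lft_inputs(torch.randint(0, vocab_size, (2, 7)), label_id=0, politeness_score=0.9)
print(x.shape)  # torch.Size([2, 8, 512])
```

At test time, switching the label and scaling its embedding by the desired score moves the generated response along the polite-neutral-rude spectrum.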


Author(s):  
Mingzhi Yu ◽  
Diane Litman

Retrieval-based dialogue systems select the best response from many candidates. Although many state-of-the-art models have shown promising performance in dialogue response selection tasks, there is still quite a gap between R@1 and R@10 performance. To address this, we propose to leverage linguistic coordination (a phenomenon that individuals tend to develop similar linguistic behaviors in conversation) to rerank the N-best candidates produced by BERT, a state-of-the-art pre-trained language model. Our results show an improvement in R@1 compared to BERT baselines, demonstrating the utility of repairing machine-generated outputs by leveraging a linguistic theory.
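
A minimal version of such a rerank step is sketched below, with function-word overlap standing in for the coordination measure; the marker list and the blending weight lam are illustrative assumptions, not the paper's feature set.

```python
# Sketch: rerank BERT's N-best candidates by a simple linguistic-coordination
# score (overlap in function-word usage with the previous speaker's utterance).
FUNCTION_WORDS = {"i", "you", "the", "a", "of", "in", "and", "but", "so", "very"}

def coordination_score(context_utt: str, candidate: str) -> float:
    ctx = {w for w in context_utt.lower().split() if w in FUNCTION_WORDS}
    cand = {w for w in candidate.lower().split() if w in FUNCTION_WORDS}
    return len(ctx & cand) / max(len(ctx), 1)

def rerank(context_utt, nbest, lam=0.3):
    """nbest: list of (candidate, bert_score); blend BERT score with coordination."""
    return sorted(nbest,
                  key=lambda p: (1 - lam) * p[1]
                              + lam * coordination_score(context_utt, p[0]),
                  reverse=True)

nbest = [("sure, see you in the morning", 0.81), ("ok", 0.84)]
# coordination flips the top-1 choice relative to the raw BERT scores:
print(rerank("i will see you in the morning then", nbest)[0][0])
```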


2020 ◽  
Vol 34 (05) ◽  
pp. 7472-7479
Author(s):  
Hengyi Cai ◽  
Hongshen Chen ◽  
Cheng Zhang ◽  
Yonghao Song ◽  
Xiaofang Zhao ◽  
...  

Current state-of-the-art neural dialogue systems are mainly data-driven and are trained on human-generated responses. However, due to the subjectivity and open-ended nature of human conversations, the complexity of training dialogues varies greatly. The noise and uneven complexity of query-response pairs impede the learning efficiency and effectiveness of neural dialogue generation models. Moreover, there is as yet no unified measurement of dialogue complexity, which embodies multiple attributes: specificity, repetitiveness, relevance, and so on. Inspired by how humans learn to converse, where children progress from easy dialogues to complex ones and dynamically adjust their learning progress, we first analyze five dialogue attributes to measure dialogue complexity from multiple perspectives on three publicly available corpora. We then propose an adaptive multi-curricula learning framework to schedule a committee of the organized curricula. The framework is built on the reinforcement learning paradigm and automatically chooses among the curricula throughout the evolving learning process according to the learning status of the neural dialogue generation model. Extensive experiments on five state-of-the-art models demonstrate its learning efficiency and effectiveness with respect to 13 automatic evaluation metrics and human judgments.
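
A scheduler in this spirit can be sketched as a multi-armed bandit over curricula; the EXP3-style update below, and the use of validation-metric improvement as the reward, are assumptions made for the illustration rather than the paper's exact algorithm.

```python
# Illustrative adaptive curriculum scheduler: an EXP3-style bandit picks which
# attribute-specific curriculum to sample the next training batch from.
import math
import random

class CurriculumScheduler:
    def __init__(self, curricula, gamma=0.1):
        self.curricula = curricula           # e.g., ["specificity", "relevance", ...]
        self.gamma = gamma
        self.weights = [1.0] * len(curricula)

    def _probs(self):
        total, k = sum(self.weights), len(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in self.weights]

    def choose(self):
        self.last = random.choices(range(len(self.curricula)), weights=self._probs())[0]
        return self.curricula[self.last]

    def update(self, reward):
        """reward: e.g., improvement of a validation metric after the batch."""
        p = self._probs()[self.last]
        self.weights[self.last] *= math.exp(self.gamma * reward / (p * len(self.weights)))

sched = CurriculumScheduler(["specificity", "repetitiveness", "relevance"])
cur = sched.choose()       # pick a curriculum, train one batch on it...
sched.update(reward=0.02)  # ...then feed back the observed gain
```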


Research ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Yangyang Zhou ◽  
Fuji Ren

The dialogue system has always been one of the important topics in artificial intelligence. So far, most mature dialogue systems are task-oriented, while non-task-oriented dialogue systems still have a lot of room for improvement. We propose a data-driven, non-task-oriented dialogue generator, "CERG", based on neural networks. This model has emotion recognition capability and can generate corresponding responses. The dataset we adopt comes from the NTCIR-14 STC-3 CECG subtask, which contains more than 1.7 million Chinese Weibo post-response pairs and six emotion categories. We concatenate the post and the response with the emotion label, then mask the response part of the input text character by character to emulate the encoder-decoder framework. We use improved transformer blocks as the core of the model and add regularization methods to alleviate the problems of overcorrection and exposure bias. We also introduce a retrieval method into the inference process to improve the semantic relevance of generated responses. The results of the manual evaluation show that our proposed model can respond differently to different emotions, improving the human-computer interaction experience. The model can be applied in many domains, such as automatic reply bots for social applications.
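
The input construction described above might look like the following sketch; the mask and separator symbols, and the exact layout, are illustrative assumptions.

```python
# Hedged sketch: concatenate emotion label and post with the response, then
# mask the response characters so a transformer LM can be trained to recover
# them, emulating an encoder-decoder framework.
MASK, SEP = "[MASK]", "[SEP]"

def build_example(emotion: str, post: str, response: str):
    src_chars = [emotion, SEP] + list(post) + [SEP]
    masked = src_chars + [MASK] * len(response)   # model input: response fully masked
    target = src_chars + list(response)           # supervision for the masked slots
    return masked, target

masked, target = build_example("happiness", "今天天气真好", "是啊出去走走吧")
print(masked)
print(target)
```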


Author(s):  
Shifeng Li ◽  
Shi Feng ◽  
Daling Wang ◽  
Kaisong Song ◽  
Yifei Zhang ◽  
...  

Generating emotional responses is crucial for building human-like dialogue systems. However, existing studies have focused only on generating responses by controlling the agent's emotions, while the feelings of the users, which are the ultimate concern of a dialogue system, have been neglected. In this paper, we propose a novel variational model named EmoElicitor to generate appropriate responses that can elicit a specific emotion from the user. We incorporate the next-round utterance that follows the response into the posterior network to enrich the context, and we decompose the single latent variable into several sequential ones to guide response generation with the help of a pre-trained language model. Extensive experiments on a real-world dataset show that EmoElicitor not only performs better than the baselines in terms of diversity and semantic similarity but also elicits the target emotion with higher accuracy.
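
The enriched posterior can be pictured with a small sketch: unlike the prior, the posterior network also conditions on the next-round utterance. The shapes and the single-Gaussian parameterization below are assumptions for illustration, not the EmoElicitor architecture.

```python
# Hedged sketch of a posterior network that also sees the next-round utterance.
import torch
import torch.nn as nn

class Posterior(nn.Module):
    def __init__(self, d=256, z=64):
        super().__init__()
        self.net = nn.Linear(3 * d, 2 * z)   # context + response + next utterance

    def forward(self, ctx, resp, next_utt):
        h = self.net(torch.cat([ctx, resp, next_utt], dim=-1))
        mu, logvar = h.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

post = Posterior()
z, mu, logvar = post(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 256))
```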


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yuanyuan Cai ◽  
Min Zuo ◽  
Qingchuan Zhang ◽  
Haitao Xiong ◽  
Ke Li

Along with the development of social media on the internet, dialogue systems are becoming more and more intelligent to meet users' needs for communication, emotion, and social intercourse. Previous studies usually use sequence-to-sequence learning with recurrent neural networks for response generation. However, recurrent models suffer heavily from the problem of long-distance dependencies in sequences. Moreover, some models neglect crucial information in the dialogue context, which leads to uninformative and inflexible responses. To address these issues, we present a bichannel transformer with context encoding (BCTCE) for document-driven conversation. This conversational generator consists of a context encoder, an utterance encoder, and a decoder with an attention mechanism. The encoders learn distributed representations of the input texts, and a multihop attention mechanism captures the interaction between documents and dialogues. We evaluate the proposed BCTCE by both automatic evaluation and human judgment. The experimental results on the CMU_DoG dataset indicate that the proposed model yields significant improvements over state-of-the-art baselines on most evaluation metrics, and that the responses generated by BCTCE are more informative and more relevant to the dialogue than those of the baselines.
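
The multihop attention between dialogue and document could be sketched as repeated cross-attention with residual refinement; the hop count, dimensions, and residual form are assumptions for the example, not the BCTCE implementation.

```python
# Minimal multihop-attention sketch: the dialogue representation repeatedly
# queries the document encoding, refining itself at each hop.
import torch
import torch.nn as nn

class MultiHopAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, hops=3):
        super().__init__()
        self.hops = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(hops))

    def forward(self, dialogue, document):
        q = dialogue
        for attn in self.hops:
            out, _ = attn(q, document, document)
            q = q + out                       # residual refinement per hop
        return q

mha = MultiHopAttention()
fused = mha(torch.randn(2, 20, 512), torch.randn(2, 120, 512))  # (2, 20, 512)
```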


2021 ◽  
Vol 26 (5) ◽  
pp. 469-475
Author(s):  
Alaa Joukhadar ◽  
Nada Ghneim ◽  
Ghaida Rebdawi

In human-computer dialogue systems, correctly identifying the intent underlying a speaker's utterance is crucial to the success of a dialogue. Several studies have addressed the Dialogue Act Classification (DAC) task of identifying Dialogue Acts (DAs) in different languages. Recently, Bidirectional Encoder Representations from Transformers (BERT) models have established state-of-the-art results for a variety of natural language processing tasks in different languages. Very little research has been done on Arabic dialogue act identification, and the BERT representation model has not yet been studied for Arabic dialogue act detection. In this paper, we propose a model using BERT language representations to identify Arabic Dialogue Acts. We explore the impact of using different BERT models: AraBERT Original (v0.1, v1), AraBERT Base (v0.2 and v2), and AraBERT Large (v0.2 and v2), which are pretrained on different Arabic corpora (differing in size, morphological segmentation, language model window, …). The comparison was performed on two available Arabic datasets. The AraBERTv0.2-base model for dialogue representation outperformed all other pretrained models. Moreover, we compared the performance of the AraBERTv0.2-base model to the state-of-the-art approaches applied to the two datasets; the comparison showed that this representation model outperformed both state-of-the-art models.
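
Fine-tuning AraBERT for dialogue act classification follows the standard Hugging Face transformers pattern, as in the hedged sketch below; the checkpoint id is the commonly published AraBERTv0.2-base id, and the label count is a placeholder, since it depends on the dataset's dialogue act inventory.

```python
# Hedged sketch of dialogue-act classification with AraBERTv0.2-base.
# The fine-tuning loop itself is omitted; the head below is untrained.
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL = "aubmindlab/bert-base-arabertv02"   # AraBERTv0.2-base checkpoint id
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=8)

utterance = "هل يمكنني حجز طاولة لشخصين؟"   # "Can I book a table for two?"
inputs = tokenizer(utterance, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))    # predicted dialogue-act id
```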

