Generating Persona Consistent Dialogues by Exploiting Natural Language Inference

Consistency is one of the major challenges faced by dialogue agents. A human-like dialogue agent should not only respond naturally, but also maintain a consistent persona. In this paper, we exploit the advantages of natural language inference (NLI) technique to address the issue of generating persona consistent dialogues. Different from existing work that re-ranks the retrieved responses through an NLI model, we cast the task as a reinforcement learning problem and propose to exploit the NLI signals from response-persona pairs as rewards for the process of dialogue generation. Specifically, our generator employs an attention-based encoder-decoder to generate persona-based responses. Our evaluator consists of two components: an adversarially trained naturalness module and an NLI based consistency module. Moreover, we use another well-performed NLI model in the evaluation of persona-consistency. Experimental results on both human and automatic metrics, including the model-based consistency evaluation, demonstrate that the proposed approach outperforms strong generative baselines, especially in the persona-consistency of generated responses.

Download Full-text

Model-Based Multi-Objective Reinforcement Learning by a Reward Occurrence Probability Vector

Advanced Robotics and Intelligent Automation in Manufacturing - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-1382-8.ch010 ◽

2020 ◽

pp. 269-295

Author(s):

Tomohiro Yamaguchi ◽

Shota Nagahama ◽

Yoshihiro Ichikawa ◽

Yoshimichi Honma ◽

Keiki Takadama

Keyword(s):

Reinforcement Learning ◽

Experimental Results ◽

Previous Model ◽

Pareto Optimal ◽

Occurrence Probability ◽

Contrast Model ◽

Multi Objective ◽

Model Based ◽

Model Free ◽

Optimal Policies

This chapter describes solving multi-objective reinforcement learning (MORL) problems where there are multiple conflicting objectives with unknown weights. Previous model-free MORL methods take large number of calculations to collect a Pareto optimal set for each V/Q-value vector. In contrast, model-based MORL can reduce such a calculation cost than model-free MORLs. However, previous model-based MORL method is for only deterministic environments. To solve them, this chapter proposes a novel model-based MORL method by a reward occurrence probability (ROP) vector with unknown weights. The experimental results are reported under the stochastic learning environments with up to 10 states, 3 actions, and 3 reward rules. The experimental results show that the proposed method collects all Pareto optimal policies, and it took about 214 seconds (10 states, 3 actions, 3 rewards) for total learning time. In future research directions, the ways to speed up methods and how to use non-optimal policies are discussed.

Download Full-text

Incremental Learning of Planning Actions in Model-Based Reinforcement Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/443 ◽

2019 ◽

Author(s):

Jun Hao Alvin Ng ◽

Ronald P. A. Petrick

Keyword(s):

Reinforcement Learning ◽

Intrinsic Motivation ◽

Incremental Learning ◽

Domain Model ◽

Experimental Results ◽

The Novel ◽

Model Based ◽

Domain Models ◽

Novel Concept ◽

Planning Problems

The soundness and optimality of a plan depends on the correctness of the domain model. Specifying complete domain models can be difficult when interactions between an agent and its environment are complex. We propose a model-based reinforcement learning (MBRL) approach to solve planning problems with unknown models. The model is learned incrementally over episodes using only experiences from the current episode which suits non-stationary environments. We introduce the novel concept of reliability as an intrinsic motivation for MBRL, and a method to learn from failure to prevent repeated instances of similar failures. Our motivation is to improve the learning efficiency and goal-directedness of MBRL. We evaluate our work with experimental results for three planning domains.

Download Full-text

Shaping Model-Free Reinforcement-Learning with Model-Based Pseudorewards

10.32470/ccn.2018.1191-0 ◽

2018 ◽

Author(s):

Paul Krueger ◽

Thomas Griffiths

Keyword(s):

Reinforcement Learning ◽

Model Based ◽

Model Free

Download Full-text

Model-Based and Model-Free Social Cognition

10.31234/osf.io/ue6j2 ◽

2019 ◽

Author(s):

Leor M Hackel ◽

Jeffrey Jordan Berg ◽

Björn Lindström ◽

David Amodio

Keyword(s):

Reinforcement Learning ◽

Social Cognition ◽

Learning Strategies ◽

Memory Systems ◽

Learning Task ◽

Financial Advisors ◽

Model Based ◽

Model Free ◽

Systems Model ◽

Task Assessment

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.

Download Full-text

Faculty Opinions recommendation of States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.4125957.4076054 ◽

2010 ◽

Author(s):

Susan Courtney

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Model Based ◽

Model Free

Download Full-text

Generalized analytical model based on harmonic coupling for hybrid plasmonic modes: comparison with numerical and experimental results

Optics Express ◽

10.1364/oe.23.027376 ◽

2015 ◽

Vol 23 (21) ◽

pp. 27376 ◽

Cited By ~ 8

Author(s):

Mitradeep Sarkar ◽

Jean-François Bryche ◽

Julien Moreau ◽

Mondher Besbes ◽

Grégory Barbillon ◽

...

Keyword(s):

Analytical Model ◽

Experimental Results ◽

Model Based

Download Full-text

Bestärkendes Lernen mittels Offline-Trajektorienplanung basierend auf iterativ approximierten Modellen

at - Automatisierungstechnik ◽

10.1515/auto-2020-0024 ◽

2020 ◽

Vol 68 (8) ◽

pp. 612-624

Author(s):

Max Pritzkoleit ◽

Robert Heedt ◽

Carsten Knoll ◽

Klaus Röbenack

Keyword(s):

Reinforcement Learning ◽

Neuronale Netze ◽

Model Based ◽

Künstliche Neuronale Netze

ZusammenfassungIn diesem Beitrag nutzen wir Künstliche Neuronale Netze (KNN) zur Approximation der Dynamik nichtlinearer (mechanischer) Systeme. Diese iterativ approximierten neuronalen Systemmodelle werden in einer Offline-Trajektorienplanung verwendet, um eine optimale Rückführung zu bestimmen, welche auf das reale System angewandt wird. Dieser Ansatz des modellbasierten bestärkenden Lernens (engl. model-based reinforcement learning (RL)) wird am Aufschwingen des Einfachwagenpendels zunächst simulativ evaluiert und zeigt gegenüber modellfreien RL-Ansätzen eine signifikante Verbesserung der Dateneffizienz. Weiterhin zeigen wir Experimentalergebnisse an einem Versuchsstand, wobei der vorgestellte Algorithmus innerhalb weniger Versuche in der Lage ist, eine für das System optimale Rückführung hinreichend gut zu approximieren.

Download Full-text