Hierarchical Reinforcement Learning for Open-Domain Dialog

2020 ◽  
Vol 34 (05) ◽  
pp. 8741-8748
Author(s):  
Abdelrhman Saleh ◽  
Natasha Jaques ◽  
Asma Ghandeharioun ◽  
Judy Shen ◽  
Rosalind Picard

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.

2014 ◽  
Vol 21 (3) ◽  
pp. 391-435 ◽  
Author(s):  
NINA DETHLEFS ◽  
HERIBERTO CUAYÁHUITL

Natural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach for situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: One operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.


2018 ◽  
Vol 8 (12) ◽  
pp. 2453 ◽  
Author(s):  
Christian Arzate Cruz ◽  
Jorge Ramirez Uresti

The creation of believable behaviors for Non-Player Characters (NPCs) is key to improving the players’ experience while playing a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB^2). This novel approach has been designed so it can overcome two main challenges currently faced in the creation of human-like NPCs. The first difficulty is exploring domains with high-dimensional state–action spaces, while satisfying constraints imposed by traits that characterize human-like behavior. The second problem is generating behavior diversity, by also adapting to the opponent’s playing style. We evaluated the effectiveness of our framework in the domain of the 2D fighting game named Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.


Author(s):  
Johan Ferret ◽  
Raphael Marinier ◽  
Matthieu Geist ◽  
Olivier Pietquin

The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents. Despite the apparent promises, transfer in RL is still an open and little exploited research area. In this paper, we take a brand-new perspective on transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to make RL more sample-efficient. Our main contribution is SECRET, a novel approach to transfer learning for RL that uses a backward-view credit assignment mechanism based on a self-attentive architecture. Two aspects are key to its generality: it learns to assign credit as a separate offline supervised process and exclusively modifies the reward function. Consequently, it can be supplemented by transfer methods that do not modify the reward function and it can be plugged on top of any RL algorithm.


Author(s):  
Guojing Zhou ◽  
Hamoon Azizsoltani ◽  
Markel Sanz Ausin ◽  
Tiffany Barnes ◽  
Min Chi

In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there has been growing interest in applying data-driven techniques for adaptive decision making that can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.


1973 ◽  
Vol 7 (3) ◽  
pp. 189-191
Author(s):  
Philip R. Wood ◽  
Sister M. Einodor

Since 1968 fourteen group homes for chronic patients from the Ararat Mental Hospital and Training Centre have been developed. These have been highly successful in that patients who otherwise would not have been able to leave hospital have now been able to do so. Of the 58 patients who have been discharged to them, none have had to be readmitted permanently, and only four have been admitted for short-term care. Considering that the majority of patients have been transferred to us for long-term institutional care, we believe that the overall results are very satisfactory.


2020 ◽  
Vol 51 (3) ◽  
pp. 544-560 ◽  
Author(s):  
Kimberly A. Murphy ◽  
Emily A. Diehm

Purpose: Morphological interventions promote gains in morphological knowledge and in other oral and written language skills (e.g., phonological awareness, vocabulary, reading, and spelling), yet we have a limited understanding of critical intervention features. In this clinical focus article, we describe a relatively novel approach to teaching morphology that considers its role as the key organizing principle of English orthography. We also present a clinical example of such an intervention delivered during a summer camp at a university speech and hearing clinic. Method: Graduate speech-language pathology students provided a 6-week morphology-focused orthographic intervention to children in first through fourth grade (n = 10) who demonstrated word-level reading and spelling difficulties. The intervention focused children's attention on morphological families, teaching how morphology is interrelated with phonology and etymology in English orthography. Results: Comparing pre- and posttest scores, children demonstrated improvement in reading and/or spelling abilities, with the largest gains observed in spelling affixes within polymorphemic words. Children and their caregivers reacted positively to the intervention. Therefore, data from the camp offer preliminary support for teaching morphology within the context of written words, and the intervention appears to be a feasible approach for simultaneously increasing morphological knowledge, reading, and spelling. Conclusion: Children with word-level reading and spelling difficulties may benefit from a morphology-focused orthographic intervention, such as the one described here. Research on the approach is warranted, and clinicians are encouraged to explore its possible effectiveness in their practice. Supplemental Material: https://doi.org/10.23641/asha.12290687


Author(s):  
Ritesh Noothigattu ◽  
Djallel Bouneffouf ◽  
Nicholas Mattei ◽  
Rachita Chandra ◽  
Piyush Madan ◽  
...  

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.


Author(s):  
Ethan Kleinberg

This article attempts to understand Levinas as a reader of Jewish texts, with particular attention paid to his Talmudic commentaries. To do so, the entangled relation between oral and written texts is explored; one must be able to properly “read” but also “write,” and there is the related issue of the methodology and training to be able to do so properly. Levinas offers commentary on each issue. Several interpretations of Talmudic texts and an important discussion of reading Scripture are analyzed in order to elucidate Levinas’s reading strategies, what this tells us about his relation to the larger tradition of Talmudic commentary, and Levinas’s particular historical moment, especially the role of the Holocaust for his approach to reading the Talmud and traditional texts.


2018 ◽  
Vol 51 (18) ◽  
pp. 31-36 ◽  
Author(s):  
Yuan Wang ◽  
Kirubakaran Velswamy ◽  
Biao Huang
