Hierarchical Reinforcement Learning for Open-Domain Dialog

Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning (HRL), VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.
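The idea of applying policy gradients at the utterance level rather than the word level can be illustrated with a toy sketch. This is not the VHRL implementation: the Gaussian policy, the quadratic reward, the `target` vector, and all function names are hypothetical stand-ins. The sketch only shows a REINFORCE update in which the action is one continuous utterance-level vector, so the reward is credited to a single decision per utterance.

```python
import math
import random


def train_utterance_policy(target, steps=2000, sigma=0.5, lr=0.02, seed=0):
    """REINFORCE on the mean of a Gaussian policy over an utterance-level vector.

    `target` is a hypothetical embedding that a toy reward prefers; a real
    system would score the decoded utterance with conversation metrics.
    """
    rng = random.Random(seed)
    mu = [0.0] * len(target)   # policy mean, the only learned parameter here
    baseline = 0.0             # running average of rewards, reduces variance
    for _ in range(steps):
        # Sample one utterance-level action z ~ N(mu, sigma^2 I).
        z = [m + sigma * rng.gauss(0.0, 1.0) for m in mu]
        # Toy reward (assumption): negative squared distance to the target.
        reward = -sum((zi - ti) ** 2 for zi, ti in zip(z, target))
        advantage = reward - baseline
        # Gradient of log N(z; mu, sigma^2) with respect to mu is (z - mu) / sigma^2.
        mu = [m + lr * advantage * (zi - m) / sigma ** 2 for m, zi in zip(mu, z)]
        baseline = 0.9 * baseline + 0.1 * reward
    return mu


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Calling `train_utterance_policy([1.0, -1.0])` should move the policy mean from the origin toward the target; in a dialog setting the sampled vector would instead condition a decoder that produces the words.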

Hierarchical reinforcement learning for situated natural language generation

Natural Language Engineering ◽

10.1017/s1351324913000375 ◽

2014 ◽

Vol 21 (3) ◽

pp. 391-435 ◽

Cited By ~ 9

Author(s):

NINA DETHLEFS ◽

HERIBERTO CUAYÁHUITL

Keyword(s):

Reinforcement Learning ◽

Natural Language ◽

Natural Language Generation ◽

Evaluation Study ◽

Sufficient Information ◽

Language Generation ◽

Hierarchical Reinforcement Learning ◽

Novel Approach ◽

Large State Space ◽

Performance Results

Natural Language Generation systems in interactive settings often face a multitude of choices, given that the communicative effect of each utterance they generate depends crucially on the interplay between its physical circumstances, addressee and interaction history. This is particularly true in interactive and situated settings. In this paper we present a novel approach for situated Natural Language Generation in dialogue that is based on hierarchical reinforcement learning and learns the best utterance for a context by optimisation through trial and error. The model is trained from human–human corpus data and learns particularly to balance the trade-off between efficiency and detail in giving instructions: the user needs to be given sufficient information to execute their task, but without exceeding their cognitive load. We present results from simulation and a task-based human evaluation study comparing two different versions of hierarchical reinforcement learning: one operates using a hierarchy of policies with a large state space and local knowledge, and the other additionally shares knowledge across generation subtasks to enhance performance. Results show that sharing knowledge across subtasks achieves better performance than learning in isolation, leading to smoother and more successful interactions that are better perceived by human users.

HRLB^2: A Reinforcement Learning Based Framework for Believable Bots

Applied Sciences ◽

10.3390/app8122453 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2453 ◽

Cited By ~ 5

Author(s):

Christian Arzate Cruz ◽

Jorge Ramirez Uresti

Keyword(s):

Reinforcement Learning ◽

High Dimensional ◽

State Action ◽

Hierarchical Reinforcement Learning ◽

Learning Framework ◽

Novel Approach ◽

The Creation ◽

Action Spaces ◽

Human Player

The creation of believable behaviors for Non-Player Characters (NPCs) is key to improving the players’ experience while playing a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB^2). This novel approach has been designed to overcome two main challenges currently faced in the creation of human-like NPCs. The first difficulty is exploring domains with high-dimensional state–action spaces while satisfying constraints imposed by traits that characterize human-like behavior. The second problem is generating behavior diversity while also adapting to the opponent’s playing style. We evaluated the effectiveness of our framework in the domain of the 2D fighting game Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/368 ◽

2020 ◽

Author(s):

Johan Ferret ◽

Raphael Marinier ◽

Matthieu Geist ◽

Olivier Pietquin

Keyword(s):

Reinforcement Learning ◽

Transfer Learning ◽

Research Area ◽

Credit Assignment ◽

Learning Agents ◽

Reward Function ◽

Novel Approach ◽

Novel Environments ◽

New Perspective ◽

Structural Invariants

The ability to transfer knowledge to novel environments and tasks is a sensible desideratum for general learning agents. Despite the apparent promises, transfer in RL is still an open and little exploited research area. In this paper, we take a brand-new perspective on transfer: we suggest that the ability to assign credit unveils structural invariants in the tasks that can be transferred to make RL more sample-efficient. Our main contribution is SECRET, a novel approach to transfer learning for RL that uses a backward-view credit assignment mechanism based on a self-attentive architecture. Two aspects are key to its generality: it learns to assign credit as a separate offline supervised process and exclusively modifies the reward function. Consequently, it can be supplemented by transfer methods that do not modify the reward function and it can be plugged on top of any RL algorithm.

Hierarchical Reinforcement Learning for Pedagogical Policy Induction (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/647 ◽

2020 ◽

Author(s):

Guojing Zhou ◽

Hamoon Azizsoltani ◽

Markel Sanz Ausin ◽

Tiffany Barnes ◽

Min Chi

Keyword(s):

Reinforcement Learning ◽

Intelligent Tutoring Systems ◽

Data Driven ◽

Tutoring Systems ◽

Hierarchical Reinforcement Learning ◽

Pedagogical Decisions ◽

E Learning ◽

Long Term Impact ◽

Adaptive Decision Making

In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there has been growing interest in applying data-driven techniques for adaptive decision making that can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.

Management of Group Homes

Australian & New Zealand Journal of Psychiatry ◽

10.3109/00048677309159745 ◽

1973 ◽

Vol 7 (3) ◽

pp. 189-191

Author(s):

Philip R. Wood ◽

Sister M. Einodor

Keyword(s):

Mental Hospital ◽

Institutional Care ◽

Group Homes ◽

Short Term ◽

Chronic Patients ◽

Term Care ◽

Training Centre ◽

And Training ◽

Do So

Since 1968 fourteen group homes for chronic patients from the Ararat Mental Hospital and Training Centre have been developed. These have been highly successful in that patients who otherwise would not have been able to leave hospital have now been able to do so. Of the 58 patients who have been discharged to them, none have had to be readmitted permanently, and only four have been admitted for short-term care. Considering that the majority of patients have been transferred to us for long-term institutional care, we believe that the overall results are very satisfactory.

Collecting Words: A Clinical Example of a Morphology-Focused Orthographic Intervention

Language Speech and Hearing Services in Schools ◽

10.1044/2020_lshss-19-00050 ◽

2020 ◽

Vol 51 (3) ◽

pp. 544-560 ◽

Cited By ~ 3

Author(s):

Kimberly A. Murphy ◽

Emily A. Diehm

Keyword(s):

Written Language ◽

Morphological Knowledge ◽

English Orthography ◽

Word Level ◽

Novel Approach ◽

Language Pathology ◽

The One ◽

Critical Intervention ◽

Reading And Spelling ◽

Reading And Spelling Difficulties

Purpose: Morphological interventions promote gains in morphological knowledge and in other oral and written language skills (e.g., phonological awareness, vocabulary, reading, and spelling), yet we have a limited understanding of critical intervention features. In this clinical focus article, we describe a relatively novel approach to teaching morphology that considers its role as the key organizing principle of English orthography. We also present a clinical example of such an intervention delivered during a summer camp at a university speech and hearing clinic. Method: Graduate speech-language pathology students provided a 6-week morphology-focused orthographic intervention to children in first through fourth grade (n = 10) who demonstrated word-level reading and spelling difficulties. The intervention focused children's attention on morphological families, teaching how morphology is interrelated with phonology and etymology in English orthography. Results: Comparing pre- and posttest scores, children demonstrated improvement in reading and/or spelling abilities, with the largest gains observed in spelling affixes within polymorphemic words. Children and their caregivers reacted positively to the intervention. Therefore, data from the camp offer preliminary support for teaching morphology within the context of written words, and the intervention appears to be a feasible approach for simultaneously increasing morphological knowledge, reading, and spelling. Conclusion: Children with word-level reading and spelling difficulties may benefit from a morphology-focused orthographic intervention, such as the one described here. Research on the approach is warranted, and clinicians are encouraged to explore its possible effectiveness in their practice. Supplemental Material: https://doi.org/10.23641/asha.12290687

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/891 ◽

2019 ◽

Author(s):

Ritesh Noothigattu ◽

Djallel Bouneffouf ◽

Nicholas Mattei ◽

Rachita Chandra ◽

Piyush Madan ◽

...

Keyword(s):

Reinforcement Learning ◽

Ethical Values ◽

Large Role ◽

Learning To Learn ◽

Inverse Reinforcement Learning ◽

Time Step ◽

Novel Approach

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent on which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
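The orchestration step described above can be sketched as a small contextual bandit that selects one of two policies per time step and reports which one it used. This is a simplified illustration, not the paper's algorithm: the epsilon-greedy rule, the tabular value estimates, and the names `Orchestrator`, `act`, `"constrained"`, and `"reward"` are all assumptions made for the example.

```python
import random
from collections import defaultdict


class Orchestrator:
    """Epsilon-greedy contextual bandit over two arms (hypothetical simplification)."""

    ARMS = ("constrained", "reward")

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(float)  # (context, arm) -> running mean reward
        self.count = defaultdict(int)    # (context, arm) -> number of updates

    def choose(self, context):
        """Pick an arm for this context: explore with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ARMS)
        # Exploit: highest estimated value; ties go to the first arm listed.
        return max(self.ARMS, key=lambda arm: self.value[(context, arm)])

    def update(self, context, arm, reward):
        """Incrementally update the running mean reward for (context, arm)."""
        key = (context, arm)
        self.count[key] += 1
        self.value[key] += (reward - self.value[key]) / self.count[key]


def act(orchestrator, context, policies):
    """Return (arm, action) so the policy used at each step stays visible."""
    arm = orchestrator.choose(context)
    return arm, policies[arm](context)
```

Here `policies` is assumed to be a dictionary mapping each arm name to a callable that maps a context to an action. Returning the arm name alongside the action mirrors the transparency property mentioned in the abstract.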

Medium and Long-Term Stochastic Optimization of Hybrid Pumped Storage Reservoir via Reinforcement Learning Method

International Journal of Scientific and Research Publications (IJSRP) ◽

10.29322/ijsrp.8.11.2018.p8309 ◽

2018 ◽

Vol 8 (11) ◽

Author(s):

Daniel Eliote Mbanze ◽

Li Wenwu ◽

Zhang Xueying

Keyword(s):

Reinforcement Learning ◽

Stochastic Optimization ◽

Learning Method ◽

Storage Reservoir ◽

Pumped Storage

Levinas as a Reader of Jewish Texts

The Oxford Handbook of Levinas ◽

10.1093/oxfordhb/9780190455934.013.22 ◽

2018 ◽

pp. 442-458

Author(s):

Ethan Kleinberg

Keyword(s):

Reading Strategies ◽

Related Issue ◽

Written Texts ◽

The Holocaust ◽

Historical Moment ◽

And Training ◽

Do So

This article attempts to understand Levinas as a reader of Jewish texts, with particular attention paid to his Talmudic commentaries. To do so, the entangled relation between oral and written texts is explored; one must be able to properly “read” but also “write,” and there is the related issue of the methodology and training to be able to do so properly. Levinas offers commentary on each issue. Several interpretations of Talmudic texts and an important discussion of reading Scripture are analyzed in order to elucidate Levinas’s reading strategies, what this tells us about his relation to the larger tradition of Talmudic commentary, and Levinas’s particular historical moment, especially the role of the Holocaust for his approach to reading the Talmud and traditional texts.

A Novel Approach to Feedback Control with Deep Reinforcement Learning

IFAC-PapersOnLine ◽

10.1016/j.ifacol.2018.09.241 ◽

2018 ◽

Vol 51 (18) ◽

pp. 31-36 ◽

Cited By ~ 3

Author(s):

Yuan Wang ◽

Kirubakaran Velswamy ◽

Biao Huang

Keyword(s):

Reinforcement Learning ◽

Feedback Control ◽

Novel Approach