Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems

Despite its effectiveness as an adversarial training approach for multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and the policy generator. During optimization, the reward estimator often overwhelms the policy generator, producing largely uninformative gradients. We propose the variational reward estimator bottleneck (VRB), a novel and effective regularization strategy that constrains unproductive information flow between the inputs and the reward estimator. The VRB focuses on capturing discriminative features by applying an information bottleneck to the mutual information between inputs and the estimator's representation. Quantitative analysis on a multidomain task-oriented dialogue dataset demonstrates that the VRB significantly outperforms previous methods.
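
The abstract does not spell out the VRB objective. As a rough illustration only, the Python sketch below assumes the standard variational information bottleneck recipe (as used in the variational discriminator bottleneck): the reward estimator encodes each input into a diagonal Gaussian latent z, the mutual information I(X; Z) is upper-bounded by the average KL divergence to a standard-normal prior, and that average is kept under a budget i_c through a Lagrange multiplier beta updated by dual gradient ascent. The function names, i_c and beta_lr are hypothetical and are not taken from the VRB paper.

```python
import math
from typing import List, Sequence, Tuple


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one encoded input.

    Averaged over a batch, this is the usual variational upper bound on
    the mutual information I(X; Z) between inputs and the latent code.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def bottleneck_penalty(
    batch_mu: List[List[float]],
    batch_log_var: List[List[float]],
    beta: float,
    i_c: float,
) -> Tuple[float, float]:
    """Return (penalty, mean_kl), where penalty = beta * (E[KL] - i_c).

    Assumption: the penalty is added to the reward estimator's loss.
    """
    kls = [kl_to_standard_normal(m, lv) for m, lv in zip(batch_mu, batch_log_var)]
    mean_kl = sum(kls) / len(kls)
    return beta * (mean_kl - i_c), mean_kl


def update_beta(beta: float, mean_kl: float, i_c: float, beta_lr: float = 1e-5) -> float:
    """Dual gradient ascent on the Lagrange multiplier, clipped at zero.

    beta grows while the information budget i_c is exceeded and shrinks
    (down to 0) when the constraint is slack.
    """
    return max(0.0, beta + beta_lr * (mean_kl - i_c))
```

Clipping beta at zero keeps the multiplier valid for an inequality constraint: when the average KL stays under the budget, beta decays and the bottleneck stops tightening further, rather than continuing to squeeze information out of the estimator.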

Transfer reinforcement learning for task-oriented dialogue systems

10.14711/thesis-991012596567703412 ◽

2018 ◽

Author(s):

Kaixiang Mo

Keyword(s):

Reinforcement Learning ◽

Dialogue Systems ◽

Task Oriented

Combining Reinforcement Learning with Supervised Learning for Sepsis Treatment

10.1145/3426020.3426077 ◽

2020 ◽

Author(s):

Thanh Cong Do ◽

Hyung Jeong Yang ◽

Seok Bong Yoo ◽

In-Jae Oh

Keyword(s):

Reinforcement Learning ◽

Supervised Learning

Spoken Language Understanding for Task-oriented Dialogue Systems with Augmented Memory Networks

10.18653/v1/2021.naacl-main.63 ◽

2021 ◽

Author(s):

Jie Wu ◽

Ian Harris ◽

Hongzhi Zhao

Keyword(s):

Spoken Language ◽

Dialogue Systems ◽

Language Understanding ◽

Spoken Language Understanding ◽

Task Oriented

A strategy learning model for autonomous agents based on classification

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2015-0035 ◽

2015 ◽

Vol 25 (3) ◽

pp. 471-482 ◽

Cited By ~ 7

Author(s):

Bartłomiej Śnieżyński

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Learning Process ◽

Autonomous Agents ◽

Good Alternative ◽

Learning Model ◽

Learning Method ◽

Complex Environments ◽

Agent Based ◽

Proposed Model

In this paper we propose a strategy learning model for autonomous agents based on classification. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action for execution. Experimental results show that this model can be successfully applied to strategy generation even when rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain and configurations of various complexity. In complex environments, supervised learning can improve agent performance much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge can be analyzed by humans, which allows the learning process to be tracked.
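
To make the idea concrete, here is a minimal Python sketch of learning a strategy as a classifier. It assumes discrete state features and copes with delayed rewards by labelling every (state, action) pair of an episode with that episode's final outcome; a 1-nearest-neighbour classifier (Hamming distance) then maps new states to actions. The state encoding, the threshold parameter and both helpers are hypothetical; the paper's own classifiers and knowledge representation may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

State = Tuple[int, ...]
Episode = List[Tuple[State, Hashable]]  # (state, action) pairs from one episode


def build_training_set(
    episodes: Sequence[Episode], returns: Sequence[float], threshold: float
) -> List[Tuple[State, Hashable]]:
    """Keep every (state, action) pair from episodes whose total return
    reaches `threshold`.

    Labelling whole episodes by their final outcome needs no per-step
    reward, which is one simple way to handle delayed rewards.
    """
    examples: List[Tuple[State, Hashable]] = []
    for episode, ret in zip(episodes, returns):
        if ret >= threshold:
            examples.extend(episode)
    return examples


class NearestNeighbourPolicy:
    """1-nearest-neighbour classifier mapping a state to an action.

    Actions seen at equally close training states are pooled and the
    most frequent one wins. Needs at least one training example;
    `act` raises ValueError otherwise.
    """

    def __init__(self, examples: Sequence[Tuple[State, Hashable]]):
        self.votes: Dict[State, Counter] = defaultdict(Counter)
        for state, action in examples:
            self.votes[state][action] += 1

    def act(self, state: State) -> Hashable:
        def hamming(other: State) -> int:
            return sum(a != b for a, b in zip(state, other))

        best = min(hamming(s) for s in self.votes)
        tally: Counter = Counter()
        for s, counts in self.votes.items():
            if hamming(s) == best:
                tally.update(counts)
        return tally.most_common(1)[0][0]
```

Because the learned "knowledge" is just a table of labelled examples, it can be inspected directly, which loosely mirrors the abstract's point that a suitable representation lets humans analyze what was learned.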

The method of building expectation model in task-oriented dialogue systems and its realization algorithms

International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 ◽

10.1109/nlpke.2003.1275890 ◽

2004 ◽

Author(s):

Bei Liu ◽

Limin Du ◽

Shuiyuan Yu

Keyword(s):

Dialogue Systems ◽

Expectation Model ◽

Task Oriented

A MULTILINGUAL APPROACH TO TASK-ORIENTED MAN-MACHINE DIALOGUE BY VOICE

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001488000339 ◽

1988 ◽

Vol 02 (03) ◽

pp. 573-588

Author(s):

PHILIPPE MORIN ◽

JEAN-PAUL HATON ◽

JEAN-MARIE PIERREL ◽

GUENTHER RUSKE ◽

WALTER WEIGEL

Keyword(s):

Speech Recognition ◽

Electronic Mail ◽

Dialogue Systems ◽

Human Speech ◽

Artificial Languages ◽

Man Machine Dialogue ◽

Machine Communication ◽

Task Oriented ◽

Multilingual Approach ◽

Multimedia Interfaces

In the framework of man-machine communication, oral dialogue holds a special place, since human speech offers several advantages whether used alone or within multimedia interfaces. The last decade has seen a proliferation of research on speech recognition and understanding, but few systems have been designed to manage and understand an actual man-machine dialogue. The PARTNER system described in this paper offers a solution for task-oriented dialogue using artificial languages. A description of the essential characteristics of dialogue systems is followed by a presentation of the architecture and principles of the PARTNER system. Finally, we present the most recent results obtained in managing electronic mail by voice in French and German.

DDPG Agent to Swing Up and Balance Cart-Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns from labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environment. Robotics is a field in which reinforcement learning has been notably successful [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of the input data (e.g., visual input). In recent years, deep neural networks have been used successfully in supervised learning to extract meaning from this kind of data. Building on these advances, deep reinforcement learning has been used to solve complex problems such as Atari games and Go. Mnih et al. [1] built a system that, with a single fixed set of hyperparameters, learned to play 49 different Atari games from raw pixel inputs alone. However, to apply the same methods to real-world control problems, deep reinforcement learning must be able to deal with continuous action spaces. Discretizing continuous action spaces scales poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parameterized policy can be advantageous because it can generalize across the action space. Therefore, in this thesis we study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We compare it theoretically with other popular methods, evaluate its performance, identify its limitations, and investigate future research directions. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing on deep learning and reinforcement learning.
We then describe in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). Next, we provide implementation details of DDPG and our test environment, followed by a description of the benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
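
The scaling argument against discretizing continuous actions is easy to check numerically. The Python sketch below (both helper names are hypothetical) builds a uniform grid over a box-shaped action space: with only three levels per dimension, a 7-dimensional action already yields 3^7 = 2187 joint actions, and each added dimension multiplies the count by three. A deterministic actor such as DDPG's instead outputs a real-valued action vector directly, so no such enumeration is needed.

```python
from itertools import product
from typing import List, Sequence, Tuple


def grid_action_count(num_dims: int, bins_per_dim: int) -> int:
    """Number of joint actions when each of `num_dims` dimensions gets `bins_per_dim` levels."""
    return bins_per_dim ** num_dims


def discretize_box(
    low: Sequence[float], high: Sequence[float], bins_per_dim: int
) -> List[Tuple[float, ...]]:
    """Enumerate every joint action on a uniform grid over [low, high] (inclusive)."""
    if bins_per_dim < 2:
        raise ValueError("need at least two levels per dimension")
    axes = [
        [lo + (hi - lo) * i / (bins_per_dim - 1) for i in range(bins_per_dim)]
        for lo, hi in zip(low, high)
    ]
    # Cartesian product of the per-dimension levels: bins_per_dim ** len(low) tuples.
    return list(product(*axes))
```

A value-based method such as DQN must evaluate or maximize over every one of these grid points at each step, which is why the exponential count matters in practice.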

End-to-end optimization of goal-driven and visually grounded dialogue systems

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/385 ◽

2017 ◽

Cited By ~ 16

Author(s):

Florian Strub ◽

Harm de Vries ◽

Jérémie Mary ◽

Bilal Piot ◽

Aaron Courville ◽

...

Keyword(s):

Gradient Algorithm ◽

Planning Problem ◽

Dialogue Systems ◽

Generation Task ◽

Policy Gradient ◽

Specific Object ◽

Complex Picture ◽

History Of ◽

End To End ◽

Task Oriented

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming to predict the next utterance of a participant given the full dialogue history. This view may fail to capture the planning problem inherent to dialogue, as well as its contextual and grounded nature. In this paper, we introduce a deep reinforcement learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. The approach is tested on the question generation task of the GuessWhat?! dataset, which contains 120k dialogues, and gives encouraging results both for generating natural dialogues and for discovering a specific object in a complex picture.
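
The policy gradient idea can be illustrated on a one-step toy problem. The Python sketch below is plain REINFORCE with a running-average baseline and a softmax policy over a few candidate questions; it is not the paper's model, which generates multi-turn, visually grounded dialogues with neural networks. The reward function and the success rates used in the example are hypothetical.

```python
import math
import random
from typing import Callable, List


def softmax(prefs: List[float]) -> List[float]:
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(
    reward_fn: Callable[[int, random.Random], float],
    num_actions: int,
    episodes: int = 2000,
    lr: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """REINFORCE with a running-average baseline on one-step episodes.

    For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k), so
    each update moves the preferences by lr * (reward - baseline) * that term.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * num_actions
    baseline = 0.0
    for t in range(1, episodes + 1):
        probs = softmax(theta)
        action = rng.choices(range(num_actions), weights=probs)[0]
        reward = reward_fn(action, rng)
        advantage = reward - baseline  # baseline uses past rewards only
        for k in range(num_actions):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log
        baseline += (reward - baseline) / t  # running mean of rewards
    return softmax(theta)
```

For instance, with three hypothetical questions whose success rates are 0.2, 0.5 and 0.9, `reinforce(lambda a, rng: 1.0 if rng.random() < [0.2, 0.5, 0.9][a] else 0.0, num_actions=3)` should concentrate most of the probability mass on the third question. Because the baseline depends only on earlier rewards, subtracting it leaves the gradient estimate unbiased while reducing its variance.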

I4R: Promoting Deep Reinforcement Learning by the Indicator for Expressive Representations

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/370 ◽

2020 ◽

Author(s):

Xufang Luo ◽

Qi Meng ◽

Di He ◽

Wei Chen ◽

Yunhong Wang

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Singular Values ◽

Regularization Term ◽

Policy Gradient

Learning expressive representations is crucial for well-performing policies in deep reinforcement learning (DRL). Unlike in supervised learning, accurate targets are not always available in DRL, and inputs that call for different actions may differ only slightly, which increases the need for expressive representations. In this paper, we first empirically compare the representations of DRL models with different performance levels. We observe that, when visualized, the representations of a better state extractor (SE) are more scattered than those of a worse one. We therefore investigate the singular values of the representation matrix and find that better SEs consistently correspond to smaller differences among these singular values. Based on these observations, we define an indicator of the representations for DRL models: the Number of Significant Singular Values (NSSV) of a representation matrix. We then propose the I4R algorithm, which improves DRL algorithms by adding a corresponding regularization term that enhances the NSSV. Finally, we apply I4R to both policy gradient and value-based algorithms on Atari games, and the results demonstrate the superiority of the proposed method.
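
The singular-value indicator can be computed directly. The stdlib-only Python sketch below treats each row of the matrix as one state representation from a batch, obtains the singular values as square roots of the eigenvalues of the Gram matrix M^T M (via the cyclic Jacobi method), and counts those that are at least `ratio` times the largest one. That relative threshold is our assumption: the abstract does not define what makes a singular value "significant", and the paper's rule may differ.

```python
import math
from typing import List


def symmetric_eigenvalues(a: List[List[float]], sweeps: int = 50, tol: float = 1e-12) -> List[float]:
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def singular_values(m: List[List[float]]) -> List[float]:
    """Singular values of m (rows = samples), largest first."""
    rows, cols = len(m), len(m[0])
    gram = [[sum(m[r][i] * m[r][j] for r in range(rows)) for j in range(cols)] for i in range(cols)]
    eig = symmetric_eigenvalues(gram)
    return sorted((math.sqrt(max(e, 0.0)) for e in eig), reverse=True)


def nssv(m: List[List[float]], ratio: float = 0.1) -> int:
    """Count singular values >= ratio * largest (hypothetical threshold rule)."""
    sv = singular_values(m)
    if sv[0] == 0.0:
        return 0
    return sum(1 for s in sv if s >= ratio * sv[0])
```

A collapsed representation, where every row is a multiple of one vector, has rank one and therefore an NSSV of 1, while scattered representations spread energy over more singular values. Forming M^T M squares the condition number, so very small singular values lose precision; that matters little here because NSSV only asks which values clear a threshold relative to the largest, but a library SVD routine would be the practical choice.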
