A Comprehensive Reinforcement Learning Framework for Dialogue Management Optimization

2012 ◽  
Vol 6 (8) ◽  
pp. 891-902 ◽  
Author(s):  
Lucie Daubigney ◽  
Matthieu Geist ◽  
Senthilkumar Chandramohan ◽  
Olivier Pietquin

2011 ◽  
Vol 7 (3) ◽  
pp. 1-21 ◽  
Author(s):  
Olivier Pietquin ◽  
Matthieu Geist ◽  
Senthilkumar Chandramohan ◽  
Hervé Frezza-Buet

Author(s):  
Tulika Saha ◽  
Dhawal Gupta ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Building Virtual Agents capable of handling complex user queries that span multiple intents of a domain is challenging, because the agent must manage several subtasks simultaneously. This article presents a universal Deep Reinforcement Learning framework that can synthesize dialogue managers for a task-oriented dialogue system encompassing various intents pertaining to a domain. The conversation between agent and user is broken down into hierarchies to segregate the subtasks pertinent to different intents. Hierarchical Reinforcement Learning, specifically the options framework, is used to learn policies at different levels of the hierarchy that operate on distinct time scales to fulfill the user query successfully. The dialogue manager comprises a top-level intent meta-policy that selects among subtasks (options) and a low-level controller policy that picks primitive actions to communicate with the user and complete the subtask assigned to it by the top-level policy, across the varying intents of a domain. The proposed dialogue management module is trained so that it can be reused, with little to no supervision, for any language for which the system has been developed. The system has been demonstrated on the “Air Travel” and “Restaurant” domains in English and Hindi. Empirical results demonstrate the robustness and efficacy of the learned dialogue policy, which outperforms several baselines and a state-of-the-art system.
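
A minimal sketch of an options-style hierarchical dialogue manager in this spirit: a top-level meta-policy picks an intent subtask (option), and a low-level controller picks primitive dialogue acts until that option terminates. The intent names, action names, and the `env` interface below are illustrative assumptions, not the paper's actual design.

```python
# Sketch: options-based hierarchical dialogue management with tabular Q-learning.
# All names (INTENTS, ACTIONS, env methods) are hypothetical placeholders.
import random
from collections import defaultdict

INTENTS = ["book_flight", "find_restaurant"]            # top-level options
ACTIONS = ["request_slot", "confirm", "inform", "end"]  # primitive dialogue acts

meta_q = defaultdict(float)   # Q(state, option) for the intent meta-policy
ctrl_q = defaultdict(float)   # Q((state, option), action) for the controller
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def select(q, context, choices):
    """Epsilon-greedy selection over a tabular Q function."""
    if random.random() < EPS:
        return random.choice(choices)
    return max(choices, key=lambda c: q[(context, c)])

def run_dialogue(env):
    """env is assumed to expose reset(), step(action) -> (state, reward, done),
    and option_done(option) signalling subtask termination."""
    state = env.reset()
    done = False
    while not done:
        option = select(meta_q, state, INTENTS)
        opt_state, opt_return, discount = state, 0.0, 1.0
        # Low-level controller acts until the subtask (option) terminates.
        while not env.option_done(option) and not done:
            action = select(ctrl_q, (state, option), ACTIONS)
            next_state, reward, done = env.step(action)
            # One-step Q-learning update for the controller.
            best_next = max(ctrl_q[((next_state, option), a)] for a in ACTIONS)
            td = reward + GAMMA * best_next - ctrl_q[((state, option), action)]
            ctrl_q[((state, option), action)] += ALPHA * td
            opt_return += discount * reward
            discount *= GAMMA
            state = next_state
        # SMDP-style update for the meta-policy over the completed option.
        best_next_opt = max(meta_q[(state, o)] for o in INTENTS)
        td = opt_return + discount * best_next_opt - meta_q[(opt_state, option)]
        meta_q[(opt_state, option)] += ALPHA * td
```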


2011 ◽  
Vol 2011 ◽  
pp. 1-12 ◽  
Author(s):  
Karim El-Laithy ◽  
Martin Bogdan

An integration of Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework lets the Hebbian rule update the hidden synaptic model parameters regulating the synaptic response, rather than the synaptic weights. This is done using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is trained to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated from the distance between the network's output spike train and a reference target spike train. Results show that the network captures the required dynamics and that the proposed framework indeed realizes an integrated version of Hebbian and RL learning. The framework is tractable and comparatively inexpensive computationally; it is applicable to a wide class of synaptic models and is not restricted to the neural representation used here. This generality, together with the reported results, supports adopting the introduced approach to exploit biologically plausible synaptic models in a wide range of signal processing tasks.
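
A toy sketch of the idea of gating a Hebbian update of a hidden synaptic parameter (rather than the weight) by the reward's temporal difference. The surrogate activity, reward, and parameter names are assumptions for illustration only, not the paper's exact rule.

```python
# Sketch: reward-TD-gated Hebbian update of a hidden dynamic-synapse parameter `u`.
# The network dynamics and reward are stand-ins; only the update structure matters.
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 4, 2
u = rng.uniform(0.2, 0.8, size=(n_pre, n_post))  # hidden synaptic parameter (updated)
w = np.ones((n_pre, n_post))                     # synaptic weights (kept fixed)
eta = 0.05
prev_reward = 0.0

def trial(u):
    """Stand-in for one run of the spiking network; reward is assumed to grow as
    the output spike train gets closer to a reference target."""
    pre = (rng.random(n_pre) < 0.5).astype(float)    # surrogate pre-synaptic activity
    post = ((pre @ (w * u)) > 1.0).astype(float)     # surrogate post-synaptic activity
    reward = -abs(post.sum() - 1.0)                  # toy distance-based reward
    return pre, post, float(reward)

for _ in range(100):
    pre, post, reward = trial(u)
    td = reward - prev_reward                        # temporal difference of the reward
    # Both the sign and the value of the TD scale the Hebbian correlation term.
    u += eta * td * np.outer(pre, post)
    u = np.clip(u, 0.05, 1.0)                        # keep the parameter in a plausible range
    prev_reward = reward
```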


2021 ◽  
pp. 1-1
Author(s):  
Syed Khurram Mahmud ◽  
Yuanwei Liu ◽  
Yue Chen ◽  
Kok Keong Chai

2021 ◽  
Vol 12 (6) ◽  
pp. 1-23
Author(s):  
Shuo Tao ◽  
Jingang Jiang ◽  
Defu Lian ◽  
Kai Zheng ◽  
Enhong Chen

Mobility prediction plays an important role in a wide range of location-based applications and services. However, the existing literature has three problems: (1) explicit high-order interactions of spatio-temporal features are not systematically modeled; (2) most existing algorithms place attention mechanisms on top of a recurrent network, so they do not allow full parallelism and are inferior to self-attention at capturing long-range dependencies; (3) most work does not make good use of long-term historical information and does not effectively model users' long-term periodicity. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model that predicts each user’s next destination from her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. By applying self-attention to both the most recent visits and the historical trajectory, MoveNet captures the user’s long-term regularity more efficiently. To model long-term periodicity more effectively, we add a reinforcement learning layer on top of MoveNet, yielding RLMoveNet. RLMoveNet casts human mobility prediction as a reinforcement learning problem and uses the reinforcement learning layer as a regularizer that drives the model to attend to periodic behavior. We evaluate both models on three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, while converging faster and achieving over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, showing that modeling periodicity explicitly from a reinforcement learning perspective is more effective.
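
A minimal PyTorch sketch of a self-attention next-location predictor in this spirit, attending jointly over recent and historical visits. Layer sizes, the additive spatio-temporal fusion, and all names are illustrative assumptions; the paper's cross-based feature-interaction layer and RL regularizer are not reproduced here.

```python
# Sketch: self-attention over recent + historical visits for next-location prediction.
import torch
import torch.nn as nn

class NextLocationModel(nn.Module):
    def __init__(self, n_locations, n_time_slots, d_model=64, n_heads=4):
        super().__init__()
        self.loc_emb = nn.Embedding(n_locations, d_model)
        self.time_emb = nn.Embedding(n_time_slots, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.out = nn.Linear(d_model, n_locations)

    def forward(self, recent_locs, recent_times, hist_locs, hist_times):
        # Spatio-temporal features interact by simple addition here; an explicit
        # cross-based interaction layer would replace this in the full model.
        recent = self.loc_emb(recent_locs) + self.time_emb(recent_times)
        history = self.loc_emb(hist_locs) + self.time_emb(hist_times)
        seq = torch.cat([history, recent], dim=1)  # self-attention spans both trajectories
        h = self.encoder(seq)
        return self.out(h[:, -1])                  # logits over candidate next locations

# Toy usage: batch of 2 users, 5 recent visits, 20 historical visits.
model = NextLocationModel(n_locations=1000, n_time_slots=48)
recent = torch.randint(0, 1000, (2, 5)); rt = torch.randint(0, 48, (2, 5))
hist = torch.randint(0, 1000, (2, 20)); ht = torch.randint(0, 48, (2, 20))
logits = model(recent, rt, hist, ht)               # shape: (2, 1000)
```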

