An End-to-End Neural Dialog State Tracking for Task-Oriented Dialogs

Author(s):  
A-Yeong Kim ◽  
Tae-Hyeong Kim ◽  
Hyun-Je Song ◽  
Seong-Bae Park
2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
A-Yeong Kim ◽  
Hyun-Je Song ◽  
Seong-Bae Park

Dialog state tracking in a spoken dialog system is the task of following the flow of a dialog and accurately identifying what a user wants from each utterance. Since the success of a dialog depends on the system's ability to capture the user's requirements, accurate state tracking is important for spoken dialog systems. This paper proposes a two-step neural dialog state tracker composed of an informativeness classifier and a neural tracker. The informativeness classifier, implemented as a CNN, first filters out noninformative utterances in a dialog. The neural tracker then estimates dialog states from the remaining informative utterances. The tracker adopts the attention mechanism and the hierarchical softmax for better performance and faster training. To evaluate the proposed model, we conduct experiments on dialog state tracking in human-human task-oriented dialogs with the standard DSTC4 data set. The experimental results show that the proposed model outperforms neural trackers that lack the informativeness classifier, the attention mechanism, or the hierarchical softmax.
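The abstract does not say how the hierarchical softmax is organized, so the following is only a minimal sketch of the general idea, assuming a two-level grouping of labels. The cluster layout, the scores, and the function names are all hypothetical. The point it illustrates is that the probability of one label factors into P(cluster) times P(label | cluster), so scoring a single label touches only the cluster scores and the scores inside one cluster rather than every label.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_prob(cluster_scores, within_scores, cluster, index):
    """P(label) = P(cluster) * P(label | cluster).

    Only the cluster scores and the scores of one cluster are normalized,
    which is where the speed-up over a flat softmax comes from.
    """
    p_cluster = softmax(cluster_scores)[cluster]
    p_within = softmax(within_scores[cluster])[index]
    return p_cluster * p_within


# Hypothetical scores: 2 clusters holding 3 and 2 labels respectively.
cluster_scores = [1.0, 0.5]
within_scores = [[0.2, 1.3, -0.4], [0.0, 2.0]]

# Summing over every (cluster, label) pair should give a valid distribution.
total = sum(
    two_level_prob(cluster_scores, within_scores, c, i)
    for c, group in enumerate(within_scores)
    for i in range(len(group))
)
print(round(total, 6))
```

With N labels split into roughly sqrt(N) clusters, each lookup normalizes on the order of 2·sqrt(N) scores instead of N, which is the usual motivation for this factorization.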


2020 ◽  
Vol 34 (05) ◽  
pp. 8107-8114
Author(s):  
Adarsh Kumar ◽  
Peter Ku ◽  
Anuj Goyal ◽  
Angeliki Metallinou ◽  
Dilek Hakkani-Tur

Task-oriented dialog agents provide a natural language interface for users to complete their goals. Dialog State Tracking (DST), which is often a core component of these systems, tracks the system's understanding of the user's goal throughout the conversation. To enable accurate multi-domain DST, the model needs to encode dependencies between past utterances and slot semantics and understand the dialog context, including long-range cross-domain references. We introduce a novel architecture for this task that encodes the conversation history and slot semantics more robustly by using attention mechanisms at multiple granularities. In particular, we use cross-attention to model relationships between the context and slots at different semantic levels and self-attention to resolve cross-domain coreferences. In addition, our proposed architecture does not rely on knowing the domain ontologies beforehand and can also be used in a zero-shot setting for new domains or unseen slot values. Our model improves the joint goal accuracy by 5% (absolute) in the full-data setting and by up to 2% (absolute) in the zero-shot setting over the present state-of-the-art on the MultiWoZ 2.1 dataset.
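As a rough illustration of cross-attention between a slot and the dialog context, the sketch below uses plain scaled dot-product attention with a slot vector as the query and context token vectors as keys and values. This is an assumption made for illustration, not the architecture from the paper: the two-dimensional vectors, the variable names, and the single-head, single-granularity setup are all hypothetical.

```python
import math


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query over a sequence.

    Returns the attention weights and the weighted sum of the values.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    summary = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, summary


# Hypothetical encodings: one slot query and three context tokens.
slot_query = [1.0, 0.0]
token_keys = [[0.1, 0.9], [2.0, 0.0], [0.5, 0.5]]
token_values = token_keys  # keys double as values in this toy example

weights, summary = cross_attention(slot_query, token_keys, token_values)
print([round(w, 3) for w in weights])
```

The token whose key aligns best with the slot query receives the largest weight, so the returned summary vector is dominated by the part of the context most relevant to that slot.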


Author(s):  
Rudolf Kadlec ◽  
Miroslav Vodolan ◽  
Jindrich Libovicky ◽  
Jan Macek ◽  
Jan Kleindienst

Author(s):  
Seokhwan Kim ◽  
Luis Fernando D’Haro ◽  
Rafael E. Banchs ◽  
Jason D. Williams ◽  
Matthew Henderson

Author(s):  
Bowen Zhang ◽  
Xiaofei Xu ◽  
Xutao Li ◽  
Yunming Ye ◽  
Xiaojun Chen ◽  
...  

Author(s):  
Silin Gao ◽  
Ryuichi Takanobu ◽  
Wei Peng ◽  
Qun Liu ◽  
Minlie Huang

Author(s):  
Florian Strub ◽  
Harm de Vries ◽  
Jérémie Mary ◽  
Bilal Piot ◽  
Aaron Courville ◽  
...  

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This view may fail to capture the planning problem inherent to dialogue as well as its contextual and grounded nature. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on the question generation task from the GuessWhat?! dataset, which contains 120k dialogues, and provides encouraging results in solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.
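To make the policy gradient idea concrete, here is a minimal REINFORCE sketch on a toy two-action problem. It is not the method or the reward used in the paper: the reward function, learning rate, step count, and seed are hypothetical stand-ins, with the reward of 1 playing the role that a successful object discovery would play in a dialogue game.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(steps=200, lr=0.5, seed=0):
    """REINFORCE on a softmax policy with two actions.

    Toy assumption: action 1 always earns reward 1, action 0 earns 0.
    """
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == 1 else 0.0
        # Gradient of log pi(action) w.r.t. logit i is 1[i == action] - probs[i].
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * reward * (indicator - probs[i])
    return softmax(logits)


final_probs = reinforce()
print([round(p, 3) for p in final_probs])
```

Each update raises the log-probability of a sampled action in proportion to the reward it received, so the policy shifts toward the rewarded action without ever being told which action is correct.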

