Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset

2020 ◽  
Vol 34 (05) ◽  
pp. 8689-8696
Author(s):  
Abhinav Rastogi ◽  
Xiaoxue Zang ◽  
Srinivas Sunkara ◽  
Raghav Gupta ◽  
Pranav Khaitan

Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges, since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks, including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates the simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a model for dialogue state tracking capable of zero-shot generalization to new APIs, while remaining competitive in the regular setting.
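To make the schema-guided idea concrete, the sketch below scores schema slots against a user utterance by encoding their natural language descriptions with the same encoder. The toy encoder and names are illustrative only, not the released BERT-based model.

```python
# Minimal sketch of schema-guided slot prediction (illustrative, not the released model).
# A bag-of-embeddings stand-in keeps the example self-contained; a real system would
# use a pretrained sentence encoder.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    def __init__(self, vocab, dim=64):
        super().__init__()
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.emb = nn.EmbeddingBag(len(vocab), dim)

    def forward(self, text):
        ids = torch.tensor([[self.vocab.get(w, 0) for w in text.lower().split()]])
        return self.emb(ids)  # (1, dim)

vocab = ["<unk>", "number", "of", "people", "for", "the", "booking",
         "date", "reservation", "table", "two", "friday", "book", "a", "on"]
enc = TinyEncoder(vocab)

# Slots are supplied at inference time as natural-language descriptions (the schema),
# so an unseen service only needs descriptions, not retraining.
schema_slots = {
    "party_size": "number of people for the booking",
    "date": "date of the reservation",
}
utterance = "book a table for two on friday"

u = enc(utterance)
scores = {name: torch.cosine_similarity(u, enc(desc)).item()
          for name, desc in schema_slots.items()}
print(scores)  # with a trained encoder, higher score = slot more likely active this turn
```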

2019 ◽  
Vol 1 (2) ◽  
pp. 176-186
Author(s):  
Yadi Lao ◽  
Weijie Liu ◽  
Sheng Gao ◽  
Si Li

One of the major challenges in building a task-oriented dialogue system is that dialogue state transitions frequently happen between multiple domains, such as booking hotels or restaurants. Recently, the encoder-decoder model based on end-to-end neural networks has become an attractive approach to meeting this challenge. However, it usually requires a sufficiently large amount of training data, and it is not flexible in handling dialogue state transitions. This paper addresses these problems by proposing a simple but practical framework called Multi-Domain KB-BOT (MDKB-BOT), which leverages both neural networks and a rule-based strategy in natural language understanding (NLU) and dialogue management (DM). Experiments on the dataset of the Chinese Human-Computer Dialogue Technology Evaluation Campaign show that MDKB-BOT achieves competitive performance on several evaluation metrics, including task completion rate and user satisfaction.
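A minimal sketch of the hybrid idea follows, combining a neural NLU stub with hand-written dialogue-management rules; the rules and names are illustrative, not the authors' MDKB-BOT implementation.

```python
# Rough sketch of a hybrid NLU + rule-based dialogue manager (illustrative only).
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    domain: str = "none"
    slots: dict = field(default_factory=dict)

def neural_nlu(utterance: str) -> dict:
    """Placeholder for a trained intent/slot model; here a keyword stub."""
    if "hotel" in utterance:
        return {"intent": "book_hotel", "slots": {"type": "hotel"}}
    if "restaurant" in utterance:
        return {"intent": "book_restaurant", "slots": {"type": "restaurant"}}
    return {"intent": "unknown", "slots": {}}

def rule_based_dm(state: DialogueState, nlu_out: dict) -> str:
    """Hand-written transition rules handle cross-domain switches explicitly."""
    intent = nlu_out["intent"]
    if intent.startswith("book_"):
        new_domain = intent.split("_", 1)[1]
        if new_domain != state.domain:      # explicit cross-domain transition
            state.domain = new_domain
            state.slots = dict(nlu_out["slots"])
            return f"Switching to {new_domain} booking. What date would you like?"
        state.slots.update(nlu_out["slots"])
        return "Noted. Anything else for this booking?"
    return "Sorry, could you rephrase that?"

state = DialogueState()
for turn in ["I need a hotel for Friday", "Actually, find me a restaurant instead"]:
    print(rule_based_dm(state, neural_nlu(turn)))
```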


Author(s):  
Mengshi Yu ◽  
Jian Liu ◽  
Yufeng Chen ◽  
Jinan Xu ◽  
Yujie Zhang

With task-oriented dialogue systems being widely applied in everyday life, slot filling, an essential component of task-oriented dialogue systems, must be quickly adapted to new domains that contain domain-specific slots with few or no training data. Previous methods for slot filling usually adopt a sequence labeling framework, which, however, often has limited ability when dealing with domain-specific slots. In this paper, we take a new perspective on cross-domain slot filling by framing it as a machine reading comprehension (MRC) problem. Our approach first transforms slot names into well-designed queries, which contain rich prior knowledge and are very helpful for detecting domain-specific slots. In addition, we utilize a large-scale MRC dataset for pre-training, which further alleviates the data scarcity problem. Experimental results on the SNIPS and ATIS datasets show that our approach consistently outperforms existing state-of-the-art methods by a large margin.
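The reformulation can be sketched with an off-the-shelf extractive QA model: each slot name is turned into a natural-language query and answered against the utterance. The query template below is an assumption for illustration; the paper designs its own queries.

```python
# Sketch of cross-domain slot filling cast as machine reading comprehension.
from transformers import pipeline

qa = pipeline("question-answering")  # any extractive QA model works here

def fill_slot(slot_name: str, utterance: str) -> str:
    # Turn the slot name into a natural-language query carrying prior knowledge.
    query = f"What is the {slot_name.replace('_', ' ')} mentioned by the user?"
    result = qa(question=query, context=utterance)
    return result["answer"]

utterance = "Book me a flight from Boston to Denver next Tuesday morning."
for slot in ["departure_city", "arrival_city", "departure_time"]:
    print(slot, "->", fill_slot(slot, utterance))
```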


2020 ◽  
Vol 8 ◽  
pp. 281-295
Author(s):  
Qi Zhu ◽  
Kaili Huang ◽  
Zheng Zhang ◽  
Xiaoyan Zhu ◽  
Minlie Huang

To advance multi-domain (cross-domain) dialogue modeling and alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances across 5 domains: hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotations of dialogue states and dialogue acts on both the user and system sides. About 60% of the dialogues have cross-domain user goals that favor inter-domain dependency and encourage natural transitions across domains in conversation. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which will help researchers compare and evaluate their models on this corpus. The large size and rich annotation of CrossWOZ make it suitable for investigating a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, and user simulation.
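A minimal sketch of iterating such annotations is shown below; the file name and field names are assumptions for illustration, and the released corpus documentation should be consulted for the exact schema.

```python
# Minimal sketch of iterating CrossWOZ-style annotations (field names assumed).
import json

with open("crosswoz_train.json", encoding="utf-8") as f:
    dialogues = json.load(f)

for dial_id, dial in dialogues.items():
    for turn in dial.get("messages", []):
        role = turn.get("role")            # user or system side
        utterance = turn.get("content")
        acts = turn.get("dialog_act", [])  # dialogue-act annotation on both sides
        state = turn.get("sys_state", {})  # belief state tracked on system turns
        print(dial_id, role, utterance, len(acts), list(state))
    break  # just the first dialogue for the sketch
```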


Author(s):  
Xiaoxiao Sun ◽  
Liyi Chen ◽  
Jufeng Yang

Fine-grained classification focuses on recognizing the subordinate categories of a single field, which requires a large number of labeled images that are expensive to label. Utilizing web data has been an attractive option for meeting the training-data demands of convolutional neural networks (CNNs), especially when well-labeled data is scarce. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has conventionally been addressed by reducing the noise level of the web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods.
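One common way to realize such an adversarial discriminative loss is a domain discriminator trained through a gradient-reversal layer, sketched below; this formulation is for illustration and may differ from the authors' exact loss.

```python
# Sketch of an adversarial loss encouraging coherent representations between
# clean ("standard") and web images, using gradient reversal (illustrative only).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad  # reverse gradients so features learn to fool the discriminator

feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
classifier = nn.Linear(128, 10)      # fine-grained label head
discriminator = nn.Linear(128, 2)    # standard vs. web data head
ce = nn.CrossEntropyLoss()

def multitask_loss(images, labels, sources):
    feats = feature_net(images)
    cls_loss = ce(classifier(feats), labels)                    # main task
    adv_loss = ce(discriminator(GradReverse.apply(feats)), sources)
    return cls_loss + adv_loss                                  # joint objective

imgs = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
sources = torch.randint(0, 2, (8,))  # 0 = standard data, 1 = web data
multitask_loss(imgs, labels, sources).backward()
```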


2021 ◽  
Vol 12 (2) ◽  
pp. 1-33
Author(s):  
Mauajama Firdaus ◽  
Nidhi Thakur ◽  
Asif Ekbal

Multimodality in dialogue systems has opened up new frontiers for the creation of robust conversational agents. Any multimodal system aims at bridging the gap between language and vision by leveraging diverse and often complementary information from image, audio, and video, as well as text. For every task-oriented dialogue system, different aspects of the product or service are crucial for satisfying the user’s demands, and the user decides whether to select the product or service based on these aspects. The ability to generate responses with the specified aspects in a goal-oriented dialogue setup facilitates user satisfaction by fulfilling the user’s goals. Therefore, in our current work, we propose the task of aspect-controlled response generation in a multimodal task-oriented dialogue system. We employ a multimodal hierarchical memory network for generating responses that utilize information from both text and images. As there was no readily available data for building such multimodal systems, we create a Multi-Domain Multi-Modal Dialog (MDMMD++) dataset. The dataset comprises conversations containing both text and images and belonging to four different domains: hotels, restaurants, electronics, and furniture. Quantitative and qualitative analysis on the newly created MDMMD++ dataset shows that the proposed methodology outperforms the baseline models on the proposed task of aspect-controlled response generation.
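A toy sketch of aspect control is shown below: the requested aspect is embedded and fused with text and image context to initialize the decoder. Dimensions and module names are illustrative, not the proposed hierarchical memory network.

```python
# Toy sketch of aspect-controlled response generation (illustrative only).
import torch
import torch.nn as nn

class AspectControlledDecoder(nn.Module):
    def __init__(self, vocab_size=1000, n_aspects=4, dim=128):
        super().__init__()
        self.aspect_emb = nn.Embedding(n_aspects, dim)   # e.g. price, colour, brand...
        self.fuse = nn.Linear(3 * dim, dim)              # text + image + aspect
        self.word_emb = nn.Embedding(vocab_size, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, text_ctx, image_ctx, aspect_id, response_ids):
        a = self.aspect_emb(aspect_id)                           # (B, dim)
        h0 = torch.tanh(self.fuse(torch.cat([text_ctx, image_ctx, a], -1)))
        dec, _ = self.gru(self.word_emb(response_ids), h0.unsqueeze(0))
        return self.out(dec)                                     # (B, T, vocab)

dec = AspectControlledDecoder()
logits = dec(torch.randn(2, 128), torch.randn(2, 128),
             torch.tensor([1, 3]), torch.randint(0, 1000, (2, 6)))
print(logits.shape)  # torch.Size([2, 6, 1000])
```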


Author(s):  
Khaldoon H. Alhussayni ◽  
Alexander Zamyatin ◽  
S. Eman Alshamery

Dialog state tracking (DST) plays a critical role in the life cycle of a task-oriented dialogue system. DST represents the user's goals at each step of the dialogue, describing these goals as a conceptual structure comprising slot-value pairs and dialogue acts, which directly affects the performance and effectiveness of dialogue systems. DST faces several challenges: linguistic diversity, dynamic social context, and the distribution of the dialogue state over candidate values, both in slot values and in dialogue acts defined in the ontology. In many turns, users refer indirectly to previous utterances, which makes it challenging to identify and use the relevant dialogue history; recent popular methods for this are ineffective. In this paper, we propose a dialogue historical context self-attention framework for DST that recognizes relevant historical context by including previous user utterances alongside the current user utterance and previous system actions where specific slot-value pairs vary, and uses these together with a weighted system utterance, outperforming existing models by recognizing the related context and the relevance of the system utterance. The WoZ dataset was used to evaluate the proposed model. The implementation was attempted first with the prior user utterance as a dialogue encoder, and second with an additional score combined with all candidate slot-value pairs in the context of previous and current user utterances. The proposed model obtained results 0.8 per cent better than all state-of-the-art methods in joint goal accuracy, though not on the turn request metric.
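The core mechanism, self-attention over previous and current utterances followed by scoring slot-value candidates, can be sketched as follows; module sizes and names are illustrative only.

```python
# Rough sketch of scoring slot-value candidates against a self-attended mix of
# previous and current utterances (illustrative, not the proposed framework).
import torch
import torch.nn as nn

dim = 64
attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

# Pretend sentence encodings (one vector per utterance) for:
# previous user utterance, previous system action, current user utterance.
history = torch.randn(1, 3, dim)

# Self-attention lets the current turn pull in only the relevant history.
context, _ = attn(history, history, history)
turn_repr = context[:, -1, :]                   # representation of the current turn

candidates = {"food=italian": torch.randn(dim),
              "area=north": torch.randn(dim)}
scores = {sv: torch.dot(turn_repr.squeeze(0), v).item()
          for sv, v in candidates.items()}
print(scores)
```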


2021 ◽  
Vol 9 ◽  
pp. 557-569
Author(s):  
Lizi Liao ◽  
Le Hong Long ◽  
Yunshan Ma ◽  
Wenqiang Lei ◽  
Tat-Seng Chua

Tracking dialogue states to better interpret user goals and feed downstream policy learning is a bottleneck in dialogue management. Common practice has been to treat it as a problem of classifying dialogue content into a set of pre-defined slot-value pairs, or of generating values for different slots given the dialogue history. Both approaches have limitations in modeling the dependencies that occur across dialogue turns, and both lack reasoning capabilities. This paper proposes to track dialogue states gradually, reasoning over dialogue turns with the help of back-end data. Empirical results demonstrate that our method outperforms state-of-the-art methods in terms of joint belief accuracy on MultiWOZ 2.1, a large-scale human-human dialogue dataset spanning multiple domains.
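One way to read "reasoning with the help of back-end data" is to keep only those belief-state updates that remain consistent with the database, as in the purely illustrative sketch below; this is not the authors' method, only a toy rendering of the idea.

```python
# Illustrative-only sketch: candidate slot values proposed per turn are kept only
# if the back-end database still has matching entries, keeping the belief state
# consistent with what can actually be booked.
restaurants = [
    {"food": "italian", "area": "north", "price": "cheap"},
    {"food": "thai", "area": "centre", "price": "expensive"},
]

def consistent(state: dict) -> bool:
    return any(all(r.get(k) == v for k, v in state.items()) for r in restaurants)

state = {}
for turn_update in [{"food": "italian"}, {"area": "centre"}, {"area": "north"}]:
    candidate = {**state, **turn_update}
    if consistent(candidate):          # reason over the turn with the back-end
        state = candidate
print(state)  # {'food': 'italian', 'area': 'north'}
```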


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241271
Author(s):  
Mauajama Firdaus ◽  
Arunav Pratap Shandeelya ◽  
Asif Ekbal

Multimodal dialogue systems, due to their many-fold applications, have gained much attention from researchers and developers in recent times. With the release of the large-scale multimodal dialogue dataset of Saha et al. (2018) in the fashion domain, it has become possible to investigate dialogue systems having both textual and visual modalities. Response generation is an essential aspect of every dialogue system, and making the responses diverse is an important problem. For any goal-oriented conversational agent, the system’s responses must be informative, diverse and polite, which may lead to better user experiences. In this paper, we propose an end-to-end neural framework for generating varied responses in a multimodal dialogue setup, capturing information from both text and images. A multimodal encoder with co-attention between the text and image is used to focus on the different modalities and obtain better contextual information. For effective information sharing across the modalities, we combine the information from text and images using the BLOCK fusion technique, which helps in learning an improved multimodal representation. We employ stochastic beam search with the Gumbel Top-K trick to achieve diversified responses while preserving the content and politeness of the responses. Experimental results show that our proposed approach performs significantly better than the existing and baseline methods in terms of distinct metrics, and thereby generates more diverse responses that are informative, interesting and polite without any loss of information. Empirical evaluation also reveals that images, when used along with the text, improve the efficiency of the model in generating diversified responses.
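The Gumbel Top-K trick underlying stochastic beam search can be sketched in a few lines: perturb log-probabilities with Gumbel noise and keep the top K, which draws K candidates without replacement. The vocabulary and scores below are toy values.

```python
# Sketch of the Gumbel-Top-K trick for sampling K candidates without replacement.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["great", "sure", "the", "blue", "dress", "looks", "nice", "thanks"]
log_probs = np.log(rng.dirichlet(np.ones(len(vocab))))  # toy next-token distribution

def gumbel_top_k(log_p: np.ndarray, k: int) -> np.ndarray:
    gumbel = -np.log(-np.log(rng.uniform(size=log_p.shape)))  # Gumbel(0, 1) noise
    return np.argsort(log_p + gumbel)[::-1][:k]               # indices of the K winners

for idx in gumbel_top_k(log_probs, k=3):
    print(vocab[idx], round(float(log_probs[idx]), 3))
```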


Author(s):  
Jaana Parviainen ◽  
Juho Rantala

Many experts have emphasised that chatbots are not sufficiently mature to be able to technically diagnose patient conditions or replace the judgements of health professionals. The COVID-19 pandemic, however, has significantly increased the utilisation of health-oriented chatbots, for instance, as a conversational interface to answer questions, recommend care options, check symptoms and complete tasks such as booking appointments. In this paper, we take a proactive approach and consider how the emergence of task-oriented chatbots as partially automated consulting systems can influence clinical practices and expert–client relationships. We suggest the need for new approaches in professional ethics as the large-scale deployment of artificial intelligence may revolutionise professional decision-making and client–expert interaction in healthcare organisations. We argue that the implementation of chatbots amplifies the project of rationality and automation in clinical practice and alters traditional decision-making practices based on epistemic probability and prudence. This article contributes to the discussion on the ethical challenges posed by chatbots from the perspective of healthcare professional ethics.


2020 ◽  
Vol 27 ◽  
Author(s):  
Zaheer Ullah Khan ◽  
Dechang Pi

Background: S-sulfenylation (S-sulphenylation, or sulfenic acid) of proteins is a special kind of post-translational modification, which plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Despite this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting sulfenylation cysteine sites. However, the performance of these models has not been satisfactory due to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine. Objective: In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. Methods: We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an n-segmented hybrid feature scheme; the synthetic minority oversampling technique is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network is employed, with rigorous 10-fold jackknife cross-validation for model validation. Results: Following the proposed framework, the strong discrete representation of the feature space, the machine learning engine, and the unbiased presentation of the underlying training data yielded an excellent model that outperforms all existing studies. The proposed approach is 6% higher in terms of MCC than the previous best method; on an independent dataset, that study did not provide sufficient details for comparison. Compared with the second-best method, the model obtained increases of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset. These empirical analyses show the superior performance of the proposed model over both the training and independent datasets in comparison with existing studies. Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes with a training dataset and an independent validation dataset reveal the efficacy of the proposed model. The good performance of DeepSSPred is due to several factors, such as the novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our work will provide potential insight into the further prediction of S-sulfenylation characteristics and functionalities, and we hope that the developed predictor will be helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
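The two ingredients the abstract highlights, SMOTE rebalancing and a 2D CNN over encoded features, can be sketched as below; the feature dimensions and network layout are assumptions for illustration, not DeepSSPred's actual configuration.

```python
# Sketch of SMOTE rebalancing plus a small 2D CNN over encoded site features
# (dimensions and layout are illustrative only).
import numpy as np
import torch
import torch.nn as nn
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 21 * 20)).astype("float32")   # flattened per-site features
y = np.array([1] * 20 + [0] * 180)                      # severe class imbalance

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # synthetic minority samples

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 10 * 10, 2),                          # binary: SC vs. non-SC site
)

inputs = torch.from_numpy(X_res.astype("float32")).view(-1, 1, 21, 20)  # 2D feature map per site
logits = cnn(inputs)
print(X.shape, "->", X_res.shape, logits.shape)
```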

