Schema-Guided Multi-Domain Dialogue State Tracking with Graph Attention Neural Networks

2020 ◽  
Vol 34 (05) ◽  
pp. 7521-7528 ◽  
Author(s):  
Lu Chen ◽  
Boer Lv ◽  
Chi Wang ◽  
Su Zhu ◽  
Bowen Tan ◽  
...  

Dialogue state tracking (DST) aims to estimate the current dialogue state given all the preceding conversation. For multi-domain DST, data sparsity is also a major obstacle because of the increased number of state candidates. Existing approaches generally predict the value of each slot independently and ignore slot relations, which may aggravate the data sparsity problem. In this paper, we propose a Schema-guided multi-domain dialogue State Tracker with graph attention networks (SST) that predicts dialogue states from dialogue utterances and from schema graphs whose edges encode slot relations. We also introduce a graph attention matching network to fuse information from utterances and graphs, and a recurrent graph attention network to control state updating. Experimental results show that our approach achieves new state-of-the-art performance on both the MultiWOZ 2.0 and MultiWOZ 2.1 benchmarks.
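The graph attention idea in SST can be illustrated with a generic single-head graph attention (GAT-style) layer over a toy schema graph. This is a minimal sketch, not the authors' SST implementation: the slot names, feature vectors, projection matrix `W`, and attention vector `a` are all hypothetical, and SST's matching and recurrent components are omitted. Each slot attends over itself and its related slots, so information flows only along slot-relation edges.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(features, edges, W, a):
    """One single-head GAT-style layer.

    features: dict mapping node name -> feature vector
    edges: iterable of undirected (u, v) pairs, e.g. related slots
    W: projection matrix; a: attention vector of length 2 * len(W)
    """
    neighbors = {n: {n} for n in features}  # every node attends to itself
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    projected = {n: matvec(W, h) for n, h in features.items()}
    out = {}
    for i in features:
        nbrs = sorted(neighbors[i])
        # e_ij = LeakyReLU(a . [W h_i || W h_j]), normalized over i's neighbors
        scores = [
            leaky_relu(sum(ak * zk for ak, zk in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)
        dim = len(projected[i])
        out[i] = [sum(alpha[k] * projected[j][d] for k, j in enumerate(nbrs)) for d in range(dim)]
    return out


# Hypothetical toy schema graph: the two "area" slots are related, price range is isolated.
slots = {
    "hotel-area": [1.0, 0.0],
    "restaurant-area": [0.9, 0.1],
    "hotel-pricerange": [0.0, 1.0],
}
relations = [("hotel-area", "restaurant-area")]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.5, -0.5]
updated = graph_attention(slots, relations, W, a)
```

Because `hotel-pricerange` has no related slots, its representation passes through unchanged, while the two area slots mix their features in proportion to the attention weights.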

Author(s):  
Jiafeng Cheng ◽  
Qianqian Wang ◽  
Zhiqiang Tao ◽  
Deyan Xie ◽  
Quanxue Gao

Graph neural networks (GNNs) have achieved considerable success in processing graph-structured data. However, existing methods cannot assign learnable weights to different nodes in a neighborhood, and they lack robustness because they neglect both node attributes and graph reconstruction. Moreover, most multi-view GNNs focus on the case of multiple graphs, while designing GNNs for graph-structured data with multi-view attributes remains under-explored. In this paper, we propose a novel Multi-View Attribute Graph Convolution Networks (MAGCN) model for the clustering task. MAGCN is designed with two-pathway encoders that map graph embedding features and learn view-consistency information. Specifically, the first pathway develops multi-view attribute graph attention networks to reduce noise and redundancy and to learn the graph embedding features of the multi-view graph data. The second pathway develops consistent embedding encoders to capture the consistency of geometric relationships and probability distributions among different views, adaptively finding a consistent clustering embedding space for multi-view attributes. Experiments on three benchmark graph datasets show the superiority of our method over several state-of-the-art algorithms.


Author(s):  
Maosheng Guo ◽  
Yu Zhang ◽  
Ting Liu

Natural Language Inference (NLI) is an active research area in which numerous approaches based on recurrent neural networks (RNNs), convolutional neural networks (CNNs), and self-attention networks (SANs) have been proposed. Although they obtain impressive performance, recurrent approaches are hard to train in parallel, convolutional models tend to require more parameters, and self-attention networks are not good at capturing the local dependencies of texts. To address this problem, we introduce a Gaussian prior to the self-attention mechanism to better model the local structure of sentences. We then propose an efficient RNN/CNN-free architecture for NLI named Gaussian Transformer, which consists of encoding blocks that model both local and global dependencies, high-order interaction blocks that collect the evidence of multi-step inference, and a lightweight comparison block that saves many parameters. Experiments show that our model achieves new state-of-the-art performance on both the SNLI and MultiNLI benchmarks with significantly fewer parameters and considerably less training time. In addition, evaluation on the Hard NLI datasets demonstrates that our approach is less affected by undesirable annotation artifacts.
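A Gaussian prior over token distance can be added to ordinary scaled dot-product self-attention as an extra term in the attention logits. The sketch below uses the additive term `-(i - j)**2 / (2 * sigma**2)`, the log of an unnormalized Gaussian centered on the query position; this is one common way to encode such a prior and is an assumption here, so the Gaussian Transformer's exact parameterization may differ. Learned projections, multiple heads, and the interaction and comparison blocks are omitted.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_self_attention(queries, keys, values, sigma=1.0):
    """Scaled dot-product self-attention with an additive Gaussian distance prior.

    Returns (outputs, weights), where weights[i][j] is how much position i
    attends to position j.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        logits = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)  # content score
            - (i - j) ** 2 / (2 * sigma ** 2)                     # locality prior
            for j, k in enumerate(keys)
        ]
        alpha = softmax(logits)
        weights.append(alpha)
        outputs.append(
            [sum(wt * v[c] for wt, v in zip(alpha, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# With identical (zero) queries and keys, only the prior shapes the weights.
x = [[0.0, 0.0] for _ in range(5)]
values = [[float(j)] for j in range(5)]
out, w = gaussian_self_attention(x, x, values, sigma=1.0)
```

As `sigma` grows the prior flattens and the layer reduces to plain self-attention; a small `sigma` concentrates attention on neighboring tokens, which is the local-structure bias the abstract describes.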


2020 ◽  
Vol 34 (01) ◽  
pp. 303-311 ◽  
Author(s):  
Sicheng Zhao ◽  
Yunsheng Ma ◽  
Yang Gu ◽  
Jufeng Yang ◽  
Tengfei Xing ◽  
...  

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and then training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, the polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint, to guide attention generation. Extensive experiments on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
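Temporal attention of the kind mentioned above can be pictured as attention pooling over time: each clip segment receives a relevance score, and the segments are averaged with softmax-normalized weights. This is a minimal generic sketch, not VAANet's module; the scoring vector `w`, the bias `b`, and the segment features are hypothetical, and the spatial and channel-wise attentions and the polarity-consistent loss are not modeled.

```python
import math


def temporal_attention_pool(segments, w, b=0.0):
    """Collapse per-segment feature vectors into one clip-level vector.

    Each segment t gets a score w . x_t + b; a softmax over time turns the
    scores into weights, and the output is the weighted sum of segments.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in segments]
    m = max(scores)  # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(a * x[d] for a, x in zip(weights, segments)) for d in range(len(segments[0]))]
    return pooled, weights


# Hypothetical features for three segments; the middle one scores highest.
segments = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
pooled, weights = temporal_attention_pool(segments, w=[1.0, 0.0])
```

The pooled vector would then feed a classifier; emotionally salient segments dominate it instead of being diluted by a plain average.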


2020 ◽  
Vol 34 (05) ◽  
pp. 8544-8551 ◽  
Author(s):  
Giannis Nikolentzos ◽  
Antoine Tixier ◽  
Michalis Vazirgiannis

Graph neural networks have recently emerged as a very effective framework for processing graph-structured data. These models have achieved state-of-the-art performance on many tasks. Most graph neural networks can be described in terms of message passing, vertex update, and readout functions. In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD). We also propose several hierarchical variants of MPAD. Experiments conducted on 10 standard text classification datasets show that our architectures are competitive with the state of the art. Ablation studies offer further insight into the impact of the different components on performance. Code is publicly available at: https://github.com/giannisnik/mpad.
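The document representation described above can be sketched in two steps: build a word co-occurrence network with a sliding window, then run one round of message passing followed by a readout. The window size, the mean-based update, and the mean readout are illustrative assumptions rather than MPAD's actual message, update, and readout functions, which the abstract does not specify; the scalar node features are hypothetical placeholders for word embeddings.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected word co-occurrence network.

    Two distinct words are linked if they appear inside the same sliding window
    of `window` consecutive tokens; the edge weight counts such co-occurrences.
    """
    edges = defaultdict(int)
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            v = tokens[j]
            if u != v:
                edges[tuple(sorted((u, v)))] += 1
    return dict(edges)


def message_passing_step(features, edges):
    """Each node's new state averages its own state with the edge-weighted
    mean of its neighbors' states (message and update in one function)."""
    agg = {n: [0.0] * len(h) for n, h in features.items()}
    weight_sum = {n: 0.0 for n in features}
    for (u, v), weight in edges.items():
        for d in range(len(features[u])):
            agg[u][d] += weight * features[v][d]
            agg[v][d] += weight * features[u][d]
        weight_sum[u] += weight
        weight_sum[v] += weight
    new = {}
    for n, h in features.items():
        if weight_sum[n] == 0:
            new[n] = list(h)  # isolated word: keep its state
        else:
            new[n] = [0.5 * (hd + ad / weight_sum[n]) for hd, ad in zip(h, agg[n])]
    return new


def readout(features):
    """Document vector: mean of the node states."""
    states = list(features.values())
    return [sum(col) / len(states) for col in zip(*states)]


tokens = "the cat sat on the mat".split()
edges = cooccurrence_graph(tokens, window=2)
# Hypothetical 1-d node features (word length) standing in for embeddings.
features = {word: [float(len(word))] for word in set(tokens)}
states = message_passing_step(features, edges)
doc_vector = readout(states)
```

Repeated words such as "the" collapse into a single node, so the graph captures which words keep company across the whole document rather than their exact positions.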


2019 ◽  
Vol 9 (18) ◽  
pp. 3836 ◽  
Author(s):  
J.-A. González ◽  
L.-F. Hurtado ◽  
E. Segarra ◽  
F. García-Granada ◽  
E. Sanchis

In this paper, we present an approach to summarizing Spanish talk shows. Our approach applies Siamese Neural Networks to transcriptions of the shows' audio. Specifically, we propose to use Hierarchical Attention Networks to select the most relevant sentences for each speaker about a given topic in the show, in order to summarize that speaker's opinion on the topic. We train these networks in a Siamese fashion to determine whether or not a summary is appropriate. A previous evaluation of this approach on the summarization of English newspapers achieved performance similar to that of other state-of-the-art systems. In the absence of enough transcribed or recognized speech data to train our system for talk show summarization in Spanish, we acquire a large corpus of document-summary pairs from Spanish newspapers and use it to train our system. We chose the newspaper domain because of its high similarity to the topics addressed in talk shows. A preliminary evaluation of our summarization system on Spanish TV programs shows the adequacy of the proposal.


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6808
Author(s):  
Jianqiang Xiao ◽  
Dianbo Ma ◽  
Satoshi Yamane

Although recent stereo matching algorithms achieve significant results on public benchmarks, their heavy computational requirements remain an unsolved problem. Most works focus on designing an architecture that reduces computational complexity, whereas we aim to solve the problem by optimizing the 3D convolution kernels of the Pyramid Stereo Matching Network (PSMNet). In this paper, we design a series of comparative experiments exploring the performance of well-known convolution kernels on PSMNet. Our model reduces the computational complexity from 256.66G MAdd (multiply-add operations) to 69.03G MAdd (from 198.47G MAdd to 10.84G MAdd when only the 3D convolutional neural networks are considered) without losing accuracy. On the Scene Flow and KITTI 2015 datasets, our model achieves results comparable to the state of the art at a low computational cost.
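The reported costs can be cross-checked with a few lines of arithmetic. Using only the figures stated in the abstract, the snippet computes the relative savings and shows that the cost outside the 3D convolutions is the same before and after optimization (58.19G MAdd), so the whole saving comes from the 3D CNN part. The variable names are illustrative.

```python
# Computational cost reported in the abstract, in G MAdd (10^9 multiply-adds).
baseline_total, optimized_total = 256.66, 69.03
baseline_3d, optimized_3d = 198.47, 10.84

total_reduction = 1 - optimized_total / baseline_total  # fraction of total cost removed
reduction_3d = 1 - optimized_3d / baseline_3d           # fraction of 3D CNN cost removed

# Cost outside the 3D convolutions, before and after the kernel changes.
other_before = baseline_total - baseline_3d
other_after = optimized_total - optimized_3d

print(f"total: -{total_reduction:.1%}, 3D CNN: -{reduction_3d:.1%}, "
      f"non-3D: {other_before:.2f} vs {other_after:.2f}")
# prints: total: -73.1%, 3D CNN: -94.5%, non-3D: 58.19 vs 58.19
```

In other words, the 3D CNN part becomes roughly 18 times cheaper, while the overall network becomes roughly 3.7 times cheaper because the unchanged 2D feature extraction now dominates.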


2021 ◽  
Vol 11 (16) ◽  
pp. 7377
Author(s):  
Carlos Betancourt ◽  
Wen-Hui Chen

This work presents an application of self-attention networks to cryptocurrency trading. Cryptocurrencies are extremely volatile and unpredictable, so cryptocurrency trading is challenging and involves higher risks than trading traditional financial assets such as stocks. To overcome these problems, we propose a deep reinforcement learning (DRL) approach to cryptocurrency trading. The proposed trading system contains a self-attention network trained with an actor-critic DRL algorithm. Cryptocurrency markets contain hundreds of assets, allowing greater investment diversification, which can be achieved if all the assets are analyzed against one another. Self-attention networks suit this problem because the attention mechanism can process long sequences of data and focus on the most relevant parts of the inputs. Transaction fees are also considered in formulating the problem. Systems that trade at high frequency cannot overlook this issue, since small fees add up to significant expenses over many trades. To validate the proposed approach, a DRL environment is built using data from an important cryptocurrency market. We test our method against a state-of-the-art baseline in two different experiments. The experimental results show that the proposed approach obtains higher daily profits and has several advantages over existing methods.
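The remark about transaction fees can be made concrete with a proportional fee charged on portfolio turnover. This is a simplified sketch, not the paper's reward formulation: the 0.1% fee rate and the asset weights are hypothetical, turnover is taken as the sum of absolute weight changes (some conventions halve it), and price movements are ignored.

```python
def rebalance_cost(old_weights, new_weights, fee_rate):
    """Fraction of portfolio value paid in fees when rebalancing, assuming a
    proportional fee on the traded amount (sum of absolute weight changes)."""
    turnover = sum(abs(new - old) for old, new in zip(old_weights, new_weights))
    return fee_rate * turnover


def value_after_trades(n_trades, cost_per_trade, start=1.0):
    """Portfolio value left after paying the same fractional cost on every
    trade, ignoring price movements."""
    return start * (1 - cost_per_trade) ** n_trades


# Hypothetical rebalance across three assets with a 0.1% fee rate.
cost = rebalance_cost([0.5, 0.5, 0.0], [0.2, 0.3, 0.5], fee_rate=0.001)
remaining = value_after_trades(1000, cost)
```

Each rebalance here costs only 0.1% of the portfolio, yet after 1,000 of them only about 37% of the starting value remains from fees alone, which is why a high-frequency agent's reward must account for them.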


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 839
Author(s):  
Patrick Thiam ◽  
Hans A. Kestler ◽  
Friedhelm Schwenker

Several approaches have been proposed for the analysis of pain-related facial expressions. These approaches range from common classification architectures based on a set of carefully designed handcrafted features, to deep neural networks characterised by an autonomous extraction of relevant facial descriptors and simultaneous optimisation of a classification architecture. In the current work, an end-to-end approach based on attention networks for the analysis and recognition of pain-related facial expressions is proposed. The method combines both spatial and temporal aspects of facial expressions through a weighted aggregation of attention-based neural networks' outputs, based on sequences of Motion History Images (MHIs) and Optical Flow Images (OFIs). Each input stream is fed into a specific attention network consisting of a Convolutional Neural Network (CNN) coupled to a Bidirectional Long Short-Term Memory (BiLSTM) Recurrent Neural Network (RNN). An attention mechanism generates a single weighted representation of each input stream (MHI sequence and OFI sequence), which is subsequently used to perform specific classification tasks. Simultaneously, a weighted aggregation of the classification scores specific to each input stream is performed to generate a final classification output. The assessment conducted on both the BioVid Heat Pain Database (Part A) and the SenseEmotion Database points to the relevance of the proposed approach, as its classification performance is on par with state-of-the-art classification approaches proposed in the literature.
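The final step described above, a weighted aggregation of per-stream classification scores, can be sketched as late fusion with softmax-normalized stream weights. In the described method such weights are learned; here the stream logits and the two-class scores for the MHI and OFI streams are hypothetical inputs, and the CNN-BiLSTM attention networks that would produce the scores are omitted.

```python
import math


def fuse_stream_scores(stream_scores, stream_logits):
    """Late fusion: combine per-stream class scores with softmax-normalized
    stream weights derived from one logit per stream."""
    m = max(stream_logits)
    exps = [math.exp(l - m) for l in stream_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    n_classes = len(stream_scores[0])
    fused = [sum(w * s[c] for w, s in zip(weights, stream_scores)) for c in range(n_classes)]
    return fused, weights


# Hypothetical two-class scores from the MHI and OFI streams.
mhi_scores = [0.7, 0.3]
ofi_scores = [0.4, 0.6]
fused_equal, _ = fuse_stream_scores([mhi_scores, ofi_scores], [0.0, 0.0])
fused_mhi, weights_mhi = fuse_stream_scores([mhi_scores, ofi_scores], [2.0, 0.0])
```

With equal logits the streams are simply averaged; raising one stream's logit lets the more reliable stream dominate the final decision while the other still contributes.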


2020 ◽  
Vol 34 (05) ◽  
pp. 7095-7102
Author(s):  
Shuo Chen ◽  
Ewa Andrejczuk ◽  
Zhiguang Cao ◽  
Jie Zhang

In the ad hoc teamwork setting, a team of agents needs to perform a task without prior coordination. The most advanced existing approach learns policies from previous experiences and reuses one of them to interact with new teammates. However, the selected policy is sub-optimal in many cases. Switching between policies to adapt to new teammates' behaviour takes time, which threatens the successful performance of a task. In this paper, we propose AATEAM, a method that uses attention-based neural networks to cope with new teammates' behaviour in real time. We train one attention network per teammate type. The attention networks learn both to extract temporal correlations from the sequence of states (i.e., contexts) and to map contexts to actions. Each attention network also learns to predict a future state given the current context and its output action. The prediction accuracies help determine which actions the ad hoc agent should take. We perform extensive experiments to show the effectiveness of our method.
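One simple way to turn prediction accuracies into a decision is to track, for each teammate-type network, the error of its next-state predictions over a recent window and follow the network that currently predicts best. This is a hypothetical selection rule, not AATEAM's published procedure; the class name, the teammate-type names, the window length, and the use of mean squared error are all assumptions.

```python
from collections import deque


class TeammateTypeSelector:
    """Hypothetical rule: track how well each teammate-type model predicts the
    next state and pick the one with the lowest recent mean squared error."""

    def __init__(self, type_names, window=10):
        # Only the most recent `window` errors are kept for each type.
        self.errors = {name: deque(maxlen=window) for name in type_names}

    def update(self, predictions, observed_state):
        """predictions: type name -> predicted next state (list of floats)."""
        for name, pred in predictions.items():
            mse = sum((p - o) ** 2 for p, o in zip(pred, observed_state)) / len(observed_state)
            self.errors[name].append(mse)

    def best_type(self):
        def mean_error(name):
            errs = self.errors[name]
            return sum(errs) / len(errs) if errs else float("inf")

        return min(self.errors, key=mean_error)


selector = TeammateTypeSelector(["aggressive", "defensive"], window=5)
selector.update({"aggressive": [1.0, 0.0], "defensive": [0.2, 0.1]}, observed_state=[0.0, 0.0])
chosen = selector.best_type()
```

The ad hoc agent would then take the action output by the chosen network; a softer variant could weight each network's proposed action by its accuracy instead of picking a single winner.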


Author(s):  
Tianming Wang ◽  
Xiaojun Wan

Modeling discourse coherence is an important problem in natural language generation and understanding. Sentence ordering, whose goal is to organize a set of sentences into a coherent text, is a commonly used task for learning and evaluating such models. In this paper, we propose a novel hierarchical attention network that captures word clues and dependencies between sentences to address this problem. Our model outperforms prior methods and achieves state-of-the-art performance on several datasets in different domains. Furthermore, our experiments demonstrate that the model performs very well even when noisy sentences are added to the set, which shows its robustness and effectiveness. Visualization analysis and a case study show that our model captures the structure and patterns of coherent texts not only through simple word clues but also through the consecution of context.

