Self-Attention ConvLSTM for Spatiotemporal Prediction

Zhihui Lin; Maomao Li; Zhuobin Zheng; Yangyang Cheng; Chun Yuan

doi:10.1609/aaai.v34i07.6819


Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6819 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11531-11538

Author(s):

Zhihui Lin ◽

Maomao Li ◽

Zhuobin Zheng ◽

Yangyang Cheng ◽

Chun Yuan

Keyword(s):

Long Range ◽

State Of The Art ◽

The Self ◽

Time Step ◽

Traffic Flow Prediction ◽

Spatial Features ◽

Gating Mechanism ◽

Spatial Dependencies ◽

Previous State ◽

Global And Local

Spatiotemporal prediction is challenging due to complex dynamic motion and appearance changes. Existing work concentrates on embedding additional cells into the standard ConvLSTM to memorize spatial appearances during prediction. These models rely on convolution layers to capture spatial dependence, which are local and inefficient. However, long-range spatial dependencies are significant for spatial applications. To extract spatial features with both global and local dependencies, we introduce the self-attention mechanism into ConvLSTM. Specifically, a novel self-attention memory (SAM) is proposed to memorize features with long-range dependencies in both the spatial and temporal domains. Based on self-attention, SAM produces features by aggregating features across all positions of both the input itself and the memory, weighted by pair-wise similarity scores. Moreover, the additional memory is updated by a gating mechanism on the aggregated features and an established highway with the memory of the previous time step. Therefore, through SAM, we can extract features with long-range spatiotemporal dependencies. Furthermore, we embed SAM into a standard ConvLSTM to construct a self-attention ConvLSTM (SA-ConvLSTM) for spatiotemporal prediction. In experiments, we apply SA-ConvLSTM to frame prediction on the MovingMNIST and KTH datasets and to traffic flow prediction on the TaxiBJ dataset. SA-ConvLSTM achieves state-of-the-art results on these datasets with fewer parameters and higher time efficiency than the previous state-of-the-art method.
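The SAM update described above (attention over all positions of both the input and the memory, then a gated highway to the previous memory) can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: learned convolutions are replaced by identity projections, the two attention outputs are fused by averaging, and the gate weights `I_W`, `G_W`, `O_W` are hypothetical fixed constants.

```python
import math

# Hypothetical fixed gate weights (w_z, w_h); in SA-ConvLSTM these would be learned.
I_W, G_W, O_W = (1.0, 1.0), (1.0, -0.5), (0.5, 0.5)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attend(queries, keys, values):
    """For every query position, aggregate values over ALL positions,
    weighted by softmax-normalized dot-product similarity to the keys."""
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def sam_step(h, m_prev):
    """One simplified self-attention memory update.

    h, m_prev: N spatial positions x C channels (lists of lists).
    Returns (updated hidden state, updated memory).
    """
    z_h = attend(h, h, h)            # long-range features from the input itself
    z_m = attend(h, m_prev, m_prev)  # long-range features from the memory
    # Hypothetical fusion: a plain average instead of a learned convolution.
    z = [[(a + b) / 2 for a, b in zip(rh, rm)] for rh, rm in zip(z_h, z_m)]
    h_new, m_new = [], []
    for zi, hi, mi in zip(z, h, m_prev):
        row_h, row_m = [], []
        for zc, hc, mc in zip(zi, hi, mi):
            i_gate = sigmoid(I_W[0] * zc + I_W[1] * hc)
            g = math.tanh(G_W[0] * zc + G_W[1] * hc)
            mem = (1 - i_gate) * mc + i_gate * g  # gated highway to the previous memory
            o_gate = sigmoid(O_W[0] * zc + O_W[1] * hc)
            row_m.append(mem)
            row_h.append(o_gate * mem)
        h_new.append(row_h)
        m_new.append(row_m)
    return h_new, m_new


h = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.0]]
m = [[0.0, 0.0] for _ in h]
for _ in range(3):
    h, m = sam_step(h, m)
```

Because every output position attends to every input position, a single step already mixes information across the whole frame, which is the contrast with purely local convolutions drawn above.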


Wildfire Segmentation Using Deep Vision Transformers

Remote Sensing ◽

10.3390/rs13173527 ◽

2021 ◽

Vol 13 (17) ◽

pp. 3527

Author(s):

Rafik Ghali ◽

Moulay A. Akhloufi ◽

Marwa Jmal ◽

Wided Souidene Mseddi ◽

Rabah Attia

Keyword(s):

Early Detection ◽

Long Range ◽

Forest Fires ◽

State Of The Art ◽

The Self ◽

Convolution Operators ◽

Sequence Prediction ◽

Convolutional Networks ◽

Input And Output ◽

Global And Local

In this paper, we address the problem of early detection and segmentation of forest fires in order to predict their spread and help with firefighting. Techniques based on convolutional networks are the most widely used and have proven efficient at solving this problem. However, they remain limited in modeling long-range relationships between objects in the image, due to the intrinsic locality of convolution operators. To overcome this drawback, Transformers, originally designed for sequence-to-sequence prediction, have emerged as alternative architectures. They have recently been used to capture global dependencies between input and output sequences using the self-attention mechanism. In this context, we present the first study exploring the potential of vision Transformers for forest fire segmentation. Two vision-based Transformers are used: TransUNet and MedT. We design two frameworks based on these image Transformers, adapted to our complex, unstructured environment; we evaluate them with varying backbones and optimize them for forest fire segmentation. Extensive evaluations of both frameworks revealed performance superior to current methods. The proposed approaches achieved state-of-the-art performance, with an F1-score of 97.7% for the TransUNet architecture and 96.0% for the MedT architecture. Analysis of the results showed that these models reduce misclassification of fire pixels thanks to the extraction of both global and local features, which provides finer detection of the fire's shape.
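The F1-scores reported above summarize how well predicted fire pixels match the ground truth. A minimal sketch of a pixel-level F1 for one binary mask follows; whether the study pools pixels over the whole test set or averages per image is not stated, so this per-mask form and the empty-mask convention are assumptions.

```python
def pixel_f1(pred, truth):
    """Pixel-level F1-score for binary fire masks (1 = fire, 0 = background).

    pred, truth: 2-D lists of 0/1 values with the same shape.
    Assumption: an empty prediction on an empty ground truth scores 1.0.
    """
    tp = fp = fn = 0
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1
            elif p:
                fp += 1  # background pixel misclassified as fire
            elif t:
                fn += 1  # fire pixel missed
    if tp + fp + fn == 0:
        return 1.0
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(round(pixel_f1(pred, truth), 4))  # 0.6667
```

For binary masks this value equals 2TP / (2TP + FP + FN), so fewer misclassified fire pixels (lower FP and FN) raise the score directly.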


Attention-Based Fault-Tolerant Approach for Multi-Agent Reinforcement Learning Systems

Entropy ◽

10.3390/e23091133 ◽

2021 ◽

Vol 23 (9) ◽

pp. 1133

Author(s):

Shanzhi Gu ◽

Mingyang Geng ◽

Long Lan

Keyword(s):

Reinforcement Learning ◽

Noise Intensity ◽

Fault Tolerant ◽

State Of The Art ◽

Learning Systems ◽

Noisy Environments ◽

Time Step ◽

Malicious Behavior ◽

Previous State ◽

Multi Agent

The aim of multi-agent reinforcement learning systems is to give interacting agents the ability to collaboratively learn and adapt to the behavior of other agents. Typically, an agent receives private observations that provide a partial view of the true state of the environment. However, in realistic settings, a harsh environment might cause one or more agents to show arbitrarily faulty or malicious behavior, which may suffice to make current coordination mechanisms fail. In this paper, we study a practical scenario of multi-agent reinforcement learning systems that considers the security issues raised by agents with arbitrarily faulty or malicious behavior. The previous state-of-the-art work that coped with extremely noisy environments was designed on the assumption that the noise intensity in the environment was known in advance. However, when the noise intensity changes, the existing method has to adjust the configuration of the model to learn in new environments, which limits its practical applications. To overcome these difficulties, we present an Attention-based Fault-Tolerant (FT-Attn) model, which can select not only correct but also relevant information for each agent at every time step in noisy environments. The multihead attention mechanism enables the agents to learn effective communication policies through experience, concurrently with the action policies. Empirical results showed that FT-Attn beats previous state-of-the-art methods in some extremely noisy environments in both cooperative and competitive scenarios, coming much closer to the upper-bound performance. Furthermore, FT-Attn maintains a more general fault-tolerance ability and does not rely on prior knowledge of the noise intensity of the environment.
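The idea of selecting relevant messages with multihead attention can be illustrated with plain scaled dot-product attention over other agents' messages. This is a generic sketch, not the FT-Attn architecture: each head simply attends over its own slice of the vectors (learned per-head projections are omitted), and the function name `multihead_message_attention` is hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def multihead_message_attention(query, keys, messages, num_heads):
    """Aggregate other agents' messages for one agent.

    query: this agent's query vector (length D).
    keys, messages: one key and one message vector (length D) per other agent.
    Assumption: head h attends over its own contiguous slice of D / num_heads
    dimensions; learned per-head projections are left out.
    Returns (aggregated message of length D, per-head attention weights).
    """
    d = len(query)
    if d % num_heads:
        raise ValueError("vector length must be divisible by num_heads")
    hd = d // num_heads
    out, all_weights = [], []
    for h in range(num_heads):
        lo, hi = h * hd, (h + 1) * hd
        q = query[lo:hi]
        scores = [sum(a * b for a, b in zip(q, k[lo:hi])) / math.sqrt(hd) for k in keys]
        w = softmax(scores)
        all_weights.append(w)
        for c in range(lo, hi):
            out.append(sum(wi * m[c] for wi, m in zip(w, messages)))
    return out, all_weights


query = [1.0, 0.0, 1.0, 0.0]
keys = [[1.0, 0.0, 1.0, 0.0],     # agent whose key matches the query
        [-1.0, 0.0, -1.0, 0.0]]   # e.g. a faulty agent sending unrelated content
messages = [[1.0, 1.0, 1.0, 1.0], [-5.0, -5.0, -5.0, -5.0]]
agg, weights = multihead_message_attention(query, keys, messages, num_heads=2)
```

The mismatched agent still contributes, but with a much smaller weight than under uniform averaging, which is the kind of soft filtering the abstract attributes to the attention mechanism.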


Graph Deep Learning for Long Range Forecasting

10.5194/egusphere-egu21-9141 ◽

2021 ◽

Author(s):

Salva Rühling Cachay ◽

Emma Erickson ◽

Arthur Fender C. Bucker ◽

Ernest Pokropek ◽

Willa Potosnak ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Long Range ◽

Large Scale ◽

State Of The Art ◽

Southern Oscillation ◽

Predictive Skill ◽

Anomaly Pattern ◽

Previous State ◽

Graph Neural Networks

Deep learning-based models have recently been shown to be competitive with, or even outperform, state-of-the-art long-range forecasting models, such as those for projecting the El Niño-Southern Oscillation (ENSO). However, current deep learning models are based on convolutional neural networks, which are difficult to interpret and can fail to model large-scale dependencies, such as teleconnections, that are particularly important for long-range projections. Hence, we propose to explicitly model large-scale dependencies with Graph Neural Networks (GNNs) to enhance explainability and improve the predictive skill of long-lead-time forecasts. In preliminary experiments focusing on ENSO, our GNN model outperforms previous state-of-the-art machine-learning-based systems for forecasts up to 6 months ahead. The explicit modeling of information flow via edges makes our model more explainable, and it is indeed shown to learn a sensible graph structure from scratch that correlates with the ENSO anomaly pattern for a given number of lead months.
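Explicit information flow via edges can be illustrated with one round of weighted message passing. This is a minimal sketch under stated assumptions: node features (for example, anomalies at grid cells) and edge weights are hypothetical inputs, and the scalar weights `w_self` and `w_neigh` stand in for learned matrices; it is not the authors' GNN.

```python
import math


def message_passing_layer(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of weighted message passing on a directed graph.

    features: dict node -> list of floats.
    edges: dict (src, dst) -> non-negative weight; information flows src -> dst,
           so inspecting the weights shows which regions influence which.
    Assumption: scalar weights stand in for learned weight matrices.
    """
    new = {}
    for node, h in features.items():
        incoming = [(src, w) for (src, dst), w in edges.items() if dst == node]
        total = sum(w for _, w in incoming)
        if total > 0:
            agg = [sum(w * features[src][c] for src, w in incoming) / total
                   for c in range(len(h))]
        else:
            agg = [0.0] * len(h)  # no incoming edges: node keeps only its own signal
        new[node] = [math.tanh(w_self * hc + w_neigh * ac) for hc, ac in zip(h, agg)]
    return new


features = {"a": [1.0], "b": [0.0], "c": [-1.0]}
edges = {("a", "b"): 1.0, ("c", "b"): 1.0, ("a", "c"): 2.0}
out = message_passing_layer(features, edges)
```

A long-range dependency such as a teleconnection only needs a single edge between distant nodes here, whereas a convolution would need many stacked layers to connect them.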


Gated Autoencoder Network for Spectral–Spatial Hyperspectral Unmixing

Remote Sensing ◽

10.3390/rs13163147 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3147

Author(s):

Ziqiang Hua ◽

Xiaorun Li ◽

Jianfeng Jiang ◽

Liaoying Zhao

Keyword(s):

Real World ◽

Spatial Information ◽

State Of The Art ◽

The State ◽

Spectral Unmixing ◽

Experimental Results ◽

Spectral Information ◽

Hyperspectral Unmixing ◽

Spatial Features ◽

Gating Mechanism

Convolution-based autoencoder networks have yielded promising performance in exploiting spatial–contextual signatures for spectral unmixing. However, some networks aggregate the extracted spectral and spatial features, which makes it difficult to balance their effects on the unmixing results. In this paper, we propose two gated autoencoder networks that adaptively control the contributions of spectral and spatial features in the unmixing process. A gating mechanism is adopted in the networks to filter and regularize spatial features, yielding an unmixing algorithm that is based on spectral information and supplemented by spatial information. In addition, abundance sparsity regularization and gating regularization are introduced to ensure appropriate implementation. Experimental results on both synthetic and real-world scenes validate the superiority of the proposed method over state-of-the-art techniques. This study confirms the effectiveness of the gating mechanism in improving the accuracy and efficiency of utilizing spatial signatures for spectral unmixing.
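One way to read "based on spectral information and supplemented by spatial information" is an element-wise gate that scales only the spatial branch. The sketch below assumes that form; the parameters `w_spec`, `w_spat`, `bias` and the mean-gate regularizer are hypothetical stand-ins, not the paper's exact design.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(spectral, spatial, w_spec, w_spat, bias):
    """Fuse one pixel's spectral and spatial feature vectors.

    Spectral features always pass through; spatial features are scaled by an
    element-wise gate in (0, 1), so spatial information only supplements them.
    w_spec, w_spat, bias: hypothetical gate parameters (learned in practice).
    Returns (fused features, gate values).
    """
    gates = [sigmoid(a * s + b * p + bias)
             for a, b, s, p in zip(w_spec, w_spat, spectral, spatial)]
    fused = [s + g * p for s, g, p in zip(spectral, gates, spatial)]
    return fused, gates


def gate_penalty(gates):
    """Hypothetical gating regularizer: the mean gate value, which discourages
    letting spatial features dominate the spectral ones."""
    return sum(gates) / len(gates)


spec, spat, w = [0.2, 0.4, 0.1], [1.0, -1.0, 0.5], [0.0, 0.0, 0.0]
closed, g_closed = gated_fusion(spec, spat, w, w, bias=-50.0)  # ~ spectral only
opened, g_open = gated_fusion(spec, spat, w, w, bias=50.0)     # ~ spectral + spatial
```

Keeping the two branches separate until the gate makes their relative contribution explicit and tunable, in contrast to networks that aggregate both feature types up front.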


Collaborative Self-Attention Network for Session-based Recommendation

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/359 ◽

2020 ◽

Author(s):

Anjing Luo ◽

Pengpeng Zhao ◽

Yanchi Liu ◽

Fuzhen Zhuang ◽

Deqing Wang ◽

...

Keyword(s):

Long Range ◽

Real World ◽

State Of The Art ◽

Time Step ◽

Attention Network ◽

Art Methods ◽

Real World Datasets ◽

Item Representation

Session-based recommendation has become a research hotspot because of its ability to make recommendations for anonymous users. However, existing session-based methods have the following limitations: (1) they either lack the capability to learn complex dependencies or focus mostly on the current session without explicitly considering collaborative information; (2) they assume that the representation of an item is static and fixed for all users at each time step. We argue that even the same item can be represented differently for different users at the same time step. To this end, we propose a novel solution, the Collaborative Self-Attention Network (CoSAN), for session-based recommendation, which learns the session representation and predicts the intent of the current session by investigating neighborhood sessions. Specifically, we first devise a collaborative item representation by aggregating the embeddings of neighborhood sessions retrieved according to each item in the current session. Then, we apply self-attention to learn long-range dependencies between collaborative items and generate a collaborative session representation. Finally, each session is represented by concatenating the collaborative session representation and the embedding of the current session. Extensive experiments on two real-world datasets show that CoSAN consistently outperforms state-of-the-art methods.
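The three CoSAN steps listed above (collaborative item representations, self-attention over them, concatenation with the session embedding) can be sketched as follows. Every concrete choice here is a hypothetical simplification: neighbor sessions are past sessions containing the item, embeddings are averaged, and self-attention has no learned projections.

```python
import math


def mean_vec(vecs):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def self_attention(xs):
    """Scaled dot-product self-attention without learned projections."""
    d = len(xs[0])
    out = []
    for q in xs:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(wi * x[c] for wi, x in zip(w, xs)) for c in range(d)])
    return out


def cosan_session_repr(session, history, emb):
    """session: item ids of the current session; history: past sessions
    (lists of item ids); emb: item id -> embedding vector."""
    session_emb = mean_vec([emb[i] for i in session])
    collab_items = []
    for item in session:
        # Hypothetical retrieval rule: neighbor sessions are past sessions containing the item.
        neighbours = [s for s in history if item in s]
        if neighbours:
            neigh_emb = mean_vec([mean_vec([emb[i] for i in s]) for s in neighbours])
            collab_items.append(mean_vec([emb[item], neigh_emb]))
        else:
            collab_items.append(list(emb[item]))
    collab_session = mean_vec(self_attention(collab_items))
    return collab_session + session_emb  # concatenation of the two representations


emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
with_history = cosan_session_repr(["a", "b"], [["a", "c"]], emb)
without_history = cosan_session_repr(["a", "b"], [], emb)
```

The same item "a" ends up with different collaborative representations depending on the neighboring sessions, which is the dynamic item representation the abstract argues for.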


Multi-Resolution Autoregressive Graph-to-Graph Translation for Molecules

10.26434/chemrxiv.8266745.v1 ◽

2019 ◽

Author(s):

Wengong Jin ◽

Regina Barzilay ◽

Tommi S Jaakkola

Keyword(s):

Drug Discovery ◽

State Of The Art ◽

Molecular Graph ◽

Biochemical Properties ◽

Large Margin ◽

Previous State ◽

Translation Methods ◽

Atom Level ◽

Precursor Molecules ◽

Prior State

Accelerating drug discovery relies heavily on automatic tools that optimize precursor molecules to endow them with better biochemical properties. Our work in this paper substantially extends the prior state of the art in graph-to-graph translation methods for molecular optimization. In particular, we realize coherent multi-resolution representations by interweaving trees over substructures with the atom-level encoding of the original molecular graph. Moreover, our graph decoder is fully autoregressive and interleaves each step of adding a new substructure with the process of resolving its connectivity to the emerging molecule. We evaluate our model on multiple molecular optimization tasks and show that it outperforms previous state-of-the-art baselines by a large margin.


Using spatial-temporal ensembles of convolutional neural networks for lumen segmentation in ureteroscopy

International Journal of Computer Assisted Radiology and Surgery ◽

10.1007/s11548-021-02376-3 ◽

2021 ◽

Author(s):

Jorge F. Lazo ◽

Aldo Marzullo ◽

Sara Moccia ◽

Michele Catellani ◽

Benoit Rosa ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Automatic Segmentation ◽

Temporal Information ◽

Invasive Technique ◽

Dice Similarity Coefficient ◽

Specular Reflections ◽

Lumen Segmentation ◽

Previous State

Purpose: Ureteroscopy is an efficient, minimally invasive endoscopic technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. To obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single- and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former ones, consisting of the addition of a stage that uses 3D convolutions to process temporal information. $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results: The proposed method was evaluated on a custom dataset of 11 videos (2673 frames) collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is also effective in the presence of poor visibility, occasional bleeding, or specular reflections.
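The pieces of this method that are fully specified (triplet inputs for the temporal models and the Dice metric) can be sketched directly. How the four model outputs are combined is not stated, so `ensemble_mask`, which averages per-pixel probabilities and thresholds at 0.5, is a hypothetical fusion rule.

```python
def dice(pred, truth):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    inter = sum(p * t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    size = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if size == 0 else 2 * inter / size


def frame_triplet(frames, t):
    """Input for the temporal models M_1, M_2: (I(t-1), I(t), I(t+1))."""
    if not 1 <= t < len(frames) - 1:
        raise IndexError("frame t needs a predecessor and a successor")
    return frames[t - 1], frames[t], frames[t + 1]


def ensemble_mask(prob_maps, threshold=0.5):
    """Hypothetical fusion rule: average per-pixel lumen probabilities from
    all models (e.g. m_1, m_2, M_1, M_2), then threshold."""
    n = len(prob_maps)
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    return [[1 if sum(m[r][c] for m in prob_maps) / n >= threshold else 0
             for c in range(cols)] for r in range(rows)]


maps = [[[0.9, 0.2]], [[0.6, 0.4]], [[0.7, 0.1]], [[0.2, 0.9]]]
mask = ensemble_mask(maps)
score = dice(mask, [[1, 0]])
```

The first and last frames of a video have no full triplet, so a real pipeline would need a padding or fallback policy for them; the sketch simply raises an error.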


Effects of Admixtures on the Self Compacting Concrete State of the Art Report

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1006/1/012038 ◽

2020 ◽

Vol 1006 ◽

pp. 012038

Author(s):

S Christopher Gnanaraj ◽

Ramesh Babu Chokkalingam ◽

G LiziaThankam

Keyword(s):

State Of The Art ◽

The Self ◽

Self Compacting Concrete


NONLINEAR GATED EXPERTS FOR TIME SERIES: DISCOVERING REGIMES AND AVOIDING OVERFITTING

International Journal of Neural Systems ◽

10.1142/s0129065795000251 ◽

1995 ◽

Vol 06 (04) ◽

pp. 373-399 ◽

Cited By ~ 161

Author(s):

ANDREAS S. WEIGEND ◽

MORGAN MANGEAS ◽

ASHOK N. SRIVASTAVA

Keyword(s):

Time Series ◽

Real World ◽

Markov Models ◽

Multilayer Perceptrons ◽

Time Step ◽

Local Complexity ◽

Previous State ◽

Segmentation Task ◽

Gating Network ◽

Update Rules

In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert, given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted with hidden Markov models, where the decision is based on the previous state(s) (i.e., on the output of the gating network at the previous time step), as well as with averaging over several predictors. In contrast, gated experts soft-partition the input space, each learning to model only its own region. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in a chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several time scales. The main results are: (1) the gating network correctly discovers the different regimes of the process; (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the sub-processes); and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
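The gated-experts prediction described above (gate probabilities from the input, each expert giving a conditional mean and its own variance) can be sketched as follows. Here the gating network and experts are arbitrary hypothetical functions rather than trained nonlinear networks, and `responsibilities` computes the posterior expert weights that a maximum-likelihood, EM-style update would use; the article's own update rules are not reproduced.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def gaussian_pdf(y, mean, var):
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gated_experts_predict(x, gate_fn, experts):
    """gate_fn(x) -> one score per expert; each expert(x) -> (mean, variance).
    Returns the gate-weighted conditional mean, gate probabilities, expert outputs."""
    g = softmax(gate_fn(x))
    outs = [expert(x) for expert in experts]
    mean = sum(gj * m for gj, (m, _) in zip(g, outs))
    return mean, g, outs


def responsibilities(y, g, outs):
    """Posterior probability that each expert generated the observed target y."""
    lik = [gj * gaussian_pdf(y, m, v) for gj, (m, v) in zip(g, outs)]
    total = sum(lik)
    return [p / total for p in lik]


# Two hypothetical regimes: a low-noise expert for x > 0, a noisier one for x < 0.
gate = lambda x: [5.0 * x, -5.0 * x]
experts = [lambda x: (2.0 * x, 0.1), lambda x: (-x, 1.0)]
mean, g, outs = gated_experts_predict(1.0, gate, experts)
post = responsibilities(2.0, g, outs)
```

Because each expert carries its own variance, the low-noise regime is fit tightly while the noisier one keeps a wider Gaussian, which is the variance-matching effect credited above with reducing overfitting.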


Sentence Generation for Entity Description with Content-Plan Attention

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6439 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9057-9064

Author(s):

Bayu Trisedya ◽

Jianzhong Qi ◽

Rui Zhang

Keyword(s):

State Of The Art ◽

Neural Models ◽

Time Step ◽

Two Stage ◽

Sentence Generation ◽

Neural Data ◽

Attention Model ◽

Linear Sequence ◽

Proper Order ◽

Real World Datasets

We study neural data-to-text generation. Specifically, we consider a target entity that is associated with a set of attributes. We aim to generate a sentence to describe the target entity. Previous studies use encoder-decoder frameworks where the encoder treats the input as a linear sequence and uses LSTM to encode the sequence. However, linearizing a set of attributes may not yield the proper order of the attributes, and hence leads the encoder to produce an improper context to generate a description. To handle disordered input, recent studies propose two-stage neural models that use pointer networks to generate a content-plan (i.e., content-planner) and use the content-plan as input for an encoder-decoder model (i.e., text generator). However, in two-stage models, the content-planner may yield an incomplete content-plan, due to missing one or more salient attributes in the generated content-plan. This will in turn cause the text generator to generate an incomplete description. To address these problems, we propose a novel attention model that exploits content-plan to highlight salient attributes in a proper order. The challenge of integrating a content-plan in the attention model of an encoder-decoder framework is to align the content-plan and the generated description. We handle this problem by devising a coverage mechanism to track the extent to which the content-plan is exposed in the previous decoding time-step, and hence it helps our proposed attention model select the attributes to be mentioned in the description in a proper order. Experimental results show that our model outperforms state-of-the-art baselines by up to 3% and 5% in terms of BLEU score on two real-world datasets, respectively.
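A coverage mechanism, in general, keeps a running sum of past attention so later decoding steps can be steered away from what has already been covered. The sketch below is a generic coverage-penalized dot-product attention, swapped in for illustration; it is not the paper's content-plan attention, and the penalty form and `penalty` value are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def coverage_attention(query, keys, coverage, penalty=2.0):
    """One decoding step of dot-product attention with a coverage penalty.

    coverage[i] accumulates the attention attribute i received in earlier steps;
    subtracting penalty * coverage[i] steers later steps toward attributes
    that have not been mentioned yet. Returns (weights, updated coverage).
    """
    scores = [sum(a * b for a, b in zip(query, k)) - penalty * c
              for k, c in zip(keys, coverage)]
    weights = softmax(scores)
    return weights, [c + w for c, w in zip(coverage, weights)]


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical attribute encodings
query = [1.0, 0.0]                            # same decoder query at both steps
w1, cov = coverage_attention(query, keys, [0.0, 0.0, 0.0])
w2, cov = coverage_attention(query, keys, cov)
```

Even with an unchanged query, the weight on the first attribute drops at the second step, illustrating how accumulated coverage discourages repeating an attribute.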
