Deep Spatio-Temporal Attention Model for Grain Storage Temperature Forecasting

Author(s): Shanshan Duan, Weidong Yang, Xuyu Wang, Shiwen Mao, Yuan Zhang
2020, Vol 79 (37-38), pp. 28329-28354
Author(s): Dong Huang, Zhaoqiang Xia, Joshua Mwesigye, Xiaoyi Feng

Sensors, 2019, Vol 19 (23), pp. 5142
Author(s): Dong Liang, Jiaxing Pan, Han Sun, Huiyu Zhou

Foreground detection is an important theme in video surveillance. Conventional background modeling approaches build sophisticated temporal statistical models to detect the foreground from low-level features, while modern semantic/instance segmentation approaches generate high-level foreground annotations but ignore the temporal relevance among consecutive frames. In this paper, we propose a Spatio-Temporal Attention Model (STAM) for cross-scene foreground detection. To bridge the semantic gap between low-level and high-level features, appearance and optical-flow features are synthesized by attention modules during the feature learning procedure. Experimental results on the CDnet 2014 benchmark show that STAM outperforms many state-of-the-art methods on seven evaluation metrics. The attention modules and optical flow improve its F-measure by 9% and 6%, respectively. Without any tuning, the model demonstrates cross-scene generalization on the Wallflower and PETS datasets. The processing speed is 10.8 fps at a frame size of 256 × 256.
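As a rough illustration of how attention modules might synthesize appearance and optical-flow features, here is a minimal PyTorch sketch. The module name, channel sizes, and the per-pixel sigmoid gating are illustrative assumptions, not the authors' published code:

```python
# Minimal sketch (assumed design, not the paper's implementation):
# a learned per-pixel gate fuses appearance and optical-flow feature maps.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # A 1x1 conv over the concatenated streams yields a gate in [0, 1]
        # that decides, per pixel and channel, which stream dominates.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, appearance: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # appearance, flow: (N, C, H, W) feature maps from the two streams.
        g = self.gate(torch.cat([appearance, flow], dim=1))
        # Convex combination: g keeps appearance cues, (1 - g) motion cues.
        return g * appearance + (1.0 - g) * flow

# Example with hypothetical shapes: a 256x256 input downsampled by 4.
fusion = AttentionFusion(channels=64)
app = torch.randn(1, 64, 64, 64)
flo = torch.randn(1, 64, 64, 64)
print(fusion(app, flo).shape)  # torch.Size([1, 64, 64, 64])
```

A soft gate like this lets the network fall back on motion cues where appearance is ambiguous (e.g., camouflaged foreground), which is one plausible way to realize the fusion the abstract describes.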


Author(s): Mujtaba Asad, He Jiang, Jie Yang, Enmei Tu, Aftab A. Malik

Detection of violent human behavior is necessary for public safety and monitoring. However, in human-operated surveillance systems it demands constant observation and attention, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos using the fusion of spatio-temporal features and an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules, and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from both RGB and optical-flow input frames. The spatial attention module generates an importance mask that focuses on the most relevant areas of an image frame. The temporal attention part, based on BiConvLSTM, identifies the video frames most relevant to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, given weakly supervised training with only video-level classification labels. Experimental results on publicly available benchmark datasets show the superior performance of the proposed model over existing methods. Our model achieves improved accuracies (ACC) of 89.1%, 99.1%, and 98.15% on the RWF-2000, HockeyFight, and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we report mean average precision (mAP), and our model obtains 80.7% mAP.
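To make the spatial-mask and temporal-attention ideas concrete, here is a minimal PyTorch sketch. All names and layer sizes are assumptions, and a plain bidirectional LSTM over per-frame feature vectors stands in for the paper's BiConvLSTM:

```python
# Minimal sketch (assumed design): a spatial importance mask per frame,
# plus temporal attention that scores frames and pools a clip descriptor.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Re-weights spatial locations of one frame's feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),  # per-pixel importance
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W); the (N, 1, H, W) mask broadcasts over channels.
        return x * self.mask(x)

class TemporalAttention(nn.Module):
    """Scores frames by relevance and pools them into a clip descriptor."""
    def __init__(self, feat_dim: int):
        super().__init__()
        # Bidirectional LSTM as a stand-in for the BiConvLSTM.
        self.rnn = nn.LSTM(feat_dim, feat_dim, bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * feat_dim, 1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (N, T, D) per-frame feature vectors.
        h, _ = self.rnn(frames)                   # (N, T, 2D)
        w = torch.softmax(self.score(h), dim=1)   # (N, T, 1) frame weights
        return (w * h).sum(dim=1)                 # (N, 2D) clip descriptor

# Example with hypothetical shapes: 16 frames, 128-dim frame features.
temporal = TemporalAttention(feat_dim=128)
clip = temporal(torch.randn(2, 16, 128))
print(clip.shape)  # torch.Size([2, 256])
```

Because the frame weights are produced inside the network, training with only a video-level label still forces high weights onto the discriminative frames, which is the mechanism behind the weakly supervised temporal localization the abstract mentions.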


Author(s): Ya Wu, Guang Chen, Zhijun Li, Lijun Zhang, Lu Xiong, ...
