Video Action Recognition With an Additional End-to-End Trained Temporal Stream

Author(s):  
Guojing Cong ◽  
Giacomo Domeniconi ◽  
Joshua Shapiro ◽  
Chih-Chieh Yang ◽  
Barry Chen
Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-23 ◽  
Author(s):  
Xiangchun Yu ◽  
Zhe Zhang ◽  
Lei Wu ◽  
Wei Pang ◽  
Hechang Chen ◽  
...  

Numerous human actions, such as “Phoning,” “PlayingGuitar,” and “RidingHorse,” can be inferred from static cues alone, even when motion information is available in video, because a single still image may already sufficiently characterize the action. In this work, we investigate human action recognition in still images and use deep ensemble learning to automatically decompose the body pose and perceive its background information. First, we construct an end-to-end NCNN-based model by attaching a nonsequential convolutional neural network (NCNN) module on top of a pretrained model. The nonsequential topology of the NCNN learns spatial- and channel-wise features separately in parallel branches, which helps improve model performance. Next, to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on weight optimization (DELWO) model, which fuses the deep information derived from multiple models automatically from the data. Finally, we design a deep ensemble learning based on voting strategy (DELVS) model that pools multiple deep models with weighted coefficients to obtain a better prediction. Moreover, model complexity can be reduced by lessening the number of trainable parameters, which helps mitigate overfitting on small datasets. We conduct experiments on Li’s action dataset and on the uncropped and 1.5x-cropped Willow action datasets; the results validate the effectiveness and robustness of the proposed models in mitigating overfitting on small datasets. Our code is open-sourced on GitHub (https://github.com/yxchspring/deep_ensemble_learning) to share the models with the community.
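The voting-based fusion that the DELVS model describes, combining the predictions of several deep models with weighted coefficients, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function `weighted_vote`, the toy logits, and the fixed weights are hypothetical stand-ins for the paper's trained models and learned coefficients.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_vote(model_logits, weights):
    """Fuse per-model class probabilities with weighted coefficients
    and return the predicted class for each sample."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                     # normalize coefficients
    probs = np.stack([softmax(l) for l in model_logits])  # (models, samples, classes)
    fused = np.tensordot(weights, probs, axes=1)          # (samples, classes)
    return fused.argmax(axis=1), fused

# Toy example: three "models" scoring 2 samples over 3 action classes.
logits_a = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
logits_b = np.array([[1.8, 0.7, 0.2], [0.1, 0.4, 1.9]])
logits_c = np.array([[0.3, 2.1, 0.4], [0.2, 1.7, 0.5]])
preds, fused = weighted_vote([logits_a, logits_b, logits_c], [0.5, 0.3, 0.2])
```

The weighted sum of probabilities lets a strongly weighted model dominate on samples where the others disagree, which is the intuition behind pooling multiple deep models rather than trusting any single one.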


2018 ◽  
Vol 29 (7) ◽  
pp. 1127-1142 ◽  
Author(s):  
Hong Zhang ◽  
Miao Xin ◽  
Shuhang Wang ◽  
Yifan Yang ◽  
Lei Zhang ◽  
...  

Author(s):  
Chunyu Xie ◽  
Ce Li ◽  
Baochang Zhang ◽  
Chen Chen ◽  
Jungong Han ◽  
...  

The skeleton-based action recognition task is entangled with complex spatio-temporal variations of skeleton joints and remains challenging for Recurrent Neural Networks (RNNs). In this work, we propose a temporal-then-spatial recalibration scheme to alleviate such complex variations, resulting in end-to-end Memory Attention Networks (MANs) that consist of a Temporal Attention Recalibration Module (TARM) and a Spatio-Temporal Convolution Module (STCM). Specifically, the TARM is deployed in a residual learning module that employs a novel attention learning network to recalibrate the temporal attention of frames in a skeleton sequence. The STCM treats the attention-recalibrated skeleton joint sequences as images and leverages Convolutional Neural Networks (CNNs) to further model the spatial and temporal information of the skeleton data. The two modules (TARM and STCM) seamlessly form a single network architecture that can be trained end to end. MANs significantly boost the performance of skeleton-based action recognition and achieve the best results on four challenging benchmark datasets: NTU RGB+D, HDM05, SYSU-3D, and UT-Kinect.
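The temporal recalibration that the TARM performs, scoring each frame with an attention network, normalizing the scores, and reweighting the sequence through a residual connection, can be sketched in numpy. This is an illustrative simplification, not the paper's network: the linear scorer `score_w` stands in for the learned attention network, and the random frames stand in for flattened skeleton joint coordinates.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def temporal_recalibration(frames, score_w):
    """Score each frame, softmax-normalize the scores into temporal
    attention, and apply a residual recalibration to the sequence."""
    # frames: (T, D) — T frames, D flattened joint coordinates per frame
    scores = frames @ score_w            # one attention score per frame
    attn = softmax(scores)               # temporal attention, sums to 1
    # Residual recalibration: original frames plus attention-weighted frames
    return frames + attn[:, None] * frames, attn

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 6))         # 8 frames, 6 joint coordinates each
score_w = rng.normal(size=6)             # stand-in for learned attention weights
recal, attn = temporal_recalibration(frames, score_w)
```

The residual form keeps every frame's original signal while amplifying frames the attention deems informative; the recalibrated (T, D) sequence can then be treated as an image and fed to a CNN, as the STCM does.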


2018 ◽  
Vol 317 ◽  
pp. 101-109 ◽  
Author(s):  
Zhigang Zhu ◽  
Hongbing Ji ◽  
Wenbo Zhang ◽  
Yiping Xu
