scholarly journals Learning representative temporal features for action recognition

Author(s):  
Ali Javidani ◽  
Ahmad Mahmoudi-Aznaveh
2016 ◽  
Vol 173 ◽  
pp. 110-117 ◽  
Author(s):  
Zan Gao ◽  
Weizhi Nie ◽  
Anan Liu ◽  
Hua Zhang

AIP Advances ◽  
2020 ◽  
Vol 10 (7) ◽  
pp. 075208
Author(s):  
Qianyu Wu ◽  
Fangqiang Hu ◽  
Aichun Zhu ◽  
Zixuan Wang ◽  
Yaping Bao

Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 301
Author(s):  
Guocheng Liu ◽  
Caixia Zhang ◽  
Qingyang Xu ◽  
Ruoshi Cheng ◽  
Yong Song ◽  
...  

In view of difficulty in application of optical flow based human action recognition due to large amount of calculation, a human action recognition algorithm I3D-shufflenet model is proposed combining the advantages of I3D neural network and lightweight model shufflenet. The 5 × 5 convolution kernel of I3D is replaced by a double 3 × 3 convolution kernels, which reduces the amount of calculations. The shuffle layer is adopted to achieve feature exchange. The recognition and classification of human action is performed based on trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel which can promote the utilization of useful information. The Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which can significantly improve the ability of human action expression and reduce the calculation of feature extraction. The I3D-shufflenet is testified on the UCF101 dataset, and compared with other models. The final result shows that the I3D-shufflenet has higher accuracy than the original I3D with an accuracy of 96.4%.


Data ◽  
2020 ◽  
Vol 5 (4) ◽  
pp. 104
Author(s):  
Ashok Sarabu ◽  
Ajit Kumar Santra

The Two-stream convolution neural network (CNN) has proven a great success in action recognition in videos. The main idea is to train the two CNNs in order to learn spatial and temporal features separately, and two scores are combined to obtain final scores. In the literature, we observed that most of the methods use similar CNNs for two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) is applied in order to retrieve long-range temporal features, and to differentiate the similar type of sub-action in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced to the proposed architecture in order to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The findings of the proposed architecture shows the significant performance increase and it outperforms the existing methods.


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 164876-164886 ◽  
Author(s):  
Fei Wang ◽  
Guorui Wang ◽  
Yunwen Huang ◽  
Hao Chu

2020 ◽  
Vol 10 (15) ◽  
pp. 5326
Author(s):  
Xiaolei Diao ◽  
Xiaoqiang Li ◽  
Chen Huang

The same action takes different time in different cases. This difference will affect the accuracy of action recognition to a certain extent. We propose an end-to-end deep neural network called “Multi-Term Attention Networks” (MTANs), which solves the above problem by extracting temporal features with different time scales. The network consists of a Multi-Term Attention Recurrent Neural Network (MTA-RNN) and a Spatio-Temporal Convolutional Neural Network (ST-CNN). In MTA-RNN, a method for fusing multi-term temporal features are proposed to extract the temporal dependence of different time scales, and the weighted fusion temporal feature is recalibrated by the attention mechanism. Ablation research proves that this network has powerful spatio-temporal dynamic modeling capabilities for actions with different time scales. We perform extensive experiments on four challenging benchmark datasets, including the NTU RGB+D dataset, UT-Kinect dataset, Northwestern-UCLA dataset, and UWA3DII dataset. Our method achieves better results than the state-of-the-art benchmarks, which demonstrates the effectiveness of MTANs.


2014 ◽  
Vol 11 (5) ◽  
pp. 500-509 ◽  
Author(s):  
Xiao-Fei Ji ◽  
Qian-Qian Wu ◽  
Zhao-Jie Ju ◽  
Yang-Yang Wang

Sign in / Sign up

Export Citation Format

Share Document