Boundary graph convolutional network for temporal action detection

2021 ◽ Vol 109 ◽ pp. 104144
Author(s):  
Yaosen Chen ◽  
Bing Guo ◽  
Yan Shen ◽  
Wei Wang ◽  
Weichen Lu ◽  
...  
Electronics ◽ 2021 ◽ Vol 10 (19) ◽ pp. 2380
Author(s):  
Yiming Xu ◽  
Fangjie Zhou ◽  
Li Wang ◽  
Wei Peng ◽  
Kai Zhang

Demand for action recognition has recently extended beyond high classification accuracy to include high temporal action detection accuracy, and meeting both requirements simultaneously is challenging. The key to action recognition lies in the quantity and quality of the extracted features. In this paper, a two-stream convolutional network is used: a three-dimensional convolutional neural network (3D-CNN) extracts spatiotemporal features from consecutive frames, while a two-dimensional convolutional neural network (2D-CNN) extracts spatial features from key-frames. Integrating the two networks improves the model's accuracy and supports the task of distinguishing the start and stop frames of an action. A multi-scale feature extraction method is also presented to extract richer feature information, and a multi-task learning model is introduced that further improves classification accuracy by sharing data between tasks. Experiments show that the accuracy of the modified model improves by 10%. We also propose the confidence gradient, which refines how start and stop frames are distinguished and thereby improves temporal action detection accuracy; experiments show an 11% gain in accuracy.
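To make the design concrete, the following is a minimal PyTorch sketch of the two-stream idea described in the abstract: a 3D-CNN over a clip, a 2D-CNN over a key-frame, feature fusion by concatenation, and two shared-feature heads for multi-task learning. The layer sizes, the fusion scheme, and the names `SpatioTemporalStream`, `KeyFrameStream`, and `TwoStreamDetector` are illustrative assumptions, not the authors' exact architecture; likewise, the finite-difference reading of the "confidence gradient" at the end is a hypothetical interpretation.

```python
import torch
import torch.nn as nn

class SpatioTemporalStream(nn.Module):
    """3D-CNN stream: spatiotemporal features from a clip of consecutive frames."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, out_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
    def forward(self, clip):                 # clip: (B, 3, T, H, W)
        return self.net(clip).flatten(1)     # -> (B, out_dim)

class KeyFrameStream(nn.Module):
    """2D-CNN stream: spatial features from a single key-frame."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, out_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, frame):                # frame: (B, 3, H, W)
        return self.net(frame).flatten(1)    # -> (B, out_dim)

class TwoStreamDetector(nn.Module):
    """Fuses both streams; the two heads share features (multi-task learning)."""
    def __init__(self, num_classes, feat_dim=128):
        super().__init__()
        self.st = SpatioTemporalStream(feat_dim)
        self.kf = KeyFrameStream(feat_dim)
        self.cls_head = nn.Linear(2 * feat_dim, num_classes)  # action class
        self.conf_head = nn.Linear(2 * feat_dim, 1)           # per-clip actionness
    def forward(self, clip, key_frame):
        fused = torch.cat([self.st(clip), self.kf(key_frame)], dim=1)
        return self.cls_head(fused), torch.sigmoid(self.conf_head(fused))

# Hypothetical "confidence gradient": given per-frame actionness scores,
# the steepest rise marks the start frame and the steepest fall the stop frame.
conf = torch.tensor([0.05, 0.06, 0.10, 0.85, 0.90, 0.88, 0.30, 0.05])
grad = conf[1:] - conf[:-1]           # finite-difference gradient over time
start = int(torch.argmax(grad)) + 1   # frame 3: first high-confidence frame
stop = int(torch.argmin(grad))        # frame 5: last high-confidence frame
```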


Author(s):  
Zhanning Gao ◽  
Le Wang ◽  
Qilin Zhang ◽  
Zhenxing Niu ◽  
Nanning Zheng ◽  
...  

We propose a temporal action detection by spatial segmentation framework, which simultaneously categorizes actions and temporally localizes action instances in untrimmed videos. The core idea is to convert the temporal detection task into a spatial semantic segmentation task. First, the video imprint representation is employed to capture the spatial/temporal interdependences within and among frames and represent them as spatial proximity in a feature space. The resulting imprint representation is then spatially segmented by a fully convolutional network. Projecting the segmentation labels back to the video space yields both temporal action boundary localization and per-frame spatial annotation simultaneously. Because the underlying imprint representation has a fixed size, the framework is robust to the variable lengths of untrimmed videos. The efficacy of the framework is validated on two public action detection datasets.
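A minimal sketch of the segmentation step, assuming PyTorch: the imprint encoder is treated as a black box that produces a fixed-size feature map, and `ImprintFCN`, the channel/class counts, and the label-projection comment are illustrative assumptions rather than the paper's exact network.

```python
import torch
import torch.nn as nn

class ImprintFCN(nn.Module):
    """Fully convolutional network over the fixed-size imprint representation.

    The output keeps the imprint's spatial resolution, so every imprint cell
    receives a class label (0 = background, 1..K = action classes).
    """
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes + 1, kernel_size=1),  # per-cell logits
        )
    def forward(self, imprint):              # imprint: (B, C, H, W)
        return self.net(imprint)             # -> (B, K + 1, H, W)

# Because the imprint has a fixed size regardless of video length, the same
# network handles arbitrarily long untrimmed videos. Each imprint cell keeps
# a record of the frames that contributed to it, so projecting a cell's label
# back to those frames recovers temporal boundaries and per-frame annotations.
fcn = ImprintFCN(in_ch=256, num_classes=20)   # placeholder channel/class counts
imprint = torch.randn(1, 256, 32, 32)         # stand-in for imprint features
cell_labels = fcn(imprint).argmax(dim=1)      # (1, 32, 32) per-cell classes
```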


Author(s):  
Wenfei Yang ◽  
Tianzhu Zhang ◽  
Zhendong Mao ◽  
Yongdong Zhang ◽ 
Qi Tian ◽  
...  

Author(s):  
Linchao He ◽  
Jiong Mu ◽  
Mengting Luo ◽  
Yunlu Lu ◽  
Xuefeng Tan ◽  
...  
