Deep Neural Networks Using Residual Fast-Slow Refined Highway and Global Atomic Spatial Attention for Action Recognition and Detection

Abstract: Human action detection is one of the most demanding and complex tasks for deep neural networks. Human gesture recognition can be viewed as a form of human action recognition, where a gesture is a series of bodily motions that communicates a message. Gestures are a natural and preferred way for humans to interact with computers, and they help bridge the gap between humans and robots. For deaf and mute people in particular, human action recognition can serve as an effective communication channel. In this work, we propose a hand gesture recognition system that recognizes hand movements, computes hand features such as peaks and angles, and then converts gesture images into text.

Index Terms: Human action recognition, Deaf and dumb, CNN.
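The abstract lists peak calculation and angle calculation as hand features but does not say how they are computed. A common approach, assumed here and not taken from the paper, treats the segmented hand as a closed contour: points that are local maxima of the distance from the contour's centroid are candidate fingertips ("peaks"), and the angle between two such points, measured at the centroid, describes finger spread. The sketch below illustrates that idea; the function names and the `min_ratio` threshold are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def centroid(contour: List[Point]) -> Point:
    """Mean of the contour points, used here as a simple stand-in for the palm centre."""
    n = len(contour)
    return (sum(x for x, _ in contour) / n, sum(y for _, y in contour) / n)


def find_peaks(contour: List[Point], min_ratio: float = 1.2) -> List[int]:
    """Indices of contour points whose distance from the centroid is a local
    maximum (compared with both neighbours on the closed contour) and at least
    `min_ratio` times the mean distance. These are treated as candidate fingertips."""
    cx, cy = centroid(contour)
    dist = [math.hypot(x - cx, y - cy) for x, y in contour]
    mean_d = sum(dist) / len(dist)
    n = len(dist)
    peaks = []
    for i in range(n):
        prev_d, next_d = dist[i - 1], dist[(i + 1) % n]  # wrap around the contour
        if dist[i] > prev_d and dist[i] >= next_d and dist[i] >= min_ratio * mean_d:
            peaks.append(i)
    return peaks


def angle_at_centroid(contour: List[Point], i: int, j: int) -> float:
    """Angle in degrees between the rays centroid->contour[i] and centroid->contour[j]."""
    cx, cy = centroid(contour)
    ax, ay = contour[i][0] - cx, contour[i][1] - cy
    bx, by = contour[j][0] - cx, contour[j][1] - cy
    cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

For a synthetic five-pointed star contour (outer radius 2, inner radius 1), `find_peaks` returns the five outer tips, and the angle between neighbouring tips is 72 degrees. A real system would first segment the hand from the image and extract its contour, and the number of peaks and their angles could then feed a classifier that maps each gesture to text.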

Image-based action recognition using hint-enhanced deep neural networks

Neurocomputing, Vol. 267, pp. 475-488 (2017). DOI: 10.1016/j.neucom.2017.06.041. Cited by ~29.

Author(s): Tangquan Qi, Yong Xu, Yuhui Quan, Yaodong Wang, Haibin Ling

Keyword(s): Neural Networks, Action Recognition, Deep Neural Networks

Ensembles of Deep Neural Networks for Action Recognition in Still Images

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), 2019. DOI: 10.1109/iccke48569.2019.8965014.

Author(s): Sina Mohammadi, Sina Ghofrani Majelan, Shahriar B. Shokouhi

Keyword(s): Neural Networks, Action Recognition, Deep Neural Networks, Still Images

Multi-teacher Knowledge Distillation for Compressed Video Action Recognition on Deep Neural Networks

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019. DOI: 10.1109/icassp.2019.8682450.

Author(s): Meng-Chieh Wu, Ching-Te Chiu, Kun-Hsuan Wu

Keyword(s): Neural Networks, Teacher Knowledge, Action Recognition, Deep Neural Networks, Compressed Video, Knowledge Distillation

Automatic segmentation of gross target volume of nasopharynx cancer using ensemble of multiscale deep neural networks with spatial attention

Neurocomputing, Vol. 438, pp. 211-222 (2021). DOI: 10.1016/j.neucom.2020.06.146.

Author(s): Haochen Mei, Wenhui Lei, Ran Gu, Shan Ye, Zhengwentai Sun, ...

Keyword(s): Neural Networks, Spatial Attention, Deep Neural Networks, Automatic Segmentation, Target Volume, Nasopharynx Cancer

Learning Deep Trajectory Descriptor for action recognition in videos using deep neural networks

2015 IEEE International Conference on Multimedia and Expo (ICME), 2015. DOI: 10.1109/icme.2015.7177461. Cited by ~2.

Author(s): Yemin Shi, Wei Zeng, Tiejun Huang, Yaowei Wang

Keyword(s): Neural Networks, Action Recognition, Deep Neural Networks

Sparse Adversarial Perturbations for Videos

Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 8973-8980 (2019). DOI: 10.1609/aaai.v33i01.33018973. Cited by ~2.

Author(s): Xingxing Wei, Jun Zhu, Sha Yuan, Hang Su

Keyword(s): Neural Networks, Action Recognition, Optimization Algorithm, Deep Neural Networks, Spatial Cues, Current Frame, Computation Cost, Temporal Cues, Static Images, Temporal Interactions

Although adversarial samples for deep neural networks (DNNs) have been studied intensively on static images, their extension to videos has not yet been explored. Compared with images, attacking a video requires considering temporal cues as well as spatial cues. Moreover, to improve imperceptibility and reduce computation cost, perturbations should be added to as few frames as possible, i.e., adversarial perturbations should be temporally sparse. This further motivates the propagation of perturbations: perturbations added to the current frame can transfer to subsequent frames through their temporal interactions, so no (or few) extra perturbations are needed for those frames to be misclassified. To this end, we propose the first white-box video attack method, which uses an l2,1-norm based optimization algorithm to compute sparse adversarial perturbations for videos. We choose action recognition as the target task, and networks with a CNN+RNN architecture as threat models to verify our method. Thanks to the propagation, we can compute perturbations on a shortened version of a video and then adapt them to the full-length video to fool DNNs. Experimental results on the UCF101 dataset demonstrate that even when only one frame in a video is perturbed, the fooling rate can still reach 59.7%.
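The abstract names an l2,1-norm based optimization but gives no formula. Under the usual definition, assumed here, the perturbation is arranged with one row per frame, and the norm is the sum over frames of each frame's l2 norm. Minimizing this quantity tends to drive whole frames to exactly zero, which is what makes a perturbation temporally sparse. The sketch below shows only that sparsity-inducing piece: the norm itself and the standard group soft-thresholding (proximal) step that many l2,1 solvers use. It is not the authors' attack, and the names `l21_norm`, `prox_l21` and `perturbed_frames` are hypothetical.

```python
import math
from typing import List

Frame = List[float]  # one frame's perturbation, flattened into a vector


def l21_norm(perturbation: List[Frame]) -> float:
    """l2,1 norm: the l2 norm of each frame's perturbation, summed over frames."""
    return sum(math.sqrt(sum(v * v for v in frame)) for frame in perturbation)


def prox_l21(perturbation: List[Frame], lam: float) -> List[Frame]:
    """Proximal operator of lam * ||.||_{2,1} (group soft-thresholding).

    Each frame is scaled towards zero by max(0, 1 - lam / ||frame||_2);
    frames whose l2 norm is at most lam become exactly zero, which is
    how the penalty produces temporally sparse perturbations."""
    out = []
    for frame in perturbation:
        norm = math.sqrt(sum(v * v for v in frame))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in frame])
    return out


def perturbed_frames(perturbation: List[Frame]) -> List[int]:
    """Indices of frames that still carry a non-zero perturbation."""
    return [t for t, frame in enumerate(perturbation) if any(v != 0.0 for v in frame)]
```

For example, with per-frame perturbations `[[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]`, the l2,1 norm is 5 + 0.5 + 0 = 5.5; one proximal step with `lam=1.0` shrinks the first frame to `[2.4, 3.2]` and zeroes the second, leaving only frame 0 perturbed. In a full attack, a step like this would alternate with gradient steps on a misclassification loss computed through the CNN+RNN model.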
