Fast and robust dynamic hand gesture recognition via key frames extraction and feature fusion

2019 · Vol 331 · pp. 424-433
Author(s): Hao Tang, Hong Liu, Wei Xiao, Nicu Sebe

Dynamic hand gesture recognition is an essential research topic in human-computer interaction. Recently, deep convolutional neural networks have delivered excellent performance in this area and achieved promising results, but researchers have paid less attention to the feature extraction process, frame unification, fusion schemes, and sequence-to-sequence prediction over frames. In this paper, we therefore present an effective 2D CNN architecture with three stream networks and an advanced weighted feature-fusion scheme, combined with a gated recurrent network, for dynamic hand gesture recognition. To obtain sufficient and useful information, we convert each RGB-D video into 30-frame and 45-frame inputs. We compute frame-to-frame optical flow from the RGB video and extract dense motion features. After finding a proper motion path, we assign more weight to the optical-flow features and fuse this information into the next stage, obtaining comparable results. We also add a gated recurrent network for temporal recognition across frames, which reduces training time while improving accuracy. Our proposed architecture achieves 85% accuracy on the standard VIVA dataset.
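A minimal PyTorch sketch of the three-stream design this abstract describes: per-frame 2D CNNs on RGB, depth, and optical flow, weighted feature fusion biased toward the flow stream, and a GRU for temporal aggregation. The layer sizes, the 19-class output, and the fusion-weight initialisation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StreamCNN(nn.Module):
    """A small 2D CNN applied frame-by-frame to one input modality."""
    def __init__(self, in_channels, feat_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):                        # x: (batch, frames, C, H, W)
        b, t, c, h, w = x.shape
        f = self.features(x.reshape(b * t, c, h, w)).flatten(1)
        return self.fc(f).reshape(b, t, -1)      # (batch, frames, feat_dim)

class ThreeStreamGRU(nn.Module):
    """Three modality streams, weighted fusion, then a GRU over time."""
    def __init__(self, num_classes=19, feat_dim=256):   # 19 classes assumed
        super().__init__()
        self.rgb = StreamCNN(3, feat_dim)
        self.depth = StreamCNN(1, feat_dim)
        self.flow = StreamCNN(2, feat_dim)       # 2-channel (u, v) flow
        # Learnable fusion weights, initialised to favour optical flow,
        # echoing the "more weight to optical flow features" step.
        self.w = nn.Parameter(torch.tensor([0.25, 0.25, 0.50]))
        self.gru = nn.GRU(feat_dim, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, rgb, depth, flow):
        fused = (self.w[0] * self.rgb(rgb) +
                 self.w[1] * self.depth(depth) +
                 self.w[2] * self.flow(flow))
        _, h = self.gru(fused)                   # temporal aggregation
        return self.head(h[-1])                  # one logit vector per clip

# Example with 30-frame clips, matching the frame-unification step above.
model = ThreeStreamGRU()
logits = model(torch.randn(2, 30, 3, 64, 64),   # RGB
               torch.randn(2, 30, 1, 64, 64),   # depth
               torch.randn(2, 30, 2, 64, 64))   # optical flow
```

Making the fusion weights a learnable parameter, rather than fixed constants, lets training confirm or adjust the initial bias toward the motion stream.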


2020 · Vol 17 (4) · pp. 497-506
Author(s): Sunil Patel, Ramji Makwana

Automatic classification of dynamic hand gestures is challenging because of the large diversity within each gesture class, low resolution, and the fact that gestures are performed with the fingers. Owing to these challenges, many researchers focus on this area. Recently, deep neural networks have been used for implicit feature extraction, with a softmax layer for classification. In this paper, we propose a method based on a two-dimensional convolutional neural network that performs detection and classification of hand gestures simultaneously from multimodal Red, Green, Blue, Depth (RGB-D) and optical-flow data, and passes these features to a Long Short-Term Memory (LSTM) recurrent network for frame-to-frame probability generation, with a Connectionist Temporal Classification (CTC) network for loss calculation. We compute optical flow from the Red, Green, Blue (RGB) data to capture the motion information present in the video. The CTC model efficiently evaluates all possible alignments of a hand gesture via dynamic programming and checks frame-to-frame consistency of the gesture's visual similarity in the unsegmented input stream. The CTC network finds the most probable frame sequence for a gesture class, and the sequence with the highest probability is selected by max decoding, as sketched below. The entire network is trained end-to-end by minimizing the CTC loss. We evaluate on the challenging Vision for Intelligent Vehicles and Applications (VIVA) dataset for dynamic hand gesture recognition, captured with RGB and depth data. On this VIVA dataset, our proposed hand gesture recognition technique outperforms competing state-of-the-art algorithms, achieving an accuracy of 86%.
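A hedged sketch of the LSTM + CTC stage using PyTorch's built-in nn.CTCLoss, assuming the CNN backbone (omitted here) yields 256-dimensional per-frame features and that there are 19 gesture classes; the real model's sizes may differ. The greedy best-path decoding at the end mirrors the "max decoding" described above.

```python
import torch
import torch.nn as nn

num_classes = 19                 # assumed VIVA gesture count
blank = num_classes              # CTC reserves one extra index for "blank"

# Per-frame features are assumed to come from the 2D CNN; the backbone
# itself is omitted and `feats` stands in for its output.
lstm = nn.LSTM(input_size=256, hidden_size=128, batch_first=True)
classifier = nn.Linear(128, num_classes + 1)
ctc_loss = nn.CTCLoss(blank=blank, zero_infinity=True)

feats = torch.randn(4, 60, 256)              # (batch, frames, feat_dim)
out, _ = lstm(feats)
log_probs = classifier(out).log_softmax(-1)  # frame-to-frame probabilities

# nn.CTCLoss expects (frames, batch, classes); each unsegmented clip
# carries a single gesture label here.
targets = torch.randint(0, num_classes, (4, 1))
loss = ctc_loss(log_probs.transpose(0, 1), targets,
                torch.full((4,), 60, dtype=torch.long),  # input lengths
                torch.full((4,), 1, dtype=torch.long))   # target lengths
loss.backward()                              # end-to-end training step

# "Max decoding": pick the most probable class per frame, then collapse
# repeats and drop blanks to recover the predicted gesture sequence.
def greedy_decode(frame_ids, blank):
    out, prev = [], None
    for s in frame_ids.tolist():
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out

pred = greedy_decode(log_probs[0].argmax(-1), blank)
```

The dynamic-programming alignment the abstract mentions happens inside nn.CTCLoss; greedy decoding is the cheap inference-time counterpart and matches "selecting the frame with the highest probability value" per time step.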


2020 · Vol 29 (6) · pp. 1153-1164
Author(s): Qianyi Xu, Guihe Qin, Minghui Sun, Jie Yan, Huiming Jiang, ...

2012 · Vol 33 (4) · pp. 476-484
Author(s): Jun Cheng, Can Xie, Wei Bian, Dacheng Tao
