SqueezeNet and Fusion Network-Based Accurate Fast Fully Convolutional Network for Hand Detection and Gesture Recognition

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Baohua Qiang ◽  
Yijie Zhai ◽  
Mingliang Zhou ◽  
Xianyi Yang ◽  
Bo Peng ◽  
...  
Author(s):  
Dan Liu ◽  
Dawei Du ◽  
Libo Zhang ◽  
Tiejian Luo ◽  
Yanjun Wu ◽  
...  

Existing hand detection methods usually follow a multi-stage pipeline with high computational cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection. In this paper, we propose a new Scale Invariant Fully Convolutional Network (SIFCN), trained in an end-to-end fashion, to detect hands efficiently. Specifically, we merge the feature maps from high to low layers in an iterative way, which handles different scales of hands better and with less time overhead than simply concatenating them. Moreover, we develop the Complementary Weighted Fusion (CWF) block to make full use of the distinctive features among multiple layers to achieve scale invariance. To deal with rotated hand detection, we present the rotation map, which removes the need for complex rotation and derotation layers. In addition, we design a multi-scale loss scheme that accelerates the training process significantly by adding supervision to the intermediate layers of the network. Compared with state-of-the-art methods, our algorithm shows comparable accuracy and runs 4.23 times faster on the VIVA dataset, and achieves better average precision on the Oxford hand detection dataset at a speed of 62.5 fps.
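The iterative high-to-low merging mentioned in the abstract can be illustrated with a minimal sketch. This is not the SIFCN implementation: the function names are hypothetical, feature maps are plain 2D lists, and a simple average stands in for the paper's CWF block, whose details the abstract does not give.

```python
from typing import List

Grid = List[List[float]]


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling of a 2D feature map."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def merge_top_down(pyramid: List[Grid]) -> Grid:
    """Merge maps iteratively from the coarsest (first) to the finest (last).

    Each step upsamples the running result and averages it with the next
    finer map. The average is an assumed stand-in for a learned fusion block.
    """
    merged = pyramid[0]
    for finer in pyramid[1:]:
        up = upsample2x(merged)
        merged = [
            [(a + b) / 2 for a, b in zip(row_up, row_fine)]
            for row_up, row_fine in zip(up, finer)
        ]
    return merged


# Toy pyramid: 1x1, 2x2 and 4x4 maps (values are made up for illustration).
coarse = [[4.0]]
mid = [[0.0, 2.0], [2.0, 0.0]]
fine = [[1.0] * 4 for _ in range(4)]
result = merge_top_down([coarse, mid, fine])
```

In this scheme the running result always has the size of a single map, whereas concatenating all maps at the finest resolution requires upsampling every level to full size first.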


Author(s):  
Vinit Sarode ◽  
Animesh Dhagat ◽  
Rangaprasad Arun Srivatsan ◽  
Nicolas Zevallos ◽  
Simon Lucey ◽  
...  

2021 ◽  
Author(s):  
Intissar Khalifa ◽  
Ridha Ejbali ◽  
Raimondo Schettini ◽  
Mourad Zaied

Affective computing is a key research topic in artificial intelligence, with applications in psychology and machines. It consists of the estimation and measurement of human emotions. A person’s body language is one of the most significant sources of information during a job interview, and it reflects a deep psychological state that is often missing from other data sources. In our work, we combine the two tasks of pose estimation and emotion classification for emotional body gesture recognition, and propose a deep multi-stage architecture that is able to deal with both tasks. Our deep pose decoding method detects and tracks the candidate’s skeleton in a video using a combination of a depthwise convolutional network and a detection-based method for 2D pose reconstruction. Moreover, we propose a representation technique based on the superposition of skeletons to generate, for each video sequence, a single image synthesizing the different poses of the subject. We call this image the ‘history pose image’, and it is used as input to a convolutional neural network model based on the Visual Geometry Group architecture. We demonstrate the effectiveness of our method in comparison with other state-of-the-art methods on the standard Common Objects in Context keypoint dataset and the Face and Body gesture video database.
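A minimal sketch of the skeleton-superposition idea follows. It is an illustration under stated assumptions, not the authors' method: the names `draw_line` and `history_pose_image`, the binary pixel values, and the toy two-joint skeleton are all hypothetical, and the abstract does not say how frames are weighted or rendered.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates of one joint


def draw_line(img: List[List[float]], p0: Point, p1: Point) -> None:
    """Rasterize a segment by linear interpolation (illustrative only)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = max(abs(x1 - x0), abs(y1 - y0), 1)
    for i in range(steps + 1):
        x = x0 + round((x1 - x0) * i / steps)
        y = y0 + round((y1 - y0) * i / steps)
        if 0 <= y < len(img) and 0 <= x < len(img[0]):
            img[y][x] = 1.0  # mark the pixel as covered by some skeleton


def history_pose_image(
    frames: List[List[Point]],
    bones: List[Tuple[int, int]],
    height: int,
    width: int,
) -> List[List[float]]:
    """Superimpose the skeletons of all frames onto one image."""
    img = [[0.0] * width for _ in range(height)]
    for joints in frames:
        for a, b in bones:  # each bone connects two joint indices
            draw_line(img, joints[a], joints[b])
    return img


# Two frames of a toy skeleton with a single bone, on a 3x3 canvas.
frames = [[(0, 0), (2, 0)], [(0, 2), (2, 2)]]
image = history_pose_image(frames, bones=[(0, 1)], height=3, width=3)
```

The resulting single image could then be passed to an image classifier in place of the full video sequence.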


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 673-682
Author(s):  
Jian Ji ◽  
Xiaocong Lu ◽  
Mai Luo ◽  
Minghui Yin ◽  
Qiguang Miao ◽  
...  
