Real-Time Multi-View Face Detection and Pose Estimation in Video Stream

Author(s):
Yan Wang, Yanghua Liu, Linmi Tao, Guangyou Xu

Author(s):
Shicun Chen, Yong Zhang, Baocai Yin, Boyue Wang

Abstract: Nowadays, face detection and head pose estimation have many applications, such as face recognition, gaze estimation, and attention modeling. These two tasks are usually handled by two separate models. However, the head pose estimation model typically depends on a region of interest (ROI) detected in advance, which means a face detector must run serially before it. Even the lightest face detector slows down the overall forward inference and prevents real-time performance when estimating the head pose of multiple people. Since both face detection and head pose estimation rely on facial features, a shared face feature map can serve both tasks. In this paper, a multi-task learning model is proposed that solves both problems simultaneously. We directly detect the location of the center point of the face bounding box; at this location, we regress the size of the bounding box and the head pose. We evaluate our model's performance on the AFLW dataset. The proposed model is highly competitive with multi-stage face attribute analysis models, and it achieves real-time performance.
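The scheme described in the abstract (predict a face-center heatmap, then read the box size and head pose off the shared feature map at each detected center) can be sketched as a single network with one backbone and three small heads. The following is a minimal, illustrative PyTorch sketch; the layer sizes, head names, and backbone are assumptions for exposition, not the authors' published architecture.

import torch
import torch.nn as nn

class MultiTaskFaceNet(nn.Module):
    """Shared backbone with per-pixel heads for center, box size, and pose."""
    def __init__(self, feat_ch=64):
        super().__init__()
        # Stand-in backbone; any extractor producing a dense feature map works.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.center_head = nn.Conv2d(feat_ch, 1, 1)  # face-center heatmap
        self.size_head = nn.Conv2d(feat_ch, 2, 1)    # box width and height
        self.pose_head = nn.Conv2d(feat_ch, 3, 1)    # yaw, pitch, roll

    def forward(self, x):
        f = self.backbone(x)  # one shared feature map for all three tasks
        return {
            "center": torch.sigmoid(self.center_head(f)),
            "size": self.size_head(f),
            "pose": self.pose_head(f),
        }

model = MultiTaskFaceNet()
out = model(torch.randn(1, 3, 256, 256))
# Peaks in out["center"] locate faces; out["size"] and out["pose"] are read
# at those same spatial positions, so one forward pass serves both tasks.

Because all heads share one backbone pass, adding pose estimation costs only a 1x1 convolution per pixel rather than a second serial network, which is the basis of the real-time claim.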


2021, Vol. 2021 (1)
Author(s):
Samy Bakheet, Ayoub Al-Hamadi

Abstract: Robust vision-based hand pose estimation is highly sought after but remains a challenging task, partly due to the inherent difficulty of self-occlusion among the fingers. In this paper, an innovative framework for real-time static hand gesture recognition is introduced, based on an optimized shape representation built from multiple shape cues. The framework incorporates a dedicated module for hand pose estimation from depth map data: the hand silhouette is first extracted from the highly detailed and accurate depth map captured by a time-of-flight (ToF) depth sensor. A hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is then computed from the hand silhouette to obtain a reliable and representative description of individual gestures. Finally, an ensemble of one-vs.-all support vector machines (SVMs) is independently trained on each of these learned feature representations to perform gesture classification. When evaluated on a publicly available dataset containing a relatively large and diverse collection of egocentric hand gestures, the approach yields encouraging results that compare very favorably with those reported in the literature, while maintaining real-time operation.
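The silhouette-to-descriptor-to-SVM pipeline above can be illustrated with a short OpenCV/scikit-learn sketch. The feature functions below are simple stand-ins (Hu moments as a region cue, a centroid-distance histogram as a boundary cue), chosen for brevity; they are assumptions for illustration, not the paper's actual affine-invariant descriptor.

import numpy as np
import cv2
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

def region_features(silhouette):
    """Region cue: Hu moments of the binary silhouette (stand-in feature)."""
    hu = cv2.HuMoments(cv2.moments(silhouette.astype(np.uint8))).ravel()
    # Log-scale the moments for numerical stability, preserving sign.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

def boundary_features(silhouette, n_bins=16):
    """Boundary cue: histogram of contour-point distances from the centroid."""
    contours, _ = cv2.findContours(silhouette.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    pts = max(contours, key=cv2.contourArea).reshape(-1, 2).astype(float)
    d = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    hist, _ = np.histogram(d / (d.max() + 1e-12), bins=n_bins, range=(0, 1))
    return hist / (hist.sum() + 1e-12)

def describe(silhouette):
    """Hybrid descriptor: concatenation of region and boundary cues."""
    return np.concatenate([region_features(silhouette),
                           boundary_features(silhouette)])

# One-vs.-all SVM ensemble over the hybrid descriptor, as in the abstract.
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
# Training (hand silhouettes and labels assumed available):
#   X = np.stack([describe(s) for s in train_silhouettes])
#   clf.fit(X, train_labels)
#   pred = clf.predict(np.stack([describe(s) for s in test_silhouettes]))

Training one binary SVM per gesture class (one-vs.-all) keeps each decision boundary simple; at test time the ensemble assigns the class whose SVM yields the highest decision score.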

