Cascaded Hierarchical CNN for RGB-Based 3D Hand Pose Estimation

Mathematical Problems in Engineering, Vol 2020, pp. 1-13 (2020). DOI: 10.1155/2020/8432840

Author(s): Shiming Dai, Wei Liu, Wenji Yang, Lili Fan, Jihao Zhang

Keyword(s): Pose Estimation; Depth Image; Estimation Methods; Hierarchical Network; Human-Machine Interaction; Depth Cameras; Hand Pose Estimation; Public Datasets; RGB Image; Hand Pose

3D hand pose estimation can provide basic information about gestures, which is important in the fields of Human-Machine Interaction (HMI) and Virtual Reality (VR). In recent years, 3D hand pose estimation from a single depth image has made great progress thanks to the development of depth cameras. However, 3D hand pose estimation from a single RGB image remains a highly challenging problem. In this work, we propose a novel four-stage cascaded hierarchical CNN (4CHNet), which leverages a hierarchical network to decompose hand pose estimation into finger pose estimation and palm pose estimation, extracts finger features and palm features separately, and finally fuses them to estimate the 3D hand pose. Compared with direct estimation methods, the hand features extracted by the hierarchical network are more representative. Furthermore, concatenating the stages of the network for end-to-end training lets the stages benefit from one another and improve together. The experimental results on two public datasets demonstrate that our 4CHNet can significantly improve the accuracy of 3D hand pose estimation from a single RGB image.
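The finger/palm decomposition described above can be illustrated with a toy two-branch head. This is a minimal sketch, not 4CHNet itself: it assumes a flat input feature vector, a 21-keypoint output layout, and tiny randomly initialized dense layers standing in for the CNN stages; the class name `FingerPalmFusionHead` and its sizes are hypothetical. It only shows the pattern of extracting two feature sets separately and fusing them by concatenation before regressing 3D coordinates.

```python
import random

NUM_JOINTS = 21  # assumed 21-keypoint hand layout (not stated in the abstract)


def init_dense(n_in, n_out, rng):
    """Random weights for a toy dense layer (one row per output unit)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, bias):
    """Compute y = W x + b for a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


class FingerPalmFusionHead:
    """Hypothetical two-branch head: finger and palm features are extracted
    separately, concatenated, and a shared layer regresses 21 x 3 coordinates."""

    def __init__(self, n_in, n_finger=16, n_palm=8, seed=0):
        rng = random.Random(seed)
        self.finger = init_dense(n_in, n_finger, rng)
        self.palm = init_dense(n_in, n_palm, rng)
        self.head = init_dense(n_finger + n_palm, NUM_JOINTS * 3, rng)

    def __call__(self, x):
        finger_feat = relu(dense(x, *self.finger))  # finger branch
        palm_feat = relu(dense(x, *self.palm))      # palm branch
        fused = finger_feat + palm_feat             # fusion by concatenation
        coords = dense(fused, *self.head)
        # group the flat output into (x, y, z) per joint
        return [tuple(coords[3 * j: 3 * j + 3]) for j in range(NUM_JOINTS)]


head = FingerPalmFusionHead(n_in=32)
pose = head([0.5] * 32)
```

In a trained network the branches would be convolutional and their weights learned end to end; the sketch keeps only the data flow.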

CFAM: Estimating 3D Hand Poses from a Single RGB Image with Attention

Applied Sciences, Vol 10 (2), pp. 618 (2020). DOI: 10.3390/app10020618

Author(s): Xianghan Wang, Jie Jiang, Yanming Guo, Lai Kang, Yingmei Wei, ...

Keyword(s): Computer Vision; Pose Estimation; Spatial Information; Image Features; Estimation Methods; Feature Maps; Hand Pose Estimation; RGB Images; RGB Image; Hand Pose

Precise 3D hand pose estimation can be used to improve the performance of human–computer interaction (HCI). Specifically, computer-vision-based hand pose estimation can make this interaction more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as input, which requires complicated and expensive acquisition equipment. Estimation from a single RGB image is more convenient and less expensive. Previous RGB-based methods use only 2D keypoint score maps to recover 3D hand poses and ignore the hand texture features and the underlying spatial information in the RGB image, which leads to relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method reassigns weights over the cascaded RGB image and 2D keypoint features, which enables a rational allocation and use of the different features. Moreover, our method improves the fusion of different types of feature maps. Multiple comparative experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state of the art.
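Channel-level fusion with learned channel weights can be sketched in the squeeze-and-excitation style: concatenate the two feature sets along the channel axis, pool each channel to a scalar, pass the result through a small bottleneck MLP with a sigmoid, and rescale each channel. This is an assumption about the general shape of such a mechanism, not the exact CFAM design; the function `channel_fusion_attention`, the `reduction` ratio, and the random weights (which would be learned in practice) are hypothetical. Feature maps are plain nested lists (channels × H × W) to keep the sketch dependency-free.

```python
import math
import random


def global_avg_pool(channels):
    """Average each H x W channel down to a single scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]


def dense(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def channel_fusion_attention(rgb_feats, kp_feats, reduction=2, seed=0):
    """SE-style channel re-weighting over concatenated RGB and keypoint features.

    Weights are random here for illustration; a real model would learn them.
    Returns the re-weighted channels and the per-channel attention weights.
    """
    fused = rgb_feats + kp_feats              # concatenate along the channel axis
    c = len(fused)
    hidden = max(1, c // reduction)
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(c)]

    squeezed = global_avg_pool(fused)                                   # squeeze
    z = [max(0.0, v) for v in dense(squeezed, w1, [0.0] * hidden)]      # ReLU bottleneck
    attn = [1.0 / (1.0 + math.exp(-v)) for v in dense(z, w2, [0.0] * c)]  # sigmoid gate
    out = [[[attn[k] * v for v in row] for row in ch] for k, ch in enumerate(fused)]
    return out, attn


rgb = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(3)]  # 3 toy RGB feature channels
kp = [[[0.5, 0.5], [0.5, 0.5]]]                     # 1 toy keypoint feature channel
out, attn = channel_fusion_attention(rgb, kp)
```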

A Comprehensive Study on Deep Learning-Based 3D Hand Pose Estimation Methods

Applied Sciences, Vol 10 (19), pp. 6850 (2020). DOI: 10.3390/app10196850

Author(s): Theocharis Chatzis, Andreas Stergioulas, Dimitrios Konstantinidis, Kosmas Dimitropoulos, Petros Daras

Keyword(s): Deep Learning; Pose Estimation; Estimation Methods; Depth Cameras; Hand Pose Estimation; Technological Advances; Multimodal Information; Cost Efficient; Hand Pose; Comprehensive Study

The field of 3D hand pose estimation has been gaining a lot of attention recently, due to its significance in several applications that require human-computer interaction (HCI). Technological advances, such as cost-efficient depth cameras, coupled with the explosive progress of Deep Neural Networks (DNNs), have led to a significant boost in the development of robust markerless 3D hand pose estimation methods. Nonetheless, finger occlusions and rapid motions still pose significant challenges to the accuracy of such methods. In this survey, we provide a comprehensive study of the most representative deep learning-based methods in the literature and propose a new taxonomy based primarily on the input data modality: RGB, depth, or multimodal information. Finally, we present results on the most popular RGB and depth-based datasets and discuss potential research directions in this rapidly growing field.

3D Hand Pose Estimation Based on Five-Layer Ensemble CNN

Sensors, Vol 21 (2), pp. 649 (2021). DOI: 10.3390/s21020649

Author(s): Lili Fan, Hong Rao, Wenji Yang

Keyword(s): Pose Estimation; Middle Finger; Estimation Methods; Estimation Accuracy; Depth Information; Hand Pose Estimation; Hand Model; Single Finger; Public Datasets; Hand Pose

Estimating accurate 3D hand pose from a single RGB image is a highly challenging problem in pose estimation due to self-geometric ambiguities, self-occlusions, and the absence of depth information. To this end, a novel Five-Layer Ensemble CNN (5LENet) is proposed based on a hierarchical design, which decomposes the hand pose estimation task into five single-finger pose estimation sub-tasks. The sub-task estimation results are then fused to estimate the full 3D hand pose. The hierarchical method helps extract deeper and more informative finger features, which effectively improves the estimation accuracy of 3D hand pose. In addition, we build a hand model in which the center of the palm (represented as Palm) is connected to the middle finger according to the topological structure of the hand, which further boosts the performance of 3D hand pose estimation. Extensive quantitative and qualitative results on two public datasets demonstrate the effectiveness of 5LENet, which achieves new state-of-the-art 3D estimation accuracy and outperforms most advanced estimation methods.
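The split into five single-finger sub-tasks and the fusion of their results can be sketched at the level of joint indices. This sketch assumes a 21-keypoint layout with the wrist at index 0 followed by four joints per finger (thumb through little finger); real datasets order joints differently, and 5LENet's own grouping, including its Palm node attached to the middle finger, is not reproduced here. The helpers `split_into_fingers` and `fuse_fingers` are hypothetical; the wrist is shared by all five sub-tasks, so the fusion step averages its five estimates.

```python
# Assumed layout: index 0 = wrist, then 4 joints per finger (base to tip).
FINGERS = {
    "thumb": [1, 2, 3, 4],
    "index": [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring": [13, 14, 15, 16],
    "little": [17, 18, 19, 20],
}
WRIST = 0
NUM_JOINTS = 21


def split_into_fingers(pose):
    """Map a 21-joint pose to five sub-poses of 5 joints each (wrist first)."""
    return {name: [pose[WRIST]] + [pose[i] for i in idx] for name, idx in FINGERS.items()}


def fuse_fingers(sub_poses):
    """Reassemble a full pose from per-finger estimates.

    Each finger's joints are copied back to their global indices; the wrist,
    predicted once per sub-task, is averaged across the five estimates.
    """
    pose = [None] * NUM_JOINTS
    wrists = [joints[0] for joints in sub_poses.values()]
    pose[WRIST] = tuple(sum(coord) / len(wrists) for coord in zip(*wrists))
    for name, idx in FINGERS.items():
        for i, joint in zip(idx, sub_poses[name][1:]):
            pose[i] = joint
    return pose


pose = [(float(i), 2.0 * i, -1.0) for i in range(NUM_JOINTS)]
subs = split_into_fingers(pose)
restored = fuse_fingers(subs)
```

Averaging is only one way to reconcile the shared root joint; a learned fusion layer, as in the paper, could weight the sub-task outputs differently.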

Local Regression Based Hourglass Network for Hand Pose Estimation from a Single Depth Image

2018 24th International Conference on Pattern Recognition (ICPR), 2018. DOI: 10.1109/icpr.2018.8545460

Author(s): Jia Li, Zengfu Wang

Keyword(s): Pose Estimation; Depth Image; Local Regression; Hand Pose Estimation; Hand Pose

Improving 3D Hand Pose Estimation with Synthetic RGB Image Enhancement Using RetinexNet and Dehazing

Soft Computing: Biomedical and Related Applications - Studies in Computational Intelligence, pp. 93-105 (2021). DOI: 10.1007/978-3-030-76620-7_8

Author(s): Alysa Tan, Bryan Kwek, Kenneth Anthony, Vivian Teh, Yifan Yang, ...

Keyword(s): Image Enhancement; Pose Estimation; Hand Pose Estimation; RGB Image; Hand Pose

A Survey on Hand Pose Estimation with Wearable Sensors and Computer-Vision-Based Methods

Sensors, Vol 20 (4), pp. 1074 (2020). DOI: 10.3390/s20041074. Cited by ~3

Author(s): Weiya Chen, Chenchen Yu, Chenyu Tu, Zehua Lyu, Jing Tang, ...

Keyword(s): Computer Vision; Pose Estimation; Wearable Sensors; Complex Structure; Estimation Methods; Human Computer Interactions; Hand Pose Estimation; Timely Review; Kinematic Models; Hand Pose

Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for various applications such as natural human-computer interaction. Hand pose estimation is a major academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advances in both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed for accurate and rapid hand pose estimation in recent years. However, existing reviews focused either on data gloves or on vision methods, or were even limited to a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, including wearable and vision-based solutions. Hand kinematic models are discussed first. An in-depth review is then conducted on data gloves and vision-based sensor systems with their corresponding modeling methods. In particular, this review also discusses deep-learning-based methods, which are very promising for hand pose estimation. Moreover, the advantages and drawbacks of current hand gesture estimation methods, their applicative scope, and related challenges are also discussed.

DGGAN: Depth-image Guided Generative Adversarial Networks for Disentangling RGB and Depth Images in 3D Hand Pose Estimation

2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020. DOI: 10.1109/wacv45572.2020.9093380

Author(s): Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Yen-Yu Lin, Wei Fan, ...

Keyword(s): Pose Estimation; Depth Image; Generative Adversarial Networks; Hand Pose Estimation; Image Guided; Depth Images; Adversarial Networks; Hand Pose

Real-Time Energy Efficient Hand Pose Estimation: A Case Study

Sensors, Vol 20 (10), pp. 2828 (2020). DOI: 10.3390/s20102828

Author(s): Mhd Rashed Al Koutayni, Vladimir Rybalkin, Jameel Malik, Ahmed Elhayek, Christian Weis, ...

Keyword(s): Neural Network; Real Time; Pose Estimation; Energy Efficient; Graphics Processing Units; Estimation Algorithm; High Energy; Estimation Methods; Hand Pose Estimation; Hand Pose

The estimation of human hand pose has become the basis for many vital applications in which the user relies mainly on hand pose as system input. Virtual reality (VR) headsets, the Shadow Dexterous Hand, and in-air signature verification are a few examples of applications that require tracking hand movements in real time. The state-of-the-art 3D hand pose estimation methods are based on Convolutional Neural Networks (CNNs). These methods are implemented on Graphics Processing Units (GPUs), mainly due to their extensive computational requirements. However, GPUs are not suitable for practical application scenarios where low power consumption is crucial. Furthermore, the difficulty of embedding a bulky GPU into a small device prevents porting such applications to mobile devices. The goal of this work is to provide an energy-efficient solution for an existing depth-camera-based hand pose estimation algorithm. First, we compress the deep neural network model by applying dynamic quantization techniques to different layers to achieve maximum compression without compromising accuracy. Afterwards, we design a custom hardware architecture. We selected an FPGA as the target platform because FPGAs provide high energy efficiency and can be integrated into portable devices. Our solution, implemented on a Xilinx UltraScale+ MPSoC FPGA, is 4.2× faster and 577.3× more energy efficient than the original implementation of the hand pose estimation algorithm on an NVIDIA GeForce GTX 1070.
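Per-layer quantization of the kind mentioned above is often done with dynamic fixed point: every layer keeps the same bit width, but each layer gets its own fractional length chosen from the range of its values. The sketch below assumes that scheme, signed 8-bit values, and round-to-nearest with saturation; it is not necessarily the quantization the authors used, and the layer names and weights are made up for illustration.

```python
import math


def choose_frac_bits(values, bit_width):
    """Pick a per-layer fractional length so the largest |value| fits
    in a signed fixed-point number of `bit_width` bits (1 sign bit)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return bit_width - 1
    # integer bits needed for max_abs; may be negative for small ranges,
    # which frees more bits for the fraction
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bit_width - 1 - int_bits


def quantize(values, bit_width, frac_bits):
    """Round to the fixed-point grid, saturate, and return (ints, dequantized)."""
    scale = 2 ** frac_bits
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    ints = [min(hi, max(lo, round(v * scale))) for v in values]
    return ints, [q / scale for q in ints]


# Hypothetical layers with different weight ranges get different fractional lengths.
layers = {"conv1": [0.9, -0.45, 0.1], "fc": [3.5, -1.0]}
per_layer = {name: choose_frac_bits(w, 8) for name, w in layers.items()}
conv1_ints, conv1_deq = quantize(layers["conv1"], 8, per_layer["conv1"])
```

Giving each layer its own fractional length is what lets small-range layers keep precision while wide-range layers avoid overflow at the same bit width.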

CrossFuNet: RGB and Depth Cross-Fusion Network for Hand Pose Estimation

Sensors, Vol 21 (18), pp. 6095 (2021). DOI: 10.3390/s21186095

Author(s): Xiaojing Sun, Bin Wang, Longxiang Huang, Qian Zhang, Sulei Zhu, ...

Keyword(s): Pose Estimation; Depth Map; Depth Information; Feature Maps; Hand Pose Estimation; Depth Sensors; Key Points; RGB Images; Public Datasets; Hand Pose

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusions and depth ambiguity. Depth sensors depend heavily on distance and can only be used indoors, which places many limitations on the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose a novel RGB and depth information fusion network, called CrossFuNet, to improve the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are input into two different subnetworks. Their feature maps are fused in a fusion module, for which we propose a completely new approach to combining the information from the two modalities. The 3D keypoints are then regressed from heatmaps using the common approach. We validate our model on two public datasets, and the results reveal that our model outperforms the state-of-the-art methods.
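The abstract does not say which heatmap-to-coordinate method is used; one widely used choice is the soft-argmax (integral regression), which takes the expected location under a softmax over the heatmap and stays differentiable. The 2D sketch below assumes that technique; the name `soft_argmax_2d` and the temperature `beta` are illustrative, and a 3D version adds a depth axis to the same expectation.

```python
import math


def soft_argmax_2d(heatmap, beta=1.0):
    """Expected (x, y) location under softmax(beta * heatmap).

    `heatmap` is a list of rows; x is the column index, y the row index.
    Subtracting the max before exp() keeps the softmax numerically stable.
    """
    peak = max(v for row in heatmap for v in row)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * j for row in weights for j, w in enumerate(row)) / total
    y = sum(w * i for i, row in enumerate(weights) for w in row) / total
    return x, y


hm = [[0.0] * 5 for _ in range(5)]
hm[1][3] = 50.0                # sharp peak at row 1, column 3
x, y = soft_argmax_2d(hm)      # close to (3.0, 1.0)
```

A sharp peak recovers the argmax location, while a flat heatmap returns the image center, which is why `beta` (or the network's own output scale) controls how peaked the estimate is.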

Pose-Guided Hierarchical Graph Reasoning for 3-D Hand Pose Estimation From a Single Depth Image

IEEE Transactions on Cybernetics, pp. 1-14 (2021). DOI: 10.1109/tcyb.2021.3083637

Author(s): Pengfei Ren, Haifeng Sun, Jiachang Hao, Qi Qi, Jingyu Wang, ...

Keyword(s): Pose Estimation; Depth Image; Hand Pose Estimation; Hierarchical Graph; Hand Pose