Depth Map Based Pose Estimation

Author(s):  
Dirk Buchholz

2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Samy Bakheet ◽  
Ayoub Al-Hamadi

Robust vision-based hand pose estimation is highly sought after but remains a challenging task, due in part to self-occlusion among the fingers. In this paper, an innovative framework for real-time static hand gesture recognition is introduced, based on an optimized shape representation built from multiple shape cues. The framework incorporates a specific module for hand pose estimation based on depth map data, where the hand silhouette is first extracted from the highly detailed and accurate depth map captured by a time-of-flight (ToF) depth sensor. A hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is created from the hand silhouette to obtain a reliable and representative description of individual gestures. Finally, an ensemble of one-vs.-all support vector machines (SVMs) is independently trained on each of these learned feature representations to perform gesture classification. When evaluated on a publicly available dataset containing a relatively large and diverse collection of egocentric hand gestures, the approach yields encouraging results that compare very favorably with those reported in the literature, while maintaining real-time operation.
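The classification stage described above can be illustrated with a minimal sketch: one one-vs.-all SVM per feature modality (boundary-based and region-based), with per-class decision scores fused across the ensemble. The synthetic feature generators and all names below are illustrative stand-ins, not the paper's actual descriptors.

```python
# Sketch: ensemble of one-vs.-all SVMs, one per feature modality, with
# score-level fusion. Synthetic descriptors stand in for real hand features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_gestures, n_per_class = 3, 20

def make_split():
    # Synthetic 8-D descriptors: one Gaussian cluster per gesture class.
    X, y = [], []
    for g in range(n_gestures):
        X.append(rng.normal(loc=g, scale=0.3, size=(n_per_class, 8)))
        y += [g] * n_per_class
    return np.vstack(X), np.array(y)

boundary_X, y = make_split()   # boundary-based descriptor (stand-in)
region_X, _ = make_split()     # region-based descriptor (stand-in)

# Train one one-vs.-rest SVM per descriptor type.
clfs = []
for X in (boundary_X, region_X):
    clf = SVC(kernel="rbf", decision_function_shape="ovr")
    clf.fit(X, y)
    clfs.append(clf)

def predict(b_feat, r_feat):
    # Sum per-class decision scores across the two modality-specific SVMs.
    scores = clfs[0].decision_function(b_feat) + clfs[1].decision_function(r_feat)
    return scores.argmax(axis=1)

print(predict(boundary_X[:3], region_X[:3]))
```

Score-level fusion keeps the two descriptor spaces independent, so either branch can be retrained or replaced without touching the other.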


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6095
Author(s):  
Xiaojing Sun ◽  
Bin Wang ◽  
Longxiang Huang ◽  
Qian Zhang ◽  
Sulei Zhu ◽  
...  

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity, while depth sensors depend heavily on distance and can only be used indoors, limiting the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks, and their feature maps are combined in a fusion module that implements a completely new approach to merging information from the two modalities. The 3D keypoints are then regressed from heatmaps in the standard way. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
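The two-branch-plus-fusion structure can be sketched in plain NumPy, assuming the simplest possible stand-ins: each "subnetwork" is a single projection, fusion is elementwise addition (the paper's module is learned), and keypoints are read off heatmaps via argmax. All shapes and names here are illustrative.

```python
# Toy sketch of a two-branch RGB-D fusion pipeline with heatmap readout.
import numpy as np

def branch(x, w):
    # Stand-in "subnetwork": channel projection followed by ReLU.
    return np.maximum(0.0, np.tensordot(x, w, axes=([0], [0])))

def fuse(f_rgb, f_depth):
    # Simplest fusion: elementwise sum; the real fusion module is learned.
    return f_rgb + f_depth

def keypoints_from_heatmaps(heatmaps):
    # One (row, col) location per joint: argmax over each heatmap.
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack([flat // w, flat % w], axis=1)

rng = np.random.default_rng(1)
rgb = rng.random((3, 16, 16))                 # C, H, W
depth = rng.random((1, 16, 16))
f_rgb = branch(rgb, rng.random((3, 4)))       # -> (16, 16, 4)
f_depth = branch(depth, rng.random((1, 4)))   # -> (16, 16, 4)

heatmaps = fuse(f_rgb, f_depth).transpose(2, 0, 1)  # (joints=4, H, W)
kps = keypoints_from_heatmaps(heatmaps)
print(kps.shape)  # one 2-D location per joint
```

Fusing at the feature-map level (rather than at the input or the final prediction) is what lets each modality's features compensate where the other is ambiguous.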


2019 ◽  
Vol 2 (1) ◽  
pp. 1
Author(s):  
Jamal Firmat Banzi ◽  
Isack Bulugu ◽  
Zhongfu Ye

Recent hand pose estimation methods require large amounts of annotated training data to extract dynamic information from a hand representation. However, precise and dense annotation of real data is difficult to obtain, and it greatly increases the amount of information that must be passed to the training algorithm. This paper presents an approach to developing a hand pose estimation system that can accurately regress a 3D pose in an unsupervised manner. The process is performed in three stages. First, the hand is modelled by a novel latent tree dependency model (LTDM), which transforms internal joint locations into an explicit representation. Second, we perform predictive coding on image sequences of hand poses to capture, without supervision, the latent features underlying a given image; a mapping is then learned between the depth image and the generated representation. Third, the hand joints are regressed using convolutional neural networks to estimate the latent pose from a given depth map. Finally, an unsupervised error term, which is part of the recurrent architecture, ensures smooth estimates of the final pose. To demonstrate the performance of the proposed system, experiments are conducted on three challenging public datasets: ICVL, MSRA, and NYU. The empirical results show that our method performs comparably to or better than state-of-the-art approaches.
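The unsupervised error term mentioned above can be illustrated with a minimal sketch: a temporal smoothness penalty on the regressed joint sequence, which needs no annotations because it compares consecutive estimates to each other. The exact form of the paper's term is not given here; this is a generic assumed version.

```python
# Sketch of an unsupervised temporal-smoothness loss on 3D joint estimates.
import numpy as np

def smoothness_loss(poses):
    # poses: (T, J, 3) sequence of 3D joint positions over T frames.
    diffs = np.diff(poses, axis=0)                    # frame-to-frame motion
    return float((diffs ** 2).sum(axis=(1, 2)).mean())  # mean squared jump

steady = np.zeros((5, 21, 3))                          # constant 21-joint pose
jittery = np.cumsum(np.full((5, 21, 3), 0.1), axis=0)  # pose drifting per frame

print(smoothness_loss(steady))   # 0.0
print(smoothness_loss(jittery))
```

Because the penalty involves only the model's own outputs, it can be backpropagated through a recurrent architecture without any ground-truth poses.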


Robotics ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 7
Author(s):  
Yannick Roberts ◽  
Amirhossein Jabalameli ◽  
Aman Behal

Motivated by grasp planning applications within cluttered environments, this paper presents a novel approach to performing real-time surface segmentation of never-before-seen objects scattered across a given scene. This approach takes an input 2D depth map and applies a first-principles algorithm that exploits the fact that continuous surfaces are bounded by contours of high gradient. From these regions, the associated object surfaces can be isolated and further adapted for grasp planning. This paper also provides details for extracting the six-DOF pose of an isolated surface and presents the case of leveraging such a pose to execute planar grasping that achieves both force and torque closure. As a consequence of the highly parallel software implementation, the algorithm is shown to outperform prior approaches across all notable metrics and is also shown to be invariant to object rotation, scale, orientation relative to other objects, clutter, and varying degrees of noise. This allows for a robust set of operations that could be applied to many areas of robotics research. The algorithm is faster than real time in the sense that it runs at nearly twice the 30 fps sensor rate.
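The core first-principles idea, that continuous surfaces are bounded by contours of high depth gradient, can be sketched directly: threshold the gradient magnitude of the depth map and label the remaining low-gradient regions as candidate surfaces. The threshold value and helper names below are illustrative, not the paper's.

```python
# Sketch: segment surfaces in a depth map by masking high-gradient contours
# and labelling the remaining connected low-gradient regions.
import numpy as np
from scipy import ndimage

def segment_surfaces(depth, grad_thresh=0.5):
    gy, gx = np.gradient(depth)          # per-axis depth gradients
    grad_mag = np.hypot(gx, gy)          # gradient magnitude
    smooth = grad_mag < grad_thresh      # interior (low-gradient) pixels
    labels, n = ndimage.label(smooth)    # connected regions = surfaces
    return labels, n

# Two planes at different depths: the step edge splits the map in two.
depth = np.zeros((10, 10))
depth[:, 5:] = 2.0
labels, n = segment_surfaces(depth)
print(n)  # 2 surfaces found
```

Each labelled region can then be handed to downstream pose extraction; the per-pixel operations (gradient, threshold, labelling) are exactly the kind that parallelize well, consistent with the reported real-time performance.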


2019 ◽  
Vol 36 (7) ◽  
pp. 1401-1410
Author(s):  
Jianzhai Wu ◽  
Dewen Hu ◽  
Fengtao Xiang ◽  
Xingsheng Yuan ◽  
Jiongming Su

2020 ◽  
Vol 8 (6) ◽  
pp. 5612-5617

We describe a face classification algorithm that can be used for object recognition, pose estimation, tracking, and gesture recognition, all of which are useful for human-computer interaction. We use a depth camera (Creative Interactive Gesture Camera – Kinect®) to acquire images, which offers several advantages over a normal RGB optical camera. In this paper we demonstrate an intermediate parsing scheme in which accurate per-pixel classification is used to localize the joints. We use an efficient random decision forest to classify the image, which in turn helps to estimate the pose. Because the depth images acquired by the camera may contain holes on or around the depth map, we first fill those holes and then classify the image. Simulation results were obtained by varying several training parameters of the decision forest. The method provides a foundation for the development of pose estimation and tracking, and offers detailed insight into decision forests.
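The hole-filling-then-per-pixel-classification pipeline can be sketched as follows, with assumed details: invalid (zero) depth pixels are filled with a local median, and a random decision forest classifies each pixel from simple depth-difference features. The synthetic frame, toy labels, and offset choices are all illustrative stand-ins.

```python
# Sketch: fill depth holes, then per-pixel classification with a random forest.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.ensemble import RandomForestClassifier

def fill_holes(depth):
    # Replace zero-valued (invalid) pixels with a local 3x3 median estimate.
    filled = median_filter(depth, size=3)
    return np.where(depth == 0, filled, depth)

def pixel_features(depth):
    # Per-pixel features: the depth value plus differences to shifted copies
    # (in the spirit of offset depth-comparison features).
    shifted = [np.roll(depth, s, axis=a) for s in (-2, 2) for a in (0, 1)]
    diffs = [depth - sh for sh in shifted]
    return np.stack([depth] + diffs, axis=-1).reshape(-1, 5)

rng = np.random.default_rng(2)
depth = rng.random((16, 16))
depth[4:6, 4:6] = 0.0                       # simulated sensor holes

filled = fill_holes(depth)
X = pixel_features(filled)
y = (filled > 0.5).astype(int).ravel()      # toy per-pixel part labels

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
```

Filling holes before feature extraction matters because the zero pixels would otherwise produce large spurious depth differences at every offset that touches them.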

