Depth Map Based Pose Estimation

Author(s):  
Dirk Buchholz

2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Samy Bakheet ◽  
Ayoub Al-Hamadi

Robust vision-based hand pose estimation is highly sought after but remains a challenging task, due in part to self-occlusion among the fingers. In this paper, an innovative framework for real-time static hand gesture recognition is introduced, based on an optimized shape representation built from multiple shape cues. The framework incorporates a specific module for hand pose estimation based on depth map data, where the hand silhouette is first extracted from the highly detailed and accurate depth map captured by a time-of-flight (ToF) depth sensor. A hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is created from the hand silhouette to obtain a reliable and representative description of individual gestures. Finally, an ensemble of one-vs.-all support vector machines (SVMs) is independently trained on each of these learned feature representations to perform gesture classification. When evaluated on a publicly available dataset containing a relatively large and diverse collection of egocentric hand gestures, the approach yields encouraging results that compare very favorably with those reported in the literature, while maintaining real-time operation.
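The classification stage described above can be illustrated with a minimal sketch: one one-vs.-all SVM per feature modality (boundary-based and region-based), with per-class decision scores fused across the ensemble. The synthetic feature generators and all names below are illustrative stand-ins, not the paper's actual descriptors.

```python
# Sketch: ensemble of one-vs.-all SVMs, one per feature modality, with
# score-level fusion. Synthetic descriptors stand in for real hand features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_gestures, n_per_class = 3, 20

def make_split():
    # Synthetic 8-D descriptors: one Gaussian cluster per gesture class.
    X, y = [], []
    for g in range(n_gestures):
        X.append(rng.normal(loc=g, scale=0.3, size=(n_per_class, 8)))
        y += [g] * n_per_class
    return np.vstack(X), np.array(y)

boundary_X, y = make_split()   # boundary-based descriptor (stand-in)
region_X, _ = make_split()     # region-based descriptor (stand-in)

# Train one one-vs.-rest SVM per descriptor type.
clfs = []
for X in (boundary_X, region_X):
    clf = SVC(kernel="rbf", decision_function_shape="ovr")
    clf.fit(X, y)
    clfs.append(clf)

def predict(b_feat, r_feat):
    # Sum per-class decision scores across the two modality-specific SVMs.
    scores = clfs[0].decision_function(b_feat) + clfs[1].decision_function(r_feat)
    return scores.argmax(axis=1)

print(predict(boundary_X[:3], region_X[:3]))
```

Score-level fusion keeps the two descriptor spaces independent, so either branch can be retrained or replaced without touching the other.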


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6095
Author(s):  
Xiaojing Sun ◽  
Bin Wang ◽  
Longxiang Huang ◽  
Qian Zhang ◽  
Sulei Zhu ◽  
...  

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity, while depth sensors depend heavily on distance and can only be used indoors, limiting the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks, and their feature maps are combined in a fusion module that implements a completely new approach to merging information from the two modalities. The 3D keypoints are then regressed from heatmaps in the standard way. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
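The two-branch-plus-fusion structure can be sketched in plain NumPy, assuming the simplest possible stand-ins: each "subnetwork" is a single projection, fusion is elementwise addition (the paper's module is learned), and keypoints are read off heatmaps via argmax. All shapes and names here are illustrative.

```python
# Toy sketch of a two-branch RGB-D fusion pipeline with heatmap readout.
import numpy as np

def branch(x, w):
    # Stand-in "subnetwork": channel projection followed by ReLU.
    return np.maximum(0.0, np.tensordot(x, w, axes=([0], [0])))

def fuse(f_rgb, f_depth):
    # Simplest fusion: elementwise sum; the real fusion module is learned.
    return f_rgb + f_depth

def keypoints_from_heatmaps(heatmaps):
    # One (row, col) location per joint: argmax over each heatmap.
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    return np.stack([flat // w, flat % w], axis=1)

rng = np.random.default_rng(1)
rgb = rng.random((3, 16, 16))                 # C, H, W
depth = rng.random((1, 16, 16))
f_rgb = branch(rgb, rng.random((3, 4)))       # -> (16, 16, 4)
f_depth = branch(depth, rng.random((1, 4)))   # -> (16, 16, 4)

heatmaps = fuse(f_rgb, f_depth).transpose(2, 0, 1)  # (joints=4, H, W)
kps = keypoints_from_heatmaps(heatmaps)
print(kps.shape)  # one 2-D location per joint
```

Fusing at the feature-map level (rather than at the input or the final prediction) is what lets each modality's features compensate where the other is ambiguous.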


2019 ◽  
Vol 2 (1) ◽  
pp. 1
Author(s):  
Jamal Firmat Banzi ◽  
Isack Bulugu ◽  
Zhongfu Ye

Recent hand pose estimation methods require large amounts of annotated training data to extract dynamic information from a hand representation. However, precise and dense annotation of real data is difficult to obtain, and it greatly increases the amount of information that must be passed to the training algorithm. This paper presents an approach to developing a hand pose estimation system that can accurately regress a 3D pose in an unsupervised manner. The process is performed in three stages. First, the hand is modelled by a novel latent tree dependency model (LTDM), which transforms internal joint locations into an explicit representation. Second, we perform predictive coding on image sequences of hand poses to capture, without supervision, the latent features underlying a given image; a mapping is then learned between the depth image and the generated representation. Third, the hand joints are regressed using convolutional neural networks to estimate the latent pose from a given depth map. Finally, an unsupervised error term, which is part of the recurrent architecture, ensures smooth estimates of the final pose. To demonstrate the performance of the proposed system, experiments are conducted on three challenging public datasets: ICVL, MSRA, and NYU. The empirical results show that our method performs comparably to or better than state-of-the-art approaches.
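The unsupervised error term mentioned above can be illustrated with a minimal sketch: a temporal smoothness penalty on the regressed joint sequence, which needs no annotations because it compares consecutive estimates to each other. The exact form of the paper's term is not given here; this is a generic assumed version.

```python
# Sketch of an unsupervised temporal-smoothness loss on 3D joint estimates.
import numpy as np

def smoothness_loss(poses):
    # poses: (T, J, 3) sequence of 3D joint positions over T frames.
    diffs = np.diff(poses, axis=0)                    # frame-to-frame motion
    return float((diffs ** 2).sum(axis=(1, 2)).mean())  # mean squared jump

steady = np.zeros((5, 21, 3))                          # constant 21-joint pose
jittery = np.cumsum(np.full((5, 21, 3), 0.1), axis=0)  # pose drifting per frame

print(smoothness_loss(steady))   # 0.0
print(smoothness_loss(jittery))
```

Because the penalty involves only the model's own outputs, it can be backpropagated through a recurrent architecture without any ground-truth poses.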


Robotics ◽  
2022 ◽  
Vol 11 (1) ◽  
pp. 7
Author(s):  
Yannick Roberts ◽  
Amirhossein Jabalameli ◽  
Aman Behal

Motivated by grasp planning applications within cluttered environments, this paper presents a novel approach to performing real-time surface segmentation of never-before-seen objects scattered across a given scene. This approach takes an input 2D depth map and applies a first-principles algorithm that exploits the fact that continuous surfaces are bounded by contours of high gradient. From these regions, the associated object surfaces can be isolated and further adapted for grasp planning. This paper also provides details for extracting the six-DOF pose of an isolated surface and presents the case of leveraging such a pose to execute planar grasping that achieves both force and torque closure. As a consequence of the highly parallel software implementation, the algorithm is shown to outperform prior approaches across all notable metrics and is also shown to be invariant to object rotation, scale, orientation relative to other objects, clutter, and varying degrees of noise. This allows for a robust set of operations that could be applied to many areas of robotics research. The algorithm is faster than real time in the sense that it runs at nearly twice the 30 fps sensor rate.
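The core first-principles idea, that continuous surfaces are bounded by contours of high depth gradient, can be sketched directly: threshold the gradient magnitude of the depth map and label the remaining low-gradient regions as candidate surfaces. The threshold value and helper names below are illustrative, not the paper's.

```python
# Sketch: segment surfaces in a depth map by masking high-gradient contours
# and labelling the remaining connected low-gradient regions.
import numpy as np
from scipy import ndimage

def segment_surfaces(depth, grad_thresh=0.5):
    gy, gx = np.gradient(depth)          # per-axis depth gradients
    grad_mag = np.hypot(gx, gy)          # gradient magnitude
    smooth = grad_mag < grad_thresh      # interior (low-gradient) pixels
    labels, n = ndimage.label(smooth)    # connected regions = surfaces
    return labels, n

# Two planes at different depths: the step edge splits the map in two.
depth = np.zeros((10, 10))
depth[:, 5:] = 2.0
labels, n = segment_surfaces(depth)
print(n)  # 2 surfaces found
```

Each labelled region can then be handed to downstream pose extraction; the per-pixel operations (gradient, threshold, labelling) are exactly the kind that parallelize well, consistent with the reported real-time performance.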


2019 ◽  
Vol 36 (7) ◽  
pp. 1401-1410
Author(s):  
Jianzhai Wu ◽  
Dewen Hu ◽  
Fengtao Xiang ◽  
Xingsheng Yuan ◽  
Jiongming Su

2020 ◽  
Vol 8 (6) ◽  
pp. 5612-5617

We describe a face classification algorithm that can be used for object recognition, pose estimation, tracking, and gesture recognition, all of which are useful for human-computer interaction. We use a depth camera (Creative Interactive Gesture Camera – Kinect®) to acquire images, which offers several advantages over a normal RGB optical camera. In this paper we demonstrate an intermediate parsing scheme in which accurate per-pixel classification is used to localize the joints. We use an efficient random decision forest to classify the image, which in turn helps to estimate the pose. Because the depth images acquired by the camera may contain holes on or around the depth map, we first fill those holes and then classify the image. Simulation results were obtained by varying several training parameters of the decision forest. The method provides a foundation for the development of pose estimation and tracking, and offers detailed insight into decision forests.
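The hole-filling-then-per-pixel-classification pipeline can be sketched as follows, with assumed details: invalid (zero) depth pixels are filled with a local median, and a random decision forest classifies each pixel from simple depth-difference features. The synthetic frame, toy labels, and offset choices are all illustrative stand-ins.

```python
# Sketch: fill depth holes, then per-pixel classification with a random forest.
import numpy as np
from scipy.ndimage import median_filter
from sklearn.ensemble import RandomForestClassifier

def fill_holes(depth):
    # Replace zero-valued (invalid) pixels with a local 3x3 median estimate.
    filled = median_filter(depth, size=3)
    return np.where(depth == 0, filled, depth)

def pixel_features(depth):
    # Per-pixel features: the depth value plus differences to shifted copies
    # (in the spirit of offset depth-comparison features).
    shifted = [np.roll(depth, s, axis=a) for s in (-2, 2) for a in (0, 1)]
    diffs = [depth - sh for sh in shifted]
    return np.stack([depth] + diffs, axis=-1).reshape(-1, 5)

rng = np.random.default_rng(2)
depth = rng.random((16, 16))
depth[4:6, 4:6] = 0.0                       # simulated sensor holes

filled = fill_holes(depth)
X = pixel_features(filled)
y = (filled > 0.5).astype(int).ravel()      # toy per-pixel part labels

clf = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
```

Filling holes before feature extraction matters because the zero pixels would otherwise produce large spurious depth differences at every offset that touches them.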

