Weakly-Supervised 3D Hand Pose Estimation from Monocular RGB Images

Author(s):  
Yujun Cai ◽  
Liuhao Ge ◽  
Jianfei Cai ◽  
Junsong Yuan
2017 ◽  
Vol 164 ◽  
pp. 56-67
Author(s):  
Natalia Neverova ◽  
Christian Wolf ◽  
Florian Nebout ◽  
Graham W. Taylor

Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6095
Author(s):  
Xiaojing Sun ◽  
Bin Wang ◽  
Longxiang Huang ◽  
Qian Zhang ◽  
Sulei Zhu ◽  
...  

Despite recent successes in hand pose estimation from RGB images or depth maps, inherent challenges remain. RGB-based methods suffer from heavy self-occlusion and depth ambiguity. Depth sensors are strongly distance-dependent and can only be used indoors, which limits the practical application of depth-based methods. These challenges inspired us to combine the two modalities so that each offsets the shortcomings of the other. In this paper, we propose CrossFuNet, a novel RGB and depth information fusion network that improves the accuracy of 3D hand pose estimation. Specifically, the RGB image and the paired depth map are fed into two separate subnetworks. The resulting feature maps are combined in a fusion module, for which we propose a new approach to merging the information from the two modalities. The 3D keypoints are then regressed from heatmaps using a common method. We validate our model on two public datasets, and the results show that it outperforms state-of-the-art methods.
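The abstract does not say which heatmap decoding CrossFuNet uses. As an illustration only, the sketch below shows one widely used option, a soft-argmax over a single 2D heatmap, which turns a score map into sub-pixel keypoint coordinates. The function name `soft_argmax_2d` and the `beta` sharpness parameter are assumptions made for this example and do not come from the paper.

```python
import math

def soft_argmax_2d(heatmap, beta=1.0):
    """Expected (x, y) location under a softmax over a 2D heatmap.

    Illustrative sketch only: `heatmap` is a list of rows of scores,
    `beta` (hypothetical parameter) sharpens the softmax.
    """
    width = len(heatmap[0])
    flat = [beta * v for row in heatmap for v in row]
    peak = max(flat)  # subtract the max for numerical stability
    weights = [math.exp(v - peak) for v in flat]
    total = sum(weights)
    # Column index is i % width, row index is i // width.
    x = sum(w * (i % width) for i, w in enumerate(weights)) / total
    y = sum(w * (i // width) for i, w in enumerate(weights)) / total
    return x, y

# A heatmap with one strong peak at column 2, row 1.
peaked = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 50.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x, y = soft_argmax_2d(peaked)
```

Unlike a hard argmax, this estimate is differentiable, which is one reason it is often used when keypoints are trained end to end. A depth coordinate could be handled the same way with a third axis.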


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6747
Author(s):  
Yang Liu ◽  
Jie Jiang ◽  
Jiahao Sun ◽  
Xianghan Wang

Hand pose estimation from RGB images has always been a difficult task, owing to incomplete depth information. Moon et al. improved the accuracy of hand pose estimation with a new network, InterNet. Still, the network has potential for improvement. Based on the architectures of MobileNet v3 and MoGA, we redesigned the feature extractor to incorporate recent advances in computer vision, such as the ACON activation function and a new attention mechanism module. With these modules, our network architecture can better extract global features from an RGB image of the hand, leading to a greater performance improvement compared to InterNet and other similar networks.
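For readers unfamiliar with ACON, the sketch below evaluates the ACON-C form of the activation on a scalar, f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x. In a network, p1, p2 and beta would be learned per channel. The fixed defaults and the helper names here are assumptions for illustration, not the configuration used by the authors.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C activation on a scalar input (illustrative sketch).

    With p1=1, p2=0, beta=1 this reduces to Swish, x * sigmoid(x).
    As beta grows it approaches max(p1 * x, p2 * x); at beta=0 it is linear.
    """
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x

swish_like = acon_c(2.0)                          # Swish at x = 2
linear = acon_c(2.0, beta=0.0)                    # (p1 + p2) * x / 2
max_like = acon_c(-3.0, p1=1.0, p2=0.25, beta=100.0)  # close to max(-3, -0.75)
```

The beta parameter is what lets the unit move between a linear response and a ReLU-like switch.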


2020 ◽  
Vol 10 (2) ◽  
pp. 618
Author(s):  
Xianghan Wang ◽  
Jie Jiang ◽  
Yanming Guo ◽  
Lai Kang ◽  
Yingmei Wei ◽  
...  

Precise 3D hand pose estimation can be used to improve the performance of human–computer interaction (HCI). Specifically, computer-vision-based hand pose estimation can make this interaction more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment. Estimation from a single RGB image is more convenient and less expensive. Previous RGB-based methods use only 2D keypoint score maps to recover 3D hand poses and ignore the hand texture features and the underlying spatial information in the RGB image, which leads to relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method reassigns weights across the concatenated RGB image and 2D keypoint features, which allows the different features to be used in a balanced way. Moreover, our method improves the fusion performance of different types of feature maps. Multiple comparative experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state of the art.
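The abstract gives no architectural detail for the channel fusion attention module. The sketch below is therefore not the authors' method. It shows a generic squeeze-and-excitation-style gate over concatenated RGB and keypoint channels, to illustrate what reweighting features "at the channel level" can look like. All names (`channel_attention_fuse`, `weights`, `bias`) and the single linear gating layer are assumptions for this example.

```python
import math

def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def channel_attention_fuse(rgb_feats, kp_feats, weights, bias):
    """Reweight concatenated feature channels with a learned gate (sketch).

    rgb_feats, kp_feats: lists of channels, each channel a 2D list (H x W).
    weights, bias: hypothetical parameters of one linear layer that maps
    the pooled channel descriptor to one gate per channel.
    Returns (rescaled_channels, gates).
    """
    channels = rgb_feats + kp_feats  # concatenate along the channel axis
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in channels]
    # Excite: linear layer followed by a sigmoid gives a gate in (0, 1).
    gates = [_sigmoid(sum(w * p for w, p in zip(w_row, pooled)) + b)
             for w_row, b in zip(weights, bias)]
    # Scale every value in a channel by that channel's gate.
    fused = [[[g * v for v in row] for row in ch]
             for ch, g in zip(channels, gates)]
    return fused, gates

rgb = [[[1.0, 1.0], [1.0, 1.0]]]   # one RGB feature channel
kp = [[[2.0, 0.0], [0.0, 2.0]]]    # one keypoint feature channel
zero_w = [[0.0, 0.0], [0.0, 0.0]]  # untrained gate: every channel gets 0.5
fused, gates = channel_attention_fuse(rgb, kp, zero_w, [0.0, 0.0])
```

Because each gate depends on the pooled statistics of all channels, the keypoint channels can raise or lower the weight given to RGB channels and vice versa.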


Author(s):  
Yujun Cai ◽  
Liuhao Ge ◽  
Jianfei Cai ◽  
Nadia Magnenat-Thalmann ◽  
Junsong Yuan
