An Effective 3D Human Pose Estimation Method Based on Dilated Convolutions for Videos

Author(s): Hong Liu, Congyaxu Ren

Author(s): Zihao Zhang, Lei Hu, Xiaoming Deng, Shihong Xia

3D human pose estimation is a fundamental problem in artificial intelligence, with wide applications in AR/VR, HCI and robotics. However, human pose estimation from point clouds still suffers from noisy points and jitter artifacts, caused by hand-crafted point cloud sampling and single-frame estimation strategies. In this paper, we present a new perspective on 3D human pose estimation from point cloud sequences. To sample effective points from the input, we design a differentiable point cloud sampling method built on a density-guided attention mechanism. To avoid the jitter that affects previous single-frame 3D human pose estimation methods, we exploit temporal information to obtain more stable results. Experiments on the ITOP and NTU-RGBD datasets demonstrate that all of our contributed components are effective and that our method achieves state-of-the-art performance.
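The abstract above names a differentiable, density-guided attention sampler for point cloud sequences but gives no implementation details. The sketch below is a minimal, hypothetical interpretation in PyTorch: local density is estimated from k-nearest-neighbor distances, a small MLP turns position and density into attention logits, and the top-scoring points are gathered while their attention weights stay in the computation graph so the sampler remains differentiable. The class name, layer sizes, and the soft-selection scheme are assumptions, not the authors' code.

```python
# Hypothetical sketch of a density-guided attention sampler (illustrative names and
# layer sizes; not the paper's implementation). Input: point clouds of shape (B, N, 3).
import torch
import torch.nn as nn


class DensityGuidedSampler(nn.Module):
    def __init__(self, num_samples: int = 512, k: int = 16):
        super().__init__()
        self.num_samples = num_samples
        self.k = k
        # Small MLP that maps (x, y, z, density) to an attention logit per point.
        self.score_mlp = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3)
        dists = torch.cdist(points, points)                   # (B, N, N) pairwise distances
        knn = dists.topk(self.k + 1, largest=False).values    # includes the point itself
        density = 1.0 / (knn[..., 1:].mean(dim=-1) + 1e-6)    # inverse mean kNN distance
        feats = torch.cat([points, density.unsqueeze(-1)], dim=-1)  # (B, N, 4)
        scores = self.score_mlp(feats).squeeze(-1)             # (B, N) attention logits
        attn = torch.softmax(scores, dim=-1)                   # density-aware attention
        # Soft, differentiable "sampling": keep the top-scoring points and scale them by
        # their attention weights so gradients flow back into the scoring network.
        idx = attn.topk(self.num_samples, dim=-1).indices      # (B, S)
        gathered = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
        weights = torch.gather(attn, 1, idx).unsqueeze(-1)
        return gathered * (1.0 + weights)                      # (B, S, 3)


if __name__ == "__main__":
    sampler = DensityGuidedSampler(num_samples=256, k=8)
    cloud = torch.randn(2, 2048, 3)
    print(sampler(cloud).shape)  # torch.Size([2, 256, 3])
```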


Symmetry, 2020, Vol. 12 (7), pp. 1116
Author(s): Jun Sun, Mantao Wang, Xin Zhao, Dejun Zhang

In this paper, we study the problem of monocular 3D human pose estimation based on deep learning. Because of the limitations of a single viewpoint, monocular human pose estimation cannot avoid the inherent occlusion problem. A common remedy is multi-view 3D pose estimation, but single-view images cannot be used directly in multi-view methods, which greatly limits practical applications. To address these issues, we propose a novel end-to-end network for monocular 3D human pose estimation. First, we propose a multi-view pose generator that predicts 2D poses in multiple views from the 2D pose in a single view. Second, we propose a simple but effective data augmentation method for generating multi-view 2D pose annotations, since existing datasets (e.g., Human3.6M) do not contain large numbers of 2D pose annotations in different views. Third, we employ a graph convolutional network to infer a 3D pose from the multi-view 2D poses. Experiments on public datasets verify the effectiveness of our method, and ablation studies show that it improves the performance of existing 3D pose estimation networks.
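The pipeline described above (generate 2D poses in several views, then lift them to 3D with a graph convolutional network) can be illustrated with a minimal sketch. The code below assumes a row-normalized skeleton adjacency matrix and simply concatenates the generated views per joint before two graph-convolution layers and a 3D regression head; the class names, layer sizes, and view-fusion scheme are illustrative assumptions rather than the paper's architecture.

```python
# Minimal sketch of lifting multi-view 2D joints to 3D with graph convolutions
# (illustrative only; joint count, adjacency, and fusion scheme are assumptions).
import torch
import torch.nn as nn


class JointGraphConv(nn.Module):
    """One graph-convolution step, X' = ReLU(A_hat @ X @ W), over skeleton joints."""
    def __init__(self, in_dim: int, out_dim: int, adjacency: torch.Tensor):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        deg = adjacency.sum(dim=-1, keepdim=True)
        self.register_buffer("a_hat", adjacency / deg.clamp(min=1))  # row-normalized A

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, J, in_dim) per-joint features
        return torch.relu(self.a_hat @ self.linear(x))


class MultiView2DTo3D(nn.Module):
    """Concatenate V generated 2D views per joint, then regress 3D joint coordinates."""
    def __init__(self, num_views: int, adjacency: torch.Tensor, hidden: int = 128):
        super().__init__()
        self.gc1 = JointGraphConv(2 * num_views, hidden, adjacency)
        self.gc2 = JointGraphConv(hidden, hidden, adjacency)
        self.head = nn.Linear(hidden, 3)

    def forward(self, poses_2d: torch.Tensor) -> torch.Tensor:
        # poses_2d: (B, V, J, 2) -> per-joint feature of all views, (B, J, 2V)
        b, v, j, _ = poses_2d.shape
        x = poses_2d.permute(0, 2, 1, 3).reshape(b, j, 2 * v)
        return self.head(self.gc2(self.gc1(x)))               # (B, J, 3)


if __name__ == "__main__":
    # Toy example: 17 joints with a chain-like adjacency, 4 generated views.
    J, V = 17, 4
    adj = torch.eye(J) + torch.diag(torch.ones(J - 1), 1) + torch.diag(torch.ones(J - 1), -1)
    model = MultiView2DTo3D(num_views=V, adjacency=adj)
    print(model(torch.randn(2, V, J, 2)).shape)  # torch.Size([2, 17, 3])
```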


Author(s): Jinbao Wang, Shujie Tan, Xiantong Zhen, Shuo Xu, Feng Zheng, ...

2020, Vol. 2 (6), pp. 471-500
Author(s): Xiaopeng Ji, Qi Fang, Junting Dong, Qing Shuai, Wen Jiang, ...
