scholarly journals PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation

Author(s):  
Yiran Zhu ◽  
Xing Xu ◽  
Fumin Shen ◽  
Yanli Ji ◽  
Lianli Gao ◽  
...  

Graph neural networks (GNNs) have been widely used in the 3D human pose estimation task, since the pose representation of a human body can be naturally modeled by the graph structure. Generally, most of the existing GNN-based models utilize the restricted receptive fields of filters and single-scale information, while neglecting the valuable multi-scale contextual information. To tackle this issue, we propose a novel Graph Transformer Encoder-Decoder with Atrous Convolution, named PoseGTAC, to effectively extract multi-scale context and long-range information. In our proposed PoseGTAC model, Graph Atrous Convolution (GAC) and Graph Transformer Layer (GTL), respectively for the extraction of local multi-scale and global long-range information, are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling are adopted for the interaction of multi-scale information from local to global (e.g., part-scale and body-scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that the proposed PoseGTAC model exceeds all previous methods and achieves state-of-the-art performance.

2020 ◽  
Vol 34 (07) ◽  
pp. 13033-13040 ◽  
Author(s):  
Lu Zhou ◽  
Yingying Chen ◽  
Jinqiao Wang ◽  
Hanqing Lu

In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among the human body parts proves effective in solving the problems such as complex articulation, occlusion and so on. Therefore, we propose two articulated grammars learned with Bi-C3D to build the relationships of the human joints and exploit the contextual information of human body structure. Firstly, a local multi-scale Bi-C3D kinematics grammar is proposed to promote the message passing process among the locally related joints. The multi-scale kinematics grammar excavates different levels human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among the human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.


Author(s):  
Jinbao Wang ◽  
Shujie Tan ◽  
Xiantong Zhen ◽  
Shuo Xu ◽  
Feng Zheng ◽  
...  

2020 ◽  
Vol 2 (6) ◽  
pp. 471-500
Author(s):  
Xiaopeng Ji ◽  
Qi Fang ◽  
Junting Dong ◽  
Qing Shuai ◽  
Wen Jiang ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document