Progressive Bi-C3D Pose Grammar for Human Pose Estimation

2020 ◽  
Vol 34 (07) ◽  
pp. 13033-13040 ◽  
Author(s):  
Lu Zhou ◽  
Yingying Chen ◽  
Jinqiao Wang ◽  
Hanqing Lu

In this paper, we propose a progressive pose grammar network learned with Bi-C3D (Bidirectional Convolutional 3D) for human pose estimation. Exploiting the dependencies among human body parts proves effective in handling problems such as complex articulation and occlusion. Therefore, we propose two articulated grammars learned with Bi-C3D to model the relationships among human joints and exploit the contextual information of the human body structure. First, a local multi-scale Bi-C3D kinematics grammar is proposed to promote message passing among locally related joints. The multi-scale kinematics grammar extracts different levels of human context learned by the network. Moreover, a global sequential grammar is put forward to capture the long-range dependencies among the human body joints. The whole procedure can be regarded as a local-global progressive refinement process. Without bells and whistles, our method achieves competitive performance on both the MPII and LSP benchmarks compared with previous methods, which confirms the feasibility and effectiveness of C3D in information interactions.

Author(s):  
Yiran Zhu ◽  
Xing Xu ◽  
Fumin Shen ◽  
Yanli Ji ◽  
Lianli Gao ◽  
...  

Graph neural networks (GNNs) have been widely used in the 3D human pose estimation task, since the pose representation of a human body can be naturally modeled by the graph structure. Generally, most of the existing GNN-based models utilize the restricted receptive fields of filters and single-scale information, while neglecting the valuable multi-scale contextual information. To tackle this issue, we propose a novel Graph Transformer Encoder-Decoder with Atrous Convolution, named PoseGTAC, to effectively extract multi-scale context and long-range information. In our proposed PoseGTAC model, Graph Atrous Convolution (GAC) and Graph Transformer Layer (GTL), respectively for the extraction of local multi-scale and global long-range information, are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling are adopted for the interaction of multi-scale information from local to global (e.g., part-scale and body-scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that the proposed PoseGTAC model exceeds all previous methods and achieves state-of-the-art performance.


2017 ◽  
Vol 11 (6) ◽  
pp. 426-433 ◽  
Author(s):  
Manuel I. López‐Quintero ◽  
Manuel J. Marín‐Jiménez ◽  
Rafael Muñoz‐Salinas ◽  
Rafael Medina‐Carnicer

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 71158-71166 ◽  
Author(s):  
Rui Wang ◽  
Zhongzheng Cao ◽  
Xiangyang Wang ◽  
Zhi Liu ◽  
Xiaoqiang Zhu
