Multi-Scale Structure-Aware Network for Human Pose Estimation

Author(s):  
Lipeng Ke ◽  
Ming-Ching Chang ◽  
Honggang Qi ◽  
Siwei Lyu
IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 71158-71166 ◽  
Author(s):  
Rui Wang ◽  
Zhongzheng Cao ◽  
Xiangyang Wang ◽  
Zhi Liu ◽  
Xiaoqiang Zhu

2019 ◽  
Vol 16 (04) ◽  
pp. 1941003
Author(s):  
Chunsheng Guo ◽  
Jialuo Zhou ◽  
Wenlong Du ◽  
Xuguang Zhang

Human pose estimation is a fundamental but challenging task in computer vision. It depends mainly on global information about the keypoint type and local information about the keypoint location. However, the uniformity of the cascading process makes it difficult for the stacked networks to differentiate their roles and collaborate. To address this, the paper introduces a new human pose estimation framework called the Multi-Scale Collaborative (MSC) network. A pre-processing network produces feature maps of different sizes and dispatches them to different positions in the stack, with small-scale features reaching the front-end stacked networks and large-scale features reaching the back-end stacked networks. A new loss function is also proposed for the MSC network: different keypoints have different loss weight coefficients at different scales, and these coefficients are adjusted dynamically from the top hourglass network to the bottom hourglass network. Experimental results show that the proposed method is competitive with state-of-the-art methods on the MPII and LSP challenge leaderboards.
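The per-keypoint, per-scale loss weighting described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formula or how the weights are adjusted, so the mean-squared-error heatmap loss, the linear weight schedule, and all function names below are assumptions chosen only for illustration.

```python
from typing import List, Sequence

Heatmap = List[List[float]]  # one 2D heatmap: rows of per-pixel scores


def heatmap_mse(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared error between two equally sized 2D heatmaps."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def linear_weight_schedule(
    start: Sequence[float], end: Sequence[float], num_stacks: int
) -> List[List[float]]:
    """Hypothetical schedule (an assumption, not from the paper):
    interpolate each keypoint's weight linearly from the first stack
    (start) to the last stack (end)."""
    if num_stacks == 1:
        return [list(start)]
    schedule = []
    for s in range(num_stacks):
        alpha = s / (num_stacks - 1)
        schedule.append([(1 - alpha) * a + alpha * b for a, b in zip(start, end)])
    return schedule


def weighted_multi_stack_loss(
    preds: Sequence[Sequence[Heatmap]],
    targets: Sequence[Sequence[Heatmap]],
    weights: Sequence[Sequence[float]],
) -> float:
    """Sum of per-keypoint heatmap losses over all stacks.

    preds[s][k] and targets[s][k] are the heatmaps of keypoint k at
    stack s (each stack may use its own resolution); weights[s][k] is
    the loss weight of keypoint k at stack s.
    """
    loss = 0.0
    for stack_preds, stack_targets, stack_weights in zip(preds, targets, weights):
        for pred, target, w in zip(stack_preds, stack_targets, stack_weights):
            loss += w * heatmap_mse(pred, target)
    return loss
```

With two keypoints and three stacks, `linear_weight_schedule([1.0, 0.0], [0.0, 1.0], 3)` shifts the emphasis from the first keypoint at the first stack to the second keypoint at the last stack. This is one simple way to give each stack a different role.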


Author(s):  
Yiran Zhu ◽  
Xing Xu ◽  
Fumin Shen ◽  
Yanli Ji ◽  
Lianli Gao ◽  
...  

Graph neural networks (GNNs) have been widely used for 3D human pose estimation, since the pose of a human body can be modeled naturally as a graph. However, most existing GNN-based models use filters with restricted receptive fields and single-scale information, neglecting valuable multi-scale contextual information. To tackle this issue, we propose a novel Graph Transformer Encoder-Decoder with Atrous Convolution, named PoseGTAC, to effectively extract multi-scale context and long-range information. In the proposed PoseGTAC model, Graph Atrous Convolution (GAC) extracts local multi-scale information and the Graph Transformer Layer (GTL) extracts global long-range information. The two are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling allow multi-scale information to interact from local to global (e.g., part-scale and body-scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that the proposed PoseGTAC model outperforms all previous methods and achieves state-of-the-art performance.
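As a rough illustration of how a dilated (atrous) neighborhood on a skeleton graph can widen the receptive field, the sketch below aggregates each joint's features with those of joints exactly `dilation` hops away. The abstract does not define GAC, so this hop-distance reading, the mean aggregation without learned weights, and every name in the code are assumptions rather than the PoseGTAC formulation.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # joint index -> list of adjacent joints


def hop_distances(adjacency: Graph, source: int) -> Dict[int, int]:
    """Breadth-first search: hop distance from source to every reachable joint."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def dilated_neighbors(adjacency: Graph, node: int, dilation: int) -> List[int]:
    """Joints exactly `dilation` hops from `node` (dilation=1 gives the
    ordinary one-hop neighborhood)."""
    dist = hop_distances(adjacency, node)
    return sorted(n for n, d in dist.items() if d == dilation)


def atrous_aggregate(
    features: Dict[int, List[float]], adjacency: Graph, dilation: int
) -> Dict[int, List[float]]:
    """Average each joint's feature vector with those of its dilated
    neighbors. A real layer would apply learned weights; this sketch
    uses a plain mean to keep the idea visible."""
    out = {}
    for node in adjacency:
        group = [features[node]]
        group += [features[n] for n in dilated_neighbors(adjacency, node, dilation)]
        dim = len(features[node])
        out[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return out
```

Running the aggregation with several dilation values and combining the results is one way to gather context at multiple scales around each joint.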

