Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Shili Niu ◽  
Weihua Ou ◽  
Shihua Feng ◽  
Jianping Gou ◽  
Fei Long ◽  
...  

Existing methods for human pose estimation usually rely on large intermediate tensors, which leads to a high computational load that is detrimental to resource-limited devices. To solve this problem, we propose a low-computational-cost pose estimation network, MobilePoseNet, which consists of an encoder, a decoder, and a parallel non-maximum suppression (NMS) operation. Specifically, we design a lightweight upsampling block to replace transposed convolution in the decoder and use a lightweight network as the downsampling part. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel object keypoint similarity (OKS) NMS, which significantly outperforms conventional NMS in terms of both accuracy and speed. Experimental results on benchmark datasets show that MobilePoseNet obtains results nearly comparable to those of state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet has only 4% of the parameters while reaching 98% of the estimation accuracy.
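
To make the OKS-NMS step concrete, here is a minimal greedy sketch in Python. It is not the authors' parallel implementation; it assumes COCO-style OKS with per-keypoint constants `kappas` (k_i = 2·σ_i), treats every keypoint as visible, and uses a hypothetical suppression threshold `thresh`. The pairwise OKS matrix is the part whose entries are mutually independent and could therefore be evaluated in parallel.

```python
import math
from typing import List, Sequence, Tuple

Pose = Sequence[Tuple[float, float]]  # one (x, y) pair per keypoint


def oks(a: Pose, b: Pose, area: float, kappas: Sequence[float]) -> float:
    """COCO-style object keypoint similarity; all keypoints are assumed visible."""
    total = 0.0
    for (xa, ya), (xb, yb), k in zip(a, b, kappas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * k * k))
    return total / len(kappas)


def oks_nms(poses: List[Pose], scores: List[float], areas: List[float],
            kappas: Sequence[float], thresh: float = 0.9) -> List[int]:
    """Greedy OKS-NMS: keep the best-scoring pose, drop poses too similar to it, repeat."""
    n = len(poses)
    # Every entry of the pairwise OKS matrix is independent of the others, so this
    # step is the natural candidate for parallel (batched) evaluation.
    sim = [[oks(poses[i], poses[j], areas[i], kappas) for j in range(n)]
           for i in range(n)]
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * n
    keep: List[int] = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        # Only lower-scoring candidates can be suppressed by pose i.
        for j in order[pos + 1:]:
            if sim[i][j] > thresh:
                suppressed[j] = True
    return keep
```

For two near-duplicate detections of one person plus a distant second person (areas of 10000 px², `kappas=[0.1, 0.1]`), the duplicate with the lower score is suppressed and indices `[0, 2]` are kept.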

2021 ◽  
Vol 423 ◽  
pp. 327-335
Author(s):  
Fujin Zhong ◽  
Mingyang Li ◽  
Kun Zhang ◽  
Jun Hu ◽  
Li Liu

Author(s):  
Daniel Groos ◽  
Heri Ramampiaro ◽  
Espen AF Ihlen

Abstract Single-person human pose estimation facilitates markerless movement analysis in sports, as well as in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements of real-life applications. The proliferation of deep learning techniques has resulted in the development of many advanced approaches. However, as the field has progressed, more complex and inefficient models have also been introduced, causing tremendous increases in computational demands. To cope with these complexity and inefficiency challenges, we propose a novel convolutional neural network architecture, called EfficientPose, which exploits the recently proposed EfficientNets to deliver efficient and scalable single-person pose estimation. EfficientPose is a family of models harnessing an effective multi-scale feature extractor and computationally efficient detection blocks using mobile inverted bottleneck convolutions, while still improving the precision of the predicted pose configurations. Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost. The results from our experiments on the challenging MPII single-person benchmark show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in terms of both accuracy and computational efficiency. In particular, our top-performing model achieves state-of-the-art accuracy on single-person MPII with low-complexity ConvNets.
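
The efficiency of mobile inverted bottleneck convolutions comes from splitting one dense convolution into a 1×1 expansion, a per-channel (depthwise) k×k convolution, and a 1×1 projection. The sketch below counts weights for both designs; it is an illustrative calculation, not EfficientPose's actual block configuration, and the channel counts, kernel size, and expansion factor are assumptions. Biases, batch normalization, and the squeeze-and-excitation stage used in EfficientNet's blocks are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def mbconv_params(c_in: int, c_out: int, k: int, expand: int) -> int:
    """Weights of a mobile inverted bottleneck: 1x1 expand, k x k depthwise, 1x1 project.

    Biases, batch norm and squeeze-and-excitation are ignored in this count.
    """
    hidden = c_in * expand
    expand_1x1 = c_in * hidden if expand != 1 else 0  # no expansion conv when expand == 1
    depthwise = hidden * k * k  # one k x k filter per hidden channel
    project_1x1 = hidden * c_out
    return expand_1x1 + depthwise + project_1x1


if __name__ == "__main__":
    print(standard_conv_params(64, 64, 5))  # 102400
    print(mbconv_params(64, 64, 5, expand=6))  # 58752
```

The depthwise term grows as k²·C instead of k²·C_in·C_out, which is why larger kernels stay affordable inside such a block even with a sixfold channel expansion.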


Author(s):  
Zhihui Yang ◽  
Xiangyu Tang ◽  
Lijuan Zhang ◽  
Zhiling Yang

Human pose estimation can be used in action recognition, video surveillance, and other fields, and it has received a lot of attention. Because joint flexibility and environmental factors greatly influence pose estimation accuracy, research in this area faces many challenges. In this paper, we incorporate pyramid convolution and an attention mechanism into the residual block and introduce a hybrid structure model that combines local and global image information for keypoint detection. In addition, our improved model adopts grouped convolution and a lightweight attention module, both of which reduce the computational cost of the network. Experiments on the MS COCO human keypoint detection dataset show that, compared with the Simple Baseline model, our model has a similar number of parameters and GFLOPs (billions of floating-point operations), but achieves better keypoint detection accuracy in multi-person scenes.
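
Grouped convolution divides the weight count of a k×k convolution by the number of groups, which is what lets a pyramid of larger kernels stay cheap. The sketch below counts parameters and multiply-accumulates (MACs); the four-level pyramid (kernel sizes 3 to 9, groups 1 to 16, 16 output channels each) and the 64×48 feature map are hypothetical choices for illustration, not the configuration from the paper. Note that published "FLOPs" figures sometimes mean MACs and sometimes 2×MACs.

```python
from typing import List, Tuple


def conv_cost(c_in: int, c_out: int, k: int, h_out: int, w_out: int,
              groups: int = 1) -> Tuple[int, int]:
    """Return (parameters, multiply-accumulates) of a k x k convolution, bias ignored."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    params = k * k * (c_in // groups) * c_out  # each filter sees only c_in / groups channels
    macs = params * h_out * w_out  # one MAC per weight per output position
    return params, macs


def pyramid_conv_cost(c_in: int, levels: List[Tuple[int, int, int]],
                      h_out: int, w_out: int) -> Tuple[int, int]:
    """levels: (c_out, kernel, groups) per pyramid level; level outputs are concatenated."""
    params = macs = 0
    for c_out, k, groups in levels:
        p, m = conv_cost(c_in, c_out, k, h_out, w_out, groups)
        params += p
        macs += m
    return params, macs


if __name__ == "__main__":
    dense = conv_cost(64, 64, 3, 64, 48)
    pyramid = pyramid_conv_cost(64, [(16, 3, 1), (16, 5, 4), (16, 7, 8), (16, 9, 16)], 64, 48)
    print(dense[0], pyramid[0])  # 36864 27072
```

In this example the pyramid covers receptive fields up to 9×9 yet uses fewer weights than a single dense 3×3 layer with the same 64 output channels.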


Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 718 ◽  
Author(s):  
Baohua Qiang ◽  
Shihao Zhang ◽  
Yongsong Zhan ◽  
Wu Xie ◽  
Tian Zhao

In recent years, an increasing amount of human data has come from image sensors. In this paper, a novel approach combining convolutional pose machines (CPMs) with GoogLeNet is proposed for human pose estimation using image sensor data. The first stage of the CPMs directly generates a response map for each human skeletal keypoint from the image, and in this stage we introduce some layers from GoogLeNet. On the one hand, the improved model uses deeper network layers and a more complex network structure to enhance low-level feature extraction. On the other hand, the improved model applies a fine-tuning strategy, which benefits estimation accuracy. Moreover, we introduce the inception structure to greatly reduce the number of model parameters, which significantly shortens convergence time. Extensive experiments on several datasets show that the improved model outperforms most mainstream models in accuracy and training time. The prediction efficiency of the improved model is improved by a factor of 1.023 compared with the CPMs, and its training time is reduced by a factor of 3.414. This paper presents a new idea for future research.
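
The parameter savings of the inception structure come largely from 1×1 "reduce" convolutions placed before expensive large-kernel convolutions. The arithmetic below illustrates this for a single 5×5 branch; the channel counts (192 in, 16 after reduction, 32 out) are illustrative assumptions rather than values reported for the improved CPM model.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


# Illustrative channel counts: 192 input channels, a 5x5 branch with 32 outputs.
direct = conv_params(192, 32, 5)
# Inception-style: a 1x1 "reduce" layer down to 16 channels before the 5x5 convolution.
bottleneck = conv_params(192, 16, 1) + conv_params(16, 32, 5)

print(direct, bottleneck, round(direct / bottleneck, 2))  # 153600 15872 9.68
```

The 5×5 filters now operate on 16 channels instead of 192, so the branch needs roughly a tenth of the weights while producing the same number of output channels.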


Author(s):  
Jielu Yan ◽  
MingLiang Zhou ◽  
Jinli Pan ◽  
Meng Yin ◽  
Bin Fang

3D human pose estimation is the task of estimating the 3D articulated structure of a person from an image or a video. The technology has great potential because it enables real-time tracking of people and analysis of their motion. Recently, much research has been conducted to improve human pose estimation, but few works have focused on reviewing 3D human pose estimation. In this paper, we offer a comprehensive survey of state-of-the-art methods for 3D human pose estimation, covering pose estimation solutions, implementations on images or videos that contain different numbers of people, and advanced 3D human pose estimation techniques. Furthermore, the algorithms are subdivided into subcategories and compared according to their methodologies. To the best of our knowledge, this is the first such comprehensive survey of recent progress in 3D human pose estimation, and we hope it will facilitate the completion, refinement, and application of 3D human pose estimation.


Author(s):  
Wenqiang Zhang ◽  
Jiemin Fang ◽  
Xinggang Wang ◽  
Wenyu Liu

Abstract Human pose estimation from images and video is a key task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to deploy the networks on lightweight devices. Nowadays, real-time multimedia applications call for more efficient models for better interaction. Moreover, most deep neural networks for pose estimation directly reuse networks designed for image classification as the backbone, which are not optimized for the pose estimation task. In this paper, we propose an efficient framework for human pose estimation with two parts: an efficient backbone and an efficient head. Using a differentiable neural architecture search method, we customize the backbone network design for pose estimation and reduce computational cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to improve the final prediction. We evaluate our networks on the MPII and COCO datasets. Our smallest model requires only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII, and our large model needs only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model HRNet, which takes 9.5 GFLOPs.
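
Differentiable architecture search typically relaxes the discrete choice among candidate operations into a softmax-weighted mixture controlled by learnable architecture logits, then keeps the strongest operation after search. The sketch below shows that continuous relaxation in the style of DARTS on a toy 1D signal; it is an assumption about the general technique, not the authors' search space or method, and the candidate operations (`identity`, a 3-tap averaging filter `avg3`, `zero`) are hypothetical. Gradient-based training of the logits is omitted.

```python
import math
from typing import Callable, Dict, List

Signal = List[float]


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def avg3(x: Signal) -> Signal:
    """Toy 3-tap averaging filter with zero padding (stands in for a convolution)."""
    padded = [0.0] + list(x) + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))]


# Hypothetical candidate operations on one edge of the search space.
CANDIDATES: Dict[str, Callable[[Signal], Signal]] = {
    "identity": lambda x: list(x),
    "avg3": avg3,
    "zero": lambda x: [0.0] * len(x),
}


def mixed_op(x: Signal, alphas: Dict[str, float]) -> Signal:
    """Continuous relaxation: softmax(alpha)-weighted sum of all candidate outputs."""
    names = list(CANDIDATES)
    weights = softmax([alphas[n] for n in names])
    out = [0.0] * len(x)
    for w, name in zip(weights, names):
        out = [o + w * y for o, y in zip(out, CANDIDATES[name](x))]
    return out


def derive_op(alphas: Dict[str, float]) -> str:
    """Discretize after search: keep the strongest non-zero operation."""
    return max((n for n in CANDIDATES if n != "zero"), key=lambda n: alphas[n])
```

With equal logits, `mixed_op([3.0, 0.0, 0.0], ...)` averages the three candidate outputs, and `derive_op` ignores the `zero` operation when choosing the final architecture.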


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2464
Author(s):  
Zhe Zhang ◽  
Chunyu Wang ◽  
Wenhu Qin

Multiple-camera systems can expand coverage and mitigate occlusion problems. However, temporal synchronization remains a problem for budget cameras and capture devices. We propose an out-of-the-box framework that temporally synchronizes multiple cameras using semantic human pose estimation from the videos. Human pose predictions are obtained for each camera with an off-the-shelf pose estimator. First, our method temporally calibrates each pair of cameras by minimizing an energy function related to epipolar distances. We also propose a simple yet effective algorithm for associating multiple people across cameras and a score-regularized energy function for improved performance. Second, we integrate the synchronized camera pairs into a graph and derive the optimal temporal displacement configuration for the multiple-camera system. We evaluate our method on four public benchmark datasets and demonstrate robust sub-frame synchronization accuracy on all of them.
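
The pairwise step can be pictured as a search over frame offsets that minimizes the mean epipolar distance between corresponding joints. The sketch below assumes a single already-associated person, a known fundamental matrix `F` satisfying x2ᵀ F x1 = 0, and integer offsets only; the energy function here is a simplified stand-in, and the paper's score regularization, multi-person association, sub-frame refinement, and graph integration are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def epipolar_distance(F: Matrix3, x1: Point, x2: Point) -> float:
    """Distance from x2 (view 2) to the epipolar line F @ x1 in view 2."""
    p = (x1[0], x1[1], 1.0)
    a, b, c = (sum(F[r][k] * p[k] for k in range(3)) for r in range(3))
    return abs(a * x2[0] + b * x2[1] + c) / math.hypot(a, b)


def sync_energy(F: Matrix3, track1: List[List[Point]], track2: List[List[Point]],
                offset: int) -> float:
    """Mean epipolar distance when frame t of camera 1 is paired with frame t + offset
    of camera 2. Each track lists, per frame, the (x, y) joints of one person."""
    dists = []
    for t, joints1 in enumerate(track1):
        t2 = t + offset
        if 0 <= t2 < len(track2):
            for p1, p2 in zip(joints1, track2[t2]):
                dists.append(epipolar_distance(F, p1, p2))
    return sum(dists) / len(dists) if dists else math.inf


def best_offset(F: Matrix3, track1: List[List[Point]], track2: List[List[Point]],
                max_shift: int) -> int:
    """Integer frame offset in [-max_shift, max_shift] with the lowest energy."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda d: sync_energy(F, track1, track2, d))
```

For a rectified stereo pair, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` makes every epipolar line horizontal, so the energy simply measures how far corresponding joints are from lying on the same image row.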


2021 ◽  
Vol 11 (4) ◽  
pp. 1826
Author(s):  
Hailun Xia ◽  
Tianyang Zhang

Estimating the positions of human joints from monocular RGB images has been a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem remains: relationships and constraints, such as the symmetry of human body structure, are not well exploited by previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In the SAN, attention-driven, long-range dependency modeling between joints is adopted to compensate for local content and mine details from all feature locations. To enable the SAN to handle both 2D and 3D pose estimation, we also design a compatible, effective, and general joint learning framework that mixes data of different dimensionalities. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII, and COCO datasets.
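
Long-range dependency modeling between joints can be illustrated with scaled dot-product self-attention, where each joint's feature is replaced by an attention-weighted average over all joints. This is a simplified sketch, not the SAN architecture: it assumes the queries, keys, and values are the raw joint feature vectors (no learned projections, multiple heads, or residual connections).

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in xs]
    s = sum(e)
    return [v / s for v in e]


def self_attention(features: List[Vector]) -> List[Vector]:
    """Non-local update: each joint attends to every joint (Q = K = V = features)."""
    d = len(features[0])
    out = []
    for q in features:
        # Scaled dot-product similarity between this joint and all joints.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        w = softmax(scores)
        out.append([sum(w[j] * features[j][c] for j in range(len(features)))
                    for c in range(d)])
    return out
```

Because the weights are computed from feature similarity rather than spatial adjacency, a joint can draw information from any other joint, for example its left/right counterpart, regardless of how far apart they are in the image.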

