A combined local and global structure module for human pose estimation

Human pose estimation can be used in action recognition, video surveillance, and other fields, and has received considerable attention. Because the flexibility of human joints and environmental factors greatly influence pose estimation accuracy, related research faces many challenges. In this paper, we incorporate pyramid convolution and an attention mechanism into the residual block, and introduce a hybrid structure model that combines the local and global information of the image for keypoint detection. In addition, our improved structure model adopts grouped convolution, and the attention module used is lightweight, which reduces the computational cost of the network. Experiments on the MS COCO human keypoint detection dataset show that, compared with the Simple Baseline model, our model is similar in parameters and GFLOPs (giga floating-point operations) but achieves better detection accuracy in multi-person scenes.
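
As a rough illustration of why grouped convolution lowers cost, the weight count of a 2D convolution shrinks in proportion to the number of groups. The sketch below is not taken from the paper: the function name `conv2d_params` and the layer sizes (256 channels in and out, 3x3 kernel, 4 groups) are hypothetical values chosen only to show the arithmetic.

```python
def conv2d_params(c_in: int, c_out: int, kernel: int,
                  groups: int = 1, bias: bool = False) -> int:
    """Weight count of a 2D convolution with a square kernel.

    Each group maps c_in/groups input channels to c_out/groups output
    channels, so the weight count is divided by `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel * kernel
    return weights + (c_out if bias else 0)


# Hypothetical layer sizes, for illustration only.
standard = conv2d_params(256, 256, 3)            # 256 * 256 * 9 = 589824
grouped = conv2d_params(256, 256, 3, groups=4)   # 64 * 256 * 9 = 147456
print(standard, grouped, standard // grouped)
```

With these assumed sizes the grouped layer needs a quarter of the weights of the standard one; the multiply-accumulate count per output position falls by the same factor.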

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Wireless Communications and Mobile Computing ◽

10.1155/2021/1333250 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Shili Niu ◽

Weihua Ou ◽

Shihua Feng ◽

Jianping Gou ◽

Fei Long ◽

...

Keyword(s):

Pose Estimation ◽

State Of The Art ◽

Computational Cost ◽

Estimation Accuracy ◽

Human Pose Estimation ◽

Model Parameters ◽

Resource Limited ◽

Benchmark Datasets ◽

Human Pose ◽

Low Computational Cost

Existing methods for human pose estimation usually use a large intermediate tensor, leading to a high computational load, which is a burden on resource-limited devices. To solve this problem, we propose a low computational cost pose estimation network, MobilePoseNet, which includes an encoder, a decoder, and a parallel non-maximum suppression (NMS) operation. Specifically, we design a lightweight upsampling block to replace transposed convolution in the decoder and use a lightweight network as our downsampling part. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel OKS-NMS, which significantly outperforms the conventional NMS in terms of accuracy and speed. Experimental results on the benchmark datasets show that MobilePoseNet obtains results almost comparable to state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet uses only 4% of the parameters while reaching 98% of the estimation accuracy.
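
For readers unfamiliar with OKS-based suppression, the following is a minimal sketch of object keypoint similarity (OKS) and a plain sequential greedy NMS built on it. It is not the parallel OKS-NMS proposed in the paper, whose details are not given in the abstract. The function names, the choice to ignore keypoint visibility flags, the use of the kept pose's area as the scale, and all numeric values (sigmas, areas, threshold) are assumptions made for illustration.

```python
import math


def oks(pose_a, pose_b, area, sigmas):
    """Object keypoint similarity between two poses.

    pose_a, pose_b: lists of (x, y) keypoints in the same order.
    area: object scale squared (e.g. bounding-box area).
    sigmas: per-keypoint falloff constants.
    Visibility flags are ignored in this simplified version.
    """
    total = 0.0
    for (xa, ya), (xb, yb), sigma in zip(pose_a, pose_b, sigmas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
    return total / len(sigmas)


def oks_nms(poses, scores, areas, sigmas, threshold=0.9):
    """Greedy NMS: keep the highest-scoring pose, drop near-duplicates."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(oks(poses[i], poses[j], areas[j], sigmas) <= threshold
               for j in keep):
            keep.append(i)
    return keep


# Hypothetical two-keypoint poses: the second nearly duplicates the first.
sigmas = [0.05, 0.05]
poses = [
    [(10.0, 10.0), (20.0, 20.0)],
    [(10.5, 10.0), (20.0, 20.5)],
    [(100.0, 100.0), (120.0, 120.0)],
]
scores = [0.9, 0.8, 0.7]
areas = [400.0, 400.0, 400.0]
kept = oks_nms(poses, scores, areas, sigmas)
print(kept)
```

In this toy input the second pose has an OKS of about 0.97 with the first and is suppressed, while the distant third pose is kept. Because the greedy loop compares each candidate against all previously kept poses, it is inherently sequential, which is the kind of bottleneck a parallel variant would aim to remove.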

Lightweight Stacked Hourglass Network for Human Pose Estimation

Applied Sciences ◽

10.3390/app10186497 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6497

Author(s):

Seung-Taek Kim ◽

Hyo Jong Lee

Keyword(s):

Receptive Field ◽

Pose Estimation ◽

Network Architecture ◽

Human Pose Estimation ◽

Current State ◽

Reduction Methods ◽

Substantial Progress ◽

Human Pose ◽

Residual Block ◽

Point Detection

Human pose estimation is a problem that continues to be one of the greatest challenges in the field of computer vision. While the stacked structure of an hourglass network has enabled substantial progress in human pose estimation and key-point detection areas, it is largely used as a backbone network. However, it also requires a relatively large number of parameters and high computational capacity due to the characteristics of its stacked structure. Accordingly, the present work proposes a more lightweight version of the hourglass network, which also improves the human pose estimation performance. The new hourglass network architecture utilizes several additional skip connections, which improve performance with minimal modifications while still maintaining the number of parameters in the network. Additionally, the size of the convolutional receptive field has a decisive effect in learning to detect features of the full human body. Therefore, we propose a multidilated light residual block, which expands the convolutional receptive field while also reducing the computational load. The proposed residual block is also invariant in scale when using multiple dilations. The well-known MPII and LSP human pose datasets were used to evaluate the performance using the proposed method. A variety of experiments were conducted that confirm that our method is more efficient compared to current state-of-the-art hourglass weight-reduction methods.
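
The link between dilation and receptive field can be made concrete with a little arithmetic: a kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions. The sketch below applies this to stride-1 convolutions stacked in sequence. It does not reproduce the paper's multidilated block, whose layout is not described in the abstract; the function names and the layer configurations are hypothetical.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span of input positions covered by one dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied in sequence.

    layers: iterable of (kernel, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        rf += effective_kernel(kernel, dilation) - 1
    return rf


# Three 3x3 layers: same weight count, different dilations (illustrative).
plain = receptive_field([(3, 1), (3, 1), (3, 1)])    # 1 + 2 + 2 + 2 = 7
dilated = receptive_field([(3, 1), (3, 2), (3, 3)])  # 1 + 2 + 4 + 6 = 13
print(plain, dilated)
```

Under these assumptions the dilated stack nearly doubles the receptive field without adding any weights, which is the general trade-off dilation offers.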

DSPNet: A low computational-cost network for human pose estimation

Neurocomputing ◽

10.1016/j.neucom.2020.11.003 ◽

2021 ◽

Vol 423 ◽

pp. 327-335

Author(s):

Fujin Zhong ◽

Mingyang Li ◽

Kun Zhang ◽

Jun Hu ◽

Li Liu

Keyword(s):

Pose Estimation ◽

Computational Cost ◽

Human Pose Estimation ◽

Human Pose ◽

Low Computational Cost

Wearable Device for High-Speed Hand Pose Estimation with a Ultrasmall Camera

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2015.p0167 ◽

2015 ◽

Vol 27 (2) ◽

pp. 167-173 ◽

Cited By ~ 5

Author(s):

Motomasa Tomida ◽

Kiyoshi Hoshino

Keyword(s):

Pose Estimation ◽

High Speed ◽

Computational Cost ◽

Wearable Device ◽

Estimation Accuracy ◽

Hand Pose Estimation ◽

Data Matching ◽

Pip Joint ◽

Image Characteristic ◽

Hand Pose

Operating a robot intentionally by using various complex motions of the hands and fingers requires a system that accurately detects hand and finger motions at high speed. This study uses an ultrasmall camera and a compact computer to develop a wearable device for hand pose estimation, also called a hand-capture device. Accurate estimation, however, requires data matching against a large database, whereas a compact computer usually has only limited memory and low processing power. We avoided this problem by reducing frequently used image characteristics from 1,600 dimensions to 64 dimensions of characteristic quantities. This saved memory and lowered computational cost while achieving high accuracy and speed. To enable an operator to wear the device comfortably, the camera was placed as close to the back of the hand as possible to enable hand pose estimation from hand images without fingertips. A prototype device with a compact computer, used to evaluate performance, indicated that the device achieved high-speed estimation. Estimation accuracy was 2.32°±14.61° at the PIP joint of the index finger and 3.06°±10.56° at the CM joint of the thumb, as accurate as that obtained using previous methods. This indicated that dimensional compression of image-characteristic quantities is important for realizing a compact hand-capture device.

Improved Convolutional Pose Machines for Human Pose Estimation Using Image Sensor Data

Sensors ◽

10.3390/s19030718 ◽

2019 ◽

Vol 19 (3) ◽

pp. 718 ◽

Cited By ~ 6

Author(s):

Baohua Qiang ◽

Shihao Zhang ◽

Yongsong Zhan ◽

Wu Xie ◽

Tian Zhao

Keyword(s):

Pose Estimation ◽

Image Sensor ◽

Fine Tuning ◽

Sensor Data ◽

Convergence Time ◽

Estimation Accuracy ◽

Human Pose Estimation ◽

Training Time ◽

Human Pose ◽

Improved Model

In recent years, an increasing amount of human data has come from image sensors. In this paper, a novel approach combining convolutional pose machines (CPMs) with GoogLeNet is proposed for human pose estimation using image sensor data. The first stage of the CPMs directly generates a response map of each human skeleton’s key points from images, in which we introduce some layers from the GoogLeNet. On the one hand, the improved model uses deeper network layers and more complex network structures to enhance the ability of low-level feature extraction. On the other hand, the improved model applies a fine-tuning strategy, which benefits the estimation accuracy. Moreover, we introduce the inception structure to greatly reduce the parameters of the model, which reduces the convergence time significantly. Extensive experiments on several datasets show that the improved model outperforms most mainstream models in accuracy and training time. The prediction efficiency of the improved model is improved by 1.023 times compared with the CPMs. At the same time, the training time of the improved model is reduced by 3.414 times. This paper presents a new idea for future research.

EfficientPose: Scalable single-person pose estimation

Applied Intelligence ◽

10.1007/s10489-020-01918-7 ◽

2020 ◽

Author(s):

Daniel Groos ◽

Heri Ramampiaro ◽

Espen AF Ihlen

Keyword(s):

Pose Estimation ◽

Network Architecture ◽

State Of The Art ◽

Computational Cost ◽

Real Life ◽

Low Complexity ◽

Human Pose Estimation ◽

Efficient Detection ◽

Single Person ◽

Human Pose

Single-person human pose estimation facilitates markerless movement analysis in sports, as well as in clinical applications. Still, state-of-the-art models for human pose estimation generally do not meet the requirements of real-life applications. The proliferation of deep learning techniques has resulted in the development of many advanced approaches. However, with progress in the field, more complex and inefficient models have also been introduced, which have caused tremendous increases in computational demands. To cope with these complexity and inefficiency challenges, we propose a novel convolutional neural network architecture, called EfficientPose, which exploits recently proposed EfficientNets in order to deliver efficient and scalable single-person pose estimation. EfficientPose is a family of models harnessing an effective multi-scale feature extractor and computationally efficient detection blocks using mobile inverted bottleneck convolutions, while at the same time ensuring that the precision of the pose configurations is still improved. Due to its low complexity and efficiency, EfficientPose enables real-world applications on edge devices by limiting the memory footprint and computational cost. The results from our experiments, using the challenging MPII single-person benchmark, show that the proposed EfficientPose models substantially outperform the widely used OpenPose model in terms of both accuracy and computational efficiency. In particular, our top-performing model achieves state-of-the-art accuracy on single-person MPII, with low-complexity ConvNets.

EfficientPose: Efficient human pose estimation with neural architecture search

Computational Visual Media ◽

10.1007/s41095-021-0214-z ◽

2021 ◽

Author(s):

Wenqiang Zhang ◽

Jiemin Fang ◽

Xinggang Wang ◽

Wenyu Liu

Keyword(s):

Pose Estimation ◽

Spatial Information ◽

Computational Cost ◽

Multimedia Applications ◽

Human Pose Estimation ◽

Estimation Task ◽

Neural Architecture ◽

Great Performance ◽

Human Pose ◽

Large Model

Human pose estimation from image and video is a key task in many multimedia applications. Previous methods achieve great performance but rarely take efficiency into consideration, which makes it difficult to implement the networks on lightweight devices. Nowadays, real-time multimedia applications call for more efficient models for better interaction. Moreover, most deep neural networks for pose estimation directly reuse networks designed for image classification as the backbone, which are not optimized for the pose estimation task. In this paper, we propose an efficient framework for human pose estimation with two parts, an efficient backbone and an efficient head. By implementing a differentiable neural architecture search method, we customize the backbone network design for pose estimation, and reduce computational cost with negligible accuracy degradation. For the efficient head, we slim the transposed convolutions and propose a spatial information correction module to promote the performance of the final prediction. In experiments, we evaluate our networks on the MPII and COCO datasets. Our smallest model requires only 0.65 GFLOPs with 88.1% PCKh@0.5 on MPII and our large model needs only 2 GFLOPs while its accuracy is competitive with the state-of-the-art large model, HRNet, which takes 9.5 GFLOPs.
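
MPII accuracy is commonly reported as PCKh@0.5: a predicted keypoint counts as correct when it lies within half the head segment length of the ground truth. The sketch below shows that rule for a single person; the function name `pckh`, the omission of visibility handling, and the coordinates are illustrative assumptions, and the official MPII evaluation has further details not covered here.

```python
import math


def pckh(pred, gt, head_size, alpha=0.5):
    """Fraction of keypoints within alpha * head_size of the ground truth.

    pred, gt: lists of (x, y) keypoints in the same order.
    head_size: head segment length for this person, in pixels.
    """
    threshold = alpha * head_size
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Hypothetical keypoints: errors of 1, 4, 6 and 0 pixels, head size 10.
gt = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
pred = [(1.0, 0.0), (10.0, 4.0), (20.0, 6.0), (30.0, 0.0)]
score = pckh(pred, gt, head_size=10.0)
print(score)  # 0.75: three of four keypoints fall within 5 pixels
```

Normalizing by head size makes the threshold scale with the person's apparent size, so the same score is comparable across near and distant subjects.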

Deep Full-Body HPE for Activity Recognition from RGB Frames Only

Informatics ◽

10.3390/informatics8010002 ◽

2021 ◽

Vol 8 (1) ◽

pp. 2

Author(s):

Sameh Neili Boualia ◽

Najoua Essoukri Ben Amara

Keyword(s):

Computer Vision ◽

Activity Recognition ◽

Pose Estimation ◽

Human Robot Interaction ◽

Svm Classifier ◽

Estimation Model ◽

Rgb Images ◽

Human Pose ◽

Human Joints ◽

Full Body

Human Pose Estimation (HPE) is defined as the problem of localizing human joints (also known as keypoints: elbows, wrists, etc.) in images or videos. It is also defined as the search for a specific pose in the space of all articulated joints. HPE has recently received significant attention from the scientific community. The main reason behind this trend is that pose estimation is considered a key step for many computer vision tasks. Although many approaches have reported promising results, this domain remains largely unsolved due to several challenges such as occlusions, small and barely visible joints, and variations in clothing and lighting. In the last few years, the power of deep neural networks has been demonstrated in a wide variety of computer vision problems and especially the HPE task. In this context, we present in this paper a Deep Full-Body-HPE (DFB-HPE) approach from RGB images only. Based on ConvNets, fifteen human joint positions are predicted and can be further exploited for a large range of applications such as gesture recognition, sports performance analysis, or human-robot interaction. To evaluate the proposed deep pose estimation model, we apply it to recognize the daily activities of a person in an unconstrained environment. To this end, the extracted features, represented by deep estimated poses, are fed to an SVM classifier. To validate the proposed architecture, our approach is tested on two publicly available benchmarks for pose estimation and activity recognition, namely the J-HMDB and CAD-60 datasets. The obtained results demonstrate the efficiency of the proposed method based on ConvNets and SVM and show how deep pose estimation can improve the recognition accuracy. In comparison with state-of-the-art methods, we achieve the best HPE performance, as well as the best activity recognition precision on the CAD-60 dataset.

DRPose3D: Depth Ranking in 3D Human Pose Estimation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/136 ◽

2018 ◽

Cited By ~ 11

Author(s):

Min Wang ◽

Xipeng Chen ◽

Wentao Liu ◽

Chen Qian ◽

Liang Lin ◽

...

Keyword(s):

Neural Network ◽

Pose Estimation ◽

Human Pose Estimation ◽

Geometric Feature ◽

Classification Problems ◽

Two Stage ◽

Human Pose ◽

Human Joints ◽

3D Information ◽

3D Human Pose Estimation

In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation. Unlike accurate 3D positions, depth rankings can be identified intuitively by humans and learned more easily by a deep neural network, since they reduce to classification problems. Moreover, depth ranking contains rich 3D information. It prevents the 2D-to-3D pose regression in two-stage methods from being ill-posed. In our method, firstly, we design a Pairwise Ranking Convolutional Neural Network (PRCNN) to extract depth rankings of human joints from images. Secondly, a coarse-to-fine 3D Pose Network (DPNet) is proposed to estimate 3D poses from both depth rankings and 2D human joint locations. Additionally, to improve the generality of our model, we introduce a statistical method to augment depth rankings. Our approach outperforms the state-of-the-art methods on the Human3.6M benchmark for all three testing protocols, indicating that depth ranking is an essential geometric feature that can be learned to improve 3D pose estimation.
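
To make the idea of a pairwise depth ranking concrete, the sketch below turns per-joint depths into a three-way label for every joint pair: closer, farther, or roughly equal. This only illustrates the kind of classification target such a ranking implies; it is not the PRCNN or the augmentation method from the paper. The function name, the label encoding (+1, -1, 0), the tolerance, and the depth values are all assumptions.

```python
def pairwise_depth_ranking(depths, tolerance=0.05):
    """Matrix r where r[i][j] is +1 if joint i is closer to the camera
    than joint j, -1 if it is farther, and 0 if the depths differ by
    no more than `tolerance`.
    """
    n = len(depths)
    ranking = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diff = depths[j] - depths[i]
            if abs(diff) <= tolerance:
                ranking[i][j] = 0
            elif diff > 0:
                ranking[i][j] = 1
            else:
                ranking[i][j] = -1
    return ranking


# Hypothetical depths (in metres) for three joints.
depths = [2.0, 2.5, 2.02]
ranking = pairwise_depth_ranking(depths)
for row in ranking:
    print(row)
```

Each entry is one of three classes, so predicting the matrix is a set of classification problems rather than a regression of metric depth, and the matrix is antisymmetric by construction.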

Self-Attention Network for Human Pose Estimation

Applied Sciences ◽

10.3390/app11041826 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1826

Author(s):

Hailun Xia ◽

Tianyang Zhang

Keyword(s):

Pose Estimation ◽

Human Pose Estimation ◽

Attention Network ◽

Learning Framework ◽

Benchmark Datasets ◽

Rgb Images ◽

Human Pose ◽

Human Joints ◽

Symmetric Relations ◽

2D And 3D

Estimating the positions of human joints from monocular single RGB images has been a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem still exists: the relationships and constraints, such as symmetric relations of human structures, are not well exploited in previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In SANs, attention-driven and long-range dependency modeling are adopted between joints to compensate for local content and mine details from all feature locations. To enable an SAN for both 2D and 3D pose estimation, we also design a compatible, effective, and general joint learning framework that mixes data of different dimensions. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII, and COCO datasets.
