Deep Full-Body HPE for Activity Recognition from RGB Frames Only

Informatics ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 2
Author(s):  
Sameh Neili Boualia ◽  
Najoua Essoukri Ben Amara

Human Pose Estimation (HPE) is defined as the problem of localizing human joints (also known as keypoints: elbows, wrists, etc.) in images or videos. It is also defined as the search for a specific pose in the space of all articulated joints. HPE has recently received significant attention from the scientific community. The main reason behind this trend is that pose estimation is considered a key step for many computer vision tasks. Although many approaches have reported promising results, this domain remains largely unsolved due to several challenges such as occlusions, small and barely visible joints, and variations in clothing and lighting. In the last few years, the power of deep neural networks has been demonstrated in a wide variety of computer vision problems, and especially in the HPE task. In this context, we present in this paper a Deep Full-Body HPE (DFB-HPE) approach from RGB images only. Based on ConvNets, fifteen human joint positions are predicted and can be further exploited for a wide range of applications such as gesture recognition, sports performance analysis, or human-robot interaction. To evaluate the proposed deep pose estimation model, we apply it to recognize the daily activities of a person in an unconstrained environment. The extracted features, represented by the deep estimated poses, are fed to an SVM classifier. To validate the proposed architecture, our approach is tested on two publicly available benchmarks for pose estimation and activity recognition, namely the J-HMDB and CAD-60 datasets. The obtained results demonstrate the efficiency of the proposed method based on ConvNets and SVM and show how deep pose estimation can improve recognition accuracy. Compared with state-of-the-art methods, we achieve the best HPE performance, as well as the best activity recognition precision on the CAD-60 dataset.
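A minimal sketch of the second stage described above: flattening the fifteen deep-estimated joint positions into per-frame feature vectors and feeding them to an SVM classifier. The feature layout, the normalization step and the random stand-in data are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: feeding deep-estimated poses to an SVM classifier.
# The 15-joint (x, y) layout and the normalization are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def poses_to_features(poses):
    """poses: (n_frames, 15, 2) array of estimated joint positions.
    Flatten each frame's 15 (x, y) joints into a 30-D feature vector."""
    poses = np.asarray(poses, dtype=np.float32)
    return poses.reshape(len(poses), -1)

# Hypothetical training data: per-frame poses and activity labels.
train_poses = np.random.rand(200, 15, 2)          # stand-in for ConvNet output
train_labels = np.random.randint(0, 4, size=200)  # e.g. 4 daily activities

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(poses_to_features(train_poses), train_labels)

test_poses = np.random.rand(10, 15, 2)
print(clf.predict(poses_to_features(test_poses)))
```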

2021 ◽  
Vol 11 (4) ◽  
pp. 1826
Author(s):  
Hailun Xia ◽  
Tianyang Zhang

Estimating the positions of human joints from single monocular RGB images has been a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem still exists: the relationships and constraints, such as the symmetric relations of human structures, are not well exploited in previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In SANs, attention-driven, long-range dependency modeling is adopted between joints to compensate for local content and mine details from all feature locations. To enable an SAN for both 2D and 3D pose estimation, we also design a compatible, effective and general joint learning framework that combines data of different dimensionalities. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII and COCO datasets.
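As a rough illustration of attention-driven, long-range dependency modeling between joints, here is a minimal self-attention sketch over per-joint feature vectors; the tensor shapes, the single-head formulation and the residual connection are assumptions, not the paper's exact SAN.

```python
# Hedged sketch of self-attention across joint features (long-range
# dependency modeling between keypoints). Shapes and the single-head
# formulation are illustrative assumptions.
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, num_joints, dim) per-joint feature vectors
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return x + attn @ v   # residual connection keeps local content

feats = torch.randn(2, 17, 64)              # e.g. 17 COCO joints, 64-D features
print(JointSelfAttention(64)(feats).shape)  # torch.Size([2, 17, 64])
```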


2021 ◽  
Vol 10 ◽  
pp. 117957272110223
Author(s):  
Thomas Hellsten ◽  
Jonny Karlsson ◽  
Muhammed Shamsuzzaman ◽  
Göran Pulkkis

Background: Several factors, including the aging population and the recent coronavirus pandemic, have increased the need for cost-effective, easy-to-use and reliable telerehabilitation services. Computer vision-based marker-less human pose estimation is a promising variant of telerehabilitation and is currently an intensive research topic. It has attracted significant interest for detailed motion analysis, as it does not require external fiducials to be arranged while capturing motion data from images. This is promising for rehabilitation applications, as such systems enable analysis and supervision of clients' exercises and reduce clients' need to visit physiotherapists in person. However, the development of a marker-less motion analysis system with precise accuracy for joint identification, joint angle measurement and advanced motion analysis remains an open challenge. Objectives: The main objective of this paper is to provide a critical overview of recent computer vision-based marker-less human pose estimation systems and their applicability to rehabilitation applications. An overview of some existing marker-less rehabilitation applications is also provided. Methods: This paper presents a critical review of recent computer vision-based marker-less human pose estimation systems, with a focus on the joint localization accuracy they provide, compared against physiotherapy requirements and ease of use. The accuracy, in terms of the capability to measure the knee angle, is analysed using simulation. Results: Current pose estimation systems use 2D, 3D, multiple-view and single-view techniques. The most promising technique from a physiotherapy point of view is 3D marker-less pose estimation based on a single view, as it can perform advanced motion analysis of the human body while requiring only a single camera and a computing device. Preliminary simulations reveal that some proposed systems already provide sufficient accuracy for 2D joint angle estimation. Conclusions: Even though test results of different applications of some proposed techniques are promising, more rigorous testing is required to validate their accuracy before they can be widely adopted in advanced rehabilitation applications.
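As a concrete illustration of the kind of joint-angle measurement the review analyses, here is a minimal sketch that computes a 2D knee angle from hip, knee and ankle keypoints; the pixel-coordinate convention and the example values are assumptions, not taken from the paper.

```python
# Hedged sketch: estimating a 2D knee angle from hip, knee and ankle
# keypoints. The coordinate convention (image pixels, y down) is assumed.
import numpy as np

def knee_angle_2d(hip, knee, ankle):
    """Return the angle (degrees) at the knee between thigh and shank."""
    hip, knee, ankle = map(np.asarray, (hip, knee, ankle))
    thigh = hip - knee
    shank = ankle - knee
    cos_a = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Example: a nearly straight leg should give an angle close to 180 degrees.
print(knee_angle_2d(hip=(100, 50), knee=(102, 150), ankle=(105, 250)))
```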


Sensors ◽  
2015 ◽  
Vol 15 (6) ◽  
pp. 12410-12427 ◽  
Author(s):  
Hanguen Kim ◽  
Sangwon Lee ◽  
Dongsung Lee ◽  
Soonmin Choi ◽  
Jinsun Ju ◽  
...  

Author(s):  
Zhihui Yang ◽  
Xiangyu Tang ◽  
Lijuan Zhang ◽  
Zhiling Yang

Human pose estimation can be used in action recognition, video surveillance and other fields, and has therefore received a great deal of attention. Since the flexibility of human joints and environmental factors greatly influence pose estimation accuracy, related research is confronted with many challenges. In this paper, we incorporate pyramid convolution and an attention mechanism into the residual block, and introduce a hybrid structure model that jointly applies the local and global information of the image for keypoint detection. In addition, our improved structure model adopts grouped convolution, and the attention module used is lightweight, which reduces the computational cost of the network. Experiments on the MS COCO human body keypoint detection dataset show that, compared with the Simple Baseline model, our model is similar in parameters and GFLOPs (giga floating-point operations), but achieves better detection accuracy in multi-person scenes.
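A rough sketch of a residual block in the spirit described above, combining grouped convolutions at two kernel sizes (a pyramid-style split) with a lightweight channel-attention module; the channel split, group counts and SE-style attention are illustrative assumptions rather than the paper's exact block.

```python
# Hedged sketch of a residual block mixing grouped convolutions at two
# kernel sizes with a lightweight channel-attention module. All hyper-
# parameters here are illustrative assumptions.
import torch
import torch.nn as nn

class PyramidAttentionBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        half = channels // 2
        # Two grouped convolutions with different receptive fields.
        self.conv3 = nn.Conv2d(half, half, 3, padding=1, groups=4, bias=False)
        self.conv5 = nn.Conv2d(channels - half, channels - half, 5, padding=2,
                               groups=4, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Lightweight channel attention (squeeze-and-excitation style).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        a, b = torch.split(x, [x.size(1) // 2, x.size(1) - x.size(1) // 2], dim=1)
        y = self.bn(torch.cat([self.conv3(a), self.conv5(b)], dim=1))
        return torch.relu(x + y * self.attn(y))   # residual + channel attention

feats = torch.randn(1, 64, 64, 48)
print(PyramidAttentionBlock(64)(feats).shape)    # torch.Size([1, 64, 64, 48])
```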


2018 ◽  
Author(s):  
Guanghan Ning

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] The task of human pose estimation in natural scenes is to determine the precise pixel locations of body keypoints. It is very important for many high-level computer vision tasks, including action and activity recognition, human-computer interaction, motion capture, and animation. We cover two different approaches to this task: the top-down approach and the bottom-up approach. In the top-down approach, we propose a human tracking method called ROLO that localizes each person. We then propose a state-of-the-art single-person human pose estimator that predicts the body keypoints of each individual. In the bottom-up approach, we propose an efficient multi-person pose estimator with which we participated in a PoseTrack challenge [11]. On top of these, we propose to employ adversarial training to further boost the performance of the single-person human pose estimator while generating synthetic images. We also propose a novel PoSeg network that jointly estimates multi-person human poses and semantically segments the portraits of these persons at the pixel level. Lastly, we extend some of the proposed methods on human pose estimation and portrait segmentation to the task of human parsing, a more fine-grained computer vision perception of humans.
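To make the top-down/bottom-up distinction concrete, here is a minimal sketch of the two pipelines; the detector, estimator and grouping functions are hypothetical placeholders, not the thesis' actual models (ROLO, PoSeg, etc.).

```python
# Hedged sketch contrasting the two pipelines described in the abstract.
# detect_people, estimate_single_pose, detect_all_keypoints and
# group_keypoints_into_people are hypothetical placeholders.

def top_down_pose(image, detect_people, estimate_single_pose):
    """Top-down: localize each person, then estimate a pose per crop."""
    poses = []
    for box in detect_people(image):           # e.g. a per-person tracker/detector
        x0, y0, x1, y1 = box
        crop = image[y0:y1, x0:x1]
        poses.append(estimate_single_pose(crop))
    return poses

def bottom_up_pose(image, detect_all_keypoints, group_keypoints_into_people):
    """Bottom-up: detect every keypoint in the image, then group by person."""
    keypoints = detect_all_keypoints(image)    # all joints, all persons at once
    return group_keypoints_into_people(keypoints)
```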


Author(s):  
José Gomes da Silva Neto ◽  
João Marcelo Xavier Natário Teixeira ◽  
Veronica Teichrieb

This work represents the first phase of a larger project whose goal is to analyze human behavior using RGB images only. In this phase, we developed a hardware/software prototype capable of estimating human pose using only RGB information. The equipment chosen was the NVIDIA Jetson Nano, known for offering better computational performance than Raspberry Pi and Arduino microcontroller alternatives. In the search for pose estimation algorithms suited to a resource-limited platform such as the Jetson Nano, we found relevant works such as HyperPose, TensorRT Pose Estimation, and the one used in this project, tf-pose-estimation. The results show low FPS performance on the Jetson Nano with the chosen algorithm, compared to related hardware such as the NVIDIA Jetson TX2 and NVIDIA Jetson Xavier.
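A minimal sketch of the kind of FPS measurement such a comparison relies on; run_pose_inference is a hypothetical stand-in for the inference call of the chosen library (e.g. tf-pose-estimation), not its real API.

```python
# Hedged sketch of measuring pose-estimation FPS on an embedded board.
# run_pose_inference is a hypothetical stand-in for the real inference call.
import time

def measure_fps(frames, run_pose_inference):
    """Average frames-per-second of run_pose_inference over a list of frames."""
    start = time.perf_counter()
    for frame in frames:
        run_pose_inference(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

# Example with a dummy inference function standing in for the real model.
dummy_frames = [object()] * 30
print(f"{measure_fps(dummy_frames, lambda f: time.sleep(0.01)):.1f} FPS")
```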

