Deep Full-Body HPE for Activity Recognition from RGB Frames Only

Informatics ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 2
Author(s):  
Sameh Neili Boualia ◽  
Najoua Essoukri Ben Amara

Human Pose Estimation (HPE) is defined as the problem of localizing human joints (also known as keypoints: elbows, wrists, etc.) in images or videos. It is also defined as the search for a specific pose in the space of all articulated joints. HPE has recently received significant attention from the scientific community. The main reason behind this trend is that pose estimation is considered a key step for many computer vision tasks. Although many approaches have reported promising results, this domain remains largely unsolved due to several challenges such as occlusions, small and barely visible joints, and variations in clothing and lighting. In the last few years, the power of deep neural networks has been demonstrated in a wide variety of computer vision problems, and especially in the HPE task. In this context, we present in this paper a Deep Full-Body HPE (DFB-HPE) approach from RGB images only. Based on ConvNets, fifteen human joint positions are predicted and can be further exploited for a wide range of applications such as gesture recognition, sports performance analysis, or human-robot interaction. To evaluate the proposed deep pose estimation model, we apply it to recognize the daily activities of a person in an unconstrained environment. The extracted features, represented by the deep estimated poses, are fed to an SVM classifier. To validate the proposed architecture, our approach is tested on two publicly available benchmarks for pose estimation and activity recognition, namely the J-HMDB and CAD-60 datasets. The obtained results demonstrate the efficiency of the proposed method based on ConvNets and SVM and show how deep pose estimation can improve recognition accuracy. Compared with state-of-the-art methods, we achieve the best HPE performance, as well as the best activity recognition precision on the CAD-60 dataset.
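A minimal sketch of the second stage described above: flattening the fifteen deep-estimated joint positions into per-frame feature vectors and feeding them to an SVM classifier. The feature layout, the normalization step and the random stand-in data are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: feeding deep-estimated poses to an SVM classifier.
# The 15-joint (x, y) layout and the normalization are assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def poses_to_features(poses):
    """poses: (n_frames, 15, 2) array of estimated joint positions.
    Flatten each frame's 15 (x, y) joints into a 30-D feature vector."""
    poses = np.asarray(poses, dtype=np.float32)
    return poses.reshape(len(poses), -1)

# Hypothetical training data: per-frame poses and activity labels.
train_poses = np.random.rand(200, 15, 2)          # stand-in for ConvNet output
train_labels = np.random.randint(0, 4, size=200)  # e.g. 4 daily activities

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(poses_to_features(train_poses), train_labels)

test_poses = np.random.rand(10, 15, 2)
print(clf.predict(poses_to_features(test_poses)))
```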

2021 ◽  
Vol 11 (4) ◽  
pp. 1826
Author(s):  
Hailun Xia ◽  
Tianyang Zhang

Estimating the positions of human joints from single monocular RGB images has been a challenging task in recent years. Despite great progress in human pose estimation with convolutional neural networks (CNNs), a central problem still exists: the relationships and constraints, such as the symmetric relations of human structures, are not well exploited in previous CNN-based methods. Considering the effectiveness of combining local and nonlocal consistencies, we propose an end-to-end self-attention network (SAN) to alleviate this issue. In SANs, attention-driven, long-range dependency modeling is adopted between joints to compensate for local content and mine details from all feature locations. To enable an SAN for both 2D and 3D pose estimation, we also design a compatible, effective and general joint learning framework that combines data of different dimensionalities. We evaluate the proposed network on challenging benchmark datasets. The experimental results show that our method achieves competitive results on the Human3.6M, MPII and COCO datasets.
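As a rough illustration of attention-driven, long-range dependency modeling between joints, here is a minimal self-attention sketch over per-joint feature vectors; the tensor shapes, the single-head formulation and the residual connection are assumptions, not the paper's exact SAN.

```python
# Hedged sketch of self-attention across joint features (long-range
# dependency modeling between keypoints). Shapes and the single-head
# formulation are illustrative assumptions.
import torch
import torch.nn as nn

class JointSelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, num_joints, dim) per-joint feature vectors
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return x + attn @ v   # residual connection keeps local content

feats = torch.randn(2, 17, 64)              # e.g. 17 COCO joints, 64-D features
print(JointSelfAttention(64)(feats).shape)  # torch.Size([2, 17, 64])
```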


2021 ◽  
Vol 10 ◽  
pp. 117957272110223
Author(s):  
Thomas Hellsten ◽  
Jonny Karlsson ◽  
Muhammed Shamsuzzaman ◽  
Göran Pulkkis

Background: Several factors, including the aging population and the recent coronavirus pandemic, have increased the need for cost-effective, easy-to-use and reliable telerehabilitation services. Computer vision-based marker-less human pose estimation is a promising variant of telerehabilitation and is currently an intensive research topic. It has attracted significant interest for detailed motion analysis, as it does not require external fiducials to be arranged while capturing motion data from images. This is promising for rehabilitation applications, as such systems enable analysis and supervision of clients' exercises and reduce clients' need to visit physiotherapists in person. However, the development of a marker-less motion analysis system with precise accuracy for joint identification, joint angle measurement and advanced motion analysis remains an open challenge. Objectives: The main objective of this paper is to provide a critical overview of recent computer vision-based marker-less human pose estimation systems and their applicability to rehabilitation applications. An overview of some existing marker-less rehabilitation applications is also provided. Methods: This paper presents a critical review of recent computer vision-based marker-less human pose estimation systems, with a focus on the joint localization accuracy they provide, compared against physiotherapy requirements and ease of use. The accuracy, in terms of the capability to measure the knee angle, is analysed using simulation. Results: Current pose estimation systems use 2D, 3D, multiple-view and single-view techniques. The most promising technique from a physiotherapy point of view is 3D marker-less pose estimation based on a single view, as it can perform advanced motion analysis of the human body while requiring only a single camera and a computing device. Preliminary simulations reveal that some proposed systems already provide sufficient accuracy for 2D joint angle estimation. Conclusions: Even though test results of different applications of some proposed techniques are promising, more rigorous testing is required to validate their accuracy before they can be widely adopted in advanced rehabilitation applications.
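As a concrete illustration of the kind of joint-angle measurement the review analyses, here is a minimal sketch that computes a 2D knee angle from hip, knee and ankle keypoints; the pixel-coordinate convention and the example values are assumptions, not taken from the paper.

```python
# Hedged sketch: estimating a 2D knee angle from hip, knee and ankle
# keypoints. The coordinate convention (image pixels, y down) is assumed.
import numpy as np

def knee_angle_2d(hip, knee, ankle):
    """Return the angle (degrees) at the knee between thigh and shank."""
    hip, knee, ankle = map(np.asarray, (hip, knee, ankle))
    thigh = hip - knee
    shank = ankle - knee
    cos_a = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# Example: a nearly straight leg should give an angle close to 180 degrees.
print(knee_angle_2d(hip=(100, 50), knee=(102, 150), ankle=(105, 250)))
```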


Sensors ◽  
2015 ◽  
Vol 15 (6) ◽  
pp. 12410-12427 ◽  
Author(s):  
Hanguen Kim ◽  
Sangwon Lee ◽  
Dongsung Lee ◽  
Soonmin Choi ◽  
Jinsun Ju ◽  
...  

Author(s):  
Zhihui Yang ◽  
Xiangyu Tang ◽  
Lijuan Zhang ◽  
Zhiling Yang

Human pose estimation can be used in action recognition, video surveillance and other fields, and has therefore received a great deal of attention. Since the flexibility of human joints and environmental factors greatly influence pose estimation accuracy, related research is confronted with many challenges. In this paper, we incorporate pyramid convolution and an attention mechanism into the residual block, and introduce a hybrid structure model that jointly applies the local and global information of the image for keypoint detection. In addition, our improved structure model adopts grouped convolution, and the attention module used is lightweight, which reduces the computational cost of the network. Experiments on the MS COCO human body keypoint detection dataset show that, compared with the Simple Baseline model, our model is similar in parameters and GFLOPs (giga floating-point operations), but achieves better detection accuracy in multi-person scenes.
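A rough sketch of a residual block in the spirit described above, combining grouped convolutions at two kernel sizes (a pyramid-style split) with a lightweight channel-attention module; the channel split, group counts and SE-style attention are illustrative assumptions rather than the paper's exact block.

```python
# Hedged sketch of a residual block mixing grouped convolutions at two
# kernel sizes with a lightweight channel-attention module. All hyper-
# parameters here are illustrative assumptions.
import torch
import torch.nn as nn

class PyramidAttentionBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        half = channels // 2
        # Two grouped convolutions with different receptive fields.
        self.conv3 = nn.Conv2d(half, half, 3, padding=1, groups=4, bias=False)
        self.conv5 = nn.Conv2d(channels - half, channels - half, 5, padding=2,
                               groups=4, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        # Lightweight channel attention (squeeze-and-excitation style).
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        a, b = torch.split(x, [x.size(1) // 2, x.size(1) - x.size(1) // 2], dim=1)
        y = self.bn(torch.cat([self.conv3(a), self.conv5(b)], dim=1))
        return torch.relu(x + y * self.attn(y))   # residual + channel attention

feats = torch.randn(1, 64, 64, 48)
print(PyramidAttentionBlock(64)(feats).shape)    # torch.Size([1, 64, 64, 48])
```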


2018 ◽  
Author(s):  
Guanghan Ning

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] The task of human pose estimation in natural scenes is to determine the precise pixel locations of body keypoints. It is very important for many high-level computer vision tasks, including action and activity recognition, human-computer interaction, motion capture, and animation. We cover two different approaches to this task: the top-down approach and the bottom-up approach. In the top-down approach, we propose a human tracking method called ROLO that localizes each person. We then propose a state-of-the-art single-person human pose estimator that predicts the body keypoints of each individual. In the bottom-up approach, we propose an efficient multi-person pose estimator with which we participated in a PoseTrack challenge [11]. On top of these, we propose to employ adversarial training to further boost the performance of the single-person human pose estimator while generating synthetic images. We also propose a novel PoSeg network that jointly estimates multi-person human poses and semantically segments the portraits of these persons at the pixel level. Lastly, we extend some of the proposed methods on human pose estimation and portrait segmentation to the task of human parsing, a more fine-grained computer vision perception of humans.
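To make the top-down/bottom-up distinction concrete, here is a minimal sketch of the two pipelines; the detector, estimator and grouping functions are hypothetical placeholders, not the thesis' actual models (ROLO, PoSeg, etc.).

```python
# Hedged sketch contrasting the two pipelines described in the abstract.
# detect_people, estimate_single_pose, detect_all_keypoints and
# group_keypoints_into_people are hypothetical placeholders.

def top_down_pose(image, detect_people, estimate_single_pose):
    """Top-down: localize each person, then estimate a pose per crop."""
    poses = []
    for box in detect_people(image):           # e.g. a per-person tracker/detector
        x0, y0, x1, y1 = box
        crop = image[y0:y1, x0:x1]
        poses.append(estimate_single_pose(crop))
    return poses

def bottom_up_pose(image, detect_all_keypoints, group_keypoints_into_people):
    """Bottom-up: detect every keypoint in the image, then group by person."""
    keypoints = detect_all_keypoints(image)    # all joints, all persons at once
    return group_keypoints_into_people(keypoints)
```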


Author(s):  
José Gomes da Silva Neto ◽  
João Marcelo Xavier Natário Teixeira ◽  
Veronica Teichrieb

This work represents the first phase of a larger project whose goal is to analyze human behavior using RGB images only. In this phase, we developed a hardware/software prototype capable of estimating human pose using only RGB information. The equipment chosen was the NVIDIA Jetson Nano, known for offering better computational performance than Raspberry Pi and Arduino microcontroller alternatives. In the search for pose estimation algorithms suited to a resource-limited platform such as the Jetson Nano, we found relevant works such as HyperPose, TensorRT Pose Estimation, and the one used in this project, tf-pose-estimation. The results show low FPS performance on the Jetson Nano with the chosen algorithm, compared to related hardware such as the NVIDIA Jetson TX2 and NVIDIA Jetson Xavier.
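A minimal sketch of the kind of FPS measurement such a comparison relies on; run_pose_inference is a hypothetical stand-in for the inference call of the chosen library (e.g. tf-pose-estimation), not its real API.

```python
# Hedged sketch of measuring pose-estimation FPS on an embedded board.
# run_pose_inference is a hypothetical stand-in for the real inference call.
import time

def measure_fps(frames, run_pose_inference):
    """Average frames-per-second of run_pose_inference over a list of frames."""
    start = time.perf_counter()
    for frame in frames:
        run_pose_inference(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

# Example with a dummy inference function standing in for the real model.
dummy_frames = [object()] * 30
print(f"{measure_fps(dummy_frames, lambda f: time.sleep(0.01)):.1f} FPS")
```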

