Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

2020 ◽  
Vol 34 (07) ◽  
pp. 11354-11361
Author(s):  
Jia Li ◽  
Wen Su ◽  
Zengfu Wang

We rethink a well-known bottom-up approach for multi-person pose estimation and propose an improved one. The improved approach surpasses the baseline significantly thanks to (1) an intuitive yet more sensible representation, which we refer to as body parts, that encodes the connection information between keypoints; (2) an improved stacked hourglass network with attention mechanisms; (3) a novel focal L2 loss dedicated to mining “hard” keypoints and keypoint associations (body parts); and (4) a robust greedy keypoint assignment algorithm for grouping the detected keypoints into individual poses. Our approach not only works straightforwardly but also outperforms the baseline by about 15% in average precision and is comparable to the state of the art on the MS-COCO test-dev dataset. The code and pre-trained models are publicly available on our project page.
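The abstract does not spell out the focal L2 loss, so the sketch below is a hedged, minimal interpretation: heatmap pixels are re-weighted by their own error magnitude so that "hard" pixels dominate the loss. The `gamma` exponent and the pixelwise weighting scheme are assumptions, not the paper's exact formulation.

```python
import numpy as np

def focal_l2_loss(pred, gt, gamma=2.0):
    """Hedged sketch of a focal-style L2 heatmap loss: down-weight
    'easy' pixels (prediction close to ground truth) and emphasise
    'hard' ones. `gamma` controls how strongly easy pixels are
    suppressed; gamma=0 recovers a plain mean-squared error."""
    err = pred - gt
    # the weighting factor grows with the error, so hard pixels dominate
    weight = np.abs(err) ** gamma
    return float(np.mean(weight * err ** 2))
```

With `gamma=0` this reduces to ordinary MSE, which makes the focusing effect easy to ablate.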

2021 ◽  
Vol 11 (9) ◽  
pp. 4241
Author(s):  
Jiahua Wu ◽  
Hyo Jong Lee

In bottom-up multi-person pose estimation, grouping joint candidates into the correct person instances is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR), which integrates a person instance and its body joints based on joint offsets. PPR leverages the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To strengthen the relationships between body joints, we divide the human body into five parts and generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved L1 loss is designed to measure joint offset more accurately. Testing on the COCO keypoints and CrowdPose datasets shows that the performance of the proposed method is on par with existing state-of-the-art bottom-up methods in terms of accuracy and speed.
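PPR encodes a pose as a person center plus per-joint offsets. Below is a minimal sketch of how such a representation could be decoded and how detected joints might be grouped; the nearest-center rule is a deliberate simplification of the paper's offset-based grouping, and all names here are illustrative.

```python
import numpy as np

def decode_pose(center, offsets):
    """center: (2,) person-centre location; offsets: (J, 2) offsets
    from the centre to each joint. Returns absolute joint coordinates."""
    return center[None, :] + offsets

def group_joints(centers, joints):
    """Assign each detected joint to the nearest person centre --
    a simplified stand-in for offset-based grouping. Returns one
    centre index per joint."""
    assign = []
    for j in joints:
        d = np.linalg.norm(centers - j, axis=1)
        assign.append(int(np.argmin(d)))
    return assign
```

Decoding and grouping share the same geometry, which is why a single center-plus-offset head can serve both detection and association.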


Author(s):  
Kaixuan Chen ◽  
Lina Yao ◽  
Dalin Zhang ◽  
Bin Guo ◽  
Zhiwen Yu

Multi-modality is an important feature of sensor-based activity recognition. In this work, we consider two inherent characteristics of human activities: the spatially and temporally varying salience of features, and the relations between activities and the corresponding body-part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps intelligently select informative modalities and their active periods, while the multiple agents in the proposed model represent activities as collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share gained information and coordinate their selection policies to learn the optimal recognition model. Experimental results on four real-world datasets demonstrate that the proposed model outperforms state-of-the-art methods.


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints in images containing multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where the relations between keypoints are the key to grouping them. These relations naturally form a graph of keypoints, in which edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking edges from this graph, omitting edges that may carry useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take all relations (all edges of the graph) into account and construct dynamic graphs to tolerate large variations in human pose. The DGCM is quite lightweight, which allows it to be stacked in a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoints and MPII datasets, respectively.
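The DGCM itself is not specified in this abstract; the following is a hedged one-step sketch of a graph convolution over keypoint features in which the adjacency is built dynamically from pairwise feature similarity, so all K×K edges are considered. The dot-product similarity and row-softmax normalisation are assumptions, not the paper's exact design.

```python
import numpy as np

def dynamic_graph_conv(x, W, eps=1e-6):
    """x: (K, C) keypoint features; W: (C, C') learned projection.
    Build a dynamic adjacency from pairwise feature similarity
    (every edge of the complete keypoint graph is considered),
    row-normalise it with a softmax, then aggregate and transform --
    a minimal sketch of one graph-convolution step."""
    sim = x @ x.T                                   # pairwise similarity, (K, K)
    a = np.exp(sim - sim.max(axis=1, keepdims=True))
    a = a / (a.sum(axis=1, keepdims=True) + eps)    # row-softmax adjacency
    return a @ x @ W                                # aggregate, then project
```

Because the adjacency is recomputed from the features themselves, the effective graph changes per image, which is the sense in which such a module is "dynamic".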


2020 ◽  
Vol 34 (4) ◽  
pp. 571-584
Author(s):  
Rajarshi Biswas ◽  
Michael Barz ◽  
Daniel Sonntag

Abstract: Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning, yet captions written by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim to improve the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object-specific salient regions of the input image, embedding the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance while providing explanatory features. Further, we discuss how interactive model improvement can be realized by re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
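Re-ranking beam-search candidates might look like the toy sketch below, which mixes a candidate's language-model log-probability with a hypothetical visual-grounding score standing in for the explanatory features. The linear mixing and the `alpha` weight are assumptions, not the authors' scoring rule.

```python
def rerank_candidates(candidates, alignment, alpha=0.5):
    """candidates: list of (caption, log_prob) pairs from a beam search
    decoder. alignment: dict mapping caption -> visual-grounding score
    (a hypothetical stand-in for explanatory features, e.g. overlap
    with detected object labels). Returns captions sorted best-first
    by a mixed score."""
    scored = [(alpha * lp + (1 - alpha) * alignment.get(c, 0.0), c)
              for c, lp in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

In an interactive setting, a user correction could simply adjust the `alignment` scores before re-sorting, which is what makes re-ranking attractive for human-in-the-loop captioning.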


Author(s):  
Jielu Yan ◽  
MingLiang Zhou ◽  
Jinli Pan ◽  
Meng Yin ◽  
Bin Fang

3D human pose estimation refers to estimating the 3D articulated structure of a person from an image or a video. The technology has massive potential because it enables tracking people and analyzing motion in real time. Recently, much research has been conducted on optimizing human pose estimation, but few works have focused on reviewing 3D human pose estimation. In this paper, we offer a comprehensive survey of state-of-the-art methods for 3D human pose estimation, covering pose estimation solutions, implementations on images or videos containing different numbers of people, and advanced 3D human pose estimation techniques. Furthermore, the algorithms are subdivided into sub-categories and compared in light of their different methodologies. To the best of our knowledge, this is the first such comprehensive survey of recent progress in 3D human pose estimation, and we hope it will facilitate the completion, refinement, and application of 3D human pose estimation.


2021 ◽  
Vol 2129 (1) ◽  
pp. 012027
Author(s):  
Qing Zhang ◽  
Lei Ding ◽  
Kai Qing Zhou ◽  
Jian Feng Li

Abstract: Traditional human pose estimation models rely on a large amount of human body feature information. This paper proposes an optimization model that uses a genetic algorithm to solve the multi-person body-part assembly problem. Unlike other body-part assembly methods, the proposed method depends only on joint position information: it takes the sum of the connection distances between joints as the objective function and searches for the optimum to obtain the best pose assembly. Simulation results show that, compared with the traditional OpenPose model, the proposed model can obtain the same human skeleton using less position information.
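A toy rendering of the objective and search described above. The swap-mutation loop is a deliberately minimal stand-in for the full genetic algorithm (no crossover, elitist replacement only), and the two-edge `SKELETON` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
SKELETON = [(0, 1), (1, 2)]  # hypothetical connections, e.g. head-torso, torso-leg

def assembly_cost(assignment, detections, skeleton=SKELETON):
    """The objective from the abstract: the sum of connection distances
    between linked joints. Smaller cost = more plausible assembly."""
    pts = detections[assignment]
    return float(sum(np.linalg.norm(pts[a] - pts[b]) for a, b in skeleton))

def evolve(detections, pop_size=20, gens=50):
    """Toy GA stand-in: a population of candidate joint assignments
    (permutations of detections) evolves by swap mutation, always
    replacing the worst individual; returns the fittest assignment."""
    n = len(detections)
    pop = [rng.permutation(n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda a: assembly_cost(a, detections))
        child = pop[0].copy()
        i, j = rng.integers(0, n, 2)
        child[i], child[j] = child[j], child[i]   # swap mutation
        pop[-1] = child                            # replace the worst
    return min(pop, key=lambda a: assembly_cost(a, detections))
```

Because the objective needs only joint coordinates, no appearance features are required, which matches the abstract's claim of using less information than OpenPose-style part-affinity grouping.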


2021 ◽  
Vol 33 (3) ◽  
pp. 547-555
Author(s):  
Hitoshi Habe ◽  
Yoshiki Takeuchi ◽  
Kei Terayama ◽  
Masa-aki Sakagami ◽  
et al.

We propose a pose estimation method using a National Advisory Committee for Aeronautics (NACA) airfoil model for fish schools. This method allows one to understand the state in which fish are swimming based on their posture and dynamic variations. Moreover, their collective behavior can be understood based on their posture changes. Therefore, fish pose is a crucial indicator for collective behavior analysis. We use the NACA model to represent the fish posture; this enables more accurate tracking and movement prediction owing to the capability of the model in describing posture dynamics. To fit the model to video data, we first adopt the DeepLabCut toolbox to detect body parts (i.e., head, center, and tail fin) in an image sequence. Subsequently, we apply a particle filter to fit a set of parameters from the NACA model. The results from DeepLabCut, i.e., three points on a fish body, are used to adjust the components of the state vector. This enables more reliable estimation results to be obtained when the speed and direction of the fish change abruptly. Experimental results using both simulation data and real video data demonstrate that the proposed method provides good results, including when rapid changes occur in the swimming direction.
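One predict-weight-resample cycle of a generic particle filter is sketched below, with the DeepLabCut body-part detection playing the role of the observation. The random-walk motion model and Gaussian likelihood are assumptions for illustration, not the paper's NACA state dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, observation, motion_std=0.1, obs_std=0.5):
    """One predict-weight-resample cycle. particles: (N, D) hypothetical
    state vectors (in the paper these would be NACA model parameters);
    observation: (D,) measurement, here standing in for detected body
    parts. Returns resampled particles and uniform weights."""
    # predict: random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # weight: Gaussian likelihood of the observation under each particle
    d = np.linalg.norm(particles - observation, axis=1)
    weights = np.exp(-0.5 * (d / obs_std) ** 2)
    weights /= weights.sum()
    # resample proportionally to weight (survival of the fittest states)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Re-weighting by the detected body parts is what lets the filter recover when the fish's speed or heading changes abruptly: particles inconsistent with the new detection are simply resampled away.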


Author(s):  
Yinzhong Qian ◽  
Wenbin Chen ◽  
I-fan Shen

This paper addresses the problem of action recognition from body pose. Detecting body pose in static images is highly challenging because of pose variability. Our method is based on action-specific hierarchical poselets: we use hierarchical body parts, each represented by a set of poselets, to capture the pose variability of that part. The pose signature of a body part is a vector of the detection responses of all poselets for the part. To suppress detection error and ambiguity, we explore using a part-based model (PBM) as detection context and propose a constrained optimization algorithm that detects all poselets of each part in the context of the PBM, recovering neglected pose cues through global optimization. We use a PBM with a hierarchical part structure, where body parts range in granularity from the whole body down to limb parts; from this structure we derive models of different depths to study the saliency of different body parts for action recognition. The pose signature of an action image is composed of the pose signatures of all body parts in the PBM, which provides rich discriminative information for our task. We evaluate our algorithm on two datasets. Compared with counterpart methods, the pose signature yields a clear performance improvement on the static image dataset, and when the model trained on the static image dataset is used to label detected persons in the video dataset, it achieves state-of-the-art performance.
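As described, the image-level pose signature reduces to concatenating the per-part poselet response vectors; a minimal sketch (the response values here are placeholders):

```python
import numpy as np

def pose_signature(part_responses):
    """part_responses: list of per-part arrays, each holding the
    detection responses of that part's poselets. The image-level
    pose signature is simply their concatenation, giving one fixed
    -length descriptor per detected person."""
    return np.concatenate([np.asarray(r, dtype=float) for r in part_responses])
```

The concatenated vector can then be fed to any standard classifier, which is what makes the representation easy to transfer from static images to video frames.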


Author(s):  
J. Tang ◽  
K. W. Wang

Abstract: The underlying principle of vibration confinement is to alter the structural modes so that the corresponding modal components have much smaller amplitudes in the areas of concern than in the remaining part of the structure. In this research, the state of the art in vibration confinement is advanced in two correlated ways. First, a new eigenstructure assignment algorithm is developed to more directly suppress vibration in regions of interest; it features the optimal selection of achievable eigenvectors that minimizes the eigenvector components at the areas of concern using the Rayleigh Principle. Second, the active control input is applied through an active-passive hybrid piezoelectric network. With the introduction of circuitry elements, which are much easier to implement than changing or adding mechanical components, the state matrices can be reformed and the design space for eigenstructure assignment greatly enlarged. The merit of the proposed system and scheme is demonstrated and analyzed using a numerical example.


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoints-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures to recover the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationships between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation consisting of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method accurately estimates the 6D aircraft pose in various complex weather scenes while achieving performance comparable to state-of-the-art pose estimation methods.
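Keypoints-based 6D pose estimation is typically scored by reprojection error, the quantity a PnP-style solver minimises. Below is a minimal pinhole-camera sketch of that quantity; the intrinsics `K` and the (R, t) pose parameterisation are generic, not PEKS-specific.

```python
import numpy as np

def project(points3d, R, t, K):
    """Pinhole projection of 3-D model keypoints under candidate pose
    (R, t): transform into the camera frame, apply intrinsics K,
    divide by depth to land in pixel coordinates."""
    cam = points3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def reprojection_error(points3d, points2d, R, t, K):
    """Mean pixel distance between detected 2-D keypoints and the 3-D
    model keypoints reprojected under the candidate 6-D pose -- the
    objective a PnP solver drives toward zero."""
    diff = project(points3d, R, t, K) - points2d
    return float(np.mean(np.linalg.norm(diff, axis=1)))
```

Adding structural correspondences (as PEKS does with its PnPS formulation) amounts to extending this objective with terms beyond point-to-point residuals, which is what buys robustness when individual keypoints are unreliable in bad weather.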

