Simple Pose: Rethinking and Improving a Bottom-up Approach for Multi-Person Pose Estimation

2020 ◽  
Vol 34 (07) ◽  
pp. 11354-11361
Author(s):  
Jia Li ◽  
Wen Su ◽  
Zengfu Wang

We rethink a well-known bottom-up approach for multi-person pose estimation and propose an improved one. The improved approach surpasses the baseline significantly thanks to (1) an intuitive yet more sensible representation, which we refer to as body parts, that encodes the connection information between keypoints; (2) an improved stacked hourglass network with attention mechanisms; (3) a novel focal L2 loss dedicated to mining “hard” keypoints and keypoint associations (body parts); and (4) a robust greedy keypoint assignment algorithm for grouping the detected keypoints into individual poses. Our approach not only works straightforwardly but also outperforms the baseline by about 15% in average precision and is comparable to the state of the art on the MS-COCO test-dev dataset. The code and pre-trained models are publicly available on our project page.
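The abstract does not spell out the focal L2 loss, so the sketch below is a hedged, minimal interpretation: heatmap pixels are re-weighted by their own error magnitude so that "hard" pixels dominate the loss. The `gamma` exponent and the pixelwise weighting scheme are assumptions, not the paper's exact formulation.

```python
import numpy as np

def focal_l2_loss(pred, gt, gamma=2.0):
    """Hedged sketch of a focal-style L2 heatmap loss: down-weight
    'easy' pixels (prediction close to ground truth) and emphasise
    'hard' ones. `gamma` controls how strongly easy pixels are
    suppressed; gamma=0 recovers a plain mean-squared error."""
    err = pred - gt
    # the weighting factor grows with the error, so hard pixels dominate
    weight = np.abs(err) ** gamma
    return float(np.mean(weight * err ** 2))
```

With `gamma=0` this reduces to ordinary MSE, which makes the focusing effect easy to ablate.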

2021 ◽  
Vol 11 (9) ◽  
pp. 4241
Author(s):  
Jiahua Wu ◽  
Hyo Jong Lee

In bottom-up multi-person pose estimation, grouping joint candidates into the correct person instances is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR), which integrates a person instance and its body joints based on joint offsets. PPR leverages the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To strengthen the relationships between body joints, we divide the human body into five parts and generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved L1 loss is designed to measure joint offset more accurately. Testing on the COCO keypoints and CrowdPose datasets shows that the performance of the proposed method is on par with existing state-of-the-art bottom-up methods in terms of accuracy and speed.
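PPR encodes a pose as a person center plus per-joint offsets. Below is a minimal sketch of how such a representation could be decoded and how detected joints might be grouped; the nearest-center rule is a deliberate simplification of the paper's offset-based grouping, and all names here are illustrative.

```python
import numpy as np

def decode_pose(center, offsets):
    """center: (2,) person-centre location; offsets: (J, 2) offsets
    from the centre to each joint. Returns absolute joint coordinates."""
    return center[None, :] + offsets

def group_joints(centers, joints):
    """Assign each detected joint to the nearest person centre --
    a simplified stand-in for offset-based grouping. Returns one
    centre index per joint."""
    assign = []
    for j in joints:
        d = np.linalg.norm(centers - j, axis=1)
        assign.append(int(np.argmin(d)))
    return assign
```

Decoding and grouping share the same geometry, which is why a single center-plus-offset head can serve both detection and association.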


Author(s):  
Kaixuan Chen ◽  
Lina Yao ◽  
Dalin Zhang ◽  
Bin Guo ◽  
Zhiwen Yu

Multi-modality is an important feature of sensor-based activity recognition. In this work, we consider two inherent characteristics of human activities: the spatially and temporally varying salience of features, and the relations between activities and the corresponding body-part motions. Based on these, we propose a multi-agent spatial-temporal attention model. The spatial-temporal attention mechanism helps intelligently select informative modalities and their active periods, while the multiple agents in the proposed model represent activities as collective motions across body parts by independently selecting modalities associated with single motions. With a joint recognition goal, the agents share gained information and coordinate their selection policies to learn the optimal recognition model. Experimental results on four real-world datasets demonstrate that the proposed model outperforms state-of-the-art methods.


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints in images containing multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where the relations between keypoints are the key to grouping them. These relations naturally form a graph of keypoints, in which edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking edges from this graph, omitting edges that may carry useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take all relations (all edges of the graph) into account and construct dynamic graphs to tolerate large variations in human pose. The DGCM is quite lightweight, which allows it to be stacked in a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoints and MPII datasets, respectively.
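The DGCM itself is not specified in this abstract; the following is a hedged one-step sketch of a graph convolution over keypoint features in which the adjacency is built dynamically from pairwise feature similarity, so all K×K edges are considered. The dot-product similarity and row-softmax normalisation are assumptions, not the paper's exact design.

```python
import numpy as np

def dynamic_graph_conv(x, W, eps=1e-6):
    """x: (K, C) keypoint features; W: (C, C') learned projection.
    Build a dynamic adjacency from pairwise feature similarity
    (every edge of the complete keypoint graph is considered),
    row-normalise it with a softmax, then aggregate and transform --
    a minimal sketch of one graph-convolution step."""
    sim = x @ x.T                                   # pairwise similarity, (K, K)
    a = np.exp(sim - sim.max(axis=1, keepdims=True))
    a = a / (a.sum(axis=1, keepdims=True) + eps)    # row-softmax adjacency
    return a @ x @ W                                # aggregate, then project
```

Because the adjacency is recomputed from the features themselves, the effective graph changes per image, which is the sense in which such a module is "dynamic".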


2020 ◽  
Vol 34 (4) ◽  
pp. 571-584
Author(s):  
Rajarshi Biswas ◽  
Michael Barz ◽  
Daniel Sonntag

Abstract: Image captioning is a challenging multimodal task. Significant improvements have been obtained with deep learning, yet captions written by humans are still considered better, which makes it an interesting application for interactive machine learning and explainable artificial intelligence methods. In this work, we aim to improve the performance and explainability of the state-of-the-art method Show, Attend and Tell by augmenting its attention mechanism with additional bottom-up features. We compute visual attention on the joint embedding space formed by the union of high-level features and the low-level features obtained from the object-specific salient regions of the input image, embedding the content of bounding boxes from a pre-trained Mask R-CNN model. This delivers state-of-the-art performance while providing explanatory features. Further, we discuss how interactive model improvement can be realized by re-ranking caption candidates using beam search decoders and explanatory features. We show that interactive re-ranking of beam search candidates has the potential to outperform the state of the art in image captioning.
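Re-ranking beam-search candidates might look like the toy sketch below, which mixes a candidate's language-model log-probability with a hypothetical visual-grounding score standing in for the explanatory features. The linear mixing and the `alpha` weight are assumptions, not the authors' scoring rule.

```python
def rerank_candidates(candidates, alignment, alpha=0.5):
    """candidates: list of (caption, log_prob) pairs from a beam search
    decoder. alignment: dict mapping caption -> visual-grounding score
    (a hypothetical stand-in for explanatory features, e.g. overlap
    with detected object labels). Returns captions sorted best-first
    by a mixed score."""
    scored = [(alpha * lp + (1 - alpha) * alignment.get(c, 0.0), c)
              for c, lp in candidates]
    return [c for _, c in sorted(scored, reverse=True)]
```

In an interactive setting, a user correction could simply adjust the `alignment` scores before re-sorting, which is what makes re-ranking attractive for human-in-the-loop captioning.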


Author(s):  
Jielu Yan ◽  
MingLiang Zhou ◽  
Jinli Pan ◽  
Meng Yin ◽  
Bin Fang

3D human pose estimation refers to estimating the 3D articulated structure of a person from an image or a video. The technology has massive potential because it enables tracking people and analyzing motion in real time. Recently, much research has been conducted on optimizing human pose estimation, but few works have focused on reviewing 3D human pose estimation. In this paper, we offer a comprehensive survey of state-of-the-art methods for 3D human pose estimation, covering pose estimation solutions, implementations on images or videos containing different numbers of people, and advanced 3D human pose estimation techniques. Furthermore, the algorithms are subdivided into sub-categories and compared in light of their different methodologies. To the best of our knowledge, this is the first such comprehensive survey of recent progress in 3D human pose estimation, and we hope it will facilitate the completion, refinement, and application of 3D human pose estimation.


2021 ◽  
Vol 2129 (1) ◽  
pp. 012027
Author(s):  
Qing Zhang ◽  
Lei Ding ◽  
Kai Qing Zhou ◽  
Jian Feng Li

Abstract: Traditional human pose estimation models rely on a large amount of human body feature information. This paper proposes an optimization model that uses a genetic algorithm to solve the multi-person body-part assembly problem. Unlike other body-part assembly methods, the proposed method depends only on joint position information: it takes the sum of the connection distances between joints as the objective function and searches for the optimum to obtain the best pose assembly. Simulation results show that, compared with the traditional OpenPose model, the proposed model can obtain the same human skeleton using less position information.
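A toy rendering of the objective and search described above. The swap-mutation loop is a deliberately minimal stand-in for the full genetic algorithm (no crossover, elitist replacement only), and the two-edge `SKELETON` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
SKELETON = [(0, 1), (1, 2)]  # hypothetical connections, e.g. head-torso, torso-leg

def assembly_cost(assignment, detections, skeleton=SKELETON):
    """The objective from the abstract: the sum of connection distances
    between linked joints. Smaller cost = more plausible assembly."""
    pts = detections[assignment]
    return float(sum(np.linalg.norm(pts[a] - pts[b]) for a, b in skeleton))

def evolve(detections, pop_size=20, gens=50):
    """Toy GA stand-in: a population of candidate joint assignments
    (permutations of detections) evolves by swap mutation, always
    replacing the worst individual; returns the fittest assignment."""
    n = len(detections)
    pop = [rng.permutation(n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda a: assembly_cost(a, detections))
        child = pop[0].copy()
        i, j = rng.integers(0, n, 2)
        child[i], child[j] = child[j], child[i]   # swap mutation
        pop[-1] = child                            # replace the worst
    return min(pop, key=lambda a: assembly_cost(a, detections))
```

Because the objective needs only joint coordinates, no appearance features are required, which matches the abstract's claim of using less information than OpenPose-style part-affinity grouping.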


2021 ◽  
Vol 33 (3) ◽  
pp. 547-555
Author(s):  
Hitoshi Habe ◽  
Yoshiki Takeuchi ◽  
Kei Terayama ◽  
Masa-aki Sakagami ◽  
et al.

We propose a pose estimation method using a National Advisory Committee for Aeronautics (NACA) airfoil model for fish schools. This method allows one to understand the state in which fish are swimming based on their posture and dynamic variations. Moreover, their collective behavior can be understood based on their posture changes. Therefore, fish pose is a crucial indicator for collective behavior analysis. We use the NACA model to represent the fish posture; this enables more accurate tracking and movement prediction owing to the capability of the model in describing posture dynamics. To fit the model to video data, we first adopt the DeepLabCut toolbox to detect body parts (i.e., head, center, and tail fin) in an image sequence. Subsequently, we apply a particle filter to fit a set of parameters from the NACA model. The results from DeepLabCut, i.e., three points on a fish body, are used to adjust the components of the state vector. This enables more reliable estimation results to be obtained when the speed and direction of the fish change abruptly. Experimental results using both simulation data and real video data demonstrate that the proposed method provides good results, including when rapid changes occur in the swimming direction.
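One predict-weight-resample cycle of a generic particle filter is sketched below, with the DeepLabCut body-part detection playing the role of the observation. The random-walk motion model and Gaussian likelihood are assumptions for illustration, not the paper's NACA state dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, observation, motion_std=0.1, obs_std=0.5):
    """One predict-weight-resample cycle. particles: (N, D) hypothetical
    state vectors (in the paper these would be NACA model parameters);
    observation: (D,) measurement, here standing in for detected body
    parts. Returns resampled particles and uniform weights."""
    # predict: random-walk motion model
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # weight: Gaussian likelihood of the observation under each particle
    d = np.linalg.norm(particles - observation, axis=1)
    weights = np.exp(-0.5 * (d / obs_std) ** 2)
    weights /= weights.sum()
    # resample proportionally to weight (survival of the fittest states)
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

Re-weighting by the detected body parts is what lets the filter recover when the fish's speed or heading changes abruptly: particles inconsistent with the new detection are simply resampled away.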


Author(s):  
Yinzhong Qian ◽  
Wenbin Chen ◽  
I-fan Shen

This paper addresses the problem of action recognition from body pose. Detecting body pose in static images is highly challenging because of pose variability. Our method is based on action-specific hierarchical poselets: we use hierarchical body parts, each represented by a set of poselets, to capture the pose variability of that part. The pose signature of a body part is a vector of the detection responses of all poselets for the part. To suppress detection error and ambiguity, we explore using a part-based model (PBM) as detection context and propose a constrained optimization algorithm that detects all poselets of each part in the context of the PBM, recovering neglected pose cues through global optimization. We use a PBM with a hierarchical part structure, where body parts range in granularity from the whole body down to limb parts; from this structure we derive models of different depths to study the saliency of different body parts for action recognition. The pose signature of an action image is composed of the pose signatures of all body parts in the PBM, which provides rich discriminative information for our task. We evaluate our algorithm on two datasets. Compared with counterpart methods, the pose signature yields a clear performance improvement on the static image dataset, and when the model trained on the static image dataset is used to label detected persons in the video dataset, it achieves state-of-the-art performance.
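As described, the image-level pose signature reduces to concatenating the per-part poselet response vectors; a minimal sketch (the response values here are placeholders):

```python
import numpy as np

def pose_signature(part_responses):
    """part_responses: list of per-part arrays, each holding the
    detection responses of that part's poselets. The image-level
    pose signature is simply their concatenation, giving one fixed
    -length descriptor per detected person."""
    return np.concatenate([np.asarray(r, dtype=float) for r in part_responses])
```

The concatenated vector can then be fed to any standard classifier, which is what makes the representation easy to transfer from static images to video frames.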


Author(s):  
J. Tang ◽  
K. W. Wang

Abstract: The underlying principle of vibration confinement is to alter the structural modes so that the corresponding modal components have much smaller amplitudes in the areas of concern than in the remaining part of the structure. In this research, the state of the art in vibration confinement is advanced in two correlated ways. First, a new eigenstructure assignment algorithm is developed to more directly suppress vibration in regions of interest; it features the optimal selection of achievable eigenvectors that minimizes the eigenvector components at the areas of concern using the Rayleigh Principle. Second, the active control input is applied through an active-passive hybrid piezoelectric network. With the introduction of circuitry elements, which are much easier to implement than changing or adding mechanical components, the state matrices can be reformed and the design space for eigenstructure assignment greatly enlarged. The merit of the proposed system and scheme is demonstrated and analyzed using a numerical example.


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoints-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures to recover the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationships between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation consisting of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method accurately estimates the 6D aircraft pose in various complex weather scenes while achieving performance comparable to state-of-the-art pose estimation methods.
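Keypoints-based 6D pose estimation is typically scored by reprojection error, the quantity a PnP-style solver minimises. Below is a minimal pinhole-camera sketch of that quantity; the intrinsics `K` and the (R, t) pose parameterisation are generic, not PEKS-specific.

```python
import numpy as np

def project(points3d, R, t, K):
    """Pinhole projection of 3-D model keypoints under candidate pose
    (R, t): transform into the camera frame, apply intrinsics K,
    divide by depth to land in pixel coordinates."""
    cam = points3d @ R.T + t
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]

def reprojection_error(points3d, points2d, R, t, K):
    """Mean pixel distance between detected 2-D keypoints and the 3-D
    model keypoints reprojected under the candidate 6-D pose -- the
    objective a PnP solver drives toward zero."""
    diff = project(points3d, R, t, K) - points2d
    return float(np.mean(np.linalg.norm(diff, axis=1)))
```

Adding structural correspondences (as PEKS does with its PnPS formulation) amounts to extending this objective with terms beyond point-to-point residuals, which is what buys robustness when individual keypoints are unreliable in bad weather.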

