Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

2020 ◽  
Vol 34 (07) ◽  
pp. 11312-11319 ◽  
Author(s):  
Jogendra Nath Kundu ◽  
Siddharth Seth ◽  
Rahul M V ◽  
Mugalodi Rakesh ◽  
Venkatesh Babu Radhakrishnan ◽  
...  

Estimation of 3D human pose from a monocular image has gained considerable attention as a key step in several human-centric applications. However, the generalizability of human pose estimation models trained with supervision on large-scale in-studio datasets remains questionable, as these models often perform unsatisfactorily in unseen in-the-wild environments. Though weakly-supervised models have been proposed to address this shortcoming, the performance of such models relies on the availability of paired supervision on some related task, such as 2D pose or multi-view image pairs. In contrast, we propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervision. Our pose estimation framework relies on a minimal set of prior knowledge that defines the underlying kinematic 3D structure, such as skeletal joint connectivity information with bone-length ratios in a fixed canonical scale. The proposed model employs three consecutive differentiable transformations, namely forward kinematics, camera projection, and spatial-map transformation. This design not only acts as a suitable bottleneck stimulating effective pose disentanglement, but also yields interpretable latent pose representations, avoiding the training of an explicit latent-embedding-to-pose mapper. Furthermore, avoiding an unstable adversarial setup, we reuse the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos beyond laboratory settings. Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both the Human3.6M and MPI-INF-3DHP datasets. Qualitative results in unseen environments further establish our superior generalization ability.
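The first two differentiable transformations described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the skeleton (parent indices, bone-length ratios, canonical bone axis) and the weak pinhole camera are hypothetical stand-ins for the paper's fixed canonical skeleton prior and camera-projection step, and the spatial-map transformation is omitted.

```python
import numpy as np

# Hypothetical skeleton prior: parent index per joint and canonical
# bone-length ratios in a fixed scale. Values are illustrative only.
PARENTS = [-1, 0, 1, 2, 0, 4, 5]            # root, spine, neck, head, hip, knee, ankle
BONE_RATIOS = [0.0, 1.0, 0.5, 0.3, 0.8, 1.0, 0.9]

def forward_kinematics(rotations, parents=PARENTS, bone_ratios=BONE_RATIOS):
    """Recover 3D joint positions from per-joint rotation matrices.

    rotations: (J, 3, 3) array; each child joint is offset from its parent
    along the rotated canonical bone axis, scaled by the bone-length ratio.
    """
    J = len(parents)
    joints = np.zeros((J, 3))
    global_rot = [np.eye(3)] * J
    bone_axis = np.array([0.0, 1.0, 0.0])   # canonical bone direction (assumption)
    for j in range(1, J):
        p = parents[j]
        global_rot[j] = global_rot[p] @ rotations[j]
        joints[j] = joints[p] + bone_ratios[j] * (global_rot[j] @ bone_axis)
    return joints

def camera_projection(joints_3d, focal=1.0, depth_offset=5.0):
    """Pinhole projection of 3D joints to 2D (differentiable in an autodiff
    framework; written in NumPy here purely for illustration)."""
    z = joints_3d[:, 2] + depth_offset      # place the skeleton in front of the camera
    return focal * joints_3d[:, :2] / z[:, None]
```

Chaining these two maps (and the omitted spatial-map step) gives the bottleneck the abstract describes: any latent pose must pass through a valid kinematic structure before being rendered back to image space.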


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3769
Author(s):  
Michał Rapczyński ◽  
Philipp Werner ◽  
Sebastian Handrich ◽  
Ayoub Al-Hamadi

Vision-based 3D human pose estimation approaches are typically evaluated on datasets that are limited in diversity regarding many factors, e.g., subjects, poses, cameras, and lighting. However, for real-life applications, it would be desirable to create systems that work under arbitrary conditions ("in-the-wild"). To advance towards this goal, we investigated the commonly used datasets HumanEva-I, Human3.6M, and Panoptic Studio, discussed their biases (that is, their limitations in diversity), and illustrated them in cross-database experiments (which we used as a surrogate for roughly estimating in-the-wild performance). For this purpose, we first harmonized the differing skeleton joint definitions of the datasets, reducing the biases and systematic test errors in cross-database experiments. We further proposed a scale normalization method that significantly improved generalization across camera viewpoints, subjects, and datasets. In additional experiments, we investigated the effects of using more or fewer cameras, training with multiple datasets, applying a proposed anatomy-based pose validation step, and using OpenPose as the basis for 3D pose estimation. The experimental results showed the usefulness of the joint harmonization, of the scale normalization, and of augmenting the training data with virtual cameras for significantly improving cross-database and in-database generalization. At the same time, the experiments showed that there were dataset biases that could not be compensated for and that call for new datasets covering more diversity. We discussed our results and promising directions for future work.
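A plausible form of the scale normalization described above is to divide out the overall skeleton size, making poses comparable across subjects, cameras, and datasets. The sketch below is an assumption-laden illustration: the parent indices are hypothetical, and the actual method may normalize differently (e.g., by a reference bone rather than the total skeleton length).

```python
import numpy as np

# Hypothetical parent indices for a harmonized skeleton; the real joint
# harmonization maps each dataset's joint set onto a common definition.
PARENTS = [-1, 0, 1, 2, 0, 4, 0, 6]

def scale_normalize(pose_3d, parents=PARENTS):
    """Normalize a 3D pose so its total skeleton (bone) length is 1.

    Centering on the root removes translation; dividing by the summed
    bone lengths removes subject size, one way of realizing the scale
    normalization the abstract describes.
    """
    bones = [pose_3d[j] - pose_3d[p] for j, p in enumerate(parents) if p >= 0]
    total_len = sum(np.linalg.norm(b) for b in bones)
    root = pose_3d[parents.index(-1)]
    return (pose_3d - root) / total_len
```

A useful property of this formulation is that it is invariant to translation and uniform scaling, so the same subject captured at different distances or by differently calibrated cameras maps to the same normalized pose.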


2019 ◽  
Vol 16 (04) ◽  
pp. 1941003
Author(s):  
Chunsheng Guo ◽  
Jialuo Zhou ◽  
Wenlong Du ◽  
Xuguang Zhang

Human pose estimation is a fundamental but challenging task in computer vision. The estimation of human pose mainly depends on the global information of the keypoint type and the local information of the keypoint location. However, the uniformity of the cascading process makes it difficult for each stacked network to form a differentiation and collaboration mechanism. To solve these problems, this paper introduces a new human pose estimation framework called the Multi-Scale Collaborative (MSC) network. A pre-processing network forms feature maps of different sizes and dispatches them to various locations in the stacked network, with small-scale features reaching the front-end stacking networks and large-scale features reaching the back-end stacking networks. A new loss function is proposed for the MSC network: different keypoints have different loss-weight coefficients at different scales, and the keypoint weight coefficients are dynamically adjusted from the top hourglass network to the bottom hourglass network. Experimental results show that the proposed method is competitive with state-of-the-art methods on the MPII and LSP challenge leaderboards.
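The proposed loss can be sketched as a stage-wise weighted heatmap loss in which per-keypoint weights change from the first hourglass to the last. The linear ramp below is a guess at the dynamic adjustment, and all names and shapes are illustrative rather than the paper's actual formulation.

```python
import numpy as np

def msc_loss(pred_heatmaps, target_heatmaps, base_weights, num_stages):
    """Illustrative multi-stage heatmap loss with per-keypoint weights that
    are adjusted from the top hourglass to the bottom one.

    pred_heatmaps / target_heatmaps: lists of (K, H, W) arrays, one per stage.
    base_weights: (K,) per-keypoint weight coefficients (assumed form).
    """
    total = 0.0
    for s in range(num_stages):
        # Later stages weight localization more heavily (a guessed linear ramp).
        stage_scale = (s + 1) / num_stages
        w = base_weights * stage_scale
        diff = (pred_heatmaps[s] - target_heatmaps[s]) ** 2
        total += np.sum(w[:, None, None] * diff)
    return total / num_stages
```

Supervising every stage this way is standard for stacked hourglass models; what the MSC loss adds, per the abstract, is that the per-keypoint coefficients themselves vary with stage and scale rather than staying fixed.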

