Coarse-to-Fine Hand–Object Pose Estimation with Interaction-Aware Graph Convolutional Network

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8092
Author(s):  
Maomao Zhang ◽  
Ao Li ◽  
Honglei Liu ◽  
Minghui Wang

The analysis of hand–object poses from RGB images is important for understanding and imitating human behavior and acts as a key factor in various applications. In this paper, we propose a novel coarse-to-fine two-stage framework for hand–object pose estimation, which explicitly models hand–object relations during 3D pose refinement rather than in the process of converting 2D poses to 3D poses. Specifically, in the coarse stage, 2D heatmaps of hand and object keypoints are obtained from the RGB image and subsequently fed into a pose regressor to derive coarse 3D poses. In the fine stage, an interaction-aware graph convolutional network called InterGCN is introduced to perform pose refinement by fully leveraging the hand–object relations in 3D context. One major challenge in 3D pose refinement lies in the fact that the relations between hand and object change dynamically across hand–object interaction (HOI) scenarios. In response to this issue, we leverage both general and interaction-specific relation graphs to significantly enhance the capacity of the network to cover variations of HOI scenarios for successful 3D pose refinement. Extensive experiments demonstrate state-of-the-art performance of our approach on benchmark hand–object datasets.
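As a concrete illustration of the two-graph idea, the following PyTorch sketch shows a single refinement layer that combines a learnable general relation graph with an interaction-specific adjacency predicted from the node features themselves; the layer sizes, the 21-hand/8-object keypoint split, and all names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class InterGCNLayer(nn.Module):
    """Sketch of one refinement layer mixing a general relation graph with an
    interaction-specific one predicted from node features (illustrative only)."""
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        # General graph, shared across scenarios (identity-initialized here;
        # in practice it could encode hand-skeleton / object-box topology).
        self.A_general = nn.Parameter(torch.eye(num_nodes))
        self.theta = nn.Linear(in_dim, in_dim)   # affinity embeddings for the
        self.phi = nn.Linear(in_dim, in_dim)     # interaction-specific graph
        self.W = nn.Linear(in_dim, out_dim)

    def forward(self, x):                        # x: (B, N, in_dim) node features
        # Interaction-specific adjacency from pairwise feature affinities.
        A_inter = torch.softmax(self.theta(x) @ self.phi(x).transpose(1, 2), dim=-1)
        A = self.A_general.unsqueeze(0) + A_inter        # combine both graphs
        return torch.relu(self.W(A @ x))                 # propagate + transform

# Usage: refine coarse 3D poses of 21 hand keypoints + 8 object box corners.
layer = InterGCNLayer(num_nodes=29, in_dim=3, out_dim=64)
refined = layer(torch.randn(2, 29, 3))           # -> (2, 29, 64) node features
```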

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4064
Author(s):  
Can Li ◽  
Ping Chen ◽  
Xin Xu ◽  
Xinyu Wang ◽  
Aijun Yin

In this work, we propose a novel coarse-to-fine method for object pose estimation, coupled with admittance control, to facilitate robotic shaft-in-hole assembly. Considering that traditional approaches that locate the hole by force sensing are time-consuming, we employ 3D vision to estimate the axis pose of the hole. Robots can thus locate the target hole in both position and orientation and move the shaft into the hole along the axis orientation. In our method, the raw point cloud of a hole is first processed to acquire the keypoints. Then, a coarse axis is extracted according to the geometric constraints between the surface normals and the axis. Lastly, axis refinement is performed on the coarse axis to achieve higher precision. Practical experiments verified the effectiveness of the axis pose estimation. The assembly strategy composed of axis pose estimation and admittance control was effectively applied to robotic shaft-in-hole assembly.
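The coarse-axis step rests on a simple geometric fact: on an ideal cylindrical hole wall, every surface normal is perpendicular to the axis. The NumPy sketch below (function name and noise levels are illustrative assumptions, not the paper's implementation) recovers the axis as the direction least represented by the normals, i.e. the smallest-eigenvalue eigenvector of the normals' scatter matrix.

```python
import numpy as np

def coarse_axis_from_normals(normals):
    """Wall normals of a cylindrical hole are (ideally) perpendicular to its
    axis, so the axis is the eigenvector of N^T N with the smallest eigenvalue.
    `normals`: (M, 3) array of surface normals (illustrative only)."""
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cov = normals.T @ normals               # 3x3 scatter of normal directions
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]                    # direction least hit by normals

# Usage: noisy normals sampled around the wall of a hole aligned with z.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
wall = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
axis = coarse_axis_from_normals(wall + 0.02 * rng.standard_normal((200, 3)))
print(axis)                                 # approximately ±[0, 0, 1]
```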


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1889
Author(s):  
Shuang Liu ◽  
Hongli Xu ◽  
Yang Lin ◽  
Lei Gao

Autonomous underwater vehicles (AUVs) play very important roles in underwater missions. However, the reliability of the automated recovery of AUVs has still not been well addressed. We propose a vision-based framework for automatically recovering an AUV by another AUV in shallow water. The proposed framework contains a detection phase for the robust detection of underwater landmarks mounted on the docking station in shallow water and a pose-estimation phase for estimating the pose between AUVs and underwater landmarks. We propose a Laplacian-of-Gaussian-based coarse-to-fine blockwise (LCB) method for the detection of underwater landmarks to overcome ambient light and nonuniform spreading, which are the two main problems in shallow water. We propose a novel method for pose estimation in practical cases where landmarks are broken or covered by biofouling. In the experiments, we show that our proposed LCB method outperforms the state-of-the-art method in terms of remote landmark detection. We then combine our proposed vision-based framework with acoustic sensors in field experiments to demonstrate its effectiveness in the automated recovery of AUVs.
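To illustrate the blockwise flavor of the LCB detector, here is a minimal SciPy/NumPy sketch; the block size, sigma, and per-block threshold rule are our assumptions, not the paper's exact formulation. The Laplacian-of-Gaussian response suppresses a smooth ambient-light gradient, and thresholding per block rather than globally copes with nonuniform illumination.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_blockwise_detect(gray, block=64, sigma=3.0, k=1.5):
    """LoG response thresholded per block: local statistics stand in for a
    global threshold, tolerating nonuniform lighting (illustrative only)."""
    resp = -gaussian_laplace(gray.astype(np.float64), sigma)  # bright blobs -> +
    mask = np.zeros(resp.shape, dtype=bool)
    H, W = resp.shape
    for y in range(0, H, block):
        for x in range(0, W, block):
            tile = resp[y:y + block, x:x + block]
            thr = tile.mean() + k * tile.std()   # per-block, not global
            mask[y:y + block, x:x + block] = tile > thr
    return mask

# A smooth ambient-light ramp has near-zero Laplacian, so only the landmark pops.
img = np.tile(np.linspace(0, 80, 256), (256, 1))  # nonuniform ambient light
img[120:136, 120:136] += 100                      # bright landmark
print(log_blockwise_detect(img)[118:138, 118:138].any())  # True near landmark
```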


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints in images containing multiple persons. Bottom-up methods have attracted extensive attention owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where relations between keypoints are the key to grouping them. These relations naturally form a graph of keypoints, in which the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take all relations (all edges of the graph) into account and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoint and MPII datasets, respectively.
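A minimal sketch of the dynamic-graph idea follows (PyTorch; dimensions and names are illustrative, not the published DGCM): all pairwise edges are kept, and the adjacency is recomputed per sample from feature similarity, so the keypoint graph adapts to large pose variation.

```python
import torch
import torch.nn as nn

class DynamicGraphConv(nn.Module):
    """Sketch of a dynamic graph convolution over keypoint features: all edges
    are kept, and adjacency is recomputed per sample (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.embed = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (B, K, dim)
        e = self.embed(x)
        # Dense, sample-dependent adjacency over every keypoint pair.
        A = torch.softmax(e @ e.transpose(1, 2) / e.shape[-1] ** 0.5, dim=-1)
        return x + torch.relu(self.update(A @ x))  # residual graph update

# Lightweight, so several can be stacked pyramid-style over multi-level features.
x = torch.randn(2, 17, 64)                         # 17 COCO keypoints
for blk in (DynamicGraphConv(64), DynamicGraphConv(64)):
    x = blk(x)
print(x.shape)                                     # torch.Size([2, 17, 64])
```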


2020 ◽  
Author(s):  
Guoliang Liu

Full-resolution depth is required in many real-world engineering applications. However, existing depth sensors, e.g., LiDARs, only offer sparse depth sample points with limited resolution and noise. We here propose a deep-learning-based full-resolution depth recovery method from monocular images and corresponding sparse depth measurements of the target environment. The novelty of our idea is that the structure-similarity information between the RGB image and the depth image is used to refine the dense depth estimation result. This important similar-structure information can be found using a correlation layer in the regression neural network. We show that the proposed method can achieve higher estimation accuracy compared to the state-of-the-art methods. Experiments conducted on the NYU Depth V2 dataset confirm the effectiveness of our idea.
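The correlation-layer idea can be sketched as a small cost volume between RGB features and sparse-depth features; the neighborhood size and shapes below (PyTorch) are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def correlation_layer(feat_rgb, feat_depth, max_disp=3):
    """Cost volume between RGB and sparse-depth features: each RGB feature is
    correlated with depth features in a small neighborhood (illustrative only)."""
    B, C, H, W = feat_rgb.shape
    pad = F.pad(feat_depth, [max_disp] * 4)        # pad H and W symmetrically
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = pad[:, :, dy:dy + H, dx:dx + W]
            costs.append((feat_rgb * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)                 # (B, (2d+1)^2, H, W)

rgb_f, depth_f = torch.randn(1, 32, 24, 24), torch.randn(1, 32, 24, 24)
print(correlation_layer(rgb_f, depth_f).shape)     # torch.Size([1, 49, 24, 24])
```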


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6790
Author(s):  
Chi Xu ◽  
Jiale Chen ◽  
Mengyang Yao ◽  
Jun Zhou ◽  
Lijun Zhang ◽  
...  

6DoF object pose estimation is a foundation for many important applications, such as robotic grasping and autonomous driving. However, it is very challenging to estimate the 6DoF pose of transparent objects, which are common in daily life, because the optical characteristics of transparent material cause significant depth errors that in turn lead to false estimates. To solve this problem, a two-stage approach is proposed to estimate the 6DoF pose of a transparent object from a single RGB-D image. In the first stage, the influence of the depth error is eliminated by transparent segmentation, surface normal recovery, and RANSAC plane estimation. In the second stage, an extended point-cloud representation is presented to accurately and efficiently estimate the object pose. To the best of our knowledge, this is the first deep-learning-based approach that focuses on 6DoF pose estimation of transparent objects from a single RGB-D image. Experimental results show that the proposed approach can effectively estimate the 6DoF pose of transparent objects and outperforms the state-of-the-art baselines by a large margin.
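A plain RANSAC plane fit, one ingredient of the first stage, can be sketched as follows (NumPy; the iteration count and inlier threshold are illustrative assumptions): a supporting plane is fit to points whose depth is reliable, giving a geometric reference against which the erroneous depth in the transparent region can be handled.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.01, rng=np.random.default_rng(0)):
    """Fit a plane to an (M, 3) point cloud by RANSAC (illustrative only)."""
    best_inliers, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                               # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)           # point-to-plane distances
        inliers = dist < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, p0)
    return best_model, best_inliers

# Usage: a noisy tabletop plane z≈0 plus spurious depth from a transparent region.
rng = np.random.default_rng(1)
pts = np.c_[rng.uniform(-1, 1, (500, 2)), 0.003 * rng.standard_normal(500)]
pts[:50, 2] += rng.uniform(0.2, 0.5, 50)           # depth-error outliers
(normal, point), inl = ransac_plane(pts)
print(normal, inl.sum())                           # normal ≈ ±[0, 0, 1]
```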


2020 ◽  
Vol 2020 (8) ◽  
pp. 221-1-221-7
Author(s):  
Jianhang Chen ◽  
Daniel Mas Montserrat ◽  
Qian Lin ◽  
Edward J. Delp ◽  
Jan P. Allebach

We introduce a new image dataset for object detection and 6D pose estimation, named Extra FAT. The dataset consists of 825K photorealistic RGB images with ground-truth location and rotation annotations for both the virtual camera and the objects. A registered pixel-level object segmentation mask is also provided for object detection and segmentation tasks. The dataset includes 110 different 3D object models, rendered in five scenes with diverse illumination, reflection, and occlusion conditions.


2020 ◽  
Vol 10 (2) ◽  
pp. 618
Author(s):  
Xianghan Wang ◽  
Jie Jiang ◽  
Yanming Guo ◽  
Lai Kang ◽  
Yingmei Wei ◽  
...  

Precise 3D hand pose estimation can improve the performance of human–computer interaction (HCI), and computer-vision-based hand pose estimation in particular can make this interaction more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment; estimation from a single RGB image is more convenient and less expensive. Previous RGB-based methods utilize only 2D keypoint score maps to recover 3D hand poses, ignoring the hand texture features and the underlying spatial information in the RGB image, which leads to relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method recomputes channel weights over the cascaded RGB image and 2D keypoint features, enabling principled weighting and utilization of both kinds of features and improving the fusion of the different types of feature maps. Multiple comparison experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state of the art.
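A squeeze-and-excitation style gate is one natural way to realize such channel-level fusion; the PyTorch sketch below concatenates RGB and 2D keypoint feature maps and re-weights the channels, with all dimensions being our illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelFusionAttention(nn.Module):
    """Sketch of channel-level fusion: concatenate RGB features and 2D keypoint
    score maps, then gate each channel, SE-style (illustrative only)."""
    def __init__(self, c_rgb, c_kpt, reduction=4):
        super().__init__()
        c = c_rgb + c_kpt
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # squeeze: global channel stats
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 1), nn.Sigmoid(),  # excitation weights
        )

    def forward(self, f_rgb, f_kpt):               # (B,c_rgb,H,W), (B,c_kpt,H,W)
        x = torch.cat([f_rgb, f_kpt], dim=1)       # channel-level concatenation
        return x * self.gate(x)                    # re-weighted fused features

fuse = ChannelFusionAttention(c_rgb=64, c_kpt=21)  # e.g., 21 hand keypoint maps
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 21, 32, 32))
print(out.shape)                                   # torch.Size([2, 85, 32, 32])
```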


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoint-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures and recovers the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationships between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation that consists of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method can accurately estimate the 6D aircraft pose in various complex weather scenes while achieving performance comparable to state-of-the-art pose estimation methods.
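The keypoint half of such a pipeline reduces to a standard Perspective-n-Point solve; the OpenCV snippet below illustrates it on synthetic data, while the structure constraints that distinguish the full PnPS algorithm are omitted. The intrinsics and model points are made up for the example.

```python
import numpy as np
import cv2

# Given 3D keypoints on the aircraft model and their detected 2D locations,
# solvePnP recovers the 6D pose (rotation vector + translation).
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                       # assumed camera intrinsics
model_pts = np.array([[1., 0., 0.], [-1., 0., 0.], [0., 1., 0.],
                      [0., -1., 0.], [0., 0., 1.], [0., 0., -1.]])

# Synthesize "detections" by projecting with a known ground-truth pose.
rvec_gt, tvec_gt = np.array([0.1, 0.2, 0.3]), np.array([0., 0., 6.])
img_pts, _ = cv2.projectPoints(model_pts, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(model_pts, img_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())              # recovers rvec_gt, tvec_gt
```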


2020 ◽  
Vol 34 (07) ◽  
pp. 11221-11228
Author(s):  
Yueying Kao ◽  
Weiming Li ◽  
Qiang Wang ◽  
Zhouchen Lin ◽  
Wooshik Kim ◽  
...  

Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation, but existing methods rely on real depth images to extract them, which limits their use in many applications. In this paper, we aim to extract RGB and depth features from a single RGB image, with the help of synthetic RGB-depth image pairs, for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns feature distributions from synthetic to real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance in all metrics on the challenging public PASCAL 3D+ dataset, which substantiates the superiority of our method and the above modules.
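The embedding module's training signal can be sketched as pulling the RGB embedding of a synthetic image toward the depth embedding of its paired synthetic depth map, so that only the RGB branch is needed at test time; the tiny encoders and MSE loss below (PyTorch) are our illustrative assumptions, not the paper's modules.

```python
import torch
import torch.nn as nn

def encoder(in_ch):
    """Tiny stand-in encoder producing a 128-d embedding (illustrative only)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
    )

rgb_enc, depth_enc = encoder(3), encoder(1)
rgb, depth = torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)  # synthetic pair
z_rgb, z_depth = rgb_enc(rgb), depth_enc(depth)
embed_loss = nn.functional.mse_loss(z_rgb, z_depth)  # pull embeddings together
embed_loss.backward()                                # trains the embedding space
print(float(embed_loss))
```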

