Coarse-to-Fine Hand–Object Pose Estimation with Interaction-Aware Graph Convolutional Network

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8092
Author(s):  
Maomao Zhang ◽  
Ao Li ◽  
Honglei Liu ◽  
Minghui Wang

The analysis of hand–object poses from RGB images is important for understanding and imitating human behavior and acts as a key factor in various applications. In this paper, we propose a novel coarse-to-fine two-stage framework for hand–object pose estimation, which explicitly models hand–object relations during 3D pose refinement rather than in the process of converting 2D poses to 3D poses. Specifically, in the coarse stage, 2D heatmaps of hand and object keypoints are obtained from the RGB image and subsequently fed into a pose regressor to derive coarse 3D poses. In the fine stage, an interaction-aware graph convolutional network called InterGCN is introduced to perform pose refinement by fully leveraging the hand–object relations in 3D context. One major challenge in 3D pose refinement lies in the fact that the relations between hand and object change dynamically across hand–object interaction (HOI) scenarios. In response to this issue, we leverage both general and interaction-specific relation graphs to significantly enhance the capacity of the network to cover variations of HOI scenarios for successful 3D pose refinement. Extensive experiments demonstrate state-of-the-art performance of our approach on benchmark hand–object datasets.
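As a concrete illustration of the two-graph idea, the following PyTorch sketch shows a single refinement layer that combines a learnable general relation graph with an interaction-specific adjacency predicted from the node features themselves; the layer sizes, the 21-hand/8-object keypoint split, and all names are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class InterGCNLayer(nn.Module):
    """Sketch of one refinement layer mixing a general relation graph with an
    interaction-specific one predicted from node features (illustrative only)."""
    def __init__(self, num_nodes, in_dim, out_dim):
        super().__init__()
        # General graph, shared across scenarios (identity-initialized here;
        # in practice it could encode hand-skeleton / object-box topology).
        self.A_general = nn.Parameter(torch.eye(num_nodes))
        self.theta = nn.Linear(in_dim, in_dim)   # affinity embeddings for the
        self.phi = nn.Linear(in_dim, in_dim)     # interaction-specific graph
        self.W = nn.Linear(in_dim, out_dim)

    def forward(self, x):                        # x: (B, N, in_dim) node features
        # Interaction-specific adjacency from pairwise feature affinities.
        A_inter = torch.softmax(self.theta(x) @ self.phi(x).transpose(1, 2), dim=-1)
        A = self.A_general.unsqueeze(0) + A_inter        # combine both graphs
        return torch.relu(self.W(A @ x))                 # propagate + transform

# Usage: refine coarse 3D poses of 21 hand keypoints + 8 object box corners.
layer = InterGCNLayer(num_nodes=29, in_dim=3, out_dim=64)
refined = layer(torch.randn(2, 29, 3))           # -> (2, 29, 64) node features
```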

Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4064
Author(s):  
Can Li ◽  
Ping Chen ◽  
Xin Xu ◽  
Xinyu Wang ◽  
Aijun Yin

In this work, we propose a novel coarse-to-fine method for object pose estimation, coupled with admittance control, to facilitate robotic shaft-in-hole assembly. Considering that traditional approaches that locate the hole by force sensing are time-consuming, we employ 3D vision to estimate the axis pose of the hole. Robots can thus locate the target hole in both position and orientation and move the shaft into the hole along the axis orientation. In our method, the raw point cloud of a hole is first processed to acquire the keypoints. Then, a coarse axis is extracted according to the geometric constraints between the surface normals and the axis. Lastly, axis refinement is performed on the coarse axis to achieve higher precision. Practical experiments verified the effectiveness of the axis pose estimation. The assembly strategy composed of axis pose estimation and admittance control was effectively applied to robotic shaft-in-hole assembly.
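The coarse-axis step rests on a simple geometric fact: on an ideal cylindrical hole wall, every surface normal is perpendicular to the axis. The NumPy sketch below (function name and noise levels are illustrative assumptions, not the paper's implementation) recovers the axis as the direction least represented by the normals, i.e. the smallest-eigenvalue eigenvector of the normals' scatter matrix.

```python
import numpy as np

def coarse_axis_from_normals(normals):
    """Wall normals of a cylindrical hole are (ideally) perpendicular to its
    axis, so the axis is the eigenvector of N^T N with the smallest eigenvalue.
    `normals`: (M, 3) array of surface normals (illustrative only)."""
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cov = normals.T @ normals               # 3x3 scatter of normal directions
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, 0]                    # direction least hit by normals

# Usage: noisy normals sampled around the wall of a hole aligned with z.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
wall = np.stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)], axis=1)
axis = coarse_axis_from_normals(wall + 0.02 * rng.standard_normal((200, 3)))
print(axis)                                 # approximately ±[0, 0, 1]
```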


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1889
Author(s):  
Shuang Liu ◽  
Hongli Xu ◽  
Yang Lin ◽  
Lei Gao

Autonomous underwater vehicles (AUVs) play very important roles in underwater missions. However, the reliability of the automated recovery of AUVs has still not been well addressed. We propose a vision-based framework for automatically recovering an AUV by another AUV in shallow water. The proposed framework contains a detection phase for the robust detection of underwater landmarks mounted on the docking station in shallow water and a pose-estimation phase for estimating the pose between AUVs and underwater landmarks. We propose a Laplacian-of-Gaussian-based coarse-to-fine blockwise (LCB) method for the detection of underwater landmarks to overcome ambient light and nonuniform spreading, which are the two main problems in shallow water. We propose a novel method for pose estimation in practical cases where landmarks are broken or covered by biofouling. In the experiments, we show that our proposed LCB method outperforms the state-of-the-art method in terms of remote landmark detection. We then combine our proposed vision-based framework with acoustic sensors in field experiments to demonstrate its effectiveness in the automated recovery of AUVs.
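To illustrate the blockwise flavor of the LCB detector, here is a minimal SciPy/NumPy sketch; the block size, sigma, and per-block threshold rule are our assumptions, not the paper's exact formulation. The Laplacian-of-Gaussian response suppresses a smooth ambient-light gradient, and thresholding per block rather than globally copes with nonuniform illumination.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_blockwise_detect(gray, block=64, sigma=3.0, k=1.5):
    """LoG response thresholded per block: local statistics stand in for a
    global threshold, tolerating nonuniform lighting (illustrative only)."""
    resp = -gaussian_laplace(gray.astype(np.float64), sigma)  # bright blobs -> +
    mask = np.zeros(resp.shape, dtype=bool)
    H, W = resp.shape
    for y in range(0, H, block):
        for x in range(0, W, block):
            tile = resp[y:y + block, x:x + block]
            thr = tile.mean() + k * tile.std()   # per-block, not global
            mask[y:y + block, x:x + block] = tile > thr
    return mask

# A smooth ambient-light ramp has near-zero Laplacian, so only the landmark pops.
img = np.tile(np.linspace(0, 80, 256), (256, 1))  # nonuniform ambient light
img[120:136, 120:136] += 100                      # bright landmark
print(log_blockwise_detect(img)[118:138, 118:138].any())  # True near landmark
```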


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints in images containing multiple persons. Bottom-up methods have attracted extensive attention owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where relations between keypoints are the key to grouping them. These relations naturally form a graph of keypoints, in which the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take all relations (all edges of the graph) into account and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoint and MPII datasets, respectively.
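A minimal sketch of the dynamic-graph idea follows (PyTorch; dimensions and names are illustrative, not the published DGCM): all pairwise edges are kept, and the adjacency is recomputed per sample from feature similarity, so the keypoint graph adapts to large pose variation.

```python
import torch
import torch.nn as nn

class DynamicGraphConv(nn.Module):
    """Sketch of a dynamic graph convolution over keypoint features: all edges
    are kept, and adjacency is recomputed per sample (illustrative only)."""
    def __init__(self, dim):
        super().__init__()
        self.embed = nn.Linear(dim, dim)
        self.update = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (B, K, dim)
        e = self.embed(x)
        # Dense, sample-dependent adjacency over every keypoint pair.
        A = torch.softmax(e @ e.transpose(1, 2) / e.shape[-1] ** 0.5, dim=-1)
        return x + torch.relu(self.update(A @ x))  # residual graph update

# Lightweight, so several can be stacked pyramid-style over multi-level features.
x = torch.randn(2, 17, 64)                         # 17 COCO keypoints
for blk in (DynamicGraphConv(64), DynamicGraphConv(64)):
    x = blk(x)
print(x.shape)                                     # torch.Size([2, 17, 64])
```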


2020 ◽  
Author(s):  
Guoliang Liu

Full-resolution depth is required in many real-world engineering applications. However, existing depth sensors, e.g., LiDARs, only offer sparse depth sample points with limited resolution and noise. We here propose a deep-learning-based full-resolution depth recovery method from monocular images and corresponding sparse depth measurements of the target environment. The novelty of our idea is that the structure-similarity information between the RGB image and the depth image is used to refine the dense depth estimation result. This important similar-structure information can be found using a correlation layer in the regression neural network. We show that the proposed method can achieve higher estimation accuracy compared to the state-of-the-art methods. Experiments conducted on the NYU Depth V2 dataset confirm the effectiveness of our idea.
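The correlation-layer idea can be sketched as a small cost volume between RGB features and sparse-depth features; the neighborhood size and shapes below (PyTorch) are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def correlation_layer(feat_rgb, feat_depth, max_disp=3):
    """Cost volume between RGB and sparse-depth features: each RGB feature is
    correlated with depth features in a small neighborhood (illustrative only)."""
    B, C, H, W = feat_rgb.shape
    pad = F.pad(feat_depth, [max_disp] * 4)        # pad H and W symmetrically
    costs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = pad[:, :, dy:dy + H, dx:dx + W]
            costs.append((feat_rgb * shifted).mean(dim=1, keepdim=True))
    return torch.cat(costs, dim=1)                 # (B, (2d+1)^2, H, W)

rgb_f, depth_f = torch.randn(1, 32, 24, 24), torch.randn(1, 32, 24, 24)
print(correlation_layer(rgb_f, depth_f).shape)     # torch.Size([1, 49, 24, 24])
```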


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6790
Author(s):  
Chi Xu ◽  
Jiale Chen ◽  
Mengyang Yao ◽  
Jun Zhou ◽  
Lijun Zhang ◽  
...  

6DoF object pose estimation is a foundation for many important applications, such as robotic grasping and autonomous driving. However, it is very challenging to estimate the 6DoF pose of transparent objects, which are common in daily life, because the optical characteristics of transparent material cause significant depth errors that in turn lead to false estimates. To solve this problem, a two-stage approach is proposed to estimate the 6DoF pose of a transparent object from a single RGB-D image. In the first stage, the influence of the depth error is eliminated by transparent segmentation, surface normal recovery, and RANSAC plane estimation. In the second stage, an extended point-cloud representation is presented to accurately and efficiently estimate the object pose. To the best of our knowledge, this is the first deep-learning-based approach that focuses on 6DoF pose estimation of transparent objects from a single RGB-D image. Experimental results show that the proposed approach can effectively estimate the 6DoF pose of transparent objects and outperforms the state-of-the-art baselines by a large margin.
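A plain RANSAC plane fit, one ingredient of the first stage, can be sketched as follows (NumPy; the iteration count and inlier threshold are illustrative assumptions): a supporting plane is fit to points whose depth is reliable, giving a geometric reference against which the erroneous depth in the transparent region can be handled.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.01, rng=np.random.default_rng(0)):
    """Fit a plane to an (M, 3) point cloud by RANSAC (illustrative only)."""
    best_inliers, best_model = None, None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue                               # degenerate (collinear) sample
        n = n / np.linalg.norm(n)
        dist = np.abs((points - p0) @ n)           # point-to-plane distances
        inliers = dist < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, p0)
    return best_model, best_inliers

# Usage: a noisy tabletop plane z≈0 plus spurious depth from a transparent region.
rng = np.random.default_rng(1)
pts = np.c_[rng.uniform(-1, 1, (500, 2)), 0.003 * rng.standard_normal(500)]
pts[:50, 2] += rng.uniform(0.2, 0.5, 50)           # depth-error outliers
(normal, point), inl = ransac_plane(pts)
print(normal, inl.sum())                           # normal ≈ ±[0, 0, 1]
```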


2020 ◽  
Vol 2020 (8) ◽  
pp. 221-1-221-7
Author(s):  
Jianhang Chen ◽  
Daniel Mas Montserrat ◽  
Qian Lin ◽  
Edward J. Delp ◽  
Jan P. Allebach

We introduce a new image dataset for object detection and 6D pose estimation, named Extra FAT. The dataset consists of 825K photorealistic RGB images with ground-truth location and rotation annotations for both the virtual camera and the objects. A registered pixel-level object segmentation mask is also provided for object detection and segmentation tasks. The dataset includes 110 different 3D object models, rendered in five scenes with diverse illumination, reflection, and occlusion conditions.


2020 ◽  
Vol 10 (2) ◽  
pp. 618
Author(s):  
Xianghan Wang ◽  
Jie Jiang ◽  
Yanming Guo ◽  
Lai Kang ◽  
Yingmei Wei ◽  
...  

Precise 3D hand pose estimation can improve the performance of human–computer interaction (HCI), and computer-vision-based hand pose estimation in particular can make this interaction more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment; estimation from a single RGB image is more convenient and less expensive. Previous RGB-based methods utilize only 2D keypoint score maps to recover 3D hand poses, ignoring the hand texture features and the underlying spatial information in the RGB image, which leads to relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method recomputes channel weights over the cascaded RGB image and 2D keypoint features, enabling principled weighting and utilization of both kinds of features and improving the fusion of the different types of feature maps. Multiple comparison experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state of the art.
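A squeeze-and-excitation style gate is one natural way to realize such channel-level fusion; the PyTorch sketch below concatenates RGB and 2D keypoint feature maps and re-weights the channels, with all dimensions being our illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelFusionAttention(nn.Module):
    """Sketch of channel-level fusion: concatenate RGB features and 2D keypoint
    score maps, then gate each channel, SE-style (illustrative only)."""
    def __init__(self, c_rgb, c_kpt, reduction=4):
        super().__init__()
        c = c_rgb + c_kpt
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),               # squeeze: global channel stats
            nn.Conv2d(c, c // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 1), nn.Sigmoid(),  # excitation weights
        )

    def forward(self, f_rgb, f_kpt):               # (B,c_rgb,H,W), (B,c_kpt,H,W)
        x = torch.cat([f_rgb, f_kpt], dim=1)       # channel-level concatenation
        return x * self.gate(x)                    # re-weighted fused features

fuse = ChannelFusionAttention(c_rgb=64, c_kpt=21)  # e.g., 21 hand keypoint maps
out = fuse(torch.randn(2, 64, 32, 32), torch.randn(2, 21, 32, 32))
print(out.shape)                                   # torch.Size([2, 85, 32, 32])
```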


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoint-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures and recovers the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationships between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation that consists of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method can accurately estimate the 6D aircraft pose in various complex weather scenes while achieving performance comparable to state-of-the-art pose estimation methods.
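The keypoint half of such a pipeline reduces to a standard Perspective-n-Point solve; the OpenCV snippet below illustrates it on synthetic data, while the structure constraints that distinguish the full PnPS algorithm are omitted. The intrinsics and model points are made up for the example.

```python
import numpy as np
import cv2

# Given 3D keypoints on the aircraft model and their detected 2D locations,
# solvePnP recovers the 6D pose (rotation vector + translation).
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                       # assumed camera intrinsics
model_pts = np.array([[1., 0., 0.], [-1., 0., 0.], [0., 1., 0.],
                      [0., -1., 0.], [0., 0., 1.], [0., 0., -1.]])

# Synthesize "detections" by projecting with a known ground-truth pose.
rvec_gt, tvec_gt = np.array([0.1, 0.2, 0.3]), np.array([0., 0., 6.])
img_pts, _ = cv2.projectPoints(model_pts, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec = cv2.solvePnP(model_pts, img_pts, K, None)
print(ok, rvec.ravel(), tvec.ravel())              # recovers rvec_gt, tvec_gt
```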


2020 ◽  
Vol 34 (07) ◽  
pp. 11221-11228
Author(s):  
Yueying Kao ◽  
Weiming Li ◽  
Qiang Wang ◽  
Zhouchen Lin ◽  
Wooshik Kim ◽  
...  

Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation, but existing methods rely on real depth images to extract them, which limits their use in many applications. In this paper, we aim to extract RGB and depth features from a single RGB image, with the help of synthetic RGB-depth image pairs, for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns feature distributions from synthetic to real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance in all metrics on the challenging public PASCAL 3D+ dataset, which substantiates the superiority of our method and the above modules.
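The embedding module's training signal can be sketched as pulling the RGB embedding of a synthetic image toward the depth embedding of its paired synthetic depth map, so that only the RGB branch is needed at test time; the tiny encoders and MSE loss below (PyTorch) are our illustrative assumptions, not the paper's modules.

```python
import torch
import torch.nn as nn

def encoder(in_ch):
    """Tiny stand-in encoder producing a 128-d embedding (illustrative only)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
    )

rgb_enc, depth_enc = encoder(3), encoder(1)
rgb, depth = torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64)  # synthetic pair
z_rgb, z_depth = rgb_enc(rgb), depth_enc(depth)
embed_loss = nn.functional.mse_loss(z_rgb, z_depth)  # pull embeddings together
embed_loss.backward()                                # trains the embedding space
print(float(embed_loss))
```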

