Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

Author(s):  
Andy Zeng ◽  
Shuran Song ◽  
Kuan-Ting Yu ◽  
Elliott Donlon ◽  
Francois R. Hogan ◽  
...  
2019 ◽  
pp. 027836491986801 ◽  

This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with the highest affordance and recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional data collection or re-training. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT–Princeton Team system that took first place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu/
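For illustration, here is a minimal sketch of the multi-affordance action selection described above: pick the grasping primitive and the pixel with the highest predicted affordance across the four dense probability maps. The primitive names, array shapes, and the `select_grasp` helper are assumptions for this example, not the team's released code (which is available at the URL above).

```python
import numpy as np

def select_grasp(affordance_maps):
    """Pick the primitive and pixel with the highest predicted affordance.

    affordance_maps: dict mapping a primitive name to an HxW array of
    per-pixel success probabilities. Illustrative sketch only.
    """
    best = None
    for primitive, prob_map in affordance_maps.items():
        # Most promising pixel for this primitive.
        y, x = np.unravel_index(np.argmax(prob_map), prob_map.shape)
        score = prob_map[y, x]
        if best is None or score > best[0]:
            best = (score, primitive, (y, x))
    score, primitive, pixel = best
    return primitive, pixel, score

# Toy usage: random maps stand in for the network's affordance predictions.
maps = {name: np.random.rand(480, 640)
        for name in ["suction-down", "suction-side", "grasp-down", "flush-grasp"]}
print(select_grasp(maps))
```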


2019 ◽  
Vol 11 (19) ◽  
pp. 2243 ◽  
Author(s):  
Weiquan Liu ◽  
Cheng Wang ◽  
Xuesheng Bian ◽  
Shuting Chen ◽  
Wei Li ◽  
...  

Establishing the spatial relationship between 2D images captured by real cameras and 3D models of the environment (2D and 3D space) is one way to achieve virtual–real registration for Augmented Reality (AR) in outdoor environments. In this paper, we propose to match 2D images captured by real cameras against images rendered from a 3D image-based point cloud, thereby indirectly establishing the spatial relationship between 2D and 3D space. We refer to these two kinds of images as cross-domain images because their imaging mechanisms and nature are quite different. Unlike real camera images, however, the images rendered from the 3D image-based point cloud are inevitably contaminated by distortion, reduced resolution, and obstructions, which makes image matching with handcrafted descriptors or existing feature-learning neural networks very challenging. Thus, we first propose a novel end-to-end network, AE-GAN-Net, consisting of two AutoEncoders (AEs) with Generative Adversarial Network (GAN) embedding, to learn invariant feature descriptors for cross-domain image matching. Second, a domain-consistent loss function, which balances image content and consistency of feature descriptors for cross-domain image pairs, is introduced to optimize AE-GAN-Net. AE-GAN-Net effectively captures domain-specific information, which is embedded into the learned feature descriptors, making them robust against image distortion and against variations in viewpoint, spatial resolution, rotation, and scaling. Experimental results show that AE-GAN-Net achieves state-of-the-art performance for image patch retrieval on a cross-domain image patch dataset built from real camera images and images rendered from the 3D image-based point cloud. Finally, by evaluating virtual–real registration for AR on a campus using the cross-domain image matching results, we demonstrate the feasibility of applying the proposed virtual–real registration to AR in outdoor environments.
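As a rough, hedged sketch of what a domain-consistent loss of this kind could look like (the exact terms and weights in AE-GAN-Net may differ; `alpha`, `beta`, and the function name are illustrative assumptions): it combines a content term, in which each AutoEncoder reconstructs its own domain, with a consistency term that pulls the descriptors of a matching cross-domain pair together.

```python
import torch.nn.functional as F

def domain_consistent_loss(real_img, rendered_img,
                           real_recon, rendered_recon,
                           real_desc, rendered_desc,
                           alpha=1.0, beta=1.0):
    """Illustrative sketch of a loss balancing image content and descriptor
    consistency for a cross-domain (real camera / rendered) image pair.
    Not the authors' exact formulation; alpha and beta are assumed weights.
    """
    # Content term: each AutoEncoder should reconstruct its own domain.
    content = (F.l1_loss(real_recon, real_img)
               + F.l1_loss(rendered_recon, rendered_img))
    # Consistency term: descriptors of a matching cross-domain pair should
    # agree despite the different imaging mechanisms.
    consistency = F.mse_loss(real_desc, rendered_desc)
    return alpha * content + beta * consistency
```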


2011 ◽  
Vol 30 (6) ◽  
pp. 1-10 ◽  
Author(s):  
Abhinav Shrivastava ◽  
Tomasz Malisiewicz ◽  
Abhinav Gupta ◽  
Alexei A. Efros

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 380 ◽  
Author(s):  
Agnese Chiatti ◽  
Gianluca Bardaro ◽  
Emanuele Bastianelli ◽  
Ilaria Tiddi ◽  
Prasenjit Mitra ◽  
...  

To assist humans with their daily tasks, mobile robots are expected to navigate complex and dynamic environments that present unpredictable combinations of known and unknown objects. Most state-of-the-art object recognition methods are unsuitable for this scenario because they require that (i) all target object classes are known beforehand and (ii) a vast number of training examples is provided for each class. This calls for novel methods that can handle unknown object classes for which only a few images are initially available (few-shot recognition). One way of tackling the problem is learning how to match novel objects to their most similar supporting example. Here, we compare different (shallow and deep) approaches to few-shot image matching on a novel data set consisting of 2D views of common object types drawn from a combination of ShapeNet and Google. First, we assess whether the object similarity learned on this data set can scale up to new object classes, i.e., categories unseen at training time. Furthermore, we show how normalising the learned embeddings affects the generalisation abilities of the tested methods, in the context of two novel configurations: (i) where the weights of a Convolutional two-branch Network are imprinted, and (ii) where the embeddings of a Convolutional Siamese Network are L2-normalised.
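To make the matching step concrete, the following is a minimal sketch, assuming some encoder has already produced embeddings, of matching a query object to its most similar support example after L2-normalisation (so the dot product reduces to cosine similarity); the function name, shapes, and labels are illustrative, not the paper's code.

```python
import numpy as np

def match_to_support(query_emb, support_embs, support_labels):
    """Match a query embedding to its most similar support example.

    Embeddings are L2-normalised first, so the dot product is cosine
    similarity, mirroring the L2-normalisation configuration above.
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = s @ q                       # cosine similarity to each support example
    best = int(np.argmax(sims))
    return support_labels[best], float(sims[best])

# Toy usage: 5 support classes with 128-D embeddings from some encoder.
support = np.random.randn(5, 128)
labels = ["mug", "bottle", "phone", "book", "shoe"]
print(match_to_support(np.random.randn(128), support, labels))
```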



IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 17681-17698 ◽  
Author(s):  
Jing Li ◽  
Congcong Li ◽  
Tao Yang ◽  
Zhaoyang Lu

2019 ◽  
Vol 127 (11-12) ◽  
pp. 1738-1750 ◽  
Author(s):  
Bailey Kong ◽  
James Supanc̆ic̆ ◽  
Deva Ramanan ◽  
Charless C. Fowlkes

IEEE Access ◽  
2017 ◽  
Vol 5 ◽  
pp. 23190-23203 ◽  
Author(s):  
Jing Li ◽  
Congcong Li ◽  
Tao Yang ◽  
Zhaoyang Lu

Author(s):  
A. Olsen ◽  
J.C.H. Spence ◽  
P. Petroff

Since the point resolution of the JEOL 200CX electron microscope is u_p = 2.6 Å, it is not possible to obtain a true structure image of any of the III–V or elemental semiconductors with this machine. Since the information resolution limit set by electronic instability (1) is u_0 = (2/πλΔ)^{1/2} = 1.4 Å for Δ = 50 Å, it is however possible to obtain, by choice of focus and thickness, clear lattice images both resembling (see figure 2(b)), and not resembling, the true crystal structure (see (2) for an example of a Fourier image which is structurally incorrect). The crucial difficulty in using the information between u_p and u_0 is the fractional accuracy with which Δf and C_s must be determined; these accuracies, Δf_f/(4Δf) = (2λu²Δf)^{-1} and ΔC_s/C_s = (λ³u⁴C_s)^{-1} (for a π/4 phase change, with Δf_f the Fourier image period), are strongly dependent on the spatial frequency u. Note that ΔC_s(u_p)/C_s ≈ 10%, independent of C_s and λ. Note also that the number n of identical high-contrast spurious Fourier images within the depth of field Δz = (αu)^{-1} (α the beam divergence) decreases with increasing high voltage, since n = 2Δz/Δf_f = θ/α = λu/α (θ the scattering angle). Thus image matching becomes easier in semiconductors at higher voltage because there are fewer high-contrast identical images in any focal series.
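As a small numeric illustration of the final point, the relation n = λu/α does give fewer identical Fourier images at higher voltage, assuming an illustrative beam divergence α = 0.5 mrad, u taken at the 2.6 Å point resolution, and the standard electron wavelengths at 200 kV and 400 kV; these particular values are assumptions for the sketch, not from the abstract.

```python
# Illustrative check of n = lambda*u/alpha, the number of identical
# Fourier images within the depth of field. alpha is an assumed value;
# the wavelengths are the standard relativistic values at each voltage.
wavelength = {"200 kV": 0.0251, "400 kV": 0.0164}   # electron wavelength, Angstrom
u = 1 / 2.6          # spatial frequency at the 2.6 A point resolution, 1/Angstrom
alpha = 0.5e-3       # beam divergence, rad (illustrative)

for voltage, lam in wavelength.items():
    theta = lam * u                  # scattering angle theta = lambda*u (rad)
    n = theta / alpha                # n = 2*dz/df_f = theta/alpha = lambda*u/alpha
    print(f"{voltage}: theta = {theta*1e3:.2f} mrad, n = {n:.1f}")
```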

