Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching

Author(s):  
Andy Zeng ◽  
Shuran Song ◽  
Kuan-Ting Yu ◽  
Elliott Donlon ◽  
Francois R. Hogan ◽  
...  
2019 ◽  
pp. 027836491986801 ◽  

This article presents a robotic pick-and-place system that is capable of grasping and recognizing both known and novel objects in cluttered environments. The key new feature of the system is that it handles a wide range of object categories without needing any task-specific training data for novel objects. To achieve this, it first uses an object-agnostic grasping framework to map from visual observations to actions: inferring dense pixel-wise probability maps of the affordances for four different grasping primitive actions. It then executes the action with the highest affordance and recognizes picked objects with a cross-domain image classification framework that matches observed images to product images. Since product images are readily available for a wide range of objects (e.g., from the web), the system works out-of-the-box for novel objects without requiring any additional data collection or re-training. Exhaustive experimental results demonstrate that our multi-affordance grasping achieves high success rates for a wide variety of objects in clutter, and our recognition algorithm achieves high accuracy for both known and novel grasped objects. The approach was part of the MIT–Princeton Team system that took first place in the stowing task at the 2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are available online at http://arc.cs.princeton.edu/
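For illustration, here is a minimal sketch of the multi-affordance action selection described above: pick the grasping primitive and the pixel with the highest predicted affordance across the four dense probability maps. The primitive names, array shapes, and the `select_grasp` helper are assumptions for this example, not the team's released code (which is available at the URL above).

```python
import numpy as np

def select_grasp(affordance_maps):
    """Pick the primitive and pixel with the highest predicted affordance.

    affordance_maps: dict mapping a primitive name to an HxW array of
    per-pixel success probabilities. Illustrative sketch only.
    """
    best = None
    for primitive, prob_map in affordance_maps.items():
        # Most promising pixel for this primitive.
        y, x = np.unravel_index(np.argmax(prob_map), prob_map.shape)
        score = prob_map[y, x]
        if best is None or score > best[0]:
            best = (score, primitive, (y, x))
    score, primitive, pixel = best
    return primitive, pixel, score

# Toy usage: random maps stand in for the network's affordance predictions.
maps = {name: np.random.rand(480, 640)
        for name in ["suction-down", "suction-side", "grasp-down", "flush-grasp"]}
print(select_grasp(maps))
```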


2019 ◽  
Vol 11 (19) ◽  
pp. 2243 ◽  
Author(s):  
Weiquan Liu ◽  
Cheng Wang ◽  
Xuesheng Bian ◽  
Shuting Chen ◽  
Wei Li ◽  
...  

Establishing the spatial relationship between 2D images captured by real cameras and 3D models of the environment (2D and 3D space) is one way to achieve virtual–real registration for Augmented Reality (AR) in outdoor environments. In this paper, we propose to match 2D images captured by real cameras against images rendered from a 3D image-based point cloud, thereby indirectly establishing the spatial relationship between 2D and 3D space. We refer to these two kinds of images as cross-domain images because their imaging mechanisms and nature are quite different. Unlike real camera images, however, the images rendered from the 3D image-based point cloud are inevitably contaminated by distortion, reduced resolution, and obstructions, which makes image matching with handcrafted descriptors or existing feature-learning neural networks very challenging. Thus, we first propose a novel end-to-end network, AE-GAN-Net, consisting of two AutoEncoders (AEs) with Generative Adversarial Network (GAN) embedding, to learn invariant feature descriptors for cross-domain image matching. Second, a domain-consistent loss function, which balances image content and consistency of feature descriptors for cross-domain image pairs, is introduced to optimize AE-GAN-Net. AE-GAN-Net effectively captures domain-specific information, which is embedded into the learned feature descriptors, making them robust against image distortion and against variations in viewpoint, spatial resolution, rotation, and scaling. Experimental results show that AE-GAN-Net achieves state-of-the-art performance for image patch retrieval on a cross-domain image patch dataset built from real camera images and images rendered from the 3D image-based point cloud. Finally, by evaluating virtual–real registration for AR on a campus using the cross-domain image matching results, we demonstrate the feasibility of applying the proposed virtual–real registration to AR in outdoor environments.
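As a rough, hedged sketch of what a domain-consistent loss of this kind could look like (the exact terms and weights in AE-GAN-Net may differ; `alpha`, `beta`, and the function name are illustrative assumptions): it combines a content term, in which each AutoEncoder reconstructs its own domain, with a consistency term that pulls the descriptors of a matching cross-domain pair together.

```python
import torch.nn.functional as F

def domain_consistent_loss(real_img, rendered_img,
                           real_recon, rendered_recon,
                           real_desc, rendered_desc,
                           alpha=1.0, beta=1.0):
    """Illustrative sketch of a loss balancing image content and descriptor
    consistency for a cross-domain (real camera / rendered) image pair.
    Not the authors' exact formulation; alpha and beta are assumed weights.
    """
    # Content term: each AutoEncoder should reconstruct its own domain.
    content = (F.l1_loss(real_recon, real_img)
               + F.l1_loss(rendered_recon, rendered_img))
    # Consistency term: descriptors of a matching cross-domain pair should
    # agree despite the different imaging mechanisms.
    consistency = F.mse_loss(real_desc, rendered_desc)
    return alpha * content + beta * consistency
```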


2011 ◽  
Vol 30 (6) ◽  
pp. 1-10 ◽  
Author(s):  
Abhinav Shrivastava ◽  
Tomasz Malisiewicz ◽  
Abhinav Gupta ◽  
Alexei A. Efros

Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 380 ◽  
Author(s):  
Agnese Chiatti ◽  
Gianluca Bardaro ◽  
Emanuele Bastianelli ◽  
Ilaria Tiddi ◽  
Prasenjit Mitra ◽  
...  

To assist humans with their daily tasks, mobile robots are expected to navigate complex and dynamic environments that present unpredictable combinations of known and unknown objects. Most state-of-the-art object recognition methods are unsuitable for this scenario because they require that (i) all target object classes are known beforehand and (ii) a vast number of training examples is provided for each class. This calls for novel methods that can handle unknown object classes for which only a few images are initially available (few-shot recognition). One way of tackling the problem is learning how to match novel objects to their most similar supporting example. Here, we compare different (shallow and deep) approaches to few-shot image matching on a novel data set consisting of 2D views of common object types drawn from a combination of ShapeNet and Google. First, we assess whether the object similarity learned on this data set can scale up to new object classes, i.e., categories unseen at training time. Furthermore, we show how normalising the learned embeddings affects the generalisation abilities of the tested methods, in the context of two novel configurations: (i) where the weights of a Convolutional two-branch Network are imprinted, and (ii) where the embeddings of a Convolutional Siamese Network are L2-normalised.
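To make the matching step concrete, the following is a minimal sketch, assuming some encoder has already produced embeddings, of matching a query object to its most similar support example after L2-normalisation (so the dot product reduces to cosine similarity); the function name, shapes, and labels are illustrative, not the paper's code.

```python
import numpy as np

def match_to_support(query_emb, support_embs, support_labels):
    """Match a query embedding to its most similar support example.

    Embeddings are L2-normalised first, so the dot product is cosine
    similarity, mirroring the L2-normalisation configuration above.
    """
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = s @ q                       # cosine similarity to each support example
    best = int(np.argmax(sims))
    return support_labels[best], float(sims[best])

# Toy usage: 5 support classes with 128-D embeddings from some encoder.
support = np.random.randn(5, 128)
labels = ["mug", "bottle", "phone", "book", "shoe"]
print(match_to_support(np.random.randn(128), support, labels))
```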



IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 17681-17698 ◽  
Author(s):  
Jing Li ◽  
Congcong Li ◽  
Tao Yang ◽  
Zhaoyang Lu

2019 ◽  
Vol 127 (11-12) ◽  
pp. 1738-1750 ◽  
Author(s):  
Bailey Kong ◽  
James Supanc̆ic̆ ◽  
Deva Ramanan ◽  
Charless C. Fowlkes

IEEE Access ◽  
2017 ◽  
Vol 5 ◽  
pp. 23190-23203 ◽  
Author(s):  
Jing Li ◽  
Congcong Li ◽  
Tao Yang ◽  
Zhaoyang Lu

Author(s):  
A. Olsen ◽  
J.C.H. Spence ◽  
P. Petroff

Since the point resolution of the JEOL 200CX electron microscope is u_p = 2.6 Å, it is not possible to obtain a true structure image of any of the III–V or elemental semiconductors with this machine. Since the information resolution limit set by electronic instability (1) is u_0 = (2/πλΔ)^{1/2} = 1.4 Å for Δ = 50 Å, it is however possible to obtain, by choice of focus and thickness, clear lattice images both resembling (see figure 2(b)), and not resembling, the true crystal structure (see (2) for an example of a Fourier image which is structurally incorrect). The crucial difficulty in using the information between u_p and u_0 is the fractional accuracy with which Δf and C_s must be determined; these accuracies, Δf_f/(4Δf) = (2λu²Δf)^{-1} and ΔC_s/C_s = (λ³u⁴C_s)^{-1} (for a π/4 phase change, with Δf_f the Fourier image period), are strongly dependent on the spatial frequency u. Note that ΔC_s(u_p)/C_s ≈ 10%, independent of C_s and λ. Note also that the number n of identical high-contrast spurious Fourier images within the depth of field Δz = (αu)^{-1} (α the beam divergence) decreases with increasing high voltage, since n = 2Δz/Δf_f = θ/α = λu/α (θ the scattering angle). Thus image matching becomes easier in semiconductors at higher voltage because there are fewer high-contrast identical images in any focal series.
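As a small numeric illustration of the final point, the relation n = λu/α does give fewer identical Fourier images at higher voltage, assuming an illustrative beam divergence α = 0.5 mrad, u taken at the 2.6 Å point resolution, and the standard electron wavelengths at 200 kV and 400 kV; these particular values are assumptions for the sketch, not from the abstract.

```python
# Illustrative check of n = lambda*u/alpha, the number of identical
# Fourier images within the depth of field. alpha is an assumed value;
# the wavelengths are the standard relativistic values at each voltage.
wavelength = {"200 kV": 0.0251, "400 kV": 0.0164}   # electron wavelength, Angstrom
u = 1 / 2.6          # spatial frequency at the 2.6 A point resolution, 1/Angstrom
alpha = 0.5e-3       # beam divergence, rad (illustrative)

for voltage, lam in wavelength.items():
    theta = lam * u                  # scattering angle theta = lambda*u (rad)
    n = theta / alpha                # n = 2*dz/df_f = theta/alpha = lambda*u/alpha
    print(f"{voltage}: theta = {theta*1e3:.2f} mrad, n = {n:.1f}")
```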

