Depth Image–Based Deep Learning of Grasp Planning for Textureless Planar-Faced Objects in Vision-Guided Robotic Bin-Picking

Bin-picking of small parcels and other textureless planar-faced objects is a common task at warehouses. A general color image–based vision-guided robot picking system requires feature extraction and goal image preparation of various objects. However, feature extraction for goal image matching is difficult for textureless objects. Further, prior preparation of huge numbers of goal images is impractical at a warehouse. In this paper, we propose a novel depth image–based vision-guided robot bin-picking system for textureless planar-faced objects. Our method uses a deep convolutional neural network (DCNN) model that is trained on 15,000 annotated depth images synthetically generated in a physics simulator to directly predict grasp points without object segmentation. Unlike previous studies that predicted grasp points for a robot suction hand with only one vacuum cup, our DCNN also predicts optimal grasp patterns for a hand with two vacuum cups (left cup on, right cup on, or both cups on). Further, we propose a surface feature descriptor to extract surface features (center position and normal) and refine the predicted grasp point position, removing the need for texture features for vision-guided robot control and sim-to-real modification for DCNN model training. Experimental results demonstrate the efficiency of our system, namely that a robot with 7 degrees of freedom can pick randomly posed textureless boxes in a cluttered environment with a 97.5% success rate at speeds exceeding 1000 pieces per hour.

Download Full-text

Astronaut Visual Tracking of Flying Assistant Robot in Space Station Based on Deep Learning and Probabilistic Model

International Journal of Aerospace Engineering ◽

10.1155/2018/6357185 ◽

2018 ◽

Vol 2018 ◽

pp. 1-17 ◽

Cited By ~ 5

Author(s):

Rui Zhang ◽

Zhaokui Wang ◽

Yulin Zhang

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Visual Tracking ◽

Probabilistic Model ◽

Color Image ◽

Space Station ◽

Depth Image ◽

Single Shot ◽

Occlusion Detection ◽

Robust Tracking

Real-time astronaut visual tracking is the most important prerequisite for flying assistant robot to follow and assist the served astronaut in the space station. In this paper, an astronaut visual tracking algorithm which is based on deep learning and probabilistic model is proposed. Fine-tuned with feature extraction layers’ parameters being initialized by ready-made model, an improved SSD (Single Shot Multibox Detector) network was proposed for robust astronaut detection in color image. Associating the detection results with synchronized depth image measured by RGB-D camera, a probabilistic model is presented to ensure accurate and consecutive tracking of the certain served astronaut. The algorithm runs 10 fps at Jetson TX2, and it was extensively validated by several datasets which contain most instances of astronaut activities. The experimental results indicate that our proposed algorithm achieves not only robust tracking of the specified person with diverse postures or dressings but also effective occlusion detection for avoiding mistaken tracking.

Download Full-text

Pose Estimation of Primitive-Shaped Objects from a Depth Image Using Superquadric Representation

Applied Sciences ◽

10.3390/app10165442 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5442

Author(s):

Ryo Hachiuma ◽

Hideo Saito

Keyword(s):

Pose Estimation ◽

Degrees Of Freedom ◽

Shape Representation ◽

Estimation Method ◽

Depth Image ◽

Six Degrees Of Freedom ◽

Depth Images ◽

Object Pose Estimation ◽

Primitive Shape ◽

Conventional Methods

This paper presents a method for estimating the six Degrees of Freedom (6DoF) pose of texture-less primitive-shaped objects from depth images. As the conventional methods for object pose estimation require rich texture or geometric features to the target objects, these methods are not suitable for texture-less and geometrically simple shaped objects. In order to estimate the pose of the primitive-shaped object, the parameters that represent primitive shapes are estimated. However, these methods explicitly limit the number of types of primitive shapes that can be estimated. We employ superquadrics as a primitive shape representation that can represent various types of primitive shapes with only a few parameters. In order to estimate the superquadric parameters of primitive-shaped objects, the point cloud of the object must be segmented from a depth image. It is known that the parameter estimation is sensitive to outliers, which are caused by the miss-segmentation of the depth image. Therefore, we propose a novel estimation method for superquadric parameters that are robust to outliers. In the experiment, we constructed a dataset in which the person grasps and moves the primitive-shaped objects. The experimental results show that our estimation method outperformed three conventional methods and the baseline method.

Download Full-text

Using nonlocal filtering and feature extraction approaches in three-dimensional face recognition by Kinect

International Journal of Advanced Robotic Systems ◽

10.1177/1729881418787743 ◽

2018 ◽

Vol 15 (4) ◽

pp. 172988141878774 ◽

Cited By ~ 2

Author(s):

Shahram Mohammadi ◽

Omid Gervei

Keyword(s):

Feature Extraction ◽

Face Recognition ◽

Recognition Rate ◽

Three Dimensional ◽

Component Analysis ◽

Block Matching ◽

Depth Image ◽

Support Vector ◽

Depth Images ◽

Dimensional Face

To use low-cost depth sensors such as Kinect for three-dimensional face recognition with an acceptable rate of recognition, the challenges of filling up nonmeasured pixels and smoothing of noisy data need to be addressed. The main goal of this article is presenting solutions for aforementioned challenges as well as offering feature extraction methods to reach the highest level of accuracy in the presence of different facial expressions and occlusions. To use this method, a domestic database was created. First, the noisy pixels-called holes-of depth image is removed by solving multiple linear equations resulted from the values of the surrounding pixels of the holes. Then, bilateral and block matching 3-D filtering approaches, as representatives of local and nonlocal filtering approaches, are used for depth image smoothing. Curvelet transform as a well-known nonlocal feature extraction technique applied on both RGB and depth images. Two unsupervised dimension reduction techniques, namely, principal component analysis and independent component analysis, are used to reduce the dimension of extracted features. Finally, support vector machine is used for classification. Experimental results show a recognition rate of 90% for just depth images and 100% when combining RGB and depth data of a Kinect sensor which is much higher than other recently proposed algorithms.

Download Full-text

Deep Learning-based Fast Grasp Planning for Robotic Bin-picking by Small Data Set without GPU

10.36227/techrxiv.14384864 ◽

2021 ◽

Author(s):

SHOGO ARAI ◽

ZHUANG FENG ◽

Fuyuki Tokuda ◽

Adam Purnomo ◽

Kazuhiro Kosuge

Keyword(s):

Neural Network ◽

Deep Learning ◽

Computation Time ◽

Depth Image ◽

Training Dataset ◽

Small Data ◽

Data Set ◽

Grasp Planning ◽

Mechanical Parts ◽

Bin Picking

<div>This paper proposes a deep learning-based fast grasp detection method with a small dataset for robotic bin-picking. We consider the problem of grasping stacked up mechanical parts on a planar workspace using a parallel gripper. In this paper, we use a deep neural network to solve the problem with a single depth image. To reduce the computation time, we propose an edge-based algorithm to generate potential grasps. Then, a convolutional neural network (CNN) is applied to evaluate the robustness of all potential grasps for bin-picking. Finally, the proposed method ranks them and the object is grasped by using the grasp with the highest score. In bin-picking experiments, we evaluate the proposed method with a 7-DOF manipulator using textureless mechanical parts with complex shapes. The success ratio of grasping is 97%, and the average computation time of CNN inference is less than 0.23[s] on a laptop PC without a GPU array. In addition, we also confirm that the proposed method can be applied to unseen objects which are not included in the training dataset. </div>

Download Full-text

Deep Learning-based Fast Grasp Planning for Robotic Bin-picking by Small Data Set without GPU

10.36227/techrxiv.14384864.v1 ◽

2021 ◽

Author(s):

SHOGO ARAI ◽

ZHUANG FENG ◽

Fuyuki Tokuda ◽

Adam Purnomo ◽

Kazuhiro Kosuge

Keyword(s):

Neural Network ◽

Deep Learning ◽

Computation Time ◽

Depth Image ◽

Training Dataset ◽

Small Data ◽

Data Set ◽

Grasp Planning ◽

Mechanical Parts ◽

Bin Picking

Download Full-text

Fast and Robust Bin-picking System for Densely Piled Industrial Objects *

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9327193 ◽

2020 ◽

Author(s):

Jiaxin Guo ◽

Lian Fu ◽

Mingkai Jia ◽

Kaijun Wang ◽

Shan Liu

Keyword(s):

Bin Picking ◽

Picking System

Download Full-text

RobotP: A Benchmark Dataset for 6D Object Pose Estimation

Sensors ◽

10.3390/s21041299 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1299

Author(s):

Honglin Yuan ◽

Tim Hoogenkamp ◽

Remco C. Veltkamp

Keyword(s):

Pose Estimation ◽

Ground Truth ◽

3D Models ◽

Depth Image ◽

Great Success ◽

Estimation Algorithms ◽

Depth Images ◽

Object Pose Estimation ◽

Image Pairs ◽

Bounding Boxes

Deep learning has achieved great success on robotic vision tasks. However, when compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset consisting of commonly used objects for benchmarking in 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups by the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested by the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and photo-realistic images are helpful in increasing the performance of pose estimation algorithms.

Download Full-text

HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints

Sensors ◽

10.3390/s21041356 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1356

Author(s):

Linda Christin Büker ◽

Finnja Zuber ◽

Andreas Hein ◽

Sebastian Fudickar

Keyword(s):

Color Images ◽

Depth Image ◽

Accuracy Evaluation ◽

Timed Up And Go ◽

Position Errors ◽

Depth Images ◽

Upper And Lower Extremities ◽

Rgb Images ◽

Human Joints ◽

Body Joints

With approaches for the detection of joint positions in color images such as HRNet and OpenPose being available, consideration of corresponding approaches for depth images is limited even though depth images have several advantages over color images like robustness to light variation or color- and texture invariance. Correspondingly, we introduce High- Resolution Depth Net (HRDepthNet)—a machine learning driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. Therefore, a dataset is created holding depth (and RGB) images recorded with subjects conducting the timed up and go test—an established geriatric assessment. The images were manually annotated RGB images. The training and evaluation were conducted with this dataset. For accuracy evaluation, detection of body joints was evaluated via COCO’s evaluation metrics and indicated that the resulting depth image-based model achieved better results than the HRNet trained and applied on corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).

Download Full-text

A Multi-Scale Feature Extraction-Based Normalized Attention Neural Network for Image Denoising

Electronics ◽

10.3390/electronics10030319 ◽

2021 ◽

Vol 10 (3) ◽

pp. 319

Author(s):

Yi Wang ◽

Xiao Song ◽

Guanghong Gong ◽

Ni Li

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Image Denoising ◽

Color Image ◽

Rapid Development ◽

Similarity Index ◽

Structural Similarity ◽

Convolutional Network ◽

Scale Feature ◽

Multi Scale

Due to the rapid development of deep learning and artificial intelligence techniques, denoising via neural networks has drawn great attention due to their flexibility and excellent performances. However, for most convolutional network denoising methods, the convolution kernel is only one layer deep, and features of distinct scales are neglected. Moreover, in the convolution operation, all channels are treated equally; the relationships of channels are not considered. In this paper, we propose a multi-scale feature extraction-based normalized attention neural network (MFENANN) for image denoising. In MFENANN, we define a multi-scale feature extraction block to extract and combine features at distinct scales of the noisy image. In addition, we propose a normalized attention network (NAN) to learn the relationships between channels, which smooths the optimization landscape and speeds up the convergence process for training an attention model. Moreover, we introduce the NAN to convolutional network denoising, in which each channel gets gain; channels can play different roles in the subsequent convolution. To testify the effectiveness of the proposed MFENANN, we used both grayscale and color image sets whose noise levels ranged from 0 to 75 to do the experiments. The experimental results show that compared with some state-of-the-art denoising methods, the restored images of MFENANN have larger peak signal-to-noise ratios (PSNR) and structural similarity index measure (SSIM) values and get better overall appearance.

Download Full-text