Towards dense people detection with deep learning and depth images

2021 ◽  
Vol 106 ◽  
pp. 104484
Author(s):  
David Fuentes-Jimenez ◽  
Cristina Losada-Gutierrez ◽  
David Casillas-Perez ◽  
Javier Macias-Guarasa ◽  
Daniel Pizarro ◽  
...  


Author(s):  
Yi Liu ◽  
Ming Cong ◽  
Hang Dong ◽  
Dong Liu

Purpose
The purpose of this paper is to propose a new method based on three-dimensional (3D) vision technologies and human-skill-integrated deep learning to solve assembly positioning tasks such as peg-in-hole.

Design/methodology/approach
A hybrid camera configuration provided global and local views. In the global view, eye-in-hand mode guided the peg into contact with the hole plate using 3D vision. Once the peg was in contact with the workpiece surface, eye-to-hand mode provided the local view to accomplish peg-hole positioning based on a trained CNN.

Findings
The assembly positioning experiments showed that the proposed method successfully distinguished the target hole from other holes of the same size using the CNN. The robot planned its motion according to the depth images and a human-skill guideline. The final positioning precision was sufficient for the robot to carry out force-controlled assembly.

Practical implications
The developed framework can have an important impact on the robotic assembly positioning process; combined with existing force-guided assembly technology, it forms a complete autonomous assembly pipeline.

Originality/value
This paper proposes a new approach to robotic assembly positioning based on 3D vision technologies and human-skill-integrated deep learning. A dual-camera swapping mode provides visual feedback for the entire assembly motion-planning process. The proposed workpiece positioning method offers effective disturbance rejection, autonomous motion planning and improved overall performance with depth-image feedback. The proposed peg-hole positioning method with integrated human skill avoids perceptual aliasing of the target and supports successive motion decisions for robotic assembly manipulation.
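The abstract does not include an implementation; as a hedged illustration of the CNN-based peg-hole discrimination step, the PyTorch sketch below scores depth-image patches of candidate holes so the target can be told apart from identically sized neighbours. The architecture, patch size, and number of candidates are assumptions, not the authors' network.

```python
# Minimal sketch (assumption: the paper does not publish its architecture).
# A small CNN scores depth-image patches of candidate holes, so the target
# hole can be distinguished from other holes of the same size.
import torch
import torch.nn as nn

class HolePatchClassifier(nn.Module):
    def __init__(self, num_candidates: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2),  # 1-channel depth patch
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_candidates)

    def forward(self, depth_patch: torch.Tensor) -> torch.Tensor:
        x = self.features(depth_patch)   # (B, 32, 1, 1)
        return self.head(x.flatten(1))   # logits over candidate holes

# Usage: score a batch of 64x64 depth patches and pick the predicted target.
model = HolePatchClassifier()
logits = model(torch.randn(8, 1, 64, 64))
target = logits.argmax(dim=1)
```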


2021 ◽  
Author(s):  
Kazuyuki Kaneda ◽  
Tatsuya Ooba ◽  
Hideki Shimada ◽  
Osamu Shiku ◽  
Yuji Teshima

2019 ◽  
Vol 1 (3) ◽  
pp. 883-903 ◽  
Author(s):  
Daulet Baimukashev ◽  
Alikhan Zhilisbayev ◽  
Askat Kuzdeuov ◽  
Artemiy Oleinikov ◽  
Denis Fadeyev ◽  
...  

Recognizing objects and estimating their poses have a wide range of applications in robotics. For instance, to grasp objects, robots need the position and orientation of objects in 3D. The task becomes challenging in a cluttered environment with different types of objects. A popular approach to this problem is to use a deep neural network for object recognition. However, deep learning-based object detection in cluttered environments requires a substantial amount of data, and collecting these data demands time and extensive human labor for manual labeling. In this study, our objective was the development and validation of a deep object recognition framework using a synthetic depth image dataset. We synthetically generated a depth image dataset of 22 objects randomly placed in a 0.5 m × 0.5 m × 0.1 m box, and automatically labeled all objects with an occlusion rate below 70%. A Faster Region-based Convolutional Neural Network (Faster R-CNN) architecture was adopted for training on a dataset of 800,000 synthetic depth images, and its performance was tested on a real-world depth image dataset of 2000 samples. The deep object recognizer achieved 40.96% detection accuracy on the real depth images and 93.5% on the synthetic depth images. Training the deep learning model with noise-added synthetic images improved the recognition accuracy on real images to 46.3%. The object detection framework can thus be trained on synthetically generated depth data and then employed for object recognition on real depth data in a cluttered environment. Synthetic depth data-based deep object detection has the potential to substantially decrease the time and human effort required for extensive data collection and labeling.
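The abstract reports that adding noise to the synthetic depth images raised real-image accuracy from 40.96% to 46.3%. Below is a minimal sketch of such a noise-injection step; the Gaussian-plus-dropout noise model and its parameters are assumptions, since the abstract does not specify the noise actually used.

```python
# Minimal sketch of the noise augmentation step (the noise model is an
# assumption; the paper only states that noise-added synthetic images
# improved recognition accuracy on real depth images).
import numpy as np

def add_depth_noise(depth: np.ndarray, sigma_mm: float = 5.0,
                    dropout_prob: float = 0.01) -> np.ndarray:
    """Perturb a clean synthetic depth map so it better resembles sensor output."""
    noisy = depth + np.random.normal(0.0, sigma_mm, size=depth.shape)
    # Simulate the invalid pixels (holes) that real depth sensors produce.
    mask = np.random.rand(*depth.shape) < dropout_prob
    noisy[mask] = 0.0
    return noisy.clip(min=0.0)

clean = np.random.uniform(400, 900, size=(480, 640))  # synthetic depth in mm
noisy = add_depth_noise(clean)
```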


2018 ◽  
Vol 147 ◽  
pp. 51-63 ◽  
Author(s):  
Chan Zheng ◽  
Xunmu Zhu ◽  
Xiaofan Yang ◽  
Lina Wang ◽  
Shuqin Tu ◽  
...  

2020 ◽  
Vol 10 (16) ◽  
pp. 5426 ◽  
Author(s):  
Qiang Liu ◽  
Haidong Zhang ◽  
Yiming Xu ◽  
Li Wang

Recently, deep learning frameworks have been deployed in visual odometry systems and have achieved results comparable to traditional feature-matching-based systems. However, most deep learning-based frameworks inevitably need labeled data as ground truth for training. Moreover, monocular odometry systems cannot recover absolute scale; external or prior information has to be introduced for scale recovery. To address these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are: (i) during network training and pose estimation, the depth images are fed into the network to form a dual-stream structure with the RGB images, and a dual-stream deep neural network is proposed; (ii) the system adopts an unsupervised end-to-end training method, so the labor-intensive data-labeling task is not required. We tested our system on the KITTI dataset, and the results show that the proposed RGB-D Visual Odometry (VO) system has clear advantages over other state-of-the-art systems in terms of both translation and rotation errors.
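As a rough illustration of the dual-stream idea, the sketch below encodes an RGB frame pair and a depth frame pair in separate convolutional streams and fuses their features before a 6-DoF pose regression head. All layer sizes are assumptions; the authors' actual architecture and unsupervised losses are not described in the abstract.

```python
# Minimal sketch of a dual-stream RGB-D pose network (layer sizes assumed):
# RGB and depth streams are encoded separately, then fused for pose regression.
import torch
import torch.nn as nn

def make_stream(in_channels: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 7, stride=2, padding=3), nn.ReLU(),
        nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class DualStreamVO(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = make_stream(6)    # two stacked RGB frames
        self.depth_stream = make_stream(2)  # two stacked depth frames
        self.pose = nn.Linear(64 + 64, 6)   # translation (3) + rotation (3)

    def forward(self, rgb_pair: torch.Tensor, depth_pair: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rgb_stream(rgb_pair),
                           self.depth_stream(depth_pair)], dim=1)
        return self.pose(fused)

# Usage: estimate relative pose between two consecutive RGB-D frames.
pose = DualStreamVO()(torch.randn(1, 6, 128, 416), torch.randn(1, 2, 128, 416))
```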


Author(s):  
Vinícius da Silva Ramalho ◽  
Rômulo Francisco Lepinsk Lopes ◽  
Ricardo Luhm Silva ◽  
Marcelo Rudek

Synthetic datasets have been used to train 2D and 3D image-based deep learning models, and they also serve as performance benchmarks. Although some authors already use 3D models to develop navigation systems, their applications do not consider the noise sources that affect 3D sensors. Time-of-Flight sensors are susceptible to noise, and conventional filters have limitations that depend on the scenario in which they are applied. Deep learning filters, on the other hand, can be more invariant to changes and can take contextual information into account to attenuate noise. However, training a deep learning filter requires a noiseless ground truth, which would demand highly accurate hardware. Synthetic datasets come with ground-truth data, and similar noise can be applied to them, creating a noisy dataset for a deep learning approach. This research explores training a noise-removal model using deep learning, trained only on the Flying Things synthetic dataset with ground-truth data and random noise applied to it. The trained model is validated on the Middlebury dataset, which contains real-world data. The results show that training the deep learning architecture for noise removal with only a synthetic dataset can achieve near-state-of-the-art performance, and the proposed model processes 12-bit depth images instead of 8-bit images. Future studies will evaluate the algorithm's real-time noise-removal performance to enable embedded applications.
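A minimal sketch of the training setup described above: random noise is applied to clean synthetic depth maps, and the network is supervised to recover the noiseless ground truth, with depth kept at 12-bit resolution (normalized by 4095 rather than 255). The tiny convolutional denoiser and the noise magnitude are placeholders, not the paper's architecture.

```python
# Minimal sketch of denoiser training on synthetic ground truth (the model
# and noise magnitude are assumptions; only the overall scheme comes from
# the abstract: noise is added to clean synthetic depth, and the network
# learns to recover the clean 12-bit map).
import torch
import torch.nn as nn

denoiser = nn.Sequential(  # stand-in for the paper's architecture
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

def train_step(clean_depth_12bit: torch.Tensor) -> float:
    clean = clean_depth_12bit / 4095.0              # 12-bit -> [0, 1]
    noisy = clean + 0.02 * torch.randn_like(clean)  # assumed noise model
    loss = nn.functional.l1_loss(denoiser(noisy), clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a fake 12-bit synthetic crop standing in for Flying Things data.
batch = torch.randint(0, 4096, (4, 1, 128, 128)).float()
print(train_step(batch))
```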

