Real-Time 3D object detection using improved convolutional neural network based on image-driven point cloud

Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of the data are often obscured. Objective: In this work, we study 3D object detection from discrete, unordered, and sparse 3D point clouds. Methods: The proposed convolutional neural network (CNN) is composed of a frustum sequence module, a 3D instance segmentation module (S-NET), a 3D point cloud transformation module (T-NET), and a 3D bounding box estimation module (E-NET). The frustum sequence module determines the search space of the object. The 3D instance segmentation module performs instance segmentation of the point cloud. The transformation module and the 3D bounding box estimation module determine the 3D coordinates of the object. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while retaining real-time capability. Conclusion: We achieve real-time 3D object detection by proposing an improved CNN based on image-driven point clouds.
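
The idea of restricting the search space to a frustum can be illustrated with a minimal sketch. It is not the authors' frustum sequence module. It assumes a plain pinhole camera model, points already expressed in camera coordinates, and a 2D box supplied by some image detector. The function name `points_in_frustum` and the intrinsics in the usage lines are hypothetical.

```python
def points_in_frustum(points, box, fx, fy, cx, cy):
    """Keep the 3D points whose image projection falls inside a 2D box.

    points: iterable of (x, y, z) in camera coordinates, z pointing forward.
    box: (u_min, v_min, u_max, v_max) in pixels, e.g. from a 2D detector.
    fx, fy, cx, cy: pinhole intrinsics (assumed, not taken from the paper).
    """
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        if z <= 0:
            # Points at or behind the camera plane cannot lie in the frustum.
            continue
        # Pinhole projection of the 3D point onto the image plane.
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


# Usage with made-up intrinsics and a made-up 2D box.
cloud = [(0, 0, 10), (5, 0, 10), (0.5, 0.5, 10), (0, 0, -5), (1, 0, 20)]
candidates = points_in_frustum(cloud, box=(40, 40, 60, 60),
                               fx=100, fy=100, cx=50, cy=50)
```

Only the points kept by this filter would be passed on to later stages such as segmentation and box estimation, which is what makes the 2D detection "drive" the 3D search.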

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 171461-171470
Author(s):  
Dianwei Wang ◽  
Yanhui He ◽  
Ying Liu ◽  
Daxiang Li ◽  
Shiqian Wu ◽  
...  

2020 ◽  
Author(s):  
Joanna Stanisz ◽  
Konrad Lis ◽  
Tomasz Kryjak ◽  
Marek Gorgon

In this paper, we present our research on optimising a deep neural network for 3D object detection in point clouds. We used the quantisation and pruning techniques available in the Brevitas and PyTorch tools. The experiments were performed on the PointPillars network, which offers a reasonable compromise between detection accuracy and computational complexity. The aim of this work was to propose a variant of the network that we will ultimately implement in an FPGA device, allowing real-time LiDAR data processing with low energy consumption. The results indicate that even a significant quantisation from 32-bit floating point to 2-bit integer in the main part of the algorithm results in a 5%-9% decrease in detection accuracy while allowing an almost 16-fold reduction in model size.
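
The arithmetic behind the size reduction can be sketched as follows. This is plain Python and does not use the Brevitas or PyTorch APIs. The symmetric uniform quantiser, the function names, and the layer sizes are illustrative assumptions, not the scheme or the numbers from the paper. Going from 32 to 2 bits per weight gives exactly 32 / 2 = 16 times less storage, and the overall figure falls slightly below 16 as soon as some part of the model keeps a wider bit width.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantisation of floats to signed `bits`-bit integers.

    Returns (codes, scale) so that codes[i] * scale approximates values[i].
    Assumes bits >= 2; for 2 bits the codes lie in {-2, -1, 0, 1}.
    """
    q_min = -(2 ** (bits - 1))
    q_max = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / q_max if max_abs > 0 and q_max > 0 else 1.0
    # Python's round() rounds exact halves to the nearest even integer.
    codes = [min(q_max, max(q_min, round(v / scale))) for v in values]
    return codes, scale


def size_reduction(layers, baseline_bits=32):
    """Ratio of baseline model size to quantised model size.

    layers: list of (num_params, bits_after_quantisation) pairs.
    """
    baseline = sum(n for n, _ in layers) * baseline_bits
    quantized = sum(n * b for n, b in layers)
    return baseline / quantized


# Hypothetical weights and layer sizes, for illustration only.
codes, scale = quantize_uniform([0.8, -0.4, 0.1, -0.9], bits=2)
all_2bit = size_reduction([(1000, 2)])
mixed = size_reduction([(1_000_000, 2), (10_000, 8)])
```

In the mixed case a small 8-bit portion is enough to pull the ratio below 16, which is one plausible reading of "almost" in the abstract.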


2021 ◽  
Vol 1979 (1) ◽  
pp. 012020
Author(s):  
Gadug Sudhansu ◽  
A N Mohamed Zabeeulla ◽  
M N Nachappa

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6043
Author(s):  
Yujun Jiao ◽  
Zhishuai Yin

A two-phase cross-modality fusion detector is proposed in this study for robust and high-precision 3D object detection from RGB images and LiDAR point clouds. First, a two-stream fusion network is built into the Faster RCNN framework to perform accurate and robust 2D detection. The visible stream takes the RGB images as input, while the intensity stream is fed with intensity maps generated by projecting the reflection intensity of the point clouds to the front view. A multi-layer feature-level fusion scheme merges multi-modal features across multiple layers to enhance the expressiveness and robustness of the features from which region proposals are generated. Second, a decision-level fusion is implemented by projecting the 2D proposals into the point cloud space to generate 3D frustums, on which the second-phase 3D detector is built to perform instance segmentation and 3D-box regression on the filtered point cloud. The results on the KITTI benchmark show that features extracted from RGB images and intensity maps complement each other, and that the proposed detector achieves state-of-the-art performance on 3D object detection with a substantially lower running time than available competitors.
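
Building an intensity map from a point cloud can be sketched in a few lines. The abstract does not specify the projection, so this example assumes a pinhole model with points already transformed into the camera frame, and it keeps the intensity of the nearest point when several fall on the same pixel. The function name, the intrinsics, and the image size are hypothetical.

```python
import math


def intensity_map(points, width, height, fx, fy, cx, cy):
    """Rasterise LiDAR reflection intensity into a front-view image.

    points: iterable of (x, y, z, intensity) in camera coordinates.
    Returns a height x width list of lists; pixels with no point stay 0.0.
    """
    image = [[0.0] * width for _ in range(height)]
    nearest = [[float("inf")] * width for _ in range(height)]
    for x, y, z, intensity in points:
        if z <= 0:
            continue  # behind the camera, not visible in the front view
        col = math.floor(fx * x / z + cx)
        row = math.floor(fy * y / z + cy)
        inside = 0 <= col < width and 0 <= row < height
        if inside and z < nearest[row][col]:
            # The closest point wins when several project to one pixel.
            nearest[row][col] = z
            image[row][col] = intensity
    return image


# Tiny made-up example: a 4 x 3 image with unit focal length.
scan = [(0, 0, 1, 0.5), (0, 0, 2, 0.9), (-2, -1, 1, 0.3),
        (5, 0, 1, 0.7), (0, 0, -1, 1.0)]
front_view = intensity_map(scan, width=4, height=3, fx=1, fy=1, cx=2, cy=1)
```

The resulting image is aligned with the RGB frame under these assumptions, which is what lets a second network stream consume it alongside the visible image.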


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 893
Author(s):  
Li Wang ◽  
Ruifeng Li ◽  
Hezi Shi ◽  
Jingwen Sun ◽  
Lijun Zhao ◽  
...  

Environmental perception is a vital capability for service robots that work in indoor environments for long periods. General 3D reconstruction provides only a low-level geometric description that cannot convey semantics. In contrast, higher-level, human-like perception requires more abstract concepts, such as objects and scenes. Moreover, 2D object detection based on images fails to provide the actual position and size of an object, which are important for a robot's operation. In this paper, we focus on 3D object detection, regressing the object's category, 3D size, and spatial position through a convolutional neural network (CNN). We propose a multi-channel CNN for 3D object detection that fuses three input channels: RGB, depth, and bird's eye view (BEV) images. We also propose a method to generate 3D proposals from 2D proposals in the RGB image and a semantic prior. Training and testing are conducted on the modified NYU V2 dataset and the SUN RGB-D dataset to verify the effectiveness of the algorithm. We also carry out real-world experiments on a service robot, using the proposed 3D object detection method to enhance the robot's environmental perception.
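
A bird's eye view image can be produced from 3D points by rasterising them onto the ground plane. The sketch below is one common way to do that and is not taken from the paper. It assumes points with x to the right, y forward, and z up, and it stores the maximum height per cell. The function name, the ranges, and the cell size are hypothetical.

```python
import math


def bev_height_map(points, x_range, y_range, cell):
    """Rasterise 3D points into a top-down grid of maximum heights.

    points: iterable of (x, y, z) with z pointing up.
    x_range, y_range: (min, max) extents of the grid in metres.
    cell: side length of one grid cell in metres.
    Empty cells and cells whose points all lie below zero read as 0.0.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = math.ceil((x_max - x_min) / cell)
    rows = math.ceil((y_max - y_min) / cell)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # outside the area covered by the BEV image
        # Clamp the indices to guard against floating-point edge cases.
        col = min(cols - 1, int((x - x_min) / cell))
        row = min(rows - 1, int((y - y_min) / cell))
        grid[row][col] = max(grid[row][col], z)
    return grid


# Made-up points on a 2 m x 2 m area with 1 m cells.
room = [(0.5, 0.5, 1.0), (0.6, 0.4, 1.5), (1.5, 0.5, 0.2),
        (3.0, 3.0, 9.0), (0.5, 1.5, 0.7)]
bev = bev_height_map(room, x_range=(0, 2), y_range=(0, 2), cell=1.0)
```

A grid like this can be treated as a single-channel image and fed to a CNN next to the RGB and depth channels.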

