Implementation of the PointPillars Network for 3D Object Detection in Reprogrammable Heterogeneous Devices Using FINN

Author(s):  
Joanna Stanisz ◽  
Konrad Lis ◽  
Marek Gorgon

In this paper, we present a hardware-software implementation of a deep neural network for object detection based on a point cloud obtained by a LiDAR sensor. The PointPillars network was used in the research, as it is a reasonable compromise between detection accuracy and computational complexity. The Brevitas / PyTorch tools were used for network quantisation (described in our previous paper) and the FINN tool for hardware implementation in the reprogrammable Zynq UltraScale+ MPSoC device. The obtained results show that a considerable reduction in computation precision, along with a few simplifications of the network architecture, allows the solution to be implemented on a heterogeneous embedded platform with a maximum AP loss of 19% in 3D, a maximum AP loss of 8% in BEV, and an execution time of 375 ms (of which the FPGA part takes 262 ms). We have also compared the inference speed of our solution with that of a Vitis AI implementation proposed by Xilinx (19 Hz frame rate). In particular, we have thoroughly investigated the fundamental causes of the difference in frame rate between the two solutions. The code is available at https://github.com/vision-agh/pp-finn.
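To put the timing figures side by side, the short sketch below converts the reported 375 ms execution time into a frame rate and compares it with the 19 Hz quoted for the Vitis AI implementation. It assumes frames are processed strictly one after another, with no overlap between the processor and FPGA parts; the abstract does not state this, so the result is only an estimate. The function name `frame_rate_hz` is illustrative.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
TOTAL_MS = 375.0     # end-to-end execution time per frame
FPGA_MS = 262.0      # portion of that time spent in the FPGA part
VITIS_AI_HZ = 19.0   # frame rate quoted for the Vitis AI implementation


def frame_rate_hz(latency_ms: float) -> float:
    """Frames per second, assuming frames are processed sequentially."""
    return 1000.0 / latency_ms


finn_hz = frame_rate_hz(TOTAL_MS)    # estimated frame rate of the FINN solution
fpga_share = FPGA_MS / TOTAL_MS      # fraction of the latency spent in the FPGA part
speed_gap = VITIS_AI_HZ / finn_hz    # how many times faster the Vitis AI version runs

print(f"FINN frame rate: {finn_hz:.2f} Hz")
print(f"FPGA share of latency: {fpga_share:.1%}")
print(f"Vitis AI / FINN frame-rate ratio: {speed_gap:.2f}x")
```

Under this sequential-processing assumption, the FINN solution runs at a few frames per second, roughly seven times slower than the Vitis AI figure, which is the gap whose causes the paper investigates.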

2020 ◽  
Author(s):  
Joanna Stanisz ◽  
Konrad Lis ◽  
Tomasz Kryjak ◽  
Marek Gorgon

In this paper, we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud. Quantisation and pruning techniques available in the Brevitas and PyTorch tools were used. We performed the experiments on the PointPillars network, which offers a reasonable compromise between detection accuracy and computational complexity. The aim of this work was to propose a variant of the network that we will ultimately implement in an FPGA device, allowing real-time LiDAR data processing with low energy consumption. The obtained results indicate that even a significant quantisation, from 32-bit floating point to 2-bit integer in the main part of the algorithm, results in only a 5%-9% decrease in detection accuracy, while allowing an almost 16-fold reduction in the size of the model.
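The "almost 16-fold" figure follows from the bit widths: going from 32 bits to 2 bits per weight is a 16-fold reduction, and the overall ratio falls slightly short of 16 if some part of the network keeps a higher precision. The sketch below works through that arithmetic. The parameter counts and the 8-bit width of the remaining layers are hypothetical values chosen for illustration, not numbers from the paper.

```python
def size_in_bits(layers):
    """Total storage for a list of (parameter_count, bits_per_parameter) pairs."""
    return sum(count * bits for count, bits in layers)


# Hypothetical split: most weights sit in the main part of the network
# (quantised to 2 bits), a small remainder is kept at 8 bits.
MAIN_PARAMS = 4_800_000    # assumed, not from the paper
OTHER_PARAMS = 50_000      # assumed, not from the paper

float_model = [(MAIN_PARAMS, 32), (OTHER_PARAMS, 32)]
quantised_model = [(MAIN_PARAMS, 2), (OTHER_PARAMS, 8)]

ratio = size_in_bits(float_model) / size_in_bits(quantised_model)
print(f"Size reduction: {ratio:.2f}x")
```

With these assumed counts the reduction comes out a little below 16, which illustrates why a mixed-precision model lands at "almost" rather than exactly 16-fold.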


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wanyi Zhang ◽  
Xiuhua Fu ◽  
Wei Li

3D object detection based on point cloud data in unmanned driving scenes has long been a research hotspot in unmanned driving sensing technology. As deep neural network technology has developed and matured, neural-network-based detection of three-dimensional objects has begun to show great advantages. Experimental results show that a mismatch between anchors and training samples affects detection accuracy, but this problem has not been well solved. The contributions of this paper are as follows. First, deformable convolution is introduced into a point cloud object detection network for the first time, which enhances the adaptability of the network to vehicles with different orientations and shapes. Second, a new anchor generation method for the RPN is proposed, which effectively prevents mismatches between anchors and ground truth and removes the angle classification loss from the loss function. Compared with the state-of-the-art method, the AP and AOS of the detection results are improved.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Huaijin Liu ◽  
Jixiang Du ◽  
Yong Zhang ◽  
Hongbo Zhang

Currently, there are many kinds of voxel-based multisensor 3D object detectors, while point-based multisensor 3D object detectors have not been fully studied. In this paper, we propose a new two-stage 3D object detection method based on point cloud and image fusion to improve detection accuracy. To address the insufficient semantic information of point clouds, we perform multiscale deep fusion of LiDAR points and camera images in a point-wise manner to enhance point features. Because LiDAR points are unevenly distributed, the point cloud of objects in long-distance areas is sparse. We therefore design a point cloud completion module that predicts the spatial shape of objects in the candidate boxes and extracts structural information, improving the feature representation and further refining the boxes. The framework is evaluated on the widely used KITTI and SUN-RGBD datasets. Experimental results show that our method outperforms all state-of-the-art point-based 3D object detection methods and has comparable performance to voxel-based methods as well.


Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of 3D data are often obscured. Objective: In this work, we aimed to study 3D object detection from discrete, disordered and sparse 3D point clouds. Methods: The CNN is composed of the frustum sequence module, the 3D instance segmentation module S-NET, the 3D point cloud transformation module T-NET, and the 3D bounding box estimation module E-NET. The search space of the object is determined by the frustum sequence module. The instance segmentation of the point cloud is performed by the 3D instance segmentation module. The 3D coordinates of the object are confirmed by the transformation module and the 3D bounding box estimation module. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while having real-time capability. Conclusion: We achieve real-time 3D object detection by proposing an improved convolutional neural network (CNN) based on image-driven point clouds.


2021 ◽  
Author(s):  
Xinrui Yan ◽  
Yuhao Huang ◽  
Shitao Chen ◽  
Zhixiong Nan ◽  
Jingmin Xin ◽  
...  

Sensors ◽  
2019 ◽  
Vol 19 (19) ◽  
pp. 4093 ◽  
Author(s):  
Jun Xu ◽  
Yanxin Ma ◽  
Songhua He ◽  
Jiahua Zhu

Three-dimensional (3D) object detection is an important research topic in 3D computer vision with significant applications in many fields, such as automatic driving, robotics, and human–computer interaction. However, low precision remains an urgent problem in the field of 3D object detection. To address it, we present a framework for 3D object detection in point clouds. Specifically, a purpose-designed backbone network fuses low-level and high-level features, making full use of the information each provides. Moreover, the two-dimensional (2D) Generalized Intersection over Union is extended to 3D and used as part of the loss function in our framework. Experiments on Car, Cyclist, and Pedestrian detection have been conducted on the KITTI benchmark. Experimental results in terms of average precision (AP) show the effectiveness of the proposed network.
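The extension of Generalized Intersection over Union (GIoU) from 2D to 3D can be sketched as follows. This is a minimal illustration for axis-aligned boxes only, where areas are replaced by volumes; the paper's loss may treat rotated boxes and other details differently. The box layout `(x1, y1, z1, x2, y2, z2)` and the function names are assumptions made for this example.

```python
def volume(box):
    """Volume of an axis-aligned box given as (x1, y1, z1, x2, y2, z2)."""
    x1, y1, z1, x2, y2, z2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1) * max(0.0, z2 - z1)


def giou_3d(a, b):
    """GIoU of two axis-aligned 3D boxes: IoU - (hull - union) / hull."""
    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter = 1.0
    for i in range(3):
        inter *= max(0.0, min(a[i + 3], b[i + 3]) - max(a[i], b[i]))

    union = volume(a) + volume(b) - inter
    if union <= 0.0:
        raise ValueError("both boxes have zero volume")

    # Smallest axis-aligned box enclosing both inputs.
    hull = 1.0
    for i in range(3):
        hull *= max(a[i + 3], b[i + 3]) - min(a[i], b[i])

    return inter / union - (hull - union) / hull


def giou_loss_3d(a, b):
    """Loss term that is 0 for identical boxes and grows as boxes separate."""
    return 1.0 - giou_3d(a, b)


print(giou_3d((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2)))   # overlapping boxes
print(giou_3d((0, 0, 0, 1, 1, 1), (2, 0, 0, 3, 1, 1)))   # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of disjoint boxes, GIoU becomes negative and keeps decreasing as the boxes move apart, so the loss still provides a useful signal when a prediction does not overlap the ground truth.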


Author(s):  
Xin Zhao ◽  
Zhe Liu ◽  
Ruolan Hu ◽  
Kaiqi Huang

3D object detection plays an important role in a large number of real-world applications. It requires estimating the locations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture that uses front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is used to improve the performance of 3D segmentation. It captures information from different orientations in space and is robust to shapes of different scales. On the other hand, our network retains the useful features and suppresses less informative ones by means of a SENet module. This module reweights channel features and estimates the 3D bounding boxes more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results show that our method achieves better performance than the state-of-the-art methods, especially when point clouds are highly sparse.


2020 ◽  
Vol 34 (07) ◽  
pp. 10478-10485 ◽  
Author(s):  
Yingjie Cai ◽  
Buyu Li ◽  
Zeyu Jiao ◽  
Hongsheng Li ◽  
Xingyu Zeng ◽  
...  

The monocular 3D object detection task aims to predict the 3D bounding boxes of objects from monocular RGB images. Since recovering location in 3D space is difficult in the absence of depth information, this paper proposes a novel unified framework which decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Unlike the widely studied 2D bounding boxes, the proposed structured polygon in the 2D image consists of several projected surfaces of the target object. Compared to the widely used 3D bounding box proposals, it is shown to be a better representation for 3D detection. To inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses an object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments are conducted on the challenging KITTI benchmark, on which our method achieves state-of-the-art detection accuracy.
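The idea of recovering depth from an object height prior can be illustrated with the basic pinhole camera relation: an object of physical height H that spans h pixels in the image lies at depth Z = f · H / h, where f is the focal length in pixels. The sketch below applies this relation and then back-projects a pixel to 3D. It is a simplified illustration, not the paper's method, which works with the full camera projection matrix; the intrinsics, the height prior and the function names are hypothetical.

```python
def depth_from_height(focal_px: float, real_height_m: float, pixel_height: float) -> float:
    """Pinhole relation Z = f * H / h for an upright object facing the camera."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * real_height_m / pixel_height


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float):
    """Map pixel (u, v) at a known depth to camera coordinates (X, Y, Z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics and height prior, chosen for easy arithmetic.
FX = FY = 720.0
CX, CY = 620.0, 180.0
CAR_HEIGHT_M = 1.5

z = depth_from_height(FY, CAR_HEIGHT_M, pixel_height=54.0)
point = back_project(CX + 36.0, CY, z, FX, FY, CX, CY)
print(z, point)
```

The same relation shows why the height prior matters: without a known physical height, the pixel height alone cannot distinguish a small nearby object from a large distant one.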

