A Novel Regional Fusion Network for 3D Object Detection based on RGB Images and Point Clouds

Optimization of the PointPillars network for 3D object detection in point clouds

10.36227/techrxiv.12593555.v1 ◽

2020 ◽

Author(s):

Joanna Stanisz ◽

Konrad Lis ◽

Tomasz Kryjak ◽

Marek Gorgon

Keyword(s):

Object Detection ◽

Point Cloud ◽

Main Part ◽

Point Clouds ◽

Lidar Data ◽

Detection Accuracy ◽

3D Object ◽

Fold Reduction ◽

Low Energy Consumption ◽

3D Object Detection

In this paper we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud. Techniques like quantisation and pruning available in the Brevitas and PyTorch tools were used. We performed the experiments for the PointPillars network, which offers a reasonable compromise between detection accuracy and calculation complexity. The aim of this work was to propose a variant of the network which we will ultimately implement in an FPGA device. This will allow for real-time LiDAR data processing with low energy consumption. The obtained results indicate that even a significant quantisation from 32-bit floating point to 2-bit integer in the main part of the algorithm, results in 5%-9% decrease of the detection accuracy, while allowing for almost a 16-fold reduction in size of the model.

Download Full-text

A Two-Phase Cross-Modality Fusion Network for Robust 3D Object Detection

Sensors ◽

10.3390/s20216043 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6043

Author(s):

Yujun Jiao ◽

Zhishuai Yin

Keyword(s):

Object Detection ◽

Point Cloud ◽

Point Clouds ◽

Second Phase ◽

Two Phase ◽

3D Object ◽

Rgb Images ◽

Fusion Scheme ◽

3D Object Detection ◽

Level Fusion

A two-phase cross-modality fusion detector is proposed in this study for robust and high-precision 3D object detection with RGB images and LiDAR point clouds. First, a two-stream fusion network is built into the framework of Faster RCNN to perform accurate and robust 2D detection. The visible stream takes the RGB images as inputs, while the intensity stream is fed with the intensity maps which are generated by projecting the reflection intensity of point clouds to the front view. A multi-layer feature-level fusion scheme is designed to merge multi-modal features across multiple layers in order to enhance the expressiveness and robustness of the produced features upon which region proposals are generated. Second, a decision-level fusion is implemented by projecting 2D proposals to the space of the point cloud to generate 3D frustums, on the basis of which the second-phase 3D detector is built to accomplish instance segmentation and 3D-box regression on the filtered point cloud. The results on the KITTI benchmark show that features extracted from RGB images and intensity maps complement each other, and our proposed detector achieves state-of-the-art performance on 3D object detection with a substantially lower running time as compared to available competitors.

Download Full-text

PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud

Sensors ◽

10.3390/s21010136 ◽

2020 ◽

Vol 21 (1) ◽

pp. 136

Author(s):

Fangyu Li ◽

Weizheng Jin ◽

Cien Fan ◽

Lian Zou ◽

Qingsheng Chen ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Multi Scale ◽

Backbone Network ◽

3D Object Detection ◽

Different Levels ◽

Fine Branch

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although the one-stage 3D detector has satisfactory training and inference speed, there are still some performance problems due to insufficient utilization of bird’s eye view (BEV) information. In this paper, a new backbone network is proposed to complete the cross-layer fusion of multi-scale BEV feature maps, which makes full use of various information for detection. Specifically, our proposed backbone network can be divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serves as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method has better performance in both 3D and BEV object detection compared with some previous state-of-the-art methods. Experimental results with average precision (AP) prove the effectiveness of our network.

Download Full-text

Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. Encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas decoder fuses multiple feature maps from various scales by Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method has better performance on extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.

Download Full-text

Scale-Aware Attention-Based PillarsNet (SAPN) Based 3D Object Detection for Point Cloud

Mathematical Problems in Engineering ◽

10.1155/2020/3927365 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Xiang Song ◽

Weiqin Zhan ◽

Xiaoyu Che ◽

Huilin Jiang ◽

Biao Yang

Keyword(s):

Object Detection ◽

Autonomous Navigation ◽

Point Clouds ◽

Detection Performance ◽

Detection Methods ◽

Feature Maps ◽

3D Object ◽

3D Point Clouds ◽

Detection Approach ◽

3D Object Detection

Three-dimensional object detection can provide precise positions of objects, which can be beneficial to many robotics applications, such as self-driving cars, housekeeping robots, and autonomous navigation. In this work, we focus on accurate object detection in 3D point clouds and propose a new detection pipeline called scale-aware attention-based PillarsNet (SAPN). SAPN is a one-stage 3D object detection approach similar to PointPillar. However, SAPN achieves better performance than PointPillar by introducing the following strategies. First, we extract multiresolution pillar-level features from the point clouds to make the detection approach more scale-aware. Second, a spatial-attention mechanism is used to highlight the object activations in the feature maps, which can improve detection performance. Finally, SE-attention is employed to reweight the features fed into the detection head, which performs 3D object detection in a multitask learning manner. Experiments on the KITTI benchmark show that SAPN achieved similar or better performance compared with several state-of-the-art LiDAR-based 3D detection methods. The ablation study reveals the effectiveness of each proposed strategy. Furthermore, strategies used in this work can be embedded easily into other LiDAR-based 3D detection approaches, which improve their detection performance with slight modifications.

Download Full-text

Optimization of the PointPillars network for 3D object detection in point clouds

10.36227/techrxiv.12593555 ◽

2020 ◽

Author(s):

Joanna Stanisz ◽

Konrad Lis ◽

Tomasz Kryjak ◽

Marek Gorgon

Keyword(s):

Object Detection ◽

Point Cloud ◽

Main Part ◽

Point Clouds ◽

Lidar Data ◽

Detection Accuracy ◽

3D Object ◽

Fold Reduction ◽

Low Energy Consumption ◽

3D Object Detection

In this paper we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud. Techniques like quantisation and pruning available in the Brevitas and PyTorch tools were used. We performed the experiments for the PointPillars network, which offers a reasonable compromise between detection accuracy and calculation complexity. The aim of this work was to propose a variant of the network which we will ultimately implement in an FPGA device. This will allow for real-time LiDAR data processing with low energy consumption. The obtained results indicate that even a significant quantisation from 32-bit floating point to 2-bit integer in the main part of the algorithm, results in 5%-9% decrease of the detection accuracy, while allowing for almost a 16-fold reduction in size of the model.

Download Full-text

CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving

Image and Vision Computing ◽

10.1016/j.imavis.2020.103955 ◽

2020 ◽

Vol 100 ◽

pp. 103955

Author(s):

Dza-Shiang Hong ◽

Hung-Hao Chen ◽

Pei-Yung Hsiao ◽

Li-Chen Fu ◽

Siang-Min Siao

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

3D Object ◽

Rgb Images ◽

3D Object Detection

Download Full-text

KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection

Remote Sensing ◽

10.3390/rs12111895 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1895 ◽

Cited By ~ 1

Author(s):

Jiarong Wang ◽

Ming Zhu ◽

Bo Wang ◽

Deyao Sun ◽

Hua Wei ◽

...

Keyword(s):

Object Detection ◽

Network Architecture ◽

Feature Learning ◽

Semantic Segmentation ◽

Point Clouds ◽

Learning Networks ◽

3D Object ◽

Rgb Images ◽

Bounding Boxes ◽

3D Object Detection

In this paper, we propose a novel 3D object detector KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LIDAR point clouds as the main inputs that can be optionally complemented by RGB images. It consists of three parts: part-1 segments 3D foreground points and generates reliable proposals; part-2 (optional) enhances point cloud density and reconstructs the more compact full-point feature map; part-3 refines 3D bounding boxes and adds semantic segmentation as extra supervision. Our designed lightweight point-wise and channel-wise attention modules can adaptively strengthen the “skeleton” and “distinctiveness” point-features to help feature learning networks capture more representative or finer patterns. The proposed key-point densification component can generate pseudo-point clouds containing target information from monocular images through the distance preference strategy and K-means clustering so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that our KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.

Download Full-text

Cascaded Cross-Modality Fusion Network for 3D Object Detection

Sensors ◽

10.3390/s20247243 ◽

2020 ◽

Vol 20 (24) ◽

pp. 7243

Author(s):

Zhiyu Chen ◽

Qiong Lin ◽

Jing Sun ◽

Yujian Feng ◽

Shangdong Liu ◽

...

Keyword(s):

Object Detection ◽

Back Propagation ◽

Point Clouds ◽

Semantic Features ◽

3D Object ◽

Multi Scale ◽

Scale Point ◽

The Difference ◽

Bounding Boxes ◽

3D Object Detection

We focus on exploring the LIDAR-RGB fusion-based 3D object detection in this paper. This task is still challenging in two aspects: (1) the difference of data formats and sensor positions contributes to the misalignment of reasoning between the semantic features of images and the geometric features of point clouds. (2) The optimization of traditional IoU is not equal to the regression loss of bounding boxes, resulting in biased back-propagation for non-overlapping cases. In this work, we propose a cascaded cross-modality fusion network (CCFNet), which includes a cascaded multi-scale fusion module (CMF) and a novel center 3D IoU loss to resolve these two issues. Our CMF module is developed to reinforce the discriminative representation of objects by reasoning the relation of corresponding LIDAR geometric capability and RGB semantic capability of the object from two modalities. Specifically, CMF is added in a cascaded way between the RGB and LIDAR streams, which selects salient points and transmits multi-scale point cloud features to each stage of RGB streams. Moreover, our center 3D IoU loss incorporates the distance between anchor centers to avoid the oversimple optimization for non-overlapping bounding boxes. Extensive experiments on the KITTI benchmark have demonstrated that our proposed approach performs better than the compared methods.

Download Full-text

A Survey on Deep Learning Based Methods and Datasets for Monocular 3D Object Detection

Electronics ◽

10.3390/electronics10040517 ◽

2021 ◽

Vol 10 (4) ◽

pp. 517

Author(s):

Seong-heum Kim ◽

Youngbae Hwang

Keyword(s):

Deep Learning ◽

Object Detection ◽

Low Cost ◽

Detection Methods ◽

Future Research ◽

3D Object ◽

Practical Applications ◽

Depth Sensors ◽

Significant Research ◽

3D Object Detection

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easier to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.

Download Full-text