A Two-Phase Cross-Modality Fusion Network for Robust 3D Object Detection

This study proposes a two-phase cross-modality fusion detector for robust, high-precision 3D object detection from RGB images and LiDAR point clouds. First, a two-stream fusion network is built into the Faster R-CNN framework to perform accurate and robust 2D detection. The visible stream takes RGB images as input, while the intensity stream is fed intensity maps generated by projecting the reflection intensity of the point cloud onto the front view. A multi-layer feature-level fusion scheme merges multi-modal features across multiple layers to enhance the expressiveness and robustness of the features from which region proposals are generated. Second, decision-level fusion is implemented by projecting 2D proposals into the point-cloud space to generate 3D frustums, on which the second-phase 3D detector is built to perform instance segmentation and 3D box regression on the filtered point cloud. Results on the KITTI benchmark show that features extracted from RGB images and intensity maps complement each other, and that the proposed detector achieves state-of-the-art 3D object detection performance with a substantially lower running time than available competitors.
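As an illustration of the intensity stream's input, the following Python sketch rasterises LiDAR reflectance into a front-view map. It is a minimal sketch and not the authors' implementation: the function name `intensity_map`, the 3x4 projection matrix `P`, the assumption that points are already expressed in the camera frame, and the rule that the nearest point wins each pixel are all assumptions made here for illustration.

```python
import math
from typing import List, Sequence, Tuple

# (x, y, z, reflectance); coordinates are ASSUMED to be in the camera frame.
Point = Tuple[float, float, float, float]


def intensity_map(points: Sequence[Point],
                  P: Sequence[Sequence[float]],
                  width: int, height: int) -> List[List[float]]:
    """Project points with a 3x4 matrix P and rasterise their reflectance."""
    image = [[0.0] * width for _ in range(height)]
    depth = [[math.inf] * width for _ in range(height)]
    for x, y, z, r in points:
        # Homogeneous projection: [u*w, v*w, w]^T = P @ [x, y, z, 1]^T
        uw = P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]
        vw = P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        if w <= 0:  # point lies behind the camera
            continue
        u, v = math.floor(uw / w), math.floor(vw / w)
        inside = 0 <= u < width and 0 <= v < height
        if inside and w < depth[v][u]:
            depth[v][u] = w   # hypothetical rule: nearest point wins the pixel
            image[v][u] = r
    return image


# Toy pinhole camera: focal length 2, principal point (2, 2).
P_toy = [[2.0, 0.0, 2.0, 0.0],
         [0.0, 2.0, 2.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
cloud = [(0.0, 0.0, 1.0, 0.5), (0.0, 0.0, 2.0, 0.9),
         (0.0, 0.0, -1.0, 0.3), (1.0, 0.0, 2.0, 0.7)]
front_view = intensity_map(cloud, P_toy, width=4, height=4)
```

In the toy call, the second point falls on the same pixel as the first but is farther away, so it is discarded, and the third point is behind the camera. A real pipeline would typically also densify or normalise the sparse map before feeding it to a network.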

Real-Time 3D object detection using improved convolutional neural network based on image-driven point cloud

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096514666211026142721 ◽

2021 ◽

Vol 14 ◽

Author(s):

Zhiyong Gao ◽

Jianhong Xiang

Keyword(s):

Neural Network ◽

Object Detection ◽

Convolutional Neural Network ◽

Real Time ◽

Point Cloud ◽

Point Clouds ◽

3D Point Cloud ◽

3D Object ◽

3D Object Detection ◽

Instance Segmentation

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of the 3D data are often obscured. Objective: In this work, we study 3D object detection from discrete, unordered, and sparse 3D point clouds. Methods: The CNN is composed of a frustum sequence module, a 3D instance segmentation module (S-NET), a 3D point cloud transformation module (T-NET), and a 3D bounding box estimation module (E-NET). The frustum sequence module determines the search space of the object. The 3D instance segmentation module segments the object's points within that space. The transformation module and the 3D bounding box estimation module then determine the object's 3D coordinates. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while retaining real-time capability. Conclusion: We achieve real-time 3D object detection by proposing an improved convolutional neural network (CNN) based on image-driven point clouds.
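The idea of an image-driven search space can be sketched as a frustum filter: keep only the points whose image projection falls inside a 2D detection box. This is a hedged illustration rather than the paper's frustum sequence module; the name `frustum_points`, the 3x4 projection matrix `P`, and the assumption that points are already in the camera frame are introduced here for the example only.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def frustum_points(points: Sequence[Point3D],
                   P: Sequence[Sequence[float]],
                   box: Box2D) -> List[Point3D]:
    """Return the points that project inside `box` (hypothetical helper)."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        # Depth along the optical axis after projection.
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        if w <= 0:  # behind the camera, cannot belong to the frustum
            continue
        u = (P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]) / w
        v = (P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]) / w
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


# Toy pinhole camera: focal length 2, principal point (2, 2).
P_toy = [[2.0, 0.0, 2.0, 0.0],
         [0.0, 2.0, 2.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
cloud = [(0.0, 0.0, 1.0), (0.0, 0.0, 5.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
candidates = frustum_points(cloud, P_toy, box=(1.5, 1.5, 2.5, 2.5))
```

Note that the filter keeps every point along the viewing rays through the box, near or far, which is why a subsequent segmentation step is needed to separate the object from foreground and background clutter.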

Optimization of the PointPillars network for 3D object detection in point clouds

10.36227/techrxiv.12593555.v1 ◽

2020 ◽

Author(s):

Joanna Stanisz ◽

Konrad Lis ◽

Tomasz Kryjak ◽

Marek Gorgon

Keyword(s):

Object Detection ◽

Point Cloud ◽

Main Part ◽

Point Clouds ◽

Lidar Data ◽

Detection Accuracy ◽

3D Object ◽

Fold Reduction ◽

Low Energy Consumption ◽

3D Object Detection

In this paper, we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud. We used techniques such as quantisation and pruning, available in the Brevitas and PyTorch tools. We performed the experiments on the PointPillars network, which offers a reasonable compromise between detection accuracy and computational complexity. The aim of this work was to propose a variant of the network that we will ultimately implement in an FPGA device, allowing real-time LiDAR data processing with low energy consumption. The results indicate that even a significant quantisation from 32-bit floating point to 2-bit integer in the main part of the algorithm results in a 5%-9% decrease in detection accuracy, while allowing an almost 16-fold reduction in the size of the model.
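The size figure follows from simple arithmetic: storing a weight in 2 bits instead of 32 gives at most a 32 / 2 = 16-fold reduction, and the reported "almost" 16-fold is consistent with only the main part of the network being quantised this aggressively. The Python sketch below illustrates generic symmetric uniform quantisation; it does not use or reproduce the Brevitas API, and the function names and the max-abs scaling rule are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def quantize_uniform(weights: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integer codes of `bits` bits (assumes bits >= 2)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / qmax  # hypothetical max-abs scaling rule
    codes = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def compression_ratio(float_bits: int, int_bits: int) -> float:
    """Ideal size reduction if every parameter changes bit width."""
    return float_bits / int_bits


codes, scale = quantize_uniform([0.8, -0.4, 0.1, -0.9], bits=2)
approx = dequantize(codes, scale)
ratio = compression_ratio(32, 2)
```

With 2 bits only four code values exist, so most small weights collapse to zero. This is why quantisation-aware training is usually needed to keep the accuracy loss within a few percent.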

A Novel Regional Fusion Network for 3D Object Detection based on RGB Images and Point Clouds

10.5121/csit.2021.111812 ◽

2021 ◽

Author(s):

Hung-Hao Chen ◽

Chia-Hung Wang ◽

Hsueh-Wei Chen ◽

Pei-Yung Hsiao ◽

Li-Chen Fu ◽

...

Keyword(s):

Object Detection ◽

Receptive Fields ◽

Point Clouds ◽

Detection Methods ◽

Lidar Data ◽

3D Object ◽

Multi Scale ◽

Interest Level ◽

Rgb Images ◽

3D Object Detection

Current fusion-based methods transform LiDAR data into bird’s eye view (BEV) representations or 3D voxels, leading to information loss and the heavy computational cost of 3D convolution. In contrast, we directly consume raw point clouds and perform fusion between the two modalities. We employ the concept of a region proposal network to generate proposals from each of the two streams. To let the two sensors compensate for each other's weaknesses, we use the calibration parameters to project proposals from one stream onto the other. With the proposed multi-scale feature aggregation module, we combine the region-of-interest-level (RoI-level) features extracted by the RGB stream from different receptive fields, which enriches the features. Experiments on the KITTI dataset show that our proposed network outperforms other fusion-based methods, with meaningful improvements over 3D object detection methods under the challenging setting.
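Projecting a proposal from the point-cloud stream onto the image stream can be sketched as follows: project the eight corners of a 3D box with the calibration matrix and take the enclosing 2D rectangle. This is an illustrative sketch only. The function name `project_box_to_image`, the axis-aligned box (no heading angle), and the single 3x4 matrix `P` standing in for the full calibration chain are assumptions, not details taken from the paper.

```python
from itertools import product
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max)


def project_box_to_image(center: Vec3, size: Vec3,
                         P: Sequence[Sequence[float]]) -> Box2D:
    """Enclosing 2D box of an axis-aligned 3D box given in the camera frame."""
    cx, cy, cz = center
    dx, dy, dz = size
    us, vs = [], []
    # Enumerate the 8 corners as centre +/- half the extent along each axis.
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        x, y, z = cx + sx * dx, cy + sy * dy, cz + sz * dz
        w = P[2][0] * x + P[2][1] * y + P[2][2] * z + P[2][3]
        if w <= 0:
            raise ValueError("box corner lies behind the camera")
        us.append((P[0][0] * x + P[0][1] * y + P[0][2] * z + P[0][3]) / w)
        vs.append((P[1][0] * x + P[1][1] * y + P[1][2] * z + P[1][3]) / w)
    return min(us), min(vs), max(us), max(vs)


# Toy pinhole camera: focal length 2, principal point (2, 2).
P_toy = [[2.0, 0.0, 2.0, 0.0],
         [0.0, 2.0, 2.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
roi = project_box_to_image(center=(0.0, 0.0, 2.0), size=(2.0, 2.0, 2.0), P=P_toy)
```

The resulting rectangle could then be used to crop RoI-level image features for the same proposal. A rotated box would need its corners rotated by the heading angle before projection.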

Optimization of the PointPillars network for 3D object detection in point clouds

10.36227/techrxiv.12593555 ◽

2020 ◽

Author(s):

Joanna Stanisz ◽

Konrad Lis ◽

Tomasz Kryjak ◽

Marek Gorgon

Keyword(s):

Object Detection ◽

Point Cloud ◽

Main Part ◽

Point Clouds ◽

Lidar Data ◽

Detection Accuracy ◽

3D Object ◽

Fold Reduction ◽

Low Energy Consumption ◽

3D Object Detection

CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving

Image and Vision Computing ◽

10.1016/j.imavis.2020.103955 ◽

2020 ◽

Vol 100 ◽

pp. 103955

Author(s):

Dza-Shiang Hong ◽

Hung-Hao Chen ◽

Pei-Yung Hsiao ◽

Li-Chen Fu ◽

Siang-Min Siao

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

3D Object ◽

Rgb Images ◽

3D Object Detection

KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection

Remote Sensing ◽

10.3390/rs12111895 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1895 ◽

Cited By ~ 1

Author(s):

Jiarong Wang ◽

Ming Zhu ◽

Bo Wang ◽

Deyao Sun ◽

Hua Wei ◽

...

Keyword(s):

Object Detection ◽

Network Architecture ◽

Feature Learning ◽

Semantic Segmentation ◽

Point Clouds ◽

Learning Networks ◽

3D Object ◽

Rgb Images ◽

Bounding Boxes ◽

3D Object Detection

In this paper, we propose a novel 3D object detector KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LIDAR point clouds as the main inputs that can be optionally complemented by RGB images. It consists of three parts: part-1 segments 3D foreground points and generates reliable proposals; part-2 (optional) enhances point cloud density and reconstructs the more compact full-point feature map; part-3 refines 3D bounding boxes and adds semantic segmentation as extra supervision. Our designed lightweight point-wise and channel-wise attention modules can adaptively strengthen the “skeleton” and “distinctiveness” point-features to help feature learning networks capture more representative or finer patterns. The proposed key-point densification component can generate pseudo-point clouds containing target information from monocular images through the distance preference strategy and K-means clustering so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that our KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.

Learning Deformable Network for 3D Object Detection on Point Clouds

Mobile Information Systems ◽

10.1155/2021/3163470 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Wanyi Zhang ◽

Xiuhua Fu ◽

Wei Li

Keyword(s):

Object Detection ◽

Point Cloud ◽

Three Dimensional ◽

Ground Truth ◽

Point Clouds ◽

Detection Accuracy ◽

3D Object ◽

Cloud Data ◽

Dimensional Object ◽

3D Object Detection

3D object detection based on point cloud data in unmanned driving scenes has long been a research hotspot in unmanned driving sensing technology. With the development and maturing of deep neural network technology, neural-network-based methods for detecting three-dimensional objects have begun to show great advantages. Experimental results show that the mismatch between anchors and training samples affects detection accuracy, but this problem has not yet been well solved. The contributions of this paper are as follows. First, deformable convolution is introduced into the point cloud object detection network for the first time, which enhances the adaptability of the network to vehicles with different orientations and shapes. Second, a new method for generating anchors in the RPN is proposed, which effectively prevents the mismatch between anchors and ground truth and removes the angle classification loss from the loss function. Compared with the state-of-the-art method, the AP and AOS of the detection results are improved.

Strong-Weak Feature Alignment for 3D Object Detection

Electronics ◽

10.3390/electronics10101205 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1205

Author(s):

Zhiyu Wang ◽

Li Wang ◽

Bin Dai

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Representation ◽

Alignment Algorithm ◽

3D Object ◽

3D Point Clouds ◽

Object Feature ◽

3D Object Detection ◽

Feature Alignment

Object detection in 3D point clouds is still a challenging task in autonomous driving. Because of the inherent occlusion and density changes of the point cloud, the data distribution of the same object can change dramatically. In particular, incomplete data affected by sparsity or occlusion cannot represent the complete characteristics of the object. In this paper, we propose a novel strong–weak feature alignment algorithm between complete and incomplete objects for 3D object detection, which explores the correlations within the data. It is an end-to-end adaptive network that does not require additional data and can be easily applied to other object detection networks. Through a complete object feature extractor, we obtain a robust feature representation of the object. It serves as a guarding feature to help the incomplete object feature generator generate effective features. The strong–weak feature alignment algorithm reduces the gap between different states of the same object and enhances the ability to represent the incomplete object. The proposed adaptation framework is validated on the KITTI object benchmark and achieves an improvement of about 6% in 3D detection average precision at moderate difficulty over the baseline model. The results show that our adaptation method improves the detection performance on incomplete 3D objects.
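One simple way to picture "reducing the gap" between the features of a complete object and those of its incomplete counterpart is a distance penalty between the two feature vectors. The abstract does not state the form of the alignment objective, so the mean squared error below is purely an assumption for illustration, and `alignment_loss` is a hypothetical name.

```python
from typing import Sequence


def alignment_loss(strong: Sequence[float], weak: Sequence[float]) -> float:
    """Mean squared distance between two feature vectors (ASSUMED loss form)."""
    if len(strong) != len(weak) or not strong:
        raise ValueError("feature vectors must be non-empty and equal in length")
    return sum((s - w) ** 2 for s, w in zip(strong, weak)) / len(strong)


# Toy features: 'strong' from a complete object, 'weak' from an occluded view.
strong_feat = [1.0, 2.0]
weak_feat = [1.0, 0.0]
gap = alignment_loss(strong_feat, weak_feat)
```

In training, a term like this would typically be added to the detection loss so that the weak-feature generator is pulled toward the strong representation. The loss is zero only when the two feature vectors coincide.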
