scholarly journals A Survey on Deep Learning Based Methods and Datasets for Monocular 3D Object Detection

Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easier to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.

Author(s):  
Georgios Zamanakos ◽  
Lazaros Tsochatzidis ◽  
Angelos Amanatiadis ◽  
Ioannis Pratikakis

2021 ◽  
Author(s):  
Hung-Hao Chen ◽  
Chia-Hung Wang ◽  
Hsueh-Wei Chen ◽  
Pei-Yung Hsiao ◽  
Li-Chen Fu ◽  
...  

The current fusion-based methods transform LiDAR data into bird’s eye view (BEV) representations or 3D voxel, leading to information loss and heavy computation cost of 3D convolution. In contrast, we directly consume raw point clouds and perform fusion between two modalities. We employ the concept of region proposal network to generate proposals from two streams, respectively. In order to make two sensors compensate the weakness of each other, we utilize the calibration parameters to project proposals from one stream onto the other. With the proposed multi-scale feature aggregation module, we are able to combine the extracted regionof-interest-level (RoI-level) features of RGB stream from different receptive fields, resulting in fertilizing feature richness. Experiments on KITTI dataset show that our proposed network outperforms other fusion-based methods with meaningful improvements as compared to 3D object detection methods under challenging setting.


Author(s):  
Wadim Kehl ◽  
Fausto Milletari ◽  
Federico Tombari ◽  
Slobodan Ilic ◽  
Nassir Navab

2019 ◽  
Vol 20 (10) ◽  
pp. 3782-3795 ◽  
Author(s):  
Eduardo Arnold ◽  
Omar Y. Al-Jarrah ◽  
Mehrdad Dianati ◽  
Saber Fallah ◽  
David Oxtoby ◽  
...  

2020 ◽  
Vol 1518 ◽  
pp. 012049
Author(s):  
Junhui Wu ◽  
Dong Yin ◽  
Jie Chen ◽  
Yusheng Wu ◽  
Huiping Si ◽  
...  

2021 ◽  
Vol 13 (16) ◽  
pp. 3099
Author(s):  
Stephan Nebiker ◽  
Jonas Meyer ◽  
Stefan Blaser ◽  
Manuela Ammann ◽  
Severin Rhyner

A successful application of low-cost 3D cameras in combination with artificial intelligence (AI)-based 3D object detection algorithms to outdoor mobile mapping would offer great potential for numerous mapping, asset inventory, and change detection tasks in the context of smart cities. This paper presents a mobile mapping system mounted on an electric tricycle and a procedure for creating on-street parking statistics, which allow government agencies and policy makers to verify and adjust parking policies in different city districts. Our method combines georeferenced red-green-blue-depth (RGB-D) imagery from two low-cost 3D cameras with state-of-the-art 3D object detection algorithms for extracting and mapping parked vehicles. Our investigations demonstrate the suitability of the latest generation of low-cost 3D cameras for real-world outdoor applications with respect to supported ranges, depth measurement accuracy, and robustness under varying lighting conditions. In an evaluation of suitable algorithms for detecting vehicles in the noisy and often incomplete 3D point clouds from RGB-D cameras, the 3D object detection network PointRCNN, which extends region-based convolutional neural networks (R-CNNs) to 3D point clouds, clearly outperformed all other candidates. The results of a mapping mission with 313 parking spaces show that our method is capable of reliably detecting parked cars with a precision of 100% and a recall of 97%. It can be applied to unslotted and slotted parking and different parking types including parallel, perpendicular, and angle parking.


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Xiang Song ◽  
Weiqin Zhan ◽  
Xiaoyu Che ◽  
Huilin Jiang ◽  
Biao Yang

Three-dimensional object detection can provide precise positions of objects, which can be beneficial to many robotics applications, such as self-driving cars, housekeeping robots, and autonomous navigation. In this work, we focus on accurate object detection in 3D point clouds and propose a new detection pipeline called scale-aware attention-based PillarsNet (SAPN). SAPN is a one-stage 3D object detection approach similar to PointPillar. However, SAPN achieves better performance than PointPillar by introducing the following strategies. First, we extract multiresolution pillar-level features from the point clouds to make the detection approach more scale-aware. Second, a spatial-attention mechanism is used to highlight the object activations in the feature maps, which can improve detection performance. Finally, SE-attention is employed to reweight the features fed into the detection head, which performs 3D object detection in a multitask learning manner. Experiments on the KITTI benchmark show that SAPN achieved similar or better performance compared with several state-of-the-art LiDAR-based 3D detection methods. The ablation study reveals the effectiveness of each proposed strategy. Furthermore, strategies used in this work can be embedded easily into other LiDAR-based 3D detection approaches, which improve their detection performance with slight modifications.


Sign in / Sign up

Export Citation Format

Share Document