Task-Aware Monocular Depth Estimation for 3D Object Detection

2020 · Vol 34 (07) · pp. 12257-12264
Author(s): Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei Li, ...

Monocular depth estimation enables 3D perception from a single 2D image and has therefore attracted research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal: the depth of foreground objects plays a crucial role in 3D object recognition and localization, yet how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates foreground and background depth using separate optimization objectives and decoders. Our method significantly improves depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve a 7.5 AP gain and set new state-of-the-art results among monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.
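A minimal sketch of the two-decoder idea described above, assuming a shared encoder and a binary foreground mask; the module names, toy encoder, and masked L1 losses are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

class ForegroundBackgroundDepth(nn.Module):
    """Shared encoder with separate depth decoders for foreground and background."""
    def __init__(self, feat_ch=64):
        super().__init__()
        # Toy encoder standing in for the paper's backbone (an assumption).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        # Two decoders, each trained with its own objective.
        self.fg_decoder = nn.Conv2d(feat_ch, 1, 3, padding=1)
        self.bg_decoder = nn.Conv2d(feat_ch, 1, 3, padding=1)

    def forward(self, image):
        feats = self.encoder(image)
        return self.fg_decoder(feats), self.bg_decoder(feats)

def separated_loss(fg_pred, bg_pred, gt_depth, fg_mask):
    """L1 losses restricted to foreground and background pixels respectively."""
    fg = fg_mask.bool()
    fg_loss = (fg_pred - gt_depth)[fg].abs().mean() if fg.any() else fg_pred.sum() * 0
    bg_loss = (bg_pred - gt_depth)[~fg].abs().mean() if (~fg).any() else bg_pred.sum() * 0
    return fg_loss + bg_loss

# Usage on dummy data
model = ForegroundBackgroundDepth()
img = torch.rand(2, 3, 64, 64)
gt = torch.rand(2, 1, 64, 64)
mask = torch.rand(2, 1, 64, 64) > 0.7
fg_pred, bg_pred = model(img)
loss = separated_loss(fg_pred, bg_pred, gt, mask)
loss.backward()
```

At inference the two predicted maps still have to be merged into one depth map; the paper's exact merging rule is not reproduced here.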

2020 · Vol 34 (07) · pp. 10478-10485
Author(s): Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, ...

The monocular 3D object detection task aims to predict 3D bounding boxes of objects from monocular RGB images. Since recovering locations in 3D space is difficult in the absence of depth information, this paper proposes a novel unified framework that decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Unlike the widely studied 2D bounding box, the proposed structured polygon in the 2D image consists of several projected surfaces of the target object, and it is shown to be a better representation for 3D detection than the widely used 3D bounding box proposals. To inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses an object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments on the challenging KITTI benchmark show that our method achieves state-of-the-art detection accuracy.
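As a worked example of the height-prior depth recovery step: under the pinhole camera model, an object of physical height H that spans h pixels in the image under focal length f lies at depth Z ≈ f·H/h. A minimal sketch of that relation (the function name and example values are illustrative, not taken from the paper):

```python
def depth_from_height_prior(focal_px: float, object_height_m: float,
                            pixel_height: float) -> float:
    """Pinhole-camera depth estimate: Z = f * H / h.

    focal_px: camera focal length in pixels (from the projection matrix)
    object_height_m: prior physical height of the object class, in meters
    pixel_height: projected height of the object in the image, in pixels
    """
    return focal_px * object_height_m / pixel_height

# Example: a KITTI-like focal length (~721 px) and a 1.53 m tall car
# spanning 50 px in the image gives a depth of roughly 22 m.
print(depth_from_height_prior(721.5, 1.53, 50.0))  # ~22.08
```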


2021
Author(s): N.-A.-M. Mai, P. Duthon, L. Khoudour, A. Crouzil, S. A. Velastin

2020 · Vol 34 (07) · pp. 12460-12467
Author(s): Liang Xie, Chao Xiang, Zhengxu Yu, Guodong Xu, Zheng Yang, ...

LiDAR point clouds and RGB images are both essential for 3D object detection, and many state-of-the-art 3D detection algorithms are dedicated to fusing these two types of data effectively. However, fusion methods based on the Bird's Eye View (BEV) or voxel format are not accurate. In this paper, we propose a novel fusion approach named the Point-based Attentive Cont-conv Fusion (PACF) module, which fuses multi-sensor features directly on 3D points. In addition to continuous convolution, we add a Point-Pooling and an Attentive Aggregation operation to make the fused features more expressive. Based on the PACF module, we further propose a 3D multi-sensor multi-task network called Pointcloud-Image RCNN (PI-RCNN for short), which handles the image segmentation and 3D object detection tasks jointly. PI-RCNN employs a segmentation sub-network to extract full-resolution semantic feature maps from images and then fuses the multi-sensor features via the PACF module. Benefiting from the effectiveness of the PACF module and the expressive semantic features from the segmentation module, PI-RCNN improves 3D object detection considerably. We demonstrate the effectiveness of the PACF module and PI-RCNN on the KITTI 3D Detection benchmark, where our method achieves state-of-the-art results on the 3D AP metric.
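A minimal sketch of point-wise image-LiDAR fusion in the spirit described above: project each 3D point into the semantic feature map, sample a feature there, and gate it with a learned attention weight before adding it to the point feature. This is a simplified stand-in, not the full PACF module (no continuous convolution or Point-Pooling), and all names and shapes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def project_points(points, P):
    """Project N x 3 LiDAR points with a 3 x 4 camera matrix; return N x 2 pixel coords."""
    homo = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1)  # N x 4
    uvw = homo @ P.T                                                   # N x 3
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)

class PointImageFusion(nn.Module):
    """Toy point-wise fusion: sample image semantics at projected points
    and blend them into the point features with a sigmoid attention gate."""
    def __init__(self, img_ch, pt_ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(img_ch + pt_ch, 1), nn.Sigmoid())
        self.img_proj = nn.Linear(img_ch, pt_ch)

    def forward(self, pt_feats, img_feats, points, P):
        # img_feats: 1 x C x H x W semantic map; normalize pixel coords to [-1, 1]
        H, W = img_feats.shape[-2:]
        uv = project_points(points, P)
        grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                            uv[:, 1] / (H - 1) * 2 - 1], dim=1)
        grid = grid.view(1, -1, 1, 2)
        sampled = F.grid_sample(img_feats, grid, align_corners=True)   # 1 x C x N x 1
        sampled = sampled.squeeze(-1).squeeze(0).T                     # N x C
        w = self.attn(torch.cat([sampled, pt_feats], dim=1))           # N x 1 gate
        return pt_feats + w * self.img_proj(sampled)

# Dummy usage
fusion = PointImageFusion(img_ch=16, pt_ch=32)
pts = torch.rand(100, 3) * 10
P = torch.rand(3, 4)
fused = fusion(torch.rand(100, 32), torch.rand(1, 16, 60, 80), pts, P)
```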


2021 · Vol 11 (6) · pp. 2599
Author(s): Felix Nobis, Felix Fent, Johannes Betz, Markus Lienkamp

State-of-the-art 3D object detection for autonomous driving is achieved by processing lidar sensor data with deep-learning methods. However, the detection quality of the state of the art is still far from enabling safe driving in all conditions, so additional sensor modalities need to be used to increase the confidence and robustness of the overall detection result. Researchers have recently explored radar data as an additional input source for universal 3D object detection. This paper proposes artificial neural network architectures to segment sparse radar point cloud data. Segmentation is an intermediate step towards radar object detection as a complementary concept to lidar object detection. Conceptually, we adapt Kernel Point Convolution (KPConv) layers for radar data. Additionally, we introduce a long short-term memory (LSTM) variant based on KPConv layers to exploit the information content in the time dimension of radar data. This is motivated by classical radar processing, where tracking features over time is imperative to generate confident object proposals. We benchmark several variants of the network on the public nuScenes data set against a state-of-the-art PointNet-based approach. The performance of the networks is limited by the quality of the publicly available data: radar data and radar-label quality are of great importance to the training and evaluation of machine learning models. Therefore, the advantages and disadvantages of the available data set with regard to its radar data are discussed in detail, and the need for a radar-focused data set for object detection is expressed. We assume that higher segmentation scores should be achievable with better-quality data for all compared models, and that differences between the models would manifest more clearly. To facilitate research with additional radar data, the modular code for this research will be made available to the public.
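A minimal sketch of the temporal idea, with a toy shared MLP standing in for the KPConv point encoder (the real layers operate on spatial neighborhoods and are considerably more involved); the class name, feature dimensions, and the fixed-point-count assumption are all illustrative:

```python
import torch
import torch.nn as nn

class TemporalRadarSegmenter(nn.Module):
    """Per-point encoder (stand-in for KPConv) followed by an LSTM over time."""
    def __init__(self, in_ch=4, feat_ch=64, num_classes=2):
        super().__init__()
        self.point_encoder = nn.Sequential(   # toy replacement for KPConv layers
            nn.Linear(in_ch, feat_ch), nn.ReLU(),
            nn.Linear(feat_ch, feat_ch), nn.ReLU(),
        )
        self.lstm = nn.LSTM(feat_ch, feat_ch, batch_first=True)
        self.classifier = nn.Linear(feat_ch, num_classes)

    def forward(self, frames):
        # frames: T x N x in_ch, the same N points tracked over T radar sweeps
        # (a simplifying assumption; real sweeps have varying point counts).
        feats = self.point_encoder(frames)    # T x N x F
        feats = feats.permute(1, 0, 2)        # N x T x F: one sequence per point
        out, _ = self.lstm(feats)             # N x T x F
        return self.classifier(out[:, -1])    # per-point logits at the last sweep

model = TemporalRadarSegmenter()
logits = model(torch.rand(5, 200, 4))  # 5 sweeps, 200 points, (x, y, v_r, rcs)
print(logits.shape)                    # torch.Size([200, 2])
```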


2021 · Vol 12 (9) · pp. 459-469
Author(s): D. D. Rukhovich

In this paper, we propose a novel method for joint 3D object detection and room layout estimation. The proposed method surpasses all existing methods of 3D object detection from monocular images on the indoor SUN RGB-D dataset and shows competitive results on the ScanNet dataset in multi-view mode. Both datasets are collected in various residential, administrative, educational and industrial spaces, and together they cover a wide range of indoor use cases. Moreover, we are the first to formulate and solve the problem of multi-class 3D object detection from multi-view inputs in indoor scenes. The proposed method can be integrated into the control systems of mobile robots, and the results of this study can be used to address navigation, path planning, capturing and manipulating scene objects, and semantic scene mapping.

