Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, tracking-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system consists essentially of an object detector and a data association algorithm that establishes track-to-detection correspondences. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching, formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method, previously applied successfully to image-based tracking, to the 3D setting, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline that uses one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
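
The one-stage baseline mentioned in the abstract, bipartite matching posed as a linear assignment problem, can be illustrated with a minimal sketch. This is not the paper's implementation: it replaces the Hungarian algorithm with a brute-force search over permutations (practical only for a handful of objects), uses Euclidean distance between 3D box centres as the cost, and the function name `associate` and the gating threshold `max_dist` are assumptions made for illustration.

```python
from itertools import permutations
from math import dist


def associate(tracks, detections, max_dist=2.0):
    """Match each track to one detection by minimising total centre distance.

    tracks, detections: lists of (x, y, z) centres in metres.
    Brute-force stand-in for the Hungarian algorithm (assumption: small inputs
    and len(tracks) <= len(detections)).
    Returns a list of (track_index, detection_index) pairs.
    """
    best_cost, best_perm = float("inf"), ()
    # Try every way of assigning a distinct detection to each track.
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(dist(tracks[t], detections[d]) for t, d in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Gating: discard matched pairs that are too far apart to be plausible.
    return [
        (t, d)
        for t, d in enumerate(best_perm)
        if dist(tracks[t], detections[d]) <= max_dist
    ]


tracks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
detections = [(10.5, 0.0, 0.0), (0.2, 0.0, 0.0), (50.0, 50.0, 0.0)]
matches = associate(tracks, detections)
```

Here the third detection stays unmatched; a tracker would typically start a new track from it. A production system would use a polynomial-time assignment solver rather than enumeration.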

Strong-Weak Feature Alignment for 3D Object Detection

Electronics ◽

10.3390/electronics10101205 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1205

Author(s):

Zhiyu Wang ◽

Li Wang ◽

Bin Dai

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Representation ◽

Alignment Algorithm ◽

3D Object ◽

3D Point Clouds ◽

Object Feature ◽

3D Object Detection ◽

Feature Alignment

Object detection in 3D point clouds is still a challenging task in autonomous driving. Because of the inherent occlusion and density changes of the point cloud, the data distribution of the same object can change dramatically. In particular, data that are incomplete because of sparsity or occlusion cannot represent the complete characteristics of the object. In this paper, we propose a novel strong–weak feature alignment algorithm between complete and incomplete objects for 3D object detection, which explores the correlations within the data. It is an end-to-end adaptive network that does not require additional data and can be easily applied to other object detection networks. Through a complete object feature extractor, we obtain a robust feature representation of the object. It serves as a guiding feature that helps the incomplete object feature generator produce effective features. The strong–weak feature alignment algorithm reduces the gap between different states of the same object and enhances the ability to represent the incomplete object. The proposed adaptation framework is validated on the KITTI object benchmark and achieves about a 6% improvement in detection average precision at the 3D moderate difficulty level compared to the base model. The results show that our adaptation method improves the detection performance on incomplete 3D objects.

Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery

Computer Vision – ECCV 2018 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-01261-8_48 ◽

2018 ◽

pp. 812-830 ◽

Cited By ~ 14

Author(s):

Grégoire Payen de La Garanderie ◽

Amir Atapour Abarghouei ◽

Toby P. Breckon

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Blind Spot ◽

3D Object ◽

Monocular Depth ◽

3D Object Detection

Monocular 3D Object Detection for Autonomous Driving

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2016.236 ◽

2016 ◽

Cited By ~ 221

Author(s):

Xiaozhi Chen ◽

Kaustav Kundu ◽

Ziyu Zhang ◽

Huimin Ma ◽

Sanja Fidler ◽

...

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

Task-Aware Monocular Depth Estimation for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6908 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12257-12264 ◽

Cited By ~ 1

Author(s):

Xinlong Wang ◽

Wei Yin ◽

Tao Kong ◽

Yuning Jiang ◽

Lei Li ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Depth Estimation ◽

3D Perception ◽

Research Attention ◽

3D Object ◽

Depth Prediction ◽

Monocular Depth ◽

Almost All ◽

3D Object Detection

Monocular depth estimation enables 3D perception from a single 2D image and has therefore attracted much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal: the depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.
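
The idea of separate optimization objectives for foreground and background can be sketched as two masked losses, one per region. The abstract does not state which loss functions ForeSeE uses, so the mean absolute error below, the flat-list pixel layout, and the names `masked_l1` and `separated_depth_loss` are all assumptions for illustration only.

```python
def masked_l1(pred, target, mask):
    """Mean absolute depth error over the pixels where mask is True."""
    errors = [abs(p - t) for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def separated_depth_loss(pred_fg, pred_bg, target, fg_mask):
    """Return (foreground_loss, background_loss).

    pred_fg / pred_bg: depth predictions from two hypothetical decoders,
    flattened to lists of floats; fg_mask marks foreground-object pixels.
    Each decoder is scored only on its own region.
    """
    bg_mask = [not m for m in fg_mask]
    return masked_l1(pred_fg, target, fg_mask), masked_l1(pred_bg, target, bg_mask)


target = [11.0, 12.0, 32.0, 40.0]
fg_mask = [True, True, False, False]
pred_fg = [10.0, 12.0, 0.0, 0.0]   # values outside the foreground are ignored
pred_bg = [0.0, 0.0, 30.0, 41.0]   # values outside the background are ignored
fg_loss, bg_loss = separated_depth_loss(pred_fg, pred_bg, target, fg_mask)
```

Keeping the two losses apart lets each be weighted or optimized independently, so that the many background pixels do not dominate the few foreground ones.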

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6618 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10478-10485 ◽

Cited By ~ 3

Author(s):

Yingjie Cai ◽

Buyu Li ◽

Zeyu Jiao ◽

Hongsheng Li ◽

Xingyu Zeng ◽

...

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Physical World ◽

Depth Information ◽

Detection Accuracy ◽

Unified Framework ◽

3D Object ◽

Depth Recovery ◽

Bounding Boxes ◽

3D Object Detection

The monocular 3D object detection task aims to predict the 3D bounding boxes of objects from monocular RGB images. Since recovering location in 3D space is difficult in the absence of depth information, this paper proposes a novel unified framework that decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Unlike the widely studied 2D bounding boxes, the proposed structured polygon in the 2D image consists of several projected surfaces of the target object. Compared to the widely used 3D bounding box proposals, it is shown to be a better representation for 3D detection. To inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses a prior on object height to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments are conducted on the challenging KITTI benchmark, on which our method achieves state-of-the-art detection accuracy.
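
Height-guided depth recovery rests on the pinhole camera relation: an object of physical height H that spans h pixels in the image lies at depth z = f · H / h, where f is the focal length in pixels. The sketch below shows that relation and the back-projection of a pixel to 3D. It is a simplified illustration under an ideal pinhole model, not the paper's method; the function names and the example numbers are assumptions.

```python
def depth_from_height(focal_y, real_height_m, pixel_height):
    """Depth (metres) of an object of known height under a pinhole camera.

    focal_y: vertical focal length in pixels.
    real_height_m: assumed physical height of the object (the height prior).
    pixel_height: height of the object's projection in the image, in pixels.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_y * real_height_m / pixel_height


def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth to camera coordinates (x, y, z)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


# A 1.5 m tall car that appears 54 px tall with a 720 px focal length.
z = depth_from_height(720.0, 1.5, 54.0)
point = back_project(960.0, 360.0, z, 720.0, 720.0, 600.0, 180.0)
```

The estimate is only as good as the height prior: a 10% error in the assumed height produces a 10% error in depth.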

Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimation and 3D Object Detection

10.1049/icp.2021.1442 ◽

2021 ◽

Author(s):

N.-A.-M. Mai ◽

P. Duthon ◽

L. Khoudour ◽

A. Crouzil ◽

S. A. Velastin

Keyword(s):

Object Detection ◽

Depth Estimation ◽

3D Object ◽

3D Object Detection

R-CNN Based 3D Object Detection for Autonomous Driving

CICTP 2020 ◽

10.1061/9780784483053.077 ◽

2020 ◽

Author(s):

Hongyu Hu ◽

Tongtong Zhao ◽

Qi Wang ◽

Fei Gao ◽

Lei He

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

Stereo R-CNN Based 3D Object Detection for Autonomous Driving

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.00783 ◽

2019 ◽

Cited By ~ 48

Author(s):

Peiliang Li ◽

Xiaozhi Chen ◽

Shaojie Shen

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6945 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12557-12564 ◽

Cited By ~ 4

Author(s):

Zhenbo Xu ◽

Wei Zhang ◽

Xiaoqing Ye ◽

Xiao Tan ◽

Wei Yang ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Disparity Estimation ◽

3D Object ◽

Detection Model ◽

Occluded Objects ◽

Bounding Boxes ◽

Detection Quality ◽

3D Object Detection

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we learn part locations as complementary features to improve robustness against occlusion, and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations such as pixel-wise part locations, we also present our KFG dataset, which augments KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
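
Resizing a cropped box while adjusting the camera intrinsics follows standard pinhole geometry: cropping shifts the principal point, and scaling multiplies both the focal length and the principal point. The sketch below shows this together with the stereo relation z = f · b / d. It is a generic illustration, not ZoomNet's actual code; the function names, the `(x0, y0, w, h)` box layout, and the example values are assumptions.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of the virtual camera that sees a cropped, resized box.

    box: (x0, y0, w, h) crop in the original image, in pixels.
    out_size: (out_w, out_h) resolution the crop is resized to.
    Returns (fx', fy', cx', cy').
    """
    x0, y0, w, h = box
    out_w, out_h = out_size
    sx, sy = out_w / w, out_h / h
    # Crop shifts the principal point; resize scales focal length and offset.
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


def depth_from_disparity(fx, baseline_m, disparity_px):
    """Stereo depth (metres) from horizontal disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity_px must be positive")
    return fx * baseline_m / disparity_px


# A distant 64x32 px box zoomed 4x to 256x128 px.
zoomed = zoom_intrinsics(700.0, 700.0, 600.0, 180.0, (500, 150, 64, 32), (256, 128))
depth_original = depth_from_disparity(700.0, 0.5, 10.0)
# After zooming, disparity scales with the image, and so does the focal length.
depth_zoomed = depth_from_disparity(zoomed[0], 0.5, 40.0)
```

Because focal length and disparity scale together, the recovered depth is unchanged, while the larger disparity in the zoomed image can be measured with finer relative precision.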
