Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation

Yingjie Cai; Buyu Li; Zeyu Jiao; Hongsheng Li; Xingyu Zeng; Xiaogang Wang

doi:10.1609/aaai.v34i07.6618

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6618 ◽

2020 ◽

Vol 34 (07) ◽

pp. 10478-10485 ◽

Cited By ~ 3

Author(s):

Yingjie Cai ◽

Buyu Li ◽

Zeyu Jiao ◽

Hongsheng Li ◽

Xingyu Zeng ◽

...

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Physical World ◽

Depth Information ◽

Detection Accuracy ◽

Unified Framework ◽

3D Object ◽

Depth Recovery ◽

Bounding Boxes ◽

3D Object Detection

The monocular 3D object detection task aims to predict the 3D bounding boxes of objects from monocular RGB images. Because recovering location in 3D space is difficult without depth information, this paper proposes a unified framework that decomposes the detection problem into a structured polygon prediction task and a depth recovery task. Unlike the widely studied 2D bounding box, the proposed structured polygon in the 2D image consists of several projected surfaces of the target object. Compared to the widely used 3D bounding box proposals, it is shown to be a better representation for 3D detection. To inversely project the predicted 2D structured polygon to a cuboid in the 3D physical world, the subsequent depth recovery task uses the object height prior to complete the inverse projection transformation with the given camera projection matrix. Moreover, a fine-grained 3D box refinement scheme is proposed to further rectify the 3D detection results. Experiments are conducted on the challenging KITTI benchmark, on which our method achieves state-of-the-art detection accuracy.
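The height-guided depth recovery described in this abstract rests on the pinhole camera relation between an object's physical height and its projected height in pixels. The following is a minimal sketch of that relation only, not the authors' implementation; the function names, the camera parameters, and the example numbers are all illustrative assumptions.

```python
def depth_from_height(focal_px, object_height_m, pixel_height):
    """Pinhole relation z = f * H / h, assuming the object is upright
    and roughly fronto-parallel (a simplification for illustration)."""
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * object_height_m / pixel_height


def back_project(u, v, depth, fx, fy, cx, cy):
    """Inverse pinhole projection of pixel (u, v) at a known depth
    into camera coordinates (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical values: a 1.5 m tall object spanning 75 pixels vertically,
# seen by a camera with a 700 px focal length.
z = depth_from_height(700.0, 1.5, 75.0)
corner = back_project(u=800.0, v=200.0, depth=z,
                      fx=700.0, fy=700.0, cx=600.0, cy=180.0)
```

Once the depth of a polygon vertex is known, the same back-projection can be applied to each vertex to place it in 3D; how the paper combines these points into a cuboid is not specified in the abstract.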

Eliminating the Blind Spot: Adapting 3D Object Detection and Monocular Depth Estimation to 360° Panoramic Imagery

Computer Vision – ECCV 2018 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-01261-8_48 ◽

2018 ◽

pp. 812-830 ◽

Cited By ~ 14

Author(s):

Grégoire Payen de La Garanderie ◽

Amir Atapour Abarghouei ◽

Toby P. Breckon

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Blind Spot ◽

3D Object ◽

Monocular Depth ◽

3D Object Detection

3D Object Detection Using Scale Invariant and Feature Reweighting Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019267 ◽

2019 ◽

Vol 33 ◽

pp. 9267-9274 ◽

Cited By ~ 6

Author(s):

Xin Zhao ◽

Zhe Liu ◽

Ruolan Hu ◽

Kaiqi Huang

Keyword(s):

Object Detection ◽

Network Architecture ◽

Point Clouds ◽

Scale Invariant ◽

3D Object ◽

Outdoor Scenes ◽

Indoor Scenes ◽

Bounding Boxes ◽

The One ◽

3D Object Detection

3D object detection plays an important role in many real-world applications. It requires estimating the locations and orientations of 3D objects in real scenes. In this paper, we present a new network architecture that uses front-view images and frustum point clouds to generate 3D detection results. On the one hand, a PointSIFT module is used to improve the performance of 3D segmentation. It captures information from different orientations in space and is robust to shapes of different scales. On the other hand, our network uses a SENet module to retain the useful features and suppress the less informative ones. This module reweights channel features and estimates the 3D bounding boxes more effectively. Our method is evaluated on both the KITTI dataset for outdoor scenes and the SUN-RGBD dataset for indoor scenes. The experimental results show that our method outperforms state-of-the-art methods, especially when point clouds are highly sparse.
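The channel reweighting attributed to the SENet module can be illustrated with a generic squeeze-and-excitation step. This is a minimal sketch in plain Python, not the network from the paper: the function name, the tiny hand-picked weight matrices, and the omission of bias terms are assumptions made for brevity.

```python
import math


def se_reweight(features, w1, w2):
    """Generic squeeze-and-excitation over a list of channels.

    features: list of C channels, each a list of values.
    w1: R x C weights of the reduction layer (biases omitted).
    w2: C x R weights of the expansion layer (biases omitted).
    """
    # Squeeze: one global average per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: linear layer + ReLU, then linear layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


# Hypothetical two-channel input with illustrative weights.
features = [[1.0, 3.0], [0.0, 0.0]]
scaled, gates = se_reweight(features, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
```

Each gate lies between 0 and 1, so channels with a low gate are suppressed and channels with a high gate pass through almost unchanged.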

Task-Aware Monocular Depth Estimation for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6908 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12257-12264 ◽

Cited By ~ 1

Author(s):

Xinlong Wang ◽

Wei Yin ◽

Tao Kong ◽

Yuning Jiang ◽

Lei Li ◽

...

Keyword(s):

Object Detection ◽

State Of The Art ◽

Depth Estimation ◽

3D Perception ◽

Research Attention ◽

3D Object ◽

Depth Prediction ◽

Monocular Depth ◽

Almost All ◽

3D Object Detection

Monocular depth estimation enables 3D perception from a single 2D image and has thus attracted much research attention for years. Almost all methods treat foreground and background regions (“things and stuff”) in an image equally. However, not all pixels are equal. The depth of foreground objects plays a crucial role in 3D object recognition and localization. To date, how to boost the depth prediction accuracy of foreground objects has rarely been discussed. In this paper, we first analyze the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, which estimates the foreground and background depth using separate optimization objectives and decoders. Our method significantly improves the depth estimation performance on foreground objects. Applying ForeSeE to 3D object detection, we achieve 7.5 AP gains and set new state-of-the-art results among monocular methods. Code will be available at: https://github.com/WXinlong/ForeSeE.

Optimization of the PointPillars network for 3D object detection in point clouds

10.36227/techrxiv.12593555.v1 ◽

2020 ◽

Author(s):

Joanna Stanisz ◽

Konrad Lis ◽

Tomasz Kryjak ◽

Marek Gorgon

Keyword(s):

Object Detection ◽

Point Cloud ◽

Main Part ◽

Point Clouds ◽

Lidar Data ◽

Detection Accuracy ◽

3D Object ◽

Fold Reduction ◽

Low Energy Consumption ◽

3D Object Detection

In this paper, we present our research on the optimisation of a deep neural network for 3D object detection in a point cloud. Techniques such as quantisation and pruning, available in the Brevitas and PyTorch tools, were used. We performed the experiments for the PointPillars network, which offers a reasonable compromise between detection accuracy and computational complexity. The aim of this work was to propose a variant of the network which we will ultimately implement in an FPGA device. This will allow for real-time LiDAR data processing with low energy consumption. The obtained results indicate that even a significant quantisation from 32-bit floating point to 2-bit integer in the main part of the algorithm results in a 5%-9% decrease in detection accuracy, while allowing for an almost 16-fold reduction in the size of the model.
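The 16-fold figure follows from the ratio of bit widths (32 / 2). The sketch below shows that arithmetic alongside a simple uniform quantiser. It is written in plain Python for illustration and does not use Brevitas or PyTorch; the function names, the single shared scale, and the sample weights are assumptions, not the scheme used in the paper.

```python
def quantize_uniform(weights, bits):
    """Map floats to signed integers of the given width using one
    shared scale (a simple symmetric scheme, assumed for illustration)."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / qmax
    quantized = [max(qmin, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def size_ratio(original_bits, quantized_bits):
    """Ideal storage reduction when every weight is quantised."""
    return original_bits / quantized_bits


# Hypothetical weights quantised to 2 bits (integer range -2..1).
weights = [0.8, -0.4, 0.1, -0.9]
q, scale = quantize_uniform(weights, bits=2)
ratio = size_ratio(32, 2)
```

The ideal ratio is exactly 16 only if every parameter is quantised. The abstract reports "almost" 16-fold, presumably because the 2-bit quantisation applies to the main part of the network rather than all of it.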

Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth Estimation and 3D Object Detection

10.1049/icp.2021.1442 ◽

2021 ◽

Author(s):

N.-A.-M. Mai ◽

P. Duthon ◽

L. Khoudour ◽

A. Crouzil ◽

S. A. Velastin

Keyword(s):

Object Detection ◽

Depth Estimation ◽

3D Object ◽

3D Object Detection

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6945 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12557-12564 ◽

Cited By ~ 4

Author(s):

Zhenbo Xu ◽

Wei Zhang ◽

Xiaoqing Ye ◽

Xiao Tan ◽

Wei Yang ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Disparity Estimation ◽

3D Object ◽

Detection Model ◽

Occluded Objects ◽

Bounding Boxes ◽

Detection Quality ◽

3D Object Detection

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve resistance to occlusion, and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
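The intrinsic adjustment that accompanies adaptive zooming can be illustrated with the standard crop-and-resize rule for a pinhole camera. This is a minimal sketch, not ZoomNet's code: the function name, the box format, and the example camera values are assumptions, and sub-pixel conventions such as half-pixel offsets are ignored.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of the image obtained by cropping `box` and resizing it.

    box: (x0, y0, x1, y1) in pixels of the original image.
    out_size: (width, height) of the resized crop.
    """
    x0, y0, x1, y1 = box
    out_w, out_h = out_size
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    # Cropping shifts the principal point; resizing scales everything.
    return (fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy)


# Hypothetical 64 x 32 px box around a distant object, zoomed to 256 x 128.
new_fx, new_fy, new_cx, new_cy = zoom_intrinsics(
    fx=700.0, fy=700.0, cx=600.0, cy=180.0,
    box=(500.0, 150.0, 564.0, 182.0), out_size=(256, 128))
```

With the intrinsics updated this way, a pixel in the zoomed crop back-projects to the same 3D ray as the corresponding pixel in the original image, which is what allows depth estimated on the resized crop to be placed correctly in the scene.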

Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.00864 ◽

2019 ◽

Cited By ~ 66

Author(s):

Yan Wang ◽

Wei-Lun Chao ◽

Divyansh Garg ◽

Bharath Hariharan ◽

Mark Campbell ◽

...

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Autonomous Driving ◽

Visual Depth ◽

3D Object ◽

3D Object Detection

A Kinect-Based 3D Object Detection and Recognition System with Enhanced Depth Estimation Algorithm

2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) ◽

10.1109/iemcon.2018.8615020 ◽

2018 ◽

Cited By ~ 4

Author(s):

Ahmed Fawzy Elaraby ◽

Ayman Hamdy ◽

Mohamed Rehan

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Estimation Algorithm ◽

Recognition System ◽

3D Object ◽

3D Object Detection ◽

Detection And Recognition

Confidence Guided Stereo 3D Object Detection with Split Depth Estimation

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9341188 ◽

2020 ◽

Author(s):

Chengyao Li ◽

Jason Ku ◽

Steven L. Waslander

Keyword(s):

Object Detection ◽

Depth Estimation ◽

3D Object ◽

3D Object Detection

3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images

Information Sciences ◽

10.1016/j.ins.2018.09.040 ◽

2019 ◽

Vol 476 ◽

pp. 147-158 ◽

Cited By ~ 11

Author(s):

Mohammad Muntasir Rahman ◽

Yanhao Tan ◽

Jian Xue ◽

Ling Shao ◽

Ke Lu

Keyword(s):

Object Detection ◽

3D Object ◽

Bounding Boxes ◽

3D Object Detection
