A Two-Stage Data Association Approach for 3D Multi-Object Tracking

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system essentially consists of an object detector and a data association algorithm that establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt to the 3D setting a two-stage data association method that was successfully applied to image-based tracking, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline that uses one-stage bipartite matching for data association, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
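
To make the one-stage baseline mentioned in this abstract concrete, here is a minimal sketch of track-to-detection association posed as a linear assignment problem. It is not the paper's two-stage method. The function name `associate`, the Euclidean center distance as cost, and the 2.0 gating threshold are all assumptions chosen for illustration, and a brute-force search stands in for the Hungarian algorithm (it reaches the same optimum but only scales to a handful of objects).

```python
from itertools import permutations
from math import dist, inf


def associate(tracks, detections, gate=2.0):
    """One-stage association of track centers to detection centers.

    Minimizes the total Euclidean distance of the assignment. A pair farther
    apart than `gate` is forbidden, and leaving a track unmatched costs `gate`.
    Brute-force search stands in for the Hungarian algorithm here.
    """
    n_t, n_d = len(tracks), len(detections)
    # Real columns hold gated distances; one dummy column per track means "no match".
    cost = []
    for t in tracks:
        row = [dist(t, d) for d in detections]
        row = [c if c <= gate else inf for c in row]
        cost.append(row + [gate] * n_t)

    # Enumerate every way to give each track a distinct column and keep the cheapest.
    best_total, best_perm = inf, ()
    for perm in permutations(range(n_d + n_t), n_t):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_total, best_perm = total, perm

    matches = [(i, j) for i, j in enumerate(best_perm) if j < n_d]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(n_t) if i not in matched_t]
    unmatched_dets = [j for j in range(n_d) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets


# Made-up 3D centers (x, y, z) for three tracks and three detections.
tracks = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 5.0, 0.0)]
detections = [(10.5, 0.2, 0.0), (0.3, -0.1, 0.0), (40.0, 40.0, 0.0)]
matches, lost, new = associate(tracks, detections)
print(matches, lost, new)
```

In this toy scene the first two tracks pair with the two nearby detections, the third track stays unmatched, and the far detection would start a new track. A practical system would replace the enumeration with a polynomial-time solver.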

Strong-Weak Feature Alignment for 3D Object Detection

Electronics ◽

10.3390/electronics10101205 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1205

Author(s):

Zhiyu Wang ◽

Li Wang ◽

Bin Dai

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Representation ◽

Alignment Algorithm ◽

3D Object ◽

3D Point Clouds ◽

Object Feature ◽

3D Object Detection ◽

Feature Alignment

Object detection in 3D point clouds is still a challenging task in autonomous driving. Due to the inherent occlusion and density changes of the point cloud, the data distribution of the same object can change dramatically. In particular, incomplete data affected by sparsity or occlusion cannot represent the complete characteristics of the object. In this paper, we propose a novel strong–weak feature alignment algorithm between complete and incomplete objects for 3D object detection, which explores the correlations within the data. It is an end-to-end adaptive network that does not require additional data and can be easily applied to other object detection networks. Through a complete object feature extractor, we obtain a robust feature representation of the object. It serves as a guarding feature that helps the incomplete object feature generator produce effective features. The strong–weak feature alignment algorithm reduces the gap between different states of the same object and enhances the ability to represent the incomplete object. The proposed adaptation framework is validated on the KITTI object benchmark and achieves about a 6% improvement in detection average precision at the 3D moderate difficulty level compared to the basic model. The results show that our adaptation method improves the detection performance on incomplete 3D objects.
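
The idea of reducing the gap between features of the same object in its complete and incomplete states can be illustrated with a very small sketch. The abstract does not state the loss the authors use, so the mean squared difference below is an assumption, as are the names `alignment_loss` and `align_step` and the learning rate. The complete-object ("strong") feature is treated as a fixed target and only the incomplete-object ("weak") feature is moved.

```python
def alignment_loss(strong, weak):
    """Mean squared difference between a strong and a weak feature vector."""
    if len(strong) != len(weak):
        raise ValueError("feature vectors must have the same length")
    return sum((s - w) ** 2 for s, w in zip(strong, weak)) / len(strong)


def align_step(strong, weak, lr=0.5):
    """One gradient-descent step on the weak feature; the strong feature is fixed.

    The gradient of the loss with respect to weak[k] is 2 * (weak[k] - strong[k]) / n.
    """
    n = len(strong)
    return [w - lr * 2.0 * (w - s) / n for s, w in zip(strong, weak)]


# Made-up feature vectors for one object seen fully and seen under occlusion.
strong = [1.0, 2.0, 3.0]
weak = [0.0, 2.0, 5.0]
before = alignment_loss(strong, weak)
after = alignment_loss(strong, align_step(strong, weak))
print(before, after)
```

In a real detector the gradient would flow into the weights of the feature generator rather than into the feature vector itself; the direct update here only shows the direction in which the alignment term pulls.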

Monocular 3D Object Detection for Autonomous Driving

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2016.236 ◽

2016 ◽

Cited By ~ 221

Author(s):

Xiaozhi Chen ◽

Kaustav Kundu ◽

Ziyu Zhang ◽

Huimin Ma ◽

Sanja Fidler ◽

...

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

R-CNN Based 3D Object Detection for Autonomous Driving

CICTP 2020 ◽

10.1061/9780784483053.077 ◽

2020 ◽

Author(s):

Hongyu Hu ◽

Tongtong Zhao ◽

Qi Wang ◽

Fei Gao ◽

Lei He

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

Stereo R-CNN Based 3D Object Detection for Autonomous Driving

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.00783 ◽

2019 ◽

Cited By ~ 48

Author(s):

Peiliang Li ◽

Xiaozhi Chen ◽

Shaojie Shen

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

3D Object Detection

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6945 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12557-12564 ◽

Cited By ~ 4

Author(s):

Zhenbo Xu ◽

Wei Zhang ◽

Xiaoqing Ye ◽

Xiao Tan ◽

Wei Yang ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Disparity Estimation ◽

3D Object ◽

Detection Model ◽

Occluded Objects ◽

Bounding Boxes ◽

Detection Quality ◽

3D Object Detection

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model, which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we learn part locations as complementary features to improve resistance to occlusion and put forward the 3D fitting score to better estimate 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations such as pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
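
The step of resizing a box crop while adjusting the camera intrinsics accordingly follows from the pinhole projection model, and a minimal sketch may help. It is not taken from the ZoomNet code: the function names `zoom_intrinsics` and `project`, the camera values, and the box are made up for illustration, and half-pixel offset conventions are ignored. Cropping at `(x0, y0)` shifts the principal point, and resizing by `(sx, sy)` scales both the focal lengths and the shifted principal point.

```python
def zoom_intrinsics(fx, fy, cx, cy, box, out_size):
    """Intrinsics of the virtual camera that sees `box` resized to `out_size`.

    box = (x0, y0, x1, y1) in pixels of the original image.
    out_size = (width, height) of the resized crop.
    """
    x0, y0, x1, y1 = box
    out_w, out_h = out_size
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    # Crop: subtract the box origin from the principal point. Resize: scale everything.
    return fx * sx, fy * sy, (cx - x0) * sx, (cy - y0) * sy


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (X, Y, Z) in camera coordinates."""
    X, Y, Z = point
    return fx * X / Z + cx, fy * Y / Z + cy


# Hypothetical camera and a distant object box, zoomed to a 256 x 128 crop.
fx, fy, cx, cy = 720.0, 720.0, 620.0, 187.0
box = (600.0, 150.0, 700.0, 200.0)
out_size = (256.0, 128.0)
zfx, zfy, zcx, zcy = zoom_intrinsics(fx, fy, cx, cy, box, out_size)

# Consistency check: projecting with the zoomed intrinsics must equal
# projecting with the original ones and then applying the crop and resize.
point = (1.0, -0.5, 30.0)
u, v = project(point, fx, fy, cx, cy)
sx = out_size[0] / (box[2] - box[0])
sy = out_size[1] / (box[3] - box[1])
u_expected, v_expected = (u - box[0]) * sx, (v - box[1]) * sy
u_zoom, v_zoom = project(point, zfx, zfy, zcx, zcy)
print(abs(u_zoom - u_expected) < 1e-9, abs(v_zoom - v_expected) < 1e-9)
```

Keeping the intrinsics consistent with the resized crop is what lets depth recovered in the zoomed image be placed back in the original camera frame.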

Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.00864 ◽

2019 ◽

Cited By ~ 66

Author(s):

Yan Wang ◽

Wei-Lun Chao ◽

Divyansh Garg ◽

Bharath Hariharan ◽

Mark Campbell ◽

...

Keyword(s):

Object Detection ◽

Depth Estimation ◽

Autonomous Driving ◽

Visual Depth ◽

3D Object ◽

3D Object Detection

3D object detection and tracking for autonomous driving

10.14711/thesis-991012879761803412 ◽

2020 ◽

Author(s):

Peiliang Li

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

3D Object ◽

Object Detection And Tracking ◽

Detection And Tracking ◽

3D Object Detection

PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud

Sensors ◽

10.3390/s21010136 ◽

2020 ◽

Vol 21 (1) ◽

pp. 136

Author(s):

Fangyu Li ◽

Weizheng Jin ◽

Cien Fan ◽

Lian Zou ◽

Qingsheng Chen ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Multi Scale ◽

Backbone Network ◽

3D Object Detection ◽

Different Levels ◽

Fine Branch

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although one-stage 3D detectors have satisfactory training and inference speed, their performance still suffers from insufficient use of bird’s eye view (BEV) information. In this paper, a new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, which makes full use of the available information for detection. Specifically, our proposed backbone network can be divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method performs better in both 3D and BEV object detection than some previous state-of-the-art methods. Experimental results measured by average precision (AP) demonstrate the effectiveness of our network.

A Survey on 3D Object Detection Methods for Autonomous Driving Applications

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2019.2892405 ◽

2019 ◽

Vol 20 (10) ◽

pp. 3782-3795 ◽

Cited By ~ 38

Author(s):

Eduardo Arnold ◽

Omar Y. Al-Jarrah ◽

Mehrdad Dianati ◽

Saber Fallah ◽

David Oxtoby ◽

...

Keyword(s):

Object Detection ◽

Autonomous Driving ◽

Detection Methods ◽

3D Object ◽

3D Object Detection

Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that utilizes only raw data from LIDAR sensors. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses multiple feature maps from various scales with a Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method extracts features from point data more effectively and outperforms some baselines on the challenging KITTI-3D benchmark, achieving good speed and accuracy in real-world scenarios.
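
The top-down fusion that this abstract attributes to the decoder can be sketched in the generic Feature Pyramid Network style: upsample the coarser map and add it to the next finer one. This is a simplified illustration rather than the Voxel-FPN implementation. It assumes single-channel maps stored as nested lists, a factor of two between levels, and nearest-neighbour upsampling, and it omits the learned lateral and smoothing convolutions a real network would apply. The names `upsample2x` and `top_down_fuse` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour upsampling of a 2D map by a factor of two."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def top_down_fuse(pyramid):
    """Fuse feature maps ordered from fine to coarse.

    Each level is assumed to have half the resolution of the previous one.
    Returns the fused maps in the same fine-to-coarse order.
    """
    fused = [pyramid[-1]]  # the coarsest level passes through unchanged
    for fmap in reversed(pyramid[:-1]):
        up = upsample2x(fused[0])
        merged = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fmap, up)]
        fused.insert(0, merged)
    return fused


# Made-up three-level pyramid: 4x4, 2x2 and 1x1 maps.
pyramid = [
    [[1, 1, 1, 1] for _ in range(4)],
    [[1, 2], [3, 4]],
    [[10]],
]
fused = top_down_fuse(pyramid)
for level in fused:
    print(level)
```

After fusion, every fine-resolution cell also carries the accumulated responses of the coarser cells that cover it, which is the property a detection head relies on when it reads the finest map.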
