Strong-Weak Feature Alignment for 3D Object Detection

Electronics ◽

10.3390/electronics10101205 ◽

2021 ◽

Vol 10 (10) ◽

pp. 1205

Author(s):

Zhiyu Wang ◽

Li Wang ◽

Bin Dai

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Representation ◽

Alignment Algorithm ◽

3D Object ◽

3D Point Clouds ◽

Object Feature ◽

3D Object Detection ◽

Feature Alignment

Object detection in 3D point clouds is still a challenging task in autonomous driving. Because of the inherent occlusion and density changes of the point cloud, the data distribution of the same object can change dramatically. In particular, incomplete data affected by sparsity or occlusion cannot represent the complete characteristics of the object. In this paper, we propose a novel strong–weak feature alignment algorithm between complete and incomplete objects for 3D object detection, which explores the correlations within the data. It is an end-to-end adaptive network that does not require additional data and can be easily applied to other object detection networks. Through a complete object feature extractor, we achieve a robust feature representation of the object. It serves as a guarding feature to help the incomplete object feature generator to generate effective features. The strong–weak feature alignment algorithm reduces the gap between different states of the same object and enhances the ability to represent the incomplete object. The proposed adaptation framework is validated on the KITTI object benchmark and achieves about a 6% improvement in detection average precision on 3D moderate difficulty compared to the basic model. The results show that our adaptation method improves the detection performance of incomplete 3D objects.
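The alignment idea in this abstract, pulling the feature of an incomplete ("weak") object toward the feature of its complete ("strong") counterpart, can be illustrated with a minimal sketch. The abstract does not state the actual loss, so the mean squared distance and the names `alignment_loss` and `align_step` below are assumptions chosen for illustration only, not the authors' method.

```python
def alignment_loss(strong, weak):
    """Mean squared distance between a strong and a weak feature vector.

    Assumption: a plain L2-style loss; the paper's actual loss is not
    given in the abstract.
    """
    if len(strong) != len(weak):
        raise ValueError("feature vectors must have the same length")
    return sum((s - w) ** 2 for s, w in zip(strong, weak)) / len(strong)


def align_step(strong, weak, lr=0.1):
    """One gradient step on the weak feature; the strong feature stays fixed."""
    if len(strong) != len(weak):
        raise ValueError("feature vectors must have the same length")
    n = len(weak)
    # d(loss)/d(w_i) = 2 * (w_i - s_i) / n
    return [w - lr * 2.0 * (w - s) / n for s, w in zip(strong, weak)]


strong = [1.0, 0.0]   # hypothetical feature of the complete object
weak = [0.0, 0.0]     # hypothetical feature of the occluded object
updated = align_step(strong, weak)
```

Treating the strong feature as a fixed target means only the weak branch is adjusted, which is one simple way to realise the guidance role the abstract describes.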

ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6945 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12557-12564 ◽

Cited By ~ 4

Author(s):

Zhenbo Xu ◽

Wei Zhang ◽

Xiaoqing Ye ◽

Xiao Tan ◽

Wei Yang ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Disparity Estimation ◽

3D Object ◽

Detection Model ◽

Occluded Objects ◽

Bounding Boxes ◽

Detection Quality ◽

3D Object Detection

3D object detection is an essential task in autonomous driving and robotics. Though great progress has been made, challenges remain in estimating 3D pose for distant and occluded objects. In this paper, we present a novel framework named ZoomNet for stereo imagery-based 3D detection. The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes. To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module, adaptive zooming, which simultaneously resizes 2D instance bounding boxes to a unified resolution and adjusts the camera intrinsic parameters accordingly. In this way, we are able to estimate higher-quality disparity maps from the resized box images and then construct dense point clouds for both nearby and distant objects. Moreover, we propose learning part locations as complementary features to improve the resistance against occlusion and put forward the 3D fitting score to better estimate the 3D detection quality. Extensive experiments on the popular KITTI 3D detection dataset indicate that ZoomNet surpasses all previous state-of-the-art methods by large margins (improved by 9.4% on APbv (IoU=0.7) over pseudo-LiDAR). An ablation study also demonstrates that our adaptive zooming strategy brings an improvement of over 10% on AP3d (IoU=0.7). In addition, since the official KITTI benchmark lacks fine-grained annotations like pixel-wise part locations, we also present our KFG dataset by augmenting KITTI with detailed instance-wise annotations including pixel-wise part location, pixel-wise disparity, etc. Both the KFG dataset and our code will be publicly available at https://github.com/detectRecog/ZoomNet.
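The adaptive zooming step, resizing a box crop while adjusting the camera intrinsics to match, can be sketched under a standard pinhole model. This is a generic illustration, not ZoomNet's code: the `Intrinsics` class, the function names, and all numeric values are hypothetical, and half-pixel offsets are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point (x)
    cy: float  # principal point (y)


def zoom_intrinsics(k, box, out_size):
    """Intrinsics of the virtual camera that sees `box` resized to `out_size`.

    box is (x0, y0, x1, y1) in pixels; out_size is (width, height).
    """
    x0, y0, x1, y1 = box
    out_w, out_h = out_size
    sx = out_w / (x1 - x0)  # horizontal zoom factor
    sy = out_h / (y1 - y0)  # vertical zoom factor
    # Cropping shifts the principal point; resizing scales everything.
    return Intrinsics(k.fx * sx, k.fy * sy, (k.cx - x0) * sx, (k.cy - y0) * sy)


def project(k, point):
    """Pinhole projection of a 3D camera-frame point (x, y, z) to pixels."""
    x, y, z = point
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)


k = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)  # made-up values
box = (500.0, 100.0, 700.0, 200.0)
k_zoom = zoom_intrinsics(k, box, out_size=(400, 200))
```

Projecting a 3D point with `k_zoom` gives the same pixel as projecting with `k` and then applying the crop and resize, which is why geometry estimated on the zoomed crop remains consistent with the original camera.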

PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud

Sensors ◽

10.3390/s21010136 ◽

2020 ◽

Vol 21 (1) ◽

pp. 136

Author(s):

Fangyu Li ◽

Weizheng Jin ◽

Cien Fan ◽

Lian Zou ◽

Qingsheng Chen ◽

...

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Multi Scale ◽

Backbone Network ◽

3D Object Detection ◽

Different Levels ◽

Fine Branch

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although the one-stage 3D detector has satisfactory training and inference speed, there are still some performance problems due to insufficient utilization of bird’s eye view (BEV) information. In this paper, a new backbone network is proposed to complete the cross-layer fusion of multi-scale BEV feature maps, which makes full use of various information for detection. Specifically, our proposed backbone network can be divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method has better performance in both 3D and BEV object detection compared with some previous state-of-the-art methods. Experimental results with average precision (AP) prove the effectiveness of our network.

Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. The encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas the decoder fuses multiple feature maps from various scales by Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method performs better at extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.

Outdoor Mobile Mapping and AI-Based 3D Object Detection with Low-Cost RGB-D Cameras: The Use Case of On-Street Parking Statistics

Remote Sensing ◽

10.3390/rs13163099 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3099

Author(s):

Stephan Nebiker ◽

Jonas Meyer ◽

Stefan Blaser ◽

Manuela Ammann ◽

Severin Rhyner

Keyword(s):

Object Detection ◽

Smart Cities ◽

Low Cost ◽

Point Clouds ◽

Mobile Mapping ◽

3D Object ◽

Depth Measurement ◽

Detection Algorithms ◽

3D Point Clouds ◽

3D Object Detection

A successful application of low-cost 3D cameras in combination with artificial intelligence (AI)-based 3D object detection algorithms to outdoor mobile mapping would offer great potential for numerous mapping, asset inventory, and change detection tasks in the context of smart cities. This paper presents a mobile mapping system mounted on an electric tricycle and a procedure for creating on-street parking statistics, which allow government agencies and policy makers to verify and adjust parking policies in different city districts. Our method combines georeferenced red-green-blue-depth (RGB-D) imagery from two low-cost 3D cameras with state-of-the-art 3D object detection algorithms for extracting and mapping parked vehicles. Our investigations demonstrate the suitability of the latest generation of low-cost 3D cameras for real-world outdoor applications with respect to supported ranges, depth measurement accuracy, and robustness under varying lighting conditions. In an evaluation of suitable algorithms for detecting vehicles in the noisy and often incomplete 3D point clouds from RGB-D cameras, the 3D object detection network PointRCNN, which extends region-based convolutional neural networks (R-CNNs) to 3D point clouds, clearly outperformed all other candidates. The results of a mapping mission with 313 parking spaces show that our method is capable of reliably detecting parked cars with a precision of 100% and a recall of 97%. It can be applied to unslotted and slotted parking and different parking types including parallel, perpendicular, and angle parking.

3D Object Detection Using Frustums and Attention Modules for Images and Point Clouds

Signals ◽

10.3390/signals2010009 ◽

2021 ◽

Vol 2 (1) ◽

pp. 98-107

Author(s):

Yiran Li ◽

Han Xie ◽

Hyunchul Shin

Keyword(s):

Object Detection ◽

Three Dimensional ◽

Point Clouds ◽

Practical Method ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

On The Road ◽

3D Object Detection ◽

Hard Cases

Three-dimensional (3D) object detection is essential in autonomous driving. A 3D Lidar sensor can capture three-dimensional objects, such as vehicles, cycles, pedestrians, and other objects on the road. Although Lidar can generate point clouds in 3D space, it still lacks the fine resolution of 2D information. Therefore, Lidar and camera fusion has gradually become a practical method for 3D object detection. Previous strategies focused on the extraction of voxel points and the fusion of feature maps. However, the biggest challenge is in extracting enough edge information to detect small objects. To solve this problem, we found that attention modules are beneficial in detecting small objects. In this work, we developed Frustum ConvNet and attention modules for the fusion of images from a camera and point clouds from a Lidar. Multilayer Perceptron (MLP) and tanh activation functions were used in the attention modules. Furthermore, the attention modules were designed on PointNet to perform multilayer edge detection for 3D object detection. Compared with a previous well-known method, Frustum ConvNet, our method achieved competitive results, with an improvement of 0.27%, 0.43%, and 0.36% in Average Precision (AP) for 3D object detection in easy, moderate, and hard cases, respectively, and an improvement of 0.21%, 0.27%, and 0.01% in AP for Bird’s Eye View (BEV) object detection in easy, moderate, and hard cases, respectively, on the KITTI detection benchmarks. Our method also obtained the best results in four cases in AP on the indoor SUN-RGBD dataset for 3D object detection.

Scale-Aware Attention-Based PillarsNet (SAPN) Based 3D Object Detection for Point Cloud

Mathematical Problems in Engineering ◽

10.1155/2020/3927365 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Xiang Song ◽

Weiqin Zhan ◽

Xiaoyu Che ◽

Huilin Jiang ◽

Biao Yang

Keyword(s):

Object Detection ◽

Autonomous Navigation ◽

Point Clouds ◽

Detection Performance ◽

Detection Methods ◽

Feature Maps ◽

3D Object ◽

3D Point Clouds ◽

Detection Approach ◽

3D Object Detection

Three-dimensional object detection can provide precise positions of objects, which can be beneficial to many robotics applications, such as self-driving cars, housekeeping robots, and autonomous navigation. In this work, we focus on accurate object detection in 3D point clouds and propose a new detection pipeline called scale-aware attention-based PillarsNet (SAPN). SAPN is a one-stage 3D object detection approach similar to PointPillar. However, SAPN achieves better performance than PointPillar by introducing the following strategies. First, we extract multiresolution pillar-level features from the point clouds to make the detection approach more scale-aware. Second, a spatial-attention mechanism is used to highlight the object activations in the feature maps, which can improve detection performance. Finally, SE-attention is employed to reweight the features fed into the detection head, which performs 3D object detection in a multitask learning manner. Experiments on the KITTI benchmark show that SAPN achieved similar or better performance compared with several state-of-the-art LiDAR-based 3D detection methods. The ablation study reveals the effectiveness of each proposed strategy. Furthermore, strategies used in this work can be embedded easily into other LiDAR-based 3D detection approaches, which improve their detection performance with slight modifications.
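The SE-attention reweighting mentioned in this abstract follows the general squeeze-and-excitation pattern: pool each channel to a scalar, pass the result through a small bottleneck, and scale each channel by the resulting gate. The sketch below shows that generic pattern in plain Python. The function name, the omission of bias terms, and the toy weights are assumptions for illustration and do not reproduce SAPN's implementation.

```python
import math


def se_reweight(feature_map, w1, w2):
    """Squeeze-and-excitation style channel reweighting (biases omitted).

    feature_map: list of C channels, each a list of rows (H x W).
    w1: R x C weights of the reduction layer.
    w2: C x R weights of the expansion layer.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, feature_map)]


features = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]  # toy 2 x 2 x 2 map
w1 = [[1.0, 0.0]]       # hypothetical reduction weights (R=1, C=2)
w2 = [[10.0], [-10.0]]  # hypothetical expansion weights
reweighted = se_reweight(features, w1, w2)
```

With these toy weights the first channel is kept almost unchanged and the second is suppressed, which illustrates how the gates emphasise some channels over others before the detection head.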

A Novel Interactive Fusion Method with Images and Point Clouds for 3D Object Detection

Applied Sciences ◽

10.3390/app9061065 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1065 ◽

Cited By ~ 1

Author(s):

Kai Xu ◽

Zhile Yang ◽

Yangjie Xu ◽

Liangbing Feng

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Image Feature ◽

Fusion Method ◽

Feature Maps ◽

Fusion Algorithm ◽

3D Object ◽

Novel Structure ◽

3D Object Detection

This paper tackles the task of fusing features from images and their corresponding point clouds for 3D object detection in autonomous driving scenarios, based on AVOD, an Aggregate View Object Detection network. The proposed fusion algorithms fuse features from Bird’s Eye View (BEV) LIDAR point clouds and their corresponding RGB images. Unlike existing fusion methods, which simply adopt a concatenation module, an element-wise sum module, or an element-wise mean module, our proposed fusion algorithms enhance the interaction between BEV feature maps and their corresponding image feature maps by designing a novel structure, where one uses single-level feature maps and another uses multilevel feature maps. Experiments show that our proposed fusion algorithm produces better results on 3D mAP and AHS with less speed loss than the existing fusion method on the KITTI 3D object detection benchmark.
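The three baseline fusion operators that the abstract contrasts with (concatenation, element-wise sum, element-wise mean) can be written down directly. The sketch below applies them to flat per-location feature vectors. The function names and the example values are hypothetical, and the paper's own interactive fusion structure is not reproduced here.

```python
def _check_same_length(bev, img):
    if len(bev) != len(img):
        raise ValueError("element-wise fusion needs equal-length features")


def fuse_concat(bev, img):
    """Concatenation: keeps both feature vectors, doubling the channel count."""
    return list(bev) + list(img)


def fuse_sum(bev, img):
    """Element-wise sum of a BEV feature and an image feature."""
    _check_same_length(bev, img)
    return [b + i for b, i in zip(bev, img)]


def fuse_mean(bev, img):
    """Element-wise mean of a BEV feature and an image feature."""
    _check_same_length(bev, img)
    return [(b + i) / 2.0 for b, i in zip(bev, img)]


bev_feature = [1.0, 2.0, 3.0]  # hypothetical BEV feature at one location
img_feature = [3.0, 2.0, 1.0]  # hypothetical image feature at the same location
fused = fuse_mean(bev_feature, img_feature)
```

None of these operators lets one modality influence how the other is weighted, which is the limitation the abstract says its proposed structure addresses.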

CrossFusion net: Deep 3D object detection based on RGB images and point clouds in autonomous driving

Image and Vision Computing ◽

10.1016/j.imavis.2020.103955 ◽

2020 ◽

Vol 100 ◽

pp. 103955

Author(s):

Dza-Shiang Hong ◽

Hung-Hao Chen ◽

Pei-Yung Hsiao ◽

Li-Chen Fu ◽

Siang-Min Siao

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

3D Object ◽

Rgb Images ◽

3D Object Detection

A Novel Interactive Fusion Method with Images and Point Clouds for 3D Object Detection

10.20944/preprints201902.0105.v1 ◽

2019 ◽

Author(s):

Kai Xu ◽

Zhile Yang ◽

Yangjie Xu ◽

Liangbing Feng

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Fusion Method ◽

Feature Maps ◽

Fusion Algorithm ◽

3D Object ◽

Novel Structure ◽

Fusion Methods ◽

3D Object Detection

This paper tackles the task of fusing features from images and their corresponding point clouds for 3D object detection in autonomous driving scenarios, based on AVOD, an Aggregate View Object Detection network. The proposed fusion algorithms fuse features from Bird’s Eye View (BEV) LIDAR point clouds and their corresponding RGB images. Unlike existing fusion methods, which simply adopt a concatenation module, an element-wise sum module, or an element-wise mean module, our proposed fusion algorithms enhance the interaction between BEV feature maps and their corresponding image feature maps by designing a novel structure, where one uses single-level feature maps and another uses multilevel feature maps. Experiments show that our proposed fusion algorithm produces better results on 3D mAP and AHS with less speed loss than the existing fusion method on the KITTI 3D object detection benchmark.

A Two-Stage Data Association Approach for 3D Multi-Object Tracking

Sensors ◽

10.3390/s21092894 ◽

2021 ◽

Vol 21 (9) ◽

pp. 2894

Author(s):

Minh-Quan Dao ◽

Vincent Frémont

Keyword(s):

Object Detection ◽

Object Tracking ◽

Moving Objects ◽

Data Association ◽

Autonomous Driving ◽

Tracking Accuracy ◽

Two Stage ◽

Bipartite Matching ◽

3D Object ◽

3D Object Detection

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to the recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made of an object detector and a data association algorithm which establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled at bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method, which was successfully applied to image-based tracking, to the 3D setting, thus providing an alternative for data association for 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association by achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
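The one-stage baseline described in this abstract, track-to-detection matching as a Linear Assignment Problem, can be illustrated with a small sketch. For brevity it finds the optimal assignment by brute force over permutations rather than with the Hungarian algorithm, so it is only practical for tiny inputs. The function name and the cost values are hypothetical, and the paper's two-stage method is not shown.

```python
from itertools import permutations


def solve_assignment(cost):
    """Return the (track, detection) pairs with the minimum total cost.

    cost[t][d] is the cost of matching track t to detection d.
    Assumes there are at least as many detections as tracks.
    Brute force: the Hungarian algorithm finds the same optimum in
    polynomial time.
    """
    n_tracks = len(cost)
    n_dets = len(cost[0]) if cost else 0
    if n_tracks > n_dets:
        raise ValueError("this sketch needs n_tracks <= n_dets")
    best_total, best = float("inf"), ()
    for dets in permutations(range(n_dets), n_tracks):
        total = sum(cost[t][d] for t, d in enumerate(dets))
        if total < best_total:
            best_total, best = total, dets
    return list(enumerate(best))


# Hypothetical distances between 2 tracks and 2 detections.
cost = [[1.0, 2.0],
        [1.0, 5.0]]
matches = solve_assignment(cost)
```

In this example a greedy per-track choice would pair track 0 with detection 0 and leave track 1 with the expensive detection 1 (total 6.0), whereas the optimal assignment swaps them (total 3.0), which is why a global solver is used for association.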