Visual-Inertial-Semantic Scene Representation for 3D Object Detection

Author(s): Jingming Dong, Xiaohan Fei, Stefano Soatto
2021, Vol 12 (9), pp. 459-469
Author(s): D. D. Rukhovich

In this paper, we propose a novel method for joint 3D object detection and room layout estimation. The proposed method surpasses all existing methods of 3D object detection from monocular images on the indoor SUN RGB-D dataset, and it shows competitive results on the ScanNet dataset in multi-view mode. Both datasets were collected in residential, administrative, educational, and industrial spaces, and together they cover a wide range of indoor use cases. Moreover, we are the first to formulate and solve the problem of multi-class 3D object detection from multi-view inputs in indoor scenes. The proposed method can be integrated into the control systems of mobile robots, and the results of this study can be applied to navigation, path planning, capturing and manipulating scene objects, and semantic scene mapping.
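
A minimal sketch of what "joint" estimation can look like in practice, assuming the common shared-backbone, two-head design; the module names, head shapes, and loss weighting below are illustrative assumptions, not the paper's architecture:

```python
# Illustrative sketch: a shared backbone feeds two task heads, one for 3D
# object detection and one for room layout estimation. All names, channel
# sizes, and parametrizations here are assumptions for exposition.
import torch
import torch.nn as nn

class JointDetLayoutNet(nn.Module):
    def __init__(self, feat_dim=256, num_classes=10):
        super().__init__()
        # Stand-in feature extractor; a real model would use a deep backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        # Dense detection head: per-location class scores + 3D box parameters.
        self.det_head = nn.Conv2d(feat_dim, num_classes + 7, 1)
        # Layout head: one global box (center, size, yaw) for the room.
        self.layout_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 7))

    def forward(self, images):
        feats = self.backbone(images)
        return self.det_head(feats), self.layout_head(feats)

# Joint training minimizes a weighted sum of the two task losses:
#   loss = det_loss(det_out, det_gt) + lam * layout_loss(layout_out, layout_gt)
```

Under this design, both tasks are optimized at once, so layout cues and object evidence share the same features.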


2021, Vol 12 (7), pp. 373-384
Author(s): D. D. Rukhovich

In this article, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. In the multi-view formulation, several images of a static scene are used to detect objects in that scene. To address the problem, we propose a novel 3D object detection method named ImVoxelNet, based on a fully convolutional neural network. Unlike existing 3D object detection methods, ImVoxelNet works directly with a 3D representation and does not mediate 3D object detection through 2D object detection. The method accepts multi-view inputs, and the number of monocular images in each input can vary during training and inference; in fact, it may differ for each multi-view input. Moreover, we treat a single RGB image as a special case of a multi-view input, so the method also accepts monocular inputs with no modifications. Through extensive evaluation, we demonstrate that the proposed method successfully handles a variety of outdoor scenes. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. The method operates in real time, which makes it possible to integrate it into the navigation systems of autonomous devices. The results of this study can be applied to navigation, path planning, and semantic scene mapping.
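
The abstract's key mechanism is that detection happens in 3D space rather than being lifted from 2D detections. A minimal sketch of that idea, assuming per-view 2D features are back-projected into a shared voxel volume and averaged; the grid layout, helper names, and averaging rule are assumptions, not the authors' code:

```python
# Sketch: aggregate a variable number of per-view 2D feature maps into one
# world-space voxel volume by back-projecting voxel centers into each view.
import torch

def build_voxel_volume(feats, projections, grid):
    """feats: (V, C, H, W) per-view 2D feature maps.
    projections: (V, 3, 4) world-to-pixel projection matrices.
    grid: (3, Nx, Ny, Nz) world coordinates of voxel centers."""
    V, C, H, W = feats.shape
    nx, ny, nz = grid.shape[1:]
    n = nx * ny * nz
    points = torch.cat([grid.reshape(3, n), torch.ones(1, n)], dim=0)  # (4, n)
    volume = feats.new_zeros(C, n)
    hits = feats.new_zeros(n)
    for v in range(V):  # the number of views V may differ per sample
        uvz = projections[v] @ points              # (3, n) homogeneous pixels
        z = uvz[2]
        u = (uvz[0] / z.clamp(min=1e-5)).long()
        w = (uvz[1] / z.clamp(min=1e-5)).long()
        valid = (z > 0) & (u >= 0) & (u < W) & (w >= 0) & (w < H)
        volume[:, valid] += feats[v][:, w[valid], u[valid]]
        hits[valid] += 1
    volume = volume / hits.clamp(min=1)            # average features over views
    return volume.reshape(C, nx, ny, nz)           # input to a 3D conv head
```

Because views are aggregated by accumulation and averaging, V can differ per sample, and a single RGB image is simply the V = 1 case; a 3D convolutional detection head would then operate on the returned volume.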


Author(s): Xiaoqing Shang, Zhiwei Cheng, Su Shi, Zhuanghao Cheng, Hongcheng Huang

Author(s): Xu Liu, Jiayan Cao, Qianqian Bi, Jian Wang, Boxin Shi, ...

IEEE Access, 2021, pp. 1-1
Author(s): Can Chen, Luca Zanotti Fragonara, Antonios Tsourdos

Sensors, 2021, Vol 21 (9), pp. 2894
Author(s): Minh-Quan Dao, Vincent Frémont

Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of the other moving objects in the scene and predicts their future motion. Thanks to recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially composed of an object detector and a data association algorithm that establishes track-to-detection correspondence. While 3D object detection has been actively researched, data association for 3D MOT has settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method, previously applied successfully to image-based tracking, to the 3D setting, thus providing an alternative for data association in 3D MOT. Our method outperforms the baseline that uses one-stage bipartite matching, achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
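
A minimal sketch of the association step the abstract describes: one-stage matching solves a single LAP with the Hungarian algorithm, while a two-stage variant matches high-confidence detections first and gives leftover tracks a second round against the remaining detections. The cost function (3D center distance), thresholds, and function names below are assumptions, not the paper's exact design:

```python
# Sketch of LAP-based track-to-detection association (Hungarian algorithm)
# extended to two stages; 'gate' rejects matches that are too far apart.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(tracks, dets, gate):
    """tracks: (N, 3), dets: (M, 3) arrays of 3D box centers."""
    if len(tracks) == 0 or len(dets) == 0:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = np.linalg.norm(tracks[:, None] - dets[None, :], axis=-1)  # (N, M)
    rows, cols = linear_sum_assignment(cost)       # solve the LAP
    pairs = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
    matched_t = {r for r, _ in pairs}
    matched_d = {c for _, c in pairs}
    un_t = [i for i in range(len(tracks)) if i not in matched_t]
    un_d = [j for j in range(len(dets)) if j not in matched_d]
    return pairs, un_t, un_d

def two_stage_associate(tracks, dets, scores, conf_thr=0.5, gate=2.0):
    high = np.where(scores >= conf_thr)[0]
    low = np.where(scores < conf_thr)[0]
    # Stage 1: match all tracks against high-confidence detections.
    p1, un_t, _ = match(tracks, dets[high], gate)
    matches = [(t, int(high[d])) for t, d in p1]
    # Stage 2: leftover tracks get a second round vs. low-confidence detections.
    p2, _, _ = match(tracks[un_t], dets[low], gate)
    matches += [(un_t[t], int(low[d])) for t, d in p2]
    return matches
```

In a full tracker, unmatched high-confidence detections would typically spawn new tracks and repeatedly unmatched tracks would be terminated; that bookkeeping is omitted here.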

