A 3D Convolutional Neural Network Towards Real-Time Amodal 3D Object Detection

Author(s):  
Hao Sun ◽  
Zehui Meng ◽  
Xinxin Du ◽  
Marcelo H. Ang
Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of the data are often obscured. Objective: In this work, we aim at 3D object detection from discrete, disordered and sparse 3D point clouds. Methods: The CNN is composed of a frustum sequence module, a 3D instance segmentation module (S-NET), a 3D point cloud transformation module (T-NET) and a 3D bounding box estimation module (E-NET). The frustum sequence module determines the search space of the object; the 3D instance segmentation module performs instance segmentation of the point cloud; and the transformation module together with the 3D bounding box estimation module confirms the 3D coordinates of the object. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while retaining real-time capability. Conclusion: We achieve real-time 3D object detection with an improved convolutional neural network (CNN) based on image-driven point clouds.
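The abstract names the four modules but not their internals. Below is a minimal PyTorch sketch of how such a frustum-based pipeline is commonly wired (in the style of Frustum PointNets); every layer width, the shared-MLP design and the 7-value box parameterisation (centre, size, heading) are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedMLP(nn.Module):
    """Per-point MLP implemented with 1x1 Conv1d layers (PointNet-style)."""
    def __init__(self, channels):
        super().__init__()
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv1d(c_in, c_out, 1), nn.BatchNorm1d(c_out), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):          # x: (B, C, N)
        return self.net(x)

class SNet(nn.Module):
    """3D instance segmentation: foreground/background logits per point."""
    def __init__(self):
        super().__init__()
        self.local = SharedMLP([3, 64, 64])
        self.globl = SharedMLP([64, 128, 1024])
        self.head = SharedMLP([64 + 1024, 512, 128])
        self.logits = nn.Conv1d(128, 2, 1)

    def forward(self, pts):        # pts: (B, 3, N) points inside one frustum
        local = self.local(pts)
        glob = self.globl(local).max(dim=2, keepdim=True).values   # global feature
        fused = torch.cat([local, glob.expand(-1, -1, pts.shape[2])], dim=1)
        return self.logits(self.head(fused))                       # (B, 2, N)

class TNet(nn.Module):
    """Regresses a translation that recentres the segmented object points."""
    def __init__(self):
        super().__init__()
        self.mlp = SharedMLP([3, 128, 256])
        self.fc = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, pts):        # pts: (B, 3, M) segmented object points
        return self.fc(self.mlp(pts).max(dim=2).values)            # (B, 3) offset

class ENet(nn.Module):
    """Estimates centre residual (3), size (3) and heading (1) of the 3D box."""
    def __init__(self):
        super().__init__()
        self.mlp = SharedMLP([3, 128, 256, 512])
        self.fc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 7))

    def forward(self, pts):        # pts: (B, 3, M) recentred object points
        return self.fc(self.mlp(pts).max(dim=2).values)            # (B, 7) box
```

At inference, the frustum sequence module would supply the frustum point sets, S-NET's per-point argmax would select the object points, T-NET's offset would recentre them, and E-NET would produce the final amodal box.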


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 171461-171470
Author(s):  
Dianwei Wang ◽  
Yanhui He ◽  
Ying Liu ◽  
Daxiang Li ◽  
Shiqian Wu ◽  
...  

2021 ◽  
Vol 1979 (1) ◽  
pp. 012020
Author(s):  
Gadug Sudhansu ◽  
A N Mohamed Zabeeulla ◽  
M N Nachappa

Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 893 ◽  
Author(s):  
Li Wang ◽  
Ruifeng Li ◽  
Hezi Shi ◽  
Jingwen Sun ◽  
Lijun Zhao ◽  
...  

Environmental perception is a vital capability for service robots working in an indoor environment over a long period. General 3D reconstruction is a low-level geometric description that cannot convey semantics; in contrast, higher-level, human-like perception requires more abstract concepts such as objects and scenes. Moreover, 2D object detection based on images alone fails to provide the actual position and size of an object, which are quite important for a robot's operation. In this paper, we focus on 3D object detection, regressing an object's category, 3D size and spatial position through a convolutional neural network (CNN). We propose a multi-channel CNN for 3D object detection that fuses three input channels: RGB, depth and bird's-eye-view (BEV) images. We also propose a method to generate 3D proposals from 2D ones in the RGB image together with semantic priors. Training and testing are conducted on the modified NYU V2 and SUN RGB-D datasets to verify the effectiveness of the algorithm. We also carry out experiments on a real service robot, using the proposed 3D object detection method to enhance its environmental perception.
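The core idea, fusing RGB, depth and BEV channels before regressing category, size and position, can be sketched as a three-trunk late-fusion network. This only illustrates the fusion pattern, not the authors' architecture: the ResNet-18 trunks, the assumption that depth and BEV are rendered as three-channel images (e.g. an HHA-style encoding for depth) and the plain linear heads are placeholders, and the proposal-generation step is omitted.

```python
import torch
import torch.nn as nn
import torchvision

class MultiChannelFusionNet(nn.Module):
    """Three-trunk late fusion over RGB, depth and BEV inputs."""
    def __init__(self, num_classes):
        super().__init__()
        def trunk():
            m = torchvision.models.resnet18(weights=None)
            return nn.Sequential(*list(m.children())[:-1])  # keep up to global pool
        self.rgb_net, self.depth_net, self.bev_net = trunk(), trunk(), trunk()
        self.cls_head = nn.Linear(3 * 512, num_classes)  # object category
        self.size_head = nn.Linear(3 * 512, 3)           # 3D size (w, h, l)
        self.pos_head = nn.Linear(3 * 512, 3)            # 3D centre (x, y, z)

    def forward(self, rgb, depth, bev):  # each input: (B, 3, H, W)
        feat = torch.cat([self.rgb_net(rgb).flatten(1),
                          self.depth_net(depth).flatten(1),
                          self.bev_net(bev).flatten(1)], dim=1)
        return self.cls_head(feat), self.size_head(feat), self.pos_head(feat)
```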


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6779
Author(s):  
Byung-Gil Han ◽  
Joon-Goo Lee ◽  
Kil-Taek Lim ◽  
Doo-Hyun Choi

With the increase in research on convolutional neural network (CNN)-based object detection, studies on light-weight CNN models that can run in real time on edge-computing devices are also increasing. This paper proposes scalable convolutional blocks from which You Only Look Once (YOLO) detectors can easily be designed; by simply exchanging the proposed blocks, the processing speed and accuracy can be balanced for target edge-computing devices of different performance levels. The maximum number of kernels per convolutional layer was determined through simple but intuitive speed-comparison tests on the three edge-computing devices considered, and the scalable convolutional blocks were designed within this limit so that objects can be detected in real time on these devices. Three scalable and fast YOLO detectors (SF-YOLO), designed using the proposed blocks, were compared with several conventional light-weight YOLO detectors in processing speed and accuracy on the edge-computing devices. SF-YOLO ran twice as fast as YOLOv3-tiny at the same accuracy, and 48% faster than YOLOv3-tiny-PRN, itself a speed-improved model. Even the large SF-YOLO model, which focuses on accuracy, achieved a 10% faster processing speed than YOLOv4-tiny with a better accuracy of 40.4% mAP@0.5 on the MS COCO dataset.
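The abstract's key design constraint, capping the per-layer kernel count at a device-specific maximum, can be sketched as a parameterised block factory. The cap value, the conv-BN-LeakyReLU layout and the block depths below are assumptions chosen to illustrate the scaling idea, not the published SF-YOLO configuration.

```python
import torch.nn as nn

MAX_KERNELS = 128  # per-layer cap from the device speed tests (assumed value)

def scalable_block(c_in, c_out, n_convs=2):
    """Stack of 3x3 conv-BN-LeakyReLU layers whose width never exceeds
    the kernel cap measured for the target edge device."""
    layers, c = [], c_in
    for _ in range(n_convs):
        width = min(c_out, MAX_KERNELS)
        layers += [nn.Conv2d(c, width, 3, padding=1, bias=False),
                   nn.BatchNorm2d(width),
                   nn.LeakyReLU(0.1, inplace=True)]
        c = width
    return nn.Sequential(*layers)

# Scaling the detector then amounts to exchanging blocks of different
# widths and depths, while the cap keeps every layer edge-friendly:
fast_variant = scalable_block(32, 64, n_convs=1)       # speed-oriented
accurate_variant = scalable_block(32, 256, n_convs=3)  # accuracy-oriented
```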


Sensors ◽  
2019 ◽  
Vol 19 (6) ◽  
pp. 1434 ◽  
Author(s):  
Minle Li ◽  
Yihua Hu ◽  
Nanxiang Zhao ◽  
Qishu Qian

Three-dimensional (3D) object detection has important applications in robotics, automatic loading, autonomous driving and other scenarios. With the improvement of sensing devices, multi-sensor/multimodal data can be collected from a variety of sensors such as lidar and cameras. To make full use of this complementary information and improve detection performance, we propose Complex-Retina, a convolutional neural network for 3D object detection based on multi-sensor data fusion. Firstly, a unified architecture with two feature extraction networks was designed, and feature extraction from the point clouds and images of the different sensors was performed synchronously. Then, a series of 3D anchors was set and projected onto the feature maps, where they were cropped into 2D anchors of the same size and fused together. Finally, object classification and 3D bounding box regression were carried out on multiple paths of fully connected layers. The proposed network is a one-stage convolutional neural network, achieving a balance between detection accuracy and speed. Experiments on the KITTI dataset show that the proposed network outperforms the comparison algorithms in average precision (AP) and time consumption, demonstrating its effectiveness.
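The anchor step, projecting 3D anchors onto each modality's feature map and cropping equal-sized regions for fusion, can be sketched with a camera projection plus RoI-Align. The projection matrix P, the feature stride of 8, the batch size of 1 and the element-wise mean fusion are all assumptions for illustration; the paper's exact cropping and fusion scheme may differ.

```python
import torch
from torchvision.ops import roi_align

def project_anchors(corners, P):
    """Project 3D anchor corners (N, 8, 3) through a 3x4 camera matrix P
    and return axis-aligned 2D boxes (N, 4) as (x1, y1, x2, y2).
    Assumes all anchors lie in front of the camera."""
    homo = torch.cat([corners, torch.ones_like(corners[..., :1])], dim=-1)
    uv = homo @ P.T                                   # (N, 8, 3) homogeneous
    uv = uv[..., :2] / uv[..., 2:3].clamp(min=1e-6)   # perspective divide
    x1y1 = uv.min(dim=1).values
    x2y2 = uv.max(dim=1).values
    return torch.cat([x1y1, x2y2], dim=1)

def crop_and_fuse(img_feat, bev_feat, img_boxes, bev_boxes, out_size=7):
    """Crop equal-sized RoIs from the image and BEV feature maps
    (batch size 1 for simplicity) and fuse them element-wise."""
    img_roi = roi_align(img_feat, [img_boxes], output_size=out_size,
                        spatial_scale=1 / 8)          # stride-8 feature map
    bev_roi = roi_align(bev_feat, [bev_boxes], output_size=out_size,
                        spatial_scale=1 / 8)
    return (img_roi + bev_roi) / 2  # mean fusion; concatenation is another option
```

The fused RoIs would then feed the fully connected classification and 3D box regression paths described in the abstract.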

