Efficient representation and feature extraction for neural network-based 3D object pose estimation

2013 ◽  
Vol 120 ◽  
pp. 90-100 ◽  
Author(s):  
Rigas Kouskouridas ◽  
Antonios Gasteratos ◽  
Christos Emmanouilidis
2021 ◽  
Vol 218 ◽  
pp. 106839
Author(s):  
Pengshuai Yin ◽  
Jiayong Ye ◽  
Guoshen Lin ◽  
Qingyao Wu

Sensors ◽  
2019 ◽  
Vol 19 (6) ◽  
pp. 1434 ◽  
Author(s):  
Minle Li ◽  
Yihua Hu ◽  
Nanxiang Zhao ◽  
Qishu Qian

Three-dimensional (3D) object detection has important applications in robotics, automatic loading, autonomous driving, and other scenarios. With improvements in sensing hardware, multi-sensor/multimodal data can be collected from a variety of sensors such as LiDAR and cameras. To make full use of this complementary information and improve detection performance, we propose Complex-Retina, a convolutional neural network for 3D object detection based on multi-sensor data fusion. First, a unified architecture with two feature extraction networks is designed, so that features are extracted from the point clouds and images of the different sensors synchronously. Then, a series of 3D anchors is projected onto the feature maps; the projections are cropped into 2D anchors of the same size and fused together. Finally, object classification and 3D bounding box regression are carried out on multiple paths of fully connected layers. The proposed network is a one-stage convolutional neural network that balances detection accuracy and speed. Experiments on the KITTI dataset show that the proposed network outperforms the baseline algorithms in both average precision (AP) and time consumption, demonstrating its effectiveness.
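The anchor-projection-and-fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`project_anchors`, `crop_resize`), the projection matrix, and the element-wise-mean fusion are assumptions; the paper only states that 3D anchors are projected to the feature maps, cropped to equal-size 2D anchors, and fused.

```python
import numpy as np

def project_anchors(anchors_3d, P):
    """Project 3D anchor corners into the image plane and take the
    axis-aligned 2D bounding box of the projected corners.

    anchors_3d: (N, 8, 3) array of box corners in camera coordinates.
    P: (3, 4) projection matrix (hypothetical; KITTI provides such matrices).
    """
    boxes_2d = []
    for corners in anchors_3d:
        hom = np.hstack([corners, np.ones((8, 1))])   # homogeneous coordinates
        uv = (P @ hom.T).T                            # (8, 3) projected points
        uv = uv[:, :2] / uv[:, 2:3]                   # perspective divide
        x1, y1 = uv.min(axis=0)
        x2, y2 = uv.max(axis=0)
        boxes_2d.append([x1, y1, x2, y2])
    return np.array(boxes_2d)

def crop_resize(feat, box, size=7):
    """Crop a 2D anchor from a (H, W, C) feature map and resize it to a
    fixed (size, size, C) grid via nearest-neighbor sampling, so that crops
    from different modalities have the same spatial shape."""
    h, w = feat.shape[:2]
    x1, y1, x2, y2 = box
    xs = np.clip(np.linspace(x1, x2, size).astype(int), 0, w - 1)
    ys = np.clip(np.linspace(y1, y2, size).astype(int), 0, h - 1)
    return feat[np.ix_(ys, xs)]

def fuse(crop_img, crop_pc):
    """Fuse equal-size crops from the image and point-cloud branches.
    Element-wise mean is one simple choice; channel concatenation is
    another common option."""
    return 0.5 * (crop_img + crop_pc)
```

In a full detector, the fused crops would feed the fully connected layers that perform classification and 3D box regression; here only the geometry of projecting and aligning anchors across modalities is shown.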


Author(s):  
Vassileios Balntas ◽  
Andreas Doumanoglou ◽  
Caner Sahin ◽  
Juil Sock ◽  
Rigas Kouskouridas ◽  
...  

2020 ◽  
Vol 2020 (8) ◽  
pp. 221-1-221-7
Author(s):  
Jianhang Chen ◽  
Daniel Mas Montserrat ◽  
Qian Lin ◽  
Edward J. Delp ◽  
Jan P. Allebach

We introduce a new image dataset for object detection and 6D pose estimation, named Extra FAT. The dataset consists of 825K photorealistic RGB images with ground-truth location and rotation annotations for both the virtual camera and the objects. A registered pixel-level object segmentation mask is also provided for object detection and segmentation tasks. The dataset includes 110 different 3D object models, rendered in five scenes with diverse illumination, reflection, and occlusion conditions.
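A 6D pose annotation of the kind described above is typically consumed as a rotation matrix R and translation vector t. The sketch below shows that standard usage; the exact annotation file format of Extra FAT is not specified here, and the function names and intrinsics matrix are illustrative assumptions.

```python
import numpy as np

def apply_pose(points, R, t):
    """Transform 3D model points into the camera frame using a pose
    given as a 3x3 rotation matrix R and a 3-vector translation t."""
    return points @ R.T + t

def project(points_cam, K):
    """Project camera-frame points to pixel coordinates with a
    3x3 intrinsics matrix K (perspective divide)."""
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```

Given a model point set and a ground-truth (R, t) pair, `project(apply_pose(pts, R, t), K)` yields the pixel locations that should coincide with the object's segmentation mask, which is a common sanity check when loading pose datasets.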

