Three-Filters-to-Normal: An Accurate and Ultrafast Surface Normal Estimator

Author(s):  
Rui Fan ◽  
Hengli Wang ◽  
Bohuan Xue ◽  
Huaiyang Huang ◽  
Yuan Wang ◽  
...  

This paper proposes three-filters-to-normal (3F2N), an accurate and ultrafast surface normal estimator (SNE) designed for structured range sensor data, e.g., depth/disparity images. 3F2N SNE computes surface normals by performing just three filtering operations (two image gradient filters, in the horizontal and vertical directions respectively, and a mean/median filter) on an inverse depth image or a disparity image. Despite its simplicity, no similar method exists in the literature. To evaluate the performance of our proposed SNE, we created three large-scale synthetic datasets (easy, medium and hard) using 24 3D mesh models, each of which is used to generate 1800–2500 pairs of depth images (resolution: 480×640 pixels) and the corresponding ground-truth surface normal maps from different views. 3F2N SNE demonstrates state-of-the-art performance, outperforming all other existing geometry-based SNEs: the average angular errors on the easy, medium and hard datasets are 1.66 degrees, 5.69 degrees and 15.31 degrees, respectively. Furthermore, our C++ and CUDA implementations achieve processing speeds of over 260 Hz and 21 kHz, respectively. Our datasets and source code are publicly available at sites.google.com/view/3f2n.
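The three filtering operations map almost directly onto array code. Below is a minimal NumPy sketch of the idea, not the authors' released C++/CUDA implementation: it assumes a pinhole camera with intrinsics fx, fy, cx, cy, uses central differences for the two gradient filters, and derives one nz estimate per pixel from horizontal neighbour differences before smoothing with the median filter, whereas the paper aggregates nz candidates over a full neighbourhood.

```python
import numpy as np
from scipy.ndimage import median_filter

def three_filters_to_normal(Z, fx, fy, cx, cy, eps=1e-8):
    """Sketch of a 3F2N-style surface normal estimator.

    Z is an (H, W) depth image; fx, fy, cx, cy are pinhole intrinsics.
    Returns an (H, W, 3) map of unit surface normals (camera frame).
    """
    V = 1.0 / np.maximum(Z, eps)   # inverse depth (a disparity image works up to scale)

    # Filters 1 and 2: horizontal and vertical image gradients of inverse depth.
    gu = np.gradient(V, axis=1)
    gv = np.gradient(V, axis=0)
    nx = fx * gu                   # normal x-component, up to a common scale
    ny = fy * gv                   # normal y-component

    # Back-project pixels to 3D so nz can be recovered from the local
    # plane constraint nx*dX + ny*dY + nz*dZ = 0 between neighbours.
    v, u = np.indices(Z.shape)
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    dX, dY, dZ = (np.gradient(A, axis=1) for A in (X, Y, Z))
    nz = -(nx * dX + ny * dY) / np.where(np.abs(dZ) < eps, eps, dZ)

    # Filter 3: a median filter smooths the noisy per-pixel nz estimates.
    nz = median_filter(nz, size=3)

    n = np.stack([nx, ny, nz], axis=-1)
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), eps)
```

Normalizing at the end removes the unknown common scale factor from the local plane equation; a sign convention (e.g., normals facing the camera) would still need to be enforced on top of this sketch.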

2020 ◽  
Author(s):  
Rui Fan ◽  
Hengli Wang ◽  
Bohuan Xue ◽  
Huaiyang Huang ◽  
Yuan Wang ◽  
...  

Over the past decade, significant efforts have been made to improve the trade-off between the speed and accuracy of surface normal estimators (SNEs). This paper introduces an accurate and ultrafast SNE for structured range data. The proposed approach computes surface normals by performing just three filtering operations, namely two image gradient filters (in the horizontal and vertical directions, respectively) and a mean/median filter, on an inverse depth image or a disparity image. Despite its simplicity, no similar method exists in the literature. In our experiments, we created three large-scale synthetic datasets (easy, medium and hard) using 24 three-dimensional (3D) mesh models. Each mesh model is used to generate 1800–2500 pairs of 480×640-pixel depth images and the corresponding ground-truth surface normal maps from different views. The average angular errors on the easy, medium and hard datasets are 1.6 degrees, 5.6 degrees and 15.3 degrees, respectively. Our C++ and CUDA implementations achieve processing speeds of over 260 Hz and 21 kHz, respectively. Our proposed SNE achieves better overall performance than all other existing computer vision-based SNEs. Our datasets and source code are publicly available at sites.google.com/view/3f2n.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, owing to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground-truth poses, and 3D models for well-selected objects. Subsequently, based on the generated data, we produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground-truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images are helpful in increasing the performance of pose estimation algorithms.
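The automatic 2D annotation step mentioned above is simple to make concrete once per-object segmentation masks exist; a small hedged sketch (the function name is hypothetical, assuming binary per-object masks):

```python
import numpy as np

def bbox_from_mask(mask):
    """Derive a tight 2D bounding box (x_min, y_min, x_max, y_max)
    from a binary object segmentation mask."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # object not visible in this view
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```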


2004 ◽  
Vol 04 (04) ◽  
pp. 653-681 ◽  
Author(s):  
RENATO PAJAROLA ◽  
MIGUEL SAINZ ◽  
YU MENG

In this paper we present DMesh, a novel and efficient depth-image representation and warping technique based on a piecewise linear approximation of the depth image as a textured and simplified triangle mesh. We describe the application of a hierarchical multiresolution triangulation method to generate adaptively triangulated depth meshes efficiently from reference depth images, discuss depth-mesh segmentation methods that avoid occlusion artifacts, and propose a new hardware-accelerated depth-image rendering technique that supports per-pixel weighted blending of multiple depth images in real time. Applications of our technique include image-based object representations and the use of depth images in large-scale walk-through visualization systems.
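The underlying representation is easy to picture: every depth-image pixel back-projects to a 3D vertex, and neighbouring pixels are stitched into triangles, which the hierarchical triangulation then adaptively simplifies. A minimal sketch of the dense, pre-simplification triangulation, assuming a pinhole camera (an illustration of the representation, not the paper's multiresolution algorithm):

```python
import numpy as np

def depth_to_mesh(Z, fx, fy, cx, cy):
    """Turn a depth image into a dense triangle mesh: one vertex per
    pixel, two triangles per 2x2 pixel quad (no simplification here)."""
    h, w = Z.shape
    v, u = np.indices((h, w))
    verts = np.stack([(u - cx) * Z / fx, (v - cy) * Z / fy, Z], -1).reshape(-1, 3)
    idx = np.arange(h * w).reshape(h, w)
    a, b = idx[:-1, :-1].ravel(), idx[:-1, 1:].ravel()
    c, d = idx[1:, :-1].ravel(), idx[1:, 1:].ravel()
    faces = np.concatenate([np.stack([a, c, b], -1), np.stack([b, c, d], -1)])
    return verts, faces
```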


Author(s):  
Vinícius da Silva Ramalho ◽  
Rômulo Francisco Lepinsk Lopes ◽  
Ricardo Luhm Silva ◽  
Marcelo Rudek

Synthetic datasets have been used to train 2D and 3D image-based deep learning models, and they also serve as performance benchmarks. Although some authors already use 3D models for the development of navigation systems, their applications do not consider the noise sources that affect 3D sensors. Time-of-Flight sensors are susceptible to noise, and conventional filters have limitations depending on the scenario in which they are applied. Deep learning filters, on the other hand, can be more invariant to changes and can take contextual information into account to attenuate noise. However, training a deep learning filter requires a noiseless ground truth, which would demand highly accurate hardware. Synthetic datasets come with ground-truth data, and similar noise can be applied to them, creating a noisy dataset for a deep learning approach. This research explores training a noise-removal application using deep learning, trained only on the Flying Things synthetic dataset: the ground-truth data is taken as-is and random noise is applied to it. The trained model is validated on the Middlebury dataset, which contains real-world data. The results show that training the deep learning architecture for noise removal with only a synthetic dataset can achieve near-state-of-the-art performance, and the proposed model is able to process 12-bit depth images instead of 8-bit images. Future studies will evaluate the algorithm's performance for real-time noise removal to enable embedded applications.
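The training-pair construction described here can be sketched in a few lines: a clean synthetic depth map serves as the target and a corrupted copy as the input. The noise model below (additive Gaussian plus random dropout pixels) is an illustrative stand-in for the paper's unspecified random noise:

```python
import numpy as np

def make_training_pair(clean_depth, sigma=0.01, dropout=0.02, rng=None):
    """Build a (noisy input, clean target) pair from a synthetic depth map.
    Depth is assumed normalized to [0, 1]."""
    rng = rng or np.random.default_rng()
    noisy = clean_depth + rng.normal(0.0, sigma, clean_depth.shape)
    holes = rng.random(clean_depth.shape) < dropout  # simulate invalid pixels
    noisy[holes] = 0.0
    return np.clip(noisy, 0.0, 1.0), clean_depth
```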


2020 ◽  
Vol 34 (07) ◽  
pp. 11221-11228
Author(s):  
Yueying Kao ◽  
Weiming Li ◽  
Qiang Wang ◽  
Zhouchen Lin ◽  
Wooshik Kim ◽  
...  

Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation; however, existing methods rely on real depth images to extract depth features, which limits their applicability. In this paper, we aim to extract RGB and depth features from a single RGB image, with the help of synthetic RGB-depth image pairs, for object pose estimation. Specifically, we propose a deep convolutional neural network with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic paired data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns feature distributions from synthetic to real data. Compared with existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset across all metrics, which substantiates the superiority of our method and the above modules.
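Conceptually, the RGB-to-Depth Embedding module pushes the features computed from an RGB image towards the features a depth encoder produces for the paired synthetic depth image, so that depth-like features are available from RGB alone at test time. A schematic PyTorch sketch follows; the backbone layers and the use of the depth branch as a fixed regression target are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class RGBToDepthEmbedding(nn.Module):
    """Schematic RGB-to-depth embedding: an RGB encoder whose features
    are regressed onto those of a depth encoder on synthetic pairs."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.rgb_encoder = nn.Sequential(   # stand-in backbone
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def embedding_loss(self, rgb, depth):
        f_rgb = self.rgb_encoder(rgb)
        with torch.no_grad():               # depth features act as the target
            f_depth = self.depth_encoder(depth)
        return nn.functional.mse_loss(f_rgb, f_depth)
```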


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6725
Author(s):  
Longyu Zhang ◽  
Hao Xia ◽  
Yanyou Qiao

A depth camera is a sensor that directly measures the distance between an object and the camera. The RealSense D435i is a low-cost depth camera in widespread use. When collecting data, an RGB image and a depth image are acquired simultaneously. The quality of the RGB image is good, whereas the depth image typically has many holes. In many applications of depth images, these holes can lead to serious problems. In this study, a depth-image repair method is proposed. The depth image is repaired using a texture synthesis algorithm guided by the RGB image, which is segmented with a multi-scale object-oriented method, and an object-difference parameter is added to the selection of the best sample block. In contrast with previous methods, the experimental results show that the proposed method avoids erroneously filling holes, that the edges of the filled holes are consistent with the edges in the RGB image, and that the repair accuracy is better. The root mean square error, peak signal-to-noise ratio, and structural similarity index measure between the repaired depth images and the ground-truth image were better than those obtained by two other methods. We believe that repairing the depth image can improve the performance of depth-image applications.
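The proposed pipeline (multi-scale object-oriented segmentation plus texture-synthesis block selection) is too involved to reproduce from the abstract, but even a naive baseline makes the hole-filling problem concrete. The sketch below fills zero-valued holes with the median of valid 3x3 neighbours, iterated until no holes remain; it is a generic baseline, not the authors' method:

```python
import numpy as np
from scipy.ndimage import generic_filter

def valid_median(window):
    """Median of the non-hole (positive) values in a 3x3 window."""
    vals = window[window > 0]
    return np.median(vals) if vals.size else 0.0

def fill_depth_holes(depth, max_iters=50):
    """Naive baseline: repeatedly replace zero-valued (hole) pixels
    with the median of their valid neighbours."""
    out = depth.astype(np.float64).copy()
    for _ in range(max_iters):
        holes = out == 0
        if not holes.any():
            break
        filled = generic_filter(out, valid_median, size=3)
        out[holes] = filled[holes]
    return out
```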


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Wenju Zhou ◽  
Fulong Yao ◽  
Wei Feng ◽  
Haikuan Wang

Height measurement for moving pedestrians is significant in many scenarios, such as pedestrian positioning, criminal suspect tracking, and virtual reality. Although some existing height measurement methods can detect the height of static people, it is hard to measure the height of moving pedestrians accurately. Considering the height fluctuations in dynamic situations, this paper proposes a real-time height measurement method based on a Time-of-Flight (TOF) camera. Depth images in a continuous sequence are processed to obtain the real-time height of a moving pedestrian. First, a normalization equation is presented to convert the depth image into a grey image, for a lower time cost and better performance. Second, a difference-particle swarm optimization (D-PSO) algorithm is proposed to remove the complex background and reduce noise. Third, a segmentation algorithm based on maximally stable extremal regions (MSERs) is introduced to extract the pedestrian's head region. Then, a novel multilayer iterative average algorithm (MLIA) is developed to obtain the height of moving pedestrians. Finally, Kalman filtering improves the measurement accuracy by combining the current measurement with the height at the previous moment. In addition, the VICON system is adopted as the ground truth to verify the proposed method, and the results show that our method can accurately measure the real-time height of moving pedestrians.
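The final Kalman filtering stage is the easiest to make concrete: for a scalar height with a near-constant state model, the predict/correct cycle collapses to a few lines. The noise variances below are illustrative placeholders:

```python
class ScalarKalman:
    """1D Kalman filter fusing each new height measurement with the
    estimate carried over from the previous frame."""
    def __init__(self, q=1e-4, r=1e-2):
        self.q, self.r = q, r       # process / measurement noise variances
        self.x, self.p = None, 1.0  # state estimate and its variance

    def update(self, z):
        if self.x is None:          # initialize from the first measurement
            self.x = z
            return self.x
        self.p += self.q                    # predict (height ~ constant)
        k = self.p / (self.p + self.r)      # Kalman gain
        self.x += k * (z - self.x)          # correct with measurement z
        self.p *= (1.0 - k)
        return self.x
```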


2020 ◽  
Vol 39 (10-11) ◽  
pp. 1346-1364
Author(s):  
Amado Antonini ◽  
Winter Guerra ◽  
Varun Murali ◽  
Thomas Sayre-McCord ◽  
Sertac Karaman

This article describes the Blackbird unmanned aerial vehicle (UAV) Dataset, a large-scale suite of sensor data and corresponding ground truth from a custom-built quadrotor platform equipped with an inertial measurement unit (IMU), rotor tachometers, and virtual color, grayscale, and depth cameras. Motivated by the increasing demand for agile, autonomous operation of aerial vehicles, this dataset is designed to facilitate the development and evaluation of high-performance UAV perception algorithms. The dataset contains over 10 hours of data from our quadrotor tracing 18 different trajectories at varying maximum speeds (0.5 to 13.8 m/s) through 5 different visual environments, for a total of 176 unique flights. For each flight, we provide 120 Hz grayscale, 60 Hz RGB-D, and 60 Hz semantically segmented images from forward stereo and downward-facing photorealistic virtual cameras, in addition to 100 Hz IMU data, ~190 Hz motor speed measurements, and 360 Hz millimeter-accurate motion capture ground truth. The Blackbird UAV dataset is therefore well suited to the development of algorithms for visual inertial navigation, 3D reconstruction, and depth estimation. As a benchmark for future algorithms, the performance of two state-of-the-art visual odometry algorithms is reported, and scripts for comparing against the benchmarks are included with the dataset. The dataset is available for download at http://blackbird-dataset.mit.edu/.

