CLUBS: An RGB-D dataset with cluttered box scenes containing household objects

2019 ◽  
Vol 38 (14) ◽  
pp. 1538-1548
Author(s):  
Tonci Novkovic ◽  
Fadri Furrer ◽  
Marko Panjek ◽  
Margarita Grinvald ◽  
Roland Siegwart ◽  
...  

With the progress of machine learning, the demand for realistic data with high-quality annotations keeps growing. To generalize well, models require considerable amounts of data, especially realistic ground-truth data, for tasks such as object detection and scene segmentation. Such data can be difficult, time-consuming, and expensive to collect. This article presents a dataset of household objects and box scenes commonly found in warehouse environments. The dataset was obtained using a robotic setup with four different cameras. It contains reconstructed objects and scenes, as well as raw RGB and depth images, camera poses, pixel-wise object labels directly in the RGB images, and 3D bounding boxes with poses in the world frame. Furthermore, raw calibration data are provided, together with the intrinsic and extrinsic parameters of all the sensors. By providing object labels as pixel-wise masks and as 2D and 3D bounding boxes, this dataset is useful for both object recognition and instance segmentation. The realistic scenes provided will serve learning-based algorithms in scenarios where boxes of objects are common, such as the logistics sector. Both the dataset and the tools for data processing are published and available online.
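With the intrinsics, extrinsics, and world-frame box poses the dataset provides, 3D annotations can be related to the RGB images by standard pinhole projection. Below is a minimal sketch of that projection (not the dataset's own tooling; the matrices and corner values are hypothetical placeholders):

```python
import numpy as np

def project_points(points_world, K, R, t):
    """Project Nx3 world-frame points into pixel coordinates
    using the standard pinhole model: x = K [R | t] X."""
    points_cam = R @ points_world.T + t.reshape(3, 1)  # 3xN, camera frame
    pixels_h = K @ points_cam                          # homogeneous pixels
    return (pixels_h[:2] / pixels_h[2]).T              # Nx2, divide by depth

# Hypothetical example values; the real K, R, t come from the
# dataset's calibration files.
K = np.array([[525.0, 0.0, 320.0],
              [0.0, 525.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                  # world-to-camera rotation
t = np.array([0.0, 0.0, 0.5])  # world-to-camera translation (m)
box_corners = np.array([[0.1, 0.1, 1.0],
                        [0.2, 0.1, 1.0]])  # two corners of a 3D box
print(project_points(box_corners, K, R, t))
```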

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1299
Author(s):  
Honglin Yuan ◽  
Tim Hoogenkamp ◽  
Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Based on the generated data, we then produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches were trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
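The abstract does not spell out the evaluation metric, but a common choice in 6D pose benchmarks is ADD (average distance of model points); a minimal sketch, assuming ground-truth and predicted poses are given as rotation matrices and translation vectors:

```python
import numpy as np

def add_metric(model_points, R_gt, t_gt, R_pred, t_pred):
    """Average Distance of model points (ADD): mean Euclidean distance
    between model points under the ground-truth and predicted poses."""
    pts_gt = model_points @ R_gt.T + t_gt
    pts_pred = model_points @ R_pred.T + t_pred
    return np.linalg.norm(pts_gt - pts_pred, axis=1).mean()

# A pose is typically counted correct if ADD < 10% of the model diameter.
model = np.random.rand(500, 3) * 0.1  # hypothetical ~10 cm object
R, t = np.eye(3), np.zeros(3)
R_noisy, t_noisy = np.eye(3), np.array([0.002, 0.0, 0.001])
print(add_metric(model, R, t, R_noisy, t_noisy))  # ~0.0022 m
```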


2020 ◽  
Vol 47 (8) ◽  
pp. 982-997
Author(s):  
Mohamed H. Zaki ◽  
Tarek Sayed ◽  
Moataz Billeh

Video-based traffic analysis is a leading technology for streamlining transportation data collection. With traffic records from video cameras, unsupervised automated video analysis can detect various vehicle measures, such as vehicle spatial coordinates and subsequently lane positions, speed, and other dynamic measures, without the need for any physical connections to the road infrastructure. This paper contributes to unsupervised automated video analysis by addressing two main shortcomings of the approach. The first objective is to alleviate the tracking problems of over-segmentation and over-grouping by integrating region-based detection with feature-based tracking. Combining this information with the spatiotemporal constraints of grouping reduces the effects of these problems. This fusion approach offers a superior decision procedure for grouping objects and discriminating between object trajectories. The second objective is to model three-dimensional bounding boxes for the vehicles, leading to a better estimate of their geometry and consequently to accurate measures of their position and travel information. This improvement yields more precise measurements of traffic parameters such as average speed, gap time, and headway. The paper describes the various steps of the proposed improvements, evaluates the effectiveness of the refinement process on data collected from traffic cameras at three different locations in Canada, and validates the results against ground truth data. It illustrates the effectiveness of the improved unsupervised automated video analysis with a case study on 10 h of traffic data, covering measurements such as volume and headway.
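As an illustration of the traffic parameters mentioned, here is a minimal sketch (names and values are hypothetical) of how headway and average speed can be derived from extracted trajectories:

```python
import numpy as np

def headways(crossing_times):
    """Time headway: interval between successive vehicles
    crossing a fixed reference line (seconds)."""
    times = np.sort(np.asarray(crossing_times))
    return np.diff(times)

def mean_speed(positions_m, timestamps_s):
    """Average speed of one trajectory from consecutive positions (m/s)."""
    d = np.linalg.norm(np.diff(positions_m, axis=0), axis=1).sum()
    return d / (timestamps_s[-1] - timestamps_s[0])

# Hypothetical crossing times at a reference line (s since video start)
print(headways([12.1, 14.8, 19.0, 19.9]))           # [2.7 4.2 0.9]
traj = np.array([[0.0, 0.0], [13.5, 0.2], [27.1, 0.3]])
print(mean_speed(traj, np.array([0.0, 1.0, 2.0])))  # ~13.6 m/s
```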


2020 ◽  
Vol 10 (16) ◽  
pp. 5426
Author(s):  
Qiang Liu ◽  
Haidong Zhang ◽  
Yiming Xu ◽  
Li Wang

Recently, deep learning frameworks have been deployed in visual odometry systems and have achieved results comparable to traditional feature-matching-based systems. However, most deep learning-based frameworks inevitably need labeled data as ground truth for training. On the other hand, monocular odometry systems are incapable of recovering absolute scale; external or prior information has to be introduced for scale recovery. To solve these problems, we present a novel deep learning-based RGB-D visual odometry system. Our two main contributions are: (i) a dual-stream deep neural network is proposed in which, during both network training and pose estimation, the depth images are fed into the network to form a dual-stream structure with the RGB images; and (ii) the system adopts an unsupervised end-to-end training method, so the labor-intensive data labeling task is not required. We have tested our system on the KITTI dataset, and the results show that the proposed RGB-D Visual Odometry (VO) system has clear advantages over other state-of-the-art systems in terms of both translation and rotation errors.
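The paper's exact architecture is not given in the abstract; the following is only an illustrative PyTorch sketch of a dual-stream structure, with separate encoders for stacked RGB and depth frame pairs whose features are concatenated and regressed to a 6-DoF relative pose (all layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class DualStreamVO(nn.Module):
    """Illustrative dual-stream pose network: one encoder per modality,
    concatenated features regressed to a 6-DoF relative pose
    (3 translation + 3 rotation parameters)."""
    def __init__(self):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 7, stride=2, padding=3), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rgb_enc = encoder(6)    # two stacked RGB frames
        self.depth_enc = encoder(2)  # two stacked depth frames
        self.pose_head = nn.Linear(32 + 32, 6)

    def forward(self, rgb_pair, depth_pair):
        feats = torch.cat([self.rgb_enc(rgb_pair),
                           self.depth_enc(depth_pair)], dim=1)
        return self.pose_head(feats)

model = DualStreamVO()
pose = model(torch.randn(1, 6, 128, 416), torch.randn(1, 2, 128, 416))
print(pose.shape)  # torch.Size([1, 6])
```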


Author(s):  
Vinícius da Silva Ramalho ◽  
Rômulo Francisco Lepinsk Lopes ◽  
Ricardo Luhm Silva ◽  
Marcelo Rudek

Synthetic datasets have been used to train 2D and 3D image-based deep learning models, and they also serve as performance benchmarks. Although some authors already use 3D models for the development of navigation systems, their applications do not consider noise sources, which affect 3D sensors. Time-of-Flight sensors are susceptible to noise, and conventional filters have limitations depending on the scenario in which they are applied. Deep learning filters, on the other hand, can be more invariant to changes and can take contextual information into account to attenuate noise. However, training a deep learning filter requires a noiseless ground truth, which would demand highly accurate hardware. Synthetic datasets come with ground-truth data, and similar noise can be applied to them, creating a noisy counterpart for a deep learning approach. This research explores training a noise removal application with deep learning using only the Flying Things synthetic dataset, taking its ground-truth data and applying random noise to it. The trained model is validated on the Middlebury dataset, which contains real-world data. The results show that training the deep learning architecture for noise removal with only a synthetic dataset can achieve near-state-of-the-art performance, and the proposed model processes 12-bit depth images instead of 8-bit images. Future studies will evaluate the algorithm's real-time noise removal performance to enable embedded applications.
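The abstract does not specify the noise model; as an illustration, a training pair can be built from a clean synthetic depth map by injecting, for example, depth-proportional Gaussian noise and random pixel dropout (both parameters below are assumptions, not the paper's values):

```python
import numpy as np

def add_tof_like_noise(depth, sigma=0.01, dropout=0.02, rng=None):
    """Create a noisy training input from a clean (ground-truth) depth map:
    additive depth-proportional Gaussian noise plus randomly invalidated
    pixels. `sigma` is the noise std as a fraction of the depth value."""
    rng = rng or np.random.default_rng()
    noisy = depth + rng.normal(0.0, sigma, depth.shape) * depth
    mask = rng.random(depth.shape) < dropout
    noisy[mask] = 0.0  # 0 encodes an invalid measurement
    return noisy

clean = np.full((240, 320), 1.5)        # synthetic 1.5 m plane
noisy = add_tof_like_noise(clean)
pair = (noisy, clean)                   # (input, target) for training the filter
```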


Author(s):  
Jürgen Roßmann ◽  
Christian Schlette ◽  
Nils Wantia

In this contribution, it is demonstrated how the development, parameterization, and evaluation of an intelligent cognitive system can be supported by means of Virtual Reality (VR). Based on new approaches in object and action recognition, the consortium of the EU-funded project IntellAct is developing such an automated system for the intelligent understanding of manual tasks and robot-based manipulations. In this context, the VR setup of the Institute for Man-Machine Interaction (MMI) is able to deliver ideal data, such as detailed positions and joint data of the objects and kinematics involved. In addition, it allows for the synthesis of distinct and error-free data, such as collisions, bounding boxes, and contact events. This way, Virtual Reality serves as a reference and as a source of “ground truth” data for designing, training, and benchmarking the intelligent cognitive system during its development and parameterization. Furthermore, this approach enables descriptive and secure ways of visualizing results in the targeted environments, such as the highly automated laboratories on board the International Space Station (see Fig. 1).


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7879
Author(s):  
Jinyeong Heo ◽  
Yongjin (James) Kwon

The 3D vehicle trajectory in complex traffic conditions, such as crossroads and heavy traffic, is practically very useful in autonomous driving. To accurately extract the 3D vehicle trajectory from a perspective camera at a crossroad, where vehicles span an angular range of 360 degrees, problems such as the narrow visual angle of a single-camera scene, vehicle occlusion under low camera perspectives, and the lack of vehicle physical information must be solved. In this paper, we propose a method for estimating the 3D bounding boxes of vehicles and extracting their trajectories using a deep convolutional neural network (DCNN) in an overlapping multi-camera crossroad scene. First, traffic data were collected using overlapping multi-cameras to obtain a wide range of trajectories around the crossroad. Then, 3D bounding boxes of vehicles were estimated and tracked in each single-camera scene through DCNN models (YOLOv4, multi-branch CNN) combined with camera calibration. Using this information, the 3D vehicle trajectory could be extracted on the ground plane of the crossroad by combining the results from the overlapping multi-camera views through a homography matrix. Finally, in experiments, the errors of the extracted trajectories were corrected through simple linear interpolation and regression, and the accuracy of the proposed method was verified by computing the difference with ground-truth data. Compared with other previously reported methods, our approach is shown to be more accurate and more practical.
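The ground-plane step reduces to applying a 3x3 homography with homogeneous normalization; a minimal sketch (the matrix below is a hypothetical placeholder for the calibrated one):

```python
import numpy as np

def image_to_ground(points_px, H):
    """Map Nx2 image points onto the ground plane using a 3x3
    homography H (image -> ground), with homogeneous normalization."""
    pts_h = np.hstack([points_px, np.ones((len(points_px), 1))])  # Nx3
    ground_h = (H @ pts_h.T).T                                    # Nx3
    return ground_h[:, :2] / ground_h[:, 2:3]

# Hypothetical homography; in practice it is estimated from camera
# calibration or point correspondences (e.g., cv2.findHomography).
H = np.array([[0.02, 0.0, -5.0],
              [0.0, 0.05, -3.0],
              [0.0, 0.0, 1.0]])
bottom_centers = np.array([[320.0, 400.0], [500.0, 380.0]])  # 3D box bases
print(image_to_ground(bottom_centers, H))  # ground-plane coordinates (m)
```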


2021 ◽  
Author(s):  
Miro Demol ◽  
Kim Calders ◽  
Hans Verbeeck ◽  
Bert Gielen

Background and Aims: Quantifying the Earth's forest aboveground biomass (AGB) is indispensable for effective climate action and for developing forest policy. Yet current allometric scaling models (ASMs) for estimating AGB suffer several drawbacks related to model selection and to traceability uncertainties in the calibration data. Terrestrial laser scanning (TLS) offers a promising non-destructive alternative. Tree volume is reconstructed from TLS point clouds with Quantitative Structure Models (QSMs) and converted to AGB with wood basic density. Earlier studies have found overall TLS-derived forest volume estimates to be accurate, but highlighted problems in reconstructing finer branches. Our objective was to evaluate TLS for estimating tree volumes by comparison with reference volumes and with volumes from ASMs.
Methods: We quantified the woody volume of 65 trees in Belgium (77–2800 L; Pinus sylvestris, Fagus sylvatica, Larix decidua, Fraxinus excelsior) with QSMs and destructive reference measurements. We tested a volume expansion factor (VEF) approach by multiplying the solid and merchantable volume from the QSMs with literature VEF values.
Key Results: Stem volume was reliably estimated with TLS. Total volume was overestimated by +21% with the original QSMs; the errors were +9% and −12% with two sets of VEF-augmented QSMs and −7.3% with the best available allometric models. The most accurate method differed per site, and the prediction errors of each method varied considerably between sites.
Conclusions: VEF-augmented QSMs were only slightly better than original QSMs for estimating the tree volume of common species in temperate forests. Despite satisfactory estimates with ASMs, the model choice was a large source of uncertainty, and species-specific models did not always exist. We therefore advocate further improving tree volume reconstructions with QSMs, especially for fine branches, rather than collecting more ground-truth data to calibrate VEF and allometric models. Promising developments such as improved coregistration and smarter filtering approaches are ongoing to further constrain volumetric errors in TLS-derived estimates.
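For concreteness, the VEF approach described in the Methods amounts to a one-line calculation; a small sketch with hypothetical input values (the actual VEF values and wood densities come from the literature and the sampled trees):

```python
def agb_from_qsm(merchantable_volume_l, vef, basic_density_kg_m3):
    """Volume-expansion-factor approach sketched in the abstract:
    total volume = merchantable volume * VEF, then
    AGB = total volume * wood basic density."""
    total_volume_m3 = merchantable_volume_l / 1000.0 * vef
    return total_volume_m3 * basic_density_kg_m3

# Hypothetical values: 1200 L merchantable volume from the QSM,
# VEF of 1.15 from literature, Pinus sylvestris density ~420 kg/m^3.
print(agb_from_qsm(1200.0, 1.15, 420.0))  # ~579.6 kg AGB
```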


2021 ◽  
Vol 13 (10) ◽  
pp. 1966
Author(s):  
Christopher W Smith ◽  
Santosh K Panda ◽  
Uma S Bhatt ◽  
Franz J Meyer ◽  
Anushree Badola ◽  
...  

In recent years, there have been rapid improvements in both remote sensing methods and satellite image availability that have the potential to massively improve burn severity assessments of the Alaskan boreal forest. In this study, we utilized recent pre- and post-fire Sentinel-2 satellite imagery of the 2019 Nugget Creek and Shovel Creek burn scars located in Interior Alaska to assess burn severity across the burn scars and to test the effectiveness of several remote sensing methods for generating accurate map products: the Normalized Difference Vegetation Index (NDVI), the Normalized Burn Ratio (NBR), and Random Forest (RF) and Support Vector Machine (SVM) supervised classification. We used 52 Composite Burn Index (CBI) plots from the Shovel Creek burn scar and 28 from the Nugget Creek burn scar for training the classifiers and validating the products. For the Shovel Creek burn scar, the RF and SVM machine learning (ML) classification methods outperformed the traditional spectral indices, which use linear regression to separate burn severity classes (RF and SVM accuracy: 83.33%, versus NBR accuracy: 73.08%). For the Nugget Creek burn scar, however, the NDVI product (accuracy: 96%) outperformed the other indices and the ML classifiers. We demonstrated that the ML classifiers can be very effective for reliable mapping of burn severity in the Alaskan boreal forest when sufficient ground truth data are available. Since classifier performance depends on the quantity of ground truth data, ML classification is the better choice when sufficient ground truth data are available, whereas the traditional spectral indices are better suited when ground truth data are limited. We also examined the relationship between burn severity, fuel type, and topography (aspect and slope) and found that this relationship is site-dependent.
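For reference, both spectral indices are simple normalized band differences; a minimal sketch, with the usual Sentinel-2 band assignments (NIR = band 8, red = band 4, SWIR = band 12) assumed rather than taken from the paper:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + 1e-10)

def nbr(nir, swir):
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR)."""
    return (nir - swir) / (nir + swir + 1e-10)

# Burn severity is usually assessed with the pre/post-fire difference (dNBR).
pre_nir, pre_swir = np.array([0.45]), np.array([0.20])
post_nir, post_swir = np.array([0.25]), np.array([0.30])
dnbr = nbr(pre_nir, pre_swir) - nbr(post_nir, post_swir)
print(dnbr)  # higher dNBR -> more severe burn
```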


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Christian Crouzet ◽  
Gwangjin Jeong ◽  
Rachel H. Chae ◽  
Krystal T. LoPresti ◽  
Cody E. Dunn ◽  
...  

Cerebral microhemorrhages (CMHs) are associated with cerebrovascular disease, cognitive impairment, and normal aging. One method to study CMHs is to analyze histological sections (5–40 μm) stained with Prussian blue. Currently, users identify and quantify Prussian blue-stained regions of interest manually and subjectively, which is prone to inter-individual variability and can lead to significant delays in data analysis. To improve this labor-intensive process, we developed and compared three digital pathology approaches to identify and quantify CMHs from Prussian blue-stained brain sections: (1) ratiometric analysis of RGB pixel values, (2) phasor analysis of RGB images, and (3) deep learning using a mask region-based convolutional neural network. We applied these approaches to a preclinical mouse model of inflammation-induced CMHs. One hundred CMHs were imaged using a 20× objective and an RGB color camera. To determine the ground truth, four users independently annotated Prussian blue-labeled CMHs. Compared to the ground truth, the deep learning and ratiometric approaches performed better than the phasor analysis approach. The deep learning approach was the most precise of the three methods; the ratiometric approach was the most versatile and maintained accuracy, albeit with less precision. Our data suggest that implementing these methods to analyze CMH images can drastically increase processing speed while maintaining precision and accuracy.
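As an illustration of approach (1), a ratiometric rule can flag pixels whose blue channel dominates red, since Prussian blue stains strongly blue; the channel choice and threshold below are assumptions, not the authors' calibrated values:

```python
import numpy as np

def prussian_blue_mask(rgb, ratio_threshold=1.3):
    """Ratiometric segmentation sketch: flag pixels whose blue channel
    dominates the red channel (Prussian blue stain is strongly blue)."""
    rgb = rgb.astype(np.float64)
    ratio = rgb[..., 2] / (rgb[..., 0] + 1e-6)  # blue / red
    return ratio > ratio_threshold

def stained_area_fraction(mask):
    """Fraction of the section area labeled as stained."""
    return mask.mean()

image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
mask = prussian_blue_mask(image)
print(stained_area_fraction(mask))
```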


2020 ◽  
Vol 13 (1) ◽  
pp. 26
Author(s):  
Wen-Hao Su ◽  
Jiajing Zhang ◽  
Ce Yang ◽  
Rae Page ◽  
Tamas Szinyei ◽  
...  

In many regions of the world, wheat is vulnerable to severe yield and quality losses from the fungal disease Fusarium head blight (FHB). The development of resistant cultivars is one means of ameliorating the devastating effects of this disease, but the breeding process requires the evaluation of hundreds of lines each year for their reaction to the disease. These field evaluations are laborious, expensive, time-consuming, and prone to rater error. A phenotyping cart that can quickly capture images of the spikes of wheat lines and their level of FHB infection would greatly benefit wheat breeding programs. In this study, a mask region-based convolutional neural network (Mask-RCNN) allowed for reliable identification of the symptom location and disease severity of wheat spikes. Within a wheat line planted in the field, color images of individual wheat spikes and their corresponding diseased areas were labeled and segmented into sub-images. Images with annotated spikes and sub-images of individual spikes with labeled diseased areas were used as ground truth data to train Mask-RCNN models for automatic image segmentation of wheat spikes and of FHB diseased areas, respectively. A feature pyramid network (FPN) based on the ResNet-101 network was used as the backbone of Mask-RCNN for constructing the feature pyramid and extracting features. After generating mask images of wheat spikes from full-size images, Mask-RCNN was applied to predict the diseased areas on each individual spike. This protocol enabled the rapid recognition of wheat spikes and diseased areas with detection rates of 77.76% and 98.81%, respectively. A prediction accuracy of 77.19% was achieved, computed as the ratio of the predicted wheat FHB severity value over the ground truth. This study demonstrates the feasibility of rapidly determining levels of FHB in wheat spikes, which will greatly facilitate the breeding of resistant cultivars.
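As a sketch of the severity computation implied by the abstract, per-spike severity can be taken as the diseased fraction of the spike mask, and accuracy as the ratio of predicted to ground-truth severity (all values below are hypothetical):

```python
import numpy as np

def fhb_severity(spike_mask, disease_mask):
    """Per-spike FHB severity: diseased pixels as a fraction of
    the spike's pixels (both masks are boolean arrays)."""
    spike_px = spike_mask.sum()
    return (disease_mask & spike_mask).sum() / max(spike_px, 1)

# Hypothetical masks: a 60x20 px spike with its upper third diseased.
spike = np.zeros((100, 100), dtype=bool); spike[20:80, 40:60] = True
disease = np.zeros_like(spike); disease[20:40, 40:60] = True
severity_pred = fhb_severity(spike, disease)   # ~0.333
severity_gt = 0.43                             # hypothetical rater value
print(severity_pred / severity_gt)             # ~0.78, analogous to 77.19%
```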

