A tool for semi-automatic ground truth annotation of traffic videos

2020 ◽  
Vol 2020 (16) ◽  
pp. 200-1-200-7
Author(s):  
Florian Groh ◽  
Dominik Schörkhuber ◽  
Margrit Gelautz

We have developed a semi-automatic annotation tool, "CVL Annotator", for bounding box ground truth generation in videos. Our research is particularly motivated by the need for reference annotations of challenging nighttime traffic scenes with highly dynamic lighting conditions caused by reflections, headlights and halos from oncoming traffic. Our tool incorporates a suite of state-of-the-art tracking algorithms to minimize the amount of human input needed to generate high-quality ground truth data. The user interface is designed around two principles: minimizing user interaction and presenting all relevant information to the user at a glance. We perform a preliminary user study to measure the time and number of clicks needed to produce ground truth annotations of video traffic scenes, and we evaluate the accuracy of the final annotation results.
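
The core idea, propagating a human-drawn box with a tracker and asking for a correction only when the track needs review, can be sketched as follows. This is an illustrative sketch, not the CVL Annotator implementation; the `ask_user_to_correct` callback is hypothetical, and the CSRT tracker is just one of the algorithms such a tool could plug in.

```python
import cv2

def propagate_box(frames, init_box, recheck_every=30):
    """Propagate a user-drawn (x, y, w, h) box through a frame sequence,
    pausing for manual correction at a fixed review interval.
    Illustrative sketch only, not the CVL Annotator code."""
    # Newer OpenCV builds expose this as cv2.TrackerCSRT.create()
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frames[0], init_box)
    boxes = [init_box]
    for i, frame in enumerate(frames[1:], start=1):
        ok, box = tracker.update(frame)
        if not ok or i % recheck_every == 0:
            # hypothetical UI hook: the annotator confirms or redraws the box
            box = ask_user_to_correct(frame, box)
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, box)
        boxes.append(tuple(int(v) for v in box))
    return boxes
```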

2021 ◽  
Vol 7 (2) ◽  
pp. 21
Author(s):  
Roland Perko ◽  
Manfred Klopschitz ◽  
Alexander Almer ◽  
Peter M. Roth

Many scientific studies deal with person counting and density estimation from single images, and convolutional neural networks (CNNs) have recently been applied to these tasks. Even though better results are often reported, it is frequently unclear where the improvements come from and whether the proposed approaches would generalize. Thus, the main goal of this paper was to identify the critical aspects of these tasks and to show how they limit state-of-the-art approaches. Based on these findings, we show how to mitigate these limitations. To this end, we implemented a CNN-based baseline approach, which we extended to deal with the identified problems: bias in the reference data sets, ambiguity in ground truth generation, and a mismatch between the evaluation metrics and the training loss function. The experimental results show that our modifications significantly outperform the baseline in the accuracy of person counts and density estimation. In this way, we gain a deeper understanding of CNN-based person density estimation beyond the network architecture itself. Furthermore, our insights can help advance the field of person density estimation in general by highlighting current limitations in its evaluation protocols.
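
For context, the standard way to turn point-wise head annotations into a ground-truth density map, whose integral equals the person count, is Gaussian smoothing; the choice of kernel bandwidth is one source of the ground-truth ambiguity discussed above. A minimal sketch of that construction (our illustration, not the authors' code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_map(head_points, shape, sigma=4.0):
    """Build a ground-truth density map from (row, col) head annotations.
    The map integrates to the person count; sigma is the (ambiguous)
    kernel bandwidth the paper identifies as a critical choice."""
    dmap = np.zeros(shape, dtype=np.float64)
    for r, c in head_points:
        dmap[int(r), int(c)] += 1.0
    return gaussian_filter(dmap, sigma=sigma)

# Example: three annotated heads; the smoothed map still sums to ~3.
gt = density_map([(10, 12), (40, 80), (55, 30)], shape=(100, 120))
print(gt.sum())  # ≈ 3.0
```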


Author(s):  
Thibault Laugel ◽  
Marie-Jeanne Lesot ◽  
Christophe Marsala ◽  
Xavier Renard ◽  
Marcin Detyniecki

Post-hoc interpretability approaches have proven to be powerful tools for generating explanations of the predictions made by a trained black-box model. However, they carry the risk of producing explanations that reflect artifacts learned by the model rather than actual knowledge from the data. This paper focuses on the case of counterfactual explanations and asks whether the generated instances can be justified, i.e. continuously connected to some ground-truth data. We evaluate the risk of generating unjustified counterfactual examples by investigating the local neighborhoods of the instances whose predictions are to be explained, and show that this risk is quite high for several datasets. Furthermore, we show that most state-of-the-art approaches do not differentiate justified from unjustified counterfactual examples, leading to less useful explanations.
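
A toy version of the justification test can be sketched as follows: walk a straight line from the candidate counterfactual to its nearest ground-truth neighbour of the same predicted class and require the prediction to stay constant along the way. This linear probe is our simplification; the paper's notion of connectedness is more general.

```python
import numpy as np

def is_justified(counterfactual, X_train, predict, n_steps=50):
    """Naive justification check: the segment between the counterfactual
    and its closest same-class training instance must keep the same
    prediction. A simplification of the paper's connectedness criterion."""
    target = predict(counterfactual[None])[0]
    same_class = X_train[predict(X_train) == target]
    if len(same_class) == 0:
        return False
    nearest = same_class[np.linalg.norm(same_class - counterfactual, axis=1).argmin()]
    for t in np.linspace(0.0, 1.0, n_steps):
        point = (1 - t) * counterfactual + t * nearest
        if predict(point[None])[0] != target:
            return False  # path crosses another class: likely an artifact
    return True
```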


2021 ◽  
Vol 13 (13) ◽  
pp. 2619
Author(s):  
Joao Fonseca ◽  
Georgios Douzas ◽  
Fernando Bacao

In remote sensing, Active Learning (AL) has become an important technique for collecting informative ground truth data "on demand" for supervised classification tasks. Despite its effectiveness, it still relies heavily on user interaction, which makes it both expensive and time-consuming to implement. Most of the current literature focuses on optimizing AL by modifying the selection criteria and the classifiers used. Although improvements in these areas will result in more effective data collection, the use of artificial data sources to reduce human-computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework: the data generator, a source of artificial data that reduces the amount of user-labeled data required in AL. The proposed AL framework is implemented using Geometric SMOTE as the data generator. We compare the new AL framework to the original one using similar acquisition functions and classifiers over three AL-specific performance metrics on seven benchmark datasets. We show that this modification of the AL framework significantly reduces the cost and time requirements of a successful AL implementation on all of the datasets used in the experiment.
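
The proposed change amounts to one extra step in the usual pool-based AL loop: after each labelling round, an oversampler enlarges the labelled set with synthetic points before the classifier is refit. A minimal sketch with a hypothetical `al_with_generator` helper, using imbalanced-learn's SMOTE as a stand-in for Geometric SMOTE (the paper's actual generator):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

def al_with_generator(X_pool, y_pool, seed_idx, n_rounds=10, batch=10):
    """Pool-based AL loop with an added data-generation step.
    SMOTE stands in here for Geometric SMOTE; assumes the labelled
    seed contains at least 4 samples per class."""
    clf = RandomForestClassifier(random_state=0)
    labeled = set(seed_idx)
    for _ in range(n_rounds):
        idx = np.fromiter(labeled, dtype=int)
        # data generator: enlarge the scarce labelled set with synthetic points
        X_aug, y_aug = SMOTE(k_neighbors=3).fit_resample(X_pool[idx], y_pool[idx])
        clf.fit(X_aug, y_aug)
        # acquisition: query the pool samples the classifier is least sure about
        pool = np.setdiff1d(np.arange(len(X_pool)), idx)
        conf = clf.predict_proba(X_pool[pool]).max(axis=1)
        labeled.update(pool[np.argsort(conf)[:batch]].tolist())  # simulated oracle
    return clf
```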


Author(s):  
L. Thapa ◽  
H. Naseer ◽  
S. El-Kaiy ◽  
T. Bartoschek

Abstract. Geoinformatics (GI) education is widely used as an interdisciplinary spatial visualization tool for understanding past geographical phenomena around us and modelling future scenarios. Its global importance and usage have created a need to disseminate GI education to the public and to school students. MSc students of different backgrounds at the Institute for Geoinformatics at the University of Münster took part in one such effort through a seminar-cum-project on 'Transdisciplinary Education in Geoinformatics', run by the GI@School Lab, with the aim of engaging high school students in applying GI knowledge to agriculture. Grade 12 students were first presented with ongoing GI-empowered research projects and then developed project ideas of their own interest for applying GI in the agricultural sector; based on these ideas, the MSc students developed four different projects, one of which is Growth Conditions (Sensors). This project aims to determine the best-suited conditions for salad plant growth, based on the size of the salad leaves measured after eight weeks of monitored indoor growth: the plants were grown in four plastic boxes filled with the same soil type but kept under different lighting and watering conditions, measured by the corresponding sensors. The project execution took place as a five-day workshop, and feedback was collected through questionnaire surveys of the participating students and their teachers for project evaluation. The sensor-collected data could even serve as ground truth data for a citizen observatory project within the Copernicus in-situ component. The project as a whole aims at reducing generational gaps between students by offering opportunities for knowledge co-creation through transdisciplinary projects in the agricultural sector using GI technologies.


Author(s):  
M. Galar ◽  
R. Sesma ◽  
C. Ayala ◽  
L. Albizua ◽  
C. Aranda

Abstract. The Copernicus program, via its Sentinel missions, is making Earth observation more accessible and affordable for everybody. Sentinel-2 images provide multi-spectral information every 5 days for each location. However, the maximum spatial resolution of its bands is 10m, for the RGB and near-infrared bands. Increasing the spatial resolution of Sentinel-2 images without additional costs would make any subsequent analysis more accurate. Most approaches to super-resolution for Sentinel-2 have focused on obtaining 10m resolution images for the bands at lower resolutions (20m and 60m), taking advantage of the information provided by the bands of finer resolution (10m). In contrast, our focus is on increasing the resolution of the 10m bands themselves, that is, super-resolving the 10m bands to 2.5m resolution, where no additional information is available. This problem is known as single-image super-resolution, and deep learning-based approaches have become the state of the art for it on standard images. Obviously, models learned on standard images do not transfer well to satellite images. Hence, the problem is how to train a deep learning model for super-resolving Sentinel-2 images when no ground truth exists (Sentinel-2 images at 2.5m). We propose a methodology for learning Convolutional Neural Networks for Sentinel-2 image super-resolution that makes use of images from other sensors with high spectral similarity to Sentinel-2 but greater spatial resolution. Our proposal is tested with a state-of-the-art neural network, showing that it can be useful for learning to increase the spatial resolution of the RGB and near-infrared bands of Sentinel-2.
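
The key trick, absent 2.5m Sentinel-2 ground truth, is to build training pairs from the higher-resolution sensor: its images act as targets, and versions degraded to Sentinel-2's 10m grid act as inputs. A minimal sketch of that pair construction (our illustration of the idea, not the authors' pipeline; block averaging is a crude stand-in for a proper sensor model):

```python
import numpy as np

def make_training_pair(hr_image, scale=4):
    """Build an (input, target) pair for single-image super-resolution.
    hr_image: (H, W, bands) array from a sensor similar to Sentinel-2
    but at ~2.5m GSD; the input is a x4-degraded version standing in
    for a real 10m Sentinel-2 acquisition."""
    h, w, _ = hr_image.shape
    h, w = h - h % scale, w - w % scale
    hr = hr_image[:h, :w]
    # simple block-average downsampling; a real pipeline would model
    # the sensor's point spread function instead
    lr = hr.reshape(h // scale, scale, w // scale, scale, -1).mean(axis=(1, 3))
    return lr, hr
```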


AI ◽  
2021 ◽  
Vol 2 (3) ◽  
pp. 444-463
Author(s):  
Daniel Weber ◽  
Clemens Gühmann ◽  
Thomas Seel

Inertial-sensor-based attitude estimation is a crucial technology in various applications, from human motion tracking to autonomous aerial and ground vehicles. Application scenarios differ in the characteristics of the performed motion, the presence of disturbances, and the environmental conditions. Since state-of-the-art attitude estimators do not generalize well over these characteristics, their parameters must be tuned to the individual motion characteristics and circumstances. We propose RIANN, a ready-to-use, neural-network-based, parameter-free, real-time-capable inertial attitude estimator which generalizes well across different motion dynamics, environments, and sampling rates, without the need for application-specific adaptations. We gather six publicly available datasets, of which we use two for method development and training, and four for evaluating the trained estimator in three test scenarios of varying practical relevance. The results show that RIANN outperforms state-of-the-art attitude estimation filters in the sense that it generalizes much better across a variety of motions and conditions in different applications, with different sensor hardware and different sampling frequencies. This holds even when the filters are tuned on each individual test dataset, whereas RIANN was trained on completely separate data and has never seen any of the test datasets. RIANN can be applied directly, without adaptation or training, and is therefore expected to enable plug-and-play solutions in numerous applications, especially when accuracy is crucial but no ground-truth data are available for tuning, or when motion and disturbance characteristics are uncertain. We have made RIANN publicly available.
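
For reference, the standard way to score such estimators against optical ground truth is the error angle between estimated and reference orientation quaternions. A minimal sketch of that metric (our illustration, not the RIANN evaluation code; this is the full rotation error, while attitude-only, heading-free variants also exist):

```python
import numpy as np

def attitude_error_deg(q_est, q_ref):
    """Angle (degrees) of the rotation between estimated and reference
    unit quaternions, arrays of shape (N, 4) in (w, x, y, z) order.
    Uses |dot| so that q and -q count as the same orientation."""
    dot = np.abs(np.sum(q_est * q_ref, axis=1))
    return np.degrees(2.0 * np.arccos(np.clip(dot, -1.0, 1.0)))
```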


2020 ◽  
Vol 2020 (17) ◽  
pp. 36-1-36-7
Author(s):  
Umamaheswaran RAMAN KUMAR ◽  
Inge COUDRON ◽  
Steven PUTTEMANS ◽  
Patrick VANDEWALLE

Applications ranging from simple visualization to complex design require 3D models of indoor environments, which has given rise to advances in the automated reconstruction of such models. In this paper, we review several state-of-the-art metrics proposed for the geometric comparison of 3D models of building interiors. We evaluate their performance on a real-world dataset and propose a tailored metric that can be used to assess the quality of the reconstructed model. In addition, the proposed metric can easily be visualized to highlight the regions or structures where the reconstruction failed. To demonstrate the versatility of the proposed metric, we conducted experiments on various interior models, comparing them with ground truth data created by expert Blender artists. The results of these experiments were then used to improve the reconstruction pipeline.
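
One widely used geometric comparison of this kind, and a natural baseline for the metrics reviewed here, is the symmetric chamfer distance between point clouds sampled from the reconstructed and ground-truth models; its per-point terms are also what makes failure regions easy to visualize. A minimal sketch (not the paper's tailored metric):

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(recon_pts, gt_pts):
    """Symmetric chamfer distance between two (N, 3) point clouds.
    The per-point distances d_rg can be colour-mapped onto the
    reconstruction to highlight where it deviates from ground truth."""
    d_rg, _ = cKDTree(gt_pts).query(recon_pts)    # reconstruction -> ground truth
    d_gr, _ = cKDTree(recon_pts).query(gt_pts)    # ground truth -> reconstruction
    return d_rg.mean() + d_gr.mean(), d_rg
```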


2017 ◽  
Author(s):  
D. Michael Ando ◽  
Cory Y. McLean ◽  
Marc Berndl

Abstract. Image-based screening is a powerful technique to reveal how chemical, genetic, and environmental perturbations affect cellular state. Its potential is restricted by current analysis algorithms, which target a small number of cellular phenotypes and rely on expert-engineered image features. Newer algorithms that learn how to represent an image are limited by the small amount of labeled ground-truth data, a common problem in scientific projects. We demonstrate a sensitive and robust method for distinguishing cellular phenotypes that requires no additional ground-truth data or training. It achieves state-of-the-art performance in classifying drugs by similar molecular mechanism, using a Deep Metric Network pre-trained on consumer images together with a transformation that improves sensitivity to biological variation. However, our method is not limited to classification into predefined categories: it provides a continuous measure of the similarity between cellular phenotypes that can also detect subtle differences, such as those arising from increasing dose. The rich, biologically meaningful image representation that our method provides can support therapy development through high-throughput investigations, even exploratory ones, with more sophisticated and disease-relevant models.
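
The pipeline described here amounts to: embed each image with the pretrained metric network, normalize the embeddings against the variation seen in control samples, and compare phenotypes by cosine similarity. A schematic sketch under those assumptions; the embedding network is treated as a given, and generic PCA whitening stands in for the paper's transformation:

```python
import numpy as np

def whiten_against_controls(embeddings, control_embeddings):
    """Center and whiten embeddings using statistics of control samples,
    so that similarity is driven by biological rather than nuisance
    variation. Generic PCA-whitening stand-in for the paper's transform."""
    mu = control_embeddings.mean(axis=0)
    cov = np.cov(control_embeddings - mu, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs / np.sqrt(vals + 1e-8)          # per-component rescaling
    return (embeddings - mu) @ W

def phenotype_similarity(a, b):
    """Cosine similarity between two transformed embeddings."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```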


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1165
Author(s):  
Fangming Wu ◽  
Bingfang Wu ◽  
Miao Zhang ◽  
Hongwei Zeng ◽  
Fuyou Tian

In situ ground truth data are an important requirement for producing accurate cropland type maps, and they are precisely what is lacking at vast scales. Although volunteered geographic information (VGI) has been proven a possible solution for in situ data acquisition, processing and extracting valuable information from millions of pictures remains challenging. This paper targets the detection of specific crop types from crowdsourced road view photos. A first large, public, multiclass road view crop photo dataset named iCrop was established for the development of crop type detection with deep learning. Five state-of-the-art deep convolutional neural networks, InceptionV4, DenseNet121, ResNet50, MobileNetV2, and ShuffleNetV2, were employed to compare baseline performance. ResNet50 outperformed the others in overall accuracy (87.9%), and ShuffleNetV2 outperformed the others in efficiency (13 FPS). A decision fusion scheme, majority voting, was used to further improve crop identification accuracy. The results clearly demonstrate the superior accuracy of the proposed decision fusion over the non-fusion-based methods for crop type detection on this imbalanced road view photo dataset: the voting method achieved a higher mean accuracy (90.6–91.1%) and can be leveraged to classify crop types in crowdsourced road view photos.
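
The fusion step itself is simple: each photo receives several predictions (for instance from the five networks, or from multiple crops of the same image) and the most frequent label wins. A minimal sketch of that majority-voting step (our illustration, not the iCrop code):

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse several per-photo label predictions into one decision.
    predictions: list of label strings from different models or views."""
    return Counter(predictions).most_common(1)[0][0]

# Example: three of five predictions agree on "maize".
print(majority_vote(["maize", "maize", "rice", "maize", "soybean"]))  # maize
```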


2019 ◽  
Vol 11 (10) ◽  
pp. 1157
Author(s):  
Jorge Fuentes-Pacheco ◽  
Juan Torres-Olivares ◽  
Edgar Roman-Rangel ◽  
Salvador Cervantes ◽  
Porfirio Juarez-Lopez ◽  
...  

Crop segmentation is an important task in Precision Agriculture, where the use of aerial robots with an on-board camera has contributed to the development of new solution alternatives. We address the problem of fig plant segmentation in top-view RGB (Red-Green-Blue) images of a crop grown under difficult open-field circumstances: complex lighting conditions and the non-ideal crop maintenance practices of local farmers. We present a Convolutional Neural Network (CNN) with an encoder-decoder architecture that classifies each pixel as crop or non-crop using only raw colour images as input. Our approach achieves a mean accuracy of 93.85% despite the complexity of the background and the highly variable visual appearance of the leaves. We make our CNN code available to the research community, along with the aerial image dataset and a hand-made, pixel-precise ground truth segmentation, to facilitate comparison among different algorithms.
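
A compact PyTorch sketch of an encoder-decoder of this kind, mapping a raw RGB image to a per-pixel crop probability (a schematic stand-in, far shallower than the paper's network):

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    """Schematic crop/non-crop segmenter: downsample with strided
    convolutions, then upsample back to input resolution and emit a
    per-pixel probability via sigmoid."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):                 # x: (B, 3, H, W), H and W divisible by 4
        return torch.sigmoid(self.decoder(self.encoder(x)))

mask = TinyEncoderDecoder()(torch.rand(1, 3, 64, 64))  # (1, 1, 64, 64) probabilities
```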

