Video Frame Synthesis Combining Conventional and Event Cameras

Author(s):  
Stefano Pini ◽  
Guido Borghi ◽  
Roberto Vezzani

Event cameras are biologically-inspired sensors that capture the temporal evolution of the scene: they record pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite having multiple advantages over conventional cameras, their use is limited by the poor compatibility of asynchronous event streams with traditional data processing and vision algorithms. In this regard, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two additional well-known datasets, namely KITTI and Cityscapes, into event frames in order to present semantic results, in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality of the proposed approach and its capability to synthesize frame sequences from color key-frames and sequences of intermediate events.
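
As a rough illustration of the recurrent part of such a pipeline, the sketch below (PyTorch) propagates a colour key-frame through time conditioned on accumulated event frames. It is a minimal, assumption-laden sketch, not the authors' network: the adversarial component is omitted, and the layer sizes, the two-channel (polarity) event representation, and all module names are hypothetical.

    import torch
    import torch.nn as nn

    class RecurrentFrameSynthesizer(nn.Module):
        """Hypothetical sketch: key-frame + event stream -> RGB frames."""

        def __init__(self, event_channels=2, hidden_channels=64):
            super().__init__()
            self.hidden_channels = hidden_channels
            # encode key-frame (3 ch) together with one event frame
            self.encoder = nn.Sequential(
                nn.Conv2d(3 + event_channels, hidden_channels, 3, padding=1),
                nn.ReLU(inplace=True),
            )
            # simple convolutional recurrence carries temporal state
            self.update = nn.Conv2d(hidden_channels * 2, hidden_channels, 3, padding=1)
            self.decoder = nn.Conv2d(hidden_channels, 3, 3, padding=1)  # back to RGB

        def forward(self, key_frame, event_frames):
            # key_frame: (B, 3, H, W); event_frames: (B, T, C, H, W)
            b, t, _, h, w = event_frames.shape
            state = key_frame.new_zeros(b, self.hidden_channels, h, w)
            frames = []
            for i in range(t):
                feat = self.encoder(torch.cat([key_frame, event_frames[:, i]], dim=1))
                state = torch.tanh(self.update(torch.cat([feat, state], dim=1)))
                frames.append(torch.sigmoid(self.decoder(state)))
            return torch.stack(frames, dim=1)  # (B, T, 3, H, W) synthesized frames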

2021 ◽  
Vol 32 (3) ◽  
Author(s):  
Dimitrios Bellos ◽  
Mark Basham ◽  
Tony Pridmore ◽  
Andrew P. French

Abstract
Over recent years, many approaches have been proposed for the denoising or semantic segmentation of X-ray computed tomography (CT) scans. In most cases, high-quality CT reconstructions are used; however, such reconstructions are not always available. When the X-ray exposure time has to be limited, undersampled tomograms (in terms of their component projections) are obtained. This low number of projections yields low-quality reconstructions that are difficult to segment. Here, we consider CT time-series (i.e. 4D data), where the limited time for capturing fast-occurring temporal events results in the time-series tomograms being necessarily undersampled. Fortunately, in these collections, it is common practice to obtain representative highly sampled tomograms before or after the time-critical portion of the experiment. In this paper, we propose an end-to-end network that can learn to denoise and segment the time-series' undersampled CTs by training with the earlier highly sampled representative CTs. Our single network offers both desired outputs while being trained only once, with the denoised output improving the accuracy of the final segmentation. Our method outperforms state-of-the-art methods in the task of semantic segmentation and offers comparable results with regard to denoising. Additionally, we propose a knowledge transfer scheme using synthetic tomograms. This not only allows accurate segmentation and denoising using less real-world data, but also increases segmentation accuracy. Finally, we make our datasets, as well as the code, publicly available.
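
The dual-output design can be pictured as a shared backbone with a denoising head whose output feeds the segmentation head, so a single training run supervises both tasks against the highly sampled reference CTs. The following PyTorch sketch is a minimal assumed illustration, not the authors' architecture; the layer sizes, the loss weighting alpha, and all names are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenoiseSegNet(nn.Module):
        """Hypothetical sketch: one network, denoised CT + segmentation."""

        def __init__(self, n_classes=3):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            )
            self.denoise_head = nn.Conv2d(32, 1, 3, padding=1)
            # segmentation branch consumes the denoised image
            self.seg_head = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, n_classes, 1),
            )

        def forward(self, undersampled_ct):
            denoised = self.denoise_head(self.backbone(undersampled_ct))
            logits = self.seg_head(denoised)
            return denoised, logits

    def joint_loss(denoised, logits, clean_ct, labels, alpha=0.5):
        # supervise both outputs with the highly sampled reference data
        return alpha * F.mse_loss(denoised, clean_ct) + \
               (1 - alpha) * F.cross_entropy(logits, labels)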


2021 ◽  
Vol 6 (1) ◽  
pp. e000898
Author(s):  
Andrea Peroni ◽  
Anna Paviotti ◽  
Mauro Campigotto ◽  
Luis Abegão Pinto ◽  
Carlo Alberto Cutolo ◽  
...  

Objective To develop and test a deep learning (DL) model for semantic segmentation of anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs.
Methods and analysis We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth of field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation into the ground truth. To overcome these limitations, we present a DL model designed and trained to perform two tasks simultaneously: (1) maximise the segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model provides interpretable results by returning pixel-wise classification uncertainty through Monte Carlo dropout.
Results The model was trained and validated in a 5-fold cross-validation experiment on ~90% of the available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs.
Conclusion The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for the automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.
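
The Monte Carlo dropout mechanism mentioned above is a standard technique: keep dropout active at test time, run several stochastic forward passes, and read per-pixel uncertainty from the spread of the predictions. A minimal PyTorch sketch, where the model is a placeholder rather than the paper's network:

    import torch

    def mc_dropout_uncertainty(model, image, n_samples=20):
        """Per-pixel prediction and uncertainty via Monte Carlo dropout."""
        model.train()  # keep dropout layers stochastic at inference time
        with torch.no_grad():
            probs = torch.stack([
                torch.softmax(model(image), dim=1) for _ in range(n_samples)
            ])                          # (n_samples, B, n_classes, H, W)
        mean_probs = probs.mean(dim=0)
        # predictive entropy as a per-pixel uncertainty map
        entropy = -(mean_probs * torch.log(mean_probs.clamp_min(1e-8))).sum(dim=1)
        return mean_probs.argmax(dim=1), entropy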


2021 ◽  
Vol 2099 (1) ◽  
pp. 012021
Author(s):  
A V Dobshik ◽  
A A Tulupov ◽  
V B Berikov

Abstract This paper presents an automatic algorithm for the segmentation of areas affected by an acute stroke in non-contrast computed tomography brain images. The proposed algorithm is designed for learning in a weakly supervised scenario where some images are labeled accurately and some are labeled inaccurately. Wrong labels appear as a result of inaccuracies made by radiologists during the manual annotation of computed tomography images. We propose methods for solving the segmentation problem in the case of inaccurately labeled training data. We use the U-Net neural network architecture with several modifications. Experiments on real computed tomography scans show that the proposed methods increase the segmentation accuracy.
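
One common way to formalise this mixed accurate/inaccurate label setting, shown below as a hedged sketch rather than the authors' method, is to down-weight the loss contribution of images flagged as noisily labelled. The per-image reliability flags and the weight value are assumptions.

    import torch
    import torch.nn.functional as F

    def weakly_supervised_loss(logits, masks, is_accurate, noisy_weight=0.3):
        # logits: (B, C, H, W); masks: (B, H, W); is_accurate: (B,) bool
        per_image = F.cross_entropy(logits, masks, reduction="none").mean(dim=(1, 2))
        # trusted annotations get full weight, noisy ones a reduced weight
        weights = torch.where(is_accurate, torch.ones_like(per_image),
                              noisy_weight * torch.ones_like(per_image))
        return (weights * per_image).mean()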


Sensor Review ◽  
2021 ◽  
Vol 41 (4) ◽  
pp. 382-389
Author(s):  
Laura Duarte ◽  
Mohammad Safeea ◽  
Pedro Neto

Purpose This paper proposes a novel method for human hand tracking using data from an event camera. The event camera detects changes in brightness, measuring motion with low latency, no motion blur, low power consumption and high dynamic range. Captured frames are analysed using lightweight algorithms reporting three-dimensional (3D) hand position data. The chosen pick-and-place scenario serves as an example input for collaborative human-robot interactions and for obstacle avoidance in human-robot safety applications.
Design/methodology/approach Event data are pre-processed into intensity frames. The regions of interest (ROI) are defined through object-edge event activity, reducing noise. ROI features are extracted for use in depth perception.
Findings Event-based tracking of the human hand was demonstrated to be feasible, in real time and at a low computational cost. The proposed ROI-finding method reduces noise from intensity images, achieving up to 89% data reduction relative to the original while preserving the features. The depth estimation error relative to ground truth (measured with wearables), measured using dynamic time warping and a single event camera, ranges from 15 to 30 millimetres depending on the plane in which it is measured.
Originality/value Tracking of human hands in 3D space using data from a single event camera and lightweight algorithms to define ROI features (hand tracking in space).
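
The pre-processing described above, events accumulated into an intensity frame followed by an activity-based ROI, can be sketched in a few lines of numpy. This is an assumed illustration; the event tuple layout and the activity threshold are hypothetical, not the paper's exact procedure.

    import numpy as np

    def events_to_frame(events, height, width):
        # events: iterable of (x, y, polarity) accumulated over a time slice
        frame = np.zeros((height, width), dtype=np.float32)
        for x, y, p in events:
            frame[y, x] += 1.0 if p > 0 else -1.0
        return frame

    def find_roi(frame, activity_thresh=2.0):
        # keep only pixels with dense event activity (object edges),
        # suppressing isolated noise events
        active = np.abs(frame) >= activity_thresh
        ys, xs = np.nonzero(active)
        if len(xs) == 0:
            return None
        return xs.min(), ys.min(), xs.max(), ys.max()  # ROI bounding box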


2020 ◽  
Vol 8 (3) ◽  
pp. 188
Author(s):  
Fangfang Liu ◽  
Ming Fang

Image semantic segmentation technology has been increasingly applied in many fields, for example, autonomous driving, indoor navigation, virtual reality and augmented reality. However, its application to underwater scenes, which contain a huge amount of marine biological resources and irreplaceable biological gene banks that need to be researched and exploited, remains limited. In this paper, image semantic segmentation technology is exploited to study underwater scenes. We extend the current state-of-the-art semantic segmentation network DeepLabv3+ and employ it as the basic framework. First, the unsupervised color correction method (UCM) module is introduced to the encoder structure of the framework to improve the quality of the image. Moreover, two up-sampling layers are added to the decoder structure to retain more target features and object boundary information. The model is trained by fine-tuning and optimizing the relevant parameters. Experimental results indicate that our method performs better at improving the appearance of the segmented target object and preventing its pixels from mingling with those of other classes, enhancing the segmentation accuracy of the target boundaries and retaining more feature information. Compared with the original method, our method improves the segmentation accuracy by 3%.
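
Unsupervised colour correction of the UCM kind typically balances the colour channels (underwater images are strongly blue-green biased) and stretches contrast before the image enters the network. A minimal numpy sketch under those assumptions, not the paper's exact UCM formulation:

    import numpy as np

    def color_correct(img):
        # img: float32 RGB array in [0, 1], shape (H, W, 3)
        means = img.reshape(-1, 3).mean(axis=0)
        # lift the weaker channels toward the dominant one
        gains = means.max() / np.clip(means, 1e-6, None)
        balanced = np.clip(img * gains, 0.0, 1.0)
        # global contrast stretch to the full [0, 1] range
        lo, hi = balanced.min(), balanced.max()
        return (balanced - lo) / max(hi - lo, 1e-6)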


2019 ◽  
Vol 10 (1) ◽  
pp. 13 ◽  
Author(s):  
Shichao Zhang ◽  
Zhe Zhang ◽  
Libo Sun ◽  
Wenhu Qin

Generally, most approaches obtain more data for training models, and thereby improve the accuracy of detection and segmentation, using methods such as cropping, rotating, and flipping. However, due to the difficulty of labeling such data, especially semantic segmentation data, these traditional data augmentation methodologies do not help much when the training set is very limited. In this paper, a model named OFA-Net (One For All Network) is proposed to combine the object detection and semantic segmentation tasks. Meanwhile, a strategy called "1-N Alternation" is used to train the OFA-Net model, which fuses features from detection and segmentation data. The results show that object detection data can be recruited to improve segmentation accuracy, and furthermore, segmentation data substantially enhance the confidence of predictions for object detection. Finally, the OFA-Net model is trained without traditional data augmentation methodologies and tested on the KITTI test server. The model works well on the KITTI Road Segmentation challenge and performs well on the object detection task.
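
A training loop in the spirit of "1-N Alternation" might interleave one optimisation step on detection data with N steps on segmentation data, so the shared backbone sees both supervision signals. The sketch below is an assumption: the ratio, the task order, and the detection_loss/segmentation_loss helpers are hypothetical, not the paper's exact schedule.

    def train_alternating(model, det_loader, seg_loader, optimizer, n=3, epochs=10):
        """Alternate 1 detection step with n segmentation steps (hypothetical)."""
        for _ in range(epochs):
            seg_iter = iter(seg_loader)
            for det_batch in det_loader:
                optimizer.zero_grad()
                model.detection_loss(det_batch).backward()   # 1 detection step
                optimizer.step()
                for _ in range(n):                           # N segmentation steps
                    try:
                        seg_batch = next(seg_iter)
                    except StopIteration:
                        seg_iter = iter(seg_loader)          # restart when exhausted
                        seg_batch = next(seg_iter)
                    optimizer.zero_grad()
                    model.segmentation_loss(seg_batch).backward()
                    optimizer.step()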


Author(s):  
Yuting Qin ◽  
Yuren Chen ◽  
Kunhui Lin

Roads should deliver appropriate information to drivers and thus induce safer driving behavior. This concept is also known as "self-explaining roads" (SERs). Previous studies have demonstrated that understanding how road characteristics affect drivers' speed choices is the key to SERs. Thus, in order to reduce traffic casualties via engineering methods, this study aimed to establish a speed decision model based on visual road information and to propose an innovative method of SER design. It was assumed that driving speed is determined by road geometry and modified by the environment. Lane fitting and image semantic segmentation techniques were used to extract road features. Field experiments were conducted in Tibet, China, and 1375 typical road scenarios were selected. By controlling variables, the driving speed stimulated by each piece of information was evaluated. Prediction models for geometry-determined speed and environment-modified speed were built using the random forest algorithm and a convolutional neural network. Results showed that the curvature of the right boundary in the "near scene" and "middle scene", as well as the density of roadside greenery and residences, plays an important role in regulating driving speed. The findings of this research could provide qualitative and quantitative suggestions for the optimization of road design that would guide drivers to choose more reasonable driving speeds.
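
The two-stage modelling idea, a geometry-determined baseline speed corrected by an environment term, could be prototyped along the following lines with scikit-learn. Everything here is illustrative: the feature columns, the sample values, and the residual-fitting step are assumptions, not the paper's trained models.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # hypothetical features per road scenario:
    # [curvature_near, curvature_middle, greenery_density, residence_density]
    X = np.array([[0.002, 0.001, 0.30, 0.05],
                  [0.010, 0.008, 0.10, 0.40],
                  [0.004, 0.006, 0.55, 0.10],
                  [0.001, 0.002, 0.05, 0.60]])
    y = np.array([78.0, 52.0, 64.0, 70.0])  # observed speeds (km/h), illustrative

    # stage 1: geometry-determined speed from the curvature features only
    geometry_model = RandomForestRegressor(n_estimators=100, random_state=0)
    geometry_model.fit(X[:, :2], y)

    # stage 2: the environment term (a CNN on scene imagery in the paper)
    # would then be fit to the residual left after the geometry prediction
    residual = y - geometry_model.predict(X[:, :2])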


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7730
Author(s):  

Semantic segmentation is one of the most active research topics in computer vision, with the goal of assigning dense semantic labels to all pixels in a given image. In this paper, we introduce HFEN (Hierarchical Feature Extraction Network), a lightweight network that strikes a balance between inference speed and segmentation accuracy. Our architecture is based on an encoder-decoder framework. The input images are down-sampled through an efficient encoder to extract multi-layer features. The extracted features are then fused via a decoder, where global contextual information and spatial information are aggregated for the final segmentation with real-time performance. Extensive experiments have been conducted on two standard benchmarks, Cityscapes and CamVid, where our network achieved superior performance on an NVIDIA 2080Ti GPU.
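
The encoder-decoder pattern described above, an efficient down-sampling encoder plus a decoder that fuses a deep contextual level with a shallow spatial level, can be sketched as follows in PyTorch. Channel sizes and depths are illustrative; this is not the HFEN architecture itself.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyEncoderDecoder(nn.Module):
        """Hypothetical sketch of a lightweight encoder-decoder segmenter."""

        def __init__(self, n_classes=19):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1),
                                        nn.ReLU(inplace=True))   # 1/2: spatial detail
            self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1),
                                        nn.ReLU(inplace=True))   # 1/4: context
            self.fuse = nn.Conv2d(32 + 64, 64, 1)
            self.classifier = nn.Conv2d(64, n_classes, 1)

        def forward(self, x):
            shallow = self.stage1(x)
            deep = self.stage2(shallow)
            # upsample the contextual features and fuse with spatial features
            deep_up = F.interpolate(deep, size=shallow.shape[2:],
                                    mode="bilinear", align_corners=False)
            fused = self.fuse(torch.cat([shallow, deep_up], dim=1))
            logits = self.classifier(fused)
            return F.interpolate(logits, size=x.shape[2:],
                                 mode="bilinear", align_corners=False)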


2021 ◽  
Author(s):  
Akhmedkhan Shabanov ◽  
Daja Schichler ◽  
Constantin Pape ◽  
Sara Cuylen-Haering ◽  
Anna Kreshuk

We introduce a simple mechanism by which a CNN trained to perform semantic segmentation of individual images can be re-trained, with no additional annotations, to improve its performance for the segmentation of videos. We put the segmentation CNN in a Siamese setup with shared weights and train it both for segmentation accuracy on annotated images and for segmentation similarity on unlabelled consecutive video frames. Our main application is live microscopy imaging of membrane-less organelles, where the fluorescent ground truth for virtual staining can only be acquired for individual frames. The method is directly applicable to other microscopy modalities, as we demonstrate by experiments on the Cell Segmentation Benchmark. Our code is available at https://github.com/kreshuklab/learning-temporal-consistency.
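
One training step of such a Siamese setup might look like the sketch below: the same network (shared weights) processes two consecutive unlabelled frames, and a consistency term on their predictions is added to the supervised loss on annotated images. The MSE consistency loss and its weight beta are assumptions, not necessarily the authors' exact objective.

    import torch.nn.functional as F

    def siamese_step(model, labelled_img, labels, frame_t, frame_t1, beta=0.5):
        # supervised loss on an annotated image
        sup_loss = F.cross_entropy(model(labelled_img), labels)
        # shared weights: the same model processes both unlabelled frames
        pred_t = F.softmax(model(frame_t), dim=1)
        pred_t1 = F.softmax(model(frame_t1), dim=1)
        consistency = F.mse_loss(pred_t, pred_t1)  # penalise temporal flicker
        return sup_loss + beta * consistency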


2021 ◽  
Author(s):  
Stepan Tulyakov ◽  
Daniel Gehrig ◽  
Stamatios Georgoulis ◽  
Julius Erbach ◽  
Mathias Gehrig ◽  
...  
