Pyramid Bayesian Method for Model Uncertainty Evaluation of Semantic Segmentation in Autonomous Driving

Author(s):  
Yang Zhao ◽  
Wei Tian ◽  
Hong Cheng

Abstract. With the rapid development of deep learning models in autonomous driving, research on uncertainty estimation for these models has also grown. Herein, a pyramid Bayesian deep learning method is proposed for the model uncertainty evaluation of semantic segmentation. Semantic segmentation is one of the most important perception problems in understanding visual scenes, which is critical for autonomous driving. This study aims to optimize Bayesian SegNet for uncertainty evaluation. It first simplifies the network structure of Bayesian SegNet by reducing the number of MC-Dropout layers and then introduces a pyramid pooling module to improve its performance. mIoU and mPAvPU are used as evaluation metrics to test the proposed method on the public Cityscapes dataset. The experimental results show that the proposed method improves the sampling effect of Bayesian SegNet, shortens the sampling time, and improves network performance.
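As a rough illustration of how MC-Dropout sampling yields a per-pixel uncertainty, the sketch below averages class probabilities over several stochastic forward passes and takes the entropy of the mean. The sample values, the function names, and the choice of predictive entropy as the uncertainty measure are assumptions made for illustration; the abstract does not state which measure the authors use.

```python
import math

def mean_probabilities(samples):
    """Average per-class probabilities over T stochastic forward passes."""
    num_samples = len(samples)
    num_classes = len(samples[0])
    return [sum(s[c] for s in samples) / num_samples for c in range(num_classes)]

def predictive_entropy(probs):
    """Entropy (in nats) of a categorical distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical softmax outputs for ONE pixel from three MC-Dropout passes.
confident_pixel = [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01], [0.99, 0.005, 0.005]]
ambiguous_pixel = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.4, 0.4, 0.2]]

for name, samples in [("confident", confident_pixel), ("ambiguous", ambiguous_pixel)]:
    mean_p = mean_probabilities(samples)
    print(name, [round(p, 3) for p in mean_p], round(predictive_entropy(mean_p), 3))
```

Fewer dropout layers mean fewer stochastic components per pass, which is one plausible reason the sampling time reported above could shrink.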

2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought major breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted wide interest in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of the reconstruction bias. Tipping the balance between encoding and decoding capacity, i.e., making the decoding network much more complex than the encoding network, reinforces semantic segmentation ability. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% on the WHU and Massachusetts Building datasets, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.
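For readers unfamiliar with the IoU score quoted above, a minimal sketch of intersection over union for a binary building mask follows. The tiny masks and the function name `binary_iou` are hypothetical, and the handling of two empty masks is a convention chosen here, not something the abstract specifies.

```python
def binary_iou(pred, target):
    """IoU between two same-shaped binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention chosen here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0

# Hypothetical 3x3 masks: 1 marks a building pixel.
pred = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
target = [[0, 0, 1],
          [0, 1, 1],
          [0, 1, 0]]

print(binary_iou(pred, target))  # 3 shared pixels out of 5 in the union
```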


2021 ◽  
Author(s):  
Benjamin Kellenberger ◽  
Devis Tuia ◽  
Dan Morris

<p>Ecological research like wildlife censuses increasingly relies on data on the scale of terabytes. For example, modern camera trap datasets contain millions of images that require prohibitive amounts of manual labour to be annotated with species, bounding boxes, and the like. Machine learning, especially deep learning [3], could greatly accelerate this task through automated predictions, but involves extensive coding and expert knowledge.</p><p>In this abstract we present AIDE, the Annotation Interface for Data-driven Ecology [2]. First, AIDE is a web-based annotation suite for image labelling with support for concurrent access and scalability, up to the cloud. Second, it tightly integrates deep learning models into the annotation process through active learning [7], where models learn from user-provided labels and in turn select the most relevant images for review from the large pool of unlabelled ones (Fig. 1). The result is a system where users only need to label what is required, which saves time and decreases errors due to fatigue.</p><p><img src="https://contentmanager.copernicus.org/fileStorageProxy.php?f=gnp.0402be60f60062057601161/sdaolpUECMynit/12UGE&app=m&a=0&c=131251398e575ac9974634bd0861fadc&ct=x&pn=gnp.elif&d=1" alt=""></p><p><em>Fig. 1: AIDE offers concurrent web image labelling support and uses annotations and deep learning models in an active learning loop.</em></p><p>AIDE includes a comprehensive set of built-in models, such as ResNet [1] for image classification, Faster R-CNN [5] and RetinaNet [4] for object detection, and U-Net [6] for semantic segmentation. All models can be customised and used without having to write a single line of code. Furthermore, AIDE accepts any third-party model with minimal implementation requirements.
To complete the package, AIDE offers both user annotation and model prediction evaluation, access control, customisable model training, and more, all through the web browser.</p><p>AIDE is fully open source and available at https://github.com/microsoft/aerial_wildlife_detection.</p><p><strong>References</strong></p>
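The selection step of an active learning loop can be sketched as follows. This is not AIDE's code or API: the function `select_for_review`, the file names, and the least-confidence ranking are all assumptions used to show one common way a model can pick the images it most needs labelled.

```python
def select_for_review(predictions, budget):
    """Return ids of the `budget` images whose top-class confidence is lowest."""
    ranked = sorted(predictions.items(), key=lambda item: max(item[1]))
    return [image_id for image_id, _ in ranked[:budget]]

# Hypothetical per-image class probabilities from the current model.
predictions = {
    "img_001.jpg": [0.95, 0.03, 0.02],
    "img_002.jpg": [0.40, 0.35, 0.25],
    "img_003.jpg": [0.55, 0.40, 0.05],
    "img_004.jpg": [0.99, 0.005, 0.005],
}

to_label = select_for_review(predictions, budget=2)
print(to_label)  # the two images the model is least sure about
```

In a full loop, the newly labelled images would be added to the training set, the model retrained, and the ranking recomputed on the remaining unlabelled pool.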


Author(s):  
Bi-ke Chen ◽  
Chen Gong ◽  
Jian Yang

Semantic Segmentation (SS) partitions an image into several coherent, semantically meaningful parts and classifies each part into one of the pre-determined classes. In this paper, we argue that existing SS methods cannot be reliably applied to autonomous driving systems, as they ignore the different importance levels of distinct classes for safe driving. For example, pedestrians in the scene are much more important than sky when driving a car, so their segmentations should be as accurate as possible. To incorporate the importance information possessed by various object classes, this paper designs an "Importance-Aware Loss" (IAL) that specifically emphasizes the critical objects for autonomous driving. IAL operates under a hierarchical structure, and classes with different importance are located at different levels so that they are assigned distinct weights. Furthermore, we derive the forward and backward propagation rules for IAL and apply them to deep neural networks to realize SS in intelligent driving systems. Experiments on the CamVid and Cityscapes datasets reveal that, by employing the proposed loss function, existing deep learning models including FCN, SegNet and ENet consistently obtain improved segmentation results on the pre-defined important classes for safe driving.
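The idea of weighting classes by importance can be illustrated with a plain class-weighted cross-entropy. This is a simplified stand-in, not the hierarchical IAL described above: the flat weight table, the two-class setup, and the function name are assumptions for illustration only.

```python
import math

def weighted_cross_entropy(pixel_probs, pixel_labels, class_weights):
    """Weighted mean of per-pixel negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for probs, label in zip(pixel_probs, pixel_labels):
        weight = class_weights[label]
        total_loss += -weight * math.log(probs[label])
        total_weight += weight
    return total_loss / total_weight

# Classes: 0 = sky, 1 = pedestrian. The weights are hypothetical.
class_weights = [1.0, 5.0]
pixel_labels = [0, 1]

# Same size of mistake, made once on the sky pixel and once on the pedestrian pixel.
sky_error = [[0.5, 0.5], [0.1, 0.9]]
pedestrian_error = [[0.9, 0.1], [0.5, 0.5]]

print(weighted_cross_entropy(sky_error, pixel_labels, class_weights))
print(weighted_cross_entropy(pedestrian_error, pixel_labels, class_weights))
```

With these weights, the uncertain pedestrian pixel is penalized more heavily than the equally uncertain sky pixel, which is the behaviour the importance-aware design aims for.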


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8072
Author(s):  
Yu-Bang Chang ◽  
Chieh Tsai ◽  
Chang-Hong Lin ◽  
Poki Chen

As autonomous driving techniques become increasingly valued and widespread, real-time semantic segmentation has become a popular and challenging topic in deep learning and computer vision in recent years. However, to apply a deep learning model to edge devices accompanying sensors on vehicles, we need to design a structure that has the best trade-off between accuracy and inference time. In previous works, several methods sacrificed accuracy to obtain a faster inference time, while others aimed to find the best accuracy under a real-time constraint. Nevertheless, the accuracies of previous real-time semantic segmentation methods still lag far behind those of general semantic segmentation methods. We therefore propose a network architecture based on a dual encoder and a self-attention mechanism. Compared with preceding works, we achieved 78.6% mIoU at 39.4 FPS with a 1024 × 2048 resolution on a Cityscapes test submission.
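The mIoU figure above is a class-averaged intersection over union. A minimal sketch of computing it from a confusion matrix is given below; the two-class matrix is hypothetical, and skipping classes that never occur is a convention chosen here rather than the benchmark's documented rule.

```python
def mean_iou(confusion):
    """mIoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious)

# Hypothetical pixel counts for two classes.
confusion = [[8, 2],
             [1, 9]]

print(round(mean_iou(confusion), 4))
```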


Author(s):  
S. T. Yekeen ◽  
A.-L. Balogun

Abstract. This study developed a novel deep learning oil spill instance segmentation model using the Mask Region-based Convolutional Neural Network (Mask R-CNN), a state-of-the-art computer vision model. A total of 2882 images containing oil spill, look-alike, ship, and land areas were acquired after different pre-processing activities. These images were subsequently divided into 88% for training and 12% for testing, equating to 2530 and 352 images respectively. Model training was conducted using transfer learning on a ResNet 101 backbone pre-trained on COCO data, in combination with a Feature Pyramid Network (FPN) architecture for feature extraction, for 30 epochs with a learning rate of 0.001. The model’s performance was evaluated using precision, recall, and F1-measure, with values of 0.964, 0.969 and 0.968 respectively, which is higher than that of other existing models. The study concluded that, for this specialized task, the developed deep learning instance segmentation model (Mask R-CNN) performs better than conventional machine learning models and semantic segmentation deep learning models in the detection and segmentation of marine oil spills.
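For reference, the three metrics named above can be computed from detection counts as in the sketch below. The counts are hypothetical and are not taken from the study; the abstract does not say how its scores were aggregated across classes.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a single class.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))
```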


2021 ◽  
Vol 13 (16) ◽  
pp. 3087
Author(s):  
Seonkyeong Seong ◽  
Jaewan Choi

In this study, building extraction in aerial images was performed using csAG-HRNet, which applies HRNet-v2 in combination with channel and spatial attention gates. HRNet-v2 consists of transition and fusion processes based on subnetworks at various resolutions. The channel and spatial attention gates were applied in the network to efficiently learn important features. A channel attention gate assigns weights in accordance with the importance of each channel, and a spatial attention gate assigns weights in accordance with the importance of each pixel position across all channels. In csAG-HRNet, csAG modules consisting of a channel attention gate and a spatial attention gate were applied to each subnetwork of the stage and fusion modules in the HRNet-v2 network. In experiments using two datasets, it was confirmed that csAG-HRNet could reduce false detections related to the shapes of large buildings and to small nonbuilding objects compared to existing deep learning models.
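The difference between the two gates can be shown with a heavily simplified sketch: one weight per channel versus one weight per pixel position. Real attention gates use learned layers; here the weights are just a sigmoid of an average, and the channel-then-spatial order, the function names, and the toy feature map are all assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(features):
    """Scale each channel by a sigmoid of its global average (one weight per channel)."""
    gated = []
    for channel in features:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        gated.append([[v * weight for v in row] for row in channel])
    return gated

def spatial_gate(features):
    """Scale each position by a sigmoid of its mean across channels (one weight per pixel)."""
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    weights = [
        [sigmoid(sum(features[c][y][x] for c in range(num_channels)) / num_channels)
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [[features[c][y][x] * weights[y][x] for x in range(width)] for y in range(height)]
        for c in range(num_channels)
    ]

# Hypothetical feature map: 2 channels, each 2x2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[-1.0, 0.0], [0.5, -0.5]],
]

gated = spatial_gate(channel_gate(features))
print(gated)
```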


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
Y Li ◽  
S Rao ◽  
A Hassaine ◽  
R Ramakrishnan ◽  
Y Zhu ◽  
...  

Abstract. Background: Forecasting incident heart failure is critical for prevention. Recent research has suggested superior performance of deep learning models on prediction tasks using electronic health records. However, even with relatively accurate predictive performance, the major impediments to the wider use of deep learning models for clinical decision making are the difficulty of assigning a level of confidence to model predictions and the limited interpretability of those predictions. Purpose: We aimed to develop a deep learning framework for more accurate incident heart failure prediction, with provision of measures of uncertainty and interpretability. Methods: We used a longitudinal linked electronic health records dataset, Clinical Practice Research Datalink, involving 788,880 patients, 8.3% of whom had an incident heart failure diagnosis. To embed the uncertainty estimation mechanism into the deep learning models, we developed a probabilistic framework based on a novel transformer deep learning model: deep Bayesian Gaussian processes (DBGP). We investigated the performance of incident heart failure prediction and uncertainty estimation for the model and validated it using an external held-out dataset. Diagnoses, medications, and age for each encounter were included as predictors. By comparing the uncertainty, we investigated the possibility of distinguishing correct predictions from wrong ones to avoid potential misclassification. Using model distillation, which mimics a well-trained complex model with simple models, we investigated the importance of associations between diagnoses, medications and heart failure with an interpretable linear regression component learned from DBGP. Results: The DBGP achieved an AUROC of 0.941 on external validation. More importantly, the uncertainty information could distinguish correct predictions from wrong ones, with significant differences (p-value with 500 samples) between the distributions of uncertainties for negative predictions (3.21e-69 between true negative and false negative) and positive predictions (3.39e-22 between true positive and false positive). Utilising the distilled model, we can specify the contribution of each diagnosis and medication to heart failure prediction. For instance, Losartan/Fosinopril, Bisoprolol and Left bundle-branch block showed strong association with heart failure incidence, with coefficients of 0.11 (95% CI: 0.10, 0.12), 0.09 (0.08, 0.11) and 0.09 (0.07, 0.11) respectively; Peritoneal adhesions, Trochanteric bursitis and Galactorrhea showed strong disassociation, with coefficients of −0.07 (−0.09, −0.05), −0.07 (−0.09, −0.04) and −0.06 (−0.08, −0.04) respectively. Conclusions: Our novel probabilistic deep learning framework adds a measure of uncertainty to the prediction and helps to mitigate misclassification. Model distillation provides an opportunity to interpret deep learning models and offers a data-driven perspective for risk factor analysis. Funding Acknowledgement: Type of funding source: Public Institution(s). Main funding source(s): Oxford Martin School, University of Oxford; NIHR Oxford Biomedical Research Centre, University of Oxford
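One way such uncertainty estimates could be used in practice is to defer high-uncertainty predictions for review and keep only the rest. The sketch below illustrates that idea; the records, the threshold values, and the function name are hypothetical, and this is not the statistical comparison the authors report.

```python
def accuracy_after_deferral(records, max_uncertainty):
    """Accuracy over predictions kept; those above the threshold are deferred.

    Returns (accuracy, number_kept); accuracy is None if nothing is kept.
    """
    kept = [correct for uncertainty, correct in records if uncertainty <= max_uncertainty]
    if not kept:
        return None, 0
    return sum(kept) / len(kept), len(kept)

# Hypothetical (uncertainty, prediction_was_correct) pairs.
records = [
    (0.05, True), (0.10, True), (0.12, True),
    (0.40, False), (0.45, True), (0.60, False),
]

print(accuracy_after_deferral(records, max_uncertainty=1.0))  # keep everything
print(accuracy_after_deferral(records, max_uncertainty=0.2))  # defer uncertain cases
```

If wrong predictions do tend to carry higher uncertainty, as the abstract reports, tightening the threshold trades coverage for accuracy on the retained cases.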

