DropConnect is effective in modeling uncertainty of Bayesian deep networks

Abstract: Deep neural networks (DNNs) have achieved state-of-the-art performance in many important domains, including medical diagnosis, security, and autonomous driving. In domains where safety is highly critical, an erroneous decision can result in serious consequences. While perfect prediction accuracy is not always achievable, recent work on Bayesian deep networks shows that it is possible to know when DNNs are more likely to make mistakes. Knowing what DNNs do not know is desirable to increase the safety of deep learning technology in sensitive applications; Bayesian neural networks attempt to address this challenge. Traditional approaches are computationally intractable and do not scale well to large, complex neural network architectures. In this paper, we develop a theoretical framework to approximate Bayesian inference for DNNs by imposing a Bernoulli distribution on the model weights. This method, called Monte Carlo DropConnect (MC-DropConnect), gives us a tool to represent the model uncertainty with little change in the overall model structure or computational cost. We extensively validate the proposed algorithm on multiple network architectures and datasets for classification and semantic segmentation tasks. We also propose new metrics to quantify uncertainty estimates, which enables an objective comparison between MC-DropConnect and prior approaches. Our empirical results demonstrate that the proposed framework yields significant improvement in both prediction accuracy and uncertainty estimation quality compared to the state of the art.
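The idea of placing a Bernoulli distribution on the weights and sampling at prediction time can be illustrated with a minimal sketch. This is not the authors' implementation: the single linear layer, the rescaling by keep_prob, the use of predictive entropy as the uncertainty measure, and all function names are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropconnect_pass(weights, bias, x, keep_prob, rng):
    """One stochastic forward pass through a linear layer plus softmax.

    Each individual weight is kept with probability keep_prob (a Bernoulli
    mask on weights, not on activations). Dividing by keep_prob keeps the
    expected logit unchanged; that rescaling is an assumption of this sketch.
    """
    logits = []
    for row, b in zip(weights, bias):
        z = b
        for w, xi in zip(row, x):
            keep = 1.0 if rng.random() < keep_prob else 0.0
            z += keep * w * xi / keep_prob
        logits.append(z)
    return softmax(logits)


def mc_dropconnect_predict(weights, bias, x, keep_prob=0.5, n_samples=100, seed=0):
    """Average n_samples stochastic passes and report predictive entropy."""
    rng = random.Random(seed)
    mean_probs = [0.0] * len(weights)
    for _ in range(n_samples):
        probs = dropconnect_pass(weights, bias, x, keep_prob, rng)
        for k, p in enumerate(probs):
            mean_probs[k] += p / n_samples
    # Entropy of the averaged prediction: higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0.0)
    return mean_probs, entropy
```

With keep_prob=1.0 every mask is 1, so the sketch reduces to an ordinary deterministic forward pass; lower values spread the sampled predictions and typically raise the entropy for inputs near a decision boundary.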


The Fishyscapes Benchmark: Measuring Blind Spots in Semantic Segmentation

International Journal of Computer Vision, 2021. DOI: 10.1007/s11263-021-01511-6

Author(s): Hermann Blum, Paul-Edouard Sarlin, Juan Nieto, Roland Siegwart, Cesar Cadena

Keyword(s): Anomaly Detection, Bayesian Learning, State Of The Art, Semantic Segmentation, Autonomous Driving, Uncertainty Estimation, Detection Methods, Uncertainty Estimates, Segmentation Models, Blind Spots

Abstract: Deep learning has enabled impressive progress in the accuracy of semantic segmentation. Yet, the ability to estimate uncertainty and detect failure is key for safety-critical applications like autonomous driving. Existing uncertainty estimates have mostly been evaluated on simple tasks, and it is unclear whether these methods generalize to more complex scenarios. We present Fishyscapes, the first public benchmark for anomaly detection in a real-world task of semantic segmentation for urban driving. It evaluates pixel-wise uncertainty estimates for the detection of anomalous objects. We adapt state-of-the-art methods to recent semantic segmentation models and compare uncertainty estimation approaches based on softmax confidence, Bayesian learning, density estimation, and image resynthesis, as well as supervised anomaly detection methods. Our results show that anomaly detection is far from solved even for ordinary situations, while our benchmark allows measuring advancements beyond the state of the art. Results, data and submission information can be found at https://fishyscapes.com/.
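Of the approaches listed, softmax confidence is the simplest to write down. The sketch below scores each pixel as one minus its largest class probability; this particular scoring rule and the function names are assumptions for illustration, not necessarily the exact baseline evaluated in the benchmark.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_anomaly_scores(logit_map):
    """Per-pixel anomaly score from a segmentation network's logits.

    logit_map[row][col] is a list of class logits for that pixel.
    The score is 1 - max softmax probability, so confident pixels
    score near 0 and ambiguous pixels score higher.
    """
    return [[1.0 - max(softmax(pixel)) for pixel in row] for row in logit_map]
```

A pixel whose logits are all equal receives the highest possible score for its number of classes, which is why such scores are often thresholded to flag candidate anomalies.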


SketchGNN: Semantic Sketch Segmentation with Graph Neural Networks

ACM Transactions on Graphics, 2021, Vol 40 (3), pp. 1-13. DOI: 10.1145/3450284

Author(s): Lumin Yang, Jiajie Zhuang, Hongbo Fu, Xiangzhi Wei, Kun Zhou, ...

Keyword(s): Neural Network, Neural Networks, Network Architecture, Large Scale, State Of The Art, Semantic Segmentation, Structure Information, Graph Neural Networks, Node Labels, Point Level

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph with nodes representing the sampled points along input strokes and edges encoding the stroke structure information. To predict the per-node labels, our SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract the features at three levels, i.e., point-level, stroke-level, and sketch-level. SketchGNN significantly improves the accuracy of the state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric over a large-scale challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.


Y-Net: Dual-branch Joint Network for Semantic Segmentation

ACM Transactions on Multimedia Computing Communications and Applications, 2021, Vol 17 (4), pp. 1-22. DOI: 10.1145/3460940

Author(s): Yizhen Chen, Haifeng Hu

Keyword(s): Feature Vector, State Of The Art, Computational Cost, Receptive Fields, Semantic Segmentation, Global Context, Multi Level, The One, Public Datasets, High Level

Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in computation without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with substantial noise. In this article, we abandon the traditional “U-shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it only aggregates the high-level, low-resolution features and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channel-wise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of the single-branch network and attaining higher performance with less computational cost than the “U-shaped” structure. The proposed CSD can better integrate useful information and suppress interfering noise. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Our Y-Net achieves state-of-the-art performance on the PASCAL VOC 2012, PASCAL Person-Part, and ADE20K datasets without pre-training on extra datasets.


Semantic Image Segmentation with Deep Convolutional Neural Networks and Quick Shift

Symmetry, 2020, Vol 12 (3), pp. 427. DOI: 10.3390/sym12030427. Cited By ~ 1

Author(s): Sanxing Zhang, Zhenhuan Ma, Gang Zhang, Tao Lei, Rui Zhang, ...

Keyword(s): Neural Networks, Image Segmentation, Convolutional Neural Networks, Semantic Segmentation, Autonomous Driving, Input Image, Feature Representation, Segmentation Algorithm, Deep Convolutional Neural Networks, Semantic Image Segmentation

Semantic image segmentation, as one of the most popular tasks in computer vision, has been widely used in autonomous driving, robotics and other fields. Currently, deep convolutional neural networks (DCNNs) are driving major advances in semantic segmentation due to their powerful feature representation. However, DCNNs extract high-level feature representations by strided convolution, which makes it difficult to segment foreground objects precisely, especially when locating object boundaries. This paper presents a novel semantic segmentation algorithm that combines DeepLab v3+ with the superpixel segmentation algorithm quick shift. DeepLab v3+ is employed to generate a class-indexed score map for the input image. Quick shift is applied to segment the input image into superpixels. Their outputs are then fed into a class voting module to refine the semantic segmentation results. Extensive experiments on the proposed semantic image segmentation method are performed on the PASCAL VOC 2012 dataset, and the results show that the proposed method can provide a more efficient solution.
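The refinement step can be pictured as a vote inside each superpixel. The abstract does not specify the voting rule, so the sketch below assumes a plain majority vote over per-pixel class labels; the function name and the input layout (two equally sized 2D lists) are likewise hypothetical.

```python
from collections import Counter, defaultdict


def superpixel_majority_vote(class_map, superpixel_map):
    """Relabel every pixel with the majority class of its superpixel.

    class_map[r][c]      -- per-pixel class index (e.g. argmax of a score map)
    superpixel_map[r][c] -- superpixel id for the same pixel
    Ties go to the class seen first in row-major order (Counter ordering).
    """
    votes = defaultdict(Counter)
    for class_row, sp_row in zip(class_map, superpixel_map):
        for cls, sp in zip(class_row, sp_row):
            votes[sp][cls] += 1

    # Pick the most frequent class for each superpixel.
    winner = {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}
    return [[winner[sp] for sp in sp_row] for sp_row in superpixel_map]
```

Because superpixels tend to follow image edges, forcing one label per superpixel is a common way to snap coarse network predictions to object boundaries.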


A Novel Object-Based Deep Learning Framework for Semantic Segmentation of Very High-Resolution Remote Sensing Data: Comparison with Convolutional and Fully Convolutional Networks

Remote Sensing, 2019, Vol 11 (6), pp. 684. DOI: 10.3390/rs11060684. Cited By ~ 17

Author(s): Maria Papadomanolaki, Maria Vakalopoulou, Konstantinos Karantzalos

Keyword(s): Deep Learning, State Of The Art, Semantic Segmentation, Novel Object, Convolutional Networks, Learning Framework, Fully Convolutional Networks, Object Based, Deep Networks, Very High

Deep learning architectures have received much attention in recent years, demonstrating state-of-the-art performance in several segmentation, classification and other computer vision tasks. Most of these deep networks are based on either convolutional or fully convolutional architectures. In this paper, we propose a novel object-based deep-learning framework for semantic segmentation in very high-resolution satellite data. In particular, we exploit object-based priors integrated into a fully convolutional neural network by incorporating an anisotropic diffusion data preprocessing step and an additional loss term during the training process. Under this constrained framework, the goal is to force pixels that belong to the same object to be classified into the same semantic category. We thoroughly compared the novel object-based framework with the currently dominant convolutional and fully convolutional deep networks. In particular, numerous experiments were conducted on the publicly available ISPRS WGII/4 benchmark datasets, namely Vaihingen and Potsdam, for validation and inter-comparison based on a variety of metrics. Quantitatively, experimental results indicate that, overall, the proposed object-based framework outperformed the current state-of-the-art fully convolutional networks by more than 1% in terms of overall accuracy, while intersection over union results are improved for all semantic categories. Qualitatively, man-made classes with stricter geometry, such as buildings, were the ones that benefited most from our method, especially along object boundaries, highlighting the great potential of the developed approach.


Adaptive Context Encoding Module for Semantic Segmentation

Electronic Imaging, 2020, Vol 2020 (10), pp. 27-1-27-7. DOI: 10.2352/issn.2470-1173.2020.10.ipas-027

Author(s): Congcong Wang, Faouzi Alaya Cheikh, Azeddine Beghdadi, Ole Jakob Elle

Keyword(s): Neural Networks, State Of The Art, Experimental Studies, Semantic Segmentation, Multiple Scale, Context Information, Convolution Operation, Sampling Locations, Spatial Pyramid Pooling, Spatial Pyramid

Object sizes in images are diverse; therefore, capturing context information at multiple scales is essential for semantic segmentation. Existing context aggregation methods such as the pyramid pooling module (PPM) and atrous spatial pyramid pooling (ASPP) employ different pooling sizes or atrous rates, such that multi-scale information is captured. However, the pooling sizes and atrous rates are chosen empirically. Rethinking ASPP leads to our observation that learnable sampling locations of the convolution operation can endow the network with a learnable field-of-view, and thus the ability to capture object context information adaptively. Following this observation, in this paper, we propose an adaptive context encoding (ACE) module based on the deformable convolution operation, where sampling locations of the convolution operation are learnable. Our ACE module can be embedded into other Convolutional Neural Networks (CNNs) easily for context aggregation. The effectiveness of the proposed module is demonstrated on the Pascal-Context and ADE20K datasets. Although our proposed ACE only consists of three deformable convolution blocks, it outperforms PPM and ASPP in terms of mean Intersection over Union (mIoU) on both datasets. All the experimental studies confirm that our proposed module is effective compared to the state-of-the-art methods.
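For readers unfamiliar with the mIoU metric used in this comparison, a minimal sketch follows. It works on flattened label lists and skips classes that appear in neither prediction nor ground truth; that skipping rule and the function name are assumptions, since benchmark implementations differ in how they treat absent classes.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat lists of class indices."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # Pixel counts once toward both intersection and union of its class.
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)
```

For example, with predictions [0, 0, 1, 1] and ground truth [0, 1, 1, 1], class 0 has IoU 1/2 and class 1 has IoU 2/3, giving a mean of 7/12.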


Interpolation Consistency Training for Semi-supervised Learning

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019. DOI: 10.24963/ijcai.2019/504. Cited By ~ 39

Author(s): Vikas Verma, Alex Lamb, Juho Kannala, Yoshua Bengio, David Lopez-Paz

Keyword(s): Neural Network, Neural Networks, Supervised Learning, Deep Neural Networks, State Of The Art, Data Distribution, Network Architectures, Low Density, Decision Boundary, Classification Problems

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
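The consistency term described above can be sketched in a few lines. The sketch assumes a mean-squared-error penalty and uses the same model for both sides of the comparison; how the targets are produced in the paper is not stated in this abstract, and the names below are hypothetical.

```python
def interpolate(a, b, lam):
    """Element-wise lam * a + (1 - lam) * b for two equal-length lists."""
    return [lam * ai + (1.0 - lam) * bi for ai, bi in zip(a, b)]


def ict_consistency_loss(model, u1, u2, lam):
    """Penalty for disagreement between f(mix(u1, u2)) and mix(f(u1), f(u2)).

    model -- any callable mapping a list of floats to a list of floats
    u1, u2 -- two unlabeled inputs
    lam -- interpolation coefficient in [0, 1]
    """
    pred_at_mix = model(interpolate(u1, u2, lam))
    mix_of_preds = interpolate(model(u1), model(u2), lam)
    squared_errors = [(p - q) ** 2 for p, q in zip(pred_at_mix, mix_of_preds)]
    return sum(squared_errors) / len(squared_errors)
```

A purely linear model incurs zero penalty, whereas a model that bends sharply between the two unlabeled points is penalized, which matches the intuition of pushing the decision boundary away from regions where data is dense.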


Singular Learning of Deep Multilayer Perceptrons for EEG-Based Emotion Recognition

Frontiers in Computer Science, 2021, Vol 3. DOI: 10.3389/fcomp.2021.786964

Author(s): Weili Guo, Guangyu Li, Jianfeng Lu, Jian Yang

Keyword(s): Neural Networks, Emotion Recognition, Deep Neural Networks, State Of The Art, High Reliability, Multilayer Perceptrons, Learning Technology, Human Computer Interactions, Training Process, Specific Influence

Human emotion recognition is an important issue in human–computer interactions, and electroencephalography (EEG) has been widely applied to emotion recognition due to its high reliability. In recent years, methods based on deep learning technology have reached state-of-the-art performance in EEG-based emotion recognition. However, there exist singularities in the parameter space of deep neural networks, which may dramatically slow down the training process. It is therefore worthwhile to investigate the specific influence of singularities when applying deep neural networks to EEG-based emotion recognition. In this paper, we focus on this problem and analyze the singular learning dynamics of deep multilayer perceptrons theoretically and numerically. The results can help us to design better algorithms to overcome the serious influence of singularities in deep neural networks for EEG-based emotion recognition.


An Analysis and Application of Fast Nonnegative Orthogonal Matching Pursuit for Image Categorization in Deep Networks

Mathematical Problems in Engineering, 2015, Vol 2015, pp. 1-9. DOI: 10.1155/2015/180675

Author(s): Bo Wang, Jichang Guo, Yan Zhang

Keyword(s): Large Scale, State Of The Art, Matching Pursuit, Computational Cost, Representation Learning, Orthogonal Matching Pursuit, Image Categorization, Image Patches, Deep Networks, Shape Vector

Nonnegative orthogonal matching pursuit (NOMP) has been proven to be a more stable encoder for unsupervised sparse representation learning. However, previous research has shown that NOMP is suboptimal in terms of computational cost, as coefficient selection and refinement using nonnegative least squares (NNLS) are divided into two separate steps. This problem severely reduces the efficiency of encoding for large-scale image patches. In this work, we study fast nonnegative OMP (FNOMP) as an efficient encoder which can be accelerated by the implementation of QR factorization and iterations of coefficients in deep networks for the full-size image categorization task. It is analyzed and demonstrated that, using relatively simple gain-shape vector quantization for training the dictionary, FNOMP not only performs more efficiently than NOMP for encoding but also significantly improves the classification accuracy compared to the OMP-based algorithm. In addition, the FNOMP-based algorithm is superior to other state-of-the-art methods on several publicly available benchmarks, that is, Oxford Flowers, UIUC-Sports, and Caltech101.


Cascaded Deeply Supervised Convolutional Networks for Liver Lesion Segmentation

International Journal of Pattern Recognition and Artificial Intelligence, 2021, pp. 2152014. DOI: 10.1142/s0218001421520145

Author(s): Kaiyi Peng, Bin Fang, Mingliang Zhou

Keyword(s): Computed Tomography, Neural Networks, Prediction Accuracy, Deep Neural Networks, Small Volume, State Of The Art, Liver Lesion, Segmentation Method, Lesion Segmentation, Convolutional Networks

Liver lesion segmentation from abdominal computed tomography (CT) with deep neural networks remains challenging due to the small volume and unclear boundaries of lesions. To tackle these problems, in this paper, we propose a cascaded deeply supervised convolutional network (CDS-Net). The cascaded deep supervision (CDS) mechanism uses auxiliary losses to construct a cascaded segmentation method in a single network, focusing the network's attention on pixels that are more difficult to classify, so that the network can segment the lesion more effectively. The CDS mechanism can be easily integrated into standard CNN models, and it helps to increase model sensitivity and prediction accuracy. Based on the CDS mechanism, we propose a cascaded deeply supervised ResUNet, which is an end-to-end liver lesion segmentation network. We conduct experiments on the LiTS and 3DIRCADb datasets. Our method achieves competitive results compared with other state-of-the-art methods.
