DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation

Sensors, 2022, Vol 22 (1), pp. 337
Author(s): Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang

We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation using the efficient sub-pixel convolutional neural network. This technique is inspired by depth-to-space (DTS) image reconstruction, which was originally used for image and video super-resolution tasks, combined with a mask enhancement filtration technique based on multi-label classification, namely, Nearest Label Filtration. In the proposed technique, we employ depth-wise separable convolution-based architectures. We propose both a deep network, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; these networks employ the Xception and MobileNetV2 architectures as feature extractors, respectively. In addition, we explore the joint semantic segmentation and depth estimation task and demonstrate that the proposed technique can efficiently perform both tasks simultaneously, outperforming state-of-the-art (SOTA) methods. We train and evaluate the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks, obtaining high mean intersection over union (mIoU) and mean pixel accuracy (Pix.acc.) values with the simple and lightweight convolutional neural network architectures of the developed networks. Notably, the proposed method outperforms SOTA methods that depend on encoder–decoder architectures, although our implementation and computations are far simpler.
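
Because the depth-to-space (pixel shuffle) operation is the core of the method, a short PyTorch sketch of the idea may help. torch.nn.PixelShuffle implements exactly this channel-to-space rearrangement; the backbone output shape, channel counts, and head below are illustrative assumptions, not the authors' exact DTS-Net.

```python
import torch
import torch.nn as nn

# Sketch of the depth-to-space idea: the backbone emits a low-resolution map
# with num_classes * upscale^2 channels, and PixelShuffle rearranges channel
# depth into spatial resolution, replacing a learned decoder. Shapes assumed.

class DTSHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int, upscale: int):
        super().__init__()
        # Project features to num_classes * upscale^2 channels...
        self.project = nn.Conv2d(in_channels, num_classes * upscale ** 2, kernel_size=1)
        # ...then trade channel depth for spatial resolution.
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # (B, C_in, H, W) -> (B, num_classes, H * upscale, W * upscale)
        return self.shuffle(self.project(feats))

# Example: a 32x-downsampled feature map, as a MobileNetV2-like backbone might emit.
feats = torch.randn(1, 1280, 8, 8)                            # hypothetical backbone output
head = DTSHead(in_channels=1280, num_classes=21, upscale=32)  # 21 PASCAL VOC classes
print(head(feats).shape)                                      # torch.Size([1, 21, 256, 256])
```

The full-resolution logits would then be refined by the paper's Nearest Label Filtration step, which is not shown here.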

Sensors, 2019, Vol 19 (8), pp. 1795
Author(s): Xiao Lin, Dalila Sánchez-Escobedo, Josep R. Casas, Montse Pardàs

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly, these two tasks are addressed independently, but recently the idea of merging them into a single framework has been studied under the assumption that integrating two highly correlated tasks may allow each to benefit the other, improving estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image within a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Our approaches are evaluated in two scenarios designed to compare our results against single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms state-of-the-art single-task approaches, while obtaining competitive results compared with other multi-task methods.
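
As a rough illustration of the shared-versus-separate feature question the paper analyzes, the sketch below shows one shared encoder feeding two task-specific heads; the layer sizes and losses are assumptions for illustration, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a joint network: a shared trunk extracts features that both tasks
# use, while each task keeps its own lightweight head. Sizes are assumed.

class JointSegDepthNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Shared features, hypothesized to benefit both correlated tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific branches kept separate after the shared trunk.
        self.seg_head = nn.Conv2d(64, num_classes, kernel_size=1)
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)
        return self.seg_head(feats), self.depth_head(feats)

# Joint training sums a per-pixel classification loss and a depth regression loss.
net = JointSegDepthNet(num_classes=40)
seg, depth = net(torch.randn(2, 3, 128, 128))
loss = F.cross_entropy(seg, torch.randint(0, 40, (2, 128, 128))) \
     + F.l1_loss(depth, torch.rand(2, 1, 128, 128))
```

Which layers end up in the shared trunk versus the task-specific heads is exactly the design choice the two analyzed architectures vary.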


2021, Vol 11 (11), pp. 5235
Author(s): Nikita Andriyanov

This article studies convolutional neural network inference for image processing under visual attacks. Four types of attacks were considered: simple attacks, the addition of white Gaussian noise, impulse perturbation of a single image pixel, and attacks that change brightness values within a rectangular area. The MNIST and Kaggle dogs vs. cats datasets were chosen. Recognition accuracy was characterized as a function of the number of attacked images and of the attack types used in training. The study was based on well-known convolutional neural network architectures used in pattern recognition tasks, such as VGG-16 and Inception_v3, and the dependence of recognition accuracy on the parameters of the visual attacks was obtained. Original methods were proposed to prevent visual attacks; these methods are based on detecting classes the recognizer finds “incomprehensible” and subsequently correcting them through neural network inference at reduced image sizes. As a result of applying these methods, the accuracy metric improved by a factor of 1.3 after an iteration that discards incomprehensible images, and uncertainty was reduced by 4–5% after an iteration that integrates the results of image analyses at reduced dimensions.
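
The attack families described above are straightforward to reproduce as tensor perturbations. The PyTorch sketch below shows one plausible version of each; the noise level, patch size, and brightness shift are arbitrary assumptions, since the paper's exact parameters are not restated here.

```python
import torch

def gaussian_attack(x: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Add white Gaussian noise to the whole image (sigma is assumed)."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)

def one_pixel_attack(x: torch.Tensor) -> torch.Tensor:
    """Impulse action on a single randomly chosen pixel."""
    out = x.clone()
    b, c, h, w = out.shape
    i = torch.randint(h, (1,)).item()
    j = torch.randint(w, (1,)).item()
    out[:, :, i, j] = torch.rand(b, c)  # replace one pixel with random values
    return out

def brightness_patch_attack(x: torch.Tensor, size: int = 8, delta: float = 0.5) -> torch.Tensor:
    """Shift brightness inside a rectangular area (size and delta are assumed)."""
    out = x.clone()
    _, _, h, w = out.shape
    i = torch.randint(h - size, (1,)).item()
    j = torch.randint(w - size, (1,)).item()
    patch = out[:, :, i:i + size, j:j + size]
    out[:, :, i:i + size, j:j + size] = (patch + delta).clamp(0.0, 1.0)
    return out

# Example on an MNIST-sized batch (3 channels used here for generality).
x = torch.rand(4, 3, 28, 28)
attacked = brightness_patch_attack(one_pixel_attack(gaussian_attack(x)))
```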


2020, Vol 1682, pp. 012077
Author(s): Tingting Li, Chunshan Jiang, Zhenqi Bian, Mingchang Wang, Xuefeng Niu
