DTS-Net: Depth-to-Space Networks for Fast and Accurate Semantic Object Segmentation

Sensors, 2022, Vol 22 (1), pp. 337
Author(s): Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang

We propose Depth-to-Space Net (DTS-Net), an effective technique for semantic segmentation using the efficient sub-pixel convolutional neural network. This technique is inspired by depth-to-space (DTS) image reconstruction, which was originally used for image and video super-resolution tasks, combined with a mask enhancement filtration technique based on multi-label classification, namely, Nearest Label Filtration. In the proposed technique, we employ depth-wise separable convolution-based architectures. We propose both a deep network, DTS-Net, and a lightweight network, DTS-Net-Lite, for real-time semantic segmentation; these networks employ the Xception and MobileNetV2 architectures as feature extractors, respectively. In addition, we explore the joint semantic segmentation and depth estimation task and demonstrate that the proposed technique can efficiently perform both tasks simultaneously, outperforming state-of-the-art (SOTA) methods. We train and evaluate the proposed method on the PASCAL VOC2012, NYUV2, and CITYSCAPES benchmarks, obtaining high mean intersection over union (mIoU) and mean pixel accuracy (Pix.acc.) values with the simple and lightweight convolutional neural network architectures of the developed networks. Notably, the proposed method outperforms SOTA methods that depend on encoder–decoder architectures, although our implementation and computations are far simpler.
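
Because the depth-to-space (pixel shuffle) operation is the core of the method, a short PyTorch sketch of the idea may help. torch.nn.PixelShuffle implements exactly this channel-to-space rearrangement; the backbone output shape, channel counts, and head below are illustrative assumptions, not the authors' exact DTS-Net.

```python
import torch
import torch.nn as nn

# Sketch of the depth-to-space idea: the backbone emits a low-resolution map
# with num_classes * upscale^2 channels, and PixelShuffle rearranges channel
# depth into spatial resolution, replacing a learned decoder. Shapes assumed.

class DTSHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int, upscale: int):
        super().__init__()
        # Project features to num_classes * upscale^2 channels...
        self.project = nn.Conv2d(in_channels, num_classes * upscale ** 2, kernel_size=1)
        # ...then trade channel depth for spatial resolution.
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # (B, C_in, H, W) -> (B, num_classes, H * upscale, W * upscale)
        return self.shuffle(self.project(feats))

# Example: a 32x-downsampled feature map, as a MobileNetV2-like backbone might emit.
feats = torch.randn(1, 1280, 8, 8)                            # hypothetical backbone output
head = DTSHead(in_channels=1280, num_classes=21, upscale=32)  # 21 PASCAL VOC classes
print(head(feats).shape)                                      # torch.Size([1, 21, 256, 256])
```

The full-resolution logits would then be refined by the paper's Nearest Label Filtration step, which is not shown here.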

Sensors, 2019, Vol 19 (8), pp. 1795
Author(s): Xiao Lin, Dalila Sánchez-Escobedo, Josep R. Casas, Montse Pardàs

Semantic segmentation and depth estimation are two important tasks in computer vision, and many methods have been developed to tackle them. Commonly, these two tasks are addressed independently, but recently the idea of merging them into a single framework has been studied under the assumption that integrating two highly correlated tasks may allow each to benefit the other, improving estimation accuracy. In this paper, depth estimation and semantic segmentation are jointly addressed from a single RGB input image within a unified convolutional neural network. We analyze two different architectures to evaluate which features are more relevant when shared by the two tasks and which features should be kept separated to achieve a mutual improvement. Our approaches are evaluated in two scenarios designed to compare our results against single-task and multi-task methods. Qualitative and quantitative experiments demonstrate that our methodology outperforms state-of-the-art single-task approaches, while obtaining competitive results compared with other multi-task methods.
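
As a rough illustration of the shared-versus-separate feature question the paper analyzes, the sketch below shows one shared encoder feeding two task-specific heads; the layer sizes and losses are assumptions for illustration, not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a joint network: a shared trunk extracts features that both tasks
# use, while each task keeps its own lightweight head. Sizes are assumed.

class JointSegDepthNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Shared features, hypothesized to benefit both correlated tasks.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Task-specific branches kept separate after the shared trunk.
        self.seg_head = nn.Conv2d(64, num_classes, kernel_size=1)
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)
        return self.seg_head(feats), self.depth_head(feats)

# Joint training sums a per-pixel classification loss and a depth regression loss.
net = JointSegDepthNet(num_classes=40)
seg, depth = net(torch.randn(2, 3, 128, 128))
loss = F.cross_entropy(seg, torch.randint(0, 40, (2, 128, 128))) \
     + F.l1_loss(depth, torch.rand(2, 1, 128, 128))
```

Which layers end up in the shared trunk versus the task-specific heads is exactly the design choice the two analyzed architectures vary.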


2021, Vol 11 (11), pp. 5235
Author(s): Nikita Andriyanov

This article studies convolutional neural network inference for image processing under visual attacks. Four types of attacks were considered: simple attacks, the addition of white Gaussian noise, impulse perturbation of a single image pixel, and attacks that change brightness values within a rectangular area. The MNIST and Kaggle dogs vs. cats datasets were chosen. Recognition accuracy was characterized as a function of the number of attacked images and of the attack types used in training. The study was based on well-known convolutional neural network architectures used in pattern recognition tasks, such as VGG-16 and Inception_v3, and the dependence of recognition accuracy on the parameters of the visual attacks was obtained. Original methods were proposed to prevent visual attacks; these methods are based on detecting classes the recognizer finds “incomprehensible” and subsequently correcting them through neural network inference at reduced image sizes. As a result of applying these methods, the accuracy metric improved by a factor of 1.3 after an iteration that discards incomprehensible images, and uncertainty was reduced by 4–5% after an iteration that integrates the results of image analyses at reduced dimensions.
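
The attack families described above are straightforward to reproduce as tensor perturbations. The PyTorch sketch below shows one plausible version of each; the noise level, patch size, and brightness shift are arbitrary assumptions, since the paper's exact parameters are not restated here.

```python
import torch

def gaussian_attack(x: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Add white Gaussian noise to the whole image (sigma is assumed)."""
    return (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)

def one_pixel_attack(x: torch.Tensor) -> torch.Tensor:
    """Impulse action on a single randomly chosen pixel."""
    out = x.clone()
    b, c, h, w = out.shape
    i = torch.randint(h, (1,)).item()
    j = torch.randint(w, (1,)).item()
    out[:, :, i, j] = torch.rand(b, c)  # replace one pixel with random values
    return out

def brightness_patch_attack(x: torch.Tensor, size: int = 8, delta: float = 0.5) -> torch.Tensor:
    """Shift brightness inside a rectangular area (size and delta are assumed)."""
    out = x.clone()
    _, _, h, w = out.shape
    i = torch.randint(h - size, (1,)).item()
    j = torch.randint(w - size, (1,)).item()
    patch = out[:, :, i:i + size, j:j + size]
    out[:, :, i:i + size, j:j + size] = (patch + delta).clamp(0.0, 1.0)
    return out

# Example on an MNIST-sized batch (3 channels used here for generality).
x = torch.rand(4, 3, 28, 28)
attacked = brightness_patch_attack(one_pixel_attack(gaussian_attack(x)))
```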


2020, Vol 1682, pp. 012077
Author(s): Tingting Li, Chunshan Jiang, Zhenqi Bian, Mingchang Wang, Xuefeng Niu
