Real-Time Semantic Image Segmentation with Deep Learning for Autonomous Driving: A Survey

2021 ◽  
Vol 11 (19) ◽  
pp. 8802
Author(s):  
Ilias Papadeas ◽  
Lazaros Tsochatzidis ◽  
Angelos Amanatiadis ◽  
Ioannis Pratikakis

Semantic image segmentation for autonomous driving is a challenging task because it requires both effectiveness and efficiency. Recent developments in deep learning have delivered substantial gains in accuracy. In this paper, we present a comprehensive overview of state-of-the-art semantic image segmentation methods that use deep learning techniques and aim to operate in real time, so that they can efficiently support an autonomous driving scenario. To this end, the overview places particular emphasis on approaches that reduce inference time, analyses the existing methods in terms of their end-to-end functionality, and provides a comparative study that relies on a consistent evaluation framework. Finally, a discussion is presented that offers key insights into current trends and future research directions in real-time semantic image segmentation with deep learning for autonomous driving.
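A minimal sketch of the kind of real-time evaluation the survey is concerned with: timing the forward pass of a segmentation network and reporting mean latency and frames per second. The model choice (torchvision's MobileNetV3-backed DeepLab v3), input resolution, and run counts are illustrative assumptions, not taken from the survey.

```python
# Hedged sketch: measuring inference latency of a segmentation model,
# the kind of check a real-time autonomous-driving setting requires.
import time
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

device = "cuda" if torch.cuda.is_available() else "cpu"
model = deeplabv3_mobilenet_v3_large(num_classes=19).to(device).eval()

# A Cityscapes-like input resolution (batch of 1, 3-channel RGB), values are dummies.
x = torch.randn(1, 3, 512, 1024, device=device)
runs = 50

with torch.no_grad():
    for _ in range(10):                       # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"mean latency: {1000 * elapsed / runs:.1f} ms (~{runs / elapsed:.1f} FPS)")
```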

Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2272 ◽  
Author(s):  
Faisal Khan ◽  
Saqib Salahuddin ◽  
Hossein Javidnia

Monocular depth estimation from Red-Green-Blue (RGB) images is a well-studied, ill-posed problem in computer vision that has been investigated intensively over the past decade using Deep Learning (DL) approaches. Recent approaches to monocular depth estimation mostly rely on Convolutional Neural Networks (CNNs). Estimating depth from two-dimensional images plays an important role in various applications, including scene reconstruction, 3D object detection, robotics and autonomous driving. This survey provides a comprehensive overview of this research topic, including the problem representation and a short description of traditional methods for depth estimation. Relevant datasets and 13 state-of-the-art deep learning-based approaches for monocular depth estimation are reviewed, evaluated and discussed. We conclude the paper with a perspective on open challenges in monocular depth estimation that require further investigation.
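As an illustration of the formulation reviewed here (dense regression from an RGB image to a per-pixel depth map), the following sketch defines a toy encoder-decoder and runs one training step. The architecture, loss, and tensor shapes are assumptions for the example and do not correspond to any specific surveyed method.

```python
# Toy monocular depth network: RGB in, single-channel depth map out.
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # downsample and extract features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(            # upsample back to input resolution
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgb):
        return torch.relu(self.decoder(self.encoder(rgb)))   # depth is non-negative

model = TinyDepthNet()
rgb = torch.randn(2, 3, 128, 256)                # dummy batch of RGB images
pred = model(rgb)                                # (2, 1, 128, 256) predicted depth
target = torch.rand(2, 1, 128, 256)              # placeholder ground-truth depth
loss = nn.functional.l1_loss(pred, target)       # common per-pixel regression loss
loss.backward()
print(pred.shape, float(loss))
```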


2019 ◽  
Vol 10 (11) ◽  
pp. 3145-3154 ◽  
Author(s):  
Swarnendu Ghosh ◽  
Anisha Pal ◽  
Shourya Jaiswal ◽  
K. C. Santosh ◽  
Nibaran Das ◽  
...  

Author(s):  
Zainab Oufqir ◽  
Lamiae Binan ◽  
Abdellatif EL ABDERRAHMANI ◽  
Khalid Satori

In this article, we give a comprehensive overview of recent deep learning methods for object detection and their uses in augmented reality. The objective is to present a complete understanding of these algorithms and of how augmented reality functions and services can be improved by integrating them. We discuss in detail the characteristics of each approach and their influence on real-time detection performance. Experimental analyses are provided to compare the performance of each method and to draw meaningful conclusions about their use in augmented reality. Two-stage detectors generally provide better detection performance, while single-stage detectors are significantly more time-efficient and more applicable to real-time object detection. Finally, we discuss several future directions to facilitate and stimulate future research on object detection in augmented reality. Keywords: object detection, deep learning, convolutional neural network, augmented reality.
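A hedged sketch of the single-stage versus two-stage trade-off discussed above: timing torchvision's Faster R-CNN (two-stage) against RetinaNet (one-stage) on the same dummy frame. The models, input size, and run count are illustrative; absolute numbers depend on hardware and implementation.

```python
# Rough latency comparison of a two-stage and a one-stage detector (CPU, random weights).
import time
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn, retinanet_resnet50_fpn

def time_detector(model, image, runs=20):
    model.eval()
    with torch.no_grad():
        model([image])                            # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model([image])
    return (time.perf_counter() - start) / runs

image = torch.rand(3, 480, 640)                   # dummy RGB frame in [0, 1]
for name, builder in [("Faster R-CNN (two-stage)", fasterrcnn_resnet50_fpn),
                      ("RetinaNet (one-stage)", retinanet_resnet50_fpn)]:
    model = builder()                             # default (random) weights: timing only
    print(f"{name}: {1000 * time_detector(model, image):.1f} ms per frame")
```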


2020 ◽  
Vol 13 (1) ◽  
pp. 89
Author(s):  
Manuel Carranza-García ◽  
Jesús Torres-Mateo ◽  
Pedro Lara-Benítez ◽  
Jorge García-Gutiérrez

Object detection using remote sensing data is a key task of the perception systems of self-driving vehicles. While many generic deep learning architectures have been proposed for this problem, there is little guidance on their suitability when using them in a particular scenario such as autonomous driving. In this work, we aim to assess the performance of existing 2D detection systems on a multi-class problem (vehicles, pedestrians, and cyclists) with images obtained from the on-board camera sensors of a car. We evaluate several one-stage (RetinaNet, FCOS, and YOLOv3) and two-stage (Faster R-CNN) deep learning meta-architectures under different image resolutions and feature extractors (ResNet, ResNeXt, Res2Net, DarkNet, and MobileNet). These models are trained using transfer learning and compared in terms of both precision and efficiency, with special attention to the real-time requirements of this context. For the experimental study, we use the Waymo Open Dataset, which is the largest existing benchmark. Despite the rising popularity of one-stage detectors, our findings show that two-stage detectors still provide the most robust performance. Faster R-CNN models outperform one-stage detectors in accuracy, being also more reliable in the detection of minority classes. Faster R-CNN Res2Net-101 achieves the best speed/accuracy tradeoff but needs lower resolution images to reach real-time speed. Furthermore, the anchor-free FCOS detector is a slightly faster alternative to RetinaNet, with similar precision and lower memory usage.
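A brief sketch of the transfer-learning setup described above, assuming torchvision's COCO-pretrained Faster R-CNN: the box-prediction head is replaced to match the three-class problem (vehicles, pedestrians, cyclists) plus background before fine-tuning. Dataset loading and the training loop are omitted; the weight specifier and class mapping are assumptions for the example.

```python
# Swap the detection head of a pretrained Faster R-CNN for a 3-class driving problem.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

num_classes = 4  # vehicle, pedestrian, cyclist + background
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the classification/regression head so it matches the new label set.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# From here, fine-tune on camera images with a standard detection training loop
# (lists of image tensors plus target dicts containing 'boxes' and 'labels').
```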


2021 ◽  
Vol 2010 (1) ◽  
pp. 012128
Author(s):  
Yuting Liang ◽  
Tangtian Hang ◽  
Jie Chen ◽  
Lei Liu

2020 ◽  
Vol 12 (6) ◽  
pp. 959 ◽  
Author(s):  
Mohammad Pashaei ◽  
Hamid Kamangir ◽  
Michael J. Starek ◽  
Philippe Tissot

Deep learning has already proven to be a powerful state-of-the-art technique for many image understanding tasks in computer vision and other applications, including remote sensing (RS) image analysis. Unmanned aircraft systems (UASs) offer a viable and economical alternative to conventional sensors and platforms for acquiring high spatial and high temporal resolution data with high operational flexibility. Coastal wetlands are among the most challenging and complex ecosystems for land cover prediction and mapping tasks because land cover targets often show high intra-class and low inter-class variances. In recent years, several deep convolutional neural network (CNN) architectures have been proposed for pixel-wise image labeling, commonly called semantic image segmentation. In this paper, some of the more recent deep CNN architectures proposed for semantic image segmentation are reviewed, and each model’s training efficiency and classification performance are evaluated by training it on a limited labeled image set. Training samples are provided using hyper-spatial resolution UAS imagery over a wetland area, and the required ground truth images are prepared by manual image labeling. Experimental results demonstrate that deep CNNs have great potential for accurate land cover prediction using UAS hyper-spatial resolution images. Some simple deep learning architectures perform comparably to, or even better than, complex and very deep architectures with remarkably fewer training epochs. This performance is especially valuable when limited training samples are available, which is a common case in most RS applications.
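To make the limited-label setting concrete, the sketch below trains a deliberately small per-pixel classifier on a handful of labeled patches. The patch size, class count, and architecture are assumptions for illustration and are not the models evaluated in the paper.

```python
# Training a small fully convolutional classifier on a tiny labeled patch set.
import torch
import torch.nn as nn

num_classes = 5                                   # e.g., a few wetland land-cover classes
model = nn.Sequential(                            # deliberately shallow architecture
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, num_classes, 1),                # per-pixel class scores
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Tiny labeled set: 8 patches of 3x128x128 imagery with per-pixel class labels.
images = torch.rand(8, 3, 128, 128)
masks = torch.randint(0, num_classes, (8, 128, 128))

for epoch in range(20):                           # few epochs, small data
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()
print("final training loss:", float(loss))
```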


Symmetry ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 427 ◽  
Author(s):  
Sanxing Zhang ◽  
Zhenhuan Ma ◽  
Gang Zhang ◽  
Tao Lei ◽  
Rui Zhang ◽  
...  

Semantic image segmentation, as one of the most popular tasks in computer vision, has been widely used in autonomous driving, robotics and other fields. Currently, deep convolutional neural networks (DCNNs) are driving major advances in semantic segmentation due to their powerful feature representation. However, DCNNs extract high-level feature representations by strided convolution, which makes it difficult to segment foreground objects precisely, especially when locating object boundaries. This paper presents a novel semantic segmentation algorithm that combines DeepLab v3+ with the quick shift superpixel segmentation algorithm. DeepLab v3+ is employed to generate a class-indexed score map for the input image, while quick shift is applied to segment the input image into superpixels. Their outputs are then fed into a class voting module to refine the semantic segmentation results. Extensive experiments are performed on the PASCAL VOC 2012 dataset, and the results show that the proposed method provides a more efficient solution.
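A hedged sketch of the described pipeline, with torchvision's DeepLab v3 standing in for DeepLab v3+ and scikit-image's quickshift providing the superpixels: per-pixel labels from the score map are refined by majority voting inside each superpixel. The model weights, input image, and quick shift parameters below are placeholders, not the paper's settings.

```python
# Dense score map (DeepLab) + quick shift superpixels + per-superpixel class voting.
import numpy as np
import torch
from torchvision.models.segmentation import deeplabv3_resnet50
from skimage.segmentation import quickshift

model = deeplabv3_resnet50(num_classes=21).eval()             # random weights for the sketch

image = np.random.rand(240, 320, 3)                           # placeholder RGB image in [0, 1]
tensor = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float()

with torch.no_grad():
    scores = model(tensor)["out"][0]                          # (classes, H, W) score map
labels = scores.argmax(0).numpy()                             # per-pixel class index

superpixels = quickshift(image, kernel_size=3, max_dist=6, ratio=0.5)

# Class voting: every pixel in a superpixel takes that superpixel's majority label.
refined = labels.copy()
for sp in np.unique(superpixels):
    mask = superpixels == sp
    refined[mask] = np.bincount(labels[mask]).argmax()
```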


2019 ◽  
Vol 5 (1) ◽  
pp. 399-426 ◽  
Author(s):  
Thomas Serre

Artificial vision has often been described as one of the key remaining challenges to be solved before machines can act intelligently. Recent developments in a branch of machine learning known as deep learning have catalyzed impressive gains in machine vision—giving a sense that the problem of vision is getting closer to being solved. The goal of this review is to provide a comprehensive overview of recent deep learning developments and to critically assess actual progress toward achieving human-level visual intelligence. I discuss the implications of the successes and limitations of modern machine vision algorithms for biological vision and the prospect for neuroscience to inform the design of future artificial vision systems.


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1339
Author(s):  
Willy Anugrah Cahyadi ◽  
Yeon Ho Chung ◽  
Zabih Ghassemlooy ◽  
Navid Bani Hassan

Optical wireless communications (OWC) are emerging as a cost-effective and practical alternative to congested radio frequency-based wireless technologies. As part of OWC, optical camera communications (OCC) have become very attractive, considering recent developments in cameras and the use of fitted cameras in smart devices. OCC, together with visible light communications (VLC), is considered within the framework of the IEEE 802.15.7m standardization. OCC systems based on both organic and inorganic light sources, as well as cameras, are being considered for low-rate transmission and localization in indoor and outdoor short-range applications. This paper introduces the underlying principles of OCC and gives a comprehensive overview of this emerging technology and of recent standardization activities. It also outlines key technical issues such as mobility, coverage, interference, and performance enhancement. Future research directions and open issues are also presented.

