Semantic Segmentation for Aerial Images: A Literature Review

Semantic image segmentation is one of the fundamental applications of computer vision which can also be called pixel-level classification. Semantic image segmentation is the process of understanding the role of each pixel in an image. Over time, the model for completing Semantic Image Segmentation has developed very rapidly. Due to this rapid growth, many models related to Semantic Image Segmentation have been produced and have also been used or applied in many domains such as medical areas and intelligent transportation. Therefore, our motivation in making this paper is to contribute to the world of research by conducting a review of Semantic Image Segmentation which aims to provide a big picture related to the latest developments related to Semantic Image Segmentation. In addition, we also provide the results of performance measurements on each of the Semantic Image Segmentation methods that we discussed using the Intersectionover-Union (IoU) method. After that, we provide a comparison for each semantic image segmentation model that we discuss using the results of the IoU and then provide conclusions related to a model that has good performance. We hope this review paper can facilitate researchers in understanding the development of Semantic Image Segmentation in a shorter time, simplify understanding of the latest advancements in Semantic Image Segmentation, and can also be used as a reference for developing new Semantic Image Segmentation models in the future

Download Full-text

Semantic Image Segmentation with Deep Convolutional Neural Networks and Quick Shift

Symmetry ◽

10.3390/sym12030427 ◽

2020 ◽

Vol 12 (3) ◽

pp. 427 ◽

Cited By ~ 1

Author(s):

Sanxing Zhang ◽

Zhenhuan Ma ◽

Gang Zhang ◽

Tao Lei ◽

Rui Zhang ◽

...

Keyword(s):

Neural Networks ◽

Image Segmentation ◽

Convolutional Neural Networks ◽

Semantic Segmentation ◽

Autonomous Driving ◽

Input Image ◽

Feature Representation ◽

Segmentation Algorithm ◽

Deep Convolutional Neural Networks ◽

Semantic Image Segmentation

Semantic image segmentation, as one of the most popular tasks in computer vision, has been widely used in autonomous driving, robotics and other fields. Currently, deep convolutional neural networks (DCNNs) are driving major advances in semantic segmentation due to their powerful feature representation. However, DCNNs extract high-level feature representations by strided convolution, which makes it impossible to segment foreground objects precisely, especially when locating object boundaries. This paper presents a novel semantic segmentation algorithm with DeepLab v3+ and super-pixel segmentation algorithm-quick shift. DeepLab v3+ is employed to generate a class-indexed score map for the input image. Quick shift is applied to segment the input image into superpixels. Outputs of them are then fed into a class voting module to refine the semantic segmentation results. Extensive experiments on proposed semantic image segmentation are performed over PASCAL VOC 2012 dataset, and results that the proposed method can provide a more efficient solution.

Download Full-text

Hybrid Deep Learning Models with Sparse Enhancement Technique for Detection of Newly Grown Tree Leaves

Sensors ◽

10.3390/s21062077 ◽

2021 ◽

Vol 21 (6) ◽

pp. 2077 ◽

Cited By ~ 1

Author(s):

Shih-Yu Chen ◽

Chinsu Lin ◽

Guan-Jie Li ◽

Yu-Chun Hsu ◽

Keng-Hao Liu

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Life Cycle ◽

Climate Changes ◽

Imbalanced Data ◽

Semantic Segmentation ◽

Effect Of Temperature ◽

Tree Leaves ◽

The Difference ◽

Segmentation Models

The life cycle of leaves, from sprout to senescence, is the phenomenon of regular changes such as budding, branching, leaf spreading, flowering, fruiting, leaf fall, and dormancy due to seasonal climate changes. It is the effect of temperature and moisture in the life cycle on physiological changes, so the detection of newly grown leaves (NGL) is helpful for the estimation of tree growth and even climate change. This study focused on the detection of NGL based on deep learning convolutional neural network (CNN) models with sparse enhancement (SE). As the NGL areas found in forest images have similar sparse characteristics, we used a sparse image to enhance the signal of the NGL. The difference between the NGL and the background could be further improved. We then proposed hybrid CNN models that combined U-net and SegNet features to perform image segmentation. As the NGL in the image were relatively small and tiny targets, in terms of data characteristics, they also belonged to the problem of imbalanced data. Therefore, this paper further proposed 3-Layer SegNet, 3-Layer U-SegNet, 2-Layer U-SegNet, and 2-Layer Conv-U-SegNet architectures to reduce the pooling degree of traditional semantic segmentation models, and used a loss function to increase the weight of the NGL. According to the experimental results, our proposed algorithms were indeed helpful for the image segmentation of NGL and could achieve better kappa results by 0.743.

Download Full-text

Rethinking Separable Convolutional Encoders for End-to-End Semantic Image Segmentation

Mathematical Problems in Engineering ◽

10.1155/2021/5566691 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Lin Wang ◽

Xingfu Wang ◽

Ammar Hawbani ◽

Yan Xiong ◽

Xu Zhang

Keyword(s):

Neural Network ◽

Image Segmentation ◽

Convolutional Neural Network ◽

Image Data ◽

Semantic Segmentation ◽

Semantic Features ◽

Processing Efficiency ◽

Semantic Image Segmentation ◽

Average Improvement ◽

Convolutional Encoders

With the development of science and technology, the middle volume and neural network in the semantic image segmentation of the codec show good development prospects. Its advantage is that it can extract richer semantic features, but this will cause high costs. In order to solve this problem, this article mainly introduces the codec based on a separable convolutional neural network for semantic image segmentation. This article proposes a codec based on a separable convolutional neural network for semantic image segmentation research methods, including the traditional convolutional neural network hierarchy into a separable convolutional neural network, which can reduce the cost of image data segmentation and improve processing efficiency. Moreover, this article builds a separable convolutional neural network codec structure and designs a semantic segmentation process, so that the codec based on a separable convolutional neural network is used for semantic image segmentation research experiments. The experimental results show that the average improvement of the dataset by the improved codec is 0.01, which proves the effectiveness of the improved SegProNet. The smaller the number of training set samples, the more obvious the performance improvement.

Download Full-text

Learning Multi-level Region Consistency with Dense Multi-label Networks for Semantic Segmentation

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/377 ◽

2017 ◽

Cited By ~ 1

Author(s):

Tong Shen ◽

Guosheng Lin ◽

Chunhua Shen ◽

Ian Reid

Keyword(s):

Image Segmentation ◽

Semantic Segmentation ◽

Image Understanding ◽

Network Module ◽

Convolutional Network ◽

Semantic Image Segmentation ◽

Fully Convolutional Network ◽

Multi Level ◽

Different Levels ◽

Semantic Labelling

Semantic image segmentation is a fundamental task in image understanding. Per-pixel semantic labelling of an image benefits greatly from the ability to consider region consistency both locally and globally. However, many Fully Convolutional Network based methods do not impose such consistency, which may give rise to noisy and implausible predictions. We address this issue by proposing a dense multi-label network module that is able to encourage the region consistency at different levels. This simple but effective module can be easily integrated into any semantic segmentation systems. With comprehensive experiments, we show that the dense multi-label can successfully remove the implausible labels and clear the confusion so as to boost the performance of semantic segmentation systems.

Download Full-text

A Dual-Path and Lightweight Convolutional Neural Network for High-Resolution Aerial Image Segmentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8120582 ◽

2019 ◽

Vol 8 (12) ◽

pp. 582 ◽

Cited By ~ 6

Author(s):

Gang Zhang ◽

Tao Lei ◽

Yi Cui ◽

Ping Jiang

Keyword(s):

Neural Network ◽

Image Segmentation ◽

High Resolution ◽

Convolutional Neural Network ◽

Feature Learning ◽

Semantic Segmentation ◽

Aerial Images ◽

Aerial Image ◽

Sensing Applications ◽

Edge Path

Semantic segmentation on high-resolution aerial images plays a significant role in many remote sensing applications. Although the Deep Convolutional Neural Network (DCNN) has shown great performance in this task, it still faces the following two challenges: intra-class heterogeneity and inter-class homogeneity. To overcome these two problems, a novel dual-path DCNN, which contains a spatial path and an edge path, is proposed for high-resolution aerial image segmentation. The spatial path, which combines the multi-level and global context features to encode the local and global information, is used to address the intra-class heterogeneity challenge. For inter-class homogeneity problem, a Holistically-nested Edge Detection (HED)-like edge path is employed to detect the semantic boundaries for the guidance of feature learning. Furthermore, we improve the computational efficiency of the network by employing the backbone of MobileNetV2. We enhance the performance of MobileNetV2 with two modifications: (1) replacing the standard convolution in the last four Bottleneck Residual Blocks (BRBs) with atrous convolution; and (2) removing the convolution stride of 2 in the first layer of BRBs 4 and 6. Experimental results on the ISPRS Vaihingen and Potsdam 2D labeling dataset show that the proposed DCNN achieved real-time inference speed on a single GPU card with better performance, compared with the state-of-the-art baselines.

Download Full-text

Sequentially Delineation of Rooftops with Holes from VHR Aerial Images Using a Convolutional Recurrent Neural Network

Remote Sensing ◽

10.3390/rs13214271 ◽

2021 ◽

Vol 13 (21) ◽

pp. 4271

Author(s):

Wei Huang ◽

Zeping Liu ◽

Hong Tang ◽

Jiayi Ge

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Semantic Segmentation ◽

Aerial Images ◽

Irregular Shapes ◽

Segmentation Methods ◽

High Resolution Images ◽

Novel Method ◽

Statistical Indicators ◽

Instance Segmentation

Semantic and instance segmentation methods are commonly used to build extraction from high-resolution images. The semantic segmentation method involves assigning a class label to each pixel in the image, thus ignoring the geometry of the building rooftop, which results in irregular shapes of the rooftop edges. As for instance segmentation, there is a strong assumption within this method that there exists only one outline polygon along the rooftop boundary. In this paper, we present a novel method to sequentially delineate exterior and interior contours of rooftops with holes from VHR aerial images, where most of the buildings have holes, by integrating semantic segmentation and polygon delineation. Specifically, semantic segmentation from the Mask R-CNN is used as a prior for hole detection. Then, the holes are used as objects for generating the internal contours of the rooftop. The external and internal contours of the rooftop are inferred separately using a convolutional recurrent neural network. Experimental results showed that the proposed method can effectively delineate the rooftops with both one and multiple polygons and outperform state-of-the-art methods in terms of the visual results and six statistical indicators, including IoU, OA, F1, BoundF, RE and Hd.

Download Full-text

Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/682 ◽

2018 ◽

Cited By ~ 13

Author(s):

Zhang-Wei Hong ◽

Yu-Ming Chen ◽

Hsuan-Kung Yang ◽

Shih-Yang Su ◽

Tzu-Yun Shann ◽

...

Keyword(s):

Image Segmentation ◽

Virtual Worlds ◽

Control Policy ◽

Semantic Segmentation ◽

Physical World ◽

Training Data ◽

Avoidance Task ◽

Real Problem ◽

Modular Architecture ◽

Semantic Image Segmentation

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of the models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and demonstrates a faster learning curve than them. We also present a detailed analysis for a variety of variant configurations, and validate the transferability of our modular architecture.

Download Full-text

Improved Anchor-Free Instance Segmentation for Building Extraction from High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs12182910 ◽

2020 ◽

Vol 12 (18) ◽

pp. 2910

Author(s):

Tong Wu ◽

Yuan Hu ◽

Ling Peng ◽

Ruonan Chen

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Semantic Segmentation ◽

Aerial Images ◽

Building Extraction ◽

Remote Sensing Images ◽

Highly Sensitive ◽

Segmentation Methods ◽

Speed And Accuracy ◽

Instance Segmentation

Building extraction from high-resolution remote sensing images plays a vital part in urban planning, safety supervision, geographic databases updates, and some other applications. Several researches are devoted to using convolutional neural network (CNN) to extract buildings from high-resolution satellite/aerial images. There are two major methods, one is the CNN-based semantic segmentation methods, which can not distinguish different objects of the same category and may lead to edge connection. The other one is CNN-based instance segmentation methods, which rely heavily on pre-defined anchors, and result in the highly sensitive, high computation/storage cost and imbalance between positive and negative samples. Therefore, in this paper, we propose an improved anchor-free instance segmentation method based on CenterMask with spatial and channel attention-guided mechanisms and improved effective backbone network for accurate extraction of buildings in high-resolution remote sensing images. Then we analyze the influence of different parameters and network structure on the performance of the model, and compare the performance for building extraction of Mask R-CNN, Mask Scoring R-CNN, CenterMask, and the improved CenterMask in this paper. Experimental results show that our improved CenterMask method can successfully well-balanced performance in terms of speed and accuracy, which achieves state-of-the-art performance at real-time speed.

Download Full-text

Mapping Plastic Mulched Farmland for High Resolution Images of Unmanned Aerial Vehicle Using Deep Semantic Segmentation

Remote Sensing ◽

10.3390/rs11172008 ◽

2019 ◽

Vol 11 (17) ◽

pp. 2008 ◽

Cited By ~ 4

Author(s):

Qinchen Yang ◽

Man Liu ◽

Zhitao Zhang ◽

Shuqin Yang ◽

Jifeng Ning ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Unmanned Aerial Vehicle ◽

Semantic Segmentation ◽

Classification Method ◽

Remote Sensing Images ◽

Segmentation Methods ◽

Traditional Classification ◽

Aerial Vehicle ◽

Segmentation Models

With increasing consumption, plastic mulch benefits agriculture by promoting crop quality and yield, but the environmental and soil pollution is becoming increasingly serious. Therefore, research on the monitoring of plastic mulched farmland (PMF) has received increasing attention. Plastic mulched farmland in unmanned aerial vehicle (UAV) remote images due to the high resolution, shows a prominent spatial pattern, which brings difficulties to the task of monitoring PMF. In this paper, through a comparison between two deep semantic segmentation methods, SegNet and fully convolutional networks (FCN), and a traditional classification method, Support Vector Machine (SVM), we propose an end-to-end deep-learning method aimed at accurately recognizing PMF for UAV remote sensing images from Hetao Irrigation District, Inner Mongolia, China. After experiments with single-band, three-band and six-band image data, we found that deep semantic segmentation models built via single-band data which only use the texture pattern of PMF can identify it well; for example, SegNet reaching the highest accuracy of 88.68% in a 900 nm band. Furthermore, with three visual bands and six-band data (3 visible bands and 3 near-infrared bands), deep semantic segmentation models combining the texture and spectral features further improve the accuracy of PMF identification, whereas six-band data obtains an optimal performance for FCN and SegNet. In addition, deep semantic segmentation methods, FCN and SegNet, due to their strong feature extraction capability and direct pixel classification, clearly outperform the traditional SVM method in precision and speed. Among three classification methods, SegNet model built on three-band and six-band data obtains the optimal average accuracy of 89.62% and 90.6%, respectively. Therefore, the proposed deep semantic segmentation model, when tested against the traditional classification method, provides a promising path for mapping PMF in UAV remote sensing images.

Download Full-text

Portrait Segmentation Using Ensemble of Heterogeneous Deep-Learning Models

Entropy ◽

10.3390/e23020197 ◽

2021 ◽

Vol 23 (2) ◽

pp. 197

Author(s):

Yong-Woon Kim ◽

Yung-Cheol Byun ◽

Addapalli V. N. Krishna

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Autonomous Vehicles ◽

Medical Image Analysis ◽

Learning Technology ◽

Learning Models ◽

Computing Power ◽

Semantic Image Segmentation ◽

Voting Method ◽

Segmentation Models

Image segmentation plays a central role in a broad range of applications, such as medical image analysis, autonomous vehicles, video surveillance and augmented reality. Portrait segmentation, which is a subset of semantic image segmentation, is widely used as a preprocessing step in multiple applications such as security systems, entertainment applications, video conferences, etc. A substantial amount of deep learning-based portrait segmentation approaches have been developed, since the performance and accuracy of semantic image segmentation have improved significantly due to the recent introduction of deep learning technology. However, these approaches are limited to a single portrait segmentation model. In this paper, we propose a novel approach using an ensemble method by combining multiple heterogeneous deep-learning based portrait segmentation models to improve the segmentation performance. The Two-Models ensemble and Three-Models ensemble, using a simple soft voting method and weighted soft voting method, were experimented. Intersection over Union (IoU) metric, IoU standard deviation and false prediction rate were used to evaluate the performance. Cost efficiency was calculated to analyze the efficiency of segmentation. The experiment results show that the proposed ensemble approach can perform with higher accuracy and lower errors than single deep-learning-based portrait segmentation models. The results also show that the ensemble of deep-learning models typically increases the use of memory and computing power, although it also shows that the ensemble of deep-learning models can perform more efficiently than a single model with higher accuracy using less memory and less computing power.

Download Full-text