Semantic Segmentation for Aerial Images: A Literature Review

Author(s):  
Yongki Christian Sanjaya ◽  
Alexander Agung Santoso Gunawan ◽  
Edy Irwansyah

Semantic image segmentation, also called pixel-level classification, is one of the fundamental applications of computer vision: it is the process of understanding the role of each pixel in an image. Models for semantic image segmentation have developed very rapidly, and many of them have been applied in domains such as medicine and intelligent transportation. Our motivation in this paper is therefore to contribute a review of semantic image segmentation that gives a big picture of its latest developments. In addition, we report performance measurements for each semantic image segmentation method that we discuss, using the Intersection-over-Union (IoU) metric. We then compare the models on their IoU results and draw conclusions about which model performs well. We hope this review helps researchers understand the development of semantic image segmentation in a shorter time, simplifies understanding of the latest advancements, and serves as a reference for developing new semantic image segmentation models in the future.
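As a rough illustration of the IoU metric named in the abstract above, the following sketch computes per-class IoU and the mean over classes for two small label maps. The function names and the toy label maps are hypothetical and are not taken from the paper; the paper's exact evaluation protocol is not stated here.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[Sequence[int]],
                  target: Sequence[Sequence[int]],
                  num_classes: int) -> Dict[int, float]:
    """Return IoU for every class that occurs in pred or target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel counts once toward both intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel belongs to the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return {c: intersection[c] / union[c]
            for c in range(num_classes) if union[c] > 0}


def mean_iou(pred, target, num_classes: int) -> float:
    """Average the per-class IoU over the classes that are present."""
    ious = per_class_iou(pred, target, num_classes)
    return sum(ious.values()) / len(ious)


# Toy 2x3 label maps (hypothetical data).
pred = [[0, 0, 1],
        [1, 1, 1]]
target = [[0, 1, 1],
          [1, 1, 0]]
print(per_class_iou(pred, target, num_classes=2))
print(mean_iou(pred, target, num_classes=2))
```

Classes that appear in neither map are skipped rather than counted as zero, which is one common convention; other conventions would change the mean.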

Symmetry ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 427 ◽  
Author(s):  
Sanxing Zhang ◽  
Zhenhuan Ma ◽  
Gang Zhang ◽  
Tao Lei ◽  
Rui Zhang ◽  
...  

Semantic image segmentation, one of the most popular tasks in computer vision, has been widely used in autonomous driving, robotics and other fields. Currently, deep convolutional neural networks (DCNNs) are driving major advances in semantic segmentation due to their powerful feature representation. However, DCNNs extract high-level feature representations by strided convolution, which makes it impossible to segment foreground objects precisely, especially when locating object boundaries. This paper presents a novel semantic segmentation algorithm that combines DeepLab v3+ with the superpixel segmentation algorithm quick shift. DeepLab v3+ is employed to generate a class-indexed score map for the input image. Quick shift is applied to segment the input image into superpixels. The outputs of both are then fed into a class voting module to refine the semantic segmentation results. Extensive experiments with the proposed method are performed on the PASCAL VOC 2012 dataset, and the results show that the proposed method can provide a more efficient solution.
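The class voting step described above can be sketched as follows. This is a minimal illustration that assumes the voting is a plain majority vote inside each superpixel; the abstract does not specify the voting rule. The function name `superpixel_vote` and the toy inputs are hypothetical. The label map stands in for the per-pixel argmax of a network score map, and the superpixel map stands in for the output of a superpixel algorithm such as quick shift.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def superpixel_vote(labels: List[List[int]],
                    superpixels: List[List[int]]) -> List[List[int]]:
    """Give every pixel the most frequent label inside its superpixel."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for label_row, sp_row in zip(labels, superpixels):
        for label, sp in zip(label_row, sp_row):
            votes[sp][label] += 1
    # Majority label per superpixel (assumed voting rule).
    winner = {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}
    return [[winner[sp] for sp in sp_row] for sp_row in superpixels]


# Noisy per-pixel labels (hypothetical) and a two-superpixel partition.
labels = [[1, 1, 0, 0],
          [1, 0, 0, 0],
          [1, 1, 0, 2]]
superpixels = [[0, 0, 1, 1],
               [0, 0, 1, 1],
               [0, 0, 1, 1]]
for row in superpixel_vote(labels, superpixels):
    print(row)
```

Because superpixels tend to follow image edges, replacing each pixel's label by its superpixel's majority label is one way the refined result can follow object boundaries more closely than the raw network output.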


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2077 ◽  
Author(s):  
Shih-Yu Chen ◽  
Chinsu Lin ◽  
Guan-Jie Li ◽  
Yu-Chun Hsu ◽  
Keng-Hao Liu

The life cycle of leaves, from sprout to senescence, is a phenomenon of regular changes such as budding, branching, leaf spreading, flowering, fruiting, leaf fall, and dormancy driven by seasonal climate changes. Temperature and moisture affect these physiological changes throughout the life cycle, so the detection of newly grown leaves (NGL) is helpful for estimating tree growth and even climate change. This study focused on the detection of NGL based on deep learning convolutional neural network (CNN) models with sparse enhancement (SE). As the NGL areas found in forest images have similar sparse characteristics, we used a sparse image to enhance the signal of the NGL, so that the difference between the NGL and the background could be further improved. We then proposed hybrid CNN models that combined U-net and SegNet features to perform image segmentation. As the NGL in the images were very small targets, the data were also imbalanced. Therefore, this paper further proposed 3-Layer SegNet, 3-Layer U-SegNet, 2-Layer U-SegNet, and 2-Layer Conv-U-SegNet architectures to reduce the pooling degree of traditional semantic segmentation models, and used a loss function to increase the weight of the NGL. According to the experimental results, our proposed algorithms were indeed helpful for the image segmentation of NGL and could achieve better kappa results by 0.743.
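To illustrate how a loss function can increase the weight of a rare class such as NGL, here is a minimal sketch of a class-weighted cross-entropy. It is an assumption that the paper's loss has this form; the abstract only says that a loss function was used to increase the NGL weight. The function name, the weights, and the toy probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_cross_entropy(probs: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           class_weights: Sequence[float],
                           eps: float = 1e-12) -> float:
    """Weighted mean of -log p(true class) over pixels.

    probs[i][c] is the predicted probability of class c at pixel i.
    class_weights[c] scales the contribution of pixels whose true class is c.
    """
    total = 0.0
    weight_sum = 0.0
    for p, t in zip(probs, targets):
        w = class_weights[t]
        total += -w * math.log(max(p[t], eps))  # clamp to avoid log(0)
        weight_sum += w
    return total / weight_sum


# Class 0 = background, class 1 = NGL (hypothetical toy predictions).
probs: List[List[float]] = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]]
targets = [0, 0, 1]

plain = weighted_cross_entropy(probs, targets, class_weights=[1.0, 1.0])
boosted = weighted_cross_entropy(probs, targets, class_weights=[1.0, 5.0])
print(plain, boosted)
```

With the larger weight on class 1, the single NGL pixel dominates the loss, so errors on the rare class are penalized more heavily than errors on the abundant background.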


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Lin Wang ◽  
Xingfu Wang ◽  
Ammar Hawbani ◽  
Yan Xiong ◽  
Xu Zhang

With the development of science and technology, convolutional neural networks used in codec-based semantic image segmentation show good development prospects. Their advantage is that they can extract richer semantic features, but this comes at a high cost. To solve this problem, this article introduces a codec based on a separable convolutional neural network for semantic image segmentation. The proposed method converts the traditional convolutional neural network hierarchy into a separable convolutional neural network, which can reduce the cost of image data segmentation and improve processing efficiency. Moreover, this article builds a separable convolutional neural network codec structure and designs a semantic segmentation process, so that the codec can be used in semantic image segmentation experiments. The experimental results show that the average improvement on the dataset from the improved codec is 0.01, which proves the effectiveness of the improved SegProNet. The smaller the number of training set samples, the more obvious the performance improvement.
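The cost reduction claimed for separable convolution can be made concrete with a parameter count. The sketch below assumes that "separable" means depthwise separable convolution (a depthwise k x k filter per input channel followed by a 1 x 1 pointwise convolution), which the abstract does not state explicitly. Bias terms are ignored, and the layer sizes are hypothetical.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels                # 1 x 1 convolution mixing channels
    return depthwise + pointwise


k, c_in, c_out = 3, 64, 128  # hypothetical layer
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
print(std, sep, sep / std)
```

Under this assumption the ratio of separable to standard weights is 1/c_out + 1/k^2, so for 3 x 3 kernels the separable layer needs roughly one ninth of the weights when the number of output channels is large.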


Author(s):  
Tong Shen ◽  
Guosheng Lin ◽  
Chunhua Shen ◽  
Ian Reid

Semantic image segmentation is a fundamental task in image understanding. Per-pixel semantic labelling of an image benefits greatly from the ability to consider region consistency both locally and globally. However, many Fully Convolutional Network based methods do not impose such consistency, which may give rise to noisy and implausible predictions. We address this issue by proposing a dense multi-label network module that is able to encourage region consistency at different levels. This simple but effective module can be easily integrated into any semantic segmentation system. With comprehensive experiments, we show that the dense multi-label module can successfully remove implausible labels and clear up confusion, boosting the performance of semantic segmentation systems.


2019 ◽  
Vol 8 (12) ◽  
pp. 582 ◽  
Author(s):  
Gang Zhang ◽  
Tao Lei ◽  
Yi Cui ◽  
Ping Jiang

Semantic segmentation of high-resolution aerial images plays a significant role in many remote sensing applications. Although the Deep Convolutional Neural Network (DCNN) has shown great performance in this task, it still faces two challenges: intra-class heterogeneity and inter-class homogeneity. To overcome these two problems, a novel dual-path DCNN, which contains a spatial path and an edge path, is proposed for high-resolution aerial image segmentation. The spatial path, which combines multi-level and global context features to encode local and global information, is used to address the intra-class heterogeneity challenge. For the inter-class homogeneity problem, a Holistically-nested Edge Detection (HED)-like edge path is employed to detect the semantic boundaries that guide feature learning. Furthermore, we improve the computational efficiency of the network by employing MobileNetV2 as the backbone. We enhance the performance of MobileNetV2 with two modifications: (1) replacing the standard convolution in the last four Bottleneck Residual Blocks (BRBs) with atrous convolution; and (2) removing the convolution stride of 2 in the first layer of BRBs 4 and 6. Experimental results on the ISPRS Vaihingen and Potsdam 2D labeling datasets show that the proposed DCNN achieves real-time inference speed on a single GPU card with better performance than the state-of-the-art baselines.
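Atrous (dilated) convolution, mentioned in the backbone modifications above, can be illustrated in one dimension. The sketch below is a generic "valid" dilated convolution written in plain Python; it is not the network's actual layer, and the function name, signal, and kernel are hypothetical.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float],
                  kernel: Sequence[float],
                  rate: int) -> List[float]:
    """1-D atrous convolution without padding.

    Kernel taps are spaced `rate` samples apart, so a kernel of length k
    covers (k - 1) * rate + 1 input samples without adding any weights.
    """
    span = (len(kernel) - 1) * rate  # distance between first and last tap
    return [
        sum(kernel[j] * signal[i + j * rate] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(atrous_conv1d(signal, kernel, rate=1))  # taps at i, i+1, i+2
print(atrous_conv1d(signal, kernel, rate=2))  # taps at i, i+2, i+4
```

With rate 2 the same three weights span five input samples, which is why atrous convolution is commonly used to keep a large receptive field after a stride-2 layer has been removed.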


2021 ◽  
Vol 13 (21) ◽  
pp. 4271
Author(s):  
Wei Huang ◽  
Zeping Liu ◽  
Hong Tang ◽  
Jiayi Ge

Semantic and instance segmentation methods are commonly used for building extraction from high-resolution images. The semantic segmentation method assigns a class label to each pixel in the image and thus ignores the geometry of the building rooftop, which results in irregularly shaped rooftop edges. Instance segmentation, in turn, strongly assumes that there exists only one outline polygon along the rooftop boundary. In this paper, we present a novel method that integrates semantic segmentation and polygon delineation to sequentially delineate the exterior and interior contours of rooftops with holes from VHR aerial images in which most of the buildings have holes. Specifically, semantic segmentation from Mask R-CNN is used as a prior for hole detection. Then, the holes are used as objects for generating the internal contours of the rooftop. The external and internal contours of the rooftop are inferred separately using a convolutional recurrent neural network. Experimental results showed that the proposed method can effectively delineate rooftops with both one and multiple polygons and outperforms state-of-the-art methods in terms of the visual results and six statistical indicators: IoU, OA, F1, BoundF, RE and Hd.


Author(s):  
Zhang-Wei Hong ◽  
Yu-Ming Chen ◽  
Hsuan-Kung Yang ◽  
Shih-Yang Su ◽  
Tzu-Yun Shann ◽  
...  

Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus, recent advances in robot learning advocate the use of simulators as the training platform. Unfortunately, the reality gap between synthetic and real visual data prohibits direct migration of models trained in virtual worlds to the real world. This paper proposes a modular architecture for tackling the virtual-to-real problem. The proposed architecture separates the learning model into a perception module and a control policy module, and uses semantic image segmentation as the meta representation for relating these two modules. The perception module translates the perceived RGB image to semantic image segmentation. The control policy module is implemented as a deep reinforcement learning agent, which performs actions based on the translated image segmentation. Our architecture is evaluated in an obstacle avoidance task and a target following task. Experimental results show that our architecture significantly outperforms all of the baseline methods in both virtual and real environments, and learns faster than they do. We also present a detailed analysis of a variety of variant configurations, and validate the transferability of our modular architecture.


2020 ◽  
Vol 12 (18) ◽  
pp. 2910
Author(s):  
Tong Wu ◽  
Yuan Hu ◽  
Ling Peng ◽  
Ruonan Chen

Building extraction from high-resolution remote sensing images plays a vital part in urban planning, safety supervision, geographic database updates, and other applications. Several studies are devoted to using convolutional neural networks (CNNs) to extract buildings from high-resolution satellite/aerial images. There are two major approaches. One is CNN-based semantic segmentation, which cannot distinguish different objects of the same category and may lead to edge connection. The other is CNN-based instance segmentation, which relies heavily on pre-defined anchors and results in high sensitivity, high computation/storage cost, and an imbalance between positive and negative samples. Therefore, in this paper, we propose an improved anchor-free instance segmentation method based on CenterMask, with spatial and channel attention-guided mechanisms and an improved effective backbone network, for accurate extraction of buildings in high-resolution remote sensing images. We then analyze the influence of different parameters and network structures on the performance of the model, and compare the building extraction performance of Mask R-CNN, Mask Scoring R-CNN, CenterMask, and the improved CenterMask proposed in this paper. Experimental results show that our improved CenterMask method achieves a well-balanced performance in terms of speed and accuracy, reaching state-of-the-art performance at real-time speed.


2019 ◽  
Vol 11 (17) ◽  
pp. 2008 ◽  
Author(s):  
Qinchen Yang ◽  
Man Liu ◽  
Zhitao Zhang ◽  
Shuqin Yang ◽  
Jifeng Ning ◽  
...  

With increasing consumption, plastic mulch benefits agriculture by promoting crop quality and yield, but the resulting environmental and soil pollution is becoming increasingly serious. Therefore, research on the monitoring of plastic mulched farmland (PMF) has received increasing attention. Owing to their high resolution, unmanned aerial vehicle (UAV) remote sensing images of plastic mulched farmland show a prominent spatial pattern, which brings difficulties to the task of monitoring PMF. In this paper, through a comparison between two deep semantic segmentation methods, SegNet and fully convolutional networks (FCN), and a traditional classification method, Support Vector Machine (SVM), we propose an end-to-end deep-learning method aimed at accurately recognizing PMF in UAV remote sensing images from Hetao Irrigation District, Inner Mongolia, China. After experiments with single-band, three-band and six-band image data, we found that deep semantic segmentation models built on single-band data, which use only the texture pattern of PMF, can identify it well; for example, SegNet reaches the highest accuracy of 88.68% in a 900 nm band. Furthermore, with three visible bands and six-band data (3 visible bands and 3 near-infrared bands), deep semantic segmentation models combining texture and spectral features further improve the accuracy of PMF identification, with six-band data giving the best performance for FCN and SegNet. In addition, the deep semantic segmentation methods, FCN and SegNet, owing to their strong feature extraction capability and direct pixel classification, clearly outperform the traditional SVM method in precision and speed. Among the three classification methods, the SegNet model built on three-band and six-band data obtains the optimal average accuracy of 89.62% and 90.6%, respectively. Therefore, the proposed deep semantic segmentation model, when tested against the traditional classification method, provides a promising path for mapping PMF in UAV remote sensing images.


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 197
Author(s):  
Yong-Woon Kim ◽  
Yung-Cheol Byun ◽  
Addapalli V. N. Krishna

Image segmentation plays a central role in a broad range of applications, such as medical image analysis, autonomous vehicles, video surveillance and augmented reality. Portrait segmentation, which is a subset of semantic image segmentation, is widely used as a preprocessing step in multiple applications such as security systems, entertainment applications, video conferences, etc. A substantial number of deep learning-based portrait segmentation approaches have been developed, since the performance and accuracy of semantic image segmentation have improved significantly due to the recent introduction of deep learning technology. However, these approaches are limited to a single portrait segmentation model. In this paper, we propose a novel approach using an ensemble method that combines multiple heterogeneous deep-learning based portrait segmentation models to improve the segmentation performance. Two-Models and Three-Models ensembles, using a simple soft voting method and a weighted soft voting method, were tested. The Intersection over Union (IoU) metric, IoU standard deviation and false prediction rate were used to evaluate the performance. Cost efficiency was calculated to analyze the efficiency of segmentation. The experimental results show that the proposed ensemble approach can perform with higher accuracy and lower errors than single deep-learning-based portrait segmentation models. The results also show that the ensemble of deep-learning models typically increases the use of memory and computing power, although it also shows that the ensemble of deep-learning models can perform more efficiently than a single model with higher accuracy using less memory and less computing power.
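The simple and weighted soft voting mentioned above can be sketched as follows. The sketch assumes each model outputs a per-pixel foreground probability map and that the ensemble thresholds the (weighted) average at 0.5; the threshold, the weights, the function name, and the toy maps are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence

ProbMap = List[List[float]]


def soft_vote(prob_maps: Sequence[ProbMap],
              weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> List[List[int]]:
    """Combine foreground probability maps into one binary mask.

    With weights=None every model counts equally (simple soft voting);
    otherwise each map is scaled by its weight (weighted soft voting).
    """
    if weights is None:
        weights = [1.0] * len(prob_maps)
    total = sum(weights)
    height, width = len(prob_maps[0]), len(prob_maps[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            p = sum(w * m[y][x] for w, m in zip(weights, prob_maps)) / total
            row.append(1 if p >= threshold else 0)
        mask.append(row)
    return mask


# Three hypothetical models predicting a 1x3 portrait mask.
model_a = [[0.9, 0.4, 0.2]]
model_b = [[0.8, 0.7, 0.1]]
model_c = [[0.6, 0.3, 0.6]]

print(soft_vote([model_a, model_b, model_c]))                     # simple soft voting
print(soft_vote([model_a, model_b, model_c], weights=[1, 3, 1]))  # trust model_b more
```

In this toy case the middle pixel flips from background to foreground once the second model is given a larger weight, which shows how the weighting lets a stronger model override the others.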

