Multi-Class Strategies for Joint Building Footprint and Road Detection in Remote Sensing

2021 ◽  
Vol 11 (18) ◽  
pp. 8340
Author(s):  
Christian Ayala ◽  
Carlos Aranda ◽  
Mikel Galar

Building footprints and road networks are important inputs for a wide range of services. For instance, building maps are useful for urban planning, whereas road maps are essential for disaster response services. Traditionally, building and road maps are manually generated by remote sensing experts or land surveying, occasionally assisted by semi-automatic tools. In the last decade, deep learning-based approaches have demonstrated their capabilities to extract these elements automatically and accurately from remote sensing imagery. The building footprint and road network detection problem can be considered a multi-class semantic segmentation task, that is, a single model performs a pixel-wise classification on multiple classes, optimizing the overall performance. However, depending on the spatial resolution of the imagery used, both classes may coexist within the same pixel, drastically reducing their separability. In this regard, binary decomposition techniques, which have been widely studied in the machine learning literature, have proven useful for addressing multi-class problems. Accordingly, the multi-class problem can be split into multiple binary semantic segmentation sub-problems, specializing a different model for each class. Nevertheless, in these cases, an aggregation step is required to obtain the final output labels. Additionally, other novel approaches, such as multi-task learning, may come in handy to further increase the performance of the binary semantic segmentation models. Since there is no certainty as to which strategy should be adopted to accurately tackle a multi-class remote sensing semantic segmentation problem, this paper performs an in-depth study to shed light on the issue. For this purpose, open-access Sentinel-1 and Sentinel-2 imagery (at 10 m) is considered for extracting buildings and roads, making use of the well-known U-Net convolutional neural network. It is worth stressing that building and road classes may coexist within the same pixel when working at such a low spatial resolution, posing a challenging problem. Accordingly, a robust experimental study is developed to assess the benefits of the decomposition strategies and their combination with a multi-task learning scheme. The obtained results demonstrate that decomposing the considered multi-class remote sensing semantic segmentation problem into multiple binary ones using a One-vs.-All binary decomposition technique leads to better results than the standard direct multi-class approach. Additionally, the benefits of using a multi-task learning scheme for pushing the performance of binary segmentation models are also shown.
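To make the One-vs.-All aggregation step concrete, the following is a minimal sketch (not the authors' implementation): each specialized binary model outputs a sigmoid probability map, and independent per-class thresholding produces the final masks, which also allows building and road labels to coexist in the same pixel. The function name and threshold values are illustrative assumptions.

```python
import numpy as np

def ova_aggregate(prob_maps, thresholds=None):
    """Aggregate One-vs.-All binary probability maps into per-class masks.

    prob_maps: dict mapping class name -> (H, W) sigmoid probability map,
               one map per specialized binary model.
    Independent thresholding lets classes coexist in the same pixel,
    which matters at 10 m resolution where buildings and roads overlap.
    """
    thresholds = thresholds or {name: 0.5 for name in prob_maps}
    return {name: probs >= thresholds[name] for name, probs in prob_maps.items()}

# Hypothetical usage with the outputs of two specialized binary U-Nets:
h, w = 256, 256
rng = np.random.default_rng(0)
masks = ova_aggregate({
    "building": rng.random((h, w)),  # stand-in for the building model output
    "road": rng.random((h, w)),      # stand-in for the road model output
})
```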

2020 ◽  
Vol 9 (8) ◽  
pp. 486 ◽  
Author(s):  
Aleksandar Milosavljević

The proliferation of high-resolution remote sensing sensors and platforms imposes the need for effective analyses and automated processing of high volumes of aerial imagery. The recent advance of artificial intelligence (AI) in the form of deep learning (DL) and convolutional neural networks (CNN) has shown remarkable results in several image-related tasks and, naturally, gained the focus of the remote sensing community. In this paper, we focus on specifying a processing pipeline that relies on existing state-of-the-art DL segmentation models to automate building footprint extraction. The proposed pipeline is organized in three stages: image preparation, model implementation and training, and predictions fusion. For the first and third stages, we introduce several techniques that leverage the specifics of remote sensing imagery, while for the selection of the segmentation model, we rely on empirical examination. In the paper, we present and discuss several experiments that we conducted on the Inria Aerial Image Labeling Dataset. Our findings confirm that automatic processing of remote sensing imagery using DL semantic segmentation is both feasible and able to provide applicable results. The proposed pipeline can potentially be transferred to any other remote sensing imagery segmentation task if a corresponding dataset is available.
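The abstract does not detail the predictions-fusion stage; a common technique it could rely on is overlapping sliding-window inference with averaging, sketched below. The tile size, stride, and `predict_fn` placeholder are assumptions, not the paper's settings.

```python
import numpy as np

def fuse_tiled_predictions(image, predict_fn, tile=512, stride=256):
    """Sliding-window inference with overlap; overlapping probabilities
    are averaged, which smooths seams at tile borders.

    image:      (H, W, C) array
    predict_fn: callable mapping a (tile, tile, C) patch to a (tile, tile)
                probability map (placeholder for a trained model).
    """
    h, w = image.shape[:2]
    acc = np.zeros((h, w), dtype=np.float64)
    cnt = np.zeros((h, w), dtype=np.float64)
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            patch = image[y:y + tile, x:x + tile]
            acc[y:y + tile, x:x + tile] += predict_fn(patch)
            cnt[y:y + tile, x:x + tile] += 1.0
    return acc / np.maximum(cnt, 1.0)  # mean probability per pixel
```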


2021 ◽  
Vol 13 (18) ◽  
pp. 3630
Author(s):  
Ziming Li ◽  
Qinchuan Xin ◽  
Ying Sun ◽  
Mengying Cao

Accurate building footprint polygons provide essential data for a wide range of urban applications. While deep learning models have been proposed to extract pixel-based building areas from remote sensing imagery, the direct vectorization of pixel-based building maps often leads to building footprint polygons with irregular shapes that are inconsistent with real building boundaries, making them difficult to use in geospatial analysis. In this study, we propose a novel deep learning-based framework for automated extraction of building footprint polygons (DLEBFP) from very high-resolution aerial imagery by combining deep learning models for different tasks. Our approach uses the U-Net, Cascade R-CNN, and Cascade CNN deep learning models to obtain building segmentation maps, building bounding boxes, and building corners, respectively, from very high-resolution remote sensing images. We use Delaunay triangulation to construct building footprint polygons based on the detected building corners, with the constraints of building bounding boxes and building segmentation maps. Experiments on the Wuhan University building dataset and the ISPRS Vaihingen dataset indicate that DLEBFP performs well in extracting high-quality building footprint polygons. Compared with other semantic segmentation models and the vector map generalization method, DLEBFP achieves pixel-level mapping accuracies comparable to semantic segmentation models and generates building footprint polygons with concise edges and vertices and regular shapes that are close to the reference data. The promising performance indicates that our method has the potential to extract accurate building footprint polygons from remote sensing images for applications in geospatial analysis.
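As a rough sketch of the Delaunay-based reconstruction step, under simplifying assumptions (a single building, corners already detected, and only the segmentation-map constraint, whereas the full DLEBFP pipeline also uses bounding boxes): triangles whose centroids fall outside the mask are discarded, so concave footprints are recovered rather than the convex hull.

```python
import numpy as np
from scipy.spatial import Delaunay
from shapely.geometry import Polygon
from shapely.ops import unary_union

def polygon_from_corners(corners, seg_mask):
    """Build a footprint polygon from detected corners and a binary mask.

    corners:  (N, 2) array of (x, y) corner coordinates for one building.
    seg_mask: (H, W) boolean building segmentation map.
    """
    tri = Delaunay(corners)
    keep = []
    for simplex in tri.simplices:
        pts = corners[simplex]
        cx, cy = pts.mean(axis=0)  # triangle centroid
        if seg_mask[int(round(cy)), int(round(cx))]:
            keep.append(Polygon(pts))
    # Merge the kept triangles into a single footprint polygon.
    return unary_union(keep)
```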


2021 ◽  
Vol 13 (3) ◽  
pp. 394
Author(s):  
Wei Zhang ◽  
Ping Tang ◽  
Thomas Corpetti ◽  
Lijun Zhao

Land cover classification is one of the most fundamental tasks in the field of remote sensing. In recent years, fully supervised fully convolutional network (FCN)-based semantic segmentation models have achieved state-of-the-art performance in the semantic segmentation task. However, creating pixel-level annotations is prohibitively expensive and laborious, especially when dealing with remote sensing images. Weakly supervised learning methods working from weak annotations can overcome this difficulty to some extent and achieve impressive segmentation results, but their accuracy is limited. Inspired by point supervision and the traditional seeded region growing (SRG) segmentation algorithm, a weakly towards strongly (WTS) supervised learning framework is proposed in this study for remote sensing land cover classification, handling the absence of abundant, well-labeled pixel-level annotations when using segmentation models. In this framework, only a few points with true class labels are required as the training set; these are much less expensive to acquire than pixel-level annotations obtained through field survey or visual interpretation of high-resolution images. First, the labeled points are used to train a Support Vector Machine (SVM) classifier. Once trained, the SVM is used to generate the initial seeded pixel-level training set, in which only pixels with high confidence are assigned class labels whereas the others are left unlabeled; this set is used to weakly train the segmentation model. Then, the seeded region growing module and fully connected Conditional Random Fields (CRFs) are used to iteratively update the seeded pixel-level training set, progressively increasing the pixel-level supervision of the segmentation model. Sentinel-2 remote sensing images are used to validate the proposed framework, and SVM is selected for comparison. In addition, the FROM-GLC10 global land cover map is used as a training reference to directly train the segmentation model. Experimental results show that the proposed framework outperforms the other methods and can be highly recommended for land cover classification with segmentation models when pixel-level labeled datasets are insufficient.
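A minimal sketch of the seed-generation step, assuming scikit-learn's SVC with probability estimates and integer class labels; the confidence threshold and the ignore value are illustrative choices, not the paper's exact settings.

```python
import numpy as np
from sklearn.svm import SVC

IGNORE = 255  # illustrative label for unlabeled (low-confidence) pixels

def generate_seed_labels(point_features, point_labels, image_features, conf=0.9):
    """Train an SVM on a handful of labeled points, then keep only
    high-confidence pixel predictions as seeds for the segmentation model.

    point_features: (N, D) features at the labeled points
    point_labels:   (N,) integer class labels
    image_features: (H, W, D) per-pixel features (e.g., Sentinel-2 bands)
    """
    svm = SVC(probability=True).fit(point_features, point_labels)
    h, w, d = image_features.shape
    probs = svm.predict_proba(image_features.reshape(-1, d))
    labels = svm.classes_[probs.argmax(axis=1)]
    # Pixels below the confidence threshold stay unlabeled in the seed set.
    labels[probs.max(axis=1) < conf] = IGNORE
    return labels.reshape(h, w)
```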


2021 ◽  
Vol 10 (10) ◽  
pp. 672
Author(s):  
Suting Chen ◽  
Chaoqun Wu ◽  
Mithun Mukherjee ◽  
Yujie Zheng

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the rich spatial information in RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data alongside 2D appearance is beneficial for distinguishing pixels' categories. However, most such methods require height maps as additional inputs, which severely limits their applications. To alleviate these issues, we propose a height-aware multi-path parallel network (HA-MPPNet). Our proposed network first obtains multi-level semantic features while maintaining the spatial resolution in each path, preserving detailed image information. Afterward, gated high-low level feature fusion is utilized to compensate for missing low-level semantics. Then, we design a height feature decoding branch that learns height features under the supervision of digital surface model (DSM) images and uses the learned embeddings to improve semantic context through height feature guided propagation. Note that our model is end-to-end and does not need a DSM image as additional input after training. Our method outperformed other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.
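The abstract does not give the exact formulation of the gated high-low level feature fusion, so the PyTorch sketch below shows one generic gated-fusion form: a sigmoid gate computed from the high-level features modulates how much low-level detail is injected. The module name and the 1x1-convolution gate are assumptions, not HA-MPPNet's actual design.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated high-/low-level feature fusion (an illustrative
    formulation, not necessarily HA-MPPNet's exact module)."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, high, low):
        # Gate derived from high-level semantics decides, per pixel and
        # channel, how much low-level detail to add back in.
        g = self.gate(high)
        return high + g * low

fused = GatedFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```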


2021 ◽  
Vol 13 (13) ◽  
pp. 2578
Author(s):  
Samir Touzani ◽  
Jessica Granderson

Advances in machine learning and computer vision, combined with increased access to unstructured data (e.g., images and text), have created an opportunity for automated extraction of building characteristics, cost-effectively and at scale. These characteristics are relevant to a variety of urban and energy applications, yet are time consuming and costly to acquire with today’s manual methods. Several recent research studies have shown that, in comparison to more traditional methods based on feature engineering, an end-to-end learning approach based on deep learning algorithms significantly improves the accuracy of automatic building footprint extraction from remote sensing images. However, these studies used limited benchmark datasets that had been carefully curated and labeled. How the accuracy of these deep learning-based approaches holds up when using less curated training data has not received enough attention. The aim of this work is to leverage openly available data to automatically generate a larger training dataset with more variability in terms of regions and types of cities, which can be used to build more accurate deep learning models. In contrast to most benchmark datasets, the gathered data have not been manually curated; thus, the training dataset is not perfectly clean in terms of the remote sensing images exactly matching the ground-truth building footprints. A workflow that includes data pre-processing, deep learning semantic segmentation modeling, and results post-processing is introduced and applied to a dataset that includes remote sensing images of 8,607,677 buildings from 15 cities and five counties in various regions of the USA. The accuracy of the proposed approach was measured on an out-of-sample testing dataset corresponding to 364,000 buildings from three USA cities. The results compared favorably to those obtained from Microsoft’s recently released US building footprint dataset.
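As a sketch of the label-generation portion of such a workflow, assuming the open building footprints come as GeoJSON-like polygons in the raster's coordinate reference system and using rasterio's rasterize; the function name and the path argument are hypothetical.

```python
import rasterio
from rasterio.features import rasterize

def footprints_to_mask(geoms, raster_path):
    """Burn building footprint polygons into a binary mask aligned with a
    remote sensing tile, for use as (possibly noisy) training labels.

    geoms:       iterable of GeoJSON-like polygon geometries in the
                 raster's CRS (e.g., from an open footprint dataset)
    raster_path: path to the imagery tile (hypothetical)
    """
    with rasterio.open(raster_path) as src:
        mask = rasterize(
            ((g, 1) for g in geoms),   # burn value 1 for building pixels
            out_shape=(src.height, src.width),
            transform=src.transform,   # georeference polygons to pixels
            fill=0,
            dtype="uint8",
        )
    return mask
```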


Author(s):  
Chandra Pal Kushwah ◽  
Kuruna Markam

In recent years, the performance of deep learning in natural scene image processing has boosted its use in remote sensing image analysis. In this paper, we apply deep neural networks (DNNs) to the semantic segmentation of remote sensing images. To make them suitable for multi-target semantic segmentation of remote sensing imagery, we enhance the SegNet encoder-decoder CNN structure with index pooling and U-Net. The findings reveal that each of the two models has its own strengths and weaknesses when segmenting different objects. Furthermore, we provide an integrated algorithm that incorporates the two models. The test results indicate that the proposed integrated algorithm takes advantage of both multi-target segmentation models and obtains improved segmentation relative to either model alone.
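The abstract does not specify the integration rule; one plausible sketch is to pick, per class, whichever model scored better on validation and fuse the probability maps class-wise. The selection criterion (per-class validation IoU) is an assumption, not the paper's stated algorithm.

```python
import numpy as np

def integrate_predictions(probs_a, probs_b, val_iou_a, val_iou_b):
    """Class-wise integration of two segmentation models' outputs
    (an illustrative rule; the paper's exact algorithm may differ).

    probs_a, probs_b: (C, H, W) per-class probability maps from the two
                      models (e.g., the modified SegNet and U-Net)
    val_iou_a/b:      (C,) validation IoU per class for each model
    """
    pick_a = np.asarray(val_iou_a) >= np.asarray(val_iou_b)   # (C,)
    fused = np.where(pick_a[:, None, None], probs_a, probs_b)  # per-class pick
    return fused.argmax(axis=0)  # final (H, W) label map
```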


2021 ◽  
Vol 13 (6) ◽  
pp. 1083
Author(s):  
Chenbin Liang ◽  
Bo Cheng ◽  
Baihua Xiao ◽  
Chenlinqiu He ◽  
Xunan Liu ◽  
...  

Coastal aquaculture areas are some of the main areas used to obtain marine fishery resources and are vulnerable to storm-tide disasters. Obtaining information on coastal aquaculture areas quickly and accurately is important for the scientific management and planning of aquaculture resources. Recently, deep neural networks have been widely used in remote sensing to deal with many problems, such as scene classification and object detection, and with the development of remote sensing technology there are now many data sources with different spatial resolutions and different uses. Thus, using deep learning networks to extract coastal aquaculture areas often encounters the following problems: (1) the difficulty of labeling; (2) the poor robustness of the model; (3) the spatial resolution of the image to be processed being inconsistent with that of the existing samples. To address these problems, this paper proposes a novel semi-/weakly-supervised method, the semi-/weakly-supervised semantic segmentation network (Semi-SSN), and adopts three data sources: GaoFen-2, GaoFen-1 (PMS), and GaoFen-1 (WFV) imagery with 0.8 m, 2 m, and 16 m spatial resolutions, respectively. Through experiments, we analyze the extraction effect of the model comprehensively. After comparing with other state-of-the-art methods and verifying on an open remote sensing dataset, we take the Fujian coastal area (mainly Sanduo) as the experimental area and employ our method to detect the effect of storm-tide disasters on coastal aquaculture areas, monitor production, and make a distribution map of coastal aquaculture areas.
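Semi-SSN's objective is not reproduced in the abstract; as a generic illustration of training with incomplete labels, which is the labeling difficulty the paper targets, a cross-entropy restricted to labeled pixels can be written as follows. The ignore value is an illustrative convention, not the paper's loss.

```python
import torch
import torch.nn.functional as F

IGNORE = 255  # illustrative marker for pixels without labels

def partial_ce_loss(logits, target):
    """Cross-entropy over labeled pixels only, a common building block of
    semi-/weakly-supervised segmentation (illustrative, not Semi-SSN's
    exact objective).

    logits: (B, C, H, W) raw network outputs
    target: (B, H, W) integer labels, with IGNORE where no label exists
    """
    return F.cross_entropy(logits, target, ignore_index=IGNORE)

# Toy usage: 2 images, 4 classes, 64x64 pixels.
loss = partial_ce_loss(torch.randn(2, 4, 64, 64),
                       torch.randint(0, 4, (2, 64, 64)))
```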


2019 ◽  
Vol 11 (17) ◽  
pp. 2008 ◽  
Author(s):  
Qinchen Yang ◽  
Man Liu ◽  
Zhitao Zhang ◽  
Shuqin Yang ◽  
Jifeng Ning ◽  
...  

Plastic mulch benefits agriculture by promoting crop quality and yield, but with its increasing consumption, the resulting environmental and soil pollution is becoming increasingly serious. Therefore, research on the monitoring of plastic mulched farmland (PMF) has received increasing attention. Owing to the high resolution of unmanned aerial vehicle (UAV) remote sensing images, PMF shows a prominent spatial pattern in them, which brings difficulties to the task of monitoring PMF. In this paper, through a comparison between two deep semantic segmentation methods, SegNet and fully convolutional networks (FCN), and a traditional classification method, Support Vector Machine (SVM), we propose an end-to-end deep learning method aimed at accurately recognizing PMF in UAV remote sensing images from the Hetao Irrigation District, Inner Mongolia, China. After experiments with single-band, three-band, and six-band image data, we found that deep semantic segmentation models built on single-band data, which use only the texture pattern of PMF, can identify it well; for example, SegNet reaches its highest accuracy of 88.68% in the 900 nm band. Furthermore, with three visible bands and with six-band data (three visible bands and three near-infrared bands), deep semantic segmentation models combining texture and spectral features further improve the accuracy of PMF identification, with six-band data yielding the optimal performance for FCN and SegNet. In addition, the deep semantic segmentation methods, FCN and SegNet, clearly outperform the traditional SVM method in precision and speed thanks to their strong feature extraction capability and direct pixel classification. Among the three classification methods, the SegNet model built on three-band and six-band data obtains the optimal average accuracies of 89.62% and 90.6%, respectively. Therefore, the proposed deep semantic segmentation model, when tested against the traditional classification method, provides a promising path for mapping PMF in UAV remote sensing images.
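A small sketch of assembling the single-band, three-band, and six-band inputs compared in the study from a multispectral UAV cube; the band layout and the index of the 900 nm band are hypothetical placeholders.

```python
import numpy as np

# Hypothetical band layout of the UAV cube: (H, W, 6), three visible
# bands followed by three near-infrared bands; the index of the 900 nm
# band is an assumption for illustration.
VISIBLE = [0, 1, 2]
NIR = [3, 4, 5]
BAND_900NM = 5

def make_inputs(cube):
    """Build the three input variants compared in the study:
    single-band (texture only), three visible bands, and all six bands."""
    single = cube[:, :, [BAND_900NM]]     # (H, W, 1)
    three = cube[:, :, VISIBLE]           # (H, W, 3)
    six = cube[:, :, VISIBLE + NIR]       # (H, W, 6)
    return single, three, six

single, three, six = make_inputs(np.zeros((128, 128, 6), dtype=np.float32))
```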


2022 ◽  
Vol 14 (1) ◽  
pp. 190
Author(s):  
Yuxiang Cai ◽  
Yingchun Yang ◽  
Qiyi Zheng ◽  
Zhengwei Shen ◽  
Yongheng Shang ◽  
...  

When segmenting massive amounts of remote sensing images collected from different satellites or geographic locations (cities), pre-trained deep learning models cannot always output satisfactory predictions. To deal with this issue, domain adaptation has been widely utilized to enhance the generalization abilities of segmentation models. Most existing domain adaptation methods, which are based on image-to-image translation, first transfer the source images to pseudo-target images and then adapt the classifier from the source domain to the target domain. However, these unidirectional methods suffer from two limitations: (1) they do not consider the inverse procedure and thus cannot fully take advantage of the information from the other domain, which is also beneficial, as confirmed by our experiments; (2) they may fail in cases where transferring the source images to pseudo-target images is difficult. In this paper, to solve these problems, we propose BiFDANet, a novel framework for unsupervised bidirectional domain adaptation in the semantic segmentation of remote sensing images. It optimizes the segmentation models in two opposite directions. In the source-to-target direction, BiFDANet learns to transfer the source images to pseudo-target images and adapts the classifier to the target domain. In the opposite direction, BiFDANet transfers the target images to pseudo-source images and optimizes the source classifier. At the test stage, we make the best of the source classifier and the target classifier, which complement each other, with a simple linear combination method, further improving the performance of our BiFDANet. Furthermore, we propose a new bidirectional semantic consistency loss for BiFDANet to maintain semantic consistency during the bidirectional image-to-image translation process. Experiments on two datasets, including satellite images and aerial images, demonstrate the superiority of our method against existing unidirectional methods.
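A sketch of the test-time linear combination, under assumptions: `lam` is a hand-chosen mixing weight and `t2s_translate` stands in for BiFDANet's target-to-source translation network; neither name comes from the paper.

```python
import numpy as np

def bidirectional_fusion(x, target_clf, source_clf, t2s_translate, lam=0.5):
    """Fuse the target classifier's prediction on a target image with the
    source classifier's prediction on its pseudo-source translation
    (a sketch; the weight and components are assumptions).

    x:             target-domain image
    target_clf:    callable returning (C, H, W) class probabilities
    source_clf:    callable returning (C, H, W) class probabilities
    t2s_translate: target-to-source image translation network
    """
    p_target = target_clf(x)                   # target classifier, as-is
    p_source = source_clf(t2s_translate(x))    # source classifier, translated
    return (lam * p_target + (1.0 - lam) * p_source).argmax(axis=0)
```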

