Rich CNN Features for Water-Body Segmentation from Very High Resolution Aerial and Satellite Imagery

2021 · Vol 13 (10) · pp. 1912
Author(s): Zhili Zhang, Meng Lu, Shunping Ji, Huafen Yu, Chenhui Nie

Accurately extracting water bodies from very high resolution (VHR) remote sensing imagery is a great challenge. The boundaries of a water body are commonly hard to identify due to the complex spectral mixtures caused by aquatic vegetation, distinct lake/river colors, silt near the bank, shadows from surrounding tall plants, and so on. The diversity and semantic content of features need to be increased for better extraction of water bodies from VHR remote sensing images. In this paper, we address these problems by designing a novel multi-feature extraction and combination module. This module consists of three feature extraction sub-modules based on spatial and channel correlations in the feature maps at each scale, which extract complete target information from the local space, a larger space, and the between-channel relationship to achieve a rich feature representation. Simultaneously, to better predict the fine contours of water bodies, we adopt a multi-scale prediction fusion module. In addition, to resolve the semantic inconsistency of feature fusion between the encoding and decoding stages, we apply an encoder-decoder semantic feature fusion module to promote the fusion effect. We carry out extensive experiments on VHR aerial and satellite imagery. The results show that our method achieves state-of-the-art segmentation performance, surpassing classic and recent methods. Moreover, our proposed method is robust in challenging water-body extraction scenarios.
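The abstract describes three parallel feature extractors covering the local space, a larger space, and between-channel relationships. The following PyTorch fragment is a minimal sketch of how such a three-branch module could be wired up, assuming a standard 3×3 convolution for the local branch, a dilated convolution for the larger receptive field, and a squeeze-and-excitation-style gate for the channel branch; all names and internals are illustrative assumptions, not the authors' implementation.

```python
# Illustrative three-branch feature module (not the authors' code):
# local spatial context, larger spatial context via dilation, and a
# channel-relationship gate, fused into one recalibrated feature map.
import torch
import torch.nn as nn

class MultiFeatureBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Local spatial branch: standard 3x3 convolution.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Larger spatial branch: dilated 3x3 convolution widens the receptive field.
        self.context = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Channel branch: global pooling followed by a gating MLP (SE-style).
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid())
        # Fuse the two spatial branches back to the input width;
        # the channel gate then recalibrates the fused result.
        self.fuse = nn.Conv2d(channels * 2, channels, 1)

    def forward(self, x):
        spatial = torch.cat([self.local(x), self.context(x)], dim=1)
        return self.fuse(spatial) * self.channel(x)  # channel-wise recalibration

x = torch.randn(1, 64, 128, 128)
print(MultiFeatureBlock(64)(x).shape)  # torch.Size([1, 64, 128, 128])
```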

Sensors · 2021 · Vol 21 (21) · pp. 7397
Author(s): Yanjun Wang, Shaochun Li, Yunhao Lin, Mengjie Wang

Rapid and accurate extraction of water bodies from high-spatial-resolution remote sensing images is of great value for water resource management, water quality monitoring and natural disaster emergency response. Traditional water body extraction methods struggle to select suitable image textures and features, and the shadows of buildings and other ground objects share the same spectrum as water bodies; existing deep convolutional neural networks, meanwhile, are difficult to train, consume substantial computing resources and cannot meet real-time requirements. In this paper, a water body extraction method based on the lightweight MobileNetV2 is proposed and applied to multisensor high-resolution remote sensing images, such as GF-2, WorldView-2 and UAV orthoimages. The method was validated in two typical complex geographical scenes: water bodies for farmland irrigation, which have broken shapes, long and narrow areas, and many surrounding buildings in towns and villages; and water bodies in mountainous areas, which feature undulating topography, vegetation cover and widespread mountain shadows. The results were compared with those of support vector machine, random forest and U-Net models, and were also verified by generalization tests and by examining the influence of spatial resolution changes. First, the results show that the F1-scores and Kappa coefficients of the MobileNetV2 model extracting water bodies from three different high-resolution images were 0.75 and 0.72 for GF-2, 0.86 and 0.85 for WorldView-2, and 0.98 and 0.98 for UAV, respectively, which are higher than those of the traditional machine learning models and U-Net. Second, the training time, number of parameters and calculation amount of the MobileNetV2 model were much lower than those of the U-Net model, which greatly improves water body extraction efficiency. Third, in other, more complex surface areas, the MobileNetV2 model still maintained relatively high water body extraction accuracy. Finally, we tested the effects of multisensor models and found that training with lower and higher spatial resolution images combined can be beneficial, but that using just lower-resolution imagery is ineffective. This study provides a reference for the efficient automation of water body classification and extraction under complex geographical environment conditions and can be extended to water resource investigation, management and planning.
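As a rough illustration of how a lightweight MobileNetV2 backbone can be repurposed for binary water/non-water segmentation, here is a minimal sketch assuming a plain encoder plus a 1×1 classification head and bilinear upsampling; the paper's actual decoder and training configuration are not specified here, so this is not the authors' model.

```python
# Minimal MobileNetV2-as-encoder segmentation sketch (assumed structure):
# torchvision's feature extractor produces a coarse 1280-channel map,
# a 1x1 head scores "water", and bilinear upsampling restores resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class MobileNetV2Water(nn.Module):
    def __init__(self):
        super().__init__()
        # Reuse the MobileNetV2 feature extractor as the encoder (1280 channels out).
        self.encoder = mobilenet_v2(weights=None).features
        # 1x1 classifier head producing a single water-probability channel.
        self.head = nn.Conv2d(1280, 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        logits = self.head(self.encoder(x))
        # Bilinearly upsample coarse logits back to the input resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

model = MobileNetV2Water()
mask_logits = model(torch.randn(1, 3, 256, 256))
print(mask_logits.shape)  # torch.Size([1, 1, 256, 256])
```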


2018 · Vol 10 (10) · pp. 1602
Author(s): Rudong Xu, Yiting Tao, Zhongyuan Lu, Yanfei Zhong

A deep neural network is suitable for remote sensing image pixel-wise classification because it effectively extracts features from the raw data. However, remote sensing images with higher spatial resolution exhibit smaller inter-class differences and greater intra-class differences, so feature extraction becomes more difficult. The attention mechanism, as a method that simulates the manner in which humans comprehend and perceive images, is useful for the quick and accurate acquisition of key features. In this study, we propose a novel neural network that incorporates two kinds of attention mechanisms in its mask and trunk branches, i.e., control gate (soft) and feedback attention mechanisms, respectively, based on the branches' primary roles. Thus, a deep neural network can be equipped with an attention mechanism to perform pixel-wise classification for very high-resolution remote sensing (VHRRS) images. The control gate attention mechanism in the mask branch is utilized to build pixel-wise masks for the feature maps, assigning different priorities to different locations on different channels to recalibrate feature extraction, emphasize effective features, and weaken the influence of unhelpful ones. The feedback attention mechanism in the trunk branch allows high-level semantic features to be retrieved, providing additional guidance for lower layers to re-weight their focus and update higher-level feature extraction in a target-oriented manner. These two attention mechanisms are fused to form a neural network module. By stacking modules with mask branches of different scales, the network exploits different attention-aware features under different local spatial structures. The proposed method is tested on VHRRS images from the BJ-02, GF-02, GeoEye, and QuickBird satellites, and the influence of the network structure and the rationality of the network design are discussed. Compared with other state-of-the-art methods, our proposed method achieves competitive accuracy, thereby proving its effectiveness.
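The mask/trunk split described here echoes residual-attention designs: a trunk branch extracts features while a mask branch produces a pixel-wise soft gate over them. Below is an illustrative PyTorch sketch of that pattern; the feedback path is deliberately simplified away, and every structural choice is an assumption for illustration rather than the authors' network.

```python
# Illustrative mask/trunk attention module (assumed structure):
# the mask branch yields a per-location, per-channel gate in [0, 1]
# that softly recalibrates the trunk branch's features.
import torch
import torch.nn as nn

class MaskTrunkAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: ordinary feature extraction.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        # Mask branch: downsample-upsample path ending in a sigmoid gate, so each
        # location on each channel gets its own [0, 1] priority.
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        t = self.trunk(x)
        m = self.mask(x)
        # Residual attention: (1 + mask) * trunk keeps useful features flowing
        # even where the gate is near zero.
        return (1 + m) * t

x = torch.randn(1, 32, 64, 64)
print(MaskTrunkAttention(32)(x).shape)  # torch.Size([1, 32, 64, 64])
```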


2021 · Vol 13 (4) · pp. 731
Author(s): Bingyu Chen, Min Xia, Junqing Huang

Detailed information regarding land utilization/cover is a valuable resource in various fields. In recent years, remote sensing images, especially aerial images, have grown in resolution and in temporal and spatial span, and because objects of an identical category may yield different spectra, relying on spectral features alone is often insufficient to accurately segment the target objects. In convolutional neural networks, down-sampling operations are usually used to extract abstract semantic features, which leads to a loss of detail and fuzzy edges. To solve these problems, this paper proposes a Multi-level Feature Aggregation Network (MFANet), which improves two aspects: deep feature extraction and up-sampling feature fusion. First, the proposed Channel Feature Compression module extracts deep features and filters redundant channel information from the backbone to optimize the learned context. Second, the proposed Multi-level Feature Aggregation Upsample module nestedly applies the idea that high-level features provide guidance information for low-level features, which is of great significance for precise localization when restoring high-resolution remote sensing images. Finally, the proposed Channel Ladder Refinement module is used to refine the restored high-resolution feature maps. Experimental results show that the proposed method achieves state-of-the-art performance of 86.45% mean IoU on the LandCover dataset.
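One way to read the "high-level features guide low-level features" idea is as a fusion step in which upsampled deep features re-weight shallow ones before the two are merged. The sketch below is a hedged approximation in PyTorch using a simple channel gate computed from the high-level features; the actual MFANet modules (Channel Feature Compression, Multi-level Feature Aggregation Upsample, Channel Ladder Refinement) are not reproduced here.

```python
# Assumed "high-level guides low-level" fusion step (not MFANet itself):
# deep features are compressed, upsampled, and used to gate shallow features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedUpsampleFusion(nn.Module):
    def __init__(self, high_ch: int, low_ch: int):
        super().__init__()
        # Compress high-level channels to match the low-level feature width.
        self.compress = nn.Conv2d(high_ch, low_ch, kernel_size=1)
        # Gate derived from global high-level context, applied to low-level channels.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(low_ch, low_ch, 1), nn.Sigmoid())

    def forward(self, high, low):
        # Upsample high-level features to the low-level spatial size.
        high = F.interpolate(self.compress(high), size=low.shape[-2:],
                             mode="bilinear", align_corners=False)
        # High-level context re-weights the low-level channels, then both are summed.
        return high + low * self.gate(high)

high = torch.randn(1, 256, 16, 16)   # deep, coarse features
low = torch.randn(1, 64, 64, 64)     # shallow, fine features
print(GuidedUpsampleFusion(256, 64)(high, low).shape)  # torch.Size([1, 64, 64, 64])
```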


2019 · Vol 11 (21) · pp. 2504
Author(s): Jun Zhang, Min Zhang, Lukui Shi, Wenjie Yan, Bin Pan

Scene classification is one of the bases of automatic remote sensing image interpretation. Recently, deep convolutional neural networks have shown promising performance in high-resolution remote sensing scene classification research. In general, most researchers directly use the raw deep features extracted from convolutional networks to classify scenes. However, this strategy considers only single-scale features, which cannot describe both the local and global features of images. In fact, the dissimilarity of scene targets in the same category may leave convolutional features unable to classify them into the same category, while the similarity of global features across different categories may likewise cause fully connected layer features to fail to distinguish them. To address these issues, we propose a scene classification method based on multi-scale deep feature representation (MDFR), which makes two main contributions: (1) region-based feature selection and representation; and (2) multi-scale feature fusion. Initially, the proposed method filters the multi-scale deep features extracted from pre-trained convolutional networks. Subsequently, these features are fused via two efficient fusion methods. Our method exploits the complementarity between local and global features by effectively utilizing features of different scales and discarding the redundant information within them. Experimental results on three benchmark high-resolution remote sensing image datasets indicate that the proposed method is comparable to some state-of-the-art algorithms.
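To make the multi-scale idea concrete, the following sketch pulls features at several depths of a pre-trained CNN, globally pools each, and concatenates them into one local-to-global descriptor. A ResNet-18 backbone and plain concatenation stand in as assumptions; the paper's specific filtering and fusion schemes are not reproduced.

```python
# Assumed multi-scale feature extraction from a CNN backbone:
# pooled features from each residual stage are concatenated so the
# descriptor mixes shallow (local) and deep (global) information.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)

def multi_scale_features(x: torch.Tensor) -> torch.Tensor:
    pool = nn.AdaptiveAvgPool2d(1)
    # Stem: conv -> bn -> relu -> maxpool.
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    feats = []
    # Collect a globally pooled feature vector after each residual stage.
    for stage in (backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4):
        x = stage(x)
        feats.append(pool(x).flatten(1))
    return torch.cat(feats, dim=1)  # local-to-global descriptor

desc = multi_scale_features(torch.randn(1, 3, 224, 224))
print(desc.shape)  # torch.Size([1, 960]) = 64 + 128 + 256 + 512
```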


2019 · Vol 11 (24) · pp. 3006
Author(s): Yafei Lv, Xiaohan Zhang, Wei Xiong, Yaqi Cui, Mi Cai

Remote sensing image scene classification (RSISC) is an active task in the remote sensing community and has attracted great attention due to its wide applications. Recently, deep convolutional neural network (CNN)-based methods have achieved a remarkable breakthrough in the performance of remote sensing image scene classification. However, feature representations that are not discriminative enough remain a problem, mainly caused by inter-class similarity and intra-class diversity. In this paper, we propose an efficient end-to-end local-global-fusion feature extraction (LGFFE) network for a more discriminative feature representation. Specifically, global and local features are extracted from the channel and spatial dimensions, respectively, based on a high-level feature map from a deep CNN. For the local features, a novel recurrent neural network (RNN)-based attention module is first proposed to capture the spatial layout and context information across different regions. Gated recurrent units (GRUs) are then exploited to generate the importance weight of each region by taking a sequence of features from image patches as input. A reweighted regional feature representation can be obtained by focusing on the key regions. The final feature representation is then acquired by fusing the local and global features, and the whole process of feature extraction and fusion can be trained in an end-to-end manner. Finally, extensive experiments were conducted on four public and widely used datasets, and the results show that our LGFFE method outperforms baseline methods and achieves state-of-the-art results.
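The local branch described above reads regions of a high-level feature map as a sequence and scores each with a GRU. Below is an illustrative sketch of that re-weighting step, followed by a simple concatenation with a globally pooled descriptor; hidden sizes, the scoring head, and fusion by concatenation are all assumptions, not the authors' exact design.

```python
# Assumed GRU-based regional re-weighting over a high-level feature map:
# spatial positions become a sequence, a GRU scores each region, and the
# weighted sum (local) is fused with a global average pool (global).
import torch
import torch.nn as nn

class GRURegionAttention(nn.Module):
    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(channels, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)  # one importance weight per region

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        b, c, h, w = fmap.shape
        regions = fmap.flatten(2).transpose(1, 2)        # (B, H*W, C) region sequence
        hidden, _ = self.gru(regions)                    # context across regions
        weights = torch.softmax(self.score(hidden), 1)   # (B, H*W, 1) region weights
        local = (weights * regions).sum(dim=1)           # re-weighted local feature
        global_feat = fmap.mean(dim=(2, 3))              # plain global average pool
        return torch.cat([local, global_feat], dim=1)    # local-global fusion

fmap = torch.randn(2, 512, 7, 7)
print(GRURegionAttention(512)(fmap).shape)  # torch.Size([2, 1024])
```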

