Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset

2018 ◽  
Vol 10 (6) ◽  
pp. 964 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most of the existing datasets are single-labeled, with each image in these datasets being annotated by a single label representing the most significant semantic content of the image. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a dense labeling remote sensing dataset termed "DLRSD". DLRSD contained a total of 17 classes, and each pixel of every image was assigned one of the 17 pre-defined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. The results demonstrated the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provided the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.
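The difference between the single-label and multi-label perspectives can be illustrated with a small sketch. The abstract does not state which metrics were used, so the two functions below are assumptions chosen for illustration: precision at k on a single label, and mean Jaccard overlap of label sets at k. The function names and the example labels are hypothetical.

```python
def single_label_precision_at_k(query_label, retrieved_labels, k):
    """Fraction of the top-k results whose single label equals the query's label."""
    top = retrieved_labels[:k]
    if not top:
        raise ValueError("need at least one retrieved result")
    return sum(1 for label in top if label == query_label) / len(top)


def multi_label_overlap_at_k(query_labels, retrieved_label_sets, k):
    """Mean Jaccard overlap between the query's label set and each top-k result.

    This is an illustrative metric, not necessarily the one used in the paper.
    """
    top = retrieved_label_sets[:k]
    if not top:
        raise ValueError("need at least one retrieved result")
    query = set(query_labels)
    scores = []
    for labels in top:
        result = set(labels)
        union = query | result
        # Two empty label sets are treated as a perfect match.
        scores.append(len(query & result) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical ranked results for one query image.
single = single_label_precision_at_k(
    "buildings", ["buildings", "beach", "buildings", "buildings"], k=4
)
multi = multi_label_overlap_at_k(
    {"buildings", "pavement", "trees"},
    [
        {"buildings", "pavement"},
        {"sand", "sea"},
        {"buildings", "pavement", "trees"},
        {"trees", "grass"},
    ],
    k=4,
)
print(single, round(multi, 4))
```

With a single label, a result either matches or does not. With label sets, a result that shares only some of the query's content still receives partial credit.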

2019 ◽  
Vol 11 (9) ◽  
pp. 1044 ◽  
Author(s):  
Wei Cui ◽  
Fei Wang ◽  
Xin He ◽  
Dongyou Zhang ◽  
Xuxiang Xu ◽  
...  

A comprehensive interpretation of remote sensing images involves not only remote sensing object recognition but also the recognition of spatial relations between objects. Especially in the case of different objects with the same spectrum, the spatial relationship can help interpret remote sensing objects more accurately. Compared with traditional remote sensing object recognition methods, deep learning has the advantages of high accuracy and strong generalizability regarding scene classification and semantic segmentation. However, it is difficult to simultaneously recognize remote sensing objects and their spatial relationships end-to-end by relying only on existing deep learning networks. To address this problem, we propose a multi-scale remote sensing image interpretation network, called the MSRIN. The architecture of the MSRIN is a parallel deep neural network based on a fully convolutional network (FCN), a U-Net, and a long short-term memory network (LSTM). The MSRIN recognizes remote sensing objects and their spatial relationships through three processes. First, the MSRIN defines a multi-scale remote sensing image caption strategy and simultaneously segments the same image using the FCN and U-Net on different spatial scales so that a two-scale hierarchy is formed. The outputs of the FCN and U-Net are masked to obtain the location and boundaries of remote sensing objects. Second, using an attention-based LSTM, the remote sensing image captions include the remote sensing objects (nouns) and their spatial relationships described in natural language. Finally, we designed a remote sensing object recognition and correction mechanism to build the relationship between nouns in captions and object mask graphs, using an attention weight matrix to transfer the spatial relationship from captions to object mask graphs.
In other words, the MSRIN simultaneously realizes the semantic segmentation of the remote sensing objects and their spatial relationship identification end-to-end. Experimental results demonstrated that the matching rate between samples and the mask graph increased by 67.37 percentage points, and the matching rate between nouns and the mask graph increased by 41.78 percentage points compared to before correction. The proposed MSRIN has achieved remarkable results.


2021 ◽  
Vol 13 (13) ◽  
pp. 2578
Author(s):  
Samir Touzani ◽  
Jessica Granderson

Advances in machine learning and computer vision, combined with increased access to unstructured data (e.g., images and text), have created an opportunity for automated extraction of building characteristics, cost-effectively and at scale. These characteristics are relevant to a variety of urban and energy applications, yet are time-consuming and costly to acquire with today’s manual methods. Several recent research studies have shown that, in comparison to more traditional methods based on a feature engineering approach, an end-to-end learning approach based on deep learning algorithms significantly improves the accuracy of automatic building footprint extraction from remote sensing images. However, these studies used limited benchmark datasets that have been carefully curated and labeled. How the accuracy of these deep learning-based approaches holds when using less curated training data has not received enough attention. The aim of this work is to leverage openly available data to automatically generate a larger training dataset with more variability in terms of regions and types of cities, which can be used to build more accurate deep learning models. In contrast to most benchmark datasets, the gathered data have not been manually curated. Thus, the training dataset is not perfectly clean in terms of remote sensing images exactly matching the ground truth building footprints. A workflow that includes data pre-processing, deep learning semantic segmentation modeling, and results post-processing is introduced and applied to a dataset that includes remote sensing images from 15 cities and five counties from various regions of the USA, covering 8,607,677 buildings. The accuracy of the proposed approach was measured on an out-of-sample testing dataset corresponding to 364,000 buildings from three USA cities. The results compared favorably to those obtained from Microsoft’s recently released US building footprint dataset.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Jian Huang ◽  
Shanhui Liu ◽  
Yutian Tang ◽  
Xiushan Zhang

With the continuous development of deep learning in computer vision, semantic segmentation technology is constantly employed for processing remote sensing images. For instance, it is a key technology for automatically marking important objects such as ships or port land in remote sensing images of port areas. However, existing supervised semantic segmentation models based on deep learning require a large number of training samples. Otherwise, they cannot correctly learn the characteristics of the target objects, which results in poor performance or even failure of the semantic segmentation task. Since target objects such as ships may move from time to time, it is nontrivial to collect enough samples to achieve satisfactory segmentation performance. This severely hinders the performance improvement of most existing augmentation methods. To tackle this problem, in this paper, we propose an object-level remote sensing image augmentation approach based on U-Net-based generative adversarial networks. Specifically, our proposed approach consists of two components: the semantic tag image generator and the U-Net GAN-based translator. To evaluate the effectiveness of the proposed approach, comprehensive experiments are conducted on the public dataset HRSC2016. State-of-the-art generative models, DCGAN, WGAN, and CycleGAN, are selected as baselines. According to the experimental results, our proposed approach significantly outperforms the baselines in terms of not only drawing the outlines of target objects but also capturing their meaningful details.


2018 ◽  
Vol 10 (8) ◽  
pp. 1220 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

In our paper [1], we presented a dense labeling dataset that can be used for not only single-label and multi-label remote sensing image retrieval but also pixel-based problems such as semantic segmentation.[...]


2021 ◽  
Vol 26 (1) ◽  
pp. 200-215
Author(s):  
Muhammad Alam ◽  
Jian-Feng Wang ◽  
Cong Guangpei ◽  
LV Yunrong ◽  
Yuanfang Chen

In recent years, the success of deep learning in natural scene image processing has boosted its application in the analysis of remote sensing images. In this paper, we apply Convolutional Neural Networks (CNN) to the semantic segmentation of remote sensing images. We improve the encoder-decoder CNN structure SegNet with index pooling and U-Net to make them suitable for multi-target semantic segmentation of remote sensing images. The results show that these two models have their own advantages and disadvantages in the segmentation of different objects. In addition, we propose an integrated algorithm that integrates these two models. Experimental results show that the presented integrated algorithm can exploit the advantages of both models for multi-target segmentation and achieve better segmentation than either of the two models.


2021 ◽  
Vol 13 (5) ◽  
pp. 869
Author(s):  
Zheng Zhuo ◽  
Zhong Zhou

In recent years, the amount of remote sensing imagery data has increased exponentially. The ability to quickly and effectively find the required images from massive remote sensing archives is the key to the organization, management, and sharing of remote sensing image information. This paper proposes a high-resolution remote sensing image retrieval method with Gabor-CA-ResNet and a split-based deep feature transform network. The main contributions include two points. (1) To handle the complex texture, diverse scales, and special viewing angles of remote sensing images, a Gabor-CA-ResNet network with ResNet as the backbone is proposed, which uses Gabor filters to represent the spatial-frequency structure of images and a channel attention (CA) mechanism to obtain more representative and discriminative deep features. (2) A split-based deep feature transform network is designed to divide the features extracted by the Gabor-CA-ResNet network into several segments and transform them separately, significantly reducing the dimensionality and the storage space of deep features. The experimental results on the UCM, WHU-RS, RSSCN7, and AID datasets show that, compared with the state-of-the-art methods, our method can obtain competitive performance, especially for remote sensing images with rare targets and complex textures.
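The split-and-transform idea in contribution (2) can be sketched in a few lines. This is only an illustration under stated assumptions: in the paper the per-segment transforms are part of a learned network, whereas here each segment is projected by a fixed, hypothetical weight matrix. The function name `split_transform` and the example weights are not from the source.

```python
def split_transform(feature, weights):
    """Split `feature` into len(weights) equal segments and project each one.

    weights[i] is a matrix (list of rows) with shape (out_dim, segment_length).
    The projected segments are concatenated into one shorter feature vector.
    """
    n_segments = len(weights)
    if n_segments == 0 or len(feature) % n_segments != 0:
        raise ValueError("feature length must be divisible by the number of segments")
    seg_len = len(feature) // n_segments
    output = []
    for i, matrix in enumerate(weights):
        segment = feature[i * seg_len:(i + 1) * seg_len]
        for row in matrix:
            if len(row) != seg_len:
                raise ValueError("each weight row must match the segment length")
            # One output value per row: dot product of the row with the segment.
            output.append(sum(w * x for w, x in zip(row, segment)))
    return output


# Hypothetical 8-D feature reduced to 2-D: two segments, each averaged to one value.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weights = [[[0.25, 0.25, 0.25, 0.25]], [[0.25, 0.25, 0.25, 0.25]]]
print(split_transform(feature, weights))
```

Transforming segments separately keeps each projection small: two 1x4 matrices hold 8 weights, while a single dense 2x8 projection would hold 16.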


2021 ◽  
Vol 13 (13) ◽  
pp. 2524
Author(s):  
Ziyi Chen ◽  
Dilong Li ◽  
Wentao Fan ◽  
Haiyan Guan ◽  
Cheng Wang ◽  
...  

Deep learning models have brought great breakthroughs in building extraction from high-resolution optical remote-sensing images. In recent research, the self-attention module has attracted wide attention in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. By tipping the balance between the encoding and decoding abilities, i.e., making the decoding network much more complex than the encoding network, the semantic segmentation ability is reinforced. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines self-attention and reconstruction-bias modules. In the encoding part, a self-attention module is added to learn the attention weights of the inputs. Through the self-attention module, the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets: the WHU and Massachusetts Building datasets. We achieve IoU scores of 89.39% and 73.49% for the WHU and Massachusetts Building datasets, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method’s results are satisfactory.
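For readers unfamiliar with the IoU score reported above, the following minimal sketch computes intersection over union for two binary building masks. The masks are made-up toy data, and the convention of returning 1.0 when both masks are empty is an assumption, not something stated in the abstract.

```python
def iou(pred, truth):
    """Intersection over union of two same-sized binary masks (nested lists of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy example: 2 pixels agree as "building", 4 pixels are marked in either mask.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(iou(predicted, ground_truth))
```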


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3232 ◽  
Author(s):  
Yan Liu ◽  
Qirui Ren ◽  
Jiahui Geng ◽  
Meng Ding ◽  
Jiangyun Li

Efficient and accurate semantic segmentation is the key technique for automatic remote sensing image analysis. While there have been many segmentation methods based on traditional handcrafted feature extractors, it is still challenging to process high-resolution and large-scale remote sensing images. In this work, a novel patch-wise semantic segmentation method with a new training strategy based on fully convolutional networks is presented to segment common land resources. First, to handle high-resolution images, the images are split into local patches and then a patch-wise network is built. Second, training data is preprocessed in several ways to meet the specific characteristics of remote sensing images, i.e., color imbalance, object rotation variations and lens distortion. Third, a multi-scale training strategy is developed to solve the severe scale variation problem. In addition, the impact of the conditional random field (CRF) is studied to improve the precision. The proposed method was evaluated on a dataset collected from a capital city in West China with the Gaofen-2 satellite. The dataset contains ten common land resources (Grassland, Road, etc.). The experimental results show that the proposed algorithm achieves 54.96% in terms of mean intersection over union (MIoU) and outperforms other state-of-the-art methods in remote sensing image segmentation.
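The first step, splitting a large image into local patches, might look like the sketch below. The abstract gives neither the patch size nor whether patches overlap, so non-overlapping tiles with smaller edge tiles are an assumption here, and `split_into_patches` is a hypothetical helper name.

```python
def split_into_patches(image, patch_h, patch_w):
    """Yield (top, left, patch) tiles covering a 2-D image given as nested lists.

    Tiles do not overlap. Tiles on the bottom and right edges are smaller
    when the image size is not a multiple of the patch size.
    """
    if patch_h <= 0 or patch_w <= 0:
        raise ValueError("patch size must be positive")
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, patch_h):
        for left in range(0, width, patch_w):
            patch = [row[left:left + patch_w] for row in image[top:top + patch_h]]
            yield top, left, patch


# Toy 3x5 "image" whose pixel values are their flat indices, tiled into 2x2 patches.
image = [[r * 5 + c for c in range(5)] for r in range(3)]
for top, left, patch in split_into_patches(image, 2, 2):
    print(top, left, patch)
```

Keeping the (top, left) offset with each patch lets the per-patch predictions be stitched back into a full-size segmentation map.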


2021 ◽  
Author(s):  
Cong Huang ◽  
Yao Yang ◽  
Huajun Wang ◽  
Yu Ma ◽  
Jinquan Zhao ◽  
...  
