Description Generation for Remote Sensing Images Using Attribute Attention Mechanism

Image captioning generates a semantic description of an image. It draws on image understanding and text mining and has made great progress in recent years. However, it is still a great challenge to bridge the “semantic gap” between low-level features and high-level semantics in remote sensing images, in spite of the improvement of image resolutions. In this paper, we present a new model with an attribute attention mechanism for the description generation of remote sensing images, and we explore the impact of the attributes extracted from remote sensing images on the attention mechanism. The results of our experiments demonstrate the validity of the proposed model. Compared against several state-of-the-art techniques, the proposed method obtains six higher scores and one slightly lower score on the Sydney Dataset and the Remote Sensing Image Caption Dataset (RSICD), and higher scores on all seven on the UCM Dataset for remote sensing image captioning, indicating that the proposed framework achieves robust performance for semantic description in high-resolution remote sensing images.


LAM: Remote Sensing Image Captioning with Label-Attention Mechanism

Remote Sensing ◽ 10.3390/rs11202349 ◽ 2019 ◽ Vol 11 (20) ◽ pp. 2349 ◽ Cited By ~ 2

Author(s): Zhengyuan Zhang ◽ Wenhui Diao ◽ Wenkai Zhang ◽ Menglong Yan ◽ Xin Gao ◽ ...

Keyword(s): Remote Sensing ◽ High Resolution ◽ Remote Sensing Image ◽ Image Features ◽ Attention Mechanism ◽ Word Embedding ◽ Remote Sensing Images ◽ Image Captioning ◽ Scoring Method ◽ Label Information

Significant progress has been made in remote sensing image captioning by encoder-decoder frameworks. The conventional attention mechanism is prevalent in this task but still has some drawbacks. The conventional attention mechanism only uses visual information about the remote sensing images without considering using the label information to guide the calculation of attention masks. To this end, a novel attention mechanism, namely Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally utilizes the label information of high-resolution remote sensing images to generate natural sentences to describe the given images. It is worth noting that, instead of high-level image features, the predicted categories’ word embedding vectors are adopted to guide the calculation of attention masks. Representing the content of images in the form of word embedding vectors can filter out redundant image features. In addition, it can also preserve pure and useful information for generating complete sentences. The experimental results from UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM can improve the model’s performance for describing high-resolution remote sensing images and obtain better S_m scores compared with other methods. The S_m score is a hybrid scoring method derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by the experiment of using true labels.
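The idea of letting a label's word embedding guide the attention mask can be illustrated with a small sketch. This is not the LAM formulation itself, which the abstract does not specify: it assumes scaled dot-product scoring, assumes the label embedding has the same dimensionality as the region features, and the names `label_guided_attention`, `regions` and `label` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def label_guided_attention(region_features, label_embedding):
    """Weight image regions by their similarity to a label embedding.

    region_features: list of region vectors (each a list of floats)
    label_embedding: word-embedding vector of the predicted category,
                     assumed to share the regions' dimensionality
    """
    dim = len(label_embedding)
    # Assumed scoring rule: scaled dot product of region and label vectors.
    scores = [
        sum(r * l for r, l in zip(region, label_embedding)) / math.sqrt(dim)
        for region in region_features
    ]
    mask = softmax(scores)  # attention mask over regions, sums to 1
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[i] for w, region in zip(mask, region_features))
        for i in range(dim)
    ]
    return mask, context

# Toy inputs: three regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
label = [2.0, 0.0]
mask, context = label_guided_attention(regions, label)
```

In this toy case the first region, which is most aligned with the label vector, receives the largest attention weight, so the context vector passed to a decoder would be dominated by it.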


Efficient Patch-Wise Semantic Segmentation for Large-Scale Remote Sensing Images

Sensors ◽ 10.3390/s18103232 ◽ 2018 ◽ Vol 18 (10) ◽ pp. 3232 ◽ Cited By ~ 17

Author(s): Yan Liu ◽ Qirui Ren ◽ Jiahui Geng ◽ Meng Ding ◽ Jiangyun Li

Keyword(s): Remote Sensing ◽ High Resolution ◽ Large Scale ◽ Semantic Segmentation ◽ Remote Sensing Image ◽ Training Data ◽ Land Resources ◽ Remote Sensing Images ◽ Training Strategy ◽ The Impact

Efficient and accurate semantic segmentation is the key technique for automatic remote sensing image analysis. While there have been many segmentation methods based on traditional hand-crafted feature extractors, it is still challenging to process high-resolution and large-scale remote sensing images. In this work, a novel patch-wise semantic segmentation method with a new training strategy based on fully convolutional networks is presented to segment common land resources. First, to handle high-resolution images, the images are split into local patches and a patch-wise network is built. Second, training data is preprocessed in several ways to meet the specific characteristics of remote sensing images, i.e., color imbalance, object rotation variations and lens distortion. Third, a multi-scale training strategy is developed to solve the severe scale variation problem. In addition, the impact of conditional random field (CRF) is studied to improve the precision. The proposed method was evaluated on a dataset collected from a capital city in West China with the Gaofen-2 satellite. The dataset contains ten common land resources (Grassland, Road, etc.). The experimental results show that the proposed algorithm achieves 54.96% in terms of mean intersection over union (MIoU) and outperforms other state-of-the-art methods in remote sensing image segmentation.
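Two steps named in this abstract, splitting an image into local patches and scoring a prediction by mean intersection over union, can be sketched in plain Python. This is an illustration only: non-overlapping patches and averaging over classes present in either map are assumptions, and the function names and toy label maps are hypothetical rather than taken from the paper.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grid (list of rows) into non-overlapping square patches.

    Edge patches are smaller when the image size is not a multiple of
    patch_size.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches

def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, target):
            for p, t in zip(p_row, t_row):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 4x4 label maps with three classes (0, 1, 2).
target = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 1, 1],
          [2, 2, 1, 1]]
pred = [[0, 0, 1, 1],
        [0, 1, 1, 1],
        [2, 2, 1, 1],
        [2, 2, 2, 1]]
patches = split_into_patches(target, 2)
score = mean_iou(pred, target, num_classes=3)
```

Here the per-class IoU values are 3/4, 7/9 and 4/5, so `score` is their average. In a patch-wise pipeline each patch would be segmented separately and the per-patch predictions stitched back together before scoring.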


A Remote Sensing Image Segmentation Method Based on Fusion Mechanism

Journal of Physics Conference Series ◽ 10.1088/1742-6596/2138/1/012016 ◽ 2021 ◽ Vol 2138 (1) ◽ pp. 012016

Author(s): Shuangling Zhu ◽ Guli Nazi·Aili Mujiang ◽ Huxidan Jumahong ◽ Pazi Laiti·Nuer Maiti

Keyword(s): Remote Sensing ◽ Semantic Segmentation ◽ Remote Sensing Image ◽ Detection Algorithm ◽ Attention Mechanism ◽ Segmentation Method ◽ Remote Sensing Images ◽ Convolutional Network ◽ Input Layer ◽ Basic Network

A U-Net convolutional network structure is fully capable of completing end-to-end training with extremely little data and can achieve better results. When a convolutional network has short links between layers near the input and layers near the output, it can be trained in a deeper, more accurate and more effective way. This paper mainly proposes a high-resolution remote sensing image change detection algorithm based on a dense convolutional channel attention mechanism. The detection algorithm uses a U-Net network module as the basic network to extract features, combines it with the Dense-Net dense module to enhance U-Net, and introduces the dense convolution channel attention mechanism into the basic convolution unit to highlight important features, thus completing semantic segmentation of dense convolutional remote sensing images. Simulation results have verified the effectiveness and robustness of this study.
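How a channel attention mechanism can "highlight important features" is easiest to see in a small example. The sketch below assumes a squeeze-and-excitation style gate (global average pooling, one linear layer, sigmoid), which the abstract does not confirm as the paper's design; the `weights` and `biases` values are hypothetical stand-ins for learned parameters.

```python
import math

def channel_attention(feature_maps, weights, biases):
    """Rescale each channel by a gate in (0, 1).

    feature_maps: list of C channels, each a 2D list of floats
    weights: C x C matrix mapping pooled descriptors to gate logits
             (hypothetical parameters; a real model learns them)
    biases: list of C gate biases
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excite: linear layer followed by a sigmoid gate per channel.
    gates = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * p for w, p in zip(w_row, pooled)) + b
        gates.append(1.0 / (1.0 + math.exp(-logit)))
    # Rescale: multiply every pixel of a channel by that channel's gate.
    scaled = [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature_maps)
    ]
    return gates, scaled

features = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0, mean 1.0
            [[0.0, 2.0], [2.0, 0.0]]]   # channel 1, mean 1.0
weights = [[2.0, 0.0], [0.0, -2.0]]
biases = [0.0, 0.0]
gates, scaled = channel_attention(features, weights, biases)
```

With these toy parameters the first channel is passed through almost unchanged while the second is strongly attenuated, which is the sense in which the gate emphasizes some feature channels over others.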


Improved SRCNN remote sensing image spatio-temporal fusion based on multi-stream data input and attention mechanism: taking Landsat8 and MODIS remote sensing images as examples

International Conference on Signal Image Processing and Communication (ICSIPC 2021) ◽ 10.1117/12.2600413 ◽ 2021

Author(s): Ping Liu ◽ Xiangru Jia ◽ Bo Li ◽ Xinrui Li ◽ Feilong Wang ◽ ...

Keyword(s): Remote Sensing ◽ Remote Sensing Image ◽ Attention Mechanism ◽ Data Input ◽ Stream Data ◽ Remote Sensing Images ◽ Spatio Temporal


A NOVEL FRAMEWORK FOR REMOTE SENSING IMAGE SCENE CLASSIFICATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-archives-xlii-3-657-2018 ◽ 2018 ◽ Vol XLII-3 ◽ pp. 657-663 ◽ Cited By ~ 5

Author(s): S. Jiang ◽ H. Zhao ◽ W. Wu ◽ Q. Tan

Keyword(s): Neural Network ◽ Remote Sensing ◽ Semantic Category ◽ Distribution Patterns ◽ Remote Sensing Image ◽ Scene Classification ◽ Remote Sensing Images ◽ Training Stage ◽ Feature Extractor ◽ High Level

High resolution remote sensing (HRRS) image scene classification aims to label an image with a specific semantic category. HRRS images contain more details of the ground objects and their spatial distribution patterns than low spatial resolution images. Scene classification can bridge the gap between low-level features and high-level semantics. It can be applied in urban planning, target detection and other fields. This paper proposes a novel framework for HRRS image scene classification. This framework combines a convolutional neural network (CNN) and XGBoost, using the CNN as a feature extractor and XGBoost as a classifier. The framework is evaluated on two different HRRS image datasets: the UC-Merced dataset and the NWPU-RESISC45 dataset. Our framework achieved satisfying accuracies on the two datasets, 95.57% and 83.35% respectively. The experimental results show that our framework is effective for remote sensing image classification. Furthermore, we believe this framework will be more practical for further HRRS scene classification, since it costs less time in the training stage.


Ship Object Detection of Remote Sensing Image Based on Visual Attention

Remote Sensing ◽ 10.3390/rs13163192 ◽ 2021 ◽ Vol 13 (16) ◽ pp. 3192

Author(s): Yuxin Dong ◽ Fukun Chen ◽ Shuang Han ◽ Hao Liu

Keyword(s): Neural Network ◽ Remote Sensing ◽ Visual Attention ◽ Object Detection ◽ Remote Sensing Image ◽ Attention Mechanism ◽ Remote Sensing Images ◽ Data Set ◽ Ship Detection ◽ Visual Attention Mechanism

At present, reliable and precise ship detection in high-resolution optical remote sensing images affected by wave clutter, thin clouds, and islands under complex sea conditions is still challenging. At the same time, object detection algorithms in satellite remote sensing images are challenged by color, aspect ratio, complex background, and angle variability. Even the results obtained based on the latest convolutional neural network (CNN) method are not satisfactory. In order to obtain more accurate ship detection results, this paper proposes a remote sensing image ship object detection method based on a brainlike visual attention mechanism. We refer to the robust expression mode of the human brain, design a vector field filter with active rotation capability, and explicitly encode the direction information of the remote sensing object in the neural network. The progressive enhancement learning model guided by the visual attention mechanism is used to dynamically solve the problem, and the object can be discovered and detected through time–space information. To verify the effectiveness of the proposed method, a remote sensing ship object detection data set is established, and the proposed method is compared with other state-of-the-art methods on the established data set. Experiments show that the object detection accuracy of this method and the ability to capture image details have been improved. Compared with other models, the average intersection rate of the joint is 80.12%, which shows a clear advantage. The proposed method is fast enough to meet the needs of ship detection in remote sensing images.


A Multi-Level Attention Model for Remote Sensing Image Captions

Remote Sensing ◽ 10.3390/rs12060939 ◽ 2020 ◽ Vol 12 (6) ◽ pp. 939 ◽ Cited By ~ 1

Author(s): Yangyang Li ◽ Shuangkang Fang ◽ Licheng Jiao ◽ Ruijiao Liu ◽ Ronghua Shang

Keyword(s): Remote Sensing ◽ State Of The Art ◽ Remote Sensing Image ◽ Attention Mechanism ◽ Complex Task ◽ Human Beings ◽ Image Captioning ◽ Attention Model ◽ Multi Level ◽ Image Caption

The task of image captioning involves the generation of a sentence that can describe an image appropriately, which is the intersection of computer vision and natural language. Although the research on remote sensing image captions has just started, it has great significance. The attention mechanism is inspired by the way humans think, which is widely used in remote sensing image caption tasks. However, the attention mechanism currently used in this task is mainly aimed at images, which is too simple to express such a complex task well. Therefore, in this paper, we propose a multi-level attention model, which is a closer imitation of attention mechanisms of human beings. This model contains three attention structures, which represent the attention to different areas of the image, the attention to different words, and the attention to vision and semantics. Experiments show that our model has achieved better results than before, which is currently state-of-the-art. In addition, the existing datasets for remote sensing image captioning contain a large number of errors. Therefore, in this paper, a lot of work has been done to modify the existing datasets in order to promote the research of remote sensing image captioning.


MILL: Channel Attention–based Deep Multiple Instance Learning for Landslide Recognition

ACM Transactions on Multimedia Computing Communications and Applications ◽ 10.1145/3454009 ◽ 2021 ◽ Vol 17 (2s) ◽ pp. 1-11

Author(s): Xiaochuan Tang ◽ Mingzhe Liu ◽ Hao Zhong ◽ Yuanzhen Ju ◽ Weile Li ◽ ...

Keyword(s): Neural Network ◽ Remote Sensing ◽ Large Scale ◽ Remote Sensing Image ◽ Disaster Risk ◽ Multiple Instance Learning ◽ Remote Sensing Images ◽ Loess Area ◽ Remote Sensing Image Classification ◽ Natural Disaster Risk

Landslide recognition is widely used in natural disaster risk management. Traditional landslide recognition is mainly conducted by geologists, which is accurate but inefficient. This article introduces multiple instance learning (MIL) to perform automatic landslide recognition. An end-to-end deep convolutional neural network is proposed, referred to as Multiple Instance Learning–based Landslide classification (MILL). First, MILL uses a large-scale remote sensing image classification dataset to build pre-trained networks for landslide feature extraction. Second, MILL extracts instances and assigns instance labels without pixel-level annotations. Third, MILL uses a new channel attention–based MIL pooling function to map instance-level labels to a bag-level label. We apply MIL to detect landslides in a loess area. Experimental results demonstrate that MILL is effective in identifying landslides in remote sensing images.
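The third step, mapping instance-level predictions to a single bag-level label, can be sketched with a generic attention-weighted pooling function. This is not the channel attention pooling proposed in MILL, whose details the abstract does not give: it assumes a plain softmax over per-instance importance scores, and `attention_logits` and the 0.5 decision threshold are hypothetical.

```python
import math

def attention_mil_pooling(instance_probs, attention_logits):
    """Combine instance-level probabilities into one bag-level probability.

    instance_probs: landslide probability per instance (image patch)
    attention_logits: unnormalised importance score per instance
                      (hypothetical; a trained network would produce them)
    """
    m = max(attention_logits)
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over instances
    bag_prob = sum(w * p for w, p in zip(weights, instance_probs))
    return bag_prob, weights

# One bag (image) made of three instances; the second looks like a landslide.
probs = [0.1, 0.9, 0.2]
logits = [0.0, 3.0, 0.0]
bag_prob, weights = attention_mil_pooling(probs, logits)
is_landslide = bag_prob >= 0.5  # assumed decision threshold
```

Plain mean pooling over the same three instances would give 0.4 and miss the landslide, whereas the attention weights let the single confident instance dominate the bag-level decision.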


Improved SinGAN Integrated with an Attentional Mechanism for Remote Sensing Image Classification

Remote Sensing ◽ 10.3390/rs13091713 ◽ 2021 ◽ Vol 13 (9) ◽ pp. 1713

Author(s): Songwei Gu ◽ Rui Zhang ◽ Hongxia Luo ◽ Mengyao Li ◽ Huamei Feng ◽ ...

Keyword(s): Remote Sensing ◽ Real Life ◽ Attention Mechanism ◽ Training Data ◽ Generative Adversarial Networks ◽ Natural Image ◽ Remote Sensing Images ◽ Training Time ◽ Adversarial Networks ◽ Remote Sensing Image Classification

Deep learning is an important research method in the remote sensing field. However, samples of remote sensing images are relatively few in real life, and those with markers are scarce. Many neural networks represented by Generative Adversarial Networks (GANs) can learn from real samples to generate pseudosamples, unlike traditional methods that often require more time and manpower to obtain samples. However, the generated pseudosamples often have poor realism and cannot be reliably used as the basis for various analyses and applications in the field of remote sensing. To address the abovementioned problems, a pseudolabeled sample generation method is proposed in this work and applied to scene classification of remote sensing images. The improved unconditional generative model that can be learned from a single natural image (Improved SinGAN) with an attention mechanism can effectively generate enough pseudolabeled samples from a single remote sensing scene image sample. Pseudosamples generated by the improved SinGAN model have stronger realism and require relatively less training time, and the extracted features are easily recognized in the classification network. The improved SinGAN can better identify subjects from images with complex ground scenes compared with the original network. This mechanism solves the problem of geographic errors of generated pseudosamples. This study incorporated the generated pseudosamples into training data for the classification experiment. The result showed that the SinGAN model with the integration of the attention mechanism can better guarantee feature extraction of the training data. Thus, the quality of the generated samples is improved and the classification accuracy and stability of the classification network are also enhanced.


Semantic Relation Model and Dataset for Remote Sensing Scene Understanding

ISPRS International Journal of Geo-Information ◽ 10.3390/ijgi10070488 ◽ 2021 ◽ Vol 10 (7) ◽ pp. 488

Author(s): Peng Li ◽ Dezheng Zhang ◽ Aziguli Wulamu ◽ Xin Liu ◽ Peng Chen

Keyword(s): Remote Sensing ◽ Scene Understanding ◽ Deep Understanding ◽ Remote Sensing Images ◽ Convolutional Network ◽ Scene Graph ◽ Multi Scale ◽ Relationship Extraction ◽ High Level ◽ Graph Generation

A deep understanding of our visual world is more than an isolated perception of a series of objects, and the relationships between them also contain rich semantic information. Especially for satellite remote sensing images, the span is so large that the various objects are always of different sizes and complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthening the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attentional mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthen the cognitive ability of our model. Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote the research of scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
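Dilated convolution, which this abstract uses to gather multi-scale context, widens the receptive field of a kernel without adding weights. The one-dimensional sketch below only illustrates that general operation; it is not the MSFN module, and the function name, signal and kernel are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D cross-correlation with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation  # receptive field minus one
    return [
        sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
        for start in range(len(signal) - span)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = dilated_conv1d(signal, kernel, dilation=1)  # receptive field 3
wide = dilated_conv1d(signal, kernel, dilation=2)   # receptive field 5
```

The same three-tap kernel compares samples two positions apart when `dilation=1` and four positions apart when `dilation=2`, so stacking or combining several dilation rates lets a model mix context at several scales.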
