Multi-label semantic feature fusion for remote sensing image captioning

Author(s):  
Shuang Wang ◽  
Xiutiao Ye ◽  
Yu Gu ◽  
Jihui Wang ◽  
Yun Meng ◽  
...  
2019 ◽  
Vol 11 (20) ◽  
pp. 2349 ◽  
Author(s):  
Zhengyuan Zhang ◽  
Wenhui Diao ◽  
Wenkai Zhang ◽  
Menglong Yan ◽  
Xin Gao ◽  
...  

Significant progress has been made in remote sensing image captioning with encoder-decoder frameworks. The conventional attention mechanism is prevalent in this task but has a notable drawback: it computes attention masks from visual information alone, without using label information to guide the calculation. To this end, a novel attention mechanism, namely the Label-Attention Mechanism (LAM), is proposed in this paper. LAM additionally utilizes the label information of high-resolution remote sensing images to generate natural sentences describing the given images. It is worth noting that the word embedding vectors of the predicted categories, rather than high-level image features, are adopted to guide the calculation of attention masks. Representing image content as word embedding vectors filters out redundant image features while preserving pure, useful information for generating complete sentences. Experimental results on UCM-Captions, Sydney-Captions and RSICD demonstrate that LAM improves the model's performance in describing high-resolution remote sensing images and obtains better S_m scores than other methods, where S_m is a hybrid score derived from the AI Challenge 2017 scoring method. In addition, the validity of LAM is verified by an experiment using true labels.
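The core idea of label-guided attention can be sketched as follows: the score for each image region depends not only on the visual feature and the decoder hidden state, but also on the word embeddings of the predicted category labels. This is a minimal NumPy sketch of additive attention under that assumption; all names, weight matrices, and dimensions are illustrative and not taken from the paper.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def label_guided_attention(features, label_embs, hidden, Wv, Wl, Wh, w):
    """Sketch of a label-guided attention step.

    features:   (num_regions, d_v) region-level visual features
    label_embs: (num_labels, d_e) word embeddings of predicted labels
    hidden:     (d_h,) current decoder hidden state
    Wv, Wl, Wh: projection matrices into a shared attention space (d_a, *)
    w:          (d_a,) scoring vector

    The label embeddings (not high-level image features) contribute
    to the attention scores, so the mask is guided by semantic labels.
    """
    label_ctx = label_embs.mean(axis=0)            # pooled label embedding
    scores = np.array([
        w @ np.tanh(Wv @ v + Wl @ label_ctx + Wh @ hidden)
        for v in features                          # one score per region
    ])
    alpha = softmax(scores)                        # attention mask over regions
    context = alpha @ features                     # weighted visual context
    return alpha, context
```

The returned mask `alpha` sums to one and weights the visual features that feed the decoder at the current step; how the pooled label embedding is actually fused in LAM may differ from this additive form.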


2020 ◽  
Vol 12 (11) ◽  
pp. 1874
Author(s):  
Kun Fu ◽  
Yang Li ◽  
Wenkai Zhang ◽  
Hongfeng Yu ◽  
Xian Sun

The encoder–decoder framework has been widely used in the remote sensing image captioning task. When remote sensing images with specific characteristics must be retrieved via their generated descriptions, richer sentences improve the retrieval results. However, the Long Short-Term Memory (LSTM) network used in decoders still loses some image information over time when the generated caption is long. In this paper, we present a new model component named the Persistent Memory Mechanism (PMM), which expands the information storage capacity of the LSTM with an external memory: a memory matrix of predetermined size that stores all LSTM hidden-layer vectors produced before the current time step. At each time step, the PMM searches the external memory for previous information related to the current input, processes the captured long-term information, and predicts the next word together with the current information; it then updates the memory with the input information. This method recovers long-term information that the LSTM has dropped but that is useful for caption generation. By applying this method to image captioning, our CIDEr scores on the UCM-Captions, Sydney-Captions, and RSICD datasets increased by 3%, 5%, and 7%, respectively.
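The read/write cycle described above — attend over all stored hidden states, return a long-term context vector, then append the current hidden state — can be sketched in NumPy. This is a minimal illustration assuming simple dot-product attention for the read; the class name, the similarity function, and the unbounded memory are assumptions, not details from the paper (which fixes the memory matrix size in advance).

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

class PersistentMemory:
    """Sketch of an external memory holding past LSTM hidden states.

    read():  attends over all stored states and returns the ones most
             relevant to the current hidden state as a context vector.
    write(): appends the current hidden state to the memory matrix.
    """
    def __init__(self, hidden_dim):
        self.memory = np.empty((0, hidden_dim))   # memory matrix, one row per step

    def read(self, hidden):
        if len(self.memory) == 0:                 # nothing stored yet
            return np.zeros_like(hidden)
        weights = softmax(self.memory @ hidden)   # dot-product relevance
        return weights @ self.memory              # long-term context vector

    def write(self, hidden):
        self.memory = np.vstack([self.memory, hidden])
```

At each decoding step the decoder would call `read` before predicting the next word and `write` afterwards, so information dropped from the LSTM's own state remains retrievable from the matrix.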


2021 ◽  
Vol 29 (11) ◽  
pp. 2672-2682
Author(s):  
Xin CHEN ◽  
Min-jie WAN ◽  
Chao MA ◽  
Qian CHEN ◽  
...  

Author(s):  
Zhengyuan Zhang ◽  
Wenkai Zhang ◽  
Menglong Yan ◽  
Xin Gao ◽  
Kun Fu ◽  
...  

Author(s):  
Yun Meng ◽  
Yu Gu ◽  
Xiutiao Ye ◽  
Jingxian Tian ◽  
Shuang Wang ◽  
...  

2019 ◽  
Vol 56 (12) ◽  
pp. 121003
Author(s):  
Qiuhan Jin ◽  
Yangping Wang ◽  
Jingyu Yang
