Cross-Modal Feature Fusion Retrieval for Remote Sensing Image-Voice Retrieval

Author(s):  
Rui Yang ◽  
Yu Gu ◽  
Yu Liao ◽  
Huan Zhang ◽  
Yingzhi Sun ◽  
...  
2021 ◽  
Vol 13 (10) ◽  
pp. 1950
Author(s):  
Cuiping Shi ◽  
Xin Zhao ◽  
Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted with multi convolution cooperation. Then, the weights of the feature are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depth separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 4673-4687
Author(s):  
Jixiang Zhao ◽  
Shanwei Liu ◽  
Jianhua Wan ◽  
Muhammad Yasir ◽  
Huayu Li

2019 ◽  
Vol 56 (11) ◽  
pp. 111001
Author(s):  
贺琪 Qi He ◽  
李瑶 Yao Li ◽  
宋巍 Wei Song ◽  
黄冬梅 Dongmei Huang ◽  
何盛琪 Shengqi He ◽  
...  

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1999 ◽  
Author(s):  
Donghang Yu ◽  
Qing Xu ◽  
Haitao Guo ◽  
Chuan Zhao ◽  
Yuzhun Lin ◽  
...  

Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote scene classification can greatly improve. This could be applied to any remote sensing image classification task.


2018 ◽  
Vol 10 (12) ◽  
pp. 1970 ◽  
Author(s):  
Kun Fu ◽  
Wanxuan Lu ◽  
Wenhui Diao ◽  
Menglong Yan ◽  
Hao Sun ◽  
...  

Binary segmentation in remote sensing aims to obtain binary prediction mask classifying each pixel in the given image. Deep learning methods have shown outstanding performance in this task. These existing methods in fully supervised manner need massive high-quality datasets with manual pixel-level annotations. However, the annotations are generally expensive and sometimes unreliable. Recently, using only image-level annotations, weakly supervised methods have proven to be effective in natural imagery, which significantly reduce the dependence on manual fine labeling. In this paper, we review existing methods and propose a novel weakly supervised binary segmentation framework, which is capable of addressing the issue of class imbalance via a balanced binary training strategy. Besides, a weakly supervised feature-fusion network (WSF-Net) is introduced to adapt to the unique characteristics of objects in remote sensing image. The experiments were implemented on two challenging remote sensing datasets: Water dataset and Cloud dataset. Water dataset is acquired by Google Earth with a resolution of 0.5 m, and Cloud dataset is acquired by Gaofen-1 satellite with a resolution of 16 m. The results demonstrate that using only image-level annotations, our method can achieve comparable results to fully supervised methods.


2021 ◽  
Vol 2021 ◽  
pp. 1-20
Author(s):  
Jiaming Xue ◽  
Shun Xiong ◽  
Chaoguang Men ◽  
Zhiming Liu ◽  
Yongmei Liu

Remote-sensing images play a crucial role in a wide range of applications and have been receiving significant attention. In recent years, great efforts have been made in developing various methods for intelligent interpretation of remote-sensing images. Generally speaking, machine learning-based methods of remote-sensing image interpretation require a large number of labeled samples and there are still not enough annotated datasets in the field of remote sensing. However, manual annotation of remote-sensing images is usually labor-intensive and requires expert knowledge and the accuracy of annotation results is relatively low. The goal of this paper is to propose a novel tile-level annotation method of remote-sensing images to obtain remote-sensing datasets which are well-labeled and contain accurate semantic concepts. Firstly, we use a set of images with defined semantic concepts to represent the training set and divide them into several nonoverlapping regions. Secondly, the color features, texture features, and spatial features of each region are extracted, and discriminative features are obtained by the weight optimization feature fusion method. Then, the features are quantized into visual words by applying a density-based clustering center selection method and an isolated feature point elimination method. And the remote-sensing images can be represented by a series of visual words. Finally, the LDA model is used to calculate the probabilities of semantic categories for each region. The experiments are conducted on remote-sensing images which demonstrate that our proposed method can achieve good performance on remote-sensing image tile-level annotation. The implications of our research can obtain annotated datasets with accurate semantic concepts for intelligent interpretation of remote-sensing images.


Sign in / Sign up

Export Citation Format

Share Document