A Dual-Model Architecture with Grouping-Attention-Fusion for Remote Sensing Scene Classification

2021 ◽  
Vol 13 (3) ◽  
pp. 433
Author(s):  
Junge Shen ◽  
Tong Zhang ◽  
Yichen Wang ◽  
Ruxin Wang ◽  
Qi Wang ◽  
...  

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification a challenging task. The performance is highly dependent on the capacity of the scene representation as well as the discriminability of the classifier. Although multiple models possess better properties than a single model in these respects, the fusion strategy for these models is a key component in maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve the performance of scene classification. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy fuses the features of the CNNs in a fine-grained, multi-scale manner, enhancing the resultant representation of the scene. Moreover, to address the issue of similar appearances between different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets: the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
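As a concrete illustration of the grouping-attention-fusion idea, the following PyTorch sketch splits the concatenated features of two backbones into channel groups and re-weights each group with a learned attention gate. All names (GroupingAttentionFusion, n_groups, the squeeze-and-excitation style gates) are assumptions for the example, not the authors' implementation.

```python
# A minimal sketch of grouping-attention-fusion: features from two backbones
# are concatenated, split into channel groups, and each group is re-weighted
# by its own learned attention gate before the groups are re-assembled.
import torch
import torch.nn as nn

class GroupingAttentionFusion(nn.Module):
    def __init__(self, channels_a, channels_b, n_groups=4):
        super().__init__()
        fused = channels_a + channels_b
        assert fused % n_groups == 0, "fused channels must split evenly"
        self.group_dim = fused // n_groups
        # one squeeze-and-excitation style gate per channel group (assumption)
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(self.group_dim, self.group_dim, kernel_size=1),
                nn.Sigmoid(),
            ) for _ in range(n_groups)
        )

    def forward(self, feat_a, feat_b):
        # feat_a, feat_b: (N, C_a, H, W) and (N, C_b, H, W), same spatial size
        x = torch.cat([feat_a, feat_b], dim=1)
        groups = torch.split(x, self.group_dim, dim=1)
        attended = [g * gate(g) for g, gate in zip(groups, self.gates)]
        return torch.cat(attended, dim=1)
```

The discriminative loss described above is, in spirit, similar to a center loss that pulls features toward their class center while pushing the centers apart; the authors' exact formulation may differ.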

2021 ◽  
Vol 13 (10) ◽  
pp. 1925
Author(s):  
Shengzhou Xiong ◽  
Yihua Tan ◽  
Yansheng Li ◽  
Cai Wen ◽  
Pei Yan

Object detection in remote sensing images (RSIs) is one of the basic tasks in automatic remote sensing image interpretation. In recent years, deep object detection frameworks developed for natural scene images (NSIs) have been introduced to RSIs, and detection performance has improved significantly thanks to their powerful feature representation. However, many challenges remain owing to the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which may cover less than five percent of the pixels of large objects. Existing algorithms generally address this problem with multi-scale feature fusion based on a feature pyramid, but the benefits of this strategy are limited because the location information of small objects fades from the feature maps by the time detection is performed at the end of the network. In this study, we propose a subtask attention network (StAN), which handles the detection task directly on shallow layers of the network. First, StAN contains one shared feature branch and two subtask attention branches, a semantic auxiliary subtask and a detection subtask, based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features, in consideration of small objects. Third, an attention map guidance mechanism is proposed to optimize the network while preserving its identification ability. Fourth, a multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW), and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. Experimental results on the NWPU VHR-10 and DOTA datasets demonstrate that the proposed algorithm achieves state-of-the-art performance and reduces the missed detection of small objects. Ablation experiments further verify the effectiveness of MdS, GMulW, and TPA.
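A heavily simplified PyTorch sketch of this subtask-attention pattern is given below: a shared shallow branch feeds a semantic auxiliary head whose output is turned into a spatial attention map that guides the detection head on low-level features. Module names, channel widths, and the 5-channel detection output are illustrative assumptions, not the StAN reference code (MdS, GMulW, and TPA are omitted).

```python
# A simplified sketch of subtask attention: the semantic auxiliary subtask
# produces a spatial attention map that re-weights the shared low-level
# features before the detection subtask consumes them.
import torch
import torch.nn as nn

class SubtaskAttentionStub(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64, n_classes=10):
        super().__init__()
        # shared shallow feature branch (kept shallow so small objects survive)
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # semantic auxiliary subtask: per-pixel class scores
        self.semantic_head = nn.Conv2d(feat_ch, n_classes, 1)
        # spatial attention map derived from the semantic output
        self.attention = nn.Sequential(nn.Conv2d(n_classes, 1, 1), nn.Sigmoid())
        # detection subtask on attended low-level features
        # (5 = 4 box offsets + objectness; an assumption for the example)
        self.det_head = nn.Conv2d(feat_ch, 5, 1)

    def forward(self, x):
        feats = self.shared(x)
        sem = self.semantic_head(feats)
        attn = self.attention(sem)          # (N, 1, H, W) spatial attention
        det = self.det_head(feats * attn)   # detection guided by semantics
        return det, sem
```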


2019 ◽  
Vol 11 (13) ◽  
pp. 1594 ◽  
Author(s):  
Heqian Qiu ◽  
Hongliang Li ◽  
Qingbo Wu ◽  
Fanman Meng ◽  
King Ngi Ngan ◽  
...  

Object detection is a significant and challenging problem in remote sensing and image analysis. However, most existing methods tend to miss or incorrectly localize objects because of their varying sizes and aspect ratios. In this paper, we propose a novel end-to-end Adaptively Aspect Ratio Multi-Scale Network (A²RMNet) to solve this problem. On the one hand, we design a multi-scale feature gate fusion network, composed of gate fusion modules, refine blocks, and region proposal networks, to adaptively integrate the multi-scale features of objects. On the other hand, an aspect ratio attention network is leveraged to preserve the aspect ratios of objects, which alleviates the excessive shape distortion caused by aspect ratio changes during training. Experiments show that the proposed A²RMNet significantly outperforms previous state-of-the-art methods on the DOTA, NWPU VHR-10, RSOD, and UCAS-AOD datasets by 5.73%, 7.06%, 3.27%, and 2.24%, respectively.
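The gate fusion module can be pictured as a learned, position-wise convex combination of pyramid levels. The sketch below is one plausible reading of that pattern; the module name, the per-level 1x1 gates, and the softmax-over-levels design are chosen for illustration rather than taken from the A²RMNet paper.

```python
# A hedged sketch of gate-style multi-scale fusion: each pyramid level
# predicts a gate map, the gates are normalized across levels with a softmax,
# and the resized levels are summed under those weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GateFusion(nn.Module):
    def __init__(self, channels, n_levels=3):
        super().__init__()
        self.gates = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(n_levels)
        )

    def forward(self, feats):
        # feats: list of (N, C, H_i, W_i) maps from different pyramid levels
        target = feats[0].shape[-2:]
        resized = [F.interpolate(f, size=target, mode="bilinear",
                                 align_corners=False) for f in feats]
        # softmax across levels so the gates form a convex combination
        logits = torch.stack([g(f) for g, f in zip(self.gates, resized)], dim=0)
        weights = torch.softmax(logits, dim=0)
        return sum(w * f for w, f in zip(weights, resized))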


2019 ◽  
Vol 11 (21) ◽  
pp. 2504 ◽  
Author(s):  
Jun Zhang ◽  
Min Zhang ◽  
Lukui Shi ◽  
Wenjie Yan ◽  
Bin Pan

Scene classification is one of the foundations of automatic remote sensing image interpretation. Recently, deep convolutional neural networks have shown promising performance in high-resolution remote sensing scene classification. In general, most researchers directly use the raw deep features extracted from convolutional networks to classify scenes. However, this strategy considers only single-scale features, which cannot describe both the local and global characteristics of images. In fact, the dissimilarity of scene targets within the same category may prevent convolutional features from classifying them into the same category, and the similarity of global features across different categories may likewise cause fully connected layer features to fail to distinguish them. To address these issues, we propose a scene classification method based on multi-scale deep feature representation (MDFR), which makes two main contributions: (1) region-based feature selection and representation; and (2) multi-scale feature fusion. The proposed method first filters the multi-scale deep features extracted from pre-trained convolutional networks, and then fuses these features via two efficient fusion methods. Our method exploits the complementarity between local and global features by effectively combining features of different scales and discarding redundant information. Experimental results on three benchmark high-resolution remote sensing image datasets indicate that the proposed method is comparable to state-of-the-art algorithms.
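One simple way to realize such a multi-scale deep feature representation is to pool the outputs of several stages of a pre-trained backbone and fuse them by concatenation before the classifier, as in the hedged sketch below. The choice of ResNet-50 stages and concatenation-based fusion are assumptions for illustration, not the exact MDFR pipeline (which additionally filters redundant features region by region).

```python
# A minimal sketch of multi-scale deep features from a pre-trained backbone:
# globally pooled features from four ResNet stages are concatenated so the
# classifier sees both early (local) and late (global) representations.
# Requires torchvision >= 0.13 for the weights enum.
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

class MultiScaleFeatures(nn.Module):
    def __init__(self, n_classes=21):  # e.g. 21 classes in the UCM dataset
        super().__init__()
        net = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stages = nn.ModuleList([net.layer1, net.layer2,
                                     net.layer3, net.layer4])
        self.pool = nn.AdaptiveAvgPool2d(1)
        dims = 256 + 512 + 1024 + 2048   # ResNet-50 stage widths
        self.fc = nn.Linear(dims, n_classes)

    def forward(self, x):
        x = self.stem(x)
        pooled = []
        for stage in self.stages:
            x = stage(x)
            pooled.append(self.pool(x).flatten(1))  # one global vector per scale
        return self.fc(torch.cat(pooled, dim=1))    # fuse local and global cues
```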


2019 ◽  
Vol 11 (18) ◽  
pp. 2095 ◽  
Author(s):  
Kun Fu ◽  
Zhuo Chen ◽  
Yue Zhang ◽  
Xian Sun

In recent years, deep learning has led to remarkable breakthroughs in object detection in remote sensing images. In practice, two-stage detectors achieve high detection accuracy but are slow, whereas one-stage detectors collapse the two-stage pipeline into a single step and are faster, but at the cost of lower detection accuracy. Enhancing the capability of feature representation is one way to improve the detection accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with an enhanced capability of feature representation, built on two proposed structures: a dual top-down module and a dense-connected inception module. The former efficiently utilizes multi-scale features from multiple layers of the backbone network. The latter both widens and deepens the network to enhance feature representation at limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on the horizontal bounding box detection task of the challenging DOTA dataset and obtained 73.49% mean Average Precision (mAP), achieving state-of-the-art performance. Furthermore, our method runs significantly faster than the best public two-stage detector on the DOTA dataset.
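The dense-connected inception module can be read as inception-style parallel branches with dense connectivity between them, so each branch also sees the outputs of the earlier ones. The following sketch shows one plausible form of this idea; the branch widths and kernel sizes are illustrative choices rather than the paper's configuration.

```python
# A rough sketch of a dense-connected inception block: branches with growing
# receptive fields, where branch i consumes the input plus the outputs of all
# previous branches (dense connectivity), widening and deepening the block.
import torch
import torch.nn as nn

class DenseInception(nn.Module):
    def __init__(self, in_ch, branch_ch=64):
        super().__init__()
        ks = [1, 3, 5]  # inception-style mix of kernel sizes (assumption)
        self.branches = nn.ModuleList()
        for i, k in enumerate(ks):
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch + i * branch_ch, branch_ch, k, padding=k // 2),
                nn.ReLU(inplace=True),
            ))

    def forward(self, x):
        feats = [x]
        for branch in self.branches:
            # each branch sees the input and all earlier branch outputs
            feats.append(branch(torch.cat(feats, dim=1)))
        return torch.cat(feats[1:], dim=1)  # widened, densely connected output
```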


2021 ◽  
Vol 13 (18) ◽  
pp. 3771
Author(s):  
Tao Lei ◽  
Linze Li ◽  
Zhiyong Lv ◽  
Mingzhe Zhu ◽  
Xiaogang Du ◽  
...  

Land cover classification from very high-resolution (VHR) remote sensing images is a challenging task due to the complexity of geographic scenes and the varying shapes and sizes of ground targets. It is difficult to improve VHR remote sensing image classification either by using spectral data directly or by applying traditional multi-scale feature extraction methods. To address this problem, we propose a multi-modality and multi-scale attention fusion network for land cover classification from VHR remote sensing images. First, based on an encoder-decoder network, we designed a multi-modality fusion module that simultaneously fuses more useful features and avoids redundant ones, addressing the low classification accuracy of some objects caused by the weak feature representation of single-modality data. Second, a novel multi-scale spatial context enhancement module was introduced to improve feature fusion; it addresses the large scale variation of objects in remote sensing images and captures long-range spatial relationships between objects. The proposed network and comparative networks were evaluated on two public datasets, Vaihingen and Potsdam. The proposed network achieves better classification results, with a mean F1-score of 88.6% on the Vaihingen dataset and 92.3% on the Potsdam dataset, and the experimental results show that our model is superior to state-of-the-art network models.
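As a rough illustration, a multi-modality fusion module of this kind might gate each modality's features with channel attention before merging, so that informative channels are emphasized and redundant ones suppressed. The sketch below assumes two aligned inputs (e.g., IRRG imagery and a DSM, as in the Vaihingen/Potsdam benchmarks) and invents all module names; it is not the authors' implementation.

```python
# A hedged sketch of multi-modality fusion: channel-attention gates select
# useful features from each modality, then a 1x1 convolution merges them.
import torch
import torch.nn as nn

class ModalityFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        def gate():  # squeeze-and-excitation style channel gate (assumption)
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(ch, ch, 1),
                nn.Sigmoid(),
            )
        self.gate_img, self.gate_dsm = gate(), gate()
        self.merge = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, f_img, f_dsm):
        # f_img, f_dsm: (N, C, H, W) features from two modality encoders
        f_img = f_img * self.gate_img(f_img)   # suppress redundant channels
        f_dsm = f_dsm * self.gate_dsm(f_dsm)
        return self.merge(torch.cat([f_img, f_dsm], dim=1))
```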


2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world is more than an isolated perception of a series of objects; the relationships between them also carry rich semantic information. This is especially true for satellite remote sensing images, whose spatial extent is so large that objects vary widely in size and form complex spatial compositions. The recognition of semantic relations therefore strengthens the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN), based on an attention mechanism, to fuse and refine multi-scale semantic context, which is crucial to strengthening the cognitive ability of our model. In addition, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module that removes meaningless connections among entities and improves the efficiency of scene graph generation. Meanwhile, to further promote research on scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). Extensive experiments show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
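The multi-scale context part of this design can be approximated with parallel dilated convolutions whose outputs are fused by a learned attention over scales, as sketched below. The graph convolution and the sparse relationship extraction module are omitted here, and the dilation rates are illustrative assumptions, so this is only a sketch of the context-fusion idea, not MSFN itself.

```python
# A minimal sketch of attention-weighted multi-scale context with dilated
# convolutions: each dilation rate yields one context map, and a softmax
# attention over rates decides the per-position mixture.
import torch
import torch.nn as nn

class DilatedContext(nn.Module):
    def __init__(self, ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates
        )
        self.attn = nn.Sequential(nn.Conv2d(ch, len(rates), 1),
                                  nn.Softmax(dim=1))

    def forward(self, x):
        ctx = torch.stack([b(x) for b in self.branches], dim=1)  # (N,R,C,H,W)
        w = self.attn(x).unsqueeze(2)                            # (N,R,1,H,W)
        return (w * ctx).sum(dim=1)  # attention-weighted multi-scale context
```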


2021 ◽  
Author(s):  
Xuyang Teng ◽  
Yiming Pan ◽  
Meilin He ◽  
Meihua Bi ◽  
Zhaoyang Qiu ◽  
...  

Sensors ◽  
2018 ◽  
Vol 18 (2) ◽  
pp. 498 ◽  
Author(s):  
Hong Zhu ◽  
Xinming Tang ◽  
Junfeng Xie ◽  
Weidong Song ◽  
Fan Mo ◽  
...  
