A Lightweight and Fine-Grained Feature Fusion Network for Remote Sensing Scene Classification

Author(s): Lin Bai, Qingxin Liu, Cuiling Li, Zhen Ye, Meng Hui
2021, Vol 13 (10), pp. 1950
Author(s): Cuiping Shi, Xin Zhao, Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth and width of convolutional neural networks (CNNs) to extract more deep features, which increases the complexity of the model. To address this problem, in this paper we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. First, we propose two convolution combination modules for feature extraction, in which multiple convolutions cooperate to fully extract the deep features of images. Then, the feature weights are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused across multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. The experimental results show that, compared with several state-of-the-art methods, the proposed method achieves a clear advantage in classification accuracy with very few parameters.
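
The parameter savings from depthwise separable and asymmetric convolution can be illustrated with simple arithmetic. The sketch below counts the weights of a single layer (biases ignored); the 3 x 3 kernel and 256-channel sizes are illustrative assumptions, not values from AMB-CNN, and the abstract does not say exactly how the paper combines the two techniques.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k x c_in filter per output channel (biases ignored).
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # One k x k depthwise filter per input channel, then a 1 x 1 pointwise projection.
    return k * k * c_in + c_in * c_out


def asymmetric_params(k: int, c_in: int, c_out: int) -> int:
    # A k x 1 convolution (c_in -> c_out) followed by a 1 x k convolution (c_out -> c_out).
    return k * c_in * c_out + k * c_out * c_out


k, c_in, c_out = 3, 256, 256  # illustrative sizes, not taken from the paper
std = standard_conv_params(k, c_in, c_out)
for name, fn in [("depthwise separable", depthwise_separable_params),
                 ("asymmetric", asymmetric_params)]:
    n = fn(k, c_in, c_out)
    print(f"{name}: {n} weights ({std / n:.2f}x fewer than {std})")
```

For these sizes the standard layer has 589,824 weights, the depthwise separable version 67,840, and the asymmetric pair 393,216, which is why depthwise separable convolution accounts for most of the reduction in this toy setting.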


Sensors, 2020, Vol 20 (7), pp. 1999
Author(s): Donghang Yu, Qing Xu, Haitao Guo, Chuan Zhao, Yuzhun Lin, ...

Classifying remote sensing images is vital for interpreting image content. Current remote sensing image scene classification methods based on convolutional neural networks (CNNs) have drawbacks, including excessive parameters and heavy computational cost. More efficient, lightweight CNNs have fewer parameters and lower computational cost, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method that improves classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network MobileNetv2 is used to extract deep, abstract image features. Each feature is then transformed into two features by two different convolutional layers. The Hadamard product of the two transformed features is computed to obtain an enhanced bilinear feature. Finally, the bilinear feature, after pooling and normalization, is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-the-art methods, the proposed method has fewer parameters and lower computational cost while achieving higher accuracy. Feature fusion with bilinear pooling can therefore greatly improve the performance and accuracy of remote sensing scene classification, and the approach could be applied to remote sensing image classification tasks more generally.
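
The bilinear step described above (two convolutional transforms, a Hadamard product, then pooling and normalization) can be sketched in NumPy. This is a minimal illustration under stated assumptions: the two transforms are modelled as hypothetical 1 x 1 convolutions (a per-pixel linear map), pooling is assumed to be global average pooling, and normalization is assumed to be the signed square root plus L2 normalization common in bilinear CNNs; the abstract does not specify these choices.

```python
import numpy as np


def bilinear_hadamard_feature(feature_map: np.ndarray, w_a: np.ndarray, w_b: np.ndarray) -> np.ndarray:
    """feature_map: (H, W, C) backbone output; w_a, w_b: (C, D) weights of two
    hypothetical 1 x 1 convolutions (a 1 x 1 convolution is a per-pixel linear map)."""
    a = feature_map @ w_a                     # first transformed feature, (H, W, D)
    b = feature_map @ w_b                     # second transformed feature, (H, W, D)
    bilinear = a * b                          # Hadamard (element-wise) product
    pooled = bilinear.mean(axis=(0, 1))       # assumed global average pooling -> (D,)
    # Assumed normalization: signed square root followed by L2 normalization.
    signed_sqrt = np.sign(pooled) * np.sqrt(np.abs(pooled))
    norm = np.linalg.norm(signed_sqrt)
    return signed_sqrt / norm if norm > 0 else signed_sqrt


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((7, 7, 32))   # random stand-in for backbone features
w_a = rng.standard_normal((32, 16)) / np.sqrt(32)
w_b = rng.standard_normal((32, 16)) / np.sqrt(32)
vec = bilinear_hadamard_feature(feature_map, w_a, w_b)
print(vec.shape, np.linalg.norm(vec))
```

The Hadamard product keeps the fused feature at D dimensions, unlike a full outer-product bilinear feature of size D x D, which fits the lightweight goal stated in the abstract.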


2020, Vol 12 (9), pp. 1366
Author(s): Jun Li, Daoyu Lin, Yang Wang, Guangluan Xu, Yunyan Zhang, ...

In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of the feature representations they extract is still limited when distinguishing more complex remote sensing images. Therefore, in this paper we propose a unified feature fusion framework based on an attention mechanism, called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). First, the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm is applied to generate attention maps associated with the predicted results, so that the CNN focuses on the most salient parts of the image. Second, a spatial feature transformer (SFT) is designed to extract discriminative features from the attention maps. Then, an innovative two-channel CNN architecture is proposed that fuses the features extracted from the attention maps with those from the RGB (red green blue) stream. A new objective function that combines center loss and cross-entropy loss is optimized to decrease the influence of inter-class dispersion and within-class variance. To show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, visualization of the features extracted by the proposed DDRL-AM method shows that their discriminative ability has increased.
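
An objective that combines cross-entropy with center loss can be sketched as below. This assumes the standard center-loss form, half the squared distance between each feature and its class center, averaged over the batch, added to the cross-entropy with a weight; the weight `lam` and the fixed `centers` array are hypothetical (in center-loss training the centers are normally updated during training), and DDRL-AM's exact weighting is not given in the abstract.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    # Numerically stable log-softmax, then mean negative log-likelihood of the true class.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def center_loss(features: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    # 0.5 * squared distance of each feature to its class center, averaged over the batch.
    diffs = features - centers[labels]
    return float(0.5 * np.sum(diffs ** 2) / len(labels))


def joint_loss(logits, features, labels, centers, lam=0.01):
    # Hypothetical weighting: cross-entropy + lam * center loss.
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=8)        # 8 samples, 5 illustrative classes
logits = rng.standard_normal((8, 5))
features = rng.standard_normal((8, 16))
centers = rng.standard_normal((5, 16))
print(joint_loss(logits, features, labels, centers))
```

The cross-entropy term separates classes, while the center term pulls features of the same class together, which is how such an objective targets within-class variance.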


2018, Vol 10 (7), pp. 1158
Author(s): Yunlong Yu, Fuxian Liu

Aerial scene classification is an active and challenging problem in high-resolution remote sensing imagery understanding. Deep learning models, especially convolutional neural networks (CNNs), have achieved prominent performance in this field. Extracting deep features from the layers of a CNN model is widely used in these CNN-based methods. Although CNN-based approaches have achieved great success, there is still plenty of room to further increase classification accuracy. In fact, fusion with other features has great potential to improve the performance of aerial scene classification. Therefore, we propose two effective architectures based on the idea of feature-level fusion. The first architecture, the texture coded two-stream deep architecture, uses the raw RGB network stream and the mapped local binary patterns (LBP) coded network stream to extract two different sets of features and fuses them using a novel deep feature fusion model. In the second architecture, the saliency coded two-stream deep architecture, we employ the saliency coded network stream as the second stream and fuse it with the raw RGB network stream using the same feature fusion model. For validation and comparison, our proposed architectures are evaluated through comprehensive experiments on three publicly available remote sensing scene datasets. With our feature fusion model, the saliency coded two-stream architecture achieves classification accuracies of 97.79% and 98.90% on the UC-Merced dataset (50% and 80% training samples), 94.09% and 95.99% on the Aerial Image Dataset (AID) (20% and 50% training samples), and 85.02% and 87.01% on the NWPU-RESISC45 dataset (10% and 20% training samples), outperforming state-of-the-art methods.
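
The texture stream above starts from local binary pattern codes. The sketch below implements the basic 3 x 3, 8-neighbour LBP operator on a grayscale image as an illustration; the abstract does not say which LBP variant the paper uses or how the codes are mapped before being fed to the CNN stream, so both the variant and the bit ordering here are assumptions.

```python
import numpy as np


def lbp_codes(gray: np.ndarray) -> np.ndarray:
    """Basic 3 x 3 LBP: each interior pixel gets an 8-bit code in which bit i is set
    when neighbour i is >= the centre pixel. Border pixels are dropped."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # Neighbour offsets, clockwise from the top-left (assumed bit order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= center).astype(np.int64) << bit
    return codes.astype(np.uint8)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(6, 6))   # random stand-in for a grayscale scene
print(lbp_codes(image))
```

The resulting code image has the same layout as the input (minus a one-pixel border), so it can be treated as a second image-like input alongside the RGB stream.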


2020, Vol 12 (4), pp. 729
Author(s): Ruchan Dong, Dazhuan Xu, Lichen Jiao, Jin Zhao, Jungang An

Current scene classification methods for high-resolution remote sensing images usually use deep convolutional neural networks (DCNNs) to extract extensive features and adopt a support vector machine (SVM) as the classifier. DCNNs exploit deep features well but ignore valuable shallow features such as texture and directional information, and SVMs are difficult to train efficiently on large numbers of samples. This paper proposes a fast deep perception network (FDPResnet) that integrates a DCNN with the Broad Learning System (BLS), a novel and effective learning system, to extract both deep and shallow features, and encapsulates a purpose-designed DPModel to fuse the two kinds of features. FDPResnet first extracts the shallow and deep scene features of a remote sensing image using a model pre-trained on residual neural network-101 (Resnet101). It then feeds the two kinds of features into the designed deep perception module (DPModel) to obtain a new set of feature vectors that describe both the higher-level semantic and the lower-level spatial information of the image. The DPModel is the key module, responsible for dimension reduction and feature fusion. Finally, the new feature vectors are input into the BLS for training and classification, yielding a satisfactory classification result. A series of experiments on the challenging NWPU-RESISC45 remote sensing image dataset demonstrates that our approach outperforms several popular state-of-the-art deep learning methods and achieves highly accurate scene classification with a shorter running time.
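
A Broad Learning System trains quickly because its output weights are solved in closed form rather than by backpropagation. The following is a simplified sketch, not FDPResnet's implementation: feature nodes are random linear projections, enhancement nodes apply tanh to them, and the output weights come from ridge-regularized least squares. The node counts, regularization value, and the class name `BroadLearningSketch` are hypothetical, and the original BLS formulation additionally fine-tunes the feature-node weights, which is omitted here.

```python
import numpy as np


class BroadLearningSketch:
    """Simplified BLS: random feature nodes, tanh enhancement nodes, and output
    weights solved in closed form by ridge regression."""

    def __init__(self, n_feature=60, n_enhance=120, reg=1e-3, seed=0):
        self.n_feature, self.n_enhance, self.reg, self.seed = n_feature, n_enhance, reg, seed

    def _nodes(self, x):
        z = x @ self.w_feat + self.b_feat              # mapped feature nodes (linear here)
        h = np.tanh(z @ self.w_enh + self.b_enh)       # enhancement nodes
        return np.hstack([z, h])                       # broad expansion A = [Z | H]

    def fit(self, x, labels, n_classes):
        rng = np.random.default_rng(self.seed)
        d = x.shape[1]
        self.w_feat = rng.standard_normal((d, self.n_feature)) / np.sqrt(d)
        self.b_feat = rng.standard_normal(self.n_feature)
        self.w_enh = rng.standard_normal((self.n_feature, self.n_enhance)) / np.sqrt(self.n_feature)
        self.b_enh = rng.standard_normal(self.n_enhance)
        a = self._nodes(x)
        y = np.eye(n_classes)[labels]                  # one-hot targets
        # Ridge-regularized least squares: W = (A^T A + reg * I)^-1 A^T Y
        self.w_out = np.linalg.solve(a.T @ a + self.reg * np.eye(a.shape[1]), a.T @ y)
        return self

    def predict(self, x):
        return np.argmax(self._nodes(x) @ self.w_out, axis=1)


# Toy usage: two well-separated Gaussian blobs stand in for fused feature vectors.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(-2, 1, (100, 8)), rng.normal(2, 1, (100, 8))])
labels = np.repeat([0, 1], 100)
model = BroadLearningSketch().fit(x, labels, n_classes=2)
print("training accuracy:", (model.predict(x) == labels).mean())
```

Because training reduces to one linear solve, the cost is dominated by the size of the node expansion rather than by iterative optimization, which is consistent with the shorter running time the abstract reports.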


2019, Vol 11 (17), pp. 1996
Author(s): Zhu, Yan, Mo, Liu

Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to HRRSI scene classification owing to advances in graphics processing units (GPUs). However, they tend to extract features from whole images rather than from discriminative regions. A visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts: attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and a center-based cross-entropy loss function. First, we use the attention maps generated by Grad-CAM as an explicit input to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and from the attention maps are fused multiplicatively, improving the ability to distinguish scenes with repeated texture while taking the salient regions into account. Finally, a center-based cross-entropy loss function, which combines the cross-entropy loss and the center loss, is used to backpropagate the fused features so as to reduce the effect of intra-class diversity on the feature representations. The proposed ADFF architecture is tested on three benchmark datasets to evaluate its scene classification performance. The experiments confirm that the proposed method outperforms most competitive scene classification methods, with an average overall accuracy of 94% under different training ratios.
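
The Grad-CAM map and the multiplicative fusion described above can be sketched as follows, given a convolutional layer's activations and the gradients of the target class score with respect to them (obtaining those gradients requires a deep learning framework and is outside this sketch). Grad-CAM weights each channel by its spatially averaged gradient and applies ReLU to the weighted sum; here the map is also scaled to [0, 1], and in practice it is upsampled to the image size. Treating ADFF's multiplicative fusion as an element-wise product of two equal-length feature vectors is an assumption, since the abstract does not give its exact form.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (K, H, W) arrays for one image and one target class."""
    alphas = gradients.mean(axis=(1, 2))                      # (K,) channel importance weights
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                    # scale to [0, 1]


def multiplicative_fusion(rgb_features: np.ndarray, attention_features: np.ndarray) -> np.ndarray:
    # Assumed form: element-wise product of the two streams' feature vectors.
    return rgb_features * attention_features


rng = np.random.default_rng(0)
acts = np.abs(rng.standard_normal((4, 7, 7)))   # random stand-ins for a real layer
grads = rng.standard_normal((4, 7, 7))
print(grad_cam(acts, grads).round(2))
print(multiplicative_fusion(np.array([1.0, 2.0]), np.array([3.0, 4.0])))
```

The ReLU keeps only regions whose activations push the class score up, which is what lets the map serve as an explicit "where to look" input.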


2020, Vol 12 (4), pp. 681
Author(s): Yunsheng Xiong, Xin Niu, Yong Dou, Hang Qie, Kang Wang

Aircraft recognition has great application value, but aircraft in remote sensing images suffer from problems such as low resolution, poor contrast, poor sharpness, and a lack of detail caused by the vertical view, which make aircraft recognition very difficult. Fine-grained aircraft recognition is especially challenging when there are many kinds of aircraft and the differences between them are subtle. In this paper, we propose a non-locally enhanced feature fusion network (NLFFNet) that attempts to make full use of the features from discriminative parts of aircraft. First, exploiting the long-distance self-correlation in aircraft images, we adopt a non-locally enhanced operation that guides the network to pay more attention to discriminative areas and enhances the features beneficial to classification. Second, we propose a part-level feature fusion mechanism (PFF), which crops 5 parts of the aircraft on the shared feature maps, extracts the subtle features inside each part through the part full connection layer (PFC), and fuses the features of these parts through the combined full connection layer (CFC). In addition, an improved loss function increases the weight of hard examples while reducing the weight of excessively hard examples, which improves the overall recognition ability of the network. The dataset includes 47 categories of aircraft, including many aircraft of the same family with slight differences in appearance, and our method achieves 89.12% accuracy on the test dataset, demonstrating its effectiveness.
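
A non-local operation lets every spatial position aggregate features from all other positions, weighted by pairwise similarity, which is one way to exploit long-distance self-correlation. The sketch below assumes the embedded-Gaussian form of the non-local block with a residual connection; the projection matrices stand in for 1 x 1 convolutions and are random placeholders, and the abstract does not state which non-local variant NLFFNet uses.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def non_local_attention(x, w_theta, w_phi):
    # (N, N) weights: row i is a softmax over all positions j of theta(x_i) . phi(x_j).
    return softmax((x @ w_theta) @ (x @ w_phi).T, axis=1)


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """x: (N, C) feature map flattened over its H * W positions.
    w_theta, w_phi, w_g: (C, C') projections; w_out: (C', C) output projection."""
    y = non_local_attention(x, w_theta, w_phi) @ (x @ w_g)   # aggregate g(x) over all positions
    return x + y @ w_out                                      # residual connection


rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 8))                  # H x W x C stand-in feature map
x = fmap.reshape(-1, 8)                                # flatten to 16 positions
w_theta, w_phi, w_g = (rng.standard_normal((8, 4)) for _ in range(3))
w_out = rng.standard_normal((4, 8)) * 0.1
z = non_local_block(x, w_theta, w_phi, w_g, w_out).reshape(4, 4, 8)
print(z.shape)
```

With `w_out` set to zero the block reduces to the identity, so it can be inserted into a pretrained network without changing its initial behaviour.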

