Remote Sensing Image Scene Classification Using CNN-CapsNet

Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially the convolutional neural network (CNN), have improved the performance of remote sensing image scene classification due to the powerful perspective of feature learning and reasoning. However, several fully connected layers are always added to the end of CNN models, which is not efficient in capturing the hierarchical structure of the entities in the images and does not fully consider the spatial information that is important to classification. Fortunately, capsule network (CapsNet), which is a novel network architecture that uses a group of neurons as a capsule or vector to replace the neuron in the traditional neural network and can encode the properties and spatial information of features in an image to achieve equivariance, has become an active area in the classification field in the past two years. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture named CNN-CapsNet to make full use of the merits of these two models: CNN and CapsNet. First, a CNN without fully connected layers is used as an initial feature maps extractor. In detail, a pretrained deep CNN model that was fully trained on the ImageNet dataset is selected as a feature extractor in this paper. Then, the initial feature maps are fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three public challenging benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that the proposed method can lead to a competitive classification performance compared with the state-of-the-art methods.

Download Full-text

An Attention-Guided Multilayer Feature Aggregation Network for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13163113 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3113

Author(s):

Ming Li ◽

Lin Lei ◽

Yuqi Tang ◽

Yuli Sun ◽

Gangyao Kuang

Keyword(s):

Remote Sensing ◽

Feature Learning ◽

Remote Sensing Image ◽

Classification Performance ◽

Learning Ability ◽

Scene Classification ◽

Feature Maps ◽

Feature Aggregation ◽

Scene Representation ◽

High Level

Remote sensing image scene classification (RSISC) has broad application prospects, but related challenges still exist and urgently need to be addressed. One of the most important challenges is how to learn a strong discriminative scene representation. Recently, convolutional neural networks (CNNs) have shown great potential in RSISC due to their powerful feature learning ability; however, their performance may be restricted by the complexity of remote sensing images, such as spatial layout, varying scales, complex backgrounds, category diversity, etc. In this paper, we propose an attention-guided multilayer feature aggregation network (AGMFA-Net) that attempts to improve the scene classification performance by effectively aggregating features from different layers. Specifically, to reduce the discrepancies between different layers, we employed the channel–spatial attention on multiple high-level convolutional feature maps to capture more accurately semantic regions that correspond to the content of the given scene. Then, we utilized the learned semantic regions as guidance to aggregate the valuable information from multilayer convolutional features, so as to achieve stronger scene features for classification. Experimental results on three remote sensing scene datasets indicated that our approach achieved competitive classification performance in comparison to the baselines and other state-of-the-art methods.

Download Full-text

Semantic Multigranularity Feature Learning for High-Resolution Remote Sensing Image Scene Classification

Applied Sciences ◽

10.3390/app11199204 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9204

Author(s):

Xinyi Ma ◽

Zhifeng Xiao ◽

Hong-sik Yun ◽

Seung-Jun Lee

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Spatial Information ◽

Feature Learning ◽

Remote Sensing Image ◽

Input Image ◽

Training Data ◽

Aerial Image ◽

Scene Classification ◽

Feature Maps

High-resolution remote sensing image scene classification is a challenging visual task due to the large intravariance and small intervariance between the categories. To accurately recognize the scene categories, it is essential to learn discriminative features from both global and local critical regions. Recent efforts focus on how to encourage the network to learn multigranularity features with the destruction of the spatial information on the input image at different scales, which leads to meaningless edges that are harmful to training. In this study, we propose a novel method named Semantic Multigranularity Feature Learning Network (SMGFL-Net) for remote sensing image scene classification. The core idea is to learn both global and multigranularity local features from rearranged intermediate feature maps, thus, eliminating the meaningless edges. These features are then fused for the final prediction. Our proposed framework is compared with a collection of state-of-the-art (SOTA) methods on two fine-grained remote sensing image scene datasets, including the NWPU-RESISC45 and Aerial Image Datasets (AID). We justify several design choices, including the branch granularities, fusion strategies, pooling operations, and necessity of feature map rearrangement through a comparative study. Moreover, the overall performance results show that SMGFL-Net consistently outperforms other peer methods in classification accuracy, and the superiority is more apparent with less training data, demonstrating the efficacy of feature learning of our approach.

Download Full-text

An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification

Sensors ◽

10.3390/s20071999 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1999 ◽

Cited By ~ 6

Author(s):

Donghang Yu ◽

Qing Xu ◽

Haitao Guo ◽

Chuan Zhao ◽

Yuzhun Lin ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Visual Recognition ◽

Feature Fusion ◽

Remote Sensing Image ◽

Classification Performance ◽

Image Features ◽

Training Dataset ◽

Scene Classification

Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote scene classification can greatly improve. This could be applied to any remote sensing image classification task.

Download Full-text

Convolutional Neural Networks with Deep Supervised Feature Learning for Remote Sensing Scene Classification

10.20944/preprints202008.0113.v1 ◽

2020 ◽

Author(s):

Grigorios Tsagkatakis ◽

Panagiotis Tsakalides

Keyword(s):

Remote Sensing ◽

Feature Learning ◽

Ground Truth ◽

Classification Performance ◽

Cross Entropy ◽

Scene Classification ◽

Feature Representations ◽

Benchmark Datasets ◽

Low Dimensional ◽

Fully Connected

State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures for achieving very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constrains on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground-truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains across a wide set of benchmark datasets in terms of classification, while providing additional evidence related to class membership and classification confidence.

Download Full-text

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted with multi convolution cooperation. Then, the weights of the feature are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depth separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.

Download Full-text

Weighted Spatial Pyramid Matching Collaborative Representation for Remote-Sensing-Image Scene Classification

Remote Sensing ◽

10.3390/rs11050518 ◽

2019 ◽

Vol 11 (5) ◽

pp. 518 ◽

Cited By ~ 13

Author(s):

Bao-Di Liu ◽

Jie Meng ◽

Wen-Yang Xie ◽

Shuai Shao ◽

Ye Li ◽

...

Keyword(s):

Remote Sensing ◽

Spatial Information ◽

Remote Sensing Image ◽

Superior Performance ◽

Collaborative Representation ◽

Scene Classification ◽

Spatial Pyramid Matching ◽

Classification Tasks ◽

Pyramid Matching ◽

Spatial Pyramid

At present, nonparametric subspace classifiers, such as collaborative representation-based classification (CRC) and sparse representation-based classification (SRC), are widely used in many pattern-classification and -recognition tasks. Meanwhile, the spatial pyramid matching (SPM) scheme, which considers spatial information in representing the image, is efficient for image classification. However, for SPM, the weights to evaluate the representation of different subregions are fixed. In this paper, we first introduce the spatial pyramid matching scheme to remote-sensing (RS)-image scene-classification tasks to improve performance. Then, we propose a weighted spatial pyramid matching collaborative-representation-based classification method, combining the CRC method with the weighted spatial pyramid matching scheme. The proposed method is capable of learning the weights of different subregions in representing an image. Finally, extensive experiments on several benchmark remote-sensing-image datasets were conducted and clearly demonstrate the superior performance of our proposed algorithm when compared with state-of-the-art approaches.

Download Full-text

Convolutional Neural Network for Remote-Sensing Scene Classification: Transfer Learning Analysis

Remote Sensing ◽

10.3390/rs12010086 ◽

2019 ◽

Vol 12 (1) ◽

pp. 86 ◽

Cited By ~ 14

Author(s):

Rafael Pires de Lima ◽

Kurt Marfurt

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Network Models ◽

Remote Sensing Image ◽

Natural Images ◽

Natural Image ◽

Scene Classification ◽

Neural Network Models

Remote-sensing image scene classification can provide significant value, ranging from forest fire monitoring to land-use and land-cover classification. Beginning with the first aerial photographs of the early 20th century to the satellite imagery of today, the amount of remote-sensing data has increased geometrically with a higher resolution. The need to analyze these modern digital data motivated research to accelerate remote-sensing image classification. Fortunately, great advances have been made by the computer vision community to classify natural images or photographs taken with an ordinary camera. Natural image datasets can range up to millions of samples and are, therefore, amenable to deep-learning techniques. Many fields of science, remote sensing included, were able to exploit the success of natural image classification by convolutional neural network models using a technique commonly called transfer learning. We provide a systematic review of transfer learning application for scene classification using different datasets and different deep-learning models. We evaluate how the specialization of convolutional neural network models affects the transfer learning process by splitting original models in different points. As expected, we find the choice of hyperparameters used to train the model has a significant influence on the final performance of the models. Curiously, we find transfer learning from models trained on larger, more generic natural images datasets outperformed transfer learning from models trained directly on smaller remotely sensed datasets. Nonetheless, results show that transfer learning provides a powerful tool for remote-sensing scene classification.

Download Full-text

Deep convolutional neural network structure design for remote sensing image scene classification based on transfer learning

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/569/1/012046 ◽

2020 ◽

Vol 569 ◽

pp. 012046

Author(s):

Xiaoxia Zhang ◽

Yong Guo ◽

Xia Zhang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Network Structure ◽

Remote Sensing Image ◽

Structure Design ◽

Deep Convolutional Neural Network ◽

Scene Classification ◽

Neural Network Structure

Download Full-text

Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network

Remote Sensing ◽

10.3390/rs12234003 ◽

2020 ◽

Vol 12 (23) ◽

pp. 4003

Author(s):

Yansheng Li ◽

Ruixian Chen ◽

Yongjun Zhang ◽

Mi Zhang ◽

Ling Chen

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Remote Sensing Image ◽

Superior Performance ◽

Human Beings ◽

Scene Classification ◽

Scene Graph ◽

Visual Elements ◽

Topological Relationships

As one of the fundamental tasks in remote sensing (RS) image understanding, multi-label remote sensing image scene classification (MLRSSC) is attracting increasing research interest. Human beings can easily perform MLRSSC by examining the visual elements contained in the scene and the spatio-topological relationships of these visual elements. However, most of existing methods are limited by only perceiving visual elements but disregarding the spatio-topological relationships of visual elements. With this consideration, this paper proposes a novel deep learning-based MLRSSC framework by combining convolutional neural network (CNN) and graph neural network (GNN), which is termed the MLRSSC-CNN-GNN. Specifically, the CNN is employed to learn the perception ability of visual elements in the scene and generate the high-level appearance features. Based on the trained CNN, one scene graph for each scene is further constructed, where nodes of the graph are represented by superpixel regions of the scene. To fully mine the spatio-topological relationships of the scene graph, the multi-layer-integration graph attention network (GAT) model is proposed to address MLRSSC, where the GAT is one of the latest developments in GNN. Extensive experiments on two public MLRSSC datasets show that the proposed MLRSSC-CNN-GNN can obtain superior performance compared with the state-of-the-art methods.

Download Full-text

REMOTE SENSING SCENE CLASSIFICATION USING MULTIPLE PYRAMID POOLING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w16-279-2019 ◽

2019 ◽

Vol XLII-2/W16 ◽

pp. 279-284

Author(s):

Y. Yao ◽

H. Zhao ◽

D. Huang ◽

Q. Tan

Keyword(s):

Remote Sensing ◽

Spatial Relationship ◽

Image Data ◽

Remote Sensing Image ◽

Input Image ◽

Image Size ◽

Scene Classification ◽

Fixed Size ◽

Fixed Length ◽

Fully Connected

<p><strong>Abstract.</strong> Remote sensing image scene classification has gained remarkable attention, due to its versatile use in different applications like geospatial object detection, ground object information extraction, environment monitoring and etc. The scene not only contains the information of the ground objects, but also includes the spatial relationship between the ground objects and the environment. With rapid growth of the amount of remote sensing image data, the need for automatic annotation methods for image scenes is more urgent. This paper proposes a new framework for high resolution remote sensing images scene classification based on convolutional neural network. To eliminate the requirement of fixed-size input image, multiple pyramid pooling strategy is equipped between convolutional layers and fully connected layers. Then, the fixed-size features generated by multiple pyramid pooling layer was extended to one-dimension fixed-length vector and fed into fully connected layers. Our method could generate a fixed-length representation regardless of image size, at the same time get higher classification accuracy. On UC-Merced and NWPU-RESISC45 datasets, our framework achieved satisfying accuracies, which is 93.24% and 88.62% respectively.</p>

Download Full-text