RS-DARTS: A Convolutional Neural Architecture Search for Remote Sensing Image Scene Classification

Due to the superiority of convolutional neural networks, many deep learning methods have been used in image classification. The enormous difference between natural images and remote sensing images makes it difficult to directly utilize or modify existing CNN models for remote sensing scene classification tasks. In this article, a new paradigm is proposed that can automatically design a suitable CNN architecture for scene classification. A more efficient search framework, RS-DARTS, is adopted to find the optimal network architecture. This framework has two phases. In the search phase, some new strategies are presented that make the calculation process smoother and better distinguish the optimal operation from the other operations. In addition, we added noise to suppress skip connections in order to close the gap between the training and validation processes and ensure classification accuracy. Moreover, a small part of the neural network is sampled to reduce the redundancy in exploring the network space and speed up the search process. In the evaluation phase, the optimal cell architecture is stacked to construct the final network. Extensive experiments demonstrated the validity of the search strategy and the impressive classification performance of RS-DARTS on four public benchmark datasets. The proposed method was more effective than manually designed CNN models and other neural architecture search methods. In particular, in terms of search cost, RS-DARTS consumed less time than other NAS methods.
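As background for the search phase described above, the following is a minimal sketch of the continuous relaxation used by the original DARTS method, on which RS-DARTS builds. It does not reproduce the RS-DARTS strategies themselves (the smoothing, noise injection and sampling are not specified in the abstract). The candidate operations in `OPS` are hypothetical scalar stand-ins for real layers such as convolutions or pooling.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alphas):
    """DARTS-style mixed operation: softmax-weighted sum of all candidates."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

def discretize(op_names, alphas):
    """After the search, keep only the candidate with the largest parameter."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return op_names[best]

# Hypothetical scalar stand-ins for candidate operations on one edge of a cell.
OPS = {
    "skip_connect": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}

if __name__ == "__main__":
    alphas = [0.1, 2.0, -1.0]  # illustrative values, not learned
    print(mixed_op(3.0, list(OPS.values()), alphas))
    print(discretize(list(OPS), alphas))
```

In a real search the architecture parameters are learned by gradient descent on validation data; the closer the softmax weights of the candidates, the harder it is to tell the best operation from the rest, which is the problem the abstract refers to.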


Deep Discriminative Representation Learning with Attention Map for Scene Classification

Remote Sensing ◽

10.3390/rs12091366 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1366 ◽

Cited By ~ 5

Author(s):

Jun Li ◽

Daoyu Lin ◽

Yang Wang ◽

Guangluan Xu ◽

Yunyan Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Representation Learning ◽

Classification Performance ◽

Great Success ◽

Scene Classification ◽

Remote Sensing Images ◽

Discriminative Ability ◽

Feature Representations ◽

Benchmark Datasets

In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of feature representations extracted from CNNs is still limited in distinguishing more complex remote sensing images. Therefore, we propose a unified feature fusion framework based on an attention mechanism in this paper, which is called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). Firstly, by applying the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, attention maps associated with the predicted results are generated in order to make CNNs focus on the most salient parts of the image. Secondly, a spatial feature transformer (SFT) is designed to extract discriminative features from attention maps. Then an innovative two-channel CNN architecture is proposed by the fusion of features extracted from attention maps and the RGB (red green blue) stream. A new objective function that considers both center loss and cross-entropy loss is optimized to decrease the influence of inter-class dispersion and within-class variance. In order to show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, the visualization of features extracted by the proposed DDRL-AM method can prove that the discriminative ability of features has been increased.
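The combined objective mentioned above can be illustrated with a small sketch. It assumes the usual formulation in which center loss is half the squared distance between a feature vector and the center of its class, added to cross-entropy with a weighting factor. The weight `lam`, its default value and the function names are assumptions for illustration, not values taken from the paper.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def joint_loss(logits, feature, label, centers, lam=0.01):
    """Cross-entropy plus lam times center loss (lam is a hypothetical weight)."""
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])

if __name__ == "__main__":
    centers = {0: [0.0, 0.0], 1: [4.0, 4.0]}  # illustrative class centers
    print(joint_loss([2.0, 0.5], [1.0, 2.0], 0, centers))
```

The cross-entropy term separates classes, while the center term pulls features of the same class together; in practice the centers are updated during training rather than fixed as they are here.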


Convolutional Neural Networks with Deep Supervised Feature Learning for Remote Sensing Scene Classification

10.20944/preprints202008.0113.v1 ◽

2020 ◽

Author(s):

Grigorios Tsagkatakis ◽

Panagiotis Tsakalides

Keyword(s):

Remote Sensing ◽

Feature Learning ◽

Ground Truth ◽

Classification Performance ◽

Cross Entropy ◽

Scene Classification ◽

Feature Representations ◽

Benchmark Datasets ◽

Low Dimensional ◽

Fully Connected

State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures for achieving very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground-truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains across a wide set of benchmark datasets in terms of classification, while providing additional evidence related to class membership and classification confidence.


Remote Sensing Image Scene Classification Using CNN-CapsNet

Remote Sensing ◽

10.3390/rs11050494 ◽

2019 ◽

Vol 11 (5) ◽

pp. 494 ◽

Cited By ~ 45

Author(s):

Wei Zhang ◽

Ping Tang ◽

Lijun Zhao

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Network Architecture ◽

Spatial Information ◽

Feature Learning ◽

Remote Sensing Image ◽

Classification Performance ◽

Scene Classification ◽

Feature Maps ◽

Fully Connected

Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially the convolutional neural network (CNN), have improved the performance of remote sensing image scene classification due to their powerful feature learning and reasoning capabilities. However, several fully connected layers are always added to the end of CNN models, which is not efficient in capturing the hierarchical structure of the entities in the images and does not fully consider the spatial information that is important to classification. Fortunately, the capsule network (CapsNet) has become an active area in the classification field in the past two years. It is a novel network architecture that uses a group of neurons as a capsule or vector to replace the neuron in the traditional neural network, and it can encode the properties and spatial information of features in an image to achieve equivariance. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture named CNN-CapsNet to make full use of the merits of these two models: CNN and CapsNet. First, a CNN without fully connected layers is used as an initial feature map extractor. In detail, a pretrained deep CNN model that was fully trained on the ImageNet dataset is selected as a feature extractor in this paper. Then, the initial feature maps are fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three challenging public benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that the proposed method can lead to a competitive classification performance compared with the state-of-the-art methods.


A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted with multi-convolution cooperation. Then, the weights of the features are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
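The parameter savings claimed for depthwise separable and asymmetric convolutions follow from simple counting, sketched below. The formulas are the standard weight counts with bias terms ignored. The layer sizes in the example and the assumption that the asymmetric pair keeps `c_out` channels between its two layers are illustrative choices, not details from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def asymmetric_pair_params(k, c_in, c_out):
    """A 1 x k convolution followed by a k x 1 convolution.

    Assumes the intermediate feature map has c_out channels.
    """
    return k * c_in * c_out + k * c_out * c_out

if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # illustrative layer sizes
    print("standard:           ", standard_conv_params(k, c_in, c_out))
    print("depthwise separable:", depthwise_separable_params(k, c_in, c_out))
    print("asymmetric pair:    ", asymmetric_pair_params(k, c_in, c_out))
```

For a 3 x 3 layer the depthwise separable form needs roughly one eighth to one ninth of the weights of a standard convolution once the channel counts are large, which is why it is a common choice in lightweight networks.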


Efficient Convolutional Neural Architecture Search for Remote Sensing Image Scene Classification

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2020.3020424 ◽

2020 ◽

pp. 1-14

Author(s):

Cheng Peng ◽

Yangyang Li ◽

Licheng Jiao ◽

Ronghua Shang

Keyword(s):

Remote Sensing ◽

Remote Sensing Image ◽

Scene Classification ◽

Neural Architecture


An Efficient and Lightweight Convolutional Neural Network for Remote Sensing Image Scene Classification

Sensors ◽

10.3390/s20071999 ◽

2020 ◽

Vol 20 (7) ◽

pp. 1999 ◽

Cited By ~ 6

Author(s):

Donghang Yu ◽

Qing Xu ◽

Haitao Guo ◽

Chuan Zhao ◽

Yuzhun Lin ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Visual Recognition ◽

Feature Fusion ◽

Remote Sensing Image ◽

Classification Performance ◽

Image Features ◽

Training Dataset ◽

Scene Classification

Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to a Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-the-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote sensing scene classification can be greatly improved. This could be applied to any remote sensing image classification task.
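The transform, Hadamard product, pooling and normalization steps described above can be sketched on plain Python lists. This is a minimal illustration under stated assumptions: the two convolutional layers are modeled as 1 x 1 convolutions (a per-location linear map), pooling is a spatial average, and normalization is a signed square root followed by L2 scaling, as is common for bilinear features. The abstract does not confirm these choices, and the weight matrices `w_a` and `w_b` are hypothetical.

```python
import math

def linear(weights, vec):
    """Apply a weight matrix (list of rows) to one channel vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def hadamard_bilinear(features, w_a, w_b):
    """Fuse two transformed copies of each location's features.

    features: list of per-location channel vectors from a backbone.
    w_a, w_b: weights of two hypothetical 1 x 1 convolutions.
    """
    products = []
    for vec in features:
        a = linear(w_a, vec)
        b = linear(w_b, vec)
        # Hadamard (element-wise) product of the two transformed features.
        products.append([x * y for x, y in zip(a, b)])
    # Average pooling over all spatial locations.
    n = len(products)
    pooled = [sum(column) / n for column in zip(*products)]
    # Signed square root, then L2 normalization (guarding against a zero norm).
    rooted = [math.copysign(math.sqrt(abs(p)), p) for p in pooled]
    norm = math.sqrt(sum(r * r for r in rooted)) or 1.0
    return [r / norm for r in rooted]

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(hadamard_bilinear([[1.0, 2.0], [3.0, 4.0]], identity, identity))
```

Unlike a full outer-product bilinear feature, whose size grows with the square of the channel count, the element-wise product keeps the feature at the same dimension as the transformed inputs, which suits a lightweight model.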


A Lightweight Convolutional Neural Network Based on Group-Wise Hybrid Attention for Remote Sensing Scene Classification

Remote Sensing ◽

10.3390/rs14010161 ◽

2021 ◽

Vol 14 (1) ◽

pp. 161

Author(s):

Cuiping Shi ◽

Xinlei Zhang ◽

Jingwei Sun ◽

Liguo Wang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Convolutional Neural Network ◽

Spatial Attention ◽

Classification Performance ◽

Model Parameters ◽

Scene Classification ◽

Spatial Dimensions ◽

Work First ◽

Channel Dimension

With the development of computer vision, attention mechanisms have been widely studied. Although the introduction of an attention module into a network model can help to improve classification performance on remote sensing scene images, the direct introduction of an attention module can increase the number of model parameters and amount of calculation, resulting in slower model operations. To solve this problem, we carried out the following work. First, a channel attention module and spatial attention module were constructed. The input features were enhanced through channel attention and spatial attention separately, and the features recalibrated by the attention modules were fused to obtain the features with hybrid attention. Then, to reduce the increase in parameters caused by the attention module, a group-wise hybrid attention module was constructed. The group-wise hybrid attention module divided the input features into four groups along the channel dimension, then used the hybrid attention mechanism to enhance the features in the channel and spatial dimensions for each group, then fused the features of the four groups along the channel dimension. Through the use of the group-wise hybrid attention module, the number of parameters and computational burden of the network were greatly reduced, and the running time of the network was shortened. Finally, a lightweight convolutional neural network was constructed based on the group-wise hybrid attention (LCNN-GWHA) for remote sensing scene image classification. Experiments on four open and challenging remote sensing scene datasets demonstrated that the proposed method has great advantages, in terms of classification accuracy, even with a very low number of parameters.
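The split, enhance and fuse pattern of the group-wise module can be sketched as follows. This is only an illustration of the data flow: the attention gates here are parameter-free sigmoid gates computed from averages, whereas the modules in the paper are learned, and fusing the channel and spatial branches by addition is an assumption. All function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(group):
    """Scale each channel by a gate computed from its spatial average."""
    return [[v * sigmoid(sum(ch) / len(ch)) for v in ch] for ch in group]

def spatial_attention(group):
    """Scale each position by a gate computed from the cross-channel average."""
    positions = len(group[0])
    gates = [sigmoid(sum(ch[i] for ch in group) / len(group)) for i in range(positions)]
    return [[v * g for v, g in zip(ch, gates)] for ch in group]

def hybrid_attention(group):
    """Fuse the two recalibrated feature sets (addition is an assumption)."""
    ca = channel_attention(group)
    sa = spatial_attention(group)
    return [[a + b for a, b in zip(ch_a, ch_b)] for ch_a, ch_b in zip(ca, sa)]

def group_wise_hybrid_attention(features, groups=4):
    """Split channels into groups, enhance each group, concatenate the results.

    features: list of channels, each a flat list of spatial values.
    """
    if len(features) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    size = len(features) // groups
    out = []
    for g in range(groups):
        out.extend(hybrid_attention(features[g * size:(g + 1) * size]))
    return out

if __name__ == "__main__":
    feats = [[float(c + p) for p in range(4)] for c in range(8)]  # 8 channels, 4 positions
    enhanced = group_wise_hybrid_attention(feats)
    print(len(enhanced), len(enhanced[0]))
```

The saving comes from the learned layers inside each attention module: a layer mapping C channels to C channels has on the order of C squared weights, while four independent layers on C/4 channels each have a quarter of that in total.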


Sharing Residual Units Through Collective Tensor Factorization To Improve Deep Neural Networks

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/88 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yunpeng Chen ◽

Xiaojie Jin ◽

Bingyi Kang ◽

Jiashi Feng ◽

Shuicheng Yan

Keyword(s):

Neural Networks ◽

Network Architecture ◽

Deep Neural Networks ◽

Tensor Decomposition ◽

Classification Performance ◽

Model Parameters ◽

Tensor Factorization ◽

Unified Framework ◽

Benchmark Datasets ◽

Basic Network

The residual unit and its variations are widely used in building very deep neural networks for alleviating optimization difficulty. In this work, we revisit the standard residual function as well as its several successful variants and propose a unified framework based on tensor Block Term Decomposition (BTD) to explain these apparently different residual functions from the tensor decomposition view. With the BTD framework, we further propose a novel basic network architecture, named the Collective Residual Unit (CRU). CRU further enhances parameter efficiency of deep residual neural networks by sharing core factors derived from collective tensor factorization over the involved residual units. It enables efficient knowledge sharing across multiple residual units, reduces the number of model parameters, lowers the risk of over-fitting, and provides better generalization ability. Extensive experimental results show that our proposed CRU network brings outstanding parameter efficiency -- it achieves comparable classification performance with ResNet-200 while using a model size as small as ResNet-50 on the ImageNet-1k and Places365-Standard benchmark datasets.


A transformer-based approach to irony and sarcasm detection

Neural Computing and Applications ◽

10.1007/s00521-020-05102-3 ◽

2020 ◽

Vol 32 (23) ◽

pp. 17309-17320

Author(s):

Rolandos Alexandros Potamias ◽

Georgios Siolas ◽

Andreas - Georgios Stafylopatis

Keyword(s):

Neural Network ◽

Language Processing ◽

Network Architecture ◽

Figurative Language ◽

State Of The Art ◽

Unresolved Issue ◽

Discussion Forums ◽

Large Margin ◽

Neural Architecture ◽

Benchmark Datasets

Figurative language (FL) seems ubiquitous in all social media discussion forums and chats, posing extra challenges to sentiment analysis endeavors. Identification of FL schemas in short texts remains largely an unresolved issue in the broader field of natural language processing, mainly due to their contradictory and metaphorical meaning content. The main FL expression forms are sarcasm, irony and metaphor. In the present paper, we employ advanced deep learning methodologies to tackle the problem of identifying the aforementioned FL forms. Significantly extending our previous work (Potamias et al., in: International conference on engineering applications of neural networks, Springer, Berlin, pp 164–175, 2019), we propose a neural network methodology that builds on a recently proposed pre-trained transformer-based network architecture, which is further enhanced by devising and employing a recurrent convolutional neural network. With this setup, data preprocessing is kept to a minimum. The performance of the devised hybrid neural architecture is tested on four benchmark datasets, and contrasted with other relevant state-of-the-art methodologies and systems. Results demonstrate that the proposed methodology achieves state-of-the-art performance on all benchmark datasets, outperforming, even by a large margin, all other methodologies and published studies.


An Attention-Guided Multilayer Feature Aggregation Network for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13163113 ◽

2021 ◽

Vol 13 (16) ◽

pp. 3113

Author(s):

Ming Li ◽

Lin Lei ◽

Yuqi Tang ◽

Yuli Sun ◽

Gangyao Kuang

Keyword(s):

Remote Sensing ◽

Feature Learning ◽

Remote Sensing Image ◽

Classification Performance ◽

Learning Ability ◽

Scene Classification ◽

Feature Maps ◽

Feature Aggregation ◽

Scene Representation ◽

High Level

Remote sensing image scene classification (RSISC) has broad application prospects, but related challenges still exist and urgently need to be addressed. One of the most important challenges is how to learn a strong discriminative scene representation. Recently, convolutional neural networks (CNNs) have shown great potential in RSISC due to their powerful feature learning ability; however, their performance may be restricted by the complexity of remote sensing images, such as spatial layout, varying scales, complex backgrounds, category diversity, etc. In this paper, we propose an attention-guided multilayer feature aggregation network (AGMFA-Net) that attempts to improve the scene classification performance by effectively aggregating features from different layers. Specifically, to reduce the discrepancies between different layers, we employed channel–spatial attention on multiple high-level convolutional feature maps to capture more accurately the semantic regions that correspond to the content of the given scene. Then, we utilized the learned semantic regions as guidance to aggregate the valuable information from multilayer convolutional features, so as to achieve stronger scene features for classification. Experimental results on three remote sensing scene datasets indicated that our approach achieved competitive classification performance in comparison to the baselines and other state-of-the-art methods.
