Weighted Spatial Pyramid Matching Collaborative Representation for Remote-Sensing-Image Scene Classification

Bao-Di Liu; Jie Meng; Wen-Yang Xie; Shuai Shao; Ye Li; Yanjiang Wang

doi:10.3390/rs11050518

Remote Sensing, 2019, Vol 11 (5), pp. 518. doi:10.3390/rs11050518. Cited by ~13.

Author(s): Bao-Di Liu, Jie Meng, Wen-Yang Xie, Shuai Shao, Ye Li, ...

Keyword(s): Remote Sensing, Spatial Information, Remote Sensing Image, Superior Performance, Collaborative Representation, Scene Classification, Spatial Pyramid Matching, Classification Tasks, Pyramid Matching, Spatial Pyramid

At present, nonparametric subspace classifiers, such as collaborative representation-based classification (CRC) and sparse representation-based classification (SRC), are widely used in many pattern-classification and -recognition tasks. Meanwhile, the spatial pyramid matching (SPM) scheme, which considers spatial information in representing the image, is efficient for image classification. However, for SPM, the weights to evaluate the representation of different subregions are fixed. In this paper, we first introduce the spatial pyramid matching scheme to remote-sensing (RS)-image scene-classification tasks to improve performance. Then, we propose a weighted spatial pyramid matching collaborative-representation-based classification method, combining the CRC method with the weighted spatial pyramid matching scheme. The proposed method is capable of learning the weights of different subregions in representing an image. Finally, extensive experiments on several benchmark remote-sensing-image datasets were conducted and clearly demonstrate the superior performance of our proposed algorithm when compared with state-of-the-art approaches.
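
The spatial pyramid idea in this abstract can be illustrated with a minimal sketch. It assumes local features have already been quantized to visual-word indices with normalized image coordinates; the function names and the per-level weighting are hypothetical and not taken from the paper. The fixed weights follow the classic spatial pyramid matching scheme, and the paper's learned subregion weights (whose training procedure the abstract does not describe) would take the place of `level_weights`.

```python
def cell_histograms(points, n_words, level):
    """Count visual words per grid cell at one pyramid level.

    points: iterable of (x, y, word) with x, y in [0, 1) and word in [0, n_words).
    """
    cells = 2 ** level
    hists = [[0.0] * n_words for _ in range(cells * cells)]
    for x, y, word in points:
        col = min(int(x * cells), cells - 1)
        row = min(int(y * cells), cells - 1)
        hists[row * cells + col][word] += 1.0
    return hists


def fixed_level_weights(num_levels):
    """Classic fixed SPM weights: 1/2^L at level 0, 1/2^(L-l+1) at level l >= 1."""
    top = num_levels - 1
    return [1.0 / 2 ** top if l == 0 else 1.0 / 2 ** (top - l + 1)
            for l in range(num_levels)]


def pyramid_feature(points, n_words, level_weights):
    """Concatenate weighted cell histograms over all pyramid levels."""
    feature = []
    for level, weight in enumerate(level_weights):
        for hist in cell_histograms(points, n_words, level):
            feature.extend(weight * v for v in hist)
    return feature


# Two quantized features in opposite corners of the image, two visual words.
points = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
feature = pyramid_feature(points, n_words=2, level_weights=fixed_level_weights(2))
```

Passing a different `level_weights` list is the only change needed to try non-fixed weights in this sketch.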

A NOVEL SELF-TAUGHT LEARNING FRAMEWORK USING SPATIAL PYRAMID MATCHING FOR SCENE CLASSIFICATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2020, Vol XLIII-B2-2020, pp. 725-729. doi:10.5194/isprs-archives-xliii-b2-2020-725-2020.

Author(s): Y. Yang, D. Zhu, F. Ren, C. Cheng

Keyword(s): Remote Sensing, Classification Accuracy, Earth Observation, Data Sets, Scene Classification, Data Set, Spatial Pyramid Matching, Learning Framework, Pyramid Matching, Spatial Pyramid

Remote sensing earth observation images have a wide range of applications in areas such as urban planning, agriculture, and environmental monitoring. While industry has benefited from the availability of high-resolution earth observation images in recent years, interpreting such images has become more challenging than ever. Among the many machine-learning-based methods that have been applied successfully to remote sensing scene classification, spatial pyramid matching using sparse coding (ScSPM) is a classical model that has achieved promising classification accuracy on many benchmark data sets. ScSPM is a three-stage algorithm composed of dictionary learning, sparse representation, and classification. It is generally believed that the dictionary learning stage, although unsupervised, should use the same data set as the classification stage to get good results. However, recent studies in transfer learning suggest that it might be a better strategy to train the dictionary on a larger data set different from the one to classify. In our work, we propose an algorithm that combines ScSPM with self-taught learning, a transfer learning framework that trains a dictionary on an unlabeled data set and uses it for multiple classification tasks. In the experiments, we learn the dictionary on the Caltech-101 data set and classify two remote sensing scene image data sets: the UC Merced LandUse data set and the Changping data set. Experimental results show that the classification accuracy of the proposed method is comparable to that of ScSPM. Our work thus provides a new way to reduce the resource cost of learning a remote sensing scene image classifier.
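
The sparse representation stage mentioned above can be sketched as follows, assuming a dictionary that has already been learned (in the self-taught setting, on a different unlabeled data set). The sketch uses ISTA, a standard proximal-gradient solver for the L1-regularized least-squares problem; the abstract does not say which solver the authors use, and the names `sparse_code`, `atoms`, and `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm for a single coefficient."""
    return max(value - threshold, 0.0) - max(-value - threshold, 0.0)


def sparse_code(x, atoms, lam=0.1, iters=200):
    """Approximately minimize 0.5 * ||x - sum_j code[j] * atoms[j]||^2 + lam * ||code||_1.

    x: feature vector; atoms: list of dictionary atoms, each the same length as x.
    Assumes at least one atom is non-zero.
    """
    k = len(atoms)
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the gradient,
    # so its reciprocal is a safe (if conservative) step size.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    code = [0.0] * k
    for _ in range(iters):
        recon = [sum(code[j] * atoms[j][d] for j in range(k)) for d in range(len(x))]
        resid = [r - xi for r, xi in zip(recon, x)]
        grad = [sum(a * r for a, r in zip(atoms[j], resid)) for j in range(k)]
        code = [soft_threshold(c - step * g, step * lam) for c, g in zip(code, grad)]
    return code


# With an orthonormal dictionary the solution is the soft-thresholded input.
code = sparse_code([3.0, 0.5], atoms=[[1.0, 0.0], [0.0, 1.0]], lam=1.0)
```

Because the dictionary is only an input here, the same function can encode images from a data set other than the one the dictionary was trained on, which is the point of the self-taught setting.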

Hybrid Collaborative Representation for Remote-Sensing Image Scene Classification

Remote Sensing, 2018, Vol 10 (12), pp. 1934. doi:10.3390/rs10121934. Cited by ~11.

Author(s): Bao-Di Liu, Wen-Yang Xie, Jie Meng, Ye Li, Yanjiang Wang

Keyword(s): Remote Sensing, Test Sample, Remote Sensing Image, Image Features, Superior Performance, Collaborative Representation, Great Success, Specific Class, Remote Sensing Images, Training Samples

In recent years, the collaborative representation-based classification (CRC) method has achieved great success in visual recognition by directly utilizing training images as dictionary bases. However, it describes a test sample with all training samples to extract shared attributes and does not consider the representation of the test sample with the training samples in a specific class to extract the class-specific attributes. For remote-sensing images, both the shared attributes and class-specific attributes are important for classification. In this paper, we propose a hybrid collaborative representation-based classification approach. The proposed method is capable of improving the performance of classifying remote-sensing images by embedding the class-specific collaborative representation to conventional collaborative representation-based classification. Moreover, we extend the proposed method to arbitrary kernel space to explore the nonlinear characteristics hidden in remote-sensing image features to further enhance classification performance. Extensive experiments on several benchmark remote-sensing image datasets were conducted and clearly demonstrate the superior performance of our proposed algorithm to state-of-the-art approaches.
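
For orientation, conventional CRC (the baseline this abstract extends) can be sketched as ridge regression over all training samples followed by a class-wise residual test. This is a minimal pure-Python sketch of the standard regularized least-squares formulation, not the hybrid or kernel method proposed in the paper; `crc_classify`, `solve`, and the value of `lam` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def crc_classify(train, labels, y, lam=0.01):
    """Code y over ALL training samples, then pick the class with the smallest
    reconstruction residual divided by the norm of that class's coefficients."""
    n = len(train)
    dot = lambda u, v: sum(p * q for p, q in zip(u, v))
    # Ridge normal equations: (X X^T + lam I) alpha = X y, with training samples as rows of X.
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, [dot(t, y) for t in train])
    best_label, best_score = None, float("inf")
    for label in sorted(set(labels)):
        idx = [i for i in range(n) if labels[i] == label]
        recon = [sum(alpha[i] * train[i][d] for i in idx) for d in range(len(y))]
        residual = sum((p - q) ** 2 for p, q in zip(y, recon)) ** 0.5
        norm = sum(alpha[i] ** 2 for i in idx) ** 0.5
        score = residual / norm if norm > 0 else float("inf")
        if score < best_score:
            best_label, best_score = label, score
    return best_label


train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["a", "a", "b", "b"]
predicted = crc_classify(train, labels, [1.0, 0.05])
```

The coefficients are computed once over the whole training set, which is the "shared attributes" behaviour the abstract contrasts with class-specific representation.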

A New Scene Classification Method Based on Local Gabor Features

Mathematical Problems in Engineering, 2015, Vol 2015, pp. 1-14. doi:10.1155/2015/109718. Cited by ~2.

Author(s): Baoyu Dong, Guang Ren

Keyword(s): Classification Method, Scene Classification, Spatial Pyramid Matching, Visual Words, Feature Vectors, Feature Descriptors, Gabor Features, Gabor Feature, Pyramid Matching, Spatial Pyramid

A new scene classification method is proposed based on the combination of local Gabor features with a spatial pyramid matching model. First, new local Gabor feature descriptors are extracted from densely sampled patches of scene images. These local feature descriptors are embedded into a bag-of-visual-words (BOVW) model, which is combined with a spatial pyramid matching framework. The new local Gabor feature descriptors have sufficient discrimination abilities for dense regions of scene images. Then the feature vectors of scene images can be obtained by the K-means clustering method and visual word statistics. Second, in order to decrease classification time and improve accuracy, an improved kernel principal component analysis (KPCA) method is applied to reduce the dimensionality of the pyramid histogram of visual words (PHOW). The principal components with the larger interclass separability are retained in the feature vectors, which are used for scene classification by the linear support vector machine (SVM) method. The proposed method is evaluated on three commonly used scene datasets. Experimental results demonstrate the effectiveness of the method.
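
The K-means and visual-word-statistics step described above can be sketched as follows, assuming local descriptors are plain lists of floats. The sketch covers only codebook learning and histogram construction, not the Gabor descriptors, KPCA, or SVM stages; all function names are hypothetical, and the deterministic initialization (the first `k` points) is a simplification chosen for reproducibility.

```python
def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])))


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_word(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def bovw_histogram(descriptors, codebook):
    """Normalized count of how often each visual word occurs in one image."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest_word(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


codebook = kmeans([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]], k=2)
hist = bovw_histogram([[0.0, 0.2], [9.0, 10.0], [0.1, 0.4], [1.0, 1.0]], codebook)
```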

Multi-Label Remote Sensing Image Scene Classification by Combining a Convolutional Neural Network and a Graph Neural Network

Remote Sensing, 2020, Vol 12 (23), pp. 4003. doi:10.3390/rs12234003.

Author(s): Yansheng Li, Ruixian Chen, Yongjun Zhang, Mi Zhang, Ling Chen

Keyword(s): Neural Network, Remote Sensing, Convolutional Neural Network, Remote Sensing Image, Superior Performance, Human Beings, Scene Classification, Scene Graph, Visual Elements, Topological Relationships

As one of the fundamental tasks in remote sensing (RS) image understanding, multi-label remote sensing image scene classification (MLRSSC) is attracting increasing research interest. Human beings can easily perform MLRSSC by examining the visual elements contained in the scene and the spatio-topological relationships of these visual elements. However, most existing methods are limited to perceiving visual elements and disregard the spatio-topological relationships between them. With this consideration, this paper proposes a novel deep learning-based MLRSSC framework that combines a convolutional neural network (CNN) and a graph neural network (GNN), termed MLRSSC-CNN-GNN. Specifically, the CNN is employed to learn to perceive the visual elements in the scene and to generate high-level appearance features. Based on the trained CNN, one scene graph is constructed for each scene, where the nodes of the graph are represented by superpixel regions of the scene. To fully mine the spatio-topological relationships of the scene graph, a multi-layer-integration graph attention network (GAT) model is proposed to address MLRSSC, where the GAT is one of the latest developments in GNNs. Extensive experiments on two public MLRSSC datasets show that the proposed MLRSSC-CNN-GNN can obtain superior performance compared with state-of-the-art methods.

AMN: Attention Metric Network for One-Shot Remote Sensing Image Scene Classification

Remote Sensing, 2020, Vol 12 (24), pp. 4046. doi:10.3390/rs12244046.

Author(s): Xirong Li, Fangling Pu, Rui Yang, Rong Gui, Xin Xu

Keyword(s): Remote Sensing, Classification Problem, Remote Sensing Image, Similarity Measurement, Scene Classification, Feature Maps, Training Strategy, Measurement Results, Classification Tasks, The Cost

In recent years, deep neural network (DNN) based scene classification methods have achieved promising performance. However, the data-driven training strategy requires a large number of labeled samples, making DNN-based methods unable to solve the scene classification problem when only a small number of labeled images is available. As the number and variety of scene images continue to grow, the cost and difficulty of manual annotation also increase. Therefore, it is important to address the scene classification problem with only a few labeled samples. In this paper, we propose an attention metric network (AMN) in the framework of few-shot learning (FSL) to improve the performance of one-shot scene classification. AMN is composed of a self-attention embedding network (SAEN) and a cross-attention metric network (CAMN). In SAEN, we adopt spatial attention and channel attention over feature maps to obtain abundant features of scene images. In CAMN, we propose a novel cross-attention mechanism that highlights the features most relevant to the different categories and improves similarity measurement performance. A loss function combining mean square error (MSE) loss with multi-class N-pair loss is developed, which helps to promote the intra-class similarity and inter-class variance of the embedding features and also improves the similarity measurement results. Experiments on the NWPU-RESISC45 dataset and the RSD-WHU46 dataset demonstrate that our method achieves state-of-the-art results on one-shot remote sensing image scene classification tasks.
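
The combined loss mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard multi-class N-pair formulation, log(1 + sum over negatives of exp(a·n - a·p)), and a plain mean square error term; how the paper weights or applies the two terms is not stated in the abstract, so `weight` and the function signatures here are assumptions.

```python
import math


def mse_loss(pred, target):
    """Mean square error between predicted and target similarity scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def n_pair_loss(anchor, positive, negatives):
    """Multi-class N-pair loss for one anchor embedding.

    Small when the anchor is more similar (by dot product) to the positive
    than to every negative embedding.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    pos = dot(anchor, positive)
    return math.log(1.0 + sum(math.exp(dot(anchor, n) - pos) for n in negatives))


def combined_loss(pred, target, anchor, positive, negatives, weight=1.0):
    """Hypothetical combination: MSE plus a weighted N-pair term."""
    return mse_loss(pred, target) + weight * n_pair_loss(anchor, positive, negatives)


loss_near = n_pair_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_far = n_pair_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]])
total = combined_loss([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

In this toy case `loss_far` is smaller than `loss_near`, since pushing the negative embedding further from the anchor is what the N-pair term rewards.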

Indoor Scene recognition for Micro Aerial Vehicles Navigation using Enhanced SIFT-ScSPM Descriptors

Journal of Navigation, 2019, Vol 73 (1), pp. 37-55. doi:10.1017/s0373463319000420. Cited by ~1.

Author(s): B. Anbarasu, G. Anitha

Keyword(s): State Of The Art, Scene Recognition, Support Vector, Scene Classification, Spatial Pyramid Matching, Speeded Up Robust Features, Indoor Scene, Visual Descriptors, Pyramid Matching, Spatial Pyramid

In this paper, a new scene recognition visual descriptor called Enhanced Scale Invariant Feature Transform-based Sparse coding Spatial Pyramid Matching (Enhanced SIFT-ScSPM) descriptor is proposed by combining a Bag of Words (BOW)-based visual descriptor (SIFT-ScSPM) and Gist-based descriptors (Enhanced Gist-Enhanced multichannel Gist (Enhanced mGist)). Indoor scene classification is carried out by multi-class linear and non-linear Support Vector Machine (SVM) classifiers. The feature extraction methodology and a critical review of several visual descriptors used for indoor scene recognition are discussed from an experimental perspective. An empirical study conducted on the Massachusetts Institute of Technology (MIT) 67 indoor scene classification data set assesses the classification accuracy of state-of-the-art visual descriptors and of the proposed Enhanced mGist, Speeded Up Robust Features-Spatial Pyramid Matching (SURF-SPM) and Enhanced SIFT-ScSPM visual descriptors. Experimental results show that the proposed Enhanced SIFT-ScSPM visual descriptor achieves a higher classification rate, precision, recall and area under the Receiver Operating Characteristic (ROC) curve than the state-of-the-art descriptors and the proposed Enhanced mGist and SURF-SPM visual descriptors.

A New Scene Classification Method Based on Spatial Pyramid Matching Model

Journal of Information and Computational Science, 2015, Vol 12 (3), pp. 1073-1080. doi:10.12733/jics20104480.

Author(s): Baoyu Dong

Keyword(s): Classification Method, Scene Classification, Matching Model, Spatial Pyramid Matching, Pyramid Matching, Spatial Pyramid

Remote Sensing Image Scene Classification Using CNN-CapsNet

Remote Sensing, 2019, Vol 11 (5), pp. 494. doi:10.3390/rs11050494. Cited by ~45.

Author(s): Wei Zhang, Ping Tang, Lijun Zhao

Keyword(s): Neural Network, Remote Sensing, Network Architecture, Spatial Information, Feature Learning, Remote Sensing Image, Classification Performance, Scene Classification, Feature Maps, Fully Connected

Remote sensing image scene classification is one of the most challenging problems in understanding high-resolution remote sensing images. Deep learning techniques, especially the convolutional neural network (CNN), have improved the performance of remote sensing image scene classification due to the powerful perspective of feature learning and reasoning. However, several fully connected layers are always added to the end of CNN models, which is not efficient in capturing the hierarchical structure of the entities in the images and does not fully consider the spatial information that is important to classification. Fortunately, capsule network (CapsNet), which is a novel network architecture that uses a group of neurons as a capsule or vector to replace the neuron in the traditional neural network and can encode the properties and spatial information of features in an image to achieve equivariance, has become an active area in the classification field in the past two years. Motivated by this idea, this paper proposes an effective remote sensing image scene classification architecture named CNN-CapsNet to make full use of the merits of these two models: CNN and CapsNet. First, a CNN without fully connected layers is used as an initial feature maps extractor. In detail, a pretrained deep CNN model that was fully trained on the ImageNet dataset is selected as a feature extractor in this paper. Then, the initial feature maps are fed into a newly designed CapsNet to obtain the final classification result. The proposed architecture is extensively evaluated on three public challenging benchmark remote sensing image datasets: the UC Merced Land-Use dataset with 21 scene categories, AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that the proposed method can lead to a competitive classification performance compared with the state-of-the-art methods.

Semantic Multigranularity Feature Learning for High-Resolution Remote Sensing Image Scene Classification

Applied Sciences, 2021, Vol 11 (19), pp. 9204. doi:10.3390/app11199204.

Author(s): Xinyi Ma, Zhifeng Xiao, Hong-sik Yun, Seung-Jun Lee

Keyword(s): Remote Sensing, High Resolution, Spatial Information, Feature Learning, Remote Sensing Image, Input Image, Training Data, Aerial Image, Scene Classification, Feature Maps

High-resolution remote sensing image scene classification is a challenging visual task due to the large intravariance and small intervariance between the categories. To accurately recognize the scene categories, it is essential to learn discriminative features from both global and local critical regions. Recent efforts focus on encouraging the network to learn multigranularity features by destroying the spatial information of the input image at different scales, which leads to meaningless edges that are harmful to training. In this study, we propose a novel method named Semantic Multigranularity Feature Learning Network (SMGFL-Net) for remote sensing image scene classification. The core idea is to learn both global and multigranularity local features from rearranged intermediate feature maps, thus eliminating the meaningless edges. These features are then fused for the final prediction. Our proposed framework is compared with a collection of state-of-the-art (SOTA) methods on two fine-grained remote sensing image scene datasets, NWPU-RESISC45 and the Aerial Image Dataset (AID). We justify several design choices, including the branch granularities, fusion strategies, pooling operations, and necessity of feature map rearrangement, through a comparative study. Moreover, the overall performance results show that SMGFL-Net consistently outperforms other peer methods in classification accuracy, and the superiority is more apparent with less training data, demonstrating the efficacy of feature learning in our approach.
