Simple Yet Effective Fine-Tuning of Deep CNNs Using an Auxiliary Classification Loss for Remote Sensing Scene Classification

Yakoub Bazi; Mohamad M. Al Rahhal; Haikel Alhichri; Naif Alajlan

doi:10.3390/rs11242908

Remote Sensing ◽

10.3390/rs11242908 ◽

2019 ◽

Vol 11 (24) ◽

pp. 2908 ◽

Cited By ~ 7

Author(s):

Yakoub Bazi ◽

Mohamad M. Al Rahhal ◽

Haikel Alhichri ◽

Naif Alajlan

Keyword(s):

Remote Sensing ◽

State Of The Art ◽

Computational Cost ◽

Feature Learning ◽

Extraction Methods ◽

Fine Tuning ◽

Scene Classification ◽

Negative Effect ◽

Benchmark Datasets ◽

Low Computational Cost

The current literature on remote sensing (RS) scene classification shows that state-of-the-art results are achieved using feature extraction methods, where convolutional neural networks (CNNs) (mostly VGG16 with 138.36 M parameters) are used as feature extractors and then simple to complex handcrafted modules are added for additional feature learning and classification, thus coming back to feature engineering. In this paper, we revisit the fine-tuning approach for deeper networks (GoogLeNet and beyond) and show that it has not been well exploited due to the negative effect of the vanishing gradient problem encountered when transferring knowledge to small datasets. The aim of this work is two-fold. Firstly, we provide best practices for fine-tuning pre-trained CNNs using the root-mean-square propagation (RMSprop) method. Secondly, we propose a simple yet effective solution for tackling the vanishing gradient problem by injecting gradients at an earlier layer of the network using an auxiliary classification loss function. Then, we fine-tune the resulting regularized network by optimizing both the primary and auxiliary losses. As for pre-trained CNNs, we consider in this work inception-based networks and EfficientNets with a small number of weights: GoogLeNet (7 M) and EfficientNet-B0 (5.3 M) and their deeper versions Inception-v3 (23.83 M) and EfficientNet-B3 (12 M), respectively. The former networks have been used previously in the context of RS and yielded low accuracies compared to VGG16, while the latter are new state-of-the-art models. Extensive experimental results on several benchmark datasets clearly reveal that if fine-tuning is done in an appropriate way, it can set new state-of-the-art results with low computational cost.
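The idea of optimizing a primary and an auxiliary loss together, with RMSprop as the optimizer, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the auxiliary weight of 0.3, and the RMSprop hyperparameters are hypothetical and are not taken from the paper.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample computed from raw class scores (logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def combined_loss(primary_logits, aux_logits, label, aux_weight=0.3):
    """Primary loss plus a weighted loss from an auxiliary classifier.

    aux_logits stands for the output of a classifier attached to an
    earlier layer; aux_weight is an assumed value, not the paper's.
    """
    primary = softmax_cross_entropy(primary_logits, label)
    auxiliary = softmax_cross_entropy(aux_logits, label)
    return primary + aux_weight * auxiliary

def rmsprop_step(param, grad, sq_avg, lr=1e-4, rho=0.9, eps=1e-8):
    """One RMSprop update for a single scalar parameter."""
    sq_avg = rho * sq_avg + (1.0 - rho) * grad * grad  # running mean of grad^2
    param = param - lr * grad / (math.sqrt(sq_avg) + eps)
    return param, sq_avg
```

Because the auxiliary term is computed from an earlier layer, its gradient reaches the early weights through a shorter path than the gradient of the primary loss, which is the mechanism the abstract describes as injecting gradients.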

Convolutional Neural Networks with Deep Supervised Feature Learning for Remote Sensing Scene Classification

10.20944/preprints202008.0113.v1 ◽

2020 ◽

Author(s):

Grigorios Tsagkatakis ◽

Panagiotis Tsakalides

Keyword(s):

Remote Sensing ◽

Feature Learning ◽

Ground Truth ◽

Classification Performance ◽

Cross Entropy ◽

Scene Classification ◽

Feature Representations ◽

Benchmark Datasets ◽

Low Dimensional ◽

Fully Connected

State-of-the-art remote sensing scene classification methods employ different Convolutional Neural Network architectures for achieving very high classification performance. A trait shared by the majority of these methods is that the class associated with each example is ascertained by examining the activations of the last fully connected layer, and the networks are trained to minimize the cross-entropy between predictions extracted from this layer and ground-truth annotations. In this work, we extend this paradigm by introducing an additional output branch which maps the inputs to low-dimensional representations, effectively extracting additional feature representations of the inputs. The proposed model imposes additional distance constraints on these representations with respect to identified class representatives, in addition to the traditional categorical cross-entropy between predictions and ground truth. By extending the typical cross-entropy loss function with a distance learning function, our proposed approach achieves significant gains in classification across a wide set of benchmark datasets, while providing additional evidence related to class membership and classification confidence.
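A rough sketch of a loss that extends cross-entropy with a distance term toward class representatives is given below. The squared Euclidean distance, the weight `dist_weight`, and all function names are assumptions made for illustration; the abstract does not specify the distance function or how the representatives are identified.

```python
import math

def cross_entropy(logits, label):
    """Categorical cross-entropy of one sample from raw class scores."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[label]

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def joint_loss(logits, embedding, label, representatives, dist_weight=0.1):
    """Cross-entropy on the classification branch plus a penalty pulling the
    low-dimensional embedding toward the representative of its class."""
    ce = cross_entropy(logits, label)
    dist = squared_distance(embedding, representatives[label])
    return ce + dist_weight * dist

def nearest_representative(embedding, representatives):
    """Class whose representative is closest to the embedding."""
    return min(range(len(representatives)),
               key=lambda c: squared_distance(embedding, representatives[c]))
```

In such a setup, `nearest_representative` gives a second opinion on class membership that can be compared with the softmax prediction, in the spirit of the additional evidence the abstract mentions.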

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Wireless Communications and Mobile Computing ◽

10.1155/2021/1333250 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Shili Niu ◽

Weihua Ou ◽

Shihua Feng ◽

Jianping Gou ◽

Fei Long ◽

...

Keyword(s):

Pose Estimation ◽

State Of The Art ◽

Computational Cost ◽

Estimation Accuracy ◽

Human Pose Estimation ◽

Model Parameters ◽

Resource Limited ◽

Benchmark Datasets ◽

Human Pose ◽

Low Computational Cost

Existing methods for human pose estimation usually use a large intermediate tensor, leading to a high computational load, which is detrimental to resource-limited devices. To solve this problem, we propose a low computational cost pose estimation network, MobilePoseNet, which includes an encoder, a decoder, and a parallel non-maximum suppression (NMS) operation. Specifically, we design a lightweight upsampling block to serve as the decoder instead of a transposed convolution and use a lightweight network as our downsampling part. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel OKS-NMS, which significantly outperforms the conventional NMS in terms of accuracy and speed. Experimental results on the benchmark datasets show that MobilePoseNet obtains results almost comparable to state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet has only 4% of the parameters, while its estimation accuracy reaches 98%.
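For readers unfamiliar with OKS-based suppression, the sketch below shows object keypoint similarity (OKS) and a conventional greedy NMS built on it. It is not the parallel variant proposed in the paper, whose details the abstract does not give; the function names, the threshold, and the per-keypoint constants `kappas` are illustrative assumptions.

```python
import math

def oks(pose_a, pose_b, area, kappas):
    """Object keypoint similarity between two poses.

    Each pose is a list of (x, y) keypoints; area is the object scale
    squared; kappas holds one falloff constant per keypoint.
    """
    total = 0.0
    for (xa, ya), (xb, yb), k in zip(pose_a, pose_b, kappas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        total += math.exp(-d2 / (2.0 * area * k * k))
    return total / len(kappas)

def greedy_oks_nms(poses, scores, areas, kappas, threshold=0.9):
    """Visit poses from highest to lowest score and keep a pose only if it is
    not too similar (OKS below threshold) to every pose already kept."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(oks(poses[i], poses[j], areas[j], kappas) < threshold for j in keep):
            keep.append(i)
    return keep
```

The greedy loop is inherently sequential, since each decision depends on the poses kept so far, which is presumably the bottleneck a parallel formulation aims to remove.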

A Multi-Branch Feature Fusion Strategy Based on an Attention Mechanism for Remote Sensing Image Scene Classification

Remote Sensing ◽

10.3390/rs13101950 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1950

Author(s):

Cuiping Shi ◽

Xin Zhao ◽

Liguo Wang

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Rapid Development ◽

Remote Sensing Image ◽

Classification Performance ◽

Attention Mechanism ◽

Scene Classification

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve the classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded the width of the network to extract more deep features, thereby increasing the complexity of the model. To solve this problem, in this paper, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which the deep features of images can be fully extracted by multiple cooperating convolutions. Then, the weights of the features are calculated, and the extracted deep features are sent to the attention mechanism for further feature extraction. Next, all of the extracted features are fused by multiple branches. Finally, depthwise separable convolution and asymmetric convolution are implemented to greatly reduce the number of parameters. The experimental results show that, compared with some state-of-the-art methods, the proposed method still has a great advantage in classification accuracy with very few parameters.
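The parameter savings from depthwise separable and asymmetric convolutions follow from simple counting. The sketch below compares the weight counts of the three layer types, ignoring biases; the layer sizes in the usage note are arbitrary examples, and the asymmetric variant assumes the intermediate feature map has `c_out` channels.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """One k x k filter per input channel, then a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def asymmetric_conv_params(k, c_in, c_out):
    """A k x 1 convolution followed by a 1 x k convolution, assuming the
    intermediate feature map has c_out channels."""
    return k * c_in * c_out + k * c_out * c_out
```

For a 3 x 3 layer with 64 input and 64 output channels, these functions give 36864, 4672, and 24576 weights respectively, so the depthwise separable form needs roughly one eighth of the weights of the standard layer.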

Deep Discriminative Representation Learning with Attention Map for Scene Classification

Remote Sensing ◽

10.3390/rs12091366 ◽

2020 ◽

Vol 12 (9) ◽

pp. 1366 ◽

Cited By ~ 5

Author(s):

Jun Li ◽

Daoyu Lin ◽

Yang Wang ◽

Guangluan Xu ◽

Yunyan Zhang ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Representation Learning ◽

Classification Performance ◽

Great Success ◽

Scene Classification ◽

Remote Sensing Images ◽

Discriminative Ability ◽

Feature Representations ◽

Benchmark Datasets

In recent years, convolutional neural networks (CNNs) have shown great success in the scene classification of computer vision images. Although these CNNs can achieve excellent classification accuracy, the discriminative ability of feature representations extracted from CNNs is still limited in distinguishing more complex remote sensing images. Therefore, we propose a unified feature fusion framework based on an attention mechanism in this paper, which is called Deep Discriminative Representation Learning with Attention Map (DDRL-AM). Firstly, by applying the Gradient-weighted Class Activation Mapping (Grad-CAM) algorithm, attention maps associated with the predicted results are generated in order to make CNNs focus on the most salient parts of the image. Secondly, a spatial feature transformer (SFT) is designed to extract discriminative features from attention maps. Then an innovative two-channel CNN architecture is proposed by the fusion of features extracted from attention maps and the RGB (red green blue) stream. A new objective function that considers both center loss and cross-entropy loss is optimized to decrease the influence of inter-class dispersion and within-class variance. In order to show its effectiveness in classifying remote sensing images, the proposed DDRL-AM method is evaluated on four public benchmark datasets. The experimental results demonstrate the competitive scene classification performance of the DDRL-AM approach. Moreover, the visualization of features extracted by the proposed DDRL-AM method shows that the discriminative ability of the features has been increased.
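An objective that combines cross-entropy with center loss is commonly written as follows. The notation is an assumption made here for illustration (N samples, K classes, logits z, fused features f, learned class centers c, and a balancing weight λ); the abstract does not give the exact form used in the paper.

```latex
L = L_{\mathrm{CE}} + \lambda\, L_{\mathrm{C}}, \qquad
L_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \log
  \frac{\exp(z_{i,y_i})}{\sum_{k=1}^{K} \exp(z_{i,k})}, \qquad
L_{\mathrm{C}} = \frac{1}{2N} \sum_{i=1}^{N}
  \left\lVert f_i - c_{y_i} \right\rVert_2^2
```

The center term penalizes the distance of each feature vector from the center of its own class, which targets within-class variance, while the cross-entropy term keeps the classes separable.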

Anticipating Atrial Fibrillation Signal Using Efficient Algorithm

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v17i02.19183 ◽

2021 ◽

Vol 17 (02) ◽

pp. 106

Author(s):

Mohand Lokman Ahmad Al-dabag ◽

Haider Th. Salim ALRikabi ◽

Raid Rafi Omar Al-Nima

Keyword(s):

Atrial Fibrillation ◽

Computational Cost ◽

Extraction Methods ◽

Support Vector ◽

Ecg Signal ◽

Heart Problem ◽

Ecg Signals ◽

The Right ◽

Electrocardiogram Ecg ◽

Low Computational Cost

Atrial fibrillation (AF) is one of the common types of arrhythmia, and it may cause death. Correctly diagnosing a heart problem by examining the electrocardiogram (ECG) signal leads to prescribing the right treatment for a patient. This study proposes a system that distinguishes between normal and AF ECG signals. First, this work provides a novel algorithm for segmenting the ECG signal to extract a single heartbeat. The algorithm utilizes low computational cost techniques to segment the ECG signal. Then, useful pre-processing and feature extraction methods are suggested. Two classifiers, Support Vector Machine (SVM) and Multilayer Perceptron (MLP), are separately used to evaluate the two proposed algorithms. The performance of the last proposed method with the two classifiers (SVM and MLP) shows an improvement of about 19% and 17%, respectively, after using the proposed segmentation method, reaching 96.2% and 97.5%, respectively.

Automatic Deep Feature Learning via Patch-Based Deep Belief Network for Vertebrae Segmentation in CT Images

Applied Sciences ◽

10.3390/app9010069 ◽

2018 ◽

Vol 9 (1) ◽

pp. 69 ◽

Cited By ~ 7

Author(s):

Syed Furqan Qadri ◽

Danni Ai ◽

Guoyu Hu ◽

Mubashir Ahmad ◽

Yong Huang ◽

...

Keyword(s):

Computational Cost ◽

Feature Learning ◽

Region Of Interest ◽

Ct Images ◽

Feature Reduction ◽

Fine Tuning ◽

Deep Feature ◽

Image Patches ◽

Contrastive Divergence ◽

Vertebrae Segmentation

Precise automatic vertebra segmentation in computed tomography (CT) images is important for the quantitative analysis of vertebrae-related diseases but remains a challenging task due to high variation in spinal anatomy among patients. In this paper, we propose a deep learning approach for automatic CT vertebra segmentation named patch-based deep belief networks (PaDBNs). Our proposed PaDBN model automatically selects the features from image patches and then measures the differences between classes and investigates performance. The region of interest (ROI) is obtained from CT images. An unsupervised feature-reduction contrastive divergence algorithm is applied for weight initialization, and the weights are optimized layer by layer in a supervised fine-tuning procedure. The discriminative learning features obtained from the steps above are used as input to a classifier to obtain the likelihood of the vertebrae. Experimental results demonstrate that the proposed PaDBN model can considerably reduce computational cost and produce excellent performance in vertebra segmentation in terms of accuracy compared with state-of-the-art methods.

A New Generalized Projection and Its Application to Acceleration of Audio Declipping

Axioms ◽

10.3390/axioms8030105 ◽

2019 ◽

Vol 8 (3) ◽

pp. 105

Author(s):

Pavel Rajmic ◽

Pavel Záviška ◽

Vítězslav Veselý ◽

Ondřej Mokrý

Keyword(s):

Signal Processing ◽

Convex Optimization ◽

Linear Operator ◽

Explicit Formula ◽

Convex Sets ◽

State Of The Art ◽

Computational Cost ◽

The Other ◽

Speed Up ◽

Low Computational Cost

In convex optimization, it is often necessary to work with projectors onto convex sets composed with a linear operator. Such a need arises from both theory and applications, with signal processing being a prominent and broad field where convex optimization has been used recently. In this article, a novel projector is presented, which generalizes previous results in that it allows working with a broader family of linear transforms when compared with the state of the art but, on the other hand, is limited to box-type convex sets in the transformed domain. The new projector is described by an explicit formula, which makes it simple to implement and gives it a low computational cost. The projector is interpreted within the framework of the so-called proximal splitting theory. The convenience of the new projector is demonstrated on an example from signal processing, where it was possible to speed up the convergence of a signal declipping algorithm by a factor of more than two.
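To make the notion of a box-type convex set concrete, the sketch below covers only the simplest case, where the linear operator is the identity and the projection reduces to elementwise clipping. It is not the generalized projector of the article. The declipping bounds follow the usual consistency idea (reliable samples are fixed, clipped samples must lie at or beyond the clipping level); the function names and the symmetric clipping level `theta` are assumptions.

```python
def project_onto_box(x, lower, upper):
    """Euclidean projection of x onto the box {y : lower <= y <= upper}."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

def declipping_bounds(clipped, theta):
    """Per-sample bounds of the set of signals consistent with a clipped
    observation, for a symmetric clipping level theta > 0."""
    inf = float("inf")
    lower, upper = [], []
    for v in clipped:
        if v >= theta:        # clipped from above: true sample is at least theta
            lower.append(theta)
            upper.append(inf)
        elif v <= -theta:     # clipped from below: true sample is at most -theta
            lower.append(-inf)
            upper.append(-theta)
        else:                 # reliable sample: must be reproduced exactly
            lower.append(v)
            upper.append(v)
    return lower, upper
```

With a general linear transform between the signal and the box, no such one-line formula is available in general, which is the gap the explicit projector described in the abstract addresses for its family of transforms.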

Unsupervised Representation High-Resolution Remote Sensing Image Scene Classification via Contrastive Learning Convolutional Neural Network

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.8.577 ◽

2021 ◽

Vol 87 (8) ◽

pp. 577-591

Author(s):

Fengpeng Li ◽

Jiabao Li ◽

Wei Han ◽

Ruyi Feng ◽

Lizhe Wang

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Deep Learning ◽

High Resolution ◽

Convolutional Neural Network ◽

State Of The Art ◽

Remote Sensing Image ◽

Scene Classification ◽

Data Set ◽

Unsupervised Deep Learning

Inspired by the outstanding achievements of deep learning, supervised deep learning representation methods for high-spatial-resolution remote sensing image scene classification have obtained state-of-the-art performance. However, supervised deep learning representation methods need a considerable amount of labeled data to capture class-specific features, which limits the application of deep learning-based methods when only a few labeled training samples are available. To address this issue, an unsupervised deep learning representation method for high-resolution remote sensing image scene classification is proposed in this work. The proposed method, called contrastive learning, narrows the distance between positive view pairs (color channels belonging to the same image) and widens the gaps between negative view pairs (color channels from different images) to obtain class-specific data representations of the input data without any supervised information. The classifier uses the features extracted by the convolutional neural network (CNN)-based feature extractor, together with the label information of the training data, to set the space of each category and then, using linear regression, makes predictions in the testing procedure. Compared with existing unsupervised deep learning representation methods for high-resolution remote sensing image scene classification, the contrastive learning CNN achieves state-of-the-art performance on three benchmark data sets of different scales: the small-scale RSSCN7 data set, the midscale aerial image data set, and the large-scale NWPU-RESISC45 data set.
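The behavior of pulling positive pairs together and pushing negative pairs apart can be illustrated with a classic margin-based pairwise loss. This is a generic sketch, not the loss used in the paper, which the abstract does not specify; the function names and the margin value are assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_loss(u, v, is_positive, margin=1.0):
    """Margin-based contrastive loss for one pair of view embeddings.

    Positive pairs (views of the same image) are penalized by their squared
    distance; negative pairs (views of different images) are penalized only
    while they are closer than the margin.
    """
    d = euclidean(u, v)
    if is_positive:
        return d * d
    return max(0.0, margin - d) ** 2
```

Minimizing this quantity over many pairs shrinks distances within positive pairs and grows distances within negative pairs up to the margin, without using any class labels.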

Hybrid pooling with wavelets for convolutional neural networks

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219223 ◽

2022 ◽

pp. 1-10

Author(s):

Daniel Trevino-Sanchez ◽

Vicente Alarcon-Aquino

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Computational Cost ◽

Relevant Information ◽

Accuracy Improvement ◽

Proposed Model ◽

Benchmark Datasets ◽

Augmentation Techniques ◽

High Computational Cost

Detecting and classifying objects correctly is a constant challenge; recognizing them at different scales and in different scenarios, sometimes cropped or badly lit, is not an easy task. Convolutional neural networks (CNNs) have become a widely applied technique since they are completely trainable and well suited to extracting features. However, the growing number of CNN applications constantly pushes for improved accuracy. Initially, those improvements involved the use of large datasets, augmentation techniques, and complex algorithms. These methods may have a high computational cost. Nevertheless, feature extraction is known to be the heart of the problem. As a result, other approaches combine different technologies to extract better features and improve accuracy without the need for more powerful hardware resources. In this paper, we propose a hybrid pooling method that incorporates multiresolution analysis within the CNN layers to reduce the feature map size without losing details. To prevent relevant information from being lost during the downsampling process, an existing pooling method is combined with a wavelet transform technique, keeping those details "alive" and enriching other stages of the CNN. Achieving better quality characteristics improves CNN accuracy. To validate this study, ten pooling methods, including the proposed model, are tested using four benchmark datasets. The results are compared with four of the evaluated methods, which are also considered state of the art.

Evaluation of pre-training impact on fine-tuning for remote sensing scene classification

Remote Sensing Letters ◽

10.1080/2150704x.2018.1526423 ◽

2018 ◽

Vol 10 (1) ◽

pp. 49-58 ◽

Cited By ~ 1

Author(s):

Man Yuan ◽

Zhi Liu ◽

Fan Wang

Keyword(s):

Remote Sensing ◽

Fine Tuning ◽

Scene Classification ◽

Training Impact