UAV Image Multi-Labeling with Data-Efficient Transformers

2021 ◽  
Vol 11 (9) ◽  
pp. 3974
Author(s):  
Laila Bashmal ◽  
Yakoub Bazi ◽  
Mohamad Mahmoud Al Rahhal ◽  
Haikel Alhichri ◽  
Naif Al Ajlan

In this paper, we present an approach for the multi-label classification of remote sensing images based on data-efficient transformers. During the training phase, we generated a second view for each image from the training set using data augmentation. Then, both the image and its augmented version were reshaped into a sequence of flattened patches and fed to the transformer encoder. The latter extracts a compact feature representation from each image with the help of a self-attention mechanism, which can handle the global dependencies between different regions of the high-resolution aerial image. On top of the encoder, we mounted two classifiers, a token classifier and a distiller classifier. During training, we minimized a global loss consisting of two terms, each corresponding to one of the two classifiers. In the test phase, we took the average of the two classifiers' outputs as the final class labels. Experiments on two datasets acquired over the cities of Trento and Civezzano with a ground resolution of two centimeters demonstrated the effectiveness of the proposed model.
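The two-head training objective and test-time averaging described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the multi-label heads are assumed to use a sigmoid/binary-cross-entropy formulation, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(logits, targets):
    # binary cross-entropy over multi-hot label vectors
    p = sigmoid(logits)
    return -np.mean(targets * np.log(p + 1e-9) + (1 - targets) * np.log(1 - p + 1e-9))

def global_loss(token_logits, distill_logits, targets):
    # the global loss is the sum of one term per classification head
    return bce(token_logits, targets) + bce(distill_logits, targets)

def predict(token_logits, distill_logits, threshold=0.5):
    # test phase: average the two heads' scores, then threshold per label
    scores = (sigmoid(token_logits) + sigmoid(distill_logits)) / 2.0
    return (scores >= threshold).astype(int)
```

In a real model the two logit vectors would come from the transformer's class token and distillation token, respectively.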

2019 ◽  
Vol 9 (5) ◽  
pp. 1020 ◽  
Author(s):  
Lilun Zhang ◽  
Dezhi Wang ◽  
Changchun Bao ◽  
Yongxian Wang ◽  
Kele Xu

Whale vocal calls contain valuable information and abundant characteristics that are important for the classification of whale sub-populations and related biological research. In this study, an effective data-driven approach based on pre-trained Convolutional Neural Networks (CNN) using multi-scale waveforms and time-frequency feature representations is developed to classify whale calls from a large open-source dataset recorded by sensors carried by whales. Specifically, the classification is carried out through a transfer learning approach using pre-trained state-of-the-art CNN models from the field of computer vision. 1D raw waveforms and 2D log-mel features of the whale-call data are respectively used as the input of the CNN models. For raw waveform input, windows are applied to capture multiple snapshots of a whale-call clip at different time scales, and the features from the different snapshots are stacked for classification. When using the log-mel features, the delta and delta-delta features are also calculated to produce a 3-channel feature representation for analysis. In training, a 4-fold cross-validation technique is employed to reduce overfitting, while the Mix-up technique is applied for data augmentation to further improve system performance. The results show that the proposed method improves accuracy by more than 20 percentage points for classification into 16 whale pods compared with the baseline method, which uses groups of 2D shape descriptors of spectrograms and Fisher discriminant scores on the same dataset. Moreover, classifications based on log-mel features achieve higher accuracies than those based directly on raw waveforms. A phylogeny graph is also produced to illustrate the relationships among the whale sub-populations.
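The Mix-up augmentation mentioned above blends pairs of training examples and their labels with a weight drawn from a Beta distribution. A minimal sketch, assuming one-hot label vectors and a hypothetical `mixup` helper:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    # blend two training examples and their labels with a Beta(alpha, alpha) weight
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix
```

The blended pairs are fed to the network in place of (or alongside) the original samples, which acts as a strong regularizer on small datasets.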


2021 ◽  
Vol 14 (1) ◽  
pp. 171
Author(s):  
Qingyan Wang ◽  
Meng Chen ◽  
Junping Zhang ◽  
Shouqiang Kang ◽  
Yujing Wang

Hyperspectral image (HSI) data classification often faces the problem of scarce labeled samples, which is considered one of the major challenges in the field of remote sensing. Although active deep networks have been successfully applied in semi-supervised classification tasks to address this problem, their performance inevitably hits a bottleneck due to the limitation of labeling cost. To address this issue, this paper proposes a semi-supervised classification method for hyperspectral images that builds on active deep learning. Specifically, the proposed model introduces the random multi-graph algorithm and replaces the expert annotation step in active learning with the anchor graph algorithm, which can label a considerable amount of unlabeled data precisely and automatically. In this way, a large number of pseudo-labeled samples are added to the training subsets, so the model can be fine-tuned and its generalization performance improved without extra manual labeling effort. Experiments on three standard HSIs demonstrate that the proposed model achieves better performance than other conventional methods, and it also outperforms the other studied algorithms in the case of a small training set.
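The core idea of replacing expert annotation with automatic labeling can be illustrated with a simplified confidence-based pseudo-labeling step. This is a stand-in for the paper's anchor graph algorithm, not a reproduction of it; the `pseudo_label` helper and the 0.95 threshold are hypothetical.

```python
import numpy as np

def pseudo_label(probs, threshold=0.95):
    # probs: (n_samples, n_classes) predicted class probabilities for
    # unlabeled samples. Keep only samples the model is confident about
    # and return their indices with the argmax class as a pseudo-label.
    confidence = probs.max(axis=1)
    keep = confidence >= threshold
    return np.flatnonzero(keep), probs[keep].argmax(axis=1)
```

The selected pseudo-labeled samples would then be appended to the training subset before the next fine-tuning round.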


2019 ◽  
Vol 11 (7) ◽  
pp. 830 ◽  
Author(s):  
Penghua Liu ◽  
Xiaoping Liu ◽  
Mengxi Liu ◽  
Qian Shi ◽  
Jinxing Yang ◽  
...  

The rapid development of deep learning and computer vision has introduced new opportunities and paradigms for building extraction from remote sensing images. In this paper, we propose a novel fully convolutional network (FCN) in which a spatial residual inception (SRI) module captures and aggregates multi-scale contexts for semantic understanding by successively fusing multi-level features. The proposed SRI-Net is capable of accurately detecting large buildings that might otherwise be missed, while retaining global morphological characteristics and local details. In addition, to improve computational efficiency, depthwise separable convolutions and convolution factorization are introduced to significantly decrease the number of model parameters. The proposed model is evaluated on the Inria Aerial Image Labeling Dataset and the Wuhan University (WHU) Aerial Building Dataset. The experimental results show that the proposed methods exhibit significant improvements over several state-of-the-art FCNs, including SegNet, U-Net, RefineNet, and DeepLab v3+. The proposed model shows promising potential for large-scale building detection from remote sensing images.
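The parameter saving from depthwise separable convolutions mentioned above is easy to quantify. A small sketch (bias terms ignored; function names are illustrative): a standard k×k convolution needs k·k·C_in·C_out weights, while its depthwise separable counterpart needs only k·k·C_in (depthwise) plus C_in·C_out (1×1 pointwise).

```python
def conv_params(k, c_in, c_out):
    # weights of a standard k x k convolution (bias ignored)
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    # depthwise k x k filter per input channel, then 1 x 1 pointwise mixing
    return k * k * c_in + c_in * c_out

# For a 3x3 layer with 256 input and 256 output channels:
print(conv_params(3, 256, 256))       # 589824
print(separable_params(3, 256, 256))  # 67840
```

For this layer the separable variant uses roughly 8.7× fewer parameters, which is where most of the efficiency gain comes from.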


2021 ◽  
Vol 13 (14) ◽  
pp. 2728
Author(s):  
Qingjie Zeng ◽  
Jie Geng ◽  
Kai Huang ◽  
Wen Jiang ◽  
Jun Guo

Few-shot classification of remote sensing images has attracted attention due to its important applications in various fields. The major challenge in few-shot remote sensing image scene classification is that only limited labeled samples can be utilized for training. This may cause the prototype feature expression to deviate, impacting classification performance. To address these issues, a prototype calibration with a feature-generating model is proposed for few-shot remote sensing image scene classification. In the proposed framework, a feature encoder with self-attention is developed to reduce the influence of irrelevant information. Then, the feature-generating module is utilized to expand the support set of the testing set based on prototypes of the training set, and prototype calibration is proposed to optimize the features of support images, which enhances the representativeness of each category's features. Experiments on the NWPU-RESISC45 and WHU-RS19 datasets demonstrate that the proposed method yields superior classification accuracies for few-shot remote sensing image scene classification.
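The prototype idea underlying this approach can be sketched in a few lines: a class prototype is the mean embedding of that class's support samples, and queries are assigned to the nearest prototype. This is the generic prototypical-network baseline, not the paper's calibrated variant; the helper names are illustrative.

```python
import numpy as np

def prototypes(support, labels, n_classes):
    # class prototype = mean embedding of that class's support samples
    return np.stack([support[labels == c].mean(axis=0) for c in range(n_classes)])

def classify(query, protos):
    # assign each query embedding to its nearest prototype (Euclidean distance)
    dists = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)
```

With very few support samples per class, these mean prototypes deviate from the true class centers, which is exactly the deviation the paper's feature generation and calibration steps aim to correct.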


2020 ◽  
Vol 224 (1) ◽  
pp. 191-198
Author(s):  
Xinliang Liu ◽  
Tao Ren ◽  
Hongfeng Chen ◽  
Yufeng Chen

In this paper, convolutional neural networks (CNNs) were used to distinguish between tectonic and non-tectonic seismicity. The proposed CNNs consisted of seven convolutional layers with small kernels and one fully connected layer, relying only on the waveform without manual feature extraction. For a single station, the accuracy of the model was 0.90, and the event accuracy reached 0.93. The proposed model was tested using data from January 2019 to August 2019 in China, where the event accuracy reached 0.92, showing that the proposed model can distinguish between tectonic and non-tectonic seismicity.
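The basic operation in such a waveform-based CNN is a small-kernel 1-D convolution slid over the raw signal. A minimal NumPy illustration of a single valid-mode convolution (not the authors' seven-layer architecture; the function name is illustrative):

```python
import numpy as np

def conv1d(signal, kernel):
    # valid-mode 1-D convolution: one small-kernel filter slid over the waveform
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])
```

Stacking several such layers with nonlinearities lets the network learn waveform features directly, which is what removes the need for hand-crafted descriptors.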


2020 ◽  
Vol 12 (17) ◽  
pp. 2789 ◽  
Author(s):  
Xue Shan ◽  
Pingping Liu ◽  
Guixia Gou ◽  
Qiuzhan Zhou ◽  
Zhen Wang

As satellite observation technology improves, the number of remote sensing images is increasing rapidly. Therefore, a growing number of studies are focusing on remote sensing image retrieval. However, a large number of remote sensing images considerably slows retrieval and takes up a great deal of memory. The hash method is increasingly used for rapid image retrieval because of its remarkably fast performance. At the same time, selecting samples that contain more information and greater stability to train the network has gradually become key to improving retrieval performance. Given the above considerations, we propose a deep hash remote sensing image retrieval method, called the hard probability sampling hash retrieval method (HPSH), which combines hash code learning with hard probability sampling in a deep network. Specifically, we used a probability sampling method to select training samples, and we designed a novel hash loss function to better train the network parameters and reduce the hashing accuracy loss due to quantization. Our experimental results demonstrate that HPSH yields an excellent representation compared with other state-of-the-art hash approaches. For the University of California, Merced (UCMD) dataset, HPSH+S achieved a mean average precision (mAP) of up to 90.9% with 16 hash bits, 92.2% with 24 hash bits, and 92.8% with 32 hash bits. For the Aerial Image Dataset (AID), HPSH+S achieved a mAP of up to 89.8% with 16 hash bits, 93.6% with 24 hash bits, and 95.5% with 32 hash bits. For the UCMD dataset, with the use of data augmentation, our proposed approach achieved a mAP of up to 99.6% with 32 hash bits and 99.7% with 64 hash bits.
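The speed advantage of hash-based retrieval comes from quantizing real-valued network features to binary codes and ranking by Hamming distance. A minimal sketch of that pipeline (generic, not HPSH itself; helper names are illustrative):

```python
import numpy as np

def binarize(features):
    # quantize real-valued network outputs to binary hash codes
    return (features > 0).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two codes
    return int(np.count_nonzero(a != b))

def retrieve(query_code, db_codes):
    # rank database images by Hamming distance to the query code
    dists = [hamming(query_code, code) for code in db_codes]
    return np.argsort(dists)
```

The quantization step is lossy, which is why the paper designs its hash loss to keep the binary codes close to the real-valued features they are thresholded from.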


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7545
Author(s):  
Md Mahibul Hasan ◽  
Zhijie Wang ◽  
Muhammad Ather Iqbal Hussain ◽  
Kaniz Fatima

Vehicle type classification plays an essential role in developing an intelligent transportation system (ITS). Building on the modern accomplishments of deep learning (DL) in image classification, we propose a transfer-learning-based model, incorporating data augmentation, for the recognition and classification of Bangladeshi native vehicle types. An extensive dataset of Bangladeshi native vehicles, encompassing 10,440 images, was developed; the images are categorized into 13 common vehicle classes in Bangladesh. The method is a residual network (ResNet-50)-based model with extra classification blocks added to improve performance, in which vehicle type features are automatically extracted and categorized. A variety of metrics was used for evaluation, including accuracy, precision, recall, and F1-score. Despite the varying physical properties of the vehicles, the proposed model achieved high accuracy. Our proposed method surpasses the existing baseline method as well as two pre-trained DL approaches, AlexNet and VGG-16. Based on these comparisons, our suggested ResNet-50 pre-trained model achieves an accuracy of 98.00% in the classification of Bangladeshi native vehicle types.
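The evaluation metrics listed above are all derived from per-class confusion counts. A small sketch of how precision, recall, and F1-score follow from true positives (tp), false positives (fp), and false negatives (fn); the helper name is illustrative:

```python
def metrics(tp, fp, fn):
    # precision = tp / predicted positives, recall = tp / actual positives,
    # F1 = harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For a 13-class problem these are typically computed per class and then macro-averaged.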


2022 ◽  
Vol 2022 ◽  
pp. 1-16
Author(s):  
Nesrine Wagaa ◽  
Hichem Kallel ◽  
Nédra Mellouli

Handwritten character recognition is a challenging research topic. Many works have been presented for recognizing the letters of different languages, but the availability of Arabic handwritten character databases is limited. Motivated by this, we propose a convolutional neural network for the classification of Arabic handwritten letters. Seven optimization algorithms are evaluated, and the best one is reported. Given the scarcity of available Arabic handwriting datasets, various data augmentation techniques are implemented to improve the robustness of the convolutional neural network model. The proposed model is further improved by using the dropout regularization method to avoid overfitting. Moreover, the optimization algorithm and data augmentation approach are carefully chosen to achieve good performance. The model has been trained on two Arabic handwritten character datasets, AHCD and Hijja. The proposed algorithm achieved high recognition accuracies of 98.48% and 91.24% on AHCD and Hijja, respectively, outperforming other state-of-the-art models.
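The dropout regularization mentioned above randomly zeroes a fraction of activations during training so the network cannot rely on any single unit. A minimal NumPy sketch of the standard "inverted" formulation (illustrative, not the paper's framework code):

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    # inverted dropout: zero a random fraction `rate` of units during
    # training and rescale the survivors by 1/(1-rate), so the expected
    # activation is unchanged and no rescaling is needed at test time
    if not training or rate == 0.0:
        return activations
    rng = rng or np.random.default_rng(0)
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)
```

At test time the layer is simply the identity, which is what makes the inverted form convenient.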

