Enhanced Feature Representation in Detection for Optical Remote Sensing Images

2019, Vol. 11 (18), pp. 2095
Author(s): Kun Fu, Zhuo Chen, Yue Zhang, Xian Sun

In recent years, deep learning has led to remarkable breakthroughs in object detection in remote sensing images. In practice, two-stage detectors achieve high detection accuracy but are slow. One-stage detectors, by contrast, simplify the detection pipeline of two-stage detectors and run faster, but with lower detection accuracy. Enhancing the capability of feature representation is one way to improve the detection accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with enhanced feature representation. The enhancement comes from two proposed structures: a dual top-down module and a dense-connected inception module. The former efficiently exploits multi-scale features from multiple layers of the backbone network; the latter both widens and deepens the network to strengthen feature representation at limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on horizontal bounding box detection on the challenging DOTA dataset and obtained 73.49% mean Average Precision (mAP), achieving state-of-the-art performance. Furthermore, our method runs significantly faster than the best public two-stage detector on the DOTA dataset.
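
To make the second structure more concrete, here is a minimal PyTorch sketch of a densely connected, inception-style block: parallel branches with different kernel sizes, each consuming the concatenation of the block input and all earlier branch outputs. The class name, branch widths, and kernel sizes are illustrative assumptions; the paper's exact module may differ.

```python
import torch
import torch.nn as nn

class DenseInceptionBlock(nn.Module):
    """Inception-style branches with dense (concatenative) connections.

    Sketch of the idea behind a dense-connected inception module: each branch
    sees the concatenation of the block input and all previous branch outputs,
    widening and deepening the block at modest extra cost.
    """
    def __init__(self, in_channels, branch_channels=64):
        super().__init__()
        c = branch_channels
        # 1x1, 3x3, and 5x5 branches; each consumes everything produced so far.
        self.branch1 = nn.Conv2d(in_channels, c, kernel_size=1)
        self.branch2 = nn.Conv2d(in_channels + c, c, kernel_size=3, padding=1)
        self.branch3 = nn.Conv2d(in_channels + 2 * c, c, kernel_size=5, padding=2)
        # 1x1 projection back to the input width so blocks can be stacked.
        self.project = nn.Conv2d(in_channels + 3 * c, in_channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = [x]
        feats.append(self.relu(self.branch1(torch.cat(feats, dim=1))))
        feats.append(self.relu(self.branch2(torch.cat(feats, dim=1))))
        feats.append(self.relu(self.branch3(torch.cat(feats, dim=1))))
        return self.relu(self.project(torch.cat(feats, dim=1)))

# Example: a 256-channel feature map keeps its shape after the block.
x = torch.randn(1, 256, 32, 32)
print(DenseInceptionBlock(256)(x).shape)  # torch.Size([1, 256, 32, 32])
```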

2021, Vol. 13 (10), pp. 1925
Author(s): Shengzhou Xiong, Yihua Tan, Yansheng Li, Cai Wen, Pei Yan

Object detection in remote sensing images (RSIs) is one of the basic tasks in automatic remote sensing image interpretation. In recent years, deep object detection frameworks developed for natural scene images (NSIs) have been introduced to RSIs, and detection performance has improved significantly thanks to their powerful feature representation. However, many challenges remain that stem from the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which may occupy less than five percent of the pixels of large objects. Existing algorithms generally address this problem with multi-scale feature fusion based on a feature pyramid. The benefits of this strategy are limited, however, because the responses of small objects in the feature map have largely vanished by the time detection is performed at the end of the network. In this study, we propose a subtask attention network (StAN), which performs detection directly on the shallow layers of the network. First, StAN consists of one shared feature branch and two subtask attention branches, a semantic auxiliary subtask and a detection subtask, based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features, in consideration of small objects. Third, an attention map guidance mechanism is proposed to optimize the network while preserving its identification ability. Fourth, a multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW), and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. Experimental results on the NWPU VHR-10 and DOTA datasets demonstrate that the proposed algorithm achieves state-of-the-art performance and reduces the missed detection of small objects. Ablation experiments further confirm the contributions of MdS, GMulW, and TPA.
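
The abstract gives only a high-level description of StAN, so the following PyTorch sketch illustrates the general idea under assumptions of our own: a shallow shared branch, a semantic auxiliary head that predicts a per-pixel attention map, and a detection head applied to the attention-reweighted low-level features. The class name, layer counts, and channel widths are all hypothetical.

```python
import torch
import torch.nn as nn

class SubtaskAttentionSketch(nn.Module):
    """Rough sketch of the StAN idea (layer sizes are hypothetical).

    One shallow shared branch, a semantic auxiliary head that predicts a
    per-pixel attention map, and a detection head that consumes the
    low-level features re-weighted by that map (attention map guidance).
    """
    def __init__(self, in_channels=3, feat_channels=64, num_classes=10):
        super().__init__()
        self.shared = nn.Sequential(  # shallow shared feature branch
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.semantic_head = nn.Conv2d(feat_channels, 1, 1)           # auxiliary semantic subtask
        self.det_head = nn.Conv2d(feat_channels, num_classes + 4, 1)  # per-pixel class scores + box offsets

    def forward(self, x):
        feat = self.shared(x)
        attn = torch.sigmoid(self.semantic_head(feat))  # attention map from the auxiliary subtask
        det = self.det_head(feat * attn)                # detection on attention-weighted low-level features
        return det, attn

x = torch.randn(1, 3, 128, 128)
det, attn = SubtaskAttentionSketch()(x)
print(det.shape, attn.shape)  # torch.Size([1, 14, 128, 128]) torch.Size([1, 1, 128, 128])
```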


2021, Vol. 10 (11), pp. 736
Author(s): Han Fu, Xiangtao Fan, Zhenzhen Yan, Xiaoping Du

The detection of primary and secondary schools (PSSs) is a meaningful task for composite object detection in remote sensing images (RSIs). As a typical composite object in RSIs, PSSs have diverse appearances against complex backgrounds, which makes it difficult to extract their features effectively with existing deep-learning-based object detection algorithms. To address the challenges of PSS detection, we propose an end-to-end framework called the attention-guided dense network (ADNet), which effectively improves the detection accuracy of PSSs. First, a dual attention module (DAM) is designed to enhance the ability to represent complex characteristics and to alleviate distractions in the background. Second, a dense feature fusion module (DFFM) is built to promote the flow of attention cues into low layers, guiding the generation of hierarchical feature representations. Experimental results demonstrate that our proposed method outperforms state-of-the-art methods, achieving 79.86% average precision. The study proves the effectiveness of our method for PSS detection.
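
The internals of the dual attention module are not spelled out in the abstract; the sketch below follows the common channel-then-spatial attention pattern as one plausible reading, with the reduction ratio and kernel size chosen purely for illustration.

```python
import torch
import torch.nn as nn

class DualAttentionSketch(nn.Module):
    """Channel gate followed by a spatial gate (one plausible DAM reading)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_gate = nn.Sequential(  # squeeze-style channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(  # spatial attention over locations
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)   # emphasize informative channels
        x = x * self.spatial_gate(x)   # suppress distracting background locations
        return x

x = torch.randn(1, 128, 32, 32)
print(DualAttentionSketch(128)(x).shape)  # torch.Size([1, 128, 32, 32])
```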


2021, Vol. 13 (4), pp. 683
Author(s): Lang Huyan, Yunpeng Bai, Ying Li, Dongmei Jiang, Yanning Zhang, ...

Onboard real-time object detection in remote sensing images is a crucial but challenging task in this computation-constrained scenario. The task not only requires excellent detection performance but also demands low time and space complexity from the algorithm. However, previous convolutional neural network (CNN) based object detectors for remote sensing images suffer from heavy computational cost, which hinders their deployment on satellites. Moreover, an onboard detector must handle objects at vastly different scales. To address these issues, we propose a lightweight one-stage multi-scale feature fusion detector called MSF-SNET for onboard real-time object detection in remote sensing images. Using the lightweight SNET as the backbone network reduces the number of parameters and the computational complexity. To strengthen the detection of small objects, three low-level features are extracted from the three stages of SNET. In the detection part, another three convolutional layers further extract deep features with rich semantic information for large-scale object detection. To improve detection accuracy, the deep features and low-level features are fused to enhance the feature representation. Extensive experiments and comprehensive evaluations on the openly available NWPU VHR-10 and DIOR datasets are conducted to assess the proposed method. Compared with other state-of-the-art detectors, the proposed detection framework has fewer parameters and calculations while maintaining comparable accuracy.
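
As a minimal illustration of the fusion step, the PyTorch sketch below upsamples a deep, semantically rich feature map to the resolution of a low-level map, concatenates the two, and mixes them with a 1x1 convolution. The class name and channel widths are assumptions, not MSF-SNET's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    """Fuse one low-level and one deep feature map (illustrative only)."""
    def __init__(self, low_channels, deep_channels, out_channels):
        super().__init__()
        # 1x1 convolution mixes the concatenated maps into a fixed width.
        self.mix = nn.Conv2d(low_channels + deep_channels, out_channels, kernel_size=1)

    def forward(self, low, deep):
        # Upsample deep features to the low-level (finer) resolution, then concatenate.
        deep_up = F.interpolate(deep, size=low.shape[-2:], mode="bilinear", align_corners=False)
        return self.mix(torch.cat([low, deep_up], dim=1))

low = torch.randn(1, 64, 80, 80)    # fine-grained low-level feature
deep = torch.randn(1, 256, 20, 20)  # semantically rich deep feature
print(FeatureFusion(64, 256, 128)(low, deep).shape)  # torch.Size([1, 128, 80, 80])
```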


2019, Vol. 11 (11), pp. 1376
Author(s): Weiying Xie, Haonan Qin, Yunsong Li, Zhuo Wang, Jie Lei

Of great significance in military and civilian applications, the detection of small and densely arranged objects in wide-scale remote sensing imagery remains challenging. To tackle this problem, we propose a novel effectively optimized one-stage network (NEOON). As a fully convolutional network, NEOON consists of four parts: feature extraction, feature fusion, feature enhancement, and multi-scale detection. To extract effective features, the first part implements coherent bottom-up and top-down processing through successive down-sampling and up-sampling operations combined with residual modules. The second part consolidates high-level and low-level features by concatenation followed by convolutional operations, explicitly yielding strong feature representation and semantic information. The third part constructs a receptive field enhancement (RFE) module and incorporates it into the front of the network, where the information of small objects still exists. The final part consists of four parallel detectors with different sensitivities that all access the fused features, enabling the network to make full use of objects at different scales. In addition, Focal Loss is adopted for the classification branch to counter the severe class imbalance in one-stage methods, and Soft-NMS is introduced in post-processing to preserve accurate bounding boxes, especially for densely arranged objects. A split-and-merge strategy and multi-scale training are employed during training. Thorough experiments on the ACS dataset constructed by us and on the NWPU VHR-10 dataset evaluate the performance of NEOON. Specifically, improvements of 4.77% in mAP and 5.50% in recall on the ACS dataset over YOLOv3 demonstrate that NEOON effectively improves the detection accuracy of small objects in remote sensing imagery. Extensive experiments and comprehensive evaluations on the 10-class NWPU VHR-10 dataset further illustrate the superiority of NEOON in extracting spatial information from high-resolution remote sensing images.
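
Focal Loss itself is a standard component (Lin et al.); the sketch below shows the usual binary formulation with the commonly used defaults alpha=0.25 and gamma=2. The paper's exact hyperparameters are not given in the abstract.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary Focal Loss: down-weights easy examples so training focuses on
    hard ones, countering the foreground/background imbalance of one-stage detectors."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing factor
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()
print(focal_loss(logits, targets))
```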


2021, Vol. 13 (3), pp. 433
Author(s): Junge Shen, Tong Zhang, Yichen Wang, Ruxin Wang, Qi Wang, ...

Remote sensing images contain complex backgrounds and multi-scale objects, which make scene classification a challenging task. Performance depends strongly on both the capacity of the scene representation and the discriminability of the classifier. Although multiple models possess better properties than a single model in these respects, the fusion strategy for these models is the key to maximizing the final accuracy. In this paper, we construct a novel dual-model architecture with a grouping-attention-fusion strategy to improve scene classification performance. Specifically, the model employs two different convolutional neural networks (CNNs) for feature extraction, and the grouping-attention-fusion strategy fuses the features of the two CNNs in a fine-grained, multi-scale manner, enhancing the resulting scene representation. Moreover, to address the similar appearances of different scenes, we develop a loss function that encourages small intra-class diversity and large inter-class distances. Extensive experiments are conducted on four scene classification datasets: the UCM land-use dataset, the WHU-RS19 dataset, the AID dataset, and the OPTIMAL-31 dataset. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art.
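
The exact loss is not specified in the abstract; the sketch below shows one generic way to encode the stated principle, pulling features toward per-class centers while pushing the centers apart by a margin. The function name, margin, and the use of learnable centers are all assumptions.

```python
import torch

def intra_inter_loss(features, labels, centers, margin=1.0):
    """Generic intra/inter-class loss sketch (not the paper's exact formulation).

    Pulls each feature toward its class center (small intra-class diversity)
    and pushes different class centers at least `margin` apart (large inter-class distance).
    """
    intra = ((features - centers[labels]) ** 2).sum(dim=1).mean()
    dists = torch.cdist(centers, centers)                        # pairwise center distances
    off_diag = dists[~torch.eye(len(centers), dtype=torch.bool)]
    inter = torch.clamp(margin - off_diag, min=0).mean()         # hinge on center separation
    return intra + inter

feats = torch.randn(16, 128)
labels = torch.randint(0, 4, (16,))
centers = torch.randn(4, 128, requires_grad=True)  # learnable per-class centers
print(intra_inter_loss(feats, labels, centers))
```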


Author(s): Lifang Zhou, Guang Deng, Weisheng Li, Jianxun Mi, Bangjun Lei

Current state-of-the-art detectors achieve impressive detection accuracy through deep learning. However, most of these detectors cannot detect objects in real time due to their heavy computational cost, which limits their wide application. Although some one-stage detectors are designed to accelerate detection, their speed is still unsatisfactory for tasks on high-resolution remote sensing images. To address this problem, this paper proposes a lightweight one-stage approach based on YOLOv3, named Squeeze-and-Excitation YOLOv3 (SE-YOLOv3), which maintains high efficiency and effectiveness simultaneously. To reduce the number of parameters and increase the descriptive power of the features, two customized modules, lightweight feature extraction and attention-aware feature augmentation, are embedded; they exploit global information and suppress redundant features, respectively. To achieve scale invariance, a spatial pyramid pooling method is used to aggregate local features. Evaluation experiments on two remote sensing image datasets, DOTA and NWPU VHR-10, show that the proposed approach achieves competitive detection results at lower computational cost.
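
Squeeze-and-Excitation and spatial pyramid pooling are both standard building blocks; the sketch below shows their textbook forms (SE channel gating and YOLO-style SPP with 5/9/13 max-pools), which may differ in detail from the paper's customized modules.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Standard Squeeze-and-Excitation channel re-weighting (Hu et al.)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                       # squeeze: global context per channel
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),   # excitation: channel weights
        )

    def forward(self, x):
        return x * self.fc(x)

class SPP(nn.Module):
    """Spatial pyramid pooling as used in YOLO-style detectors:
    parallel max-pools with different kernels, concatenated with the input."""
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels]
        )

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

x = torch.randn(1, 256, 13, 13)
print(SEBlock(256)(x).shape)  # torch.Size([1, 256, 13, 13])
print(SPP()(x).shape)         # torch.Size([1, 1024, 13, 13])
```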


2019, Vol. 11 (3), pp. 286
Author(s): Jiangqiao Yan, Hongqi Wang, Menglong Yan, Wenhui Diao, Xian Sun, ...

Recently, methods based on the Faster region-based convolutional neural network (R-CNN) have become popular for multi-class object detection in remote sensing images due to their outstanding detection performance. These methods generally propose candidate regions of interest (ROIs) through a region proposal network (RPN), and regions with sufficiently high intersection-over-union (IoU) values against the ground truth are treated as positive samples for training. In this paper, we find that the detection results of such methods are sensitive to the choice of IoU threshold. Specifically, detection performance on small objects is poor when a normally high threshold is chosen, while a lower threshold results in poor localization accuracy caused by a large number of false positives. To address these issues, we propose a novel IoU-Adaptive Deformable R-CNN framework for multi-class object detection. Specifically, by analyzing the different roles that IoU can play in different parts of the network, we propose an IoU-guided detection framework to reduce the loss of small-object information during training. In addition, an IoU-based weighted loss is designed, which learns the IoU information of positive ROIs to improve detection accuracy effectively. Finally, a class-aspect-ratio-constrained non-maximum suppression (CARC-NMS) is proposed, which further improves the precision of the results. Extensive experiments validate the effectiveness of our approach, and we achieve state-of-the-art detection performance on the DOTA dataset.
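
To ground the IoU-based components, the sketch below computes pairwise IoU between boxes and shows one simple way an IoU-based weighted loss could re-weight per-ROI losses; the paper's actual weighting scheme is not detailed in the abstract, so the weighting function is an assumption.

```python
import torch

def box_iou(boxes1, boxes2):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])  # intersection top-left corners
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # intersection bottom-right corners
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1[:, None] + area2[None, :] - inter)

def iou_weighted_loss(per_roi_loss, ious):
    """Weight each positive ROI's loss by its IoU with the matched ground truth,
    so better-localized proposals contribute more (illustrative scheme only)."""
    return (ious * per_roi_loss).sum() / ious.sum().clamp(min=1e-6)

proposals = torch.tensor([[0., 0., 10., 10.], [5., 5., 15., 15.]])
gt = torch.tensor([[0., 0., 10., 10.]])
print(box_iou(proposals, gt))  # tensor([[1.0000], [0.1429]])
```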

