Vehicle and Vessel Detection on Satellite Imagery: A Comparative Study on Single-Shot Detectors

In this paper, we investigate the feasibility of automatically detecting small objects, such as vehicles and vessels, in satellite imagery with a spatial resolution between 0.3 and 0.5 m. The main challenges of this task are the small size of the objects and the wide spread in object sizes, which range from 5 to a few hundred pixels in length. We first annotated 1500 km², making sure to have equal amounts of land and water data. On this dataset we trained and evaluated four single-shot object detection networks, YOLOv2, YOLOv3, D-YOLO and YOLT, tuning their many hyperparameters to maximize accuracy. We performed various experiments to better understand the performance of the models and the differences between them. The best performing model, D-YOLO, reached an average precision of 60% for vehicles and 66% for vessels and can process an image of around 1 Gpx in 14 s. We conclude that these models, if properly tuned, can help speed up the workflows of satellite data analysts and can be used to create even bigger datasets, making it possible to train even better models in the future.
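The abstract does not say how a scene of about 1 Gpx is fed to the detectors. A common approach for running single-shot detectors on large rasters, sketched below in Python, is to cut the scene into overlapping fixed-size tiles so that an object lying on a tile border appears whole in at least one tile. The function names, the 416 px tile size and the 64 px overlap are illustrative assumptions, not values taken from the paper.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that together cover [0, length)."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # a single tile already covers the whole axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the far edge
    return starts


def tile_grid(width, height, tile=416, overlap=64):
    """Top-left corners (x, y) of all tiles for a width x height image."""
    return [(x, y)
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]
```

Detections from each tile would then be shifted by the tile's corner, and duplicates found in the overlap regions merged, for example with non-maximum suppression.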


Small Object Detection Algorithm Based on Feature Pyramid-Enhanced Fusion SSD

Complexity ◽ 10.1155/2019/7297960 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-13 ◽ Cited By ~ 1

Author(s): Haotian Li ◽ Kezheng Lin ◽ Jingxuan Bai ◽ Ao Li ◽ Jiali Yu

Keyword(s): Object Detection ◽ Detection Rate ◽ Detection Algorithm ◽ Single Shot ◽ Small Object ◽ Feature Maps ◽ Scale Invariant ◽ Feature Pyramid ◽ Small Object Detection ◽ Detection And Localization

To improve the detection rate of the traditional single-shot multibox detection (SSD) algorithm on small objects, a feature-enhanced fusion SSD object detection algorithm based on the pyramid network is proposed. First, the selected multiscale feature layers are merged with the scale-invariant convolutional layer through the feature pyramid network structure; at the same time, the channel number of each multiscale feature map is converted separately using the scale-invariant convolution kernel. Then, the two resulting sets of pyramid-shaped feature layers are fused to generate a set of enhanced multiscale feature maps, and the scale-invariant convolution is applied to these layers again. Finally, the resulting layers are used for detection and localization, and the final location coordinates and confidences are output after non-maximum suppression. Experimental results on the Pascal VOC 2007 and 2012 datasets confirm an 8.2% improvement in mAP compared to the original SSD and some existing algorithms.
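The final step named in the abstract, non-maximum suppression, can be sketched in a few lines of Python. This is the standard greedy variant, not necessarily the exact procedure used in the paper; the function names, the `(x1, y1, x2, y2)` box format and the 0.5 threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a better box too strongly.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a multi-class detector this is usually applied per class, so that overlapping boxes of different classes do not suppress each other.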


SSD-TSEFFM: New SSD Using Trident Feature and Squeeze and Extraction Feature Fusion

Sensors ◽ 10.3390/s20133630 ◽ 2020 ◽ Vol 20 (13) ◽ pp. 3630 ◽ Cited By ~ 1

Author(s): Young-Joon Hwang ◽ Jin-Gu Lee ◽ Un-Chul Moon ◽ Ho-Hyun Park

Keyword(s): Object Detection ◽ Semantic Information ◽ Feature Fusion ◽ Contextual Information ◽ Single Shot ◽ Small Object ◽ Dilated Convolution ◽ Average Improvement ◽ Proposed Model ◽ Small Object Detection

The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection because it does not consider the scale contextual information between its layers, and its shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze and extraction feature fusion (SSD-TSEFFM); this detector employs the trident network and the squeeze and excitation feature fusion module. A trident feature module (TFM), inspired by the trident network, is developed to take the scale contextual information into account. Because this module applies dilated convolution, it makes the proposed model robust to scale changes. Further, the squeeze and excitation block feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with Faster R-CNN (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to a good overall accuracy. The SSD-TSEFFM achieved 80.4% mAP and 80.2% mAP on the 2007 and 2012 datasets, respectively. This indicates an average improvement of approximately 2% over other models.
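Dilated convolution spaces the kernel taps `dilation` samples apart, so a kernel of size k covers k + (k - 1)(d - 1) input positions without any extra weights. The one-dimensional Python sketch below illustrates that effect only. The function names and the dilation rates tried in the usage note are assumptions for illustration; the abstract does not specify how the TFM configures its branches.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D correlation with taps `dilation` samples apart."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]
```

With a 3-tap kernel, dilation rates of 1, 2 and 3 span 3, 5 and 7 input samples respectively, which is how branches sharing the same kernel size can respond to objects of different scales.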


FASSD: A Feature Fusion and Spatial Attention-Based Single Shot Detector for Small Object Detection

Electronics ◽ 10.3390/electronics9091536 ◽ 2020 ◽ Vol 9 (9) ◽ pp. 1536

Author(s): Deng Jiang ◽ Bei Sun ◽ Shaojing Su ◽ Zhen Zuo ◽ Peng Wu ◽ ...

Keyword(s): Object Detection ◽ Spatial Attention ◽ Feature Fusion ◽ Single Shot ◽ Small Object ◽ Feature Maps ◽ Feature Representations ◽ Small Object Detection ◽ High Level ◽ Detection Speed

Deep learning methods have significantly improved object detection performance, but small object detection remains a difficult task in computer vision. We propose a feature fusion and spatial attention-based single shot detector (FASSD) for small object detection. We fuse high-level semantic information into shallow layers to generate discriminative feature representations for small objects. To adaptively enhance the expression of small object areas and suppress the feature response of background regions, the spatial attention block learns a self-attention mask that enhances the original feature maps. We also established a small object dataset (LAKE-BOAT) of a scene with a boat on a lake and tested our algorithm on it to evaluate its performance. The results show that our FASSD achieves 79.3% mAP (mean average precision) on the PASCAL VOC2007 test set with 300 × 300 input, which outperforms the original single shot multibox detector (SSD) by 1.6 points, as well as most improved algorithms based on SSD. The corresponding detection speed was 45.3 FPS (frames per second) on the VOC2007 test set using a single NVIDIA TITAN RTX GPU. The test results of a simplified FASSD on the LAKE-BOAT dataset indicate that our model achieved an improvement of 3.5% mAP over the baseline network while maintaining a real-time detection speed (64.4 FPS).
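The idea of a spatial mask that boosts object regions and damps background can be illustrated with a minimal Python sketch. It assumes the mask is a sigmoid of a per-pixel weighted sum over channels (in effect a 1 × 1 convolution) and that the mask multiplies the features directly; the weights, the bias and the function names are hypothetical, and the actual FASSD block may be built differently.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(features, weights, bias=0.0):
    """features: [C][H][W] nested lists. Returns (mask [H][W], enhanced [C][H][W])."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    # One mask value per pixel, shared by all channels.
    mask = [
        [sigmoid(bias + sum(weights[c] * features[c][y][x] for c in range(channels)))
         for x in range(width)]
        for y in range(height)
    ]
    enhanced = [
        [[features[c][y][x] * mask[y][x] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return mask, enhanced
```

Pixels whose weighted response is strongly negative get a mask value near 0 and are suppressed, while strongly positive pixels pass through almost unchanged.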


SSD-EMB: An Improved SSD Using Enhanced Feature Map Block for Object Detection

Sensors ◽ 10.3390/s21082842 ◽ 2021 ◽ Vol 21 (8) ◽ pp. 2842

Author(s): Hong-Tae Choi ◽ Ho-Jun Lee ◽ Hoon Kang ◽ Sungwook Yu ◽ Ho-Hyun Park

Keyword(s): Object Detection ◽ Great Success ◽ Processing Unit ◽ Single Shot ◽ Small Object ◽ Feature Map ◽ Proposed Model ◽ Small Object Detection ◽ Detection Speed ◽ Good Detection

Deep learning has achieved great success in object detection, but small object detection is still a difficult task in computer vision. To address the problem, we propose an improved single-shot multibox detector (SSD) using enhanced feature map blocks (SSD-EMB). The enhanced feature map block (EMB) consists of an attention stream and a feature map concatenation stream. The attention stream allows the proposed model to focus on the object regions rather than the background, owing to channel averaging and the effectiveness of the normalization. The feature map concatenation stream provides additional semantic information to the model without degrading the detection speed. Combining the outputs of these two streams generates the enhanced feature map, which improves the detection of small objects. Experimental results show that the proposed model achieves high accuracy in small object detection together with a good detection speed. The SSD-EMB achieved a mean average precision (mAP) of 80.4% on the PASCAL VOC 2007 dataset at 30 frames per second on an RTX 2080Ti graphics processing unit, an mAP of 79.9% on the VOC 2012 dataset, and an mAP of 26.6% on the MS COCO dataset.
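The attention stream is described only as relying on channel averaging and normalization. One parameter-free reading of that description, sketched in Python below, averages the feature map over its channels and rescales the result to [0, 1]. The min-max normalization and the function name are assumptions, since the abstract does not name the normalization used.

```python
def channel_average_attention(features, eps=1e-12):
    """features: [C][H][W] nested lists. Returns an [H][W] map scaled to [0, 1]."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    # Mean response over all channels at each pixel.
    avg = [
        [sum(features[c][y][x] for c in range(channels)) / channels
         for x in range(width)]
        for y in range(height)
    ]
    lo = min(min(row) for row in avg)
    hi = max(max(row) for row in avg)
    if hi - lo < eps:
        # Flat map: no pixel stands out, so weight everything equally.
        return [[1.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in avg]
```

Because no weights are learned in this variant, it adds only a handful of arithmetic operations per pixel, which fits the abstract's emphasis on keeping the detection speed.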


VEHICLE DETECTION IN HIGH RESOLUTION IMAGE BASED ON DEEP LEARNING

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-archives-xliii-b3-2020-49-2020 ◽ 2020 ◽ Vol XLIII-B3-2020 ◽ pp. 49-54

Author(s): H. Gao ◽ X. Li

Keyword(s): High Resolution ◽ Object Detection ◽ Feature Fusion ◽ Single Shot ◽ Feature Maps ◽ Average Precision ◽ Speed Up ◽ High Resolution Images ◽ Small Targets ◽ Fusion Feature

Despite its high accuracy and speed in object detection, the Single Shot Multi-Box Detector (SSD) tends to give poor results for small targets such as vehicles in high-resolution images. In this paper, we propose a new convolutional neural network based on SSD to detect vehicles in high-resolution images. The proposed framework incorporates a feature fusion module and a detection module. In the feature fusion module, feature maps of different scales are integrated into a fusion feature for object detection, which effectively improves accuracy. In addition, to prevent the network from overfitting and to speed up training, a batch normalization layer is embedded between the detection layers in the detection module. Ablation experiments provide strong evidence for the effectiveness of these structures. On the UCAS-High Resolution Aerial Object Detection Dataset, our network achieves 0.904 AP (average precision), which is 0.094 AP higher than SSD512, at a similar speed.
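How maps of different scales can be brought together is sketched below in Python: the coarser map is upsampled to the finer map's resolution and the two are concatenated along the channel axis. Nearest-neighbour upsampling, concatenation (rather than summation) and the function names are assumptions; the abstract does not describe the fusion module at this level of detail.

```python
def upsample_nearest(fmap, factor):
    """Repeat every value of an [H][W] map `factor` times along both axes."""
    return [
        [value for value in row for _ in range(factor)]
        for row in fmap
        for _ in range(factor)
    ]


def fuse(shallow, deep, factor):
    """Concatenate a [C1][H][W] map with an upsampled [C2][H/f][W/f] map."""
    upsampled = [upsample_nearest(channel, factor) for channel in deep]
    if (len(upsampled[0]) != len(shallow[0])
            or len(upsampled[0][0]) != len(shallow[0][0])):
        raise ValueError("spatial sizes do not match after upsampling")
    return shallow + upsampled  # channel-wise concatenation
```

The fused map keeps the fine spatial grid of the shallow layer while adding the channels of the deeper, more semantic layer.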


Multi-scales feature integration single shot multi-box detector on small object detection

MIPPR 2019: Pattern Recognition and Computer Vision ◽ 10.1117/12.2538020 ◽ 2020

Author(s): Jianbang Zhou ◽ Bo Chen ◽ Jiahao Zhang ◽ Zhong Chen ◽ Jian Yang

Keyword(s): Object Detection ◽ Feature Integration ◽ Single Shot ◽ Small Object ◽ Small Object Detection ◽ Multi Scales


Multi-View Object Detection Based on Deep Learning

Applied Sciences ◽ 10.3390/app8091423 ◽ 2018 ◽ Vol 8 (9) ◽ pp. 1423 ◽ Cited By ~ 15

Author(s): Cong Tang ◽ Yongshun Ling ◽ Xing Yang ◽ Wei Jin ◽ Chao Zheng

Keyword(s): Deep Learning ◽ Object Detection ◽ Regression Models ◽ Detection Methods ◽ Detection Accuracy ◽ Single Shot ◽ Small Object ◽ Object Retrieval ◽ Detection Approach ◽ Small Object Detection

A multi-view object detection approach based on deep learning is proposed in this paper. Classical object detection methods based on regression models are introduced, and the reasons for their weak ability to detect small objects are analyzed. To improve the performance of these methods, a multi-view object detection approach is proposed, and the model structure and working principles of this approach are explained. Additionally, the object retrieval ability and object detection accuracy of both the multi-view methods and the corresponding classical methods are evaluated and compared based on a test on a small object dataset. The experimental results show that in terms of object retrieval capability, Multi-view YOLO (You Only Look Once: Unified, Real-Time Object Detection), Multi-view YOLOv2 (based on an updated version of YOLO), and Multi-view SSD (Single Shot Multibox Detector) achieve AF (average F-measure) scores that are higher than those of their classical counterparts by 0.177, 0.06, and 0.169, respectively. Moreover, in terms of the detection accuracy, when difficult objects are not included, the mAP (mean average precision) scores of the multi-view methods are higher than those of the classical methods by 14.3%, 7.4%, and 13.1%, respectively. Thus, the validity of the approach proposed in this paper has been verified. In addition, compared with state-of-the-art methods based on region proposals, multi-view detection methods are faster while achieving mAPs that are approximately the same in small object detection.
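The AF scores quoted above build on the F-measure, the harmonic mean of precision and recall. The Python sketch below computes it from true-positive, false-positive and false-negative counts. Treating AF as the unweighted mean of per-class F-measures is an assumption, since the abstract does not define the averaging, and the function names are illustrative.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from detection counts; beta=1 gives the usual F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_f_measure(counts):
    """Unweighted mean of F1 over a list of (tp, fp, fn) tuples, one per class."""
    return sum(f_measure(tp, fp, fn) for tp, fp, fn in counts) / len(counts)
```

For example, a class with 8 true positives, 2 false positives and 2 false negatives has precision and recall of 0.8 each, and therefore an F1 of 0.8.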
