An Approach to Improve SSD through Skip Connection of Multiscale Feature Maps

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Xiaoguo Zhang ◽  
Ye Gao ◽  
Fei Ye ◽  
Qihan Liu ◽  
Kaixin Zhang

SSD (Single Shot MultiBox Detector) is one of the best object detection algorithms and provides highly accurate object detection in real time. However, SSD performs relatively poorly on small objects because its shallow prediction layers, which are responsible for detecting small objects, lack sufficient semantic information. To overcome this problem, this paper proposes SKIPSSD, an improved SSD with a novel skip connection of multiscale feature maps, which enriches the semantic information and details of the prediction layers by fusing high-level and low-level feature maps through skip connections. For the fusion itself, we design two feature fusion modules and multiple fusion strategies to improve the SSD detector’s sensitivity and perception ability. Experimental results on the PASCAL VOC2007 test set demonstrate that SKIPSSD significantly improves detection performance and outperforms many state-of-the-art object detectors. With an input size of 300 × 300, SKIPSSD achieves 79.0% mAP (mean average precision) at 38.7 FPS (frames per second) on a single 1080 GPU, 1.8% higher than the mAP of SSD, while still keeping a real-time detection speed.
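To make the skip-fusion step concrete, the following PyTorch-style sketch fuses a high-level (semantically rich) feature map into a low-level (spatially detailed) one by channel alignment, upsampling, and element-wise addition. The module name, channel widths, and sum-based fusion are illustrative assumptions; the paper's two fusion modules and fusion strategies may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFusion(nn.Module):
    """Sketch: fuse a deep, semantic map into a shallow, detailed map."""
    def __init__(self, low_ch, high_ch, out_ch):
        super().__init__()
        self.reduce_high = nn.Conv2d(high_ch, out_ch, kernel_size=1)  # align channels
        self.reduce_low = nn.Conv2d(low_ch, out_ch, kernel_size=1)
        self.smooth = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, low_feat, high_feat):
        # Upsample the deep map to the shallow map's resolution, then fuse.
        high_up = F.interpolate(self.reduce_high(high_feat),
                                size=low_feat.shape[-2:], mode='bilinear',
                                align_corners=False)
        fused = self.reduce_low(low_feat) + high_up   # element-wise sum fusion
        return self.smooth(fused)

# Hypothetical example: fuse an SSD300 conv7-like map (1024ch, 19x19)
# into a conv4_3-like map (512ch, 38x38).
fusion = SkipFusion(low_ch=512, high_ch=1024, out_ch=512)
low = torch.randn(1, 512, 38, 38)
high = torch.randn(1, 1024, 19, 19)
print(fusion(low, high).shape)  # torch.Size([1, 512, 38, 38])
```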

2021 ◽  
Vol 11 (3) ◽  
pp. 1096
Author(s):  
Qing Li ◽  
Yingcheng Lin ◽  
Wei He

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor flexibility of the network structure and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces the amount of computation by reducing the number of convolutional layers. We offer a novel FFAM method to improve detection accuracy. Firstly, the FFAM method fuses semantically rich high-level feature maps with low-level feature maps to improve detection accuracy for small objects. A lightweight attention mechanism, a cascade of channel and spatial attention modules, is employed to enhance the target’s contextual information and guide the network to focus on its easily recognizable features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB of parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment on embedded devices for real-time object detection.
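The cascade of channel and then spatial attention can be sketched as below (CBAM-like, in PyTorch). The reduction ratio, kernel size, and module layout are assumptions; the actual SSD7-FFAM attention block is not specified in the abstract.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch: lightweight channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                      padding=spatial_kernel // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention: pool spatially, then re-weight each channel.
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ch_att = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch_att.view(b, c, 1, 1)
        # Spatial attention: pool channels, then re-weight each location.
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)
        sp_att = torch.sigmoid(self.spatial_conv(sp))
        return x * sp_att
```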


Sensors ◽  
2020 ◽  
Vol 20 (13) ◽  
pp. 3630 ◽  
Author(s):  
Young-Joon Hwang ◽  
Jin-Gu Lee ◽  
Un-Chul Moon ◽  
Ho-Hyun Park

The single shot multi-box detector (SSD) exhibits low accuracy in small-object detection because it does not consider the scale contextual information between its layers and its shallow layers lack adequate semantic information. To improve the accuracy of the original SSD, this paper proposes a new single shot multi-box detector using trident feature and squeeze-and-excitation feature fusion (SSD-TSEFFM), which employs the trident network and a squeeze-and-excitation feature fusion module. A trident feature module (TFM), inspired by the trident network, is developed to capture scale contextual information; its use of dilated convolution makes the proposed model robust to scale changes. Further, the squeeze-and-excitation feature fusion module (SEFFM) is used to provide more semantic information to the model. The SSD-TSEFFM is compared with Faster R-CNN (2015), SSD (2016), and DF-SSD (2020) on the PASCAL VOC 2007 and 2012 datasets. The experimental results demonstrate the high accuracy of the proposed model in small-object detection, in addition to good overall accuracy. The SSD-TSEFFM achieves 80.4% mAP and 80.2% mAP on the 2007 and 2012 datasets, respectively, an average improvement of approximately 2% over the other models.
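A minimal sketch of the two ingredients named here, trident-style parallel dilated convolutions followed by squeeze-and-excitation reweighting, is shown below in PyTorch. The branch count, dilation rates, and reduction ratio are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class TridentSEFeature(nn.Module):
    """Sketch: multi-dilation branches fused and re-weighted by an SE block."""
    def __init__(self, channels, dilations=(1, 2, 3), reduction=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        fused = sum(branch(x) for branch in self.branches)  # multi-scale context
        return fused * self.se(fused)                       # channel re-weighting
```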


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1536
Author(s):  
Deng Jiang ◽  
Bei Sun ◽  
Shaojing Su ◽  
Zhen Zuo ◽  
Peng Wu ◽  
...  

Deep learning methods have significantly improved object detection performance, but small object detection remains an extremely difficult and challenging task in computer vision. We propose a feature fusion and spatial attention-based single shot detector (FASSD) for small object detection. We fuse high-level semantic information into shallow layers to generate discriminative feature representations for small objects. To adaptively enhance the expression of small object areas and suppress the feature response of background regions, the spatial attention block learns a self-attention mask that re-weights the original feature maps. We also establish a small object dataset (LAKE-BOAT) of a scene with a boat on a lake and test our algorithm on it to evaluate its performance. The results show that our FASSD achieves 79.3% mAP (mean average precision) on the PASCAL VOC2007 test with a 300 × 300 input, outperforming the original single shot multibox detector (SSD) by 1.6 points as well as most improved algorithms based on SSD. The corresponding detection speed is 45.3 FPS (frames per second) on the VOC2007 test using a single NVIDIA TITAN RTX GPU. The test results of a simplified FASSD on the LAKE-BOAT dataset indicate that our model achieves a 3.5% mAP improvement over the baseline network while maintaining a real-time detection speed (64.4 FPS).
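The idea of learning a mask that highlights small-object regions and suppresses background can be sketched as follows in PyTorch. The layer layout of FASSD's spatial attention block is not given in the abstract, so this structure (and the residual application of the mask) is an assumption.

```python
import torch
import torch.nn as nn

class SpatialAttentionMask(nn.Module):
    """Sketch: learn a per-location mask and re-weight the feature map with it."""
    def __init__(self, channels):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),   # mask values in [0, 1]
        )

    def forward(self, x):
        mask = self.mask_head(x)   # (B, 1, H, W) attention mask
        return x * mask + x        # residual: enhance, but keep the original signal
```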


Author(s):  
Jakaria Rabbi ◽  
Nilanjan Ray ◽  
Matthias Schubert ◽  
Subir Chowdhury ◽  
Dennis Chao

The detection performance for small objects in remote sensing images is not satisfactory compared to that for large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) shows remarkable image enhancement performance, but the reconstructed images miss high-frequency edge information. Therefore, object detection performance degrades for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge-enhanced GAN (EEGAN) and ESRGAN, we apply a new edge-enhanced super-resolution GAN (EESRGAN) to improve the image quality of remote sensing images and use different detector networks in an end-to-end manner, where the detector loss is backpropagated into the EESRGAN to improve detection performance. We propose an architecture with three components: ESRGAN, an Edge Enhancement Network (EEN), and a detection network. We use residual-in-residual dense blocks (RRDB) for both the ESRGAN and the EEN; for the detector network, we use the faster region-based convolutional network (FRCNN, a two-stage detector) and the single-shot multi-box detector (SSD, a one-stage detector). Extensive experiments on a public (car overhead with context) and a self-assembled (oil and gas storage tank) satellite dataset show the superior performance of our method compared to standalone state-of-the-art object detectors.
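The end-to-end coupling described here (detector loss backpropagated through the super-resolution generator) can be sketched as a single training step. Everything below is a placeholder schematic: `generator`, `detector`, `sr_loss_fn`, and the loss weights are assumed callables and values, not the paper's actual modules or settings.

```python
def train_step(generator, detector, optimizer, lr_img, hr_img, targets,
               sr_loss_fn, det_weight=1.0, sr_weight=1.0):
    """Sketch of one joint SR + detection update.

    generator : low-res image -> enhanced (super-resolved, edge-enhanced) image
    detector  : (image, targets) -> detection loss (e.g. an FRCNN/SSD loss)
    sr_loss_fn: reconstruction loss between SR output and high-res reference
    """
    optimizer.zero_grad()
    sr_img = generator(lr_img)              # enhanced remote sensing image
    sr_loss = sr_loss_fn(sr_img, hr_img)    # image reconstruction term
    det_loss = detector(sr_img, targets)    # detection term on the SR image
    loss = sr_weight * sr_loss + det_weight * det_loss
    loss.backward()                         # gradients flow back into the generator
    optimizer.step()
    return loss.item()
```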


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1066
Author(s):  
Peng Jia ◽  
Fuxiang Liu

At present, one-stage detectors based on lightweight models can achieve real-time speed, but their detection performance remains a challenge. To enhance the discriminability and robustness of the features extracted by the model and to improve the detector’s performance on small objects, we propose two modules in this work. First, we propose a receptive field enhancement method, referred to as adaptive receptive field fusion (ARFF). It enhances the model’s feature representation ability by adaptively learning the fusion weights of the different receptive field branches in the receptive field module. Then, we propose an enhanced up-sampling (EU) module to reduce the information loss caused by up-sampling the feature map. Finally, we assemble the ARFF and EU modules on top of YOLO v3 to build a real-time, high-precision and lightweight object detection system referred to as the ARFF-EU network. We achieve a state-of-the-art speed and accuracy trade-off on both the Pascal VOC and MS COCO datasets, reporting 83.6% AP at 37.5 FPS and 42.5% AP at 33.7 FPS, respectively. The experimental results show that the proposed ARFF and EU modules improve the detection performance of the ARFF-EU network, approaching the accuracy of advanced, very deep detectors while maintaining real-time speed.
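The core of the ARFF idea, adaptively learned fusion weights over receptive-field branches, can be sketched as follows in PyTorch. The branch design (dilated 3 × 3 convolutions) and the scalar softmax weighting are assumptions; the paper's receptive field module is not described in detail in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveReceptiveFieldFusion(nn.Module):
    """Sketch: fuse parallel receptive-field branches with learned weights."""
    def __init__(self, channels, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])
        # One learnable scalar per branch, normalized with softmax at run time.
        self.branch_logits = nn.Parameter(torch.zeros(len(dilations)))

    def forward(self, x):
        weights = F.softmax(self.branch_logits, dim=0)
        return sum(w * branch(x) for w, branch in zip(weights, self.branches))
```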


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3031
Author(s):  
Jing Lian ◽  
Yuhang Yin ◽  
Linhui Li ◽  
Zhenghao Wang ◽  
Yafu Zhou

There are many small objects in traffic scenes, but their low resolution and limited information make their detection a continuing challenge, and detecting them is very important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared to the benchmark model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. It increases the mAP for all objects by 0.9 percentage points on the validation set of the traffic scene dataset BDD100K and, at the same time, increases the mAP for small objects by 3.5%.
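A minimal sketch of the two blocks named here is given below in PyTorch: a channel attention computed from both global (pooled) and local (per-location) context, and a fusion block that uses it as a soft weight between two feature maps. The layer configuration follows the general attentional-feature-fusion literature and is an assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    """Sketch of an MS-CAB-like block: local + global channel attention."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = max(channels // reduction, 8)
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, kernel_size=1),
            )
        self.local_branch = branch()                                   # per-location context
        self.global_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), branch())  # pooled context

    def forward(self, x):
        # Attention weights in [0, 1] combining local and global information.
        return torch.sigmoid(self.local_branch(x) + self.global_branch(x))

class AttentionFeatureFusion(nn.Module):
    """Sketch of an AFFB-like block: out = w * a + (1 - w) * b."""
    def __init__(self, channels):
        super().__init__()
        self.attention = MultiScaleChannelAttention(channels)

    def forward(self, a, b):
        w = self.attention(a + b)
        return w * a + (1.0 - w) * b
```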


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-13 ◽  
Author(s):  
Haotian Li ◽  
Kezheng Lin ◽  
Jingxuan Bai ◽  
Ao Li ◽  
Jiali Yu

In order to improve the detection rate of the traditional single-shot multibox detection algorithm for small objects, a feature-enhanced fusion SSD object detection algorithm based on the feature pyramid network is proposed. Firstly, the selected multiscale feature layers are merged with the scale-invariant convolutional layers through the feature pyramid network structure; at the same time, the multiscale feature maps are separately converted to a common number of channels using the scale-invariant convolution kernels. Then, the two resulting sets of pyramid-shaped feature layers are further fused to generate a set of enhanced multiscale feature maps, and the scale-invariant convolution is performed again on these layers. Finally, the resulting layers are used for detection and localization, and the final location coordinates and confidences are output after non-maximum suppression. Experimental results on the Pascal VOC 2007 and 2012 datasets confirm an 8.2% improvement in mAP compared to the original SSD and some existing algorithms.
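The pyramid fusion described here can be sketched FPN-style in PyTorch: 1 × 1 convolutions project each selected feature map to a common channel width, and a top-down pass upsamples and adds the deeper maps into the shallower ones. The output channel width and the 3 × 3 smoothing convolutions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFeaturePyramid(nn.Module):
    """Sketch: project SSD feature maps to one width, then fuse top-down."""
    def __init__(self, in_channels_list, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list
        ])
        self.smooth = nn.ModuleList([
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels_list
        ])

    def forward(self, feats):
        # feats: list of maps ordered from shallow (large) to deep (small).
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down pass: add each deeper map into the next shallower one.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode='nearest')
        return [s(f) for s, f in zip(self.smooth, laterals)]
```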


Electronics ◽  
2019 ◽  
Vol 8 (10) ◽  
pp. 1151 ◽  
Author(s):  
Xia Hua ◽  
Xinqing Wang ◽  
Ting Rui ◽  
Dong Wang ◽  
Faming Shao

Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, a cascaded convolutional neural network real-time object-detection framework for remote sensing images is proposed, which integrates visual perception and convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks, namely the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a deep feature pyramid that refines the attention feature maps with convolutional long short-term memory (ConvLSTM) networks. It guides the acquisition of potential object sub-regions in the scene, reduces computational complexity, and enhances the network’s ability to extract multi-scale object features, adapting to the complex backgrounds and small-object characteristics of large-scene remote sensing images. In AD-FCN, an object mask and an object orientation estimation layer are designed to achieve fine localization of candidate boxes. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while ensuring detection accuracy, has high adaptability, and has broad prospects for engineering applications.
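A generic, non-local style self-attention block over a feature map, of the kind a pre-screening network can use to highlight likely object sub-regions, is sketched below in PyTorch. This is only a common baseline formulation; SOSA-FCN's actual attention module and its ConvLSTM-based refinement are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """Sketch: non-local self-attention over spatial positions of a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = channels // reduction
        self.query = nn.Conv2d(channels, mid, kernel_size=1)
        self.key = nn.Conv2d(channels, mid, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, mid)
        k = self.key(x).flatten(2)                     # (B, mid, HW)
        v = self.value(x).flatten(2)                   # (B, C, HW)
        attn = F.softmax(q @ k, dim=-1)                # (B, HW, HW) pairwise affinities
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection
```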

