Object Detection Network Based on Feature Fusion and Attention Mechanism

2019 ◽  
Vol 11 (1) ◽  
pp. 9 ◽  
Author(s):  
Ying Zhang ◽  
Yimin Chen ◽  
Chen Huang ◽  
Mingke Gao

In recent years, almost all top-performing object detection networks have relied on CNN (convolutional neural network) features. In this work, we add feature fusion to the object detection network to obtain better CNN features that combine deep but semantically strong and shallow but high-resolution representations, thereby improving performance on small objects. We also apply an attention mechanism to our object detection network, AF R-CNN (attention mechanism and convolution feature fusion based object detection), to strengthen significant features and suppress background interference. AF R-CNN is a single end-to-end network. We use the pre-trained VGG-16 network to extract CNN features and train our detector on the PASCAL VOC 2007 and 2012 datasets. Empirical evaluation on the PASCAL VOC 2007 dataset demonstrates the effectiveness of our approach: AF R-CNN achieves an object detection accuracy of 75.9% on PASCAL VOC 2007, six points higher than Faster R-CNN.
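The general pattern of fusing a deep, semantically strong feature map with a shallow, high-resolution one can be sketched as below. This is a minimal PyTorch illustration, not the exact AF R-CNN module; the channel sizes and the element-wise addition are illustrative assumptions.

```python
# Minimal sketch of deep/shallow CNN feature fusion (channel widths are
# illustrative, not the exact AF R-CNN configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureFusion(nn.Module):
    def __init__(self, shallow_ch=256, deep_ch=512, out_ch=256):
        super().__init__()
        # 1x1 convs project both maps to a common channel width.
        self.shallow_proj = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.deep_proj = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        # 3x3 conv smooths the fused map.
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the semantically strong deep map to the shallow map's
        # spatial resolution, then fuse by element-wise addition.
        deep_up = F.interpolate(self.deep_proj(deep), size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.smooth(self.shallow_proj(shallow) + deep_up)

# Example: a higher-resolution mid-level map and a deeper, coarser map.
shallow = torch.randn(1, 256, 56, 56)
deep = torch.randn(1, 512, 28, 28)
fused = FeatureFusion()(shallow, deep)   # -> (1, 256, 56, 56)
```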

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3031
Author(s):  
Jing Lian ◽  
Yuhang Yin ◽  
Linhui Li ◽  
Zhenghao Wang ◽  
Yafu Zhou

There are many small objects in traffic scenes, but their low resolution and limited information still make them challenging to detect, even though small object detection is very important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which better integrates contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. Experimental results show that, compared to the baseline model YOLOv5s, this method achieves a higher mean Average Precision (mAP) while maintaining real-time performance. On the validation set of the traffic scene dataset BDD100K, it increases the mAP over all objects by 0.9 percentage points and the mAP of small objects by 3.5%.
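A channel attention block that aggregates information at local and global scales can be sketched as follows. This is a hedged PyTorch illustration of the general idea; the reduction ratio and exact layer layout are assumptions, not necessarily the MS-CAB design from the paper.

```python
# Sketch of a multi-scale channel attention block: a local branch applies
# point-wise convs at every spatial position, a global branch applies the
# same bottleneck to a globally pooled descriptor; both are summed into
# per-channel weights.
import torch
import torch.nn as nn

class MultiScaleChannelAttention(nn.Module):
    def __init__(self, channels=256, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.local = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels))
        self.global_ = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1))

    def forward(self, x):
        weights = torch.sigmoid(self.local(x) + self.global_(x))
        return x * weights      # re-weight the input feature map

attn = MultiScaleChannelAttention(256)
y = attn(torch.randn(2, 256, 40, 40))   # same shape as the input
```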


2021 ◽  
Vol 11 (3) ◽  
pp. 1096
Author(s):  
Qing Li ◽  
Yingcheng Lin ◽  
Wei He

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which limits the flexibility of the network structure and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces computation by reducing the number of convolutional layers. We also present a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. First, the FFAM method fuses high-level, semantically rich feature maps with low-level feature maps to improve the detection accuracy of small objects. A lightweight attention mechanism, cascading channel and spatial attention modules, is then employed to enhance the target's contextual information and guide the network to focus on its easily recognizable features. SSD7-FFAM achieves 83.7% mean Average Precision (mAP) with 1.66 MB of parameters and an average running time of 0.033 s on the NWPU VHR-10 dataset. These results indicate that the proposed SSD7-FFAM is well suited for deployment on embedded devices for real-time object detection.
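A cascade of channel and spatial attention modules (in the spirit of CBAM) can be sketched in PyTorch as below. This is a generic illustration of the cascading pattern under assumed hyperparameters, not the paper's exact lightweight module.

```python
# Sketch of cascaded attention: channel attention re-weights channels,
# then spatial attention re-weights positions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled channel descriptor
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

# Cascade: channel attention first, then spatial attention.
x = torch.randn(1, 128, 32, 32)
y = SpatialAttention()(ChannelAttention(128)(x))
```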


Sensors ◽  
2021 ◽  
Vol 21 (10) ◽  
pp. 3374
Author(s):  
Hansen Liu ◽  
Kuangang Fan ◽  
Qinghua Ouyang ◽  
Na Li

To address the threat of drones intruding into high-security areas, real-time drone detection is urgently required to protect these areas. Real-time drone detection poses two main difficulties: drones move quickly, which demands faster detectors, and small drones are hard to detect. In this paper, we first achieve high detection accuracy by evaluating four state-of-the-art object detection methods: RetinaNet, FCOS, YOLOv3, and YOLOv4. Then, to address the speed requirement, we prune the convolutional channels and shortcut layers of YOLOv4 to obtain thinner and shallower models. Furthermore, to improve the accuracy of small drone detection, we apply an augmentation designed for small objects that copies and pastes small drones within the image. Experimental results verify that, compared to YOLOv4, our pruned-YOLOv4 model, with a channel pruning rate of 0.8 and 24 pruned layers, achieves 90.5% mAP while its processing speed increases by 60.4%. Additionally, after small object augmentation, the precision and recall of the pruned-YOLOv4 increase by almost 22.8% and 12.7%, respectively. These results show that our pruned-YOLOv4 is an effective and accurate approach for drone detection.
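The copy-and-paste augmentation for small objects can be illustrated with the short sketch below. It is a generic implementation under assumed details (size threshold, paste count, overlap test), not necessarily the exact procedure used in the paper.

```python
# Sketch of copy-paste augmentation: small object crops are copied and pasted
# at random non-overlapping locations, and new bounding boxes are recorded.
import random
import numpy as np

def copy_paste_small_objects(image, boxes, max_size=32, n_copies=2):
    """image: HxWx3 uint8 array; boxes: list of integer [x1, y1, x2, y2]."""
    h, w = image.shape[:2]
    out = image.copy()
    new_boxes = list(boxes)
    small = [b for b in boxes
             if (b[2] - b[0]) <= max_size and (b[3] - b[1]) <= max_size]
    for b in small:
        patch = image[b[1]:b[3], b[0]:b[2]].copy()
        ph, pw = patch.shape[:2]
        for _ in range(n_copies):
            x = random.randint(0, w - pw)
            y = random.randint(0, h - ph)
            cand = [x, y, x + pw, y + ph]
            # Skip pastes that would overlap any existing box.
            if any(not (cand[2] <= ob[0] or cand[0] >= ob[2] or
                        cand[3] <= ob[1] or cand[1] >= ob[3]) for ob in new_boxes):
                continue
            out[y:y + ph, x:x + pw] = patch
            new_boxes.append(cand)
    return out, new_boxes

img = np.zeros((416, 416, 3), dtype=np.uint8)
aug_img, aug_boxes = copy_paste_small_objects(img, [[10, 10, 30, 30]])
```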


2021 ◽  
Vol 13 (18) ◽  
pp. 3776
Author(s):  
Linlin Zhu ◽  
Xun Geng ◽  
Zheng Li ◽  
Chun Liu

It is of great significance to apply object detection methods to automatically detect boulders in planetary images and analyze their distribution; this contributes to the selection of candidate landing sites and the understanding of geological processes. This paper improves the state-of-the-art YOLOv5 object detection method with attention mechanisms and designs a pyramid-based approach to detect boulders in planetary images. A new feature fusion layer is designed to capture more shallow features of small boulders. Attention modules that combine the convolutional block attention module (CBAM) and the efficient channel attention network (ECA-Net) are also added to YOLOv5 to highlight the information that contributes to boulder detection. Based on the Pascal Visual Object Classes 2007 (VOC2007) dataset, which is widely used for object detection evaluation, and a boulder dataset that we constructed from images of the asteroid Bennu, the evaluation results show that the improvements increase the precision of YOLOv5 by 3.4%. With the improved YOLOv5 detector, the pyramid-based approach extracts several layers of images at different resolutions from the large planetary images and detects boulders of different scales from different layers. We also apply the proposed approach to detect boulders on the asteroid Bennu, and we analyze and present their distribution.
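The pyramid-style pass over a large planetary image can be sketched as below: several rescaled copies are produced, a detector is run on each level, and detections are mapped back to the original coordinates. The `detect` function is a placeholder for any trained detector (e.g., the improved YOLOv5) and the scale set is an assumption.

```python
# Sketch of pyramid-based detection on a large image.
import cv2
import numpy as np

def detect(image):
    """Placeholder detector: returns a list of (x1, y1, x2, y2, score)."""
    return []

def pyramid_detect(image, scales=(1.0, 0.5, 0.25)):
    all_boxes = []
    for s in scales:
        level = cv2.resize(image, None, fx=s, fy=s) if s != 1.0 else image
        for (x1, y1, x2, y2, score) in detect(level):
            # Rescale detections from this pyramid level back to full resolution.
            all_boxes.append((x1 / s, y1 / s, x2 / s, y2 / s, score))
    return all_boxes

large_image = np.zeros((4096, 4096, 3), dtype=np.uint8)
boxes = pyramid_detect(large_image)
```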


2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. It is widely recognized that although lightweight object detectors have high detection speed, their detection accuracy is relatively low. Extracting complete multi-scale image features is beneficial for improving detection accuracy in visual cognitive tasks. Asymmetric convolutions have a useful property: their different aspect ratios can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by allocating learnable weights to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form new multi-scale backbone networks for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO 2014 datasets. Experimental results show that detection precision is greatly improved while fast detection speed is maintained.
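A block with three parallel asymmetric branches merged by learnable weights can be sketched as follows. The kernel shapes and the softmax-normalized fusion rule are illustrative assumptions, not the paper's exact MAC specification.

```python
# Sketch of a multi-scale asymmetric convolution block with learnable
# per-branch fusion weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MACBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, (1, 3), padding=(0, 1)),
            nn.Conv2d(channels, channels, (3, 1), padding=(1, 0)),
            nn.Conv2d(channels, channels, (1, 5), padding=(0, 2)),
        ])
        # One learnable scalar per branch, normalized with softmax at runtime.
        self.weights = nn.Parameter(torch.zeros(len(self.branches)))

    def forward(self, x):
        w = F.softmax(self.weights, dim=0)
        return sum(wi * branch(x) for wi, branch in zip(w, self.branches))

block = MACBlock(64)
out = block(torch.randn(1, 64, 56, 56))   # same spatial size as the input
```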


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fetulhak Abdurahman ◽  
Kinde Anlay Fante ◽  
Mohammed Aliy

Abstract Background: Manual microscopic examination of Leishman/Giemsa-stained thin and thick blood smears is still the "gold standard" for malaria diagnosis. One drawback of this method is that its accuracy, consistency, and diagnosis speed depend on the microscopist's diagnostic and technical skills, and highly skilled microscopists are difficult to find in remote areas of developing countries. To alleviate this problem, we investigate state-of-the-art one-stage and two-stage object detection algorithms for automated malaria parasite screening from microscopic images of thick blood slides. Results: YOLOV3 and YOLOV4, which are state-of-the-art object detectors in terms of accuracy and speed, are not optimized for detecting small objects such as malaria parasites in microscopic images. We modify these models by increasing the feature scale and adding more detection layers to enhance their ability to detect small objects without notably decreasing detection speed. We propose one modified YOLOV4 model, called YOLOV4-MOD, and two modified YOLOV3 models, called YOLOV3-MOD1 and YOLOV3-MOD2. In addition, new anchor box sizes are generated using the K-means clustering algorithm to exploit the potential of these models for small object detection. The performance of the modified YOLOV3 and YOLOV4 models was evaluated on a publicly available malaria dataset. These models achieve state-of-the-art accuracy, exceeding their original versions, Faster R-CNN, and SSD in terms of mean average precision (mAP), recall, precision, F1 score, and average IoU. YOLOV4-MOD achieves the best detection accuracy among all models, with a mAP of 96.32%; YOLOV3-MOD2 and YOLOV3-MOD1 achieve mAPs of 96.14% and 95.46%, respectively. Conclusions: The experimental results demonstrate that the modified YOLOV3 and YOLOV4 models are highly promising for detecting malaria parasites in images captured by a smartphone camera over the microscope eyepiece, and the proposed system is suitable for deployment in low-resource settings.
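Generating anchor box sizes with K-means is commonly done by clustering ground-truth (width, height) pairs with 1 − IoU as the distance, as in YOLO-style detectors. The sketch below follows that convention; the initialization and iteration details are assumptions rather than the paper's exact procedure.

```python
# Sketch of IoU-based k-means clustering of box sizes to produce anchors.
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors given only (w, h), top-left aligned."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest = highest IoU
        for i in range(k):
            if np.any(assign == i):
                anchors[i] = boxes[assign == i].mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area

# boxes holds (width, height) of ground-truth objects in pixels.
boxes = np.abs(np.random.randn(500, 2)) * 20 + 10
print(kmeans_anchors(boxes, k=9))
```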


Author(s):  
Runze Liu ◽  
Guangwei Yan ◽  
Hui He ◽  
Yubin An ◽  
Ting Wang ◽  
...  

Background: Power line inspection is essential to ensure the safe and stable operation of the power system, and object detection for tower equipment can significantly improve inspection efficiency. However, due to the low resolution of small targets and their limited features, the detection accuracy of small targets is not easy to improve. Objective: This study aims to improve the resolution of tiny targets so that their texture and detailed features become more prominent and easier for the detection model to perceive. Methods: We propose an algorithm that employs generative adversarial networks to improve the detection accuracy of small objects. First, the original image is converted into a super-resolution image by a super-resolution reconstruction network (SRGAN). Then the Faster RCNN object detection framework is used to detect objects in the super-resolution images. Results: Experimental results on two small object recognition datasets show that the proposed model is robust; in particular, it detects targets missed by Faster RCNN, which indicates that SRGAN can effectively enhance the detailed information of small targets by improving their resolution. Conclusion: Higher-resolution data provides more detailed information about small targets, which helps the detection algorithm achieve higher accuracy. The proposed small object detection model based on a generative adversarial network is feasible and more efficient; compared with Faster RCNN, it performs better on small object detection.
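The super-resolve-then-detect pipeline can be sketched as below. The `super_resolve` function is a placeholder for a trained SRGAN generator (here approximated by bicubic upscaling), and torchvision's pre-trained Faster R-CNN (torchvision ≥ 0.13) stands in for the detector; neither is the paper's exact implementation.

```python
# Sketch: upscale the image, detect on the enlarged image, rescale the boxes.
import torch
import torchvision

def super_resolve(image, scale=4):
    """Placeholder for an SRGAN generator: a simple bicubic upscale."""
    return torch.nn.functional.interpolate(
        image.unsqueeze(0), scale_factor=scale, mode="bicubic",
        align_corners=False).squeeze(0).clamp(0, 1)

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

image = torch.rand(3, 200, 200)           # small, low-resolution input in [0, 1]
scale = 4
sr_image = super_resolve(image, scale)    # 3 x 800 x 800

with torch.no_grad():
    pred = detector([sr_image])[0]        # boxes are in super-resolved coordinates

boxes_original = pred["boxes"] / scale    # map detections back to the input image
```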


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Shuangjiang Du ◽  
Baofu Zhang ◽  
Pin Zhang ◽  
Peng Xiang ◽  
Hong Xue

Infrared target detection is a popular application of object detection and remains a challenge. This paper proposes the focus and attention mechanism-based YOLO (FA-YOLO), an improved method for detecting infrared occluded vehicles against the complex backgrounds of remote sensing images. First, we use a GAN to generate infrared images from visible-light datasets to obtain sufficient training data, and we also employ transfer learning. Then, to mitigate the impact of useless and complex background information, we propose a negative sample focusing mechanism that concentrates training on confusing negative samples to suppress false positives and increase detection precision. Finally, to enhance the features of small infrared targets, we add the dilated convolutional block attention module (dilated CBAM) to the CSPDarknet53 backbone of YOLOv4. To verify the superiority of our model, we carefully select 318 infrared occluded vehicle images from the VIVID-infrared dataset for testing. The detection accuracy (mAP) improves from 79.24% to 92.95%, and the F1 score improves from 77.92% to 88.13%, demonstrating a significant improvement in the detection of small, occluded infrared vehicles.
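Focusing training on confusing negative samples is closely related to hard negative mining. The sketch below keeps all positive anchors but only the highest-loss negatives at a fixed negative-to-positive ratio; it is a generic illustration, not necessarily the exact mechanism in FA-YOLO.

```python
# Sketch of focusing the classification loss on hard (confusing) negatives.
import torch

def focus_on_hard_negatives(cls_loss, labels, neg_pos_ratio=3):
    """cls_loss: per-anchor loss (N,); labels: 1 for positive, 0 for negative."""
    pos_mask = labels == 1
    neg_loss = cls_loss.clone()
    neg_loss[pos_mask] = -1.0                      # exclude positives from ranking
    num_neg = int(neg_pos_ratio * pos_mask.sum().clamp(min=1))
    hard_neg_idx = neg_loss.topk(min(num_neg, len(cls_loss))).indices
    keep = pos_mask.clone()
    keep[hard_neg_idx] = True
    return cls_loss[keep].mean()                   # loss over positives + hard negatives

loss = focus_on_hard_negatives(torch.rand(1000), torch.randint(0, 2, (1000,)))
```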


2021 ◽  
Author(s):  
Da-Ren Chen ◽  
Wei-Min Chiu

Abstract Machine learning techniques have been used to increase the accuracy of crack detection in road surfaces. Most studies fail to consider variable illumination conditions on the target of interest (ToI) and focus only on detecting the presence or absence of road cracks. This paper proposes a new road crack detection method, IlumiCrack, which integrates Gaussian mixture models (GMM) with object detection CNN models. This work provides the following contributions: 1) For the first time, a large-scale road crack image dataset covering a range of illumination conditions (e.g., day and night) is prepared using a dashcam. 2) Based on GMM, experimental evaluations on two to four brightness levels are conducted to find the optimal classification. 3) The IlumiCrack framework integrates state-of-the-art object detection methods with CNNs to classify road crack images into eight types with high accuracy. Experimental results show that IlumiCrack outperforms state-of-the-art R-CNN object detection frameworks.
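Grouping images into discrete brightness levels with a GMM can be sketched as below. The feature used here (mean grayscale intensity per image) and the number of components are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of GMM-based brightness-level classification of images.
import numpy as np
from sklearn.mixture import GaussianMixture

def brightness_levels(images, n_levels=3):
    """images: list of HxWx3 uint8 arrays; returns a level index per image."""
    # One scalar feature per image: its mean intensity.
    feats = np.array([[img.mean()] for img in images])
    gmm = GaussianMixture(n_components=n_levels, random_state=0).fit(feats)
    labels = gmm.predict(feats)
    # Re-order labels so that 0 = darkest and n_levels - 1 = brightest.
    order = np.argsort(gmm.means_.ravel())
    remap = {old: new for new, old in enumerate(order)}
    return np.array([remap[l] for l in labels])

images = [np.full((64, 64, 3), v, dtype=np.uint8) for v in (20, 120, 230, 25, 200)]
print(brightness_levels(images, n_levels=3))
```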


Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1235
Author(s):  
Yang Yang ◽  
Hongmin Deng

In order to make the classification and regression of single-stage detectors more accurate, this paper proposes an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3), based on You-Only-Look-Once (YOLO). First, a cascading model with learnable semantic fusion between the feature extraction network and the feature pyramid network is designed to improve detection accuracy using a global context block. Second, the information to be retained is screened by combining three feature maps of different scales. Finally, a global self-attention mechanism is used to highlight the useful information in feature maps while suppressing irrelevant information. Experiments show that GC-YOLOv3 reaches a maximum of 55.5 mean Average Precision (mAP)@0.5 for object detection on the Common Objects in Context (COCO) 2017 test-dev set, and that its mAP is 5.1% higher than that of YOLOv3 on the Pascal Visual Object Classes (PASCAL VOC) 2007 test set. These results indicate that the proposed GC-YOLOv3 model performs well on both the PASCAL VOC and COCO datasets.
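A global-context-style block can be sketched as follows: a single spatial attention map pools the feature map into one global descriptor, a small bottleneck transforms it, and the result is broadcast-added back. This follows the common GCNet formulation; GC-YOLOv3's exact block may differ, and the channel and reduction settings are assumptions.

```python
# Sketch of a global context block for re-injecting global information.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalContextBlock(nn.Module):
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # spatial attention logits
        mid = channels // reduction
        self.transform = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.LayerNorm([mid, 1, 1]),
            nn.ReLU(inplace=True), nn.Conv2d(mid, channels, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        weights = F.softmax(self.attn(x).reshape(b, 1, h * w), dim=-1)        # (B, 1, HW)
        context = torch.bmm(weights, x.reshape(b, c, h * w).transpose(1, 2))  # (B, 1, C)
        context = context.transpose(1, 2).reshape(b, c, 1, 1)
        return x + self.transform(context)     # broadcast-add the global context

gc = GlobalContextBlock(256)
y = gc(torch.randn(2, 256, 13, 13))   # same shape as the input
```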

