An object detection network based on YOLOv4 and improved spatial attention mechanism

Research on object detection has intensified in recent years, and detection systems are now widely applied in daily life and work. In this paper, we propose a more effective object detection neural network model, ENHANCE_YOLOV4. We studied the effects of several attention mechanisms on YOLOv4 and concluded that the spatial attention mechanism had the best effect. Building on previous studies, this paper therefore introduces dilated convolution and one-by-one (1×1) convolution into the spatial attention mechanism to expand the receptive field and combine channel information. Compared with CBAM and BAM, which are composed of spatial attention and channel attention, this improved spatial attention module reduces model parameters and improves detection capability. We built a new network model by embedding the improved spatial attention module at appropriate places in YOLOv4. Experiments show that the detection accuracy of this network structure increases by 0.8% on the VOC data set and by 7% on the COCO data set, with only a small increase in computational cost.
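
The receptive-field argument in this abstract can be illustrated with a small calculation. The sketch below is not the authors' code; the kernel sizes and dilation rates are illustrative assumptions, since the abstract does not state which values ENHANCE_YOLOV4 uses. It shows that a dilated kernel covers a wider span of the input while keeping the same number of weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive")
    # (dilation - 1) gaps are inserted between each pair of adjacent taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a square 2D convolution (bias excluded)."""
    return kernel_size * kernel_size * in_channels * out_channels


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    for dilation in (1, 2, 4):
        span = effective_kernel_size(3, dilation)
        print(f"3x3 kernel, dilation {dilation}: covers {span}x{span} pixels")
    # A 1x1 convolution mixes channels at each position with few weights.
    print("1x1 conv, 64 -> 1 channels:", conv_weight_count(1, 64, 1), "weights")
```

Dilation widens the area each output position sees without adding weights, which is consistent with the abstract's claim of a larger receptive field alongside fewer parameters than CBAM or BAM.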

Multiple-Oriented and Small Object Detection with Convolutional Neural Networks for Aerial Image

Remote Sensing ◽

10.3390/rs11182176 ◽

2019 ◽

Vol 11 (18) ◽

pp. 2176 ◽

Cited By ~ 3

Author(s):

Chen ◽

Zhong ◽

Tan

Keyword(s):

Neural Networks ◽

Object Detection ◽

Convolutional Neural Networks ◽

Aerial Images ◽

Superior Performance ◽

Aerial Image ◽

Detection Accuracy ◽

Small Object ◽

Data Set ◽

Orientation Information

Detecting objects in aerial images is a challenging task because the objects appear in multiple orientations and are relatively small. Although many traditional detection models have demonstrated acceptable performance by using an image pyramid and multiple templates in a sliding-window manner, such techniques are inefficient and costly. Recently, convolutional neural networks (CNNs) have been used successfully for object detection and have demonstrated considerably better performance than traditional detection methods; however, this success has not yet extended to aerial images. To overcome these problems, we propose a detection model based on two CNNs. One of the CNNs is designed to propose many object-like regions, generated with orientation information from feature maps at multiple scales and hierarchies. With this design, the positioning of small objects becomes more accurate, and the generated regions with orientation information are better suited to objects arranged in arbitrary orientations. The other CNN is designed for object recognition; it first extracts the features of each generated region and then makes the final decisions. Extensive experiments on the vehicle detection in aerial imagery (VEDAI) and overhead imagery research data set (OIRDS) datasets indicate that the proposed model performs well in terms of both detection accuracy and detection speed.
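
A region carrying orientation information can be represented as a center, a size, and an angle. The sketch below is a generic illustration, not the paper's proposal network; the parameterization `(cx, cy, w, h, angle_rad)` and the function name are assumptions made here for the example.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Return the four corners of a rectangle rotated about its center.

    The angle is measured counter-clockwise in the (x, y) coordinate frame.
    """
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets of the axis-aligned box, relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [
        (cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
        for dx, dy in offsets
    ]


if __name__ == "__main__":
    # A 4x2 vehicle-sized box, first axis-aligned, then rotated by 30 degrees.
    print(rotated_box_corners(10.0, 20.0, 4.0, 2.0, 0.0))
    print(rotated_box_corners(10.0, 20.0, 4.0, 2.0, math.radians(30)))
```

An oriented box of this kind fits an elongated, arbitrarily rotated object more tightly than an axis-aligned box, which includes more background as the object turns away from the image axes.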

Object Detection Network Based on Feature Fusion and Attention Mechanism

Future Internet ◽

10.3390/fi11010009 ◽

2019 ◽

Vol 11 (1) ◽

pp. 9 ◽

Cited By ~ 6

Author(s):

Ying Zhang ◽

Yimin Chen ◽

Chen Huang ◽

Mingke Gao

Keyword(s):

Object Detection ◽

Feature Fusion ◽

Empirical Evaluation ◽

Attention Mechanism ◽

Detection Accuracy ◽

Small Object ◽

Art Object ◽

Pascal Voc ◽

Almost All ◽

The Impact

Almost all current top-performing object detection networks rely on CNN (convolutional neural network) features. In this work, we add feature fusion to the object detection network to obtain a better CNN feature that combines deep, semantically rich features with shallow, high-resolution features, thus improving performance on small objects. In addition, an attention mechanism is applied to our object detection network, AF R-CNN (attention mechanism and convolution feature fusion based object detection), to enhance the impact of significant features and weaken background interference. AF R-CNN is a single end-to-end network. We use the pre-trained VGG-16 network to extract CNN features. The detection network is trained on the PASCAL VOC 2007 and 2012 datasets. Empirical evaluation on the PASCAL VOC 2007 dataset demonstrates the effectiveness of our approach: AF R-CNN achieves an object detection accuracy of 75.9% on PASCAL VOC 2007, six points higher than Faster R-CNN.
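
The abstract does not say how AF R-CNN fuses its features, so the following is only a minimal sketch of one common approach: upsample the deep, low-resolution map with nearest-neighbor interpolation and add it element-wise to the shallow, high-resolution map. The function names and the choice of addition (rather than concatenation) are assumptions made for illustration.

```python
def upsample_nearest(feature, factor=2):
    """Nearest-neighbor upsampling of a 2D map given as a list of rows."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(shallow, deep, factor=2):
    """Add an upsampled deep map to a shallow map of matching size."""
    up = upsample_nearest(deep, factor)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be `factor` times larger than deep map")
    return [[s + d for s, d in zip(s_row, d_row)] for s_row, d_row in zip(shallow, up)]


if __name__ == "__main__":
    deep = [[1.0, 2.0]]                      # 1x2 map with coarse semantics
    shallow = [[0.0, 0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0, 1.0]]         # 2x4 map with fine detail
    print(fuse(shallow, deep))
```

After fusion, each high-resolution position carries both its local detail and the coarse semantic response that covers it, which is the property the abstract credits for better small-object performance.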

Multi-scale traffic sign detection model with attention

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407020950054 ◽

2020 ◽

pp. 095440702095005

Author(s):

Bei Bei Fan ◽

He Yang

Keyword(s):

Spatial Attention ◽

Detection Algorithm ◽

Attention Mechanism ◽

Background Information ◽

Traffic Sign ◽

Data Set ◽

Multi Scale ◽

Sign Detection ◽

Object Area ◽

Traffic Sign Detection

Current traffic sign detection technology is disturbed by factors such as illumination changes, weather, and camera angle, which makes its results unsatisfactory. Traffic sign data sets usually contain a large number of small objects, and the scale variance of the objects is a huge challenge for traffic sign detection. In response to these problems, a multi-scale traffic sign detection algorithm based on an attention mechanism is proposed. The attention mechanism is composed of a channel attention mechanism and a spatial attention mechanism. By using channel attention to filter redundant and conflicting background information in the network, the information in the network becomes more accurate, and the network's performance in recognizing traffic signs improves. Using spatial attention, the proposed method pays more attention to the object area in the traffic image and suppresses non-object or background areas. The model is validated on the Tsinghua-Tencent 100K data set, and its accuracy is higher than that of state-of-the-art approaches in traffic sign detection.
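
The two attention stages described above can be sketched in a heavily simplified, parameter-free form. This is not the paper's module: real channel and spatial attention blocks use learned layers (for example a small MLP or a convolution) before the sigmoid, whereas this sketch applies the sigmoid directly to pooled averages. All function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a weight derived from its global average.

    `fmap` is a list of channels; each channel is a list of rows.
    """
    scaled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        scaled.append([[v * weight for v in row] for row in channel])
    return scaled


def spatial_attention(fmap):
    """Scale each position by a weight derived from its cross-channel average."""
    n, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [
        [sigmoid(sum(fmap[c][i][j] for c in range(n)) / n) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[fmap[c][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        for c in range(n)
    ]


if __name__ == "__main__":
    features = [[[2.0, 0.0], [0.0, 0.0]],   # channel 0: one strong response
                [[1.0, 0.0], [0.0, 0.0]]]   # channel 1: weaker response
    print(spatial_attention(channel_attention(features)))
```

Channel attention decides which feature maps matter, and spatial attention decides where in the image to look, so cascading them reweights the tensor along both axes.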

Automatic Pixel-Level Pavement Crack Recognition Using a Deep Feature Aggregation Segmentation Network with a scSE Attention Mechanism Module

Sensors ◽

10.3390/s21092902 ◽

2021 ◽

Vol 21 (9) ◽

pp. 2902

Author(s):

Wenting Qiao ◽

Qiangwei Liu ◽

Xiaoguang Wu ◽

Biao Ma ◽

Gang Li

Keyword(s):

Deep Learning ◽

Crack Detection ◽

Attention Mechanism ◽

Model Parameters ◽

Detection Accuracy ◽

Safe Driving ◽

Deep Feature ◽

Feature Aggregation ◽

Pavement Crack Detection ◽

Detection Speed

Pavement crack detection is essential for safe driving. The traditional manual crack detection method is highly subjective and time-consuming. Hence, an automatic pavement crack detection system is needed to facilitate this process. However, this is still a challenging task due to the complex topology and heavy noise interference in crack images. Although deep learning-based technologies have recently achieved breakthrough progress in crack detection, some challenges remain, such as large parameter counts and low detection efficiency. In addition, most deep learning-based crack detection algorithms find it difficult to strike a good balance between detection accuracy and detection speed. Inspired by the latest deep learning technology in the field of image processing, this paper proposes a novel crack detection algorithm based on a deep feature aggregation network with the spatial-channel squeeze & excitation (scSE) attention mechanism module, called CrackDFANet. First, we cut the collected crack images into 512 × 512 pixel image blocks to establish a crack dataset. Then, through iterative optimization on the training and validation sets, we obtained a crack detection model with good robustness. Finally, the CrackDFANet model was verified on a total of 3516 images in five datasets of different sizes containing different kinds of noise interference. Experimental results show that the trained CrackDFANet has strong anti-interference ability, with better robustness and generalization under interference from lighting, parking lines, water stains, plants, oil stains, and shadows. Furthermore, CrackDFANet is found to outperform other state-of-the-art algorithms, with more accurate detection and faster detection speed. Meanwhile, the model parameters and error rates of our algorithm are significantly reduced.
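
The dataset-building step of cutting images into 512 × 512 blocks can be sketched as follows. The abstract does not say how image borders are handled, so this example assumes non-overlapping tiles and simply drops any partial tile at the right and bottom edges; the function name is hypothetical.

```python
def tile_boxes(width, height, tile=512):
    """Return (left, top, right, bottom) boxes of non-overlapping full tiles.

    Assumption: partial tiles at the image border are discarded.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


if __name__ == "__main__":
    # A hypothetical 1024x1536 pavement photo yields a 2x3 grid of blocks.
    for box in tile_boxes(1024, 1536):
        print(box)
```

Each box can be passed to an image library's crop routine; padding the border or overlapping the tiles would be equally plausible choices that keep edge pixels in the dataset.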

SSD7-FFAM: A Real-Time Object Detection Network Friendly to Embedded Devices from Scratch

Applied Sciences ◽

10.3390/app11031096 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1096

Author(s):

Qing Li ◽

Yingcheng Lin ◽

Wei He

Keyword(s):

Object Detection ◽

Real Time ◽

Large Scale ◽

Feature Fusion ◽

Contextual Information ◽

Attention Mechanism ◽

Detection Accuracy ◽

Single Shot ◽

Feature Maps ◽

Embedded Devices

The high requirements for computing and memory are the biggest challenges in deploying existing object detection networks to embedded devices. Existing lightweight object detectors directly use lightweight neural network architectures such as MobileNet or ShuffleNet pre-trained on large-scale classification datasets, which results in poor network structure flexibility and is not suitable for some specific scenarios. In this paper, we propose a lightweight object detection network, Single-Shot MultiBox Detector (SSD)7-Feature Fusion and Attention Mechanism (FFAM), which saves storage space and reduces the amount of calculation by reducing the number of convolutional layers. We offer a novel Feature Fusion and Attention Mechanism (FFAM) method to improve detection accuracy. First, the FFAM method fuses feature maps rich in high-level semantic information with low-level feature maps to improve small objects’ detection accuracy. Second, a lightweight attention mechanism that cascades channel and spatial attention modules is employed to enhance the target’s contextual information and guide the network to focus on its easy-to-recognize features. The SSD7-FFAM achieves 83.7% mean Average Precision (mAP), 1.66 MB parameters, and 0.033 s average running time on the NWPU VHR-10 dataset. The results indicate that the proposed SSD7-FFAM is more suitable for deployment to embedded devices for real-time object detection.

An Improved Neural Network Model Based on Visual Attention Mechanism for Object Detection

Proceedings of the 2019 International Conference on Big Data, Electronics and Communication Engineering (BDECE 2019) ◽

10.2991/acsr.k.191223.035 ◽

2019 ◽

Author(s):

Zeren Jiang

Keyword(s):

Neural Network ◽

Visual Attention ◽

Object Detection ◽

Network Model ◽

Neural Network Model ◽

Attention Mechanism ◽

Model Based ◽

Visual Attention Mechanism

Small Object Detection in Traffic Scenes Based on YOLO-MXANet

Sensors ◽

10.3390/s21217422 ◽

2021 ◽

Vol 21 (21) ◽

pp. 7422

Author(s):

Xiaowei He ◽

Rao Cheng ◽

Zhonglong Zheng ◽

Zeji Wang

Keyword(s):

Object Detection ◽

Spatial Attention ◽

Training Model ◽

Activation Function ◽

Model Complexity ◽

Detection Accuracy ◽

Small Object ◽

Enhancement Method ◽

Detection Speed ◽

Improved Algorithm

For small objects in traffic scenes, general object detection algorithms have low detection accuracy, high model complexity, and slow detection speed. To solve these problems, an improved algorithm (named YOLO-MXANet) is proposed in this paper. Complete-Intersection over Union (CIoU) is used to improve the loss function and raise the positioning accuracy for small objects. To reduce the complexity of the model, we present a lightweight yet powerful backbone network (named SA-MobileNeXt) that incorporates channel and spatial attention. Our approach extracts expressive features more effectively by applying the Shuffle Channel and Spatial Attention (SCSA) module in the SandGlass Block (SGBlock) module while adding only a small number of parameters. In addition, a data enhancement method combining Mosaic and Mixup is employed to improve the robustness of the trained model. The Multi-scale Feature Enhancement Fusion (MFEF) network is proposed to better fuse the extracted features. The SiLU activation function is also used to optimize the Convolution-Batchnorm-Leaky ReLU (CBL) module and the SGBlock module to accelerate the convergence of the model. Ablation experiments on the KITTI dataset show that each improvement is effective. The improved algorithm reduces the complexity of the model and speeds up detection while improving object detection accuracy. Comparative experiments with other algorithms on the KITTI and CCTSDB datasets show that our algorithm also has certain advantages.
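
For readers unfamiliar with CIoU, the sketch below computes the standard CIoU loss for two axis-aligned boxes in `(x1, y1, x2, y2)` form. It follows the commonly published definition (IoU, a normalized center-distance penalty, and an aspect-ratio term) and is not taken from the YOLO-MXANet code; the small `eps` guard against division by zero is an implementation assumption.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """Complete-IoU loss between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared center distance over squared diagonal of the enclosing box.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    enclose_w = max(px2, tx2) - min(px1, tx1)
    enclose_h = max(py2, ty2) - min(py1, ty1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # near zero for a perfect match
    print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))   # above 1 for disjoint boxes
```

Unlike plain IoU, the loss still changes with box position when the boxes do not overlap, because the center-distance term stays informative. This is one reason it is often preferred for small objects, where early predictions frequently miss the target entirely.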

Gray-Edge-HOG feature based cascaded learning for facial landmark detection

MATEC Web of Conferences ◽

10.1051/matecconf/201818910023 ◽

2018 ◽

Vol 189 ◽

pp. 10023

Author(s):

Wenhui Zhang ◽

Wentong Wang ◽

Shuang Zhao ◽

Bin Sun

Keyword(s):

Neural Network ◽

Network Model ◽

Neural Network Model ◽

Deep Neural Network ◽

Facial Feature ◽

Appearance Model ◽

Detection Accuracy ◽

Feature Points ◽

Data Set ◽

Facial Landmark Detection

Compared with traditional statistical models, such as the active shape model and the active appearance model, facial feature point localization methods based on deep learning have improved in accuracy and speed, but some problems remain. First, when the traditional deep neural network model targets a data set containing different face poses, it only performs preprocessing through an initial face alignment and does not consider, during feature extraction, the regularity in how feature points are distributed for each face pose. Second, the traditional deep neural network model does not take into account the feature space differences caused by the different position distributions of the external contour points and the internal organ points (such as eyes, nose, and mouth), so detection accuracy and difficulty are inconsistent across feature points. To solve these problems, this paper proposes a convolutional neural network (CNN) based on the Gray-Edge-HOG (GEH) fusion feature.

Road Object Detection of YOLO Algorithm with Attention Mechanism

Frontiers in Signal Processing ◽

10.22606/fsp.2021.51002 ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Jiacheng Li ◽

Huazhang Wang ◽

Yuan Xu ◽

Fan Liu ◽

...

Keyword(s):

Feature Extraction ◽

Object Detection ◽

Spatial Attention ◽

High Precision ◽

Detection Algorithm ◽

Attention Mechanism ◽

Experimental Results ◽

Automatic Driving ◽

Key Features ◽

Detection Effect

In self-driving cars, incorrect object detection can lead to serious accidents, so high-precision object detection is key to automatic driving. This paper improves on the YOLOv3 object detection algorithm by introducing a channel attention mechanism and a spatial attention mechanism into the feature extraction network, which autonomously learn the weight of each channel, enhance key features, and suppress redundant features. Experimental results show that the improved network algorithm detects significantly better than the YOLOv3 algorithm.

A Real-Time Object Detector for Autonomous Vehicles Based on YOLOv4

Computational Intelligence and Neuroscience ◽

10.1155/2021/9218137 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Rui Wang ◽

Ziyue Wang ◽

Zhengwei Xu ◽

Chi Wang ◽

Qiang Li ◽

...

Keyword(s):

Object Detection ◽

Real Time ◽

High Speed ◽

Feature Fusion ◽

Autonomous Driving ◽

Detection Algorithm ◽

Model Parameters ◽

Detection Accuracy ◽

Time Operation ◽

On The Road

Object detection is an important part of autonomous driving technology. To ensure the safe running of vehicles at high speed, real-time and accurate detection of all objects on the road is required. Balancing the speed and accuracy of detection has been a hot research topic in recent years. This paper puts forward a one-stage object detection algorithm based on YOLOv4 that improves detection accuracy and supports real-time operation. The backbone of the algorithm doubles the number of times the last residual block of CSPDarkNet53 is stacked. The neck of the algorithm replaces the SPP with the RFB structure and improves the PAN structure of the feature fusion module. The attention mechanisms CBAM and CA are added to the backbone and neck, and finally the overall width of the network is reduced to 3/4 of the original, so as to reduce the model parameters and improve the inference speed. Compared with YOLOv4, the algorithm in this paper improves the average accuracy by 2.06% on the KITTI dataset and by 2.95% on the BDD dataset. With detection accuracy almost unchanged, the inference speed of the algorithm increases by 9.14%, and it can detect in real time at more than 58.47 FPS.
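
Two figures in this abstract can be put in context with simple arithmetic. The sketch below assumes hypothetical layer sizes (a 3×3 convolution from 256 to 512 channels), which are not taken from the paper. It shows how scaling the channel width to 3/4 affects one convolution's weight count, and converts the reported 58.47 FPS into a per-frame time.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a square 2D convolution (bias excluded)."""
    return kernel_size * kernel_size * in_channels * out_channels


def scale_width(channels, factor):
    """Scale a channel count by a width multiplier, keeping at least one."""
    return max(1, round(channels * factor))


def frame_time_ms(fps):
    """Average time budget per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical layer: 3x3 convolution, 256 -> 512 channels.
    full = conv_weights(3, 256, 512)
    slim = conv_weights(3, scale_width(256, 0.75), scale_width(512, 0.75))
    print(f"weights kept after 3/4 width scaling: {slim / full:.4f}")
    print(f"time per frame at 58.47 FPS: {frame_time_ms(58.47):.2f} ms")
```

Because both the input and output channels of a layer shrink, its weight count scales with the square of the width multiplier, so a 3/4 width keeps roughly 56% of the weights in such a layer. The whole-network saving depends on the architecture and is not stated in the abstract.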
