GC-YOLOv3: You Only Look Once with Global Context Block

Yang Yang; Hongmin Deng

doi:10.3390/electronics9081235

GC-YOLOv3: You Only Look Once with Global Context Block

Electronics ◽

10.3390/electronics9081235 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1235

Author(s):

Yang Yang ◽

Hongmin Deng

Keyword(s):

Object Detection ◽

Irrelevant Information ◽

Detection Algorithm ◽

Visual Object ◽

Detection Accuracy ◽

Feature Maps ◽

Average Precision ◽

Global Context ◽

Pascal Voc ◽

Feature Pyramid

In order to make the classification and regression of single-stage detectors more accurate, an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3) is proposed based on the You-Only-Look-Once (YOLO) in this paper. Firstly, a better cascading model with learnable semantic fusion between a feature extraction network and a feature pyramid network is designed to improve detection accuracy using a global context block. Secondly, the information to be retained is screened by combining three different scaling feature maps together. Finally, a global self-attention mechanism is used to highlight the useful information of feature maps while suppressing irrelevant information. Experiments show that our GC-YOLOv3 reaches a maximum of 55.5 object detection mean Average Precision (mAP)@0.5 on Common Objects in Context (COCO) 2017 test-dev and that the mAP is 5.1% higher than that of the YOLOv3 algorithm on Pascal Visual Object Classes (PASCAL VOC) 2007 test set. Therefore, experiments indicate that the proposed GC-YOLOv3 model exhibits optimal performance on the PASCAL VOC and COCO datasets.

Download Full-text

Small Object Detection Algorithm Based on Feature Pyramid-Enhanced Fusion SSD

Complexity ◽

10.1155/2019/7297960 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Haotian Li ◽

Kezheng Lin ◽

Jingxuan Bai ◽

Ao Li ◽

Jiali Yu

Keyword(s):

Object Detection ◽

Detection Rate ◽

Detection Algorithm ◽

Single Shot ◽

Small Object ◽

Feature Maps ◽

Scale Invariant ◽

Feature Pyramid ◽

Small Object Detection ◽

Detection And Localization

In order to improve the detection rate of the traditional single-shot multibox detection algorithm in small object detection, a feature-enhanced fusion SSD object detection algorithm based on the pyramid network is proposed. Firstly, the selected multiscale feature layer is merged with the scale-invariant convolutional layer through the feature pyramid network structure; at the same time, the multiscale feature map is separately converted into the channel number using the scale-invariant convolution kernel. Then, the obtained two sets of pyramid-shaped feature layers are further feature fused to generate a set of enhanced multiscale feature maps, and the scale-invariant convolution is performed again on these layers. Finally, the obtained layer is used for detection and localization. The final location coordinates and confidence are output after nonmaximum suppression. Experimental results on the Pascal VOC 2007 and 2012 datasets confirm that there is a 8.2% improvement in mAP compared to the original SSD and some existing algorithms.

Download Full-text

Air-to-ground multimodal object detection algorithm based on feature association learning

International Journal of Advanced Robotic Systems ◽

10.1177/1729881419842995 ◽

2019 ◽

Vol 16 (3) ◽

pp. 172988141984299 ◽

Cited By ~ 1

Author(s):

Dongfang Yang ◽

Xing Liu ◽

Hao He ◽

Yongfei Li

Keyword(s):

Visible Light ◽

Object Detection ◽

Weather Conditions ◽

Detection Algorithm ◽

Detection Accuracy ◽

Small Object ◽

Feature Maps ◽

Data Set ◽

Association Learning ◽

Feature Association

Detecting objects on unmanned aerial vehicles is a hard task, due to the long visual distance and the subsequent small size and lack of view. Besides, the traditional ground observation manners based on visible light camera are sensitive to brightness. This article aims to improve the target detection accuracy in various weather conditions, by using both visible light camera and infrared camera simultaneously. In this article, an association network of multimodal feature maps on the same scene is used to design an object detection algorithm, which is the so-called feature association learning method. In addition, this article collects a new cross-modal detection data set and proposes a cross-modal object detection algorithm based on visible light and infrared observations. The experimental results show that the algorithm improves the detection accuracy of small objects in the air-to-ground view. The multimodal joint detection network can overcome the influence of illumination in different weather conditions, which provides a new detection means and ideas for the space-based unmanned platform to the small object detection task.

Download Full-text

Research on Object Detection Algorithm Based on Multilayer Information Fusion

Mathematical Problems in Engineering ◽

10.1155/2020/9076857 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Bao-Yuan Chen ◽

Yu-Kun Shen ◽

Kun Sun

Keyword(s):

Feature Extraction ◽

Object Detection ◽

Feature Fusion ◽

Basic Feature ◽

Detection Algorithm ◽

Mean Average Precision ◽

Detection Accuracy ◽

Average Precision ◽

Position Information ◽

The Mean

At present, object detectors based on convolution neural networks generally rely on the last layer of features extracted by the feature extraction network. In the process of continuous convolution and pooling of deep features, the position information cannot be completely transferred backward. This paper proposes a multiscale feature reuse detection model, which includes the basic feature extraction network DenseNet, feature fusion network, multiscale anchor region proposal network, and classification and regression network. The fusion of high-dimensional features and low-dimensional features not only strengthens the model's sensitivity to objects of different sizes but also strengthens the transmission of information, so that the feature map has rich deep semantic information and shallow location information at the same time, which significantly improves the robustness and detection accuracy of the model. The algorithm is trained and tested in Pascal VOC2007 dataset. The experimental results show that the mean average precision of the objects in the dataset is 73.87%. At the same time, compared with the mainstream faster RCNN and SSD detection models, the mean average precision of object detection algorithm based on DenseNet is improved by 5.63% and 3.86%, respectively.

Download Full-text

A new multi-scale backbone network for object detection based on asymmetric convolutions

Science Progress ◽

10.1177/00368504211011343 ◽

2021 ◽

Vol 104 (2) ◽

pp. 003685042110113

Author(s):

Xianghua Ma ◽

Zhenkun Yang

Keyword(s):

Object Detection ◽

Image Features ◽

Detection Accuracy ◽

Mobile Platforms ◽

Multi Scale ◽

Backbone Network ◽

Aspect Ratios ◽

Pascal Voc ◽

Scale Characteristics ◽

Detection Speed

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. However, it is widely recognized that although the lightweight object detectors have a high detection speed, the detection accuracy is relatively low. In order to improve detecting accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful quality, that is, they have different aspect ratios, which can be used to exact image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, namely MAC block to enhance multi-scale representation ability of CNNs. In addition, MAC block can adaptively merge the features with different scales by allocating learnable weighted parameters to three different asymmetric convolution branches. The proposed MAC blocks can be inserted into the state-of-the-art backbone such as ResNet-50 to form a new multi-scale backbone network of object detectors. To evaluate the performance of MAC block, we conduct experiments on CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that the detection precision can be greatly improved while a fast detection speed is guaranteed as well.

Download Full-text

A Set of Single YOLO Modalities to Detect Occluded Entities via Viewpoint Conversion

Applied Sciences ◽

10.3390/app11136016 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6016

Author(s):

Jinsoo Kim ◽

Jeongho Cho

Keyword(s):

Object Detection ◽

Autonomous Vehicles ◽

Autonomous Driving ◽

Detection Algorithm ◽

Detection Accuracy ◽

Cloud Data ◽

Detection Techniques ◽

Bounding Boxes ◽

Partially Occluded ◽

Rgb Image

For autonomous vehicles, it is critical to be aware of the driving environment to avoid collisions and drive safely. The recent evolution of convolutional neural networks has contributed significantly to accelerating the development of object detection techniques that enable autonomous vehicles to handle rapid changes in various driving environments. However, collisions in an autonomous driving environment can still occur due to undetected obstacles and various perception problems, particularly occlusion. Thus, we propose a robust object detection algorithm for environments in which objects are truncated or occluded by employing RGB image and light detection and ranging (LiDAR) bird’s eye view (BEV) representations. This structure combines independent detection results obtained in parallel through “you only look once” networks using an RGB image and a height map converted from the BEV representations of LiDAR’s point cloud data (PCD). The region proposal of an object is determined via non-maximum suppression, which suppresses the bounding boxes of adjacent regions. A performance evaluation of the proposed scheme was performed using the KITTI vision benchmark suite dataset. The results demonstrate the detection accuracy in the case of integration of PCD BEV representations is superior to when only an RGB camera is used. In addition, robustness is improved by significantly enhancing detection accuracy even when the target objects are partially occluded when viewed from the front, which demonstrates that the proposed algorithm outperforms the conventional RGB-based model.

Download Full-text

Revisiting knowledge distillation for light-weight visual object detection

Transactions of the Institute of Measurement and Control ◽

10.1177/01423312211022877 ◽

2021 ◽

Vol 43 (13) ◽

pp. 2888-2898

Author(s):

Tianze Gao ◽

Yunfeng Gao ◽

Yu Li ◽

Peiyuan Qin

Keyword(s):

Object Detection ◽

Essential Element ◽

Detection Algorithm ◽

Positive Sample ◽

Detection Methods ◽

Visual Object ◽

Light Weight ◽

Model Compression ◽

Novel Approach ◽

Knowledge Distillation

An essential element for intelligent perception in mechatronic and robotic systems (M&RS) is the visual object detection algorithm. With the ever-increasing advance of artificial neural networks (ANN), researchers have proposed numerous ANN-based visual object detection methods that have proven to be effective. However, networks with cumbersome structures do not befit the real-time scenarios in M&RS, necessitating the techniques of model compression. In the paper, a novel approach to training light-weight visual object detection networks is developed by revisiting knowledge distillation. Traditional knowledge distillation methods are oriented towards image classification is not compatible with object detection. Therefore, a variant of knowledge distillation is developed and adapted to a state-of-the-art keypoint-based visual detection method. Two strategies named as positive sample retaining and early distribution softening are employed to yield a natural adaption. The mutual consistency between teacher model and student model is further promoted through a hint-based distillation. By extensive controlled experiments, the proposed method is testified to be effective in enhancing the light-weight network’s performance by a large margin.

Download Full-text

Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network

Sensors ◽

10.3390/s18103341 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3341 ◽

Cited By ~ 40

Author(s):

Hilal Tayara ◽

Kil Chong

Keyword(s):

High Resolution ◽

Object Detection ◽

Computation Time ◽

Aerial Images ◽

Feature Maps ◽

Two Stage ◽

One Stage ◽

Wide Range ◽

Feature Pyramid ◽

Very High

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. Still, it is a challenging task due to the different scales and appearances of the objects. On the other hand, object detection task in VHR aerial images has improved remarkably in recent years due to the achieved advances in convolution neural networks (CNN). Most of the proposed methods depend on a two-stage approach, namely: a region proposal stage and a classification stage such as Faster R-CNN. Even though two-stage approaches outperform the traditional methods, their optimization is not easy and they are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images has been proposed. In order to tackle the challenge of different scales, a densely connected feature pyramid network has been proposed by which high-level multi-scale semantic feature maps with high-quality information are prepared for object detection. This work has been evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.

Download Full-text

Object Detection Algorithm Based on Multiheaded Attention

Applied Sciences ◽

10.3390/app9091829 ◽

2019 ◽

Vol 9 (9) ◽

pp. 1829 ◽

Cited By ~ 1

Author(s):

Jie Jiang ◽

Hui Xu ◽

Shichang Zhang ◽

Yujie Fang

Keyword(s):

Object Detection ◽

Linear Interpolation ◽

Detection Algorithm ◽

Attention Mechanism ◽

Visual Object ◽

Single Shot ◽

Object Class ◽

Feature Information ◽

Base Network ◽

Detector Model

This study proposes a multiheaded object detection algorithm referred to as MANet. The main purpose of the study is to integrate feature layers of different scales based on the attention mechanism and to enhance contextual connections. To achieve this, we first replaced the feed-forward base network of the single-shot detector with the ResNet–101 (inspired by the Deconvolutional Single-Shot Detector) and then applied linear interpolation and the attention mechanism. The information of the feature layers at different scales was fused to improve the accuracy of target detection. The primary contributions of this study are the propositions of (a) a fusion attention mechanism, and (b) a multiheaded attention fusion method. Our final MANet detector model effectively unifies the feature information among the feature layers at different scales, thus enabling it to detect objects with different sizes and with higher precision. We used the 512 × 512 input MANet (the backbone is ResNet–101) to obtain a mean accuracy of 82.7% based on the PASCAL visual object class 2007 test. These results demonstrated that our proposed method yielded better accuracy than those provided by the conventional Single-shot detector (SSD) and other advanced detectors.

Download Full-text

A New Steel Defect Detection Algorithm Based on Deep Learning

Computational Intelligence and Neuroscience ◽

10.1155/2021/5592878 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Weidong Zhao ◽

Feng Chen ◽

Hancheng Huang ◽

Dan Li ◽

Wei Cheng

Keyword(s):

Deep Learning ◽

Target Detection ◽

Defect Detection ◽

Steel Surface ◽

Surface Defect ◽

Detection Algorithm ◽

Detection Accuracy ◽

Average Precision ◽

Original Algorithm ◽

Surface Defect Detection

In recent years, more and more scholars devoted themselves to the research of the target detection algorithm due to the continuous development of deep learning. Among them, the detection and recognition of small and complex targets are still a problem to be solved. The authors of this article have understood the shortcomings of the deep learning detection algorithm in detecting small and complex defect targets and would like to share a new improved target detection algorithm in steel surface defect detection. The steel surface defects will affect the quality of steel seriously. We find that most of the current detection algorithms for NEU-DET dataset detection accuracy are low, so we choose to verify a steel surface defect detection algorithm based on machine vision on this dataset for the problem of defect detection in steel production. A series of improvement measures are carried out in the traditional Faster R-CNN algorithm, such as reconstructing the network structure of Faster R-CNN. Based on the small features of the target, we train the network with multiscale fusion. For the complex features of the target, we replace part of the conventional convolution network with a deformable convolution network. The experimental results show that the deep learning network model trained by the proposed method has good detection performance, and the mean average precision is 0.752, which is 0.128 higher than the original algorithm. Among them, the average precision of crazing, inclusion, patches, pitted surface, rolled in scale and scratches is 0.501, 0.791, 0.792, 0.874, 0.649, and 0.905, respectively. The detection method is able to identify small target defects on the steel surface effectively, which can provide a reference for the automatic detection of steel defects.

Download Full-text

A Region-Based Efficient Network for Accurate Object Detection

Traitement du signal ◽

10.18280/ts.380228 ◽

2021 ◽

Vol 38 (2) ◽

pp. 481-494

Author(s):

Yurong Guan ◽

Muhammad Aamir ◽

Zhihua Hu ◽

Waheed Ahmed Abro ◽

Ziaur Rahman ◽

...

Keyword(s):

Object Detection ◽

Detection Efficiency ◽

Visual Object ◽

Average Precision ◽

High Quality ◽

Image Objects ◽

Object Proposal ◽

Proposed Model ◽

Image Object Detection ◽

Image Object

Object detection in images is an important task in image processing and computer vision. Many approaches are available for object detection. For example, there are numerous algorithms for object positioning and classification in images. However, the current methods perform poorly and lack experimental verification. Thus, it is a fascinating and challenging issue to position and classify image objects. Drawing on the recent advances in image object detection, this paper develops a region-baed efficient network for accurate object detection in images. To improve the overall detection performance, image object detection was treated as a twofold problem, involving object proposal generation and object classification. First, a framework was designed to generate high-quality, class-independent, accurate proposals. Then, these proposals, together with their input images, were imported to our network to learn convolutional features. To boost detection efficiency, the number of proposals was reduced by a network refinement module, leaving only a few eligible candidate proposals. After that, the refined candidate proposals were loaded into the detection module to classify the objects. The proposed model was tested on the test set of the famous PASCAL Visual Object Classes Challenge 2007 (VOC2007). The results clearly demonstrate that our model achieved robust overall detection efficiency over existing approaches using fewer or more proposals, in terms of recall, mean average best overlap (MABO), and mean average precision (mAP).

Download Full-text