DAR-Net: Dense Attentional Residual Network for Vehicle Detection in Aerial Images

2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Kaifeng Li ◽  
Bin Wang

With the rapid development of deep learning and the wide use of Unmanned Aerial Vehicles (UAVs), CNN-based algorithms for vehicle detection in aerial images have been widely studied in the past several years. As a downstream task of general object detection, vehicle detection in aerial images differs from general object detection in ground-view images in several respects, e.g., larger image areas, smaller target sizes, and more complex backgrounds. In this paper, to improve the performance of this task, a Dense Attentional Residual Network (DAR-Net) is proposed. The proposed network employs a novel dense waterfall residual block (DW res-block) to effectively preserve spatial information while extracting high-level semantic information. A multiscale receptive field attention (MRFA) module is also designed to select informative features from the feature maps and enhance the ability of multiscale perception. Based on the DW res-block and the MRFA module, and to protect spatial information, the proposed framework adopts a new backbone that downsamples the feature map only 3 times; i.e., the total downsampling ratio of the proposed backbone is 8. These designs alleviate the degradation problem, improve the information flow, and strengthen feature reuse. In addition, deep-projection units are used to reduce the impact of the information loss caused by downsampling operations, and identity mapping is applied to each stage of the proposed backbone to further improve the information flow. The proposed DAR-Net is evaluated on the VEDAI, UCAS-AOD, and DOTA datasets. The experimental results demonstrate that the proposed framework outperforms other state-of-the-art algorithms.
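The effect of the total downsampling ratio described above can be made concrete with a small sketch (not the authors' code; the 512×512 input size is an assumption for illustration): a conventional backbone with five stride-2 stages leaves very little spatial resolution for small vehicles, while a ratio-8 backbone keeps a much denser feature map.

```python
# Sketch: feature-map resolution after repeated stride-2 downsampling.
# A backbone with 3 stride-2 stages has a total downsampling ratio of
# 2**3 = 8, as in the DAR-Net backbone described above.

def feature_map_size(image_size: int, num_stride2_stages: int) -> int:
    """Spatial side length after the given number of stride-2 stages."""
    return image_size // (2 ** num_stride2_stages)

print(feature_map_size(512, 5))  # conventional backbone: 16x16 map
print(feature_map_size(512, 3))  # ratio-8 backbone: 64x64 map
```

A 64×64 map gives 16× more spatial positions than a 16×16 map, which is why a smaller total stride helps when targets occupy only a few pixels.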

PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0250782
Author(s):  
Bin Wang ◽  
Bin Xu

With the rapid development of Unmanned Aerial Vehicles, vehicle detection in aerial images plays an important role in many applications. Compared with general object detection problems, vehicle detection in aerial images remains a challenging research topic, since it is plagued by various unique factors, e.g., varying camera angles, small vehicle sizes, and complex backgrounds. In this paper, a Feature Fusion Deep-Projection Convolution Neural Network is proposed to enhance the ability to detect small vehicles in aerial images. The backbone of the proposed framework utilizes a novel residual block, named the stepwise res-block, to explore high-level semantic features while conserving low-level detail features. A specially designed feature fusion module is adopted in the proposed framework to further balance the features obtained from different levels of the backbone. A deep-projection deconvolution module is used to minimize the impact of the information contamination introduced by down-sampling/up-sampling processes. The proposed framework has been evaluated on the UCAS-AOD, VEDAI, and DOTA datasets. According to the evaluation results, the proposed framework outperforms other state-of-the-art vehicle detection algorithms for aerial images.
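The back-projection idea behind such deep-projection modules can be sketched as follows. This is not the paper's implementation; it assumes average-pool downsampling and nearest-neighbour upsampling on a single-channel map, purely to show how a projection unit measures and corrects the error introduced by a resampling round trip.

```python
import numpy as np

def downsample2(x):
    """Average-pool a (H, W) map by a factor of 2 (H, W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour upsample by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def up_projection(low):
    """Back-projection style unit: upsample, re-downsample to measure the
    round-trip reconstruction error, then upsample the error as a correction."""
    high = upsample2(low)
    err = low - downsample2(high)   # information lost by the round trip
    return high + upsample2(err)    # corrected high-resolution map

low = np.arange(16, dtype=float).reshape(4, 4)
print(up_projection(low).shape)  # (8, 8)
```

In a real network the down/up operators are learned (strided convolutions and deconvolutions), so the error term is generally non-zero and the correction carries useful detail.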


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3341 ◽  
Author(s):  
Hilal Tayara ◽  
Kil Chong

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. Still, it is a challenging task due to the different scales and appearances of the objects. On the other hand, object detection in VHR aerial images has improved remarkably in recent years thanks to advances in convolutional neural networks (CNNs). Most of the proposed methods depend on a two-stage approach, namely a region proposal stage followed by a classification stage, as in Faster R-CNN. Even though two-stage approaches outperform traditional methods, their optimization is not easy and they are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images is proposed. In order to tackle the challenge of different scales, a densely connected feature pyramid network is proposed, by which high-level multi-scale semantic feature maps with high-quality information are prepared for object detection. This work has been evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.
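The dense connectivity in such a pyramid can be sketched with plain arrays. This is an assumption-laden toy version (not the paper's architecture): nearest-neighbour upsampling stands in for learned upsampling, and each level simply concatenates the upsampled maps of all coarser levels along the channel axis.

```python
import numpy as np

def upsample_to(x, h, w):
    """Nearest-neighbour upsample a (C, h0, w0) map to (C, h, w);
    assumes h and w are integer multiples of the input size."""
    fy, fx = h // x.shape[1], w // x.shape[2]
    return x.repeat(fy, axis=1).repeat(fx, axis=2)

def dense_pyramid(levels):
    """Each output level concatenates its own map with the upsampled
    maps of every coarser level (dense top-down connections)."""
    out = []
    for i, feat in enumerate(levels):
        h, w = feat.shape[1:]
        coarser = [upsample_to(levels[j], h, w) for j in range(i + 1, len(levels))]
        out.append(np.concatenate([feat] + coarser, axis=0))
    return out

# Three 4-channel levels at strides 8/16/32 of a 64x64 input.
levels = [np.random.rand(4, 8, 8), np.random.rand(4, 4, 4), np.random.rand(4, 2, 2)]
fused = dense_pyramid(levels)
print([f.shape for f in fused])  # [(12, 8, 8), (8, 4, 4), (4, 2, 2)]
```

The finest level thus sees semantic context from every coarser scale, which is the property that helps with objects of very different sizes.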


2020 ◽  
Vol 12 (9) ◽  
pp. 1435 ◽  
Author(s):  
Chengyuan Li ◽  
Bin Luo ◽  
Hailong Hong ◽  
Xin Su ◽  
Yajun Wang ◽  
...  

Different from object detection in natural images, optical remote sensing object detection is a challenging task due to diverse meteorological conditions, complex backgrounds, varied orientations, scale variations, etc. In this paper, to address this issue, we propose a novel object detection network (the global-local saliency constraint network, GLS-Net) that can make full use of global semantic information and achieve more accurate oriented bounding boxes. More precisely, to improve the quality of the region proposals and bounding boxes, we first propose a saliency pyramid that combines a saliency algorithm with a feature pyramid network to reduce the impact of complex backgrounds. Based on the saliency pyramid, we then propose a global attention module branch to enhance the semantic connection between the target and the global scenario. A fast feature fusion strategy is also used to combine the local object information based on the saliency pyramid with the global semantic information optimized by the attention mechanism. Finally, we use an angle-sensitive intersection over union (IoU) method to obtain a more accurate five-parameter representation of the oriented bounding boxes. Experiments on a publicly available object detection dataset for aerial images demonstrate that the proposed GLS-Net achieves state-of-the-art detection performance.
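The paper's exact angle-sensitive IoU formulation is not reproduced here, but the underlying quantity it refines, IoU between two boxes in the five-parameter (cx, cy, w, h, θ) representation, can be computed exactly with convex polygon clipping. The sketch below uses Sutherland-Hodgman clipping and the shoelace formula; θ is in radians.

```python
import math

def corners(cx, cy, w, h, theta):
    """Counter-clockwise corner points of a (cx, cy, w, h, theta) box."""
    c, s = math.cos(theta), math.sin(theta)
    pts = [(w / 2, h / 2), (-w / 2, h / 2), (-w / 2, -h / 2), (w / 2, -h / 2)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in pts]

def area(poly):
    """Shoelace area of a simple polygon."""
    n = len(poly)
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1]
                   - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))) / 2

def clip(subject, clipper):
    """Sutherland-Hodgman: clip a convex polygon by a CCW convex polygon."""
    def inside(p, a, b):  # p on the left of directed edge a->b (or on it)
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0
    def intersect(p, q, a, b):  # segment p-q with the line through a, b
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)
    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        if not out:
            break
        prev, out = out, []
        s = prev[-1]
        for p in prev:
            if inside(p, a, b):
                if not inside(s, a, b):
                    out.append(intersect(s, p, a, b))
                out.append(p)
            elif inside(s, a, b):
                out.append(intersect(s, p, a, b))
            s = p
    return out

def rotated_iou(b1, b2):
    """IoU of two oriented boxes via exact polygon intersection."""
    p1, p2 = corners(*b1), corners(*b2)
    inter_poly = clip(p1, p2)
    inter = area(inter_poly) if len(inter_poly) >= 3 else 0.0
    return inter / (area(p1) + area(p2) - inter)
```

For example, two axis-aligned 4×2 boxes whose centers are offset by half a box width overlap with IoU 1/3, which the polygon-clipping computation recovers exactly.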


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. It achieved mean intersection over union (mIoU) scores of 82.8 and 89.8 and frame rates of 61 and 64.3 frames per second (fps) on the Cityscapes and PASCAL VOC 2012 databases, respectively. These results demonstrate the applicability of the proposed method to natural AR applications, since the frame rate exceeds 60 fps.
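The parallel-dilated-convolution-plus-1×1-fusion structure can be sketched with plain NumPy. This is a single-channel toy (not the paper's module): each branch is a "same"-padded 3×3 convolution with a different dilation rate, and the 1×1 convolution reduces the stacked branches to one channel as a weighted sum; the kernels and fusion weights below are arbitrary illustrative values.

```python
import numpy as np

def dilated_conv(x, k, d):
    """'Same' 3x3 dilated convolution of a (H, W) map with zero padding."""
    xp = np.pad(x, d)
    h, w = x.shape
    out = np.zeros((h, w))
    for u in range(3):
        for v in range(3):
            out += k[u, v] * xp[u * d:u * d + h, v * d:v * d + w]
    return out

def atrous_pyramid(x, kernels, dilations, fuse_weights):
    """Parallel dilated convolutions, stacked, then fused to one channel
    by a 1x1 convolution (a weighted sum across the branch axis)."""
    branches = [dilated_conv(x, k, d) for k, d in zip(kernels, dilations)]
    stacked = np.stack(branches)                        # (branches, H, W)
    return np.tensordot(fuse_weights, stacked, axes=1)  # -> (H, W)

x = np.random.rand(16, 16)
ks = [np.full((3, 3), 1 / 9.0)] * 3
y = atrous_pyramid(x, ks, dilations=[1, 2, 4],
                   fuse_weights=np.array([0.5, 0.3, 0.2]))
print(y.shape)  # (16, 16)
```

Larger dilation rates enlarge the receptive field without adding parameters or reducing resolution, which is what lets the module pick up objects at several scales at once.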


Author(s):  
N. Mo ◽  
L. Yan

Abstract. Because of their small size, vehicles in high-resolution remote sensing images lack detailed appearance information and are difficult to train detectors on. In addition, vehicles comprise multiple fine-grained categories that differ only slightly and are randomly located and oriented, so it is difficult to locate and identify these fine categories. Considering the above problems in high-resolution remote sensing images, this paper proposes an oriented vehicle detection approach. First of all, we propose an oversampling and stitching method to augment the training dataset by increasing the frequency of objects with fewer training samples, in order to balance the number of objects in each fine-grained vehicle category. Then, considering the effect of pooling operations on the representation of small objects, we propose to increase the resolution of the feature maps so that the detailed information hidden in them is enriched and the fine-grained vehicle categories can be better distinguished. Finally, we design a joint training loss function for horizontal and oriented bounding boxes with center loss, to decrease the impact of small between-class diversity on vehicle detection. Experimental verification is performed on the VEDAI dataset, which consists of 9 fine-grained vehicle categories, to evaluate the proposed framework. The experimental results show that the proposed framework performs better than most competing approaches, with mean average precisions of 60.7% and 60.4% for horizontal and oriented bounding boxes respectively.
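The balancing step of such an oversampling strategy can be sketched as follows (the stitching part, which composites the replicated objects into training images, is omitted; the sample/label names are illustrative): under-represented categories are replicated until every category matches the most frequent one.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Replicate samples of under-represented categories until every
    category reaches the count of the most frequent one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_s, out_l = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_s.append(rng.choice(pool))
            out_l.append(cls)
    return out_s, out_l

samples = ["a1", "a2", "a3", "a4", "b1", "b2", "c1"]
labels = ["car", "car", "car", "car", "truck", "truck", "van"]
_, new_labels = oversample(samples, labels)
print(Counter(new_labels))  # every class now has 4 samples
```

After balancing, each fine-grained category contributes equally to the loss, so the detector is no longer dominated by the majority class.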


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Di Tian ◽  
Yi Han ◽  
Biyao Wang ◽  
Tian Guan ◽  
Wei Wei

Pedestrian detection is a specific application of object detection. Compared with general object detection, it shows both similarities and unique characteristics, and it has important application value in the fields of intelligent driving and security monitoring. In recent years, with the rapid development of deep learning, pedestrian detection technology has made great progress. However, a huge gap still exists between it and human perception, and many problems remain open for research. Regarding the application of pedestrian detection in intelligent driving, it is necessary to ensure real-time performance and to lighten the model while preserving detection accuracy. This paper first briefly describes the development process of pedestrian detection and then concentrates on summarizing the research results of pedestrian detection technology in the deep learning stage. Subsequently, by summarizing pedestrian detection datasets and evaluation criteria, the core issues of the current development of pedestrian detection are analyzed. Finally, possible future directions of pedestrian detection technology are discussed at the end of the paper.


2020 ◽  
Vol 2020 ◽  
pp. 1-18 ◽  
Author(s):  
Nhat-Duy Nguyen ◽  
Tien Do ◽  
Thanh Duc Ngo ◽  
Duy-Dinh Le

Small object detection is an interesting topic in computer vision. With the rapid development of deep learning, it has drawn the attention of many researchers, whose innovations include region proposals, divided grid cells, multiscale feature maps, and new loss functions. As a result, the performance of object detection has recently seen significant improvements. However, most state-of-the-art detectors, in both one-stage and two-stage approaches, struggle with detecting small objects. In this study, we evaluate current state-of-the-art deep learning models from both approaches, namely Fast R-CNN, Faster R-CNN, RetinaNet, and YOLOv3, and provide a profound assessment of their advantages and limitations. Specifically, we run the models with different backbones on different datasets with multiscale objects, to find out which types of objects are suitable for each model and backbone. Extensive empirical evaluation was conducted on two standard datasets, namely a small object dataset and a dataset filtered from PASCAL VOC 2007. Finally, comparative results and analyses are presented.
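Comparisons like this one are usually reported in average precision (AP) per class, averaged into mAP. A minimal sketch of the non-interpolated AP computation (area under the precision-recall curve) is shown below; it assumes each detection has already been matched to ground truth as a true or false positive.

```python
def average_precision(scores, is_tp, num_gt):
    """Non-interpolated AP: rank detections by confidence and accumulate
    precision at each recall step (each new true positive adds 1/num_gt
    of recall, weighted by the precision at that rank)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    for i in order:
        if is_tp[i]:
            tp += 1
            ap += tp / (tp + fp) / num_gt  # precision * recall increment
        else:
            fp += 1
    return ap

# Two correct detections covering both ground-truth objects -> AP = 1.0
print(average_precision([0.9, 0.8], [True, True], num_gt=2))
```

mAP is then the mean of this value over all classes; benchmarks differ in matching thresholds and interpolation, so this sketch shows only the core idea.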


2020 ◽  
Vol 12 (11) ◽  
pp. 1760 ◽  
Author(s):  
Wang Zhang ◽  
Chunsheng Liu ◽  
Faliang Chang ◽  
Ye Song

With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed in vehicle monitoring and control. However, processing the images captured by UAVs to extract vehicle information is hindered by several challenges, including arbitrary orientations, huge scale variations, and partial occlusion. To address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV-based vehicle segmentation, which consists of two parts: a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed, which can adaptively aggregate hierarchical feature maps from multiple levels to help the Feature Pyramid Network (FPN) deal with the scale change of vehicles. The RATH-Net with a self-attention mechanism is proposed to guide the location-sensitive sub-networks to enhance the vehicles of interest and suppress background noise caused by occlusions. In this study, we release a large comprehensive UAV-based vehicle segmentation dataset (UVSD), which is the first public dataset for UAV-based vehicle detection and segmentation. Experiments conducted on the challenging UVSD dataset show that the proposed method is efficient in detecting and segmenting vehicles, and outperforms compared state-of-the-art works.
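The core of an adaptive fusion module can be sketched in a few lines. This toy version (not MSFAF-Net itself) assumes the level features have already been resized to a common shape and uses scalar softmax weights, so the fusion coefficients stay positive and sum to one; in the real module the weights would be learned and typically spatially varying.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_fuse(feats, logits):
    """Weighted sum of same-sized feature maps; the fusion weights come
    from a softmax over (learned) logits, one per pyramid level."""
    w = softmax(np.asarray(logits, dtype=float))
    return sum(wi * f for wi, f in zip(w, feats))

feats = [np.ones((8, 8)) * v for v in (1.0, 2.0, 3.0)]
fused = adaptive_fuse(feats, logits=[0.0, 0.0, 0.0])  # equal weights
print(fused[0, 0])  # (1 + 2 + 3) / 3 = 2.0
```

With unequal logits the softmax lets the network emphasize whichever level best matches the vehicle scale in the current image.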


2020 ◽  
Vol 12 (16) ◽  
pp. 2558 ◽  
Author(s):  
Nan Mo ◽  
Li Yan

Vehicles in aerial images are generally small, and the number of samples per category is unbalanced, which leads to the poor performance of existing vehicle detection algorithms. Therefore, an oriented vehicle detection framework based on an improved Faster RCNN is proposed for aerial images. First of all, we propose an oversampling and stitching data augmentation method to decrease the negative effect of category imbalance in the training dataset and construct a new dataset with a balanced number of samples. Then, considering that the pooling operation may weaken the discriminative ability of features for small objects, we propose to amplify the feature map so that the detailed information hidden in the last feature map is enriched. Finally, we design a joint training loss function including center loss for both horizontal and oriented bounding boxes, to reduce the impact of small inter-class diversity on vehicle detection. The proposed framework is evaluated on the VEDAI dataset, which consists of 9 vehicle categories. The experimental results show that the proposed framework outperforms previous approaches with mean average precisions of 60.4% and 60.1% in detecting horizontal and oriented bounding boxes respectively, which is about 8% better than Faster RCNN.
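The center loss term mentioned above penalizes the distance between each feature vector and the learned center of its class, pulling same-class features together so that subtly different categories become separable. A minimal sketch of the loss value itself (the centers would be learned parameters in training; the vectors below are illustrative):

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared distance between each feature
    vector and the center of its class. Small values = compact classes."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
labels = np.array([0, 0, 1])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(center_loss(features, labels, centers))
```

In the joint loss described in the abstract, this term would be added to the usual classification and box-regression losses with a weighting coefficient.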


2020 ◽  
Vol 12 (6) ◽  
pp. 908 ◽  
Author(s):  
Zhifeng Xiao ◽  
Linjun Qian ◽  
Weiping Shao ◽  
Xiaowei Tan ◽  
Kai Wang

Oriented object detection in aerial images is still a challenging task due to the bird's-eye view and the various scales and arbitrary angles of objects in aerial images. Most current methods for oriented object detection are anchor-based, which requires a large number of pre-defined anchors and is time consuming. In this article, we propose a new one-stage anchor-free method that detects oriented objects in a per-pixel prediction fashion with less computational complexity. Arbitrarily oriented objects are detected by predicting the axis of the object, which is the line connecting the head and tail of the object, with the width of the object perpendicular to the axis. By predicting objects directly at the pixel level of the feature maps, the method avoids setting a number of anchor-related hyperparameters and is computationally efficient. In addition, a new aspect-ratio-aware orientation centerness method is proposed to better weigh positive pixel points, in order to guide the network to learn discriminative features from a complex background, which brings improvements for the detection of objects with large aspect ratios. The method is tested on two common aerial image datasets, achieving better performance than most one-stage oriented methods and many two-stage anchor-based methods, with a simpler procedure and lower computational complexity.
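Decoding such an axis prediction into an oriented box is simple geometry. The sketch below (an illustration, not the paper's decoding code, and the parameter ordering is an assumption) converts a predicted head point, tail point, and width into a five-parameter oriented box:

```python
import math

def axis_to_obb(head, tail, width):
    """Convert a predicted axis (head and tail points) plus width into
    an oriented box (cx, cy, length, width, theta), theta in radians."""
    cx = (head[0] + tail[0]) / 2
    cy = (head[1] + tail[1]) / 2
    length = math.hypot(head[0] - tail[0], head[1] - tail[1])  # along axis
    theta = math.atan2(head[1] - tail[1], head[0] - tail[0])   # axis angle
    return cx, cy, length, width, theta

box = axis_to_obb(head=(4.0, 0.0), tail=(0.0, 0.0), width=2.0)
print(box)  # (2.0, 0.0, 4.0, 2.0, 0.0)
```

Because the axis encodes both angle and extent, no pre-defined anchor shapes are needed; every pixel simply regresses its own head/tail offsets and width.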

