Extended Feature Pyramid Network with Adaptive Scale Training Strategy and Anchors for Object Detection in Aerial Images

Multi-scale object detection is a basic challenge in computer vision. Although many advanced methods based on convolutional neural networks have succeeded on natural images, progress on aerial images has been relatively slow, mainly because of the very large scale variations of objects and the many densely distributed small objects. In this paper, considering that the semantic information of small objects may be weakened or even disappear in the deeper layers of a neural network, we propose a new detection framework called the Extended Feature Pyramid Network (EFPN) to strengthen the information extraction ability of the network. In the EFPN, we first design a multi-branched dilated bottleneck (MBDB) module in the lateral connections to capture more semantic information. Then, we devise an attention pathway to better locate objects. Finally, an augmented bottom-up pathway makes shallow-layer information easier to propagate and further improves performance. Moreover, we present an adaptive scale training strategy that enables the network to better recognize multi-scale objects. Meanwhile, we present a novel clustering method that produces adaptive anchors and helps the network better learn data features. Experiments on public aerial datasets indicate that the presented method obtains state-of-the-art performance.
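The abstract does not describe EFPN's clustering method in detail. As a point of reference only, the sketch below shows the widely used k-means anchor clustering with a distance of 1 - IoU (popularized by YOLOv2), which is a swapped-in standard technique and not the authors' method. Box shapes are assumed to be (width, height) pairs, and the function names `wh_iou` and `kmeans_anchors` are hypothetical.

```python
import random

def wh_iou(box, centroid):
    """IoU of two (w, h) shapes, assuming both boxes share the same centre."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; returns k anchor shapes sorted by area."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: wh_iou(b, centroids[i]))
            clusters[best].append(b)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_centroids.append((w, h))
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])
```

Using 1 - IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which matters when object scales vary as widely as they do in aerial images.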

Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet

Sensors ◽ 10.3390/s21051820 ◽ 2021 ◽ Vol 21 (5) ◽ pp. 1820

Author(s): Xiaotao Shao ◽ Qing Wang ◽ Wei Yang ◽ Yun Chen ◽ Yi Xie ◽ ...

Keyword(s): Semantic Information ◽ Detection System ◽ Pedestrian Detection ◽ Detection Accuracy ◽ The Public ◽ Scale Feature ◽ Detection Algorithms ◽ Multi Scale ◽ Art Works ◽ Feature Pyramid

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve detection accuracy. MFPN includes two modules: a double feature pyramid network (FPN) integrated with ResNet (DFR) and a repulsion loss of minimum (RLM). The proposed double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are more clearly separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced into the loss function to keep predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a 5.16% AP gain over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method boosts the performance of the pedestrian detection system.
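The abstract does not define RLM precisely, so the sketch below shows only the RepGT term of the original repulsion loss (Wang et al., 2018), on which it builds: each prediction is penalized by Smooth_ln of its intersection-over-ground-truth (IoG) with the non-target ground truth it overlaps most. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the helper names and the default `sigma=0.5` are illustrative assumptions, not MFPN's settings.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def iog(pred, gt):
    """Intersection over the ground-truth area (IoG)."""
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    return iw * ih / ((gt[2] - gt[0]) * (gt[3] - gt[1]))

def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) for x <= sigma, continued linearly above sigma."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)

def pick_rep_gt(pred, target_idx, gts):
    """The non-target ground truth with the largest IoU with pred (None if there is none)."""
    others = [g for i, g in enumerate(gts) if i != target_idx]
    return max(others, key=lambda g: iou(pred, g), default=None)

def rep_gt_loss(preds, target_ids, gts, sigma=0.5):
    """Mean Smooth_ln(IoG) between each prediction and its repulsion ground truth."""
    terms = []
    for pred, t in zip(preds, target_ids):
        rep = pick_rep_gt(pred, t, gts)
        terms.append(0.0 if rep is None else smooth_ln(iog(pred, rep), sigma))
    return sum(terms) / len(terms) if terms else 0.0
```

For example, a prediction that exactly matches its own target but covers half of a neighbouring pedestrian's box has IoG = 0.5 and contributes ln 2 to the loss, pushing it away from that neighbour.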

Detection of Aerobics Action Based on Convolutional Neural Network

Computational Intelligence and Neuroscience ◽ 10.1155/2022/1857406 ◽ 2022 ◽ Vol 2022 ◽ pp. 1-10

Author(s): Siyu Zhang

Keyword(s): Neural Network ◽ High Resolution ◽ Loss Function ◽ Semantic Information ◽ Deep Level ◽ Image Features ◽ Action Detection ◽ The Neural Network ◽ Feature Pyramid ◽ Anchor Points

To further improve the accuracy of aerobics action detection, a detection method based on improved multi-scale features is proposed. Building on Faster R-CNN and addressing its shortcomings, the method uses a feature pyramid network (FPN) to extract features from aerobics action images, so that low-level semantic information in the images can be extracted and converted into high-resolution, deep-level semantic information. Finally, the target detector is constructed from the anchor points extracted above to detect aerobics actions. The results show that the proposed method reduces the loss of the neural network to 0.2 and reaches an accuracy of 96.5% in comparison with other methods, which demonstrates the feasibility of this study.
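How the anchors are laid out over the pyramid is not stated in the abstract. The sketch below follows the common FPN convention for Faster R-CNN (one anchor area per level, 32² to 512² pixels on P2 to P6, aspect ratios 1:2, 1:1 and 2:1); placing anchor centres at cell midpoints and the function names `level_anchors` and `pyramid_anchors` are assumptions for illustration.

```python
import math
from itertools import product

def level_anchors(feat_h, feat_w, stride, size, ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) centred on every cell of one pyramid level."""
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Assumed convention: anchor centre at the middle of each feature cell.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for r in ratios:
            # Keep the area at size**2 while setting h / w = r.
            w = size / math.sqrt(r)
            h = size * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def pyramid_anchors(img_h, img_w, strides=(4, 8, 16, 32, 64),
                    sizes=(32, 64, 128, 256, 512)):
    """One anchor size per FPN level (P2 to P6), three aspect ratios each."""
    out = {}
    for stride, size in zip(strides, sizes):
        fh, fw = math.ceil(img_h / stride), math.ceil(img_w / stride)
        out[stride] = level_anchors(fh, fw, stride, size)
    return out
```

Assigning small anchors to the high-resolution levels and large anchors to the coarse levels is what lets a single detector head cover a wide range of object sizes.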

MM-FPN: Multi-path and Multi-scale Feature Pyramid Network for Object Detection

10.1109/isceic53685.2021.00072 ◽ 2021

Author(s): Sheng Dong ◽ Jiaxin Zhang ◽ Zehui Qu

Keyword(s): Object Detection ◽ Scale Feature ◽ Multi Scale ◽ Feature Pyramid

Unsupervised Multi-Constraint Deep Neural Network for Dense Image Matching

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽ 10.5194/isprs-archives-xliii-b2-2020-163-2020 ◽ 2020 ◽ Vol XLIII-B2-2020 ◽ pp. 163-167

Author(s): W. Yuan ◽ Z. Fan ◽ X. Yuan ◽ J. Gong ◽ R. Shibasaki

Keyword(s): Neural Network ◽ Object Detection ◽ Image Matching ◽ Deep Neural Network ◽ Aerial Images ◽ Surface Model ◽ Learning Technology ◽ Ground Sample ◽ Volume Calculation ◽ Dense Image

Abstract. Dense image matching is essential to photogrammetry applications, including Digital Surface Model (DSM) generation, three-dimensional (3D) reconstruction, and object detection and recognition. Developing an efficient and robust dense image matching method has been a technical challenge because of large variations in illumination and ground features in aerial images of large areas. With the development of deep learning, deep neural network-based algorithms now outperform traditional methods on a variety of tasks such as object detection, semantic segmentation and stereo matching. The proposed network consists of cost-volume computation, cost-volume aggregation, and disparity prediction. It starts with a pre-trained VGG-16 network as a backend and uses the U-Net architecture with nine layers for feature map extraction and a correlation layer for cost-volume calculation; a guided-filter-based cost aggregation is then adopted for cost-volume filtering, and finally the soft Argmax function is used for disparity prediction. Experiments conducted on a UAV dataset demonstrated that the proposed method achieved a reprojection-error RMSE (root mean square error) better than 1 pixel in image coordinates and an on-ground positioning accuracy within 2.5 ground sample distances. Comparison experiments on the KITTI 2015 dataset show that the proposed unsupervised method performs comparably even with supervised methods.
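The correlation-layer and soft Argmax steps can be illustrated on a single rectified image row. This is a minimal sketch, not the paper's implementation: it assumes a disparity d matches left pixel x with right pixel x - d, scores a match by the channel-averaged dot product of the two feature vectors, and skips the VGG-16/U-Net features and the guided-filter aggregation entirely. The names `correlation_volume` and `soft_argmax` are hypothetical.

```python
import math

def correlation_volume(left, right, max_disp):
    """Correlation scores for disparities 0..max_disp-1 at each pixel of one row.

    left, right: lists of feature vectors, one per pixel along the row.
    Positions where x - d falls outside the image get a score of -inf.
    """
    width, channels = len(left), len(left[0])
    volume = []
    for x in range(width):
        scores = []
        for d in range(max_disp):
            if x - d < 0:
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(left[x], right[x - d]))
                scores.append(dot / channels)
        volume.append(scores)
    return volume

def soft_argmax(scores):
    """Differentiable disparity estimate: sum over d of d * softmax(scores)[d]."""
    m = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(weights)
    return sum(d * w for d, w in enumerate(weights)) / total
```

Unlike a hard argmax, the weighted average is differentiable and can return sub-pixel disparities, which is why it suits end-to-end training of a matching network.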

Multi-Scale Cascade Guided Object Detection in Aerial Images

10.1109/igarss47720.2021.9553767 ◽ 2021

Author(s): Jiajia Liao ◽ Yingchao Piao ◽ Guorong Cai ◽ Yundong Wu ◽ Jinhe Su

Keyword(s): Object Detection ◽ Aerial Images ◽ Multi Scale

3D Object Detection Algorithm for Panoramic Images With Multi-Scale Convolutional Neural Network

IEEE Access ◽ 10.1109/access.2019.2955995 ◽ 2019 ◽ Vol 7 ◽ pp. 171461-171470

Author(s): Dianwei Wang ◽ Yanhui He ◽ Ying Liu ◽ Daxiang Li ◽ Shiqian Wu ◽ ...

Keyword(s): Neural Network ◽ Object Detection ◽ Convolutional Neural Network ◽ Detection Algorithm ◽ 3D Object ◽ Multi Scale ◽ Panoramic Images ◽ 3D Object Detection

Rotation-aware and multi-scale convolutional neural network for object detection in remote sensing images

ISPRS Journal of Photogrammetry and Remote Sensing ◽ 10.1016/j.isprsjprs.2020.01.025 ◽ 2020 ◽ Vol 161 ◽ pp. 294-308 ◽ Cited By ~ 17

Author(s): Kun Fu ◽ Zhonghan Chang ◽ Yue Zhang ◽ Guangluan Xu ◽ Keshu Zhang ◽ ...

Keyword(s): Neural Network ◽ Remote Sensing ◽ Object Detection ◽ Convolutional Neural Network ◽ Remote Sensing Images ◽ Multi Scale

Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network

Sensors ◽ 10.3390/s18103341 ◽ 2018 ◽ Vol 18 (10) ◽ pp. 3341 ◽ Cited By ~ 40

Author(s): Hilal Tayara ◽ Kil Chong

Keyword(s): High Resolution ◽ Object Detection ◽ Computation Time ◽ Aerial Images ◽ Feature Maps ◽ Two Stage ◽ One Stage ◽ Wide Range ◽ Feature Pyramid ◽ Very High

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. Still, it is a challenging task because of the different scales and appearances of the objects. On the other hand, object detection in VHR aerial images has improved remarkably in recent years thanks to advances in convolutional neural networks (CNNs). Most proposed methods rely on a two-stage approach, such as Faster R-CNN, consisting of a region proposal stage and a classification stage. Even though two-stage approaches outperform traditional methods, they are not easy to optimize and are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images is proposed. To tackle the challenge of different scales, a densely connected feature pyramid network is proposed, which provides high-level, multi-scale semantic feature maps with high-quality information for object detection. The method was evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.
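The abstract does not specify how the pyramid levels are densely connected. The following is a hypothetical reading for illustration only: in a plain FPN top-down pathway each level adds just the upsampled output of the next coarser level, whereas in this dense variant each level adds the upsampled outputs of all coarser levels. Feature maps are single-channel 2-D lists, each coarser map is assumed to be exactly half the size of the previous one, and `upsample_nearest`, `add_maps` and `dense_top_down` are illustrative names.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def add_maps(a, b):
    """Element-wise sum of two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def dense_top_down(laterals):
    """Hypothetical dense top-down fusion.

    laterals: lateral feature maps ordered fine -> coarse, each half the size
    of the previous one. Output level i = lateral i + upsampled outputs of
    every coarser level j > i (plain FPN would use only j = i + 1).
    """
    outputs = [None] * len(laterals)
    for i in reversed(range(len(laterals))):
        fused = laterals[i]
        for j in range(i + 1, len(laterals)):
            fused = add_maps(fused, upsample_nearest(outputs[j], 2 ** (j - i)))
        outputs[i] = fused
    return outputs
```

With a single non-zero coarsest map, the finest output receives that signal through two routes (directly and via the middle level), which is the kind of extra high-level information flow a dense design aims for; a real network would also apply convolutions after each fusion.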

Object Detection Based on Global-Local Saliency Constraint in Aerial Images

Remote Sensing ◽ 10.3390/rs12091435 ◽ 2020 ◽ Vol 12 (9) ◽ pp. 1435 ◽ Cited By ~ 1

Author(s): Chengyuan Li ◽ Bin Luo ◽ Hailong Hong ◽ Xin Su ◽ Yajun Wang ◽ ...

Keyword(s): Object Detection ◽ Semantic Information ◽ Feature Fusion ◽ Aerial Images ◽ Natural Image ◽ Optical Remote Sensing ◽ Complex Background ◽ Bounding Boxes ◽ The Impact ◽ Oriented Bounding Boxes

Unlike object detection in natural images, object detection in optical remote sensing images is a challenging task because of diverse meteorological conditions, complex backgrounds, varied orientations, scale variations, etc. In this paper, to address this issue, we propose a novel object detection network (the global-local saliency constraint network, GLS-Net) that makes full use of global semantic information and produces more accurate oriented bounding boxes. More precisely, to improve the quality of the region proposals and bounding boxes, we first propose a saliency pyramid, which combines a saliency algorithm with a feature pyramid network to reduce the impact of complex backgrounds. Based on the saliency pyramid, we then propose a global attention module branch to strengthen the semantic connection between the target and the global scene. A fast feature fusion strategy is also used to combine the local object information from the saliency pyramid with the global semantic information optimized by the attention mechanism. Finally, we use an angle-sensitive intersection over union (IoU) method to obtain a more accurate five-parameter representation of the oriented bounding boxes. Experiments on a publicly available object detection dataset for aerial images demonstrate that the proposed GLS-Net achieves state-of-the-art detection performance.
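A five-parameter oriented box is commonly written as (cx, cy, w, h, theta). The sketch below converts that form to its four corners and computes the enclosed area with the shoelace formula, the usual building blocks for rotated-box IoU. It assumes theta is in radians and is applied with the standard 2-D rotation matrix; GLS-Net's exact angle convention and its angle-sensitive IoU are not given in the abstract and are not reproduced here.

```python
import math

def obb_to_corners(cx, cy, w, h, theta):
    """Four corners of an oriented box (cx, cy, w, h, theta); theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    # Rotate each offset from the centre by theta, then translate to (cx, cy).
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def polygon_area(pts):
    """Shoelace formula; the absolute value makes vertex orientation irrelevant."""
    n = len(pts)
    acc = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
              for i in range(n))
    return abs(acc) / 2.0
```

Rotation preserves area, so `polygon_area(obb_to_corners(cx, cy, w, h, theta))` should equal `w * h` up to floating-point error for any angle, which is a quick sanity check on a corner conversion.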

Phonetics and Ambiguity Comprehension Gated Attention Network for Humor Recognition

Complexity ◽ 10.1155/2020/2509018 ◽ 2020 ◽ Vol 2020 ◽ pp. 1-9 ◽ Cited By ~ 1

Author(s): Xiaochao Fan ◽ Hongfei Lin ◽ Liang Yang ◽ Yufeng Diao ◽ Chen Shen ◽ ...

Keyword(s): Neural Network ◽ Semantic Information ◽ Semantic Representation ◽ Research Attention ◽ Attention Network ◽ Phonetic Information ◽ The Neural Network ◽ Ambiguous Words ◽ Public Datasets

Humor refers to the quality of being amusing. With the development of artificial intelligence, humor recognition is attracting a lot of research attention. Although phonetics and ambiguity have been introduced by previous studies, existing recognition methods still lack suitable feature design for neural networks. In this paper, we illustrate that the phonetic structure and the ambiguity associated with confusing words need their own representations, learned by the neural network. We then propose the Phonetics and Ambiguity Comprehension Gated Attention network (PACGA) to learn phonetic structures and semantic representations for humor recognition. The PACGA model represents phonetic information and semantic information with ambiguous words well, which greatly benefits humor recognition. Experimental results on two public datasets demonstrate the effectiveness of our model.
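The abstract does not give PACGA's gating equations. As a generic illustration of how a gate can blend two representations, the sketch below computes an element-wise gate g = sigmoid(w_p * p + w_s * s + b) and outputs g * p + (1 - g) * s, so each dimension decides how much to rely on the phonetic versus the semantic feature. The function `gated_fusion` and its per-dimension weights are hypothetical stand-ins for learned parameters, not the paper's architecture.

```python
import math

def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(phonetic, semantic, w_p, w_s, bias):
    """Element-wise gated blend of two equally sized feature vectors.

    For each dimension: g = sigmoid(w_p * p + w_s * s + b),
    output = g * p + (1 - g) * s.
    """
    fused = []
    for p, s, wp, ws, b in zip(phonetic, semantic, w_p, w_s, bias):
        g = sigmoid(wp * p + ws * s + b)
        fused.append(g * p + (1.0 - g) * s)
    return fused
```

With all weights and biases at zero the gate is 0.5 everywhere and the output is the plain average; a large positive bias drives the gate toward 1 and the output toward the phonetic features.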
