Extended Feature Pyramid Network with Adaptive Scale Training Strategy and Anchors for Object Detection in Aerial Images

2020 ◽  
Vol 12 (5) ◽  
pp. 784 ◽  
Author(s):  
Wei Guo ◽  
Weihong Li ◽  
Weiguo Gong ◽  
Jinkai Cui

Multi-scale object detection is a basic challenge in computer vision. Although many advanced methods based on convolutional neural networks have succeeded on natural images, progress on aerial images has been relatively slow, mainly because of the large scale variations of objects and the many densely distributed small objects. In this paper, considering that the semantic information of small objects may be weakened or even disappear in the deeper layers of a neural network, we propose a new detection framework called Extended Feature Pyramid Network (EFPN) to strengthen the information extraction ability of the network. In the EFPN, we first design a multi-branched dilated bottleneck (MBDB) module in the lateral connections to capture more semantic information. Then, we devise an attention pathway to better locate the objects. Finally, an augmented bottom-up pathway is added so that shallow-layer information propagates more easily, further improving performance. Moreover, we present an adaptive scale training strategy to enable the network to better recognize multi-scale objects. We also present a novel clustering method that produces adaptive anchors and helps the network better learn data features. Experiments on public aerial datasets indicate that the presented method obtains state-of-the-art performance.
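The abstract does not describe its clustering method, so the following is only a sketch of the commonly used baseline for data-driven anchors: k-means over box widths and heights with 1 - IoU as the distance, as popularized by YOLOv2. It is not the authors' method. The function names `wh_iou` and `cluster_anchors`, the deterministic initialization, and the sample boxes are all assumptions made for illustration.

```python
def wh_iou(a, b):
    """IoU of two boxes given only as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def cluster_anchors(boxes, k, iters=50):
    """k-means on (width, height) pairs using 1 - IoU as the distance.

    Assumes positive widths/heights and k <= len(boxes).
    """
    # Deterministic initialization (an assumption): evenly spaced picks
    # from the boxes sorted by area.
    by_area = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(by_area) / k
    centers = [by_area[int(i * step + step / 2)] for i in range(k)]

    for _ in range(iters):
        # Assignment step: highest IoU is the same as smallest 1 - IoU.
        groups = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[best].append(b)

        # Update step: mean width and height of each group.
        new_centers = []
        for group, old in zip(groups, centers):
            if group:
                w = sum(b[0] for b in group) / len(group)
                h = sum(b[1] for b in group) / len(group)
                new_centers.append((w, h))
            else:
                new_centers.append(old)  # keep an empty cluster's center

        if new_centers == centers:
            break
        centers = new_centers

    return sorted(centers, key=lambda c: c[0] * c[1])


# Hypothetical ground-truth box sizes in pixels: three small, three large.
sample_boxes = [(10, 10), (12, 12), (11, 9), (100, 100), (110, 90), (95, 105)]
anchors = cluster_anchors(sample_boxes, k=2)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when object sizes vary as widely as they do in aerial imagery.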

Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1820
Author(s):  
Xiaotao Shao ◽  
Qing Wang ◽  
Wei Yang ◽  
Yun Chen ◽  
Yi Xie ◽  
...  

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve detection accuracy. MFPN includes two modules, namely a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are better separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function, keeping predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method improves the performance of the pedestrian detection system.


2022 ◽  
Vol 2022 ◽  
pp. 1-10
Author(s):  
Siyu Zhang

To further improve the accuracy of aerobics action detection, a detection method based on improved multiscale features is proposed. The method builds on Faster R-CNN and, to address the problems of Faster R-CNN, uses a feature pyramid network (FPN) to extract features from aerobics action images. In this way, the low-level semantic information in the images can be extracted and converted into high-resolution deep-level semantic information. Finally, the target detector is constructed from the extracted anchor points to detect aerobics actions. The results show that the proposed method reduces the loss of the neural network to 0.2 and that its accuracy reaches 96.5% in the comparison with other methods, which demonstrates the feasibility of this study.


Author(s):  
W. Yuan ◽  
Z. Fan ◽  
X. Yuan ◽  
J. Gong ◽  
R. Shibasaki

Abstract. Dense image matching is essential to photogrammetry applications, including Digital Surface Model (DSM) generation, three-dimensional (3D) reconstruction, and object detection and recognition. Developing an efficient and robust method for dense image matching has been a technical challenge because of the high variation in illumination and ground features in aerial images of large areas. With the development of deep learning, deep neural network-based algorithms now outperform traditional methods on a variety of tasks such as object detection, semantic segmentation and stereo matching. The proposed network includes cost-volume computation, cost-volume aggregation, and disparity prediction. It starts with a pre-trained VGG-16 network as a backend and uses a U-net architecture with nine layers for feature map extraction and a correlation layer for cost-volume calculation. A guided-filter-based cost aggregation is then adopted for cost-volume filtering, and finally the soft Argmax function is used for disparity prediction. Experiments conducted on a UAV dataset demonstrated that the proposed method achieved an RMSE (root mean square error) of the reprojection error better than 1 pixel in image coordinates and an in-ground positioning accuracy within 2.5 ground sample distance. Comparison experiments on the KITTI 2015 dataset show that the proposed unsupervised method performs comparably even with supervised methods.
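As a rough illustration of the disparity prediction step, the sketch below computes a soft Argmax over the matching costs of a single pixel. It assumes a cost volume in which lower cost means a better match, so costs are negated before the softmax; the abstract does not state the sign convention or any temperature, so both are assumptions, as is the function name `soft_argmax`.

```python
import math


def soft_argmax(costs, temperature=1.0):
    """Expected disparity under a softmax over negated matching costs.

    costs[d] is the (assumed) matching cost of one pixel at disparity d;
    lower cost means a better match. The result is a sub-pixel disparity.
    """
    scores = [-c / temperature for c in costs]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    # Probability-weighted average of the disparity indices.
    return sum(d * w / total for d, w in enumerate(weights))


# Hypothetical cost curve with its minimum between disparities 1 and 2.
disparity = soft_argmax([4.0, 1.0, 1.0, 4.0])
```

Unlike a hard argmax, this weighted average is differentiable and can return fractional disparities, which is why it is a common choice for end-to-end trained stereo networks.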


Author(s):  
Jiajia Liao ◽  
Yingchao Piao ◽  
Guorong Cai ◽  
Yundong Wu ◽  
Jinhe Su

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 171461-171470
Author(s):  
Dianwei Wang ◽  
Yanhui He ◽  
Ying Liu ◽  
Daxiang Li ◽  
Shiqian Wu ◽  
...  

Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3341 ◽  
Author(s):  
Hilal Tayara ◽  
Kil Chong

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. It remains a challenging task because of the different scales and appearances of the objects. Nevertheless, object detection in VHR aerial images has improved remarkably in recent years owing to advances in convolutional neural networks (CNNs). Most of the proposed methods, such as Faster R-CNN, depend on a two-stage approach consisting of a region proposal stage and a classification stage. Even though two-stage approaches outperform traditional methods, they are not easy to optimize and are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images is proposed. To tackle the challenge of different scales, a densely connected feature pyramid network is proposed, which prepares high-level multi-scale semantic feature maps with high-quality information for object detection. This work has been evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.


2020 ◽  
Vol 12 (9) ◽  
pp. 1435 ◽  
Author(s):  
Chengyuan Li ◽  
Bin Luo ◽  
Hailong Hong ◽  
Xin Su ◽  
Yajun Wang ◽  
...  

Unlike object detection in natural images, object detection in optical remote sensing images is a challenging task because of diverse meteorological conditions, complex backgrounds, varied orientations, scale variations, etc. In this paper, to address this issue, we propose a novel object detection network (the global-local saliency constraint network, GLS-Net) that can make full use of the global semantic information and achieve more accurate oriented bounding boxes. More precisely, to improve the quality of the region proposals and bounding boxes, we first propose a saliency pyramid, which combines a saliency algorithm with a feature pyramid network, to reduce the impact of complex backgrounds. Based on the saliency pyramid, we then propose a global attention module branch to enhance the semantic connection between the target and the global scenario. A fast feature fusion strategy is also used to combine the local object information based on the saliency pyramid with the global semantic information optimized by the attention mechanism. Finally, we use an angle-sensitive intersection over union (IoU) method to obtain a more accurate five-parameter representation of the oriented bounding boxes. Experiments with a publicly available object detection dataset for aerial images demonstrate that the proposed GLS-Net achieves state-of-the-art detection performance.
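To make the five-parameter representation concrete, the sketch below converts an oriented box (center x, center y, width, height, angle) into its four corner points. The angle convention used here (radians, rotation about the box center, counter-clockwise in a y-up frame) is an assumption, since the abstract does not specify one, and the function name `obb_corners` is hypothetical. The angle-sensitive IoU itself is not reproduced.

```python
import math


def obb_corners(cx, cy, w, h, theta):
    """Corners of an oriented box given as (cx, cy, w, h, theta).

    theta is assumed to be in radians and to rotate the box about its
    center; conventions differ between datasets and libraries.
    """
    c, s = math.cos(theta), math.sin(theta)
    half_w, half_h = w / 2, h / 2
    corners = []
    # Rotate each axis-aligned corner offset, then translate to the center.
    for dx, dy in ((-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)):
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


# A 4 x 2 box rotated by 90 degrees spans 2 units in x and 4 units in y.
rotated = obb_corners(0.0, 0.0, 4.0, 2.0, math.pi / 2)
```

A small change in the angle moves the corners of a long, thin box much more than those of a square one, which is one reason oriented detectors treat the angle term with particular care.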


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-9 ◽  
Author(s):  
Xiaochao Fan ◽  
Hongfei Lin ◽  
Liang Yang ◽  
Yufeng Diao ◽  
Chen Shen ◽  
...  

Humor refers to the quality of being amusing. With the development of artificial intelligence, humor recognition is attracting a lot of research attention. Although phonetics and ambiguity have been introduced by previous studies, existing recognition methods still lack suitable feature design for neural networks. In this paper, we illustrate that the phonetic structure and the ambiguity associated with confusing words each need their own representation learned by the neural network. Then, we propose the Phonetics and Ambiguity Comprehension Gated Attention network (PACGA) to learn phonetic structures and semantic representations for humor recognition. The PACGA model represents both phonetic information and the semantic information of ambiguous words well, which is of great benefit to humor recognition. Experimental results on two public datasets demonstrate the effectiveness of our model.

