Object Detection Using Improved Bi-Directional Feature Pyramid Network

Conventional single-stage object detectors have been able to efficiently detect objects of various sizes using a feature pyramid network. However, because they adopt a too simple manner of aggregating feature maps, they cannot avoid performance degradation due to information loss. To solve this problem, this paper proposes a new framework for single-stage object detection. The proposed aggregation scheme introduces two independent modules to extract global and local information. First, the global information extractor is designed so that each feature vector can reflect the information of the entire image through a non-local neural network (NLNN). Next, the local information extractor aggregates each feature map more effectively through the improved bi-directional network. The proposed method can achieve better performance than the existing single-stage object detection methods by providing improved feature maps to the detection heads. For example, the proposed method shows 1.6% higher average precision (AP) than the efficient featurized image pyramid network (EFIPNet) for the MicroSoft Common Objects in COntext (MS COCO) dataset.

Download Full-text

Generating Anchor Boxes Based on Attention Mechanism for Object Detection in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12152416 ◽

2020 ◽

Vol 12 (15) ◽

pp. 2416 ◽

Cited By ~ 1

Author(s):

Zhuangzhuang Tian ◽

Ronghui Zhan ◽

Jiemin Hu ◽

Wei Wang ◽

Zhiqiang He ◽

...

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Attention Mechanism ◽

Detection Methods ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Feature Maps ◽

Wide Range ◽

Feature Pyramid ◽

Comprehensive Evaluations

Nowadays, object detection methods based on deep learning are applied more and more to the interpretation of optical remote sensing images. However, the complex background and the wide range of object sizes in remote sensing images increase the difficulty of object detection. In this paper, we improve the detection performance by combining the attention information, and generate adaptive anchor boxes based on the attention map. Specifically, the attention mechanism is introduced into the proposed method to enhance the features of the object regions while reducing the influence of the background. The generated attention map is then used to obtain diverse and adaptable anchor boxes using the guided anchoring method. The generated anchor boxes can match better with the scene and the objects, compared with the traditional proposal boxes. Finally, the modulated feature adaptation module is applied to transform the feature maps to adapt to the diverse anchor boxes. Comprehensive evaluations on the DIOR dataset demonstrate the superiority of the proposed method over the state-of-the-art methods, such as RetinaNet, FCOS and CornerNet. The mean average precision of the proposed method is 4.5% higher than the feature pyramid network. In addition, the ablation experiments are also implemented to further analyze the respective influence of different blocks on the performance improvement.

Download Full-text

Self-Adaptive Aspect Ratio Anchor for Oriented Object Detection in Remote Sensing Images

Remote Sensing ◽

10.3390/rs13071318 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1318

Author(s):

Jie-Bo Hou ◽

Xiaobin Zhu ◽

Xu-Cheng Yin

Keyword(s):

Remote Sensing ◽

Aspect Ratio ◽

Object Detection ◽

Detection Methods ◽

Remote Sensing Images ◽

Feature Maps ◽

Aspect Ratios ◽

Feature Pyramid ◽

Oriented Object ◽

Self Adaptive

Object detection is a significant and challenging problem in the study of remote sensing. Since remote sensing images are typically captured with a bird’s-eye view, the aspect ratios of objects in the same category may obey a Gaussian distribution. Generally, existing object detection methods ignore exploring the distribution character of aspect ratios for improving performance in remote sensing tasks. In this paper, we propose a novel Self-Adaptive Aspect Ratio Anchor (SARA) to explicitly explore aspect ratio variations of objects in remote sensing images. To be concrete, our SARA can self-adaptively learn an appropriate aspect ratio for each category. In this way, we can only utilize a simple squared anchor (related to the strides of feature maps in Feature Pyramid Networks) to regress objects in various aspect ratios. Finally, we adopt an Oriented Box Decoder (OBD) to align the feature maps and encode the orientation information of oriented objects. Our method achieves a promising mAP value of 79.91% on the DOTA dataset.

Download Full-text

GC-YOLOv3: You Only Look Once with Global Context Block

Electronics ◽

10.3390/electronics9081235 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1235

Author(s):

Yang Yang ◽

Hongmin Deng

Keyword(s):

Object Detection ◽

Irrelevant Information ◽

Detection Algorithm ◽

Visual Object ◽

Detection Accuracy ◽

Feature Maps ◽

Average Precision ◽

Global Context ◽

Pascal Voc ◽

Feature Pyramid

In order to make the classification and regression of single-stage detectors more accurate, an object detection algorithm named Global Context You-Only-Look-Once v3 (GC-YOLOv3) is proposed based on the You-Only-Look-Once (YOLO) in this paper. Firstly, a better cascading model with learnable semantic fusion between a feature extraction network and a feature pyramid network is designed to improve detection accuracy using a global context block. Secondly, the information to be retained is screened by combining three different scaling feature maps together. Finally, a global self-attention mechanism is used to highlight the useful information of feature maps while suppressing irrelevant information. Experiments show that our GC-YOLOv3 reaches a maximum of 55.5 object detection mean Average Precision (mAP)@0.5 on Common Objects in Context (COCO) 2017 test-dev and that the mAP is 5.1% higher than that of the YOLOv3 algorithm on Pascal Visual Object Classes (PASCAL VOC) 2007 test set. Therefore, experiments indicate that the proposed GC-YOLOv3 model exhibits optimal performance on the PASCAL VOC and COCO datasets.

Download Full-text

Object Detection in Very High-Resolution Aerial Images Using One-Stage Densely Connected Feature Pyramid Network

Sensors ◽

10.3390/s18103341 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3341 ◽

Cited By ~ 40

Author(s):

Hilal Tayara ◽

Kil Chong

Keyword(s):

High Resolution ◽

Object Detection ◽

Computation Time ◽

Aerial Images ◽

Feature Maps ◽

Two Stage ◽

One Stage ◽

Wide Range ◽

Feature Pyramid ◽

Very High

Object detection in very high-resolution (VHR) aerial images is an essential step for a wide range of applications such as military applications, urban planning, and environmental management. Still, it is a challenging task due to the different scales and appearances of the objects. On the other hand, object detection task in VHR aerial images has improved remarkably in recent years due to the achieved advances in convolution neural networks (CNN). Most of the proposed methods depend on a two-stage approach, namely: a region proposal stage and a classification stage such as Faster R-CNN. Even though two-stage approaches outperform the traditional methods, their optimization is not easy and they are not suitable for real-time applications. In this paper, a uniform one-stage model for object detection in VHR aerial images has been proposed. In order to tackle the challenge of different scales, a densely connected feature pyramid network has been proposed by which high-level multi-scale semantic feature maps with high-quality information are prepared for object detection. This work has been evaluated on two publicly available datasets and outperformed the current state-of-the-art results on both in terms of mean average precision (mAP) and computation time.

Download Full-text

Transferable Adversarial Attacks for Image and Video Object Detection

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/134 ◽

2019 ◽

Cited By ~ 8

Author(s):

Xingxing Wei ◽

Siyuan Liang ◽

Ning Chen ◽

Xiaochun Cao

Keyword(s):

Object Detection ◽

Video Data ◽

Detection Methods ◽

Feature Maps ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Adversarial Examples ◽

Adversarial Example ◽

High Level ◽

Image Object Detection

Identifying adversarial examples is beneficial for understanding deep networks and developing robust models. However, existing attacking methods for image object detection have two limitations: weak transferability---the generated adversarial examples often have a low success rate to attack other kinds of detection methods, and high computation cost---they need much time to deal with video data, where many frames need polluting. To address these issues, we present a generative method to obtain adversarial images and videos, thereby significantly reducing the processing time. To enhance transferability, we manipulate the feature maps extracted by a feature network, which usually constitutes the basis of object detectors. Our method is based on the Generative Adversarial Network (GAN) framework, where we combine a high-level class loss and a low-level feature loss to jointly train the adversarial example generator. Experimental results on PASCAL VOC and ImageNet VID datasets show that our method efficiently generates image and video adversarial examples, and more importantly, these adversarial examples have better transferability, therefore being able to simultaneously attack two kinds of representative object detection models: proposal based models like Faster-RCNN and regression based models like SSD.

Download Full-text

Global and local information aggregation network for edge-aware salient object detection

Journal of Visual Communication and Image Representation ◽

10.1016/j.jvcir.2021.103350 ◽

2021 ◽

pp. 103350

Author(s):

Qing Zhang ◽

Liqian Zhang ◽

Dong Wang ◽

Yanjiao Shi ◽

Jiajun Lin

Keyword(s):

Object Detection ◽

Information Aggregation ◽

Local Information ◽

Salient Object Detection ◽

Salient Object ◽

Global And Local

Download Full-text

Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds

Sensors ◽

10.3390/s20030704 ◽

2020 ◽

Vol 20 (3) ◽

pp. 704 ◽

Cited By ~ 6

Author(s):

Hongwu Kuang ◽

Bei Wang ◽

Jianping An ◽

Ming Zhang ◽

Zehan Zhang

Keyword(s):

Object Detection ◽

Point Clouds ◽

Autonomous Driving ◽

Feature Maps ◽

3D Object ◽

Cloud Data ◽

Multi Scale ◽

Feature Pyramid ◽

Point Data ◽

3D Object Detection

Object detection in point cloud data is one of the key components in computer vision systems, especially for autonomous driving applications. In this work, we present Voxel-Feature Pyramid Network, a novel one-stage 3D object detector that utilizes raw data from LIDAR sensors only. The core framework consists of an encoder network and a corresponding decoder followed by a region proposal network. Encoder extracts and fuses multi-scale voxel information in a bottom-up manner, whereas decoder fuses multiple feature maps from various scales by Feature Pyramid Network in a top-down way. Extensive experiments show that the proposed method has better performance on extracting features from point data and demonstrates its superiority over some baselines on the challenging KITTI-3D benchmark, obtaining good performance on both speed and accuracy in real-world scenarios.

Download Full-text

A Mountain Summit Recognition Method Based on Improved Faster R-CNN

Complexity ◽

10.1155/2021/8235108 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Yueping Kong ◽

Yun Wang ◽

Song Guo ◽

Jiajing Wang

Keyword(s):

Detection Methods ◽

Feature Points ◽

Feature Maps ◽

Recognition Method ◽

Objective Criterion ◽

Detection Algorithms ◽

Digital Elevation ◽

Feature Pyramid ◽

Mountain Summit ◽

Elevation Model

Mountain summits are vital topographic feature points, which are essential for understanding landform processes and their impacts on the environment and ecosystem. Traditional summit detection methods operate on handcrafted features extracted from digital elevation model (DEM) data and apply parametric detection algorithms to locate mountain summits. However, these methods may no longer be effective to achieve desirable recognition results in small summits and suffer from the objective criterion lacking problem. Thus, to address these problems, we propose an improved Faster region-convolutional neural network (R-CNN) to accurately detect the mountain summits from DEM data. Based on Faster R-CNN, the improved network adopts a residual convolution block to replace the traditional part and adds a feature pyramid network (FPN) to fuse the features with adjacent layers to better address the mountain summit detection task. The residual convolution is employed to capture the deep correlation between visual and physical morphological features. The FPN is utilized to integrate the location and semantic information in the extracted feature maps to effectively represent the mountain summit area. The experimental results demonstrate that the proposed network could achieve the highest recall and precision without manually designed summit features and accurately identify small summits.

Download Full-text

RESOLUTION-AWARE NETWORK WITH ATTENTION MECHANISMS FOR REMOTE SENSING OBJECT DETECTION

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-909-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 909-916

Author(s):

Z. Tian ◽

W. Wang ◽

B. Tian ◽

R. Zhan ◽

J. Zhang

Keyword(s):

Remote Sensing ◽

Object Detection ◽

Attention Mechanism ◽

Image Resolution ◽

Uneven Distribution ◽

Detection Methods ◽

Optical Remote Sensing ◽

Feature Maps ◽

Optical Remote Sensing Image ◽

Comprehensive Evaluations

Abstract. Nowadays, deep-learning-based object detection methods are more and more broadly applied to the interpretation of optical remote sensing image. Although these methods can obtain promising results in general conditions, the designed networks usually ignore the characteristics of remote sensing images, such as large image resolution and uneven distribution of object location. In this paper, an effective detection method based on the convolutional neural network is proposed. First, in order to make the designed network more suitable for the image resolution, EfficientNet is incorporated into the detection framework as the backbone network. EfficientNet employs the compound scaling method to adjust the depth and width of the network, thereby meeting the needs of different resolutions of input images. Then, the attention mechanism is introduced into the proposed method to improve the extracted feature maps. The attention mechanism makes the network more focused on the object areas while reducing the influence of the background areas, so as to reduce the influence of uneven distribution. Comprehensive evaluations on a public object detection dataset demonstrate the effectiveness of the proposed method.

Download Full-text

Small Object Detection Algorithm Based on Feature Pyramid-Enhanced Fusion SSD

Complexity ◽

10.1155/2019/7297960 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Haotian Li ◽

Kezheng Lin ◽

Jingxuan Bai ◽

Ao Li ◽

Jiali Yu

Keyword(s):

Object Detection ◽

Detection Rate ◽

Detection Algorithm ◽

Single Shot ◽

Small Object ◽

Feature Maps ◽

Scale Invariant ◽

Feature Pyramid ◽

Small Object Detection ◽

Detection And Localization

In order to improve the detection rate of the traditional single-shot multibox detection algorithm in small object detection, a feature-enhanced fusion SSD object detection algorithm based on the pyramid network is proposed. Firstly, the selected multiscale feature layer is merged with the scale-invariant convolutional layer through the feature pyramid network structure; at the same time, the multiscale feature map is separately converted into the channel number using the scale-invariant convolution kernel. Then, the obtained two sets of pyramid-shaped feature layers are further feature fused to generate a set of enhanced multiscale feature maps, and the scale-invariant convolution is performed again on these layers. Finally, the obtained layer is used for detection and localization. The final location coordinates and confidence are output after nonmaximum suppression. Experimental results on the Pascal VOC 2007 and 2012 datasets confirm that there is a 8.2% improvement in mAP compared to the original SSD and some existing algorithms.

Download Full-text