Backbone Network for Object Detection with Multiple Dilated Convolutions and Feature Summation

2018 ◽  
Vol 45 (8) ◽  
pp. 786-791
Author(s):  
Vani Natalia Kuntjono ◽  
Seunghyun Ko ◽  
Yang Fang ◽  
Geunsik Jo
2021 ◽  
Vol 104 (2) ◽  
pp. 003685042110113
Author(s):  
Xianghua Ma ◽  
Zhenkun Yang

Real-time object detection on mobile platforms is a crucial but challenging computer vision task. Lightweight object detectors are widely recognized to be fast, but their detection accuracy is relatively low. To improve detection accuracy, it is beneficial to extract complete multi-scale image features in visual cognitive tasks. Asymmetric convolutions have a useful property: their kernels have different aspect ratios, which can be used to extract image features of objects, especially objects with multi-scale characteristics. In this paper, we exploit three different asymmetric convolutions in parallel and propose a new multi-scale asymmetric convolution unit, the MAC block, to enhance the multi-scale representation ability of CNNs. In addition, the MAC block can adaptively merge features of different scales by allocating learnable weight parameters to the three asymmetric convolution branches. The proposed MAC blocks can be inserted into state-of-the-art backbones such as ResNet-50 to form a new multi-scale backbone network for object detectors. To evaluate the performance of the MAC block, we conduct experiments on the CIFAR-100, PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO 2014 datasets. Experimental results show that detection precision can be greatly improved while a fast detection speed is maintained.
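The weighted merge of parallel asymmetric branches described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel shapes (1x3, 3x1, 1x5), the softmax normalization of the branch weights, the single-channel input, and the names `conv2d_same` and `mac_block` are all assumptions made for illustration.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 2D cross-correlation, zero-padded so the output keeps the input size.
    Assumes odd kernel height and width."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def mac_block(x, kernels, branch_logits):
    """Run each asymmetric kernel in parallel and merge the results with learnable weights.
    Assumption: the weights are softmax-normalized logits, one per branch."""
    weights = softmax(branch_logits)
    branches = [conv2d_same(x, k) for k in kernels]
    return sum(w * b for w, b in zip(weights, branches))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
# Hypothetical kernel shapes with different aspect ratios.
kernels = [rng.standard_normal(shape) for shape in [(1, 3), (3, 1), (1, 5)]]
branch_logits = np.zeros(3)  # equal weighting before any training
y = mac_block(x, kernels, branch_logits)
```

In a trained network the logits would be updated by gradient descent along with the kernels, so that the block can favor whichever aspect ratio suits the data.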


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 136
Author(s):  
Fangyu Li ◽  
Weizheng Jin ◽  
Cien Fan ◽  
Lian Zou ◽  
Qingsheng Chen ◽  
...  

3D object detection in LiDAR point clouds has been extensively used in autonomous driving, intelligent robotics, and augmented reality. Although one-stage 3D detectors have satisfactory training and inference speed, their performance still suffers from insufficient utilization of bird’s eye view (BEV) information. In this paper, a new backbone network is proposed to perform cross-layer fusion of multi-scale BEV feature maps, which makes full use of the available information for detection. Specifically, our proposed backbone network is divided into a coarse branch and a fine branch. In the coarse branch, we use the pyramidal feature hierarchy (PFH) to generate multi-scale BEV feature maps, which retain the advantages of different levels and serve as the input of the fine branch. In the fine branch, our proposed pyramid splitting and aggregation (PSA) module deeply integrates different levels of multi-scale feature maps, thereby improving the expressive ability of the final features. Extensive experiments on the challenging KITTI-3D benchmark show that our method performs better in both 3D and BEV object detection than some previous state-of-the-art methods. Results measured by average precision (AP) demonstrate the effectiveness of our network.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Zeqing Zhang ◽  
Weiwei Lin ◽  
Yuqiang Zheng

Focusing on DOTA, a multidirectional object dataset of vehicles in aerial view, this paper proposes CMDTD. The paper analyzes why general object detection algorithms are difficult to apply to multidirectional object detection. Based on this analysis, it studies the detection principle of CMDTD, including its backbone network and its multidirectional multi-information detection end module. In addition, in view of the complexity of the scenes encountered in aerial views of vehicles, a unique data expansion method is proposed. Finally, experiments on three datasets using the CMDTD algorithm show that this cascaded multidirectional object detection algorithm is highly effective and superior to other methods.


2020 ◽  
Vol 57 (4) ◽  
pp. 041021
Author(s):  
宋雅麟 Song Yalin ◽  
庞彦伟 Pang Yanwei

2020 ◽  
Vol 34 (07) ◽  
pp. 11653-11660
Author(s):  
Yudong Liu ◽  
Yongtao Wang ◽  
Siwei Wang ◽  
Tingting Liang ◽  
Qijie Zhao ◽  
...  

In existing CNN-based detectors, the backbone network is a very important component for basic feature extraction, and the performance of the detectors highly depends on it. In this paper, we aim to achieve better detection performance by building a more powerful backbone from existing ones like ResNet and ResNeXt. Specifically, we propose a novel strategy for assembling multiple identical backbones by composite connections between adjacent backbones, to form a more powerful backbone named Composite Backbone Network (CBNet). In this way, CBNet iteratively feeds the output features of the previous backbone, namely high-level features, as part of the input features to the succeeding backbone, in a stage-by-stage fashion, and finally the feature maps of the last backbone (named Lead Backbone) are used for object detection. We show that CBNet can be very easily integrated into most state-of-the-art detectors and significantly improve their performance. For example, it boosts the mAP of FPN, Mask R-CNN and Cascade R-CNN on the COCO dataset by about 1.5 to 3.0 points. Moreover, experimental results show that the instance segmentation results can be improved as well. Specifically, by simply integrating the proposed CBNet into the baseline detector Cascade Mask R-CNN, we achieve a new state-of-the-art result on the COCO dataset (mAP of 53.3) with a single model, which demonstrates the effectiveness of the proposed CBNet architecture. Code will be made available at https://github.com/PKUbahuangliuhe/CBNet.
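The stage-by-stage feeding between adjacent backbones can be sketched as a toy in NumPy. This is an illustration under stated assumptions, not the released CBNet code: each stage is a hypothetical linear map plus ReLU that keeps the feature shape fixed, the composite connection is modeled as an element-wise addition of a transformed feature, and the choice of which stage output feeds which stage input is an assumption. The names `make_stage`, `new_backbone`, and `composite_backbone` are invented for this example.

```python
import numpy as np

def make_stage(weight):
    """Hypothetical backbone stage: linear map followed by ReLU (shape-preserving)."""
    def stage(x):
        return np.maximum(x @ weight, 0.0)
    return stage

def composite_backbone(x, backbones, composite):
    """Run K backbones in sequence on the same input x.

    backbones: list of backbones, each a list of stage functions.
    composite: function applied to a feature of the previous backbone before it is
               added to the input of the matching stage (assumed form of the connection).
    Returns the per-stage features of the last (lead) backbone.
    """
    prev_outputs = None
    for stages in backbones:
        outputs = []
        h = x
        for level, stage in enumerate(stages):
            if prev_outputs is not None:
                # Feed the previous backbone's feature into this stage's input.
                h = h + composite(prev_outputs[level])
            h = stage(h)
            outputs.append(h)
        prev_outputs = outputs
    return prev_outputs

rng = np.random.default_rng(0)
C = 4  # hypothetical channel count

def new_backbone(num_stages=3):
    return [make_stage(rng.standard_normal((C, C))) for _ in range(num_stages)]

x = rng.standard_normal((5, C))            # 5 positions, C channels
backbones = [new_backbone(), new_backbone()]  # one assisting backbone, one lead backbone
feats = composite_backbone(x, backbones, composite=lambda f: f)
```

Only `feats`, the lead backbone's features, would be passed on to the detection head; the assisting backbone contributes solely through the composite connections.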


Processes ◽  
2021 ◽  
Vol 9 (9) ◽  
pp. 1654
Author(s):  
Xiaoliang Zhang ◽  
Kehe Wu ◽  
Qi Ma ◽  
Zuge Chen

Because object detection datasets are smaller than the image recognition dataset ImageNet, transfer learning has become a basic training method for deep learning object detection models: the backbone network of the detection model is pre-trained on ImageNet to extract features for detection tasks. However, the classification task of detection focuses on the salient region features of an object, while the localization task focuses on edge features, so there is a certain deviation between the features extracted by a pretrained backbone network and those needed by the localization task. To solve this problem, a decoupled self-attention (DSA) module is proposed for one-stage object detection models in this paper. A DSA module includes two decoupled self-attention branches, so it can extract appropriate features for different tasks. It is located between the Feature Pyramid Network (FPN) and the head networks of the subtasks, and is used to independently extract global features for each task from the FPN-fused features. Although the DSA module is simple, it can effectively improve object detection performance and can easily be embedded in many detection models. Our experiments are based on the representative one-stage detection model RetinaNet. On the Common Objects in Context (COCO) dataset, when ResNet50 and ResNet101 are used as backbone networks, detection performance increases by 0.4 and 0.5% AP, respectively. When the DSA module and the object confidence task are both applied in RetinaNet, detection performance based on ResNet50 and ResNet101 increases by 1.0 and 1.4% AP, respectively. The experimental results show the effectiveness of the DSA module.
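The idea of two decoupled attention branches operating on the same FPN feature can be sketched in NumPy as follows. This is a hedged illustration rather than the paper's module: the scaled dot-product form of the attention, the residual connection, the flattening of the feature map to `(positions, channels)`, and the names `self_attention`, `init_branch`, and `decoupled_self_attention` are all assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat, wq, wk, wv):
    """Scaled dot-product self-attention over the positions of a (N, C) feature.
    Assumption: a residual connection adds the attended values back to the input."""
    q, k, v = feat @ wq, feat @ wk, feat @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N), each row sums to 1
    return feat + attn @ v

def init_branch(rng, channels):
    """Hypothetical initializer: one (query, key, value) weight triple per branch."""
    return tuple(rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))

def decoupled_self_attention(feat, cls_params, loc_params):
    """Apply two independent attention branches to the same fused feature:
    one output goes to the classification head, the other to the localization head."""
    cls_feat = self_attention(feat, *cls_params)
    loc_feat = self_attention(feat, *loc_params)
    return cls_feat, loc_feat

rng = np.random.default_rng(0)
N, C = 16, 8                         # e.g. a flattened 4x4 feature map with 8 channels
fpn_feat = rng.standard_normal((N, C))
cls_params = init_branch(rng, C)
loc_params = init_branch(rng, C)
cls_feat, loc_feat = decoupled_self_attention(fpn_feat, cls_params, loc_params)
```

Because the two branches share no weights, each can learn to emphasize different positions, which is the decoupling the abstract motivates for the classification and localization subtasks.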

