A Novel Multi-Scale Feature Fusion Method for Region Proposal Network in Fast Object Detection

2020 ◽  
Vol 16 (3) ◽  
pp. 132-145
Author(s):  
Gang Liu ◽  
Chuyi Wang

Neural network models have been widely used in the field of object detection. Region proposal methods are widely used in current object detection networks and have achieved good performance. Common region proposal methods find objects by generating thousands of candidate boxes. Compared with other region proposal methods, the region proposal network (RPN) improves accuracy and detection speed while using only several hundred candidate boxes. However, because the feature maps contain insufficient information, the RPN detects and locates small objects poorly. This article proposes a novel multi-scale feature fusion method for the region proposal network to address this problem. The proposed method, called the multi-scale region proposal network (MS-RPN), generates suitable feature maps for the region proposal network. In MS-RPN, the selected feature maps at multiple scales are fine-tuned separately and compressed into a uniform space. The resulting fused feature maps, called refined fusion features (RFFs), incorporate abundant detail and context information and are sent to the RPN to generate better region proposals. The proposed approach is evaluated on the PASCAL VOC 2007 and MS COCO benchmark tasks. MS-RPN obtains significant improvements over comparable state-of-the-art detection models.
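
The idea of bringing feature maps from several scales into one uniform space can be illustrated with a minimal sketch. It is not the MS-RPN implementation: nearest-neighbour resizing and plain channel concatenation are assumptions standing in for the learned layers, and the names `resize_nearest` and `fuse_multi_scale` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resize every channel to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = len(fmap[0]), len(fmap[0][0])
    return [
        [
            [channel[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)
        ]
        for channel in fmap
    ]


def fuse_multi_scale(fmaps: List[FeatureMap], out_h: int, out_w: int) -> FeatureMap:
    """Bring all maps to one spatial size, then concatenate along channels."""
    fused: FeatureMap = []
    for fmap in fmaps:
        fused.extend(resize_nearest(fmap, out_h, out_w))
    return fused


# Toy inputs: a fine 4x4 map (1 channel) and a coarse 2x2 map (2 channels).
fine = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
coarse = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

fused = fuse_multi_scale([fine, coarse], out_h=4, out_w=4)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

In a real detector the resized maps would typically pass through learned convolutions before and after fusion; the sketch only shows the shape bookkeeping.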

2019 ◽  
Vol 11 (5) ◽  
pp. 594 ◽  
Author(s):  
Shuo Zhuang ◽  
Ping Wang ◽  
Boran Jiang ◽  
Gang Wang ◽  
Cong Wang

With the rapid advances in remote-sensing technologies and the growing number of satellite images, fast and effective object detection plays an important role in understanding and analyzing image information, which can be further applied in civilian and military fields. Recently, object detection methods based on region-based convolutional neural networks have shown excellent performance. However, these two-stage methods involve separate region proposal generation and object detection procedures, resulting in low computation speed. Because manual annotation is expensive, well-annotated aerial images are scarce, which also limits the progress of geospatial object detection in remote sensing. In this paper, on the one hand, we construct and release a large-scale remote-sensing dataset for geospatial object detection (RSD-GOD) that consists of 5 different categories with 18,187 annotated images and 40,990 instances. On the other hand, we design a single-shot detection framework with multi-scale feature fusion. The feature maps from different layers are fused through up-sampling and concatenation blocks to predict the detection results. High-level features with semantic information and low-level features with fine details are fully exploited for detection tasks, especially for small objects. Meanwhile, a soft non-maximum suppression strategy is used to select the final detection results. Extensive experiments have been conducted on two datasets to evaluate the designed network. Results show that the proposed approach achieves good detection performance, obtaining a mean average precision of 89.0% on the newly constructed RSD-GOD dataset and 83.8% on the Northwestern Polytechnical University very high spatial resolution-10 (NWPU VHR-10) dataset at 18 frames per second (FPS) on an NVIDIA GTX-1080Ti GPU.
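
The abstract does not say which soft non-maximum suppression variant is used. As a rough illustration, the sketch below assumes the Gaussian variant, in which the score of a box overlapping an already selected box is decayed by exp(-IoU^2 / sigma) rather than being discarded. The parameter values `sigma` and `score_thresh` and the toy boxes are illustrative assumptions.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(
    boxes: List[Box],
    scores: List[float],
    sigma: float = 0.5,          # assumed decay width
    score_thresh: float = 0.001,  # assumed cut-off for decayed scores
) -> List[Tuple[int, float]]:
    """Return (box_index, decayed_score) pairs in selection order."""
    remaining = list(enumerate(scores))
    kept: List[Tuple[int, float]] = []
    while remaining:
        # Select the highest-scoring remaining box.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        best_idx, best_score = remaining.pop(best)
        kept.append((best_idx, best_score))
        # Decay overlapping boxes instead of deleting them outright.
        decayed = []
        for i, s in remaining:
            overlap = iou(boxes[best_idx], boxes[i])
            s *= math.exp(-(overlap ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((i, s))
        remaining = decayed
    return kept


boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
print(kept)
```

In this toy case the second box overlaps the first heavily, so its score is reduced and it is ranked after the non-overlapping third box instead of being removed.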


2019 ◽  
Vol 56 (2) ◽  
pp. 021002
Author(s):  
单倩文 Shan Qianwen ◽  
郑新波 Zheng Xinbo ◽  
何小海 He Xiaohai ◽  
滕奇志 Teng Qizhi ◽  
吴晓红 Wu Xiaohong

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3031
Author(s):  
Jing Lian ◽  
Yuhang Yin ◽  
Linhui Li ◽  
Zhenghao Wang ◽  
Yafu Zhou

There are many small objects in traffic scenes, but their low resolution and limited information make them hard to detect. Small object detection is important for understanding traffic scene environments. To improve the detection accuracy of small objects in traffic scenes, we propose a small object detection method based on attention feature fusion. First, a multi-scale channel attention block (MS-CAB) is designed, which uses local and global scales to aggregate the effective information of the feature maps. Based on this block, an attention feature fusion block (AFFB) is proposed, which can better integrate contextual information from different layers. Finally, the AFFB replaces the linear fusion module in the object detection network to obtain the final network structure. The experimental results show that, compared with the baseline model YOLOv5s, this method achieves a higher mean average precision (mAP) while maintaining real-time performance. It increases the mAP of all objects by 0.9 percentage points on the validation set of the traffic scene dataset BDD100K and, at the same time, increases the mAP of small objects by 3.5%.
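
To make the contrast with linear fusion concrete, here is a heavily simplified sketch of attention-weighted fusion. It assumes that a per-element weight is obtained by applying a sigmoid to the sum of a local term (the element itself) and a global term (the channel average), and it omits the learned convolutions that a block such as MS-CAB would contain. The function name `attention_fuse` is hypothetical and this is not the authors' AFFB.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def attention_fuse(x: FeatureMap, y: FeatureMap) -> FeatureMap:
    """Blend x and y with per-element weights from local + global context."""
    fused: FeatureMap = []
    for x_ch, y_ch in zip(x, y):
        summed = [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x_ch, y_ch)]
        # Global context: one average value per channel.
        n = len(summed) * len(summed[0])
        global_ctx = sum(sum(row) for row in summed) / n
        out_ch = []
        for xr, yr, sr in zip(x_ch, y_ch, summed):
            out_row = []
            for a, b, local_ctx in zip(xr, yr, sr):
                # Assumed (unlearned) attention weight in (0, 1).
                w = sigmoid(local_ctx + global_ctx)
                out_row.append(w * a + (1.0 - w) * b)
            out_ch.append(out_row)
        fused.append(out_ch)
    return fused


shallow = [[[2.0, 0.0], [0.0, 0.0]]]
deep = [[[0.0, 0.0], [0.0, 2.0]]]
fused = attention_fuse(shallow, deep)
print(fused)
```

Because each weight lies strictly between 0 and 1, every fused value is a convex combination of the two inputs at that position, whereas linear fusion would simply add or concatenate them.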


2020 ◽  
Author(s):  
Fengli Lu ◽  
Chengcai Fu ◽  
Guoying Zhang ◽  
Jie Shi

Accurate segmentation of fractures in coal rock CT images is important for safe production and the development of coalbed methane. However, coal rock fractures formed through natural geological evolution are complex, low in contrast, and vary in scale. Furthermore, there is no published coal rock dataset. In this paper, we propose an adaptive multi-scale feature fusion based residual U-net (AMSFFR-U-net) for fracture segmentation in coal rock CT images. Dilated residual blocks (DResBlock) with dilation rates (1, 2, 3) are embedded into the encoding branch of the U-net structure, which improves the network's feature extraction ability and helps capture fractures at different scales. Furthermore, feature maps of different sizes in the encoding branch are concatenated by an adaptive multi-scale feature fusion (AMSFF) module. AMSFF not only captures fractures at different scales but also improves the restoration of spatial information. To alleviate the lack of training data for coal rock fractures, we apply a set of comprehensive data augmentation operations to increase the diversity of training samples. Our network, U-net, and Res-U-net are tested on our test set of coal rock CT images, which contains coal rock samples from five different regions. The experimental results show that our proposed approach improves the average Dice coefficient by 2.9%, the average precision by 7.2%, and the average recall by 9.1%. Therefore, AMSFFR-U-net achieves better segmentation results for coal rock fractures and has stronger generalization ability and robustness.
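
The three reported metrics have standard definitions on binary masks: Dice = 2TP / (2TP + FP + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). A minimal sketch follows; the helper name `segmentation_scores`, the toy masks, and the convention of returning 1.0 for Dice when both masks are empty are assumptions for illustration.

```python
from typing import Dict, List

Mask = List[List[int]]  # binary mask, 1 = fracture pixel


def segmentation_scores(pred: Mask, target: Mask) -> Dict[str, float]:
    """Compute Dice, precision and recall for two binary masks."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    return {
        # Assumed convention: two empty masks agree perfectly.
        "dice": 2 * tp / denom if denom else 1.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
scores = segmentation_scores(pred, target)
print(scores)
```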


2020 ◽  
Vol 40 (10) ◽  
pp. 1015002
Author(s):  
刘芳 Liu Fang ◽  
吴志威 Wu Zhiwei ◽  
杨安喆 Yang Anzhe ◽  
韩笑 Han Xiao

2021 ◽  
Vol 13 (2) ◽  
pp. 38
Author(s):  
Yao Xu ◽  
Qin Yu

Great achievements have been made in pedestrian detection through deep learning. For detectors based on deep learning, making better use of features has become the key to detection performance. While current pedestrian detectors have made efforts in feature utilization to improve their detection performance, feature utilization is still inadequate. To solve this problem, we propose the Multi-Level Feature Fusion Module (MFFM) and its Multi-Scale Feature Fusion Unit (MFFU) sub-module, which connect feature maps of the same scale and of different scales by using horizontal and vertical connections and shortcut structures. All of these connections carry learnable weights; thus, they can serve as adaptive multi-level and multi-scale feature fusion modules that fuse the best features. We then build a complete pedestrian detector, the Adaptive Feature Fusion Detector (AFFDet), an anchor-free one-stage pedestrian detector that can make full use of features for detection. Compared with other methods, our method performs better on the challenging Caltech Pedestrian Detection Benchmark (Caltech) and has quite competitive speed. It is the current state-of-the-art one-stage pedestrian detection method.
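
Fusion with learnable connection weights can be sketched as a weighted sum of same-sized feature maps. The abstract does not state how the weights are normalized; the softmax normalization below, the scalar (one per input) weights, and the name `weighted_fusion` are assumptions, and the raw weights would be trained by backpropagation in practice rather than set by hand.

```python
import math
from typing import List

Map2D = List[List[float]]


def softmax(raw: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw weights."""
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(fmaps: List[Map2D], raw_weights: List[float]) -> Map2D:
    """Fuse same-sized maps as a weighted sum with normalized weights."""
    weights = softmax(raw_weights)
    height, width = len(fmaps[0]), len(fmaps[0][0])
    return [
        [sum(w * fm[r][c] for w, fm in zip(weights, fmaps)) for c in range(width)]
        for r in range(height)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

equal = weighted_fusion([a, b], raw_weights=[0.0, 0.0])    # plain average
skewed = weighted_fusion([a, b], raw_weights=[0.0, 10.0])  # dominated by b
print(equal)
print(skewed)
```

With equal raw weights the result is the element-wise average; as one raw weight grows, the fused map approaches the corresponding input, which is how such a module can learn to favor the more useful connection.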


Entropy ◽  
2020 ◽  
Vol 22 (8) ◽  
pp. 811
Author(s):  
Dan Yang ◽  
Guoru Liu ◽  
Mengcheng Ren ◽  
Bin Xu ◽  
Jiao Wang

Computer-aided automatic segmentation of retinal blood vessels plays an important role in the diagnosis of diseases such as diabetes, glaucoma, and macular degeneration. In this paper, we propose a multi-scale feature fusion retinal vessel segmentation model based on U-Net, named MSFFU-Net. The model introduces the inception structure into the multi-scale feature extraction encoder, and the max-pooling index is applied during upsampling in the feature fusion decoder of the improved network. Skip connections transfer each set of feature maps generated on the encoder path to the corresponding feature maps on the decoder path. Moreover, a cost-sensitive loss function based on the Dice coefficient and cross-entropy is designed. Four transformations (rotating, mirroring, shifting, and cropping) are used as data augmentation strategies, and the CLAHE algorithm is applied for image preprocessing. The proposed framework is trained and tested on DRIVE and STARE, with sensitivity (Sen), specificity (Spe), accuracy (Acc), and area under curve (AUC) adopted as the evaluation metrics. Detailed comparisons with the U-Net model verify the effectiveness and robustness of the proposed model. Sen of 0.7762 and 0.7721, Spe of 0.9835 and 0.9885, Acc of 0.9694 and 0.9537, and AUC of 0.9790 and 0.9680 were achieved on the DRIVE and STARE databases, respectively. Results are also compared with other state-of-the-art methods, showing that the proposed method is competitive with or superior to them.
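
The abstract does not give the exact form of the cost-sensitive loss. One common way to combine the two ingredients, shown here purely as an assumption, is a weighted sum of a class-weighted binary cross-entropy and a soft Dice loss. The mixing factor `alpha`, the positive-class weight `pos_weight`, and the function names are hypothetical.

```python
import math
from typing import List


def soft_dice(probs: List[float], labels: List[float], eps: float = 1e-7) -> float:
    """Soft Dice coefficient between predicted probabilities and 0/1 labels."""
    inter = sum(p * y for p, y in zip(probs, labels))
    return (2.0 * inter + eps) / (sum(probs) + sum(labels) + eps)


def weighted_bce(
    probs: List[float], labels: List[float], pos_weight: float = 2.0, eps: float = 1e-7
) -> float:
    """Binary cross-entropy with a larger cost on missed vessel pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= pos_weight * y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(
    probs: List[float], labels: List[float], alpha: float = 0.5, pos_weight: float = 2.0
) -> float:
    """Assumed combination: alpha * weighted BCE + (1 - alpha) * (1 - Dice)."""
    bce = weighted_bce(probs, labels, pos_weight)
    return alpha * bce + (1.0 - alpha) * (1.0 - soft_dice(probs, labels))


labels = [1.0, 0.0, 1.0, 0.0]
good = combined_loss([0.9, 0.1, 0.8, 0.2], labels)
bad = combined_loss([0.2, 0.8, 0.1, 0.9], labels)
print(good, bad)
```

Weighting the positive class more heavily is one way to make the loss cost-sensitive when vessel pixels are far rarer than background pixels.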

