Remote Sensing Image Target Detection Model Based on Attention and Feature Fusion

2021
Vol 58 (2)
pp. 0228003
Author(s):
汪亚妮 Wang Yani
汪西莉 Wang Xili
2021
Vol 13 (19)
pp. 3908
Author(s):
Zhenfang Qu
Fuzhen Zhu
Chengxiao Qi

Remote sensing image target detection is widely used for both civil and military purposes. However, two requirements must be balanced: real-time operation and accurate detection of targets that occupy only a few pixels. With these two issues in mind, the main objective of this paper is to improve the performance of the YOLO algorithm for remote sensing image target detection, since YOLO models can deliver both detection speed and accuracy. More specifically, this paper further improves the YOLOv3 model with an auxiliary network through four main components. Firstly, an image blocking module feeds fixed-size images to the YOLOv3 network. Secondly, the DIoU loss is adopted, which accelerates convergence and shortens training. Thirdly, the Convolutional Block Attention Module (CBAM) connects the auxiliary network to the backbone network, helping the network attend to salient features so that key information is not easily lost during training. Finally, adaptive spatial feature fusion (ASFF) is applied to the network to improve detection speed by reducing inference overhead. Experiments on the DOTA dataset validate the effectiveness of our model: it achieves satisfactory detection performance on remote sensing images and performs significantly better than the unimproved YOLOv3 model with an auxiliary network. The results show that the mAP of the optimised network is 5.36% higher than that of the original YOLOv3 model with the auxiliary network, and the detection frame rate increases by 3.07 FPS.
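As an illustration of the DIoU term mentioned above, here is a minimal PyTorch sketch of a DIoU loss for axis-aligned boxes in (x1, y1, x2, y2) format; it is a hedged example rather than the paper's exact YOLOv3 integration.

import torch

def diou_loss(pred, target, eps=1e-7):
    """DIoU loss: 1 - IoU + (center distance)^2 / (enclosing diagonal)^2."""
    # Intersection area
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union area
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Squared distance between the two box centers
    cx_p = (pred[:, 0] + pred[:, 2]) / 2
    cy_p = (pred[:, 1] + pred[:, 3]) / 2
    cx_t = (target[:, 0] + target[:, 2]) / 2
    cy_t = (target[:, 1] + target[:, 3]) / 2
    center_dist = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Squared diagonal of the smallest box enclosing both boxes
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    diag = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps

    return (1 - iou + center_dist / diag).mean()

The extra center-distance penalty, normalized by the enclosing-box diagonal, is what speeds up convergence relative to a plain IoU loss.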


2018
Vol 10 (12)
pp. 1922
Author(s):
Kun Fu
Yang Li
Hao Sun
Xue Yang
Guangluan Xu
...

Ship detection plays an important role in automatic remote sensing image interpretation. Scale differences, the large aspect ratios of ships, complex remote sensing backgrounds, and densely parked ships make the detection task difficult. To handle these challenges, we propose a ship rotation detection model based on a Feature Fusion Pyramid Network and deep reinforcement learning (FFPN-RL). The detection network can efficiently generate inclined rectangular boxes for ships. First, we propose the Feature Fusion Pyramid Network (FFPN), which strengthens the reuse of features at different scales and extracts the low-level location and high-level semantic information that are important for multi-scale ship detection and the precise location of densely parked ships. Second, to obtain accurate ship angle information, we apply deep reinforcement learning to the inclined ship detection task for the first time: we put forward prior policy guidance and a long-term training method to train an angle-prediction agent built on a dueling-structure Q network, which iteratively and accurately obtains the ship angle. Third, we design a soft rotation non-maximum suppression scheme that reduces missed detections while suppressing redundant detection boxes. Detailed experiments on a remote sensing ship image dataset validate that the FFPN-RL ship detection model achieves efficient detection performance.
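To make the dueling-structure Q network concrete, the following is a minimal PyTorch sketch of an angle-prediction head; the state dimension and the set of discrete angle-adjustment actions are illustrative assumptions, not the settings used in FFPN-RL.

import torch
import torch.nn as nn

class DuelingAngleQNet(nn.Module):
    def __init__(self, state_dim=512, num_actions=9):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        # Value stream: a scalar state value V(s)
        self.value = nn.Linear(256, 1)
        # Advantage stream: one advantage per angle-adjustment action A(s, a)
        self.advantage = nn.Linear(256, num_actions)

    def forward(self, state):
        h = self.shared(state)
        v = self.value(h)       # (B, 1)
        a = self.advantage(h)   # (B, num_actions)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)

# The agent would pick the angle adjustment with the highest Q-value, e.g.
# action = DuelingAngleQNet()(features).argmax(dim=1)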


Sensors
2021
Vol 21 (23)
pp. 8113
Author(s):
Kun Fang
Jianquan Ouyang
Buwei Hu

Traffic port stations (such as airports and ports) are composed of buildings, infrastructure, and transportation vehicles. Detecting them in high-resolution remote sensing images requires collecting feature information from many nearby small targets, analyzing and classifying it comprehensively, and finally localizing the station. At present, deep learning methods based on convolutional neural networks have made great progress in single-target detection in high-resolution remote sensing images, but adapting well to the recognition of multi-target complexes remains a difficult problem in the remote sensing field. This paper constructs a novel detection model, Swin-HSTPS, for traffic port stations in high-resolution remote sensing images. The model improves the recognition accuracy of multi-target complexes and addresses high-precision positioning through comprehensive analysis of the combined features of multiple small targets. It incorporates the MixUp hybrid augmentation algorithm to enrich image feature information in the preprocessing stage, and adds the PReLU activation function to the forward network of the Swin Transformer to form a ResNet-like residual structure whose non-linear transformation of the convolutional feature maps strengthens the information interaction between pixel blocks. Training quality is evaluated by comparing average precision and average recall during the training phase, while prediction accuracy is measured by confidence in the prediction stage. Experimental results show that the optimal average precision of Swin-HSTPS reaches 85.3%, about 8% higher than that of the Swin Transformer detection model, and its target prediction accuracy is also higher, allowing it to accurately locate traffic port stations such as airports and ports in high-resolution remote sensing images. The model inherits the advantages of the Swin Transformer detection model and outperforms mainstream models such as R-CNN and YOLOv5 in predicting traffic port station targets in high-resolution remote sensing images.
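For reference, here is a minimal PyTorch sketch of MixUp-style hybrid augmentation applied in a preprocessing stage; the mixing parameter, batch-pairing scheme, and label format (soft class vectors) are assumptions for illustration, not the exact Swin-HSTPS configuration.

import numpy as np
import torch

def mixup_batch(images, labels, alpha=0.2):
    """Blend each image and label with a randomly chosen partner from the batch."""
    lam = float(np.random.beta(alpha, alpha))   # mixing coefficient in (0, 1)
    index = torch.randperm(images.size(0))      # random pairing within the batch
    mixed_images = lam * images + (1 - lam) * images[index]
    # Labels are assumed to be one-hot / soft vectors so they can be blended directly;
    # detection-style annotations would need a task-specific mixing rule instead.
    mixed_labels = lam * labels + (1 - lam) * labels[index]
    return mixed_images, mixed_labels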


2021
Vol 13 (10)
pp. 1950
Author(s):
Cuiping Shi
Xin Zhao
Liguo Wang

In recent years, with the rapid development of computer vision, increasing attention has been paid to remote sensing image scene classification. To improve classification performance, many studies have increased the depth of convolutional neural networks (CNNs) and expanded their width to extract more deep features, thereby increasing model complexity. To address this problem, we propose a lightweight convolutional neural network based on attention-oriented multi-branch feature fusion (AMB-CNN) for remote sensing image scene classification. Firstly, we propose two convolution combination modules for feature extraction, through which deep image features can be fully extracted by cooperating convolutions. Then, feature weights are calculated and the extracted deep features are fed to an attention mechanism for further feature extraction. Next, all extracted features are fused across multiple branches. Finally, depthwise separable convolution and asymmetric convolution are used to greatly reduce the number of parameters. Experimental results show that, compared with some state-of-the-art methods, the proposed method retains a great advantage in classification accuracy while using very few parameters.
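To illustrate the parameter-saving convolutions mentioned above, here is a minimal PyTorch sketch combining a depthwise separable convolution with an asymmetric (1xk / kx1) convolution pair; the channel counts and kernel size are illustrative assumptions rather than the AMB-CNN configuration.

import torch.nn as nn

class LightweightConvBlock(nn.Module):
    def __init__(self, in_ch=64, out_ch=64, k=3):
        super().__init__()
        # Depthwise separable convolution: per-channel spatial conv + 1x1 pointwise conv
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        # Asymmetric convolution: a kxk conv factorized into 1xk and kx1 convs
        self.asym_h = nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2))
        self.asym_v = nn.Conv2d(out_ch, out_ch, (k, 1), padding=(k // 2, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.pointwise(self.depthwise(x)))
        return self.act(self.asym_v(self.asym_h(x)))

Both factorizations keep the receptive field of a standard kxk convolution while using far fewer parameters, which is the source of the model's lightweight design.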


2009
Vol 14 (1)
pp. 125-136
Author(s):
Joseph W. Richards
Johanna Hardin
Eric B. Grosfils

IEEE Access
2021
Vol 9
pp. 4673-4687
Author(s):
Jixiang Zhao
Shanwei Liu
Jianhua Wan
Muhammad Yasir
Huayu Li
