scholarly journals Object Detection in Remote Sensing Images Based on Improved Bounding Box Regression and Multi-Level Features Fusion

2020 ◽  
Vol 12 (1) ◽  
pp. 143 ◽  
Author(s):  
Xiaoliang Qian ◽  
Sheng Lin ◽  
Gong Cheng ◽  
Xiwen Yao ◽  
Hangli Ren ◽  
...  

The objective of detection in remote sensing images is to determine the location and category of all targets in these images. The anchor based methods are the most prevalent deep learning based methods, and still have some problems that need to be addressed. First, the existing metric (i.e., intersection over union (IoU)) could not measure the distance between two bounding boxes when they are nonoverlapping. Second, the exsiting bounding box regression loss could not directly optimize the metric in the training process. Third, the existing methods which adopt a hierarchical deep network only choose a single level feature layer for the feature extraction of region proposals, meaning they do not take full use of the advantage of multi-level features. To resolve the above problems, a novel object detection method for remote sensing images based on improved bounding box regression and multi-level features fusion is proposed in this paper. First, a new metric named generalized IoU is applied, which can quantify the distance between two bounding boxes, regardless of whether they are overlapping or not. Second, a novel bounding box regression loss is proposed, which can not only optimize the new metric (i.e., generalized IoU) directly but also overcome the problem that existing bounding box regression loss based on the new metric cannot adaptively change the gradient based on the metric value. Finally, a multi-level features fusion module is proposed and incorporated into the existing hierarchical deep network, which can make full use of the multi-level features for each region proposal. The quantitative comparisons between the proposed method and baseline method on the large scale dataset DIOR demonstrate that incorporating the proposed bounding box regression loss, multi-level features fusion module, and a combination of both into the baseline method can obtain an absolute gain of 0.7%, 1.4%, and 2.2% or so in terms of mAP, respectively. Comparing this with the state-of-the-art methods demonstrates that the proposed method has achieved a state-of-the-art performance. The curves of average precision with different thresholds show that the advantage of the proposed method is more evident when the threshold of generalized IoU (or IoU) is relatively high, which means that the proposed method can improve the precision of object localization. Similar conclusions can be obtained on a NWPU VHR-10 dataset.

2021 ◽  
Vol 13 (22) ◽  
pp. 4517
Author(s):  
Falin Wu ◽  
Jiaqi He ◽  
Guopeng Zhou ◽  
Haolun Li ◽  
Yushuang Liu ◽  
...  

Object detection in remote sensing images plays an important role in both military and civilian remote sensing applications. Objects in remote sensing images are different from those in natural images. They have the characteristics of scale diversity, arbitrary directivity, and dense arrangement, which causes difficulties in object detection. For objects with a large aspect ratio and that are oblique and densely arranged, using an oriented bounding box can help to avoid deleting some correct detection bounding boxes by mistake. The classic rotational region convolutional neural network (R2CNN) has advantages for text detection. However, R2CNN has poor performance in the detection of slender objects with arbitrary directivity in remote sensing images, and its fault tolerance rate is low. In order to solve this problem, this paper proposes an improved R2CNN based on a double detection head structure and a three-point regression method, namely, TPR-R2CNN. The proposed network modifies the original R2CNN network structure by applying a double fully connected (2-fc) detection head and classification fusion. One detection head is for classification and horizontal bounding box regression, the other is for classification and oriented bounding box regression. The three-point regression method (TPR) is proposed for oriented bounding box regression, which determines the positions of the oriented bounding box by regressing the coordinates of the center point and the first two vertices. The proposed network was validated on the DOTA-v1.5 and HRSC2016 datasets, and it achieved a mean average precision (mAP) of 3.90% and 15.27%, respectively, from feature pyramid network (FPN) baselines with a ResNet-50 backbone.


2019 ◽  
Vol 11 (3) ◽  
pp. 286 ◽  
Author(s):  
Jiangqiao Yan ◽  
Hongqi Wang ◽  
Menglong Yan ◽  
Wenhui Diao ◽  
Xian Sun ◽  
...  

Recently, methods based on Faster region-based convolutional neural network (R-CNN)have been popular in multi-class object detection in remote sensing images due to their outstandingdetection performance. The methods generally propose candidate region of interests (ROIs) througha region propose network (RPN), and the regions with high enough intersection-over-union (IoU)values against ground truth are treated as positive samples for training. In this paper, we find thatthe detection result of such methods is sensitive to the adaption of different IoU thresholds. Specially,detection performance of small objects is poor when choosing a normal higher threshold, while alower threshold will result in poor location accuracy caused by a large quantity of false positives.To address the above issues, we propose a novel IoU-Adaptive Deformable R-CNN framework formulti-class object detection. Specially, by analyzing the different roles that IoU can play in differentparts of the network, we propose an IoU-guided detection framework to reduce the loss of small objectinformation during training. Besides, the IoU-based weighted loss is designed, which can learn theIoU information of positive ROIs to improve the detection accuracy effectively. Finally, the class aspectratio constrained non-maximum suppression (CARC-NMS) is proposed, which further improves theprecision of the results. Extensive experiments validate the effectiveness of our approach and weachieve state-of-the-art detection performance on the DOTA dataset.


2021 ◽  
Vol 13 (6) ◽  
pp. 1132
Author(s):  
Zhibao Wang ◽  
Lu Bai ◽  
Guangfu Song ◽  
Jie Zhang ◽  
Jinhua Tao ◽  
...  

Estimation of the number and geo-location of oil wells is important for policy holders considering their impact on energy resource planning. With the recent development in optical remote sensing, it is possible to identify oil wells from satellite images. Moreover, the recent advancement in deep learning frameworks for object detection in remote sensing makes it possible to automatically detect oil wells from remote sensing images. In this paper, we collected a dataset named Northeast Petroleum University–Oil Well Object Detection Version 1.0 (NEPU–OWOD V1.0) based on high-resolution remote sensing images from Google Earth Imagery. Our database includes 1192 oil wells in 432 images from Daqing City, which has the largest oilfield in China. In this study, we compared nine different state-of-the-art deep learning models based on algorithms for object detection from optical remote sensing images. Experimental results show that the state-of-the-art deep learning models achieve high precision on our collected dataset, which demonstrate the great potential for oil well detection in remote sensing.


2021 ◽  
Vol 13 (11) ◽  
pp. 2163
Author(s):  
Zhou Huang ◽  
Huaixin Chen ◽  
Biyuan Liu ◽  
Zhixi Wang

Although remarkable progress has been made in salient object detection (SOD) in natural scene images (NSI), the SOD of optical remote sensing images (RSI) still faces significant challenges due to various spatial resolutions, cluttered backgrounds, and complex imaging conditions, mainly for two reasons: (1) accurate location of salient objects; and (2) subtle boundaries of salient objects. This paper explores the inherent properties of multi-level features to develop a novel semantic-guided attention refinement network (SARNet) for SOD of NSI. Specifically, the proposed semantic guided decoder (SGD) roughly but accurately locates the multi-scale object by aggregating multiple high-level features, and then this global semantic information guides the integration of subsequent features in a step-by-step feedback manner to make full use of deep multi-level features. Simultaneously, the proposed parallel attention fusion (PAF) module combines cross-level features and semantic-guided information to refine the object’s boundary and highlight the entire object area gradually. Finally, the proposed network architecture is trained through an end-to-end fully supervised model. Quantitative and qualitative evaluations on two public RSI datasets and additional NSI datasets across five metrics show that our SARNet is superior to 14 state-of-the-art (SOTA) methods without any post-processing.


2021 ◽  
Vol 13 (13) ◽  
pp. 2459
Author(s):  
Yangyang Li ◽  
Heting Mao ◽  
Ruijiao Liu ◽  
Xuan Pei ◽  
Licheng Jiao ◽  
...  

Object detection in remote sensing images has been widely used in military and civilian fields and is a challenging task due to the complex background, large-scale variation, and dense arrangement in arbitrary orientations of objects. In addition, existing object detection methods rely on the increasingly deeper network, which increases a lot of computational overhead and parameters, and is unfavorable to deployment on the edge devices. In this paper, we proposed a lightweight keypoint-based oriented object detector for remote sensing images. First, we propose a semantic transfer block (STB) when merging shallow and deep features, which reduces noise and restores the semantic information. Then, the proposed adaptive Gaussian kernel (AGK) is adapted to objects of different scales, and further improves detection performance. Finally, we propose the distillation loss associated with object detection to obtain a lightweight student network. Experiments on the HRSC2016 and UCAS-AOD datasets show that the proposed method adapts to different scale objects, obtains accurate bounding boxes, and reduces the influence of complex backgrounds. The comparison with mainstream methods proves that our method has comparable performance under lightweight.


2021 ◽  
Vol 13 (21) ◽  
pp. 4291
Author(s):  
Luyang Zhang ◽  
Haitao Wang ◽  
Lingfeng Wang ◽  
Chunhong Pan ◽  
Qiang Liu ◽  
...  

Rotated object detection is an extension of object detection that uses an oriented bounding box instead of a general horizontal bounding box to define the object position. It is widely used in remote sensing images, scene text, and license plate recognition. The existing rotated object detection methods usually add an angle prediction channel in the bounding box prediction branch, and smooth L1 loss is used as the regression loss function. However, we argue that smooth L1 loss causes a sudden change in loss and slow convergence due to the angle solving mechanism of openCV (the angle between the horizontal line and the first side of the bounding box in the counter-clockwise direction is defined as the rotation angle), and this problem exists in most existing regression loss functions. To solve the above problems, we propose a decoupling modulation mechanism to overcome the problem of sudden changes in loss. On this basis, we also proposed a constraint mechanism, the purpose of which is to accelerate the convergence of the network and ensure optimization toward the ideal direction. In addition, the proposed decoupling modulation mechanism and constraint mechanism can be integrated into the popular regression loss function individually or together, which further improves the performance of the model and makes the model converge faster. The experimental results show that our method achieves 75.2% performance on the aerial image dataset DOTA (OBB task), and saves more than 30% of computing resources. The method also achieves a state-of-the-art performance in HRSC2016, and saved more than 40% of computing resources, which confirms the applicability of the approach.


2020 ◽  
Vol 12 (3) ◽  
pp. 389 ◽  
Author(s):  
Yangyang Li ◽  
Qin Huang ◽  
Xuan Pei ◽  
Licheng Jiao ◽  
Ronghua Shang

Object detection has made significant progress in many real-world scenes. Despite this remarkable progress, the common use case of detection in remote sensing images remains challenging even for leading object detectors, due to the complex background, objects with arbitrary orientation, and large difference in scale of objects. In this paper, we propose a novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet. RADet can obtain the rotation bounding box of objects with shape mask predicted by the mask branch, which is a novel, simple and effective way to get the rotation bounding box of objects. Specifically, a refine feature pyramid network is devised with an improved building block constructing top-down feature maps, to solve the problem of large difference in scales. Meanwhile, the position attention network and the channel attention network are jointly explored by modeling the spatial position dependence between global pixels and highlighting the object feature, for detecting small object surrounded by complex background. Extensive experiments on two remote sensing public datasets, DOTA and NWPUVHR -10, show our method to outperform existing leading object detectors in remote sensing field.


Sensors ◽  
2018 ◽  
Vol 18 (8) ◽  
pp. 2702 ◽  
Author(s):  
Shuxin Li ◽  
Zhilong Zhang ◽  
Biao Li ◽  
Chuwei Li

Since remote sensing images are captured from the top of the target, such as from a satellite or plane platform, ship targets can be presented at any orientation. When detecting ship targets using horizontal bounding boxes, there will be background clutter in the box. This clutter makes it harder to detect the ship and find its precise location, especially when the targets are in close proximity or staying close to the shore. To solve these problems, this paper proposes a deep learning algorithm using a multiscale rotated bounding box to detect the ship target in a complex background and obtain the location and orientation information of the ship. When labeling the oriented targets, we use the five-parameter method to ensure that the box shape is maintained rectangular. The algorithm uses a pretrained deep network to extract features and produces two divided flow paths to output the result. One flow path predicts the target class, while the other predicts the location and angle information. In the training stage, we match the prior multiscale rotated bounding boxes to the ground-truth bounding boxes to obtain the positive sample information and use it to train the deep learning model. When matching the rotated bounding boxes, we narrow down the selection scope to reduce the amount of calculation. In the testing stage, we use the trained model to predict and obtain the final result after comparing with the score threshold and nonmaximum suppression post-processing. Experiments conducted on a remote sensing dataset show that the algorithm is robust in detecting ship targets under complex conditions, such as wave clutter background, target in close proximity, ship close to the shore, and multiscale varieties. Compared to other algorithms, our algorithm not only exhibits better performance in ship detection but also obtains the precise location and orientation information of the ship.


2021 ◽  
Vol 10 (11) ◽  
pp. 736
Author(s):  
Han Fu ◽  
Xiangtao Fan ◽  
Zhenzhen Yan ◽  
Xiaoping Du

The detection of primary and secondary schools (PSSs) is a meaningful task for composite object detection in remote sensing images (RSIs). As a typical composite object in RSIs, PSSs have diverse appearances with complex backgrounds, which makes it difficult to effectively extract their features using the existing deep-learning-based object detection algorithms. Aiming at the challenges of PSSs detection, we propose an end-to-end framework called the attention-guided dense network (ADNet), which can effectively improve the detection accuracy of PSSs. First, a dual attention module (DAM) is designed to enhance the ability in representing complex characteristics and alleviate distractions in the background. Second, a dense feature fusion module (DFFM) is built to promote attention cues flow into low layers, which guides the generation of hierarchical feature representation. Experimental results demonstrate that our proposed method outperforms the state-of-the-art methods and achieves 79.86% average precision. The study proves the effectiveness of our proposed method on PSSs detection.


Sign in / Sign up

Export Citation Format

Share Document