A Novel Effectively Optimized One-Stage Network for Object Detection in Remote Sensing Imagery

2019 ◽  
Vol 11 (11) ◽  
pp. 1376 ◽  
Author(s):  
Weiying Xie ◽  
Haonan Qin ◽  
Yunsong Li ◽  
Zhuo Wang ◽  
Jie Lei

With great significance for military and civilian applications, detecting small and densely arranged objects in wide-scale remote sensing imagery remains a challenging task. To address this problem, we propose a novel effectively optimized one-stage network (NEOON). As a fully convolutional network, NEOON consists of four parts: feature extraction, feature fusion, feature enhancement, and multi-scale detection. To extract effective features, the first part implements coherent bottom-up and top-down processing through successive down-sampling and up-sampling operations combined with residual modules. The second part consolidates high-level and low-level features by concatenation followed by convolutional operations, explicitly yielding strong feature representations and semantic information. The third part constructs a receptive field enhancement (RFE) module and incorporates it into the early part of the network, where the information of small objects still resides. The final part consists of four detectors with different sensitivities, operating in parallel on the fused features, so that the network makes full use of objects at different scales. In addition, Focal Loss is adopted in place of the standard cross-entropy classification loss to address the class-imbalance problem inherent in one-stage methods, and Soft-NMS is introduced in the post-processing stage to preserve accurate bounding boxes, especially for densely arranged objects. A split-and-merge strategy and a multi-scale training strategy are employed during training. Thorough experiments are performed on the ACS dataset constructed by us and on the NWPU VHR-10 dataset to evaluate the performance of NEOON. Specifically, improvements of 4.77% in mAP and 5.50% in recall on the ACS dataset compared to YOLOv3 demonstrate that NEOON effectively improves the detection accuracy of small objects in remote sensing imagery. Extensive experiments and comprehensive evaluations on the 10-class NWPU VHR-10 dataset further illustrate the superiority of NEOON in extracting spatial information from high-resolution remote sensing images.
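The Focal Loss mentioned above is the standard formulation of Lin et al.; a minimal binary-case sketch follows (the gamma and alpha defaults are the commonly used values, not necessarily those chosen for NEOON):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified examples so
    training focuses on hard ones, countering class imbalance.
    p: predicted probability of the positive class; y: label in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) contributes far less loss than a hard one (p = 0.1):
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

With gamma = 0 and alpha = 1 the expression reduces to ordinary cross-entropy, which is why it is described as a modulated cross-entropy loss.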

2021 ◽  
Vol 13 (10) ◽  
pp. 1925
Author(s):  
Shengzhou Xiong ◽  
Yihua Tan ◽  
Yansheng Li ◽  
Cai Wen ◽  
Pei Yan

Object detection in remote sensing images (RSIs) is one of the basic tasks in automatic remote sensing image interpretation. In recent years, deep object detection frameworks developed for natural scene images (NSIs) have been introduced to RSIs, and detection performance has improved significantly thanks to their powerful feature representations. However, many challenges remain owing to the particularities of remote sensing objects. One of the main challenges is the missed detection of small objects, which occupy fewer than five percent of the pixels of large objects. Existing algorithms generally address this problem with multi-scale feature fusion based on a feature pyramid. However, the benefits of this strategy are limited, because the location information of small objects in the feature map has faded by the time detection is performed at the end of the network. In this study, we propose a subtask attention network (StAN), which handles the detection task directly on the shallow layers of the network. First, StAN contains one shared feature branch and two subtask attention branches, a semantic auxiliary subtask and a detection subtask, based on the multi-task attention network (MTAN). Second, the detection branch uses only low-level features, in consideration of small objects. Third, an attention map guidance mechanism is put forward to optimize the network while preserving its identification ability. Fourth, a multi-dimensional sampling module (MdS), global multi-view channel weights (GMulW), and target-guided pixel attention (TPA) are designed to further improve detection accuracy in complex scenes. Experimental results on the NWPU VHR-10 and DOTA datasets demonstrate that the proposed algorithm achieves state-of-the-art (SOTA) performance and reduces the missed detection of small objects. Ablation experiments further confirm the effectiveness of MdS, GMulW, and TPA.
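The abstract does not detail how the attention map guides the features; as a generic illustration only (not StAN's exact design), a spatial attention map is commonly a sigmoid-activated single-channel score map multiplied element-wise onto every feature channel:

```python
import numpy as np

def apply_spatial_attention(features, logits):
    """features: (C, H, W); logits: (H, W) unnormalized attention scores.
    A sigmoid maps the scores into (0, 1); broadcasting re-weights each
    spatial location across all channels, emphasizing likely object regions."""
    attn = 1.0 / (1.0 + np.exp(-logits))      # sigmoid -> (H, W)
    return features * attn[None, :, :]        # broadcast over the channel axis

feats = np.ones((4, 2, 2))
scores = np.array([[10.0, -10.0], [0.0, 0.0]])
out = apply_spatial_attention(feats, scores)
```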


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1142
Author(s):  
Xinying Wang ◽  
Yingdan Wu ◽  
Yang Ming ◽  
Hui Lv

Due to increasingly complex image degradation factors, inferring the high-frequency details of remote sensing imagery is more difficult than for ordinary digital photos. This paper proposes an adaptive multi-scale feature fusion network (AMFFN) for remote sensing image super-resolution. First, features are extracted from the original low-resolution image. Then, several adaptive multi-scale feature extraction (AMFE) modules, together with squeeze-and-excitation and adaptive gating mechanisms, are adopted for feature extraction and fusion. Finally, the sub-pixel convolution method is used to reconstruct the high-resolution image. Experiments are performed on three datasets; key design choices, such as the number of AMFE modules and the gating connection scheme, are studied, and super-resolution of remote sensing imagery at different scale factors is analyzed both qualitatively and quantitatively. The results show that our method outperforms classic methods such as the Super-Resolution Convolutional Neural Network (SRCNN), the Efficient Sub-Pixel Convolutional Network (ESPCN), and the multi-scale residual CNN (MSRN).
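The sub-pixel convolution used for reconstruction ends with the standard pixel-shuffle rearrangement from ESPCN, which maps (C·r², H, W) features to a (C, H·r, W·r) image; a minimal sketch:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Sub-pixel convolution's rearrangement step: interleave r*r
    low-resolution feature planes into an image upscaled by factor r."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split out the r*r sub-pixel planes
    x = x.transpose(0, 3, 1, 4, 2)    # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r) # interleave into the upscaled grid

out = pixel_shuffle(np.arange(16.0).reshape(4, 2, 2), 2)
```

The learnable part of sub-pixel convolution is an ordinary convolution producing the C·r² channels; the shuffle itself has no parameters, which is what makes it cheaper than transposed convolution.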


2018 ◽  
Vol 11 (1) ◽  
pp. 20 ◽  
Author(s):  
Yuhao Wang ◽  
Binxiu Liang ◽  
Meng Ding ◽  
Jiangyun Li

Dense semantic labeling is significant in high-resolution remote sensing imagery research and has been widely used in land-use analysis and environmental protection. With the recent success of fully convolutional networks (FCNs), various network architectures have largely improved performance. Among them, atrous spatial pyramid pooling (ASPP) and the encoder-decoder structure are two successful designs. The former extracts multi-scale contextual information with multiple effective fields of view, while the latter recovers spatial information to obtain sharper object boundaries. In this study, we propose a more efficient fully convolutional network that combines the advantages of both. Our model uses a deep residual network (ResNet) followed by ASPP as the encoder, and at the upsampling stage combines two scales of high-level features with the corresponding low-level features as the decoder. We further develop a multi-scale loss function to enhance the learning procedure. In post-processing, a novel superpixel-based dense conditional random field is employed to refine the predictions. We evaluate the proposed method on the Potsdam and Vaihingen datasets, and the experimental results demonstrate that it performs better than other machine learning and deep learning methods. Compared with the state-of-the-art DeepLab_v3+, our model gains 0.4% and 0.6% improvements in overall accuracy on the two datasets, respectively.
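ASPP applies several dilated convolutions in parallel and concatenates the results; a minimal single-channel sketch (the 3x3 mean kernel and the rates are illustrative, not the paper's configuration):

```python
import numpy as np

def dilated_conv2d(x, kernel, rate):
    """'Same'-padded 2D cross-correlation with a dilated 3x3 kernel.
    Dilation rate r spreads the taps to offsets {-r, 0, r}, enlarging
    the effective field of view to (2r+1)^2 without extra parameters."""
    h, w = x.shape
    xp = np.pad(x, rate)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * rate:i * rate + h,
                                     j * rate:j * rate + w]
    return out

def aspp(x, kernel, rates=(1, 6, 12)):
    """Run the parallel dilated branches and stack them channel-wise."""
    return np.stack([dilated_conv2d(x, kernel, r) for r in rates])

feat = np.random.rand(32, 32)
k = np.ones((3, 3)) / 9.0
out = aspp(feat, k)
```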


2019 ◽  
Vol 11 (18) ◽  
pp. 2095 ◽  
Author(s):  
Kun Fu ◽  
Zhuo Chen ◽  
Yue Zhang ◽  
Xian Sun

In recent years, deep learning has led to remarkable breakthroughs in object detection in remote sensing images. In practice, two-stage detectors achieve high detection accuracy but are slow, whereas one-stage detectors simplify the detection pipeline and run faster, at the cost of lower accuracy. Enhancing the capability of feature representation is one way to improve the accuracy of one-stage detectors. To this end, this paper proposes a novel one-stage detector with enhanced feature representation, built on two proposed structures: a dual top-down module and a dense-connected inception module. The former efficiently utilizes multi-scale features from multiple layers of the backbone network; the latter both widens and deepens the network to strengthen feature representation at limited extra computational cost. To evaluate the effectiveness of the proposed structures, we conducted experiments on horizontal bounding box detection on the challenging DOTA dataset and obtained 73.49% mean Average Precision (mAP), achieving state-of-the-art performance. Furthermore, our method runs significantly faster than the best public two-stage detector on the DOTA dataset.
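The dual top-down module is not specified in the abstract; the single top-down step such modules build on (as in feature pyramid networks) upsamples a coarse, semantically strong map and merges it with a finer one. A sketch under that assumption, with channel counts taken as already matched:

```python
import numpy as np

def top_down_merge(coarse, fine):
    """One FPN-style top-down step: nearest-neighbour 2x upsampling of the
    coarse map (C, H, W), then element-wise addition with the finer map
    (C, 2H, 2W), combining semantics with spatial precision."""
    up = coarse.repeat(2, axis=1).repeat(2, axis=2)
    return up + fine

c = np.ones((8, 4, 4))
f = np.zeros((8, 8, 8))
merged = top_down_merge(c, f)
```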


2021 ◽  
Vol 10 (11) ◽  
pp. 736
Author(s):  
Han Fu ◽  
Xiangtao Fan ◽  
Zhenzhen Yan ◽  
Xiaoping Du

The detection of primary and secondary schools (PSSs) is a meaningful task for composite object detection in remote sensing images (RSIs). As typical composite objects in RSIs, PSSs have diverse appearances against complex backgrounds, which makes it difficult for existing deep-learning-based object detection algorithms to extract their features effectively. To meet the challenges of PSS detection, we propose an end-to-end framework called the attention-guided dense network (ADNet), which effectively improves detection accuracy. First, a dual attention module (DAM) is designed to enhance the representation of complex characteristics and to suppress background distractions. Second, a dense feature fusion module (DFFM) is built to promote the flow of attention cues into low layers, guiding the generation of hierarchical feature representations. Experimental results demonstrate that our method outperforms state-of-the-art methods, achieving 79.86% average precision, and confirm its effectiveness for PSS detection.


2021 ◽  
Vol 13 (4) ◽  
pp. 683
Author(s):  
Lang Huyan ◽  
Yunpeng Bai ◽  
Ying Li ◽  
Dongmei Jiang ◽  
Yanning Zhang ◽  
...  

Onboard real-time object detection in remote sensing images is a crucial but challenging task in such a computation-constrained scenario: the algorithm must deliver excellent detection performance while keeping time and space complexity low. However, previous convolutional neural network (CNN) based object detectors for remote sensing images suffer from heavy computational cost, which hinders their deployment on satellites. Moreover, an onboard detector must handle objects at vastly different scales. To address these issues, we propose a lightweight one-stage multi-scale feature fusion detector called MSF-SNET for onboard real-time object detection in remote sensing images. Using the lightweight SNET as the backbone reduces the number of parameters and the computational complexity. To strengthen the detection of small objects, three low-level feature maps are extracted from the three stages of SNET. In the detection part, another three convolutional layers further extract deep features with rich semantic information for large-scale object detection. To improve detection accuracy, the deep and low-level features are fused to enhance the feature representation. Extensive experiments and comprehensive evaluations on the openly available NWPU VHR-10 and DIOR datasets show that, compared with other state-of-the-art detectors, the proposed framework requires fewer parameters and calculations while maintaining comparable accuracy.
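SNET's internals are not described in the abstract; the usual source of a lightweight backbone's savings, however, is replacing standard convolutions with depthwise-separable ones. A parameter-count comparison with illustrative channel sizes shows the magnitude of the reduction:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) followed by
    a 1x1 pointwise conv mixing the channels."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 256, 256)                  # 589,824 weights
sep = depthwise_separable_params(3, 256, 256)   # 67,840 weights, ~8.7x fewer
```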


2021 ◽  
Vol 13 (23) ◽  
pp. 4805
Author(s):  
Guangbin Zhang ◽  
Xianjun Gao ◽  
Yuanwei Yang ◽  
Mingwei Wang ◽  
Shuhao Ran

Clouds and snow in remote sensing imagery obscure underlying surface information, reducing image availability; moreover, their mutual interference decreases cloud and snow detection accuracy. In this study, we propose a convolutional neural network for cloud and snow detection, named the cloud and snow detection network (CSD-Net). It incorporates a multi-scale feature fusion module (MFF) and a controllably deep supervision and feature fusion structure (CDSFF). MFF captures and aggregates features at various scales, making the extracted high-level semantic features of clouds and snow more distinctive. CDSFF provides a deeply supervised mechanism with hinge loss and combines information from adjacent layers to obtain more representative features; it directs the gradient flow and reduces error while retaining more effective information. Additionally, a high-resolution cloud and snow dataset based on WorldView2 (CSWV) was created and released; it meets the training requirements of deep learning methods for clouds and snow in high-resolution remote sensing images. On datasets of varied resolutions, CSD-Net is compared with eight state-of-the-art deep learning methods. The results indicate that CSD-Net offers excellent detection accuracy and efficiency: its mean intersection over union (MIoU) is the highest in each corresponding experiment, its 7.61 million parameters are the fewest among the tested methods, and its 88.06 GFLOPs of floating-point operations are fewer than those of U-Net, DeepLabV3+, PSPNet, SegNet-Modified, MSCFF, and GeoInfoNet. Meanwhile, CSWV shows higher annotation quality, since the same method obtains greater accuracy on it.
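The MIoU metric reported here is the standard per-class intersection-over-union between predicted and reference label maps, averaged over classes; a minimal sketch:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes: for each class, |pred ∩ gt| / |pred ∪ gt|,
    skipping classes absent from both maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

p = np.array([[0, 0], [1, 1]])   # predicted labels
g = np.array([[0, 1], [1, 1]])   # reference labels
miou = mean_iou(p, g, 2)
```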


2021 ◽  
Vol 13 (18) ◽  
pp. 3771
Author(s):  
Tao Lei ◽  
Linze Li ◽  
Zhiyong Lv ◽  
Mingzhe Zhu ◽  
Xiaogang Du ◽  
...  

Land cover classification from very high-resolution (VHR) remote sensing images is a challenging task due to the complexity of geographic scenes and the varying shapes and sizes of ground targets. It is difficult to improve VHR classification results by using spectral data directly or by applying traditional multi-scale feature extraction methods. To address this problem, we propose a multi-modality and multi-scale attention fusion network for land cover classification from VHR remote sensing images. First, based on an encoder-decoder network, we designed a multi-modality fusion module that fuses complementary features while avoiding redundant ones, addressing the low classification accuracy for some objects caused by the weak feature representation of single-modality data. Second, a novel multi-scale spatial context enhancement module was introduced to improve feature fusion; it handles the large scale variation of objects in remote sensing images and captures long-range spatial relationships between objects. The proposed network and comparative networks were evaluated on two public datasets, Vaihingen and Potsdam. The proposed network achieves better classification results, with a mean F1-score of 88.6% on the Vaihingen dataset and 92.3% on the Potsdam dataset, showing that our model is superior to state-of-the-art network models.
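The reported mean F1-score is the per-class harmonic mean of precision and recall, averaged over classes (macro F1); a quick sketch:

```python
def f1_score(tp, fp, fn):
    """F1 for one class: harmonic mean of precision and recall,
    computed from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mean_f1(per_class_counts):
    """Macro F1: average the per-class F1 over all classes."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in per_class_counts]
    return sum(scores) / len(scores)

score = f1_score(80, 20, 20)   # precision = recall = 0.8, so F1 = 0.8
```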


2021 ◽  
Vol 10 (7) ◽  
pp. 488
Author(s):  
Peng Li ◽  
Dezheng Zhang ◽  
Aziguli Wulamu ◽  
Xin Liu ◽  
Peng Chen

A deep understanding of our visual world involves more than isolated perception of a series of objects; the relationships between them also carry rich semantic information. In satellite remote sensing images in particular, the spatial span is so large that objects vary widely in size and form complex spatial compositions, so recognizing semantic relations helps strengthen the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) with an attention mechanism to fuse and refine multi-scale semantic context, which is crucial to strengthening the cognitive ability of our model. In addition, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module that removes meaningless connections among entities and improves the efficiency of scene graph generation. To further promote research on scene understanding in the remote sensing field, this paper also presents a remote sensing scene graph dataset (RSSGD). Extensive experiments show that our model significantly outperforms previous methods on scene graph generation, and that RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
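The GCN propagation underlying such frameworks is the standard Kipf-Welling rule H' = ReLU(D^(-1/2) Â D^(-1/2) H W); the attentional and dilated variants used in MSFN are not detailed in the abstract, so the sketch below shows only the basic layer:

```python
import numpy as np

def gcn_layer(adj, feats, weights):
    """One graph-convolution layer: add self-loops, symmetrically
    normalize the adjacency, aggregate neighbour features, apply the
    linear map, then ReLU."""
    a_hat = adj + np.eye(adj.shape[0])             # self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # D^(-1/2) diagonal
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weights, 0)   # ReLU

adj = np.array([[0.0, 1.0], [1.0, 0.0]])  # two connected nodes
h = np.eye(2)                              # one-hot node features
w = np.eye(2)                              # identity weights for clarity
out = gcn_layer(adj, h, w)
```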


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1426
Author(s):  
Chuanyang Liu ◽  
Yiquan Wu ◽  
Jingjing Liu ◽  
Jiaming Han

Insulator detection is an essential task for the safe and reliable operation of intelligent grids. Because insulator images contain various background interferences, most traditional image-processing methods cannot achieve good performance, and You Only Look Once (YOLO) networks have been employed to meet the requirements of practical insulator detection. To achieve a good trade-off among accuracy, running time, and memory footprint, this work proposes a modified YOLO-tiny for insulators (MTI-YOLO) network for insulator detection in complex aerial images. First, composite insulator images are collected in common scenes to construct the "CCIN_detection" (Chinese Composite INsulator) dataset. Second, to improve detection accuracy for insulators of different sizes, multi-scale feature detection headers, a multi-scale feature fusion structure, and the spatial pyramid pooling (SPP) model are adopted in the MTI-YOLO network. Finally, the proposed MTI-YOLO network and the compared networks are trained and tested on the "CCIN_detection" dataset. The average precision (AP) of our network is 17% and 9% higher than that of YOLO-tiny and YOLO-v2, respectively, while its running time is only slightly longer. Furthermore, its memory usage is 25.6% and 38.9% lower than that of YOLO-v2 and YOLO-v3, respectively. Experimental results and analysis validate that the proposed network performs well in both complex backgrounds and bright illumination conditions.
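The SPP model pools a feature map over several fixed grids and concatenates the results into a fixed-length vector regardless of input size; a single-channel max-pooling sketch (the pyramid levels are illustrative, not MTI-YOLO's exact configuration):

```python
import numpy as np

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Max-pool the (H, W) map over an n x n grid for each pyramid level
    and concatenate -> a vector of sum(n*n) values, independent of H, W."""
    h, w = x.shape
    out = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = x[i * h // n:(i + 1) * h // n,
                         j * w // n:(j + 1) * w // n]
                out.append(cell.max())
    return np.array(out)

vec = spatial_pyramid_pool(np.random.rand(16, 16))
```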

