Non-locally Enhanced Feature Fusion Network for Aircraft Recognition in Remote Sensing Images

Remote Sensing ◽

10.3390/rs12040681 ◽

2020 ◽

Vol 12 (4) ◽

pp. 681

Author(s):

Yunsheng Xiong ◽

Xin Niu ◽

Yong Dou ◽

Hang Qie ◽

Kang Wang

Keyword(s):

Remote Sensing ◽

Loss Function ◽

Feature Fusion ◽

Remote Sensing Images ◽

Feature Maps ◽

Long Distance ◽

Test Dataset ◽

Fine Grained ◽

Recognition Ability ◽

Discriminative Parts

Aircraft recognition has great application value, but aircraft in remote sensing images suffer from low resolution, poor contrast, poor sharpness, and a lack of detail caused by the vertical view, all of which make recognition very difficult. Fine-grained recognition is even more challenging when there are many kinds of aircraft and the differences between them are subtle. In this paper, we propose a non-locally enhanced feature fusion network (NLFFNet) and attempt to make full use of the features from discriminative parts of aircraft. First, exploiting the long-distance self-correlation in aircraft images, we adopt a non-locally enhanced operation that guides the network to pay more attention to the discriminative areas and enhances the features beneficial to classification. Second, we propose a part-level feature fusion mechanism (PFF), which crops 5 parts of the aircraft on the shared feature maps, then extracts the subtle features inside the parts through the part fully connected layer (PFC) and fuses the features of these parts through the combined fully connected layer (CFC). In addition, by adopting an improved loss function, we increase the weight of hard examples while reducing the weight of excessively hard examples, which improves the overall recognition ability of the network. The dataset includes 47 categories of aircraft, many of them from the same family with only slight differences in appearance, and our method achieves 89.12% accuracy on the test dataset, which demonstrates its effectiveness.
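The abstract does not spell out the non-locally enhanced operation. As a rough illustration only, the sketch below assumes the common dot-product form of a non-local operation: each position is re-expressed as a softmax-weighted sum over all positions and added back as a residual. The function name `non_local_enhance` is hypothetical, and the learned projections a real block would use are omitted.

```python
import math


def non_local_enhance(features):
    """features: list of N position vectors, each a list of C floats.

    Assumed form (not taken from the paper): y_i = x_i + sum_j softmax_j(x_i . x_j) * x_j
    """
    enhanced = []
    for xi in features:
        # Similarity of position i to every position j (plain dot product).
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        # Softmax over all positions, stabilised by subtracting the maximum.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum over all positions gives the long-distance context.
        context = [
            sum(w * xj[c] for w, xj in zip(weights, features))
            for c in range(len(xi))
        ]
        # Residual connection keeps the original feature.
        enhanced.append([x + y for x, y in zip(xi, context)])
    return enhanced
```

Because every output position mixes in all other positions, distant but correlated parts of an aircraft (for example the two wing tips) can reinforce each other, which is the intuition behind the "long-distance self-correlation" mentioned above.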

MultiCAM: Multiple Class Activation Mapping for Aircraft Recognition in Remote Sensing Images

Remote Sensing ◽

10.3390/rs11050544 ◽

2019 ◽

Vol 11 (5) ◽

pp. 544 ◽

Cited By ~ 13

Author(s):

Kun Fu ◽

Wei Dai ◽

Yue Zhang ◽

Zhirui Wang ◽

Menglong Yan ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

State Of The Art ◽

Remote Sensing Images ◽

Visual Classification ◽

Fine Grained ◽

Object Parts ◽

Multiple Class ◽

Activation Mapping ◽

Discriminative Parts

Aircraft recognition in remote sensing images has long been a meaningful topic. Most related methods treat entire images as a whole and do not concentrate on the features of parts. In fact, a variety of aircraft types have small interclass variance, and the main evidence for classifying subcategories is related to some discriminative object parts. In this paper, we introduce the idea of fine-grained visual classification (FGVC) and attempt to make full use of the features from discriminative object parts. First, multiple class activation mapping (MultiCAM) is proposed to extract the discriminative parts of aircraft of different categories. Second, we present a mask filter (MF) strategy to enhance the discriminative object parts and filter the interference of the background from original images. Third, a selective connected feature fusion method is proposed to fuse the features extracted from two networks, one focusing on the original images and the other on the results of MF. Compared with the single prediction category in class activation mapping (CAM), MultiCAM makes full use of the predictions of all categories to avoid the wrong discriminative parts produced by a single wrong prediction. Additionally, the designed MF preserves the object scale information and helps the network to concentrate on the object itself rather than the interfering background. Experiments on a challenging dataset show that our method can achieve state-of-the-art performance.
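To make the contrast between CAM and a multi-class variant concrete, here is a minimal sketch. `class_activation_map` follows the standard CAM definition (a weighted sum of the final feature maps using one class's classifier weights). `probability_weighted_cam` is only an assumed way of "using the predictions of all categories": it weights each class's CAM by its predicted probability, and the paper's exact combination rule may differ. Both function names are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Standard CAM: feature_maps is K maps of H x W, class_weights is K floats."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(wk * fmap[y][x] for wk, fmap in zip(class_weights, feature_maps))
            for x in range(width)
        ]
        for y in range(height)
    ]


def probability_weighted_cam(feature_maps, weights_per_class, probs):
    """Assumed multi-class variant: sum of per-class CAMs weighted by class probability."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    combined = [[0.0] * width for _ in range(height)]
    for class_weights, prob in zip(weights_per_class, probs):
        cam = class_activation_map(feature_maps, class_weights)
        for y in range(height):
            for x in range(width):
                combined[y][x] += prob * cam[y][x]
    return combined
```

With a single-class CAM, a wrong top-1 prediction highlights the wrong region outright; in the weighted version, the regions supported by the other plausible classes still contribute.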

Intelligent Ship Detection in Remote Sensing Images Based on Multi-Layer Convolutional Feature Fusion

Remote Sensing ◽

10.3390/rs12203316 ◽

2020 ◽

Vol 12 (20) ◽

pp. 3316 ◽

Cited By ~ 1

Author(s):

Yulian Zhang ◽

Lihong Guo ◽

Zengfa Wang ◽

Yang Yu ◽

Xinwei Liu ◽

...

Keyword(s):

Remote Sensing ◽

Feature Fusion ◽

Atmospheric Correction ◽

Recall Rate ◽

Google Earth ◽

Superior Performance ◽

Detection Accuracy ◽

Remote Sensing Images ◽

Feature Maps ◽

Ship Detection

Intelligent detection and recognition of ships from high-resolution remote sensing images is an extraordinarily useful task in civil and military reconnaissance. It is difficult to detect ships with high precision because various disturbances are present at sea, such as clouds, mist, islands, coastlines, and ripples. To solve this problem, we propose a novel ship detection network based on multi-layer convolutional feature fusion (CFF-SDN). Our ship detection network consists of three parts. Firstly, a convolutional feature extraction network is used to extract ship features at different levels. Residual connections are introduced so that the model can be made very deep while remaining easy to train and converge. Secondly, the proposed network fuses fine-grained features from shallow layers with semantic features from deep layers, which is beneficial for detecting ship targets of different sizes and helps to improve the localization and detection accuracy of small objects. Finally, multiple fused feature maps are used for classification and regression, which adapts the network to ships of multiple scales. Because the CFF-SDN model uses a pruning strategy, the detection speed is greatly improved. For the experiments, we create a dataset for ship detection in remote sensing images (DSDR), which includes satellite images from Google Earth and aerial images from an electro-optical pod. The DSDR dataset contains not only visible light images but also infrared images. To improve robustness to various sea scenes, images at different scales, perspectives, and illumination levels are obtained through data augmentation or affine transformation. To reduce the influence of atmospheric absorption and scattering, a dark channel prior is adopted for atmospheric correction of the sea scenes. Moreover, soft non-maximum suppression (NMS) is introduced to increase the recall rate for densely arranged ships. In addition, better detection performance is observed in comparison with existing models in terms of precision and recall. The experimental results show that the proposed detection model achieves superior ship detection performance in optical remote sensing images.
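The role of soft NMS for densely arranged ships can be illustrated with a short sketch. It assumes the Gaussian variant of soft NMS, in which overlapping boxes have their scores decayed by exp(-IoU²/σ) instead of being discarded; the abstract does not say which variant or which parameter values the authors used, so `sigma` and `score_threshold` here are illustrative defaults.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft NMS: returns (box, score) pairs in selection order."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring box still in play.
        best_index = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_index)
        kept.append((best_box, best_score))
        # Decay the scores of the others according to their overlap with it.
        decayed = []
        for box, score in remaining:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                decayed.append((box, score))
        remaining = decayed
    return kept
```

Hard NMS would delete any box that overlaps the selected one beyond a fixed IoU threshold, which removes true detections when ships are moored side by side. Decaying the score instead keeps those boxes available at lower confidence, which is why recall tends to improve in crowded scenes.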

ORIENTED VEHICLE DETECTION IN HIGH-RESOLUTION REMOTE SENSING IMAGES BASED ON FEATURE AMPLIFICATION AND CATEGORY BALANCE BY OVERSAMPLING DATA AUGMENTATION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliii-b3-2020-153-2020 ◽

2020 ◽

Vol XLIII-B3-2020 ◽

pp. 153-159

Author(s):

N. Mo ◽

L. Yan

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Vehicle Detection ◽

Training Dataset ◽

Remote Sensing Images ◽

Feature Maps ◽

Fine Grained ◽

Bounding Boxes ◽

The Impact ◽

Oriented Bounding Boxes

Vehicles in high-resolution remote sensing images are small and usually lack detailed information, which makes detectors difficult to train on them. In addition, vehicles fall into multiple fine-grained categories that differ only slightly and are randomly located and oriented. It is therefore difficult to locate and identify these fine-grained vehicle categories. To address these problems in high-resolution remote sensing images, this paper proposes an oriented vehicle detection approach. First, we propose an oversampling and stitching method that augments the training dataset by increasing the frequency of objects with fewer training samples, in order to balance the number of objects in each fine-grained vehicle category. Then, considering the effect of pooling operations on the representation of small objects, we propose to increase the resolution of the feature maps so that the detailed information they carry is enriched and the fine-grained vehicle categories can be better distinguished. Finally, we design a joint training loss function for horizontal and oriented bounding boxes with center loss, to decrease the impact of small between-class diversity on vehicle detection. The proposed framework is evaluated on the VEDAI dataset, which consists of 9 fine-grained vehicle categories. The experimental results show that the proposed framework performs better than most competing approaches, with a mean average precision of 60.7% and 60.4% in detecting horizontal and oriented bounding boxes, respectively.
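The category-balancing idea can be sketched as follows. This covers only the oversampling half (the stitching of sampled objects into new training images is not shown), and it assumes the simplest policy of resampling every category up to the size of the largest one; the function name and the policy are illustrative, not taken from the paper.

```python
import random
from collections import defaultdict


def oversample_to_balance(samples, seed=0):
    """samples: list of (item, category) pairs.

    Returns a list in which every category has as many entries as the
    most frequent category, by drawing extra copies with replacement.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for item, category in samples:
        by_category[category].append(item)
    target = max(len(items) for items in by_category.values())
    balanced = []
    for category, items in by_category.items():
        # Keep all original samples of this category.
        balanced.extend((item, category) for item in items)
        # Top up rare categories with randomly repeated samples.
        for _ in range(target - len(items)):
            balanced.append((rng.choice(items), category))
    return balanced
```

For example, with three "car" objects and one "truck" object, the result contains three of each, so the rare category is seen as often as the common one during training.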

Attention-Based Deep Feature Fusion for the Scene Classification of High-Resolution Remote Sensing Images

Remote Sensing ◽

10.3390/rs11171996 ◽

2019 ◽

Vol 11 (17) ◽

pp. 1996 ◽

Cited By ~ 7

Author(s):

Zhu ◽

Yan ◽

Mo ◽

Liu

Keyword(s):

Remote Sensing ◽

Loss Function ◽

Feature Fusion ◽

Cross Entropy ◽

Scene Classification ◽

Remote Sensing Images ◽

Graphic Processing Units ◽

Entropy Loss ◽

Deep Feature

Scene classification of high-resolution remote sensing images (HRRSI) is one of the most important means of land-cover classification. Deep learning techniques, especially the convolutional neural network (CNN), have been widely applied to the scene classification of HRRSI thanks to the advancement of graphic processing units (GPU). However, they tend to extract features from whole images rather than from discriminative regions. The visual attention mechanism can force the CNN to focus on discriminative regions, but it may suffer from the influence of intra-class diversity and repeated texture. Motivated by these problems, we propose an attention-based deep feature fusion (ADFF) framework that consists of three parts, namely attention maps generated by Gradient-weighted Class Activation Mapping (Grad-CAM), a multiplicative fusion of deep features, and the center-based cross-entropy loss function. First, we propose to use attention maps generated by Grad-CAM as an explicit input in order to force the network to concentrate on discriminative regions. Then, deep features derived from the original images and from the attention maps are fused by multiplicative fusion, which accounts for both the salient regions and an improved ability to distinguish scenes with repeated texture. Finally, the center-based cross-entropy loss function, which combines the cross-entropy loss and the center loss, is used to backpropagate through the fused features so as to reduce the effect of intra-class diversity on feature representations. The proposed ADFF architecture is tested on three benchmark datasets to show its performance in scene classification. The experiments confirm that the proposed method outperforms most competing scene classification methods, with an average overall accuracy of 94% under different training ratios.
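A loss that combines cross-entropy with center loss can be written compactly. The sketch below assumes the usual additive form, cross-entropy plus λ times half the squared distance between a feature and its class center; the weighting `lam` and the exact formulation used in the ADFF paper are not given in the abstract, so treat both as assumptions.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed with a stable log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def center_based_cross_entropy(logits, feature, label, centers, lam=0.01):
    """Assumed combination: cross-entropy + lam * center loss."""
    return cross_entropy(logits, label) + lam * center_loss(feature, centers[label])
```

The cross-entropy term separates classes, while the center term pulls features of the same class toward a shared center, which is how such a loss counteracts intra-class diversity.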

Efficient Object Detection Framework and Hardware Architecture for Remote Sensing Images

Remote Sensing ◽

10.3390/rs11202376 ◽

2019 ◽

Vol 11 (20) ◽

pp. 2376 ◽

Cited By ~ 4

Author(s):

Li ◽

Zhang ◽

Keyword(s):

Remote Sensing ◽

Deep Learning ◽

Computational Complexity ◽

Object Detection ◽

Graphics Processing Units ◽

Feature Fusion ◽

Hardware Architecture ◽

Single Shot ◽

Remote Sensing Images ◽

Feature Maps

Object detection in remote sensing images on a satellite or aircraft has important economic and military significance and is full of challenges. This task requires not only accurate and efficient algorithms, but also a high-performance, low-power hardware architecture. However, existing deep learning based object detection algorithms require further optimization in small-object detection, computational complexity, and parameter size. Meanwhile, general-purpose processors cannot achieve good power efficiency, and previous deep learning processor designs still leave room for exploiting parallelism. To address these issues, we propose an efficient context-based feature fusion single shot multibox detector (CBFF-SSD) framework, which uses the lightweight MobileNet as the backbone network to reduce parameters and computational complexity, and adds feature fusion units and detecting feature maps to enhance the recognition of small objects and improve detection accuracy. Based on an analysis and optimization of the computation in each layer of the algorithm, we propose an efficient hardware architecture for a deep learning processor with multiple neural processing units (NPUs) composed of 2D processing elements (PEs), which can compute multiple output feature maps simultaneously. The parallel architecture, hierarchical on-chip storage organization, and local registers are used to achieve parallel processing, sharing, and reuse of data, making the processor's computation more efficient. Extensive experiments and comprehensive evaluations on the public NWPU VHR-10 dataset, together with comparisons with some state-of-the-art approaches, demonstrate the effectiveness and superiority of the proposed framework. Moreover, to evaluate the performance of the proposed hardware architecture, we implement it on a Xilinx XC7Z100 field programmable gate array (FPGA) and test it on the proposed CBFF-SSD and VGG16 models. Experimental results show that our processor is more power efficient than general-purpose central processing units (CPUs) and graphics processing units (GPUs), and has better performance density than other state-of-the-art FPGA-based designs.
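The parameter saving from a MobileNet backbone comes from depthwise separable convolutions, and the arithmetic is easy to check. The sketch below counts weights only (biases and batch-norm parameters ignored); the layer sizes in the usage note are illustrative and are not taken from the CBFF-SSD paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise
```

For a 3 × 3 layer with 256 input and 256 output channels, `standard_conv_params(3, 256, 256)` gives 589824 weights, while `depthwise_separable_params(3, 256, 256)` gives 67840, roughly an 8.7-fold reduction for that layer.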

MFANet: A Multi-Level Feature Aggregation Network for Semantic Segmentation of Land Cover

Remote Sensing ◽

10.3390/rs13040731 ◽

2021 ◽

Vol 13 (4) ◽

pp. 731 ◽

Cited By ~ 2

Author(s):

Bingyu Chen ◽

Min Xia ◽

Junqing Huang

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Aerial Images ◽

Semantic Features ◽

Remote Sensing Images ◽

Land Utilization ◽

Feature Maps ◽

Feature Aggregation ◽

Multi Level

Detailed information regarding land utilization/cover is a valuable resource in various fields. In recent years, remote sensing images, especially aerial images, have become higher in resolution and now cover larger spans of time and space. Because objects in the same category may yield different spectra, relying on spectral features alone is often insufficient to segment the target objects accurately. In convolutional neural networks, down-sampling operations are usually used to extract abstract semantic features, which leads to loss of detail and fuzzy edges. To solve these problems, the paper proposes a Multi-level Feature Aggregation Network (MFANet), which is improved in two aspects: deep feature extraction and up-sampling feature fusion. Firstly, the proposed Channel Feature Compression module extracts the deep features and filters the redundant channel information from the backbone to optimize the learned context. Secondly, the proposed Multi-level Feature Aggregation Upsample module applies, in a nested fashion, the idea that high-level features provide guidance information for low-level features, which is of great significance for positioning during the restoration of high-resolution remote sensing images. Finally, the proposed Channel Ladder Refinement module is used to refine the restored high-resolution feature maps. Experimental results show that the proposed method achieves state-of-the-art performance, with a mean IoU of 86.45% on the LandCover dataset.
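Since the headline result is a mean IoU, a short sketch of how that metric is typically computed from a confusion matrix may help. It assumes the standard definition (per-class IoU = TP / (TP + FP + FN), averaged over classes that appear in the ground truth or the predictions); the paper's own evaluation script is not shown here.

```python
def mean_iou(confusion):
    """confusion[i][j]: number of pixels of true class i predicted as class j."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        denom = true_pos + false_pos + false_neg
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(true_pos / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

Unlike overall pixel accuracy, this average gives every class equal weight, so a rare land-cover class that is segmented poorly lowers the score noticeably.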

Remote Sensing Road Extraction by Road Segmentation Network

Applied Sciences ◽

10.3390/app11115050 ◽

2021 ◽

Vol 11 (11) ◽

pp. 5050

Author(s):

Jiahai Tan ◽

Ming Gao ◽

Kai Yang ◽

Tao Duan

Keyword(s):

Remote Sensing ◽

Attention Mechanism ◽

Context Information ◽

Road Extraction ◽

Remote Sensing Images ◽

Long Distance ◽

The Road ◽

Road Segmentation ◽

Context Characteristics

Road extraction from remote sensing images has attracted much attention in geospatial applications. However, existing methods do not accurately identify the connectivity of the road, and the identification of road pixels may be disturbed by abundant ground objects such as buildings, trees, and shadows. The objective of this paper is to enhance the context and strip features of the road by designing a UNet-like architecture. The overall method first enhances the context characteristics in a segmentation step and then maintains the strip characteristics in a refinement step. The segmentation step exploits an attention mechanism to enhance the context information between adjacent layers. To obtain the strip features of the road, the refinement step introduces strip pooling in a refinement network to restore the long-distance dependency information of the road. Extensive comparative experiments demonstrate that the proposed method outperforms other methods, achieving an overall accuracy of 98.25% on the DeepGlobe dataset and 97.68% on the Massachusetts dataset.
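Strip pooling can be illustrated with a few lines of code. The sketch assumes the basic idea only: average the feature map along each full row and each full column, then broadcast the two strips back to the original size and add them. The 1D convolutions and gating that a complete strip pooling module would apply, and the way this paper wires it into the refinement network, are left out.

```python
def strip_pool(feature_map):
    """feature_map: H x W list of floats. Returns an H x W map of row mean + column mean."""
    height, width = len(feature_map), len(feature_map[0])
    # Horizontal strips: one average per row.
    row_means = [sum(row) / width for row in feature_map]
    # Vertical strips: one average per column.
    col_means = [
        sum(feature_map[y][x] for y in range(height)) / height
        for x in range(width)
    ]
    # Broadcast both strips back to H x W and combine them.
    return [
        [row_means[y] + col_means[x] for x in range(width)]
        for y in range(height)
    ]
```

For a map containing a vertical road (a column of ones in a field of zeros), the column average for the road stays at 1 along its whole length, so every pixel on the road receives evidence from the entire strip. This is how long, thin structures keep their long-distance dependencies where a square pooling window would dilute them.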

Road Extraction from Unmanned Aerial Vehicle Remote Sensing Images Based on Improved Neural Networks

Sensors ◽

10.3390/s19194115 ◽

2019 ◽

Vol 19 (19) ◽

pp. 4115 ◽

Cited By ~ 1

Author(s):

Yuxia Li ◽

Bo Peng ◽

Lei He ◽

Kunlong Fan ◽

Zhenxu Li ◽

...

Keyword(s):

Neural Network ◽

Remote Sensing ◽

Neural Networks ◽

Unmanned Aerial Vehicle ◽

Computational Efficiency ◽

Neural Nets ◽

Road Extraction ◽

Remote Sensing Images ◽

Feature Maps ◽

Aerial Vehicle

Roads are vital components of infrastructure, and their extraction has become a topic of significant interest in the field of remote sensing. Because deep learning has become a popular method in image processing and information extraction, researchers have paid more attention to extracting roads using neural networks. This article proposes improvements to neural networks for extracting roads from Unmanned Aerial Vehicle (UAV) remote sensing images. D-LinkNet was first considered for its high performance; however, the huge scale of the net reduced computational efficiency. To address the low computational efficiency of the popular D-LinkNet, this article makes the following improvements: (1) Replace the initial block with a stem block. (2) Rebuild the entire network based on ResNet units with a new structure, yielding an improved neural network, D-LinkNetPlus. (3) Add a 1 × 1 convolution layer before DBlock to reduce the number of input feature maps, reducing parameters and improving computational efficiency, and add another 1 × 1 convolution layer after DBlock to recover the required number of output channels. This yields a second improved neural network, B-D-LinkNetPlus. The networks were compared with one another and verified on the Massachusetts Roads Dataset. The results show that the improved neural networks help to reduce the network size and achieve the precision needed for road extraction.
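The effect of wrapping a block in 1 × 1 convolutions, as in improvement (3), can be estimated with simple weight counting. The sketch below models the central block as a stack of 3 × 3 convolutions with equal input and output channels (dilation does not change the weight count). The channel widths and layer count in the usage note are hypothetical and are not taken from the paper.

```python
def conv_params(kernel, c_in, c_out):
    """Weights in a k x k convolution, biases ignored."""
    return kernel * kernel * c_in * c_out


def block_params(channels, num_layers, kernel=3):
    """Weights in a stack of num_layers k x k convolutions at a fixed width."""
    return num_layers * conv_params(kernel, channels, channels)


def bottlenecked_block_params(channels, reduced, num_layers, kernel=3):
    """1 x 1 reduce, the same stack at the reduced width, then 1 x 1 restore."""
    reduce_conv = conv_params(1, channels, reduced)
    body = block_params(reduced, num_layers, kernel)
    restore_conv = conv_params(1, reduced, channels)
    return reduce_conv + body + restore_conv
```

With four 3 × 3 layers at an assumed width of 512 channels, `block_params(512, 4)` is 9437184 weights; halving the width inside the block, `bottlenecked_block_params(512, 256, 4)`, gives 2621440. The two added 1 × 1 layers cost little compared with what the narrower 3 × 3 layers save.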

A Public Dataset for Fine-Grained Ship Classification in Optical Remote Sensing Images

Remote Sensing ◽

10.3390/rs13040747 ◽

2021 ◽

Vol 13 (4) ◽

pp. 747

Author(s):

Yanghua Di ◽

Zhiguo Jiang ◽

Haopeng Zhang

Keyword(s):

Remote Sensing ◽

Image Data ◽

Remote Sensing Image ◽

Google Earth ◽

Optical Remote Sensing ◽

Remote Sensing Images ◽

Visual Categorization ◽

Class Differences ◽

Fine Grained ◽

Ship Classification

Fine-grained visual categorization (FGVC) is an important and challenging problem due to large intra-class differences and small inter-class differences caused by deformation, illumination, angles, etc. Although major advances have been achieved for natural images in the past few years thanks to the release of popular datasets such as CUB-200-2011, Stanford Cars, and Aircraft, fine-grained ship classification in remote sensing images has rarely been studied because of the relative scarcity of publicly available datasets. In this paper, we investigate a large amount of remote sensing image data of sea ships and determine the 42 most common categories for fine-grained visual categorization. Based on our previous DSCR dataset, a dataset for ship classification in remote sensing images, we collect more remote sensing images containing warships and civilian ships of various scales from Google Earth and other popular remote sensing image datasets, including DOTA, HRSC2016, and NWPU VHR-10. We call our dataset FGSCR-42, meaning a dataset for Fine-Grained Ship Classification in Remote sensing images with 42 categories. The whole FGSCR-42 dataset contains 9320 images of the most common types of ships. We evaluate popular object classification algorithms and fine-grained visual categorization algorithms to build a benchmark. Our FGSCR-42 dataset is publicly available at our webpages.

DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, and the precise segmentation of land cover is the key to applying this technology. Nowadays, the Convolutional Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes the Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as the backbone to obtain feature information of remote sensing images at different dimensions through multiple downsamplings. This work designs an Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes a Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each category in the image. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.
