A Novel Detector Based on Convolution Neural Networks for Multiscale SAR Ship Detection in Complex Background

Convolution neural network (CNN)-based detectors have shown great performance on ship detections of synthetic aperture radar (SAR) images. However, the performance of current models has not been satisfactory enough for detecting multiscale ships and small-size ones in front of complex backgrounds. To address the problem, we propose a novel SAR ship detector based on CNN, which consist of three subnetworks: the Fusion Feature Extractor Network (FFEN), Region Proposal Network (RPN), and Refine Detection Network (RDN). Instead of using a single feature map, we fuse feature maps in bottom–up and top–down ways and generate proposals from each fused feature map in FFEN. Furthermore, we further merge features generated by the region-of-interest (RoI) pooling layer in RDN. Based on the feature representation strategy, the CNN framework constructed can significantly enhance the location and semantics information for the multiscale ships, in particular for the small ships. On the other hand, the residual block is introduced to increase the network depth, through which the detection precision could be further improved. The public SAR ship dataset (SSDD) and China Gaofen-3 satellite SAR image are used to validate the proposed method. Our method shows excellent performance for detecting the multiscale and small-size ships with respect to some competitive models and exhibits high potential in practical application.

Download Full-text

Disentangled Feature Learning Network for Vehicle Re-Identification

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/66 ◽

2020 ◽

Author(s):

Yan Bai ◽

Yihang Lou ◽

Yongxing Dai ◽

Jun Liu ◽

Ziqian Chen ◽

...

Keyword(s):

State Of The Art ◽

Feature Learning ◽

Feature Representation ◽

Public Security ◽

The Public ◽

Common Features ◽

Learning Network ◽

Single Feature ◽

Art Performance

Vehicle Re-Identification (ReID) has attracted lots of research efforts due to its great significance to the public security. In vehicle ReID, we aim to learn features that are powerful in discriminating subtle differences between vehicles which are visually similar, and also robust against different orientations of the same vehicle. However, these two characteristics are hard to be encapsulated into a single feature representation simultaneously with unified supervision. Here we propose a Disentangled Feature Learning Network (DFLNet) to learn orientation specific and common features concurrently, which are discriminative at details and invariant to orientations, respectively. Moreover, to effectively use these two types of features for ReID, we further design a feature metric alignment scheme to ensure the consistency of the metric scales. The experiments show the effectiveness of our method that achieves state-of-the-art performance on three challenging datasets.

Download Full-text

Anchor Generation Optimization and Region of Interest Assignment for Vehicle Detection

Sensors ◽

10.3390/s19051089 ◽

2019 ◽

Vol 19 (5) ◽

pp. 1089 ◽

Cited By ~ 3

Author(s):

Ye Wang ◽

Zhenyi Liu ◽

Weiwen Deng

Keyword(s):

Pedestrian Detection ◽

Region Of Interest ◽

Vehicle Detection ◽

Detection Accuracy ◽

Fixed Size ◽

Feature Maps ◽

Feature Map ◽

Bounding Box ◽

New Feature ◽

And Training

Region proposal network (RPN) based object detection, such as Faster Regions with CNN (Faster R-CNN), has gained considerable attention due to its high accuracy and fast speed. However, it has room for improvements when used in special application situations, such as the on-board vehicle detection. Original RPN locates multiscale anchors uniformly on each pixel of the last feature map and classifies whether an anchor is part of the foreground or background with one pixel in the last feature map. The receptive field of each pixel in the last feature map is fixed in the original faster R-CNN and does not coincide with the anchor size. Hence, only a certain part can be seen for large vehicles and too much useless information is contained in the feature for small vehicles. This reduces detection accuracy. Furthermore, the perspective projection results in the vehicle bounding box size becoming related to the bounding box position, thereby reducing the effectiveness and accuracy of the uniform anchor generation method. This reduces both detection accuracy and computing speed. After the region proposal stage, many regions of interest (ROI) are generated. The ROI pooling layer projects an ROI to the last feature map and forms a new feature map with a fixed size for final classification and box regression. The number of feature map pixels in the projected region can also influence the detection performance but this is not accurately controlled in former works. In this paper, the original faster R-CNN is optimized, especially for the on-board vehicle detection. This paper tries to solve these above-mentioned problems. The proposed method is tested on the KITTI dataset and the result shows a significant improvement without too many tricky parameter adjustments and training skills. The proposed method can also be used on other objects with obvious foreshortening effects, such as on-board pedestrian detection. The basic idea of the proposed method does not rely on concrete implementation and thus, most deep learning based object detectors with multiscale feature maps can be optimized with it.

Download Full-text

Feature Statistics Guided Efficient Filter Pruning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/363 ◽

2020 ◽

Author(s):

Hang Li ◽

Chen Ma ◽

Wei Xu ◽

Xue Liu

Keyword(s):

Neural Networks ◽

Real World ◽

Feature Maps ◽

L1 Norm ◽

Feature Map ◽

Single Feature ◽

Real World Applications ◽

Information Diversity ◽

Reliable Performance ◽

Pruning Methods

Building compact convolutional neural networks (CNNs) with reliable performance is a critical but challenging task, especially when deploying them in real-world applications. As a common approach to reduce the size of CNNs, pruning methods delete part of the CNN filters according to some metrics such as l1-norm. However, previous methods hardly leverage the information variance in a single feature map and the similarity characteristics among feature maps. In this paper, we propose a novel filter pruning method, which incorporates two kinds of feature map selections: diversity-aware selection (DFS) and similarity-aware selection (SFS). DFS aims to discover features with low information diversity while SFS removes features that have high similarities with others. We conduct extensive empirical experiments with various CNN architectures on publicly available datasets. The experimental results demonstrate that our model obtains up to 91.6% parameter decrease and 83.7% FLOPs reduction with almost no accuracy loss.

Download Full-text

A Multilayer Fusion Light-Head Detector for SAR Ship Detection

Sensors ◽

10.3390/s19051124 ◽

2019 ◽

Vol 19 (5) ◽

pp. 1124 ◽

Cited By ~ 10

Author(s):

Yunchuan Gui ◽

Xiuhe Li ◽

Lei Xue

Keyword(s):

Deep Neural Networks ◽

Feature Representation ◽

Superior Performance ◽

False Detection ◽

Ship Detection ◽

Single Feature ◽

Great Performance ◽

Position Sensitive ◽

Large Kernel ◽

And Training

Synthetic aperture radar (SAR) ship detection is a heated and challenging problem. Traditional methods are based on hand-crafted feature extraction or limited shallow-learning features representation. Recently, with the excellent ability of feature representation, deep neural networks such as faster region based convolution neural network (FRCN) have shown great performance in object detection tasks. However, several challenges limit the applications of FRCN in SAR ship detection: (1) FRCN with a fixed receptive field cannot match the scale variability of multiscale SAR ship objects, and the performance degrade when the objects are small; (2) as a two-stage detector, FRCN performs an intensive computation and leads to low-speed detection; (3) when the background is complex, the imbalance of easy and hard examples will lead to a high false detection. To tackle the above issues, we design a multilayer fusion light-head detector (MFLHD) for SAR ship detection. Instead of using a single feature map, shallow high-resolution and deep semantic feature are combined to produce region proposal. In detection subnetwork, we propose a light-head detector with large-kernel separable convolution and position sensitive pooling to improve the detection speed. In addition, we adapt focal loss to loss function and training more hard examples to reduce the false alarm. Extensive experiments on SAR ship detection dataset (SSDD) show that the proposed method achieves superior performance in SAR ship detection both in accuracy and speed.

Download Full-text

A Self-Spatial Adaptive Weighting Based U-Net for Image Segmentation

Electronics ◽

10.3390/electronics10030348 ◽

2021 ◽

Vol 10 (3) ◽

pp. 348

Author(s):

Choongsang Cho ◽

Young Han Lee ◽

Jongyoul Park ◽

Sangkeun Lee

Keyword(s):

Image Segmentation ◽

Medical Image ◽

Medical Image Segmentation ◽

Feature Maps ◽

Data Set ◽

Feature Map ◽

Adaptive Weighting ◽

Spatially Adaptive ◽

Wide Range ◽

Decoder Architecture

Semantic image segmentation has a wide range of applications. When it comes to medical image segmentation, its accuracy is even more important than those of other areas because the performance gives useful information directly applicable to disease diagnosis, surgical planning, and history monitoring. The state-of-the-art models in medical image segmentation are variants of encoder-decoder architecture, which is called U-Net. To effectively reflect the spatial features in feature maps in encoder-decoder architecture, we propose a spatially adaptive weighting scheme for medical image segmentation. Specifically, the spatial feature is estimated from the feature maps, and the learned weighting parameters are obtained from the computed map, since segmentation results are predicted from the feature map through a convolutional layer. Especially in the proposed networks, the convolutional block for extracting the feature map is replaced with the widely used convolutional frameworks: VGG, ResNet, and Bottleneck Resent structures. In addition, a bilinear up-sampling method replaces the up-convolutional layer to increase the resolution of the feature map. For the performance evaluation of the proposed architecture, we used three data sets covering different medical imaging modalities. Experimental results show that the network with the proposed self-spatial adaptive weighting block based on the ResNet framework gave the highest IoU and DICE scores in the three tasks compared to other methods. In particular, the segmentation network combining the proposed self-spatially adaptive block and ResNet framework recorded the highest 3.01% and 2.89% improvements in IoU and DICE scores, respectively, in the Nerve data set. Therefore, we believe that the proposed scheme can be a useful tool for image segmentation tasks based on the encoder-decoder architecture.

Download Full-text

High-Resolution SAR Image Classification Using Multi-Scale Deep Feature Fusion and Covariance Pooling Manifold Network

Remote Sensing ◽

10.3390/rs13020328 ◽

2021 ◽

Vol 13 (2) ◽

pp. 328

Author(s):

Wenkai Liang ◽

Yan Wu ◽

Ming Li ◽

Yice Cao ◽

Xin Hu

Keyword(s):

High Resolution ◽

Image Classification ◽

Feature Fusion ◽

Representation Learning ◽

Sar Image ◽

Gabor Filtering ◽

Feature Maps ◽

Sar Images ◽

Multi Scale ◽

Deep Feature

The classification of high-resolution (HR) synthetic aperture radar (SAR) images is of great importance for SAR scene interpretation and application. However, the presence of intricate spatial structural patterns and complex statistical nature makes SAR image classification a challenging task, especially in the case of limited labeled SAR data. This paper proposes a novel HR SAR image classification method, using a multi-scale deep feature fusion network and covariance pooling manifold network (MFFN-CPMN). MFFN-CPMN combines the advantages of local spatial features and global statistical properties and considers the multi-feature information fusion of SAR images in representation learning. First, we propose a Gabor-filtering-based multi-scale feature fusion network (MFFN) to capture the spatial pattern and get the discriminative features of SAR images. The MFFN belongs to a deep convolutional neural network (CNN). To make full use of a large amount of unlabeled data, the weights of each layer of MFFN are optimized by unsupervised denoising dual-sparse encoder. Moreover, the feature fusion strategy in MFFN can effectively exploit the complementary information between different levels and different scales. Second, we utilize a covariance pooling manifold network to extract further the global second-order statistics of SAR images over the fusional feature maps. Finally, the obtained covariance descriptor is more distinct for various land covers. Experimental results on four HR SAR images demonstrate the effectiveness of the proposed method and achieve promising results over other related algorithms.

Download Full-text

Research on Efficient Deep Learning Algorithm Based on ShuffleGhost in the Field of Virtual Reality

Wireless Communications and Mobile Computing ◽

10.1155/2021/1382781 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Bangtong Huang ◽

Hongquan Zhang ◽

Zihong Chen ◽

Lingling Li ◽

Lihua Shi

Keyword(s):

Virtual Reality ◽

Deep Learning ◽

Large Scale ◽

Learning Algorithm ◽

Feature Maps ◽

Embedded Devices ◽

Feature Map ◽

Deep Learning Algorithm ◽

Proper Design ◽

The Cost

Deep learning algorithms are facing the limitation in virtual reality application due to the cost of memory, computation, and real-time computation problem. Models with rigorous performance might suffer from enormous parameters and large-scale structure, and it would be hard to replant them onto embedded devices. In this paper, with the inspiration of GhostNet, we proposed an efficient structure ShuffleGhost to make use of the redundancy in feature maps to alleviate the cost of computations, as well as tackling some drawbacks of GhostNet. Since GhostNet suffers from high computation of convolution in Ghost module and shortcut, the restriction of downsampling would make it more difficult to apply Ghost module and Ghost bottleneck to other backbone. This paper proposes three new kinds of ShuffleGhost structure to tackle the drawbacks of GhostNet. The ShuffleGhost module and ShuffleGhost bottlenecks are utilized by the shuffle layer and group convolution from ShuffleNet, and they are designed to redistribute the feature maps concatenated from Ghost Feature Map and Primary Feature Map. Besides, they eliminate the gap of them and extract the features. Then, SENet layer is adopted to reduce the computation cost of group convolution, as well as evaluating the importance of the feature maps which concatenated from Ghost Feature Maps and Primary Feature Maps and giving proper weights for the feature maps. This paper conducted some experiments and proved that the ShuffleGhostV3 has smaller trainable parameters and FLOPs with the ensurance of accuracy. And with proper design, it could be more efficient in both GPU and CPU side.

Download Full-text

Recognizing Human Actions in Basketball Video Sequences on the Basis of Global and Local Pairwise Representation

Computer Vision ◽

10.4018/978-1-5225-5204-8.ch091 ◽

2018 ◽

pp. 2083-2101

Author(s):

Masaki Takahashi ◽

Masahide Naemura ◽

Mahito Fujii ◽

James J. Little

Keyword(s):

Feature Representation ◽

Video Sequences ◽

Human Actions ◽

The Public ◽

Motion Features ◽

Spatio Temporal ◽

Representation Method ◽

Global And Local ◽

The Relationship ◽

Local Representations

A feature-representation method for recognizing actions in sports videos on the basis of the relationship between human actions and camera motions is proposed. The method involves the following steps: First, keypoint trajectories are extracted as motion features in spatio-temporal sub-regions called “spatio-temporal multiscale bags” (STMBs). Global representations and local representations from one sub-region in the STMBs are then combined to create a “glocal pairwise representation” (GPR). The GPR considers the co-occurrence of camera motions and human actions. Finally, two-stage SVM classifiers are trained with STMB-based GPRs, and specified human actions in video sequences are identified. An experimental evaluation of the recognition accuracy of the proposed method (by using the public OSUPEL basketball video dataset and broadcast videos) demonstrated that the method can robustly detect specific human actions in both public and broadcast basketball video sequences.

Download Full-text

Classification of Mammograms Using Texture and CNN Based Extracted Features

Journal of Biomimetics Biomaterials and Biomedical Engineering ◽

10.4028/www.scientific.net/jbbbe.42.79 ◽

2019 ◽

Vol 42 ◽

pp. 79-97 ◽

Cited By ~ 2

Author(s):

Taye Girma Debelee ◽

Abrham Gebreselasie ◽

Friedhelm Schwenker ◽

Mohammadreza Amirian ◽

Dereje Yohannes

Keyword(s):

Feature Fusion ◽

Region Of Interest ◽

Computation Time ◽

Texture Features ◽

The Public ◽

Gabor Features ◽

Analysis Society ◽

Public Datasets ◽

Mlp Classifier ◽

Sensitivity Specificity

In this paper, a modified adaptive K-means (MAKM) method is proposed to extract the region of interest (ROI) from the local and public datasets. The local image datasets are collected from Bethezata General Hospital (BGH) and the public datasets are from Mammographic Image Analysis Society (MIAS). The same image number is used for both datasets, 112 are abnormal and 208 are normal. Two texture features (GLCM and Gabor) from ROIs and one CNN based extracted features are considered in the experiment. CNN features are extracted using Inception-V3 pre-trained model after simple preprocessing and cropping. The quality of the features are evaluated individually and by fusing features to one another and five classifiers (SVM, KNN, MLP, RF, and NB) are used to measure the descriptive power of the features using cross-validation. The proposed approach was first evaluated on the local dataset and then applied to the public dataset. The results of the classifiers are measured using accuracy, sensitivity, specificity, kappa, computation time and AUC. The experimental analysis made using GLCM features from the two datasets indicates that GLCM features from BGH dataset outperformed that of MIAS dataset in all five classifiers. However, Gabor features from the two datasets scored the best result with two classifiers (SVM and MLP). For BGH and MIAS, SVM scored an accuracy of 99%, 97.46%, the sensitivity of 99.48%, 96.26% and specificity of 98.16%, 100% respectively. And MLP achieved an accuracy of 97%, 87.64%, the sensitivity of 97.40%, 96.65% and specificity of 96.26%, 75.73% respectively. Relatively maximum performance is achieved for feature fusion between Gabor and CNN based extracted features using MLP classifier. However, KNN, MLP, RF, and NB classifiers achieved almost 100% performance for GLCM texture features and SVM scored an accuracy of 96.88%, the sensitivity of 97.14% and specificity of 96.36%. As compared to other classifiers, NB has scored the least computation time in all experiments.

Download Full-text

Multiscale Object Detection in Infrared Streetscape Images Based on Deep Learning and Instance Level Data Augmentation

Applied Sciences ◽

10.3390/app9030565 ◽

2019 ◽

Vol 9 (3) ◽

pp. 565 ◽

Cited By ~ 6

Author(s):

Hao Qu ◽

Lilian Zhang ◽

Xuesong Wu ◽

Xiaofeng He ◽

Xiaoping Hu ◽

...

Keyword(s):

Object Detection ◽

Data Augmentation ◽

Region Of Interest ◽

Complex Environments ◽

Feature Maps ◽

Multi Scale ◽

Level Data ◽

Training Stage ◽

Street Scene ◽

Layer Region

The development of object detection in infrared images has attracted more attention in recent years. However, there are few studies on multi-scale object detection in infrared street scene images. Additionally, the lack of high-quality infrared datasets hinders research into such algorithms. In order to solve these issues, we firstly make a series of modifications based on Faster Region-Convolutional Neural Network (R-CNN). In this paper, a double-layer region proposal network (RPN) is proposed to predict proposals of different scales on both fine and coarse feature maps. Secondly, a multi-scale pooling module is introduced into the backbone of the network to explore the response of objects on different scales. Furthermore, the inception4 module and the position sensitive region of interest (ROI) align (PSalign) pooling layer are utilized to explore richer features of the objects. Thirdly, this paper proposes instance level data augmentation, which takes into account the imbalance between categories while enlarging dataset. In the training stage, the online hard example mining method is utilized to further improve the robustness of the algorithm in complex environments. The experimental results show that, compared with baseline, our detection method has state-of-the-art performance.

Download Full-text