Ship Detection via Dilated Rate Search and Attention-Guided Feature Representation

2021 ◽  
Vol 13 (23) ◽  
pp. 4840
Author(s):  
Jianming Hu ◽  
Xiyang Zhi ◽  
Tianjun Shi ◽  
Lijian Yu ◽  
Wei Zhang

Due to the complexity of scene interference and the variability of ship scale and position, automatic ship detection in remote sensing images remains a challenging research problem. Existing deep networks rarely design receptive fields that fit the target scale based on training data. Moreover, most of them ignore the effective retention of position information during feature extraction, which reduces the contribution of features to subsequent classification. To overcome these limitations, we propose a novel ship detection framework combining dilation rate selection and attention-guided feature representation strategies, which can efficiently detect ships of different scales under the interference of complex environments such as clouds, sea clutter and mist. Specifically, we present a dilated convolution parameter search strategy that adaptively selects the dilation rate for the multi-branch extraction architecture, obtaining context information over different receptive fields without sacrificing image resolution. Moreover, to enhance the spatial position information of the feature maps, we calculate the correlation of spatial points along the vertical and horizontal directions and embed it into the channel compression coding process, generating multi-dimensional feature descriptors that are sensitive to the direction and position characteristics of ships. Experimental results on the Airbus dataset demonstrate that the proposed method achieves state-of-the-art performance compared with other detection models.
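The core idea of matching receptive field to target scale can be sketched in plain Python. The sketch below is illustrative only, not the paper's search algorithm: `receptive_field` and `search_dilation_rate` are hypothetical names, and the candidate rates and depth are assumptions.

```python
def receptive_field(kernel_size, dilation_rates):
    """Effective receptive field of a stride-1 stack of dilated convolutions:
    each layer adds (kernel_size - 1) * dilation to the field."""
    rf = 1
    for d in dilation_rates:
        rf += (kernel_size - 1) * d
    return rf

def search_dilation_rate(target_scale, kernel_size=3, depth=2, candidates=(1, 2, 4, 8)):
    """Pick the dilation rate whose stacked receptive field best matches a
    target ship scale (in pixels) -- a toy stand-in for a data-driven search."""
    return min(candidates,
               key=lambda d: abs(receptive_field(kernel_size, [d] * depth) - target_scale))
```

For example, a two-layer 3x3 stack at dilation 8 covers 33 pixels, the closest candidate to a 30-pixel ship, so the search would select rate 8.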

2020 ◽  
Vol 2 (2) ◽  
pp. 173-183
Author(s):  
Nurhalimah Nurhalimah ◽  
I Gede Pasek Suta Wijaya ◽  
Fitri Bimantoro

Songket is one of Indonesia's cultural heritages that survives to this day. One of the most famous songket woven fabrics is the Lombok songket, whose motifs are diverse, unique, and beautiful. However, public knowledge of Lombok songket motifs is still minimal, and the differences between one motif and another are not widely known; the lack of digitized data collection is one reason for this. Therefore, a system is needed that can classify Lombok songket motifs automatically. In this study, a system was developed based on texture and shape features using Linear Discriminant Analysis (LDA). The GLCM method is used for texture feature extraction and the Invariant Moment method for shape feature extraction. The data comprise 1000 images from 10 Lombok songket motifs, divided into training and test sets. The highest accuracy, 96.67%, is obtained with the combined Invariant Moment and GLCM features at an image resolution of 300×300 pixels using the most effective feature subset.
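A GLCM texture feature of the kind used here can be computed in a few lines. This is a minimal sketch assuming a quantized grayscale image given as a nested list; the function names and the single-offset setup are illustrative, not the study's exact configuration.

```python
def glcm(image, dx=1, dy=0, levels=4):
    """Normalized Gray-Level Co-occurrence Matrix for one pixel offset:
    counts how often gray level i co-occurs with level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    total = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                counts[image[y][x]][image[y2][x2]] += 1
                total += 1
    return [[c / total for c in row] for row in counts]

def contrast(p):
    """Haralick contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))
```

Features such as contrast, energy, and homogeneity derived from this matrix would then feed the LDA classifier alongside the Invariant Moment shape features.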


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1234
Author(s):  
Lei Zha ◽  
Yu Yang ◽  
Zicheng Lai ◽  
Ziwei Zhang ◽  
Juan Wen

In recent years, neural networks for single image super-resolution (SISR) have adopted deeper and more complex network structures to extract extra image details, which makes model training difficult. To deal with deep model training problems, researchers utilize dense skip connections to promote the model’s feature representation ability by reusing deep features of different receptive fields. Benefiting from the dense connection block, SRDenseNet has achieved excellent performance in SISR. However, although the densely connected structure provides rich information, it also introduces redundant and useless information. To tackle this problem, in this paper we propose a Lightweight Dense Connected Approach with Attention for Single Image Super-Resolution (LDCASR), which employs the attention mechanism to extract useful information in the channel dimension. In particular, we propose the recursive dense group (RDG), consisting of Dense Attention Blocks (DABs), which obtains more significant representations by extracting deep features with the aid of both dense connections and the attention module, making the whole network attach importance to learning more advanced feature information. Additionally, we introduce group convolution in the DABs, which reduces the number of parameters to 0.6 M. Extensive experiments on benchmark datasets demonstrate the superiority of our proposed method over five chosen SISR methods.
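Channel attention of the kind the DABs rely on can be illustrated in plain Python. This is a simplified sketch: a full squeeze-and-excitation block also learns two fully connected layers to produce the gates, which is omitted here, and the function name is hypothetical.

```python
import math

def channel_attention(fmap):
    """Gate each channel by a sigmoid of its global average (a stripped-down
    squeeze-and-excitation step): channels with stronger mean activation are
    kept, weak or redundant channels are suppressed.
    fmap: list of channels, each channel a 2D list of activations."""
    gates = []
    for ch in fmap:
        mean = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        gates.append(1.0 / (1.0 + math.exp(-mean)))  # sigmoid squash to (0, 1)
    # reweight every value in each channel by that channel's gate
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(fmap, gates)]
```

Suppressing low-information channels this way is how an attention module can counteract the redundancy that dense connections introduce.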


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2547 ◽  
Author(s):  
Wenxin Dai ◽  
Yuqing Mao ◽  
Rongao Yuan ◽  
Yijing Liu ◽  
Xuemei Pu ◽  
...  

Convolutional neural network (CNN)-based detectors have shown great performance on ship detection in synthetic aperture radar (SAR) images. However, current models are still not satisfactory at detecting multiscale and small-size ships against complex backgrounds. To address this problem, we propose a novel CNN-based SAR ship detector consisting of three subnetworks: the Fusion Feature Extractor Network (FFEN), the Region Proposal Network (RPN), and the Refine Detection Network (RDN). Instead of using a single feature map, we fuse feature maps in bottom–up and top–down ways and generate proposals from each fused feature map in the FFEN. We further merge the features generated by the region-of-interest (RoI) pooling layer in the RDN. Based on this feature representation strategy, the constructed CNN framework significantly enhances location and semantic information for multiscale ships, in particular small ones. In addition, a residual block is introduced to increase the network depth, further improving detection precision. The public SAR ship dataset (SSDD) and China Gaofen-3 satellite SAR images are used to validate the proposed method. Our method shows excellent performance in detecting multiscale and small-size ships compared with several competitive models and exhibits high potential for practical application.
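The top-down half of such a fusion can be sketched as an FPN-style pass: upsample the coarser map and add it element-wise to the next finer level. This is an illustrative sketch on 2D lists, assuming nearest-neighbor upsampling and same-channel maps; it is not the FFEN's exact fusion.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

def top_down_fuse(pyramid):
    """Fuse a coarse-to-fine pyramid (coarsest level first): each finer level
    receives the upsampled, already-fused coarser map added element-wise,
    so semantic context flows down to high-resolution maps."""
    fused = [pyramid[0]]
    for finer in pyramid[1:]:
        up = upsample2x(fused[-1])
        fused.append([[a + b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(finer, up)])
    return fused
```

Generating proposals from every fused level, rather than one map, is what gives small ships a high-resolution map with semantic context behind it.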


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2815
Author(s):  
Shih-Hung Yang ◽  
Yao-Mao Cheng ◽  
Jyun-We Huang ◽  
Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. In the existing methods, regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details, are applied to learn features. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of these fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features. An atrous convolution procedure enlarges the RFs at multiple scales and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet achieved effective transfer learning when the number of labeled depth images was insufficient. The fingerspelling representation of a depth image can be effectively transferred from large- to small-scale datasets via highlighting the finger regions and building inter-finger relations, thereby reducing the requirement for expensive fingerspelling annotations.
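The non-local operation used to relate multi-scale features can be written out directly. The sketch below is a generic dot-product non-local block over flattened positions, with the usual learned embeddings omitted for brevity; it illustrates the mechanism, not RFaNet's exact module.

```python
import math

def non_local(features):
    """Non-local operation: each position's output is a softmax-weighted sum
    over ALL positions, with weights from dot-product similarity, so distant
    fingers can influence each other regardless of convolutional RF limits.
    features: list of equal-length vectors (one per spatial position)."""
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) for k in features]
        m = max(scores)                       # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, features))
                    for i in range(len(q))])
    return out
```

Because every position attends to every other, the output encodes inter-region dependencies, here standing in for the inter-finger relations the paper builds.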


2021 ◽  
Vol 11 ◽  
Author(s):  
Jinkui Hao ◽  
Jianyang Xie ◽  
Ri Liu ◽  
Huaying Hao ◽  
Yuhui Ma ◽  
...  

Objective: To develop an accurate, rapid, and interpretable computed tomography (CT)-based AI system for the diagnosis of lung diseases.
Background: Most existing AI systems focus only on viral pneumonia (e.g., COVID-19), ignoring other similar lung diseases such as bacterial pneumonia (BP), which should also be detected during CT screening. In this paper, we propose a unified sequence-based pneumonia classification network, called SLP-Net, which utilizes slice consecutiveness for the differential diagnosis of viral pneumonia (VP), BP, and normal control cases from chest CT volumes.
Methods: By treating the consecutive images of a CT volume as a time-sequence input, SLP-Net can, compared with previous 2D slice-based or 3D volume-based methods, effectively use spatial information without requiring a large amount of training data to avoid overfitting. Specifically, sequential convolutional neural networks (CNNs) with multi-scale receptive fields are first utilized to extract a set of higher-level representations, which are then fed into a convolutional long short-term memory (ConvLSTM) module to construct axial-dimension feature maps. A novel adaptive-weighted cross-entropy loss (ACE) is introduced to optimize the output of SLP-Net, ensuring that as many valid features as possible from the previous images are encoded into the later CT images. In addition, we employ sequence attention maps for auxiliary classification to enhance the confidence of the results and produce a case-level prediction.
Results: For evaluation, we constructed a dataset of 258 chest CT volumes (153 VP, 42 BP, and 63 normal control cases; 43,421 slices in total) and comprehensively compared SLP-Net with several state-of-the-art methods. Our proposed method obtained significant performance without a large amount of data, outperforming other slice-based and volume-based approaches. The classification results demonstrate the ability of our model in the differential diagnosis of VP, BP, and normal cases.
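The idea of weighting the loss so that later slices in a sequence count for more can be sketched as follows. The linear weight ramp here is only an illustrative stand-in: the paper's ACE uses its own adaptive scheme, which the abstract does not specify, and the function name is hypothetical.

```python
import math

def weighted_sequence_ce(probs_seq, labels_seq):
    """Cross-entropy over a slice sequence with linearly increasing weights,
    so later slices (which should have accumulated context from earlier ones
    via the ConvLSTM) contribute more to the loss.
    probs_seq: per-slice class probability lists; labels_seq: true class ids."""
    n = len(probs_seq)
    weights = [(t + 1) / n for t in range(n)]          # 1/n, 2/n, ..., 1
    loss = sum(-w * math.log(p[y])
               for w, p, y in zip(weights, probs_seq, labels_seq))
    return loss / sum(weights)                          # normalize by weight mass
```

A weighting of this shape also partially offsets the class imbalance pressure in a 153/42/63 case split, though class-frequency weighting would address that more directly.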


2019 ◽  
Vol 9 (13) ◽  
pp. 2686 ◽  
Author(s):  
Jianming Zhang ◽  
Chaoquan Lu ◽  
Jin Wang ◽  
Lei Wang ◽  
Xiao-Guang Yue

In civil engineering, the stability of concrete is of great significance to the safety of people’s lives and property, so effective detection of concrete damage is necessary. In this paper, we treat crack detection on concrete surfaces as a semantic segmentation task that distinguishes background from crack at the pixel level. Inspired by Fully Convolutional Networks (FCN), we propose a fully convolutional network based on dilated convolution for concrete crack detection, consisting of an encoder and a decoder. Specifically, we first use a residual network to extract feature maps from the input image, design dilated convolutions with different dilation rates to extract feature maps of different receptive fields, and fuse the features extracted from the multiple branches. Then, we apply stacked deconvolutions to up-sample the fused feature maps. Finally, we use the softmax function to classify the feature maps at the pixel level. To verify the validity of the model, we adopt the commonly used evaluation metrics for semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU). The experimental results show that, by introducing dilated convolutions with different dilation rates and a multi-branch fusion strategy, the proposed model converges faster and generalizes better on the test set, achieving a PA of 96.84%, MPA of 92.55%, MIoU of 86.05%, and FWIoU of 94.22%, which is superior to the other models.
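The four metrics reported here all derive from a single confusion matrix, and writing them out makes their relationship clear. A minimal sketch over flat label lists, using the standard definitions:

```python
def segmentation_metrics(pred, gt, num_classes=2):
    """PA, MPA, MIoU and FWIoU from flat prediction / ground-truth label lists.
    confusion[i][j] counts pixels of true class i predicted as class j."""
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        confusion[g][p] += 1
    total = len(gt)
    pa = sum(confusion[i][i] for i in range(num_classes)) / total
    cpa, iou, freq = [], [], []
    for i in range(num_classes):
        tp = confusion[i][i]
        gt_i = sum(confusion[i])                                  # true pixels of class i
        pred_i = sum(confusion[j][i] for j in range(num_classes)) # predicted as class i
        union = gt_i + pred_i - tp
        cpa.append(tp / gt_i if gt_i else 0.0)
        iou.append(tp / union if union else 0.0)
        freq.append(gt_i / total)
    mpa = sum(cpa) / num_classes
    miou = sum(iou) / num_classes
    fwiou = sum(f * u for f, u in zip(freq, iou))  # IoU weighted by class frequency
    return pa, mpa, miou, fwiou
```

FWIoU weighting by class frequency explains why it sits between PA and MIoU on a crack dataset, where background pixels vastly outnumber crack pixels.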


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1556 ◽  
Author(s):  
Zhenyu Li ◽  
Aiguo Zhou ◽  
Yong Shen

Scene recognition is an essential part of the vision-based robot navigation domain. The successful application of deep learning technology has triggered extensive studies on scene recognition, most of which use features extracted from networks trained for recognition tasks. In this paper, we interpret scene recognition as a region-based image retrieval problem and present a novel approach with an end-to-end trainable multi-column convolutional neural network (MCNN) architecture. The proposed MCNN utilizes filters with receptive fields of different sizes to obtain multi-level and multi-layer image perception, and consists of three components: front-end, middle-end, and back-end. The first seven layers of VGG16 serve as the front-end for two-dimensional feature extraction, Inception-A serves as the middle-end for deeper feature representation learning, and the Large-Margin Softmax Loss (L-Softmax) serves as the back-end for enhancing intra-class compactness and inter-class separability. Extensive experiments comparing the proposed network with existing state-of-the-art methods have been conducted, and results on three popular datasets demonstrate the robustness and accuracy of our approach. To the best of our knowledge, the presented approach has not previously been applied to scene recognition in the literature.
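The margin idea behind L-Softmax can be shown for the simplest case, m = 2. This sketch computes only the in-range target-class logit; the full L-Softmax uses a piecewise function to stay monotonic when the angle exceeds pi/m, which is omitted here, and the function name is hypothetical.

```python
import math

def lsoftmax_logit(x, w, ):
    """Large-margin softmax target logit for margin m = 2: replace
    |x||w|cos(theta) with |x||w|cos(2*theta) (valid for theta <= pi/2),
    which demands a smaller angle to the class weight for the same logit,
    tightening intra-class compactness and widening inter-class margins."""
    nx = math.sqrt(sum(v * v for v in x))
    nw = math.sqrt(sum(v * v for v in w))
    cos_t = sum(a * b for a, b in zip(x, w)) / (nx * nw)
    cos_2t = 2 * cos_t * cos_t - 1  # double-angle identity
    return nx * nw * cos_2t
```

A feature aligned with its class weight (theta = 0) keeps its logit, while one at 90 degrees is pushed to the negative of its plain-softmax score, which is the source of the extra separability.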


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 909
Author(s):  
Shuo Li ◽  
Chiru Ge ◽  
Xiaodan Sui ◽  
Yuanjie Zheng ◽  
Weikuan Jia

The cup-to-disc ratio (CDR) is of great importance in assessing structural changes at the optic nerve head (ONH) and in the diagnosis of glaucoma. Most existing efforts acquire the CDR through CNN-based segmentation algorithms followed by its calculation; such methods usually focus only on the features within the convolution kernel, which is inherently a local operation, ignoring the contribution of rich global features (such as distant pixels) to the current features. In this paper, a new end-to-end channel and spatial attention regression deep learning network is proposed to deduce the CDR directly from the regression perspective, combining the self-attention mechanism with the regression network. Our network consists of four modules: the feature extraction module, which extracts deep features expressing the complicated patterns of the optic disc (OD) and optic cup (OC); the attention module, including a channel attention block (CAB) and a spatial attention block (SAB), which improves feature representation by aggregating long-range contextual information; the regression module, which deduces the CDR directly; and the segmentation-auxiliary module, which focuses the model’s attention on the relevant features instead of the background region. In particular, the CAB selects relatively important feature maps in the channel dimension, shifting the emphasis onto the OD and OC regions, while the SAB learns discriminative feature representations at the pixel level by capturing intra-feature-map relationships. Experimental results on the ORIGA dataset show that our method obtains an absolute CDR error of 0.067 and a Pearson’s correlation coefficient of 0.694 in estimating the CDR, demonstrating great potential for CDR prediction.
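The quantity being regressed is easy to pin down: the segmentation-then-measure pipeline the paper moves away from would compute the vertical CDR from binary masks roughly as below. A minimal sketch with hypothetical function names, using masks as nested 0/1 lists:

```python
def vertical_diameter(mask):
    """Vertical extent (in rows) of a binary mask: distance between the
    first and last row containing any foreground pixel."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio: ratio of the cup's vertical diameter to
    the disc's, the clinical measure used in glaucoma assessment."""
    return vertical_diameter(cup_mask) / vertical_diameter(disc_mask)
```

Regressing this number directly, as the proposed network does, avoids propagating boundary errors from two separate segmentations into the ratio.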


2019 ◽  
Vol 11 (19) ◽  
pp. 2220 ◽  
Author(s):  
Ximin Cui ◽  
Ke Zheng ◽  
Lianru Gao ◽  
Bing Zhang ◽  
Dong Yang ◽  
...  

Jointly using spatial and spectral information has been widely applied to hyperspectral image (HSI) classification. In particular, convolutional neural networks (CNNs) have gained attention in recent years due to their detailed feature representations. However, most CNN-based HSI classification methods use image patches as classifier input, which limits the use of spatial neighborhood information and reduces processing efficiency in training and testing. To overcome this problem, we propose an efficient and straightforward image-based classification framework. Based on this framework, we propose a multiscale spatial-spectral CNN for HSIs (HyMSCN) that integrates both fused multiple-receptive-field features and multiscale spatial features at different levels. The fused features are exploited using a lightweight block called the multiple receptive field feature block (MRFF), which contains various types of dilated convolution. By fusing multiple-receptive-field features and multiscale spatial features, HyMSCN achieves a comprehensive feature representation for classification. Experimental results on three real hyperspectral images confirm the efficiency of the proposed framework, and the proposed method achieves superior performance for HSI classification.
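The multi-branch structure of a block like the MRFF can be illustrated in one dimension: the same input is convolved with the same kernel at several dilation rates, each branch covering a different context width. This is an illustrative sketch, not the MRFF itself; names, kernel, and rates are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D convolution with a dilated kernel: tap j reads the
    sample j * dilation positions ahead, widening the context window."""
    span = (len(kernel) - 1) * dilation
    return [sum(kernel[j] * signal[i + j * dilation]
                for j in range(len(kernel)))
            for i in range(len(signal) - span)]

def multi_receptive_features(signal, kernel=(1.0, 1.0, 1.0), rates=(1, 2, 3)):
    """One branch per dilation rate, as in a multiple-receptive-field block;
    every branch sees the same resolution but a different context width."""
    return {d: dilated_conv1d(signal, list(kernel), d) for d in rates}
```

Concatenating or summing the branch outputs then gives a feature that mixes fine local structure with wider spatial context, without any downsampling.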


2020 ◽  
Vol 12 (12) ◽  
pp. 2031 ◽  
Author(s):  
Shiqi Chen ◽  
Jun Zhang ◽  
Ronghui Zhan

Recently, convolutional neural network (CNN)-based methods have been extensively explored for ship detection in synthetic aperture radar (SAR) images due to their powerful feature representation abilities. However, several obstacles still hinder progress. First, ships appear in various scenarios, which makes it difficult to exclude the disruption of cluttered backgrounds. Second, it is complicated to precisely locate targets with large aspect ratios, arbitrary orientations and dense distributions. Third, the trade-off between accurate localization and detection efficiency must be considered. To address these issues, this paper presents a rotated refined feature alignment detector (R2FA-Det), which balances the quality of bounding box prediction with the high speed of a single-stage framework. Specifically, we first devise a lightweight non-local attention module and embed it into the stem network; the resulting feature recalibration not only strengthens object-related features but also adequately suppresses background interference. In addition, both forms of anchors are integrated into our modified anchor mechanism, enabling better representation of densely arranged targets with less computational burden. Furthermore, to remedy the feature misalignment in the cascaded refinement scheme, we adopt a feature-guided alignment module that encodes both the position and shape information of the current refined anchors into the feature points. Extensive experiments on two SAR ship datasets demonstrate that our algorithm achieves higher accuracy at faster speed than several state-of-the-art methods.
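The rotated box representation that such a detector regresses is worth making concrete: a box is (cx, cy, w, h, theta), and its corners follow from a 2D rotation. A minimal sketch with a hypothetical function name:

```python
import math

def rotated_box_corners(cx, cy, w, h, theta):
    """Corner points of a rotated bounding box (cx, cy, w, h, theta in
    radians): rotate the axis-aligned half-extents by theta, then translate
    to the box center. This is the geometry behind rotation-aware anchors
    for ships with large aspect ratios and arbitrary orientations."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in half]
```

For densely packed ships in harbors, rotated boxes of this form overlap far less than axis-aligned ones, which is why they survive non-maximum suppression where horizontal boxes would be merged.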

