A Novel Multi-Scale Attention PFE-UNet for Forest Image Segmentation

Forests ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 937
Author(s):  
Boyang Zhang ◽  
Hongbo Mu ◽  
Mingyu Gao ◽  
Haiming Ni ◽  
Jianfeng Chen ◽  
...  

The precise segmentation of forest areas is essential for monitoring tasks related to forest exploration, extraction, and statistics. However, effective and accurate segmentation of forest images is hindered by factors such as blurred and discontinuous forest boundaries. Therefore, a Pyramid Feature Extraction-UNet network (PFE-UNet), based on the traditional UNet, is proposed for end-to-end forest image segmentation. The Pyramid Feature Extraction module (PFE) is introduced in the network transition layer, where it obtains multi-scale forest image information through different receptive fields. The spatial attention module (SA) and the channel-wise attention module (CA) are applied to low-level feature maps and PFE feature maps, respectively, to highlight task-specific segmentation features while fusing context information and suppressing irrelevant regions. The standard convolution block is replaced by a novel depthwise separable convolutional unit (DSC Unit), which not only reduces the computational cost but also prevents overfitting. This paper presents an extensive evaluation on the DeepGlobe dataset and a comparative analysis with several state-of-the-art networks. The experimental results show that PFE-UNet obtains an accuracy of 94.23% in real-time forest image segmentation, which is significantly higher than that of other advanced networks. The proposed PFE-UNet thus provides a valuable reference for the precise segmentation of forest images.
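
The computational saving attributed to the depthwise separable unit can be illustrated by counting weights. The sketch below is not the authors' DSC Unit; it only compares a standard k x k convolution with the generic depthwise-plus-pointwise factorisation, and the layer sizes are made-up values chosen for illustration.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only (assumption, not taken from the paper).
    k, c_in, c_out = 3, 64, 128
    std = standard_conv_params(k, c_in, c_out)
    dsc = depthwise_separable_params(k, c_in, c_out)
    print(std, dsc, round(dsc / std, 4))
```

The ratio of the two counts equals 1/out_channels + 1/k^2, so for 3 x 3 kernels the factorised layer needs roughly one ninth of the weights once the channel count is large.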

Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1881
Author(s):  
Yuhui Chang ◽  
Jiangtao Xu ◽  
Zhiyuan Gao

To improve the accuracy of stereo matching, the multi-scale dense attention network (MDA-Net) is proposed. The network introduces two novel modules in the feature extraction stage to better exploit context information: the dual-path upsampling (DU) block and the attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block fuses feature maps of different scales. It uses sub-pixel convolution to compensate for the information lost by traditional interpolation-based upsampling. The ACPFE block extracts multi-scale context information: pyramid atrous convolution is adopted to exploit multi-scale features, and channel attention is used to fuse them. The proposed network has been evaluated on several benchmark datasets. The three-pixel error evaluated over all ground-truth pixels is 2.10% on the KITTI 2015 dataset. The experimental results show that MDA-Net achieves state-of-the-art accuracy on the KITTI 2012 and 2015 datasets.
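
The abstract reports a three-pixel error without defining it. A minimal sketch of such a metric follows, assuming it is the percentage of pixels with ground truth whose absolute disparity error exceeds 3 pixels. The function name, the use of None for missing ground truth, and the optional relative threshold are assumptions made for illustration; the relative threshold is included because the KITTI 2015 outlier definition is commonly stated with an additional 5% criterion.

```python
from typing import Optional, Sequence


def n_pixel_error(pred: Sequence[float],
                  gt: Sequence[Optional[float]],
                  abs_thresh: float = 3.0,
                  rel_thresh: Optional[float] = None) -> float:
    """Percentage of valid pixels whose disparity error exceeds the threshold(s).

    Entries of gt that are None mark pixels without ground truth and are skipped.
    If rel_thresh is given, a pixel counts as an outlier only when its error
    also exceeds rel_thresh times the true disparity.
    """
    valid = 0
    outliers = 0
    for d_pred, d_gt in zip(pred, gt):
        if d_gt is None:
            continue
        valid += 1
        err = abs(d_pred - d_gt)
        is_outlier = err > abs_thresh
        if rel_thresh is not None:
            is_outlier = is_outlier and err > rel_thresh * d_gt
        outliers += int(is_outlier)
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return 100.0 * outliers / valid
```

Passing rel_thresh=0.05 makes the count stricter about what is an outlier at large disparities, so the reported percentage can only stay the same or drop.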


2020 ◽  
Vol 12 (13) ◽  
pp. 2161 ◽  
Author(s):  
Guang Yang ◽  
Qian Zhang ◽  
Guixu Zhang

Deep learning methods have been used to extract buildings from remote sensing images and have achieved state-of-the-art performance. Most previous work has emphasized the multi-scale fusion of features or the enlargement of receptive fields to capture global features, rather than focusing on low-level details such as edges. In this work, we propose a novel end-to-end edge-aware network, the EANet, and an edge-aware loss for accurately extracting buildings from aerial images. Specifically, the architecture is composed of an image segmentation network and an edge perception network, which are responsible for building prediction and edge investigation, respectively. The International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam segmentation benchmark and the Wuhan University (WHU) building benchmark were used to evaluate our approach, on which it achieved 90.19% and 93.33% intersection-over-union, respectively, and top performance without using additional datasets, data augmentation, or post-processing. The EANet is effective in extracting buildings from aerial images, which shows that the quality of image segmentation can be improved by focusing on edge details.
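
The intersection-over-union figures quoted above follow the usual definition for a foreground class. The sketch below computes it for two small binary masks; the function name and the convention for two empty masks are choices made here, not something stated in the abstract.

```python
from typing import Sequence


def binary_iou(pred: Sequence[Sequence[int]], target: Sequence[Sequence[int]]) -> float:
    """IoU of the foreground class (value 1) for two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == 1 and t == 1:
                intersection += 1
            if p == 1 or t == 1:
                union += 1
    # Convention chosen here: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union
```

For example, a prediction that shares two foreground pixels with the ground truth while the two masks cover four distinct foreground pixels in total scores 2 / 4 = 0.5.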


2020 ◽  
Vol 17 (4) ◽  
pp. 172988142093606
Author(s):  
Xiaoguo Zhang ◽  
Ye Gao ◽  
Huiqing Wang ◽  
Qing Wang

Effectively and efficiently recognizing multi-scale objects is one of the key challenges in applying deep convolutional neural networks to object detection. YOLOv3 (You only look once v3) is a state-of-the-art object detector with good performance in both accuracy and speed; however, scale variation remains a challenging problem. Considering that detection performance on multi-scale objects is related to the receptive fields of the network, in this work we propose a novel dilated spatial pyramid module that integrates multi-scale information to deal effectively with the scale variation problem. First, the input of the dilated spatial pyramid is fed into multiple parallel branches with different dilation rates to generate feature maps with different receptive fields. Then, the input of the dilated spatial pyramid and the outputs of the different branches are concatenated to integrate multi-scale information. Moreover, the dilated spatial pyramid is integrated with YOLOv3 in front of the first detection header to form the dilated spatial pyramid-You only look once model. Experimental results on PASCAL VOC2007 demonstrate that the dilated spatial pyramid-You only look once model outperforms other state-of-the-art methods in mean average precision while still keeping a satisfactory real-time detection speed. For 416 × 416 input, the model achieves 82.2% mean average precision at 56 frames per second, 3.9% higher than YOLOv3, with only a slight drop in speed.
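
The two steps described above (parallel branches with different dilation rates, then concatenation with the input) can be sketched in one dimension. This is a toy illustration rather than the authors' module: it uses a single fixed kernel, zero padding, and plain Python lists, and every function name is hypothetical.

```python
from typing import List, Sequence


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1-D dilated cross-correlation with zero padding (odd kernel length)."""
    half = (len(kernel) // 2) * dilation
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i - half + k * dilation  # taps are spaced `dilation` samples apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def dilated_pyramid(signal: Sequence[float], kernel: Sequence[float],
                    rates: Sequence[int]) -> List[List[float]]:
    """Stack the input with one branch output per dilation rate (channel-wise concatenation)."""
    return [list(signal)] + [dilated_conv1d(signal, kernel, r) for r in rates]
```

With a 3-tap kernel, dilation rates 1, 2 and 4 cover spans of 3, 5 and 9 samples, which is how the branches see different receptive fields without adding weights.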


Author(s):  
Zhenzhen Yang ◽  
Pengfei Xu ◽  
Yongpeng Yang ◽  
Bing-Kun Bao

The U-Net has become the most popular structure in medical image segmentation in recent years. Although its performance for medical image segmentation is outstanding, a large number of experiments demonstrate that the classical U-Net architecture is insufficient when the size of segmentation targets changes and when target and background are imbalanced in different forms of segmentation. To improve the U-Net architecture, we develop a new architecture named the densely connected U-Net (DenseUNet) in this article. The proposed DenseUNet adopts a dense block to improve the feature extraction capability and employs a multi-feature fuse block, which fuses feature maps of different levels, to increase the accuracy of feature extraction. In addition, in view of the advantages of the cross entropy and the dice loss functions, a new loss function for the DenseUNet is proposed to deal with the imbalance between target and background. Finally, we test the proposed DenseUNet and compare it with the multi-resolutional U-Net (MultiResUNet) and the classic U-Net on three different datasets. The experimental results show that the DenseUNet performs significantly better than the MultiResUNet and the classic U-Net.
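
The abstract says the new loss draws on both cross entropy and dice loss but does not give its form. One common way to combine the two is a weighted sum, sketched below for binary segmentation. The weighting factor alpha, the smoothing term, and the function names are assumptions for illustration and should not be read as the DenseUNet loss.

```python
import math
from typing import Sequence


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross entropy over pixels; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int], smooth: float = 1.0) -> float:
    """1 - soft Dice coefficient; `smooth` keeps the ratio defined for empty masks."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denom + smooth)


def combined_loss(probs: Sequence[float], targets: Sequence[int], alpha: float = 0.5) -> float:
    """Weighted sum of cross entropy and Dice loss (alpha is a hypothetical hyperparameter)."""
    return (alpha * binary_cross_entropy(probs, targets)
            + (1.0 - alpha) * soft_dice_loss(probs, targets))
```

The Dice term depends on overlap relative to the size of the target, so it stays informative when the foreground is a small fraction of the image, whereas the per-pixel cross-entropy average is then dominated by background pixels.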


Author(s):  
Rohit Mohan ◽  
Abhinav Valada

Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that consists of a shared backbone which efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both the heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset that contains panoptic annotations for the popularly challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas and Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state-of-the-art on all these four benchmarks while being the most efficient and fast panoptic segmentation architecture to date.


2020 ◽  
Vol 34 (07) ◽  
pp. 11693-11700 ◽  
Author(s):  
Ao Luo ◽  
Fan Yang ◽  
Xin Li ◽  
Dong Nie ◽  
Zhicheng Jiao ◽  
...  

Crowd counting is an important yet challenging task due to large variations in scale and density. Recent investigations have shown that distilling rich relations among multi-scale features and exploiting useful information from the auxiliary task, i.e., localization, are vital for this task. Nevertheless, how to comprehensively leverage these relations within a unified network architecture is still a challenging problem. In this paper, we present a novel network structure called Hybrid Graph Neural Network (HyGnn), which aims to relieve the problem by interweaving the multi-scale features for crowd density and for its auxiliary task (localization) and performing joint reasoning over a graph. Specifically, HyGnn integrates a hybrid graph to jointly represent the task-specific feature maps of different scales as nodes, and two types of relations as edges: (i) multi-scale relations capturing the feature dependencies across scales and (ii) mutually beneficial relations building bridges for the cooperation between counting and localization. Thus, through message passing, HyGnn can capture and distill richer relations between nodes to obtain more powerful representations, providing robust and accurate results. Our HyGnn performs well on four challenging datasets: ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF_QNRF, outperforming the state-of-the-art algorithms by a large margin.
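
The idea of nodes exchanging information through message passing can be shown with a toy graph. The sketch below is a deliberately simplified stand-in, not HyGnn: it uses a fixed mean aggregation with a residual update instead of learned message functions, treats both relation types identically, and the node names are hypothetical.

```python
from typing import Dict, List


def message_passing_step(features: Dict[str, List[float]],
                         edges: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """One round: each node averages its neighbours' features and adds the result to its own."""
    updated = {}
    for node, feat in features.items():
        neighbours = edges.get(node, [])
        if not neighbours:
            updated[node] = list(feat)  # isolated nodes keep their features
            continue
        message = [sum(features[n][i] for n in neighbours) / len(neighbours)
                   for i in range(len(feat))]
        updated[node] = [f + m for f, m in zip(feat, message)]
    return updated


if __name__ == "__main__":
    # Hypothetical nodes: counting features at two scales plus one localization node.
    feats = {"count_s1": [1.0, 0.0], "count_s2": [0.0, 2.0], "loc_s1": [4.0, 4.0]}
    graph = {"count_s1": ["count_s2", "loc_s1"], "count_s2": ["count_s1"], "loc_s1": ["count_s1"]}
    print(message_passing_step(feats, graph))
```

All updates are computed from the features as they were before the round, so the result does not depend on the order in which nodes are visited.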


Author(s):  
Yizhen Chen ◽  
Haifeng Hu

Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, where the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has been proven to be effective in improving segmentation performance, there are two main drawbacks. On the one hand, the introduction of low-level features brings a significant increase in computation without an obvious performance gain. On the other hand, general strategies of feature aggregation such as addition and concatenation fuse features without considering the usefulness of each feature vector, which mixes the useful information with substantial noise. In this article, we abandon the traditional “U-shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it aggregates only the low-resolution, high-level features and utilizes the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as the combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) to adaptively integrate features from different receptive fields by assigning specific channelwise weights, where the weights are input-dependent. Our Y-Net is capable of breaking through the limit of single-branch networks and attaining higher performance with less computational cost than the “U-shaped” structure. The proposed CSD can better integrate useful information and suppress interfering noise. Comprehensive experiments are carried out on three public datasets to evaluate the effectiveness of our method. Our Y-Net achieves state-of-the-art performance on the PASCAL VOC 2012, PASCAL Person-Part, and ADE20K datasets without pre-training on extra datasets.
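
Input-dependent channel-wise weights of the kind mentioned above are often produced by pooling each channel to a scalar and squashing it through a sigmoid. The sketch below shows only that general pattern, with no learned parameters; it is not the authors' Channel-Selective Decoder, and the names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def channel_gate(feature_map: FeatureMap) -> List[float]:
    """One weight per channel: sigmoid of that channel's global average."""
    weights = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        weights.append(1.0 / (1.0 + math.exp(-mean)))
    return weights


def reweight_channels(feature_map: FeatureMap) -> FeatureMap:
    """Scale every value in a channel by that channel's gate weight."""
    weights = channel_gate(feature_map)
    return [[[w * v for v in row] for row in channel]
            for w, channel in zip(weights, feature_map)]
```

Because the weights are computed from the feature map itself, two different inputs yield different gates. A trained module would normally insert learned layers between the pooling and the sigmoid, which this sketch omits.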


2019 ◽  
Vol 9 (13) ◽  
pp. 2686 ◽  
Author(s):  
Jianming Zhang ◽  
Chaoquan Lu ◽  
Jin Wang ◽  
Lei Wang ◽  
Xiao-Guang Yue

In civil engineering, the stability of concrete is of great significance to the safety of people’s lives and property, so it is necessary to detect concrete damage effectively. In this paper, we treat crack detection on concrete surfaces as a semantic segmentation task that distinguishes background from crack at the pixel level. Inspired by Fully Convolutional Networks (FCN), we propose a fully convolutional network based on dilated convolution for concrete crack detection, which consists of an encoder and a decoder. Specifically, we first use a residual network to extract feature maps from the input image, apply dilated convolutions with different dilation rates to extract feature maps with different receptive fields, and fuse the features extracted by the multiple branches. Then, we use stacked deconvolutions to up-sample the fused feature maps. Finally, we use the SoftMax function to classify the feature maps at the pixel level. To verify the validity of the model, we adopt the commonly used evaluation indicators of semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU). The experimental results show that, by introducing dilated convolutions with different dilation rates and a multi-branch fusion strategy, the proposed model converges faster and generalizes better on the test set. Our model has a PA of 96.84%, MPA of 92.55%, MIoU of 86.05% and FWIoU of 94.22% on the test set, which is superior to other models.
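
The four indicators named above can all be derived from a class confusion matrix. The sketch below uses their commonly cited definitions, which the abstract itself does not spell out, and assumes every class occurs at least once in the ground truth; the function name and the example matrix are illustrative only.

```python
from typing import Dict, Sequence


def segmentation_metrics(confusion: Sequence[Sequence[int]]) -> Dict[str, float]:
    """Compute PA, MPA, MIoU and FWIoU.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    Assumes every class has at least one ground-truth pixel (no zero division).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]                             # pixels per true class
    col_sums = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # pixels per predicted class
    diag = [confusion[i][i] for i in range(k)]                             # correctly labelled pixels

    per_class_acc = [diag[i] / row_sums[i] for i in range(k)]
    per_class_iou = [diag[i] / (row_sums[i] + col_sums[i] - diag[i]) for i in range(k)]

    return {
        "PA": sum(diag) / total,
        "MPA": sum(per_class_acc) / k,
        "MIoU": sum(per_class_iou) / k,
        "FWIoU": sum(row_sums[i] * per_class_iou[i] for i in range(k)) / total,
    }


if __name__ == "__main__":
    # Made-up two-class example: row 0 = background, row 1 = crack.
    print(segmentation_metrics([[80, 0], [10, 10]]))
```

In a task where background dominates, PA and FWIoU are pulled towards the background score, while MPA and MIoU weight the crack class equally and therefore drop more when cracks are missed.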


Author(s):  
Cheng Chen ◽  
Qi Dou ◽  
Hao Chen ◽  
Jing Qin ◽  
Pheng-Ann Heng

This paper presents a novel unsupervised domain adaptation framework, called Synergistic Image and Feature Adaptation (SIFA), to effectively tackle the problem of domain shift. Domain adaptation has become an important and hot topic in recent studies on deep learning, aiming to recover the performance lost when neural networks are applied to new testing domains. Our proposed SIFA is an elegant learning diagram which presents synergistic fusion of adaptations from both image and feature perspectives. In particular, we simultaneously transform the appearance of images across domains and enhance domain-invariance of the extracted features towards the segmentation task. The feature encoder layers are shared by both perspectives to grasp their mutual benefits during the end-to-end learning procedure. Without using any annotation from the target domain, the learning of our unified model is guided by adversarial losses, with multiple discriminators employed from various aspects. We have extensively validated our method with a challenging application of cross-modality medical image segmentation of cardiac structures. Experimental results demonstrate that our SIFA model recovers the degraded performance from 17.2% to 73.0%, and outperforms the state-of-the-art methods by a significant margin.

