Multilevel feature fusion dilated convolutional network for semantic segmentation

Recently, convolutional neural network (CNN) has led to significant improvement in the field of computer vision, especially the improvement of the accuracy and speed of semantic segmentation tasks, which greatly improved robot scene perception. In this article, we propose a multilevel feature fusion dilated convolution network (Refine-DeepLab). By improving the space pyramid pooling structure, we propose a multiscale hybrid dilated convolution module, which captures the rich context information and effectively alleviates the contradiction between the receptive field size and the dilated convolution operation. At the same time, the high-level semantic information and low-level semantic information obtained through multi-level and multi-scale feature extraction can effectively improve the capture of global information and improve the performance of large-scale target segmentation. The encoder–decoder gradually recovers spatial information while capturing high-level semantic information, resulting in sharper object boundaries. Extensive experiments verify the effectiveness of our proposed Refine-DeepLab model, evaluate our approaches thoroughly on the PASCAL VOC 2012 data set without MS COCO data set pretraining, and achieve a state-of-art result of 81.73% mean interaction-over-union in the validate set.

Download Full-text

Multiscale Semantic Feature Optimization and Fusion Network for Building Extraction Using High-Resolution Aerial Images and LiDAR Data

Remote Sensing ◽

10.3390/rs13132473 ◽

2021 ◽

Vol 13 (13) ◽

pp. 2473

Author(s):

Qinglie Yuan ◽

Helmi Zulhaidi Mohd Shafri ◽

Aidi Hizami Alias ◽

Shaiful Jahari Hashim

Keyword(s):

High Resolution ◽

Large Scale ◽

Spatial Information ◽

Feature Fusion ◽

Aerial Images ◽

Semantic Gap ◽

Superior Performance ◽

Lidar Data ◽

Building Extraction ◽

Hierarchical Features

Automatic building extraction has been applied in many domains. It is also a challenging problem because of the complex scenes and multiscale. Deep learning algorithms, especially fully convolutional neural networks (FCNs), have shown robust feature extraction ability than traditional remote sensing data processing methods. However, hierarchical features from encoders with a fixed receptive field perform weak ability to obtain global semantic information. Local features in multiscale subregions cannot construct contextual interdependence and correlation, especially for large-scale building areas, which probably causes fragmentary extraction results due to intra-class feature variability. In addition, low-level features have accurate and fine-grained spatial information for tiny building structures but lack refinement and selection, and the semantic gap of across-level features is not conducive to feature fusion. To address the above problems, this paper proposes an FCN framework based on the residual network and provides the training pattern for multi-modal data combining the advantage of high-resolution aerial images and LiDAR data for building extraction. Two novel modules have been proposed for the optimization and integration of multiscale and across-level features. In particular, a multiscale context optimization module is designed to adaptively generate the feature representations for different subregions and effectively aggregate global context. A semantic guided spatial attention mechanism is introduced to refine shallow features and alleviate the semantic gap. Finally, hierarchical features are fused via the feature pyramid network. Compared with other state-of-the-art methods, experimental results demonstrate superior performance with 93.19 IoU, 97.56 OA on WHU datasets and 94.72 IoU, 97.84 OA on the Boston dataset, which shows that the proposed network can improve accuracy and achieve better performance for building extraction.

Download Full-text

DFFAN: Dual Function Feature Aggregation Network for Semantic Segmentation of Land Cover

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030125 ◽

2021 ◽

Vol 10 (3) ◽

pp. 125

Author(s):

Junqing Huang ◽

Liguo Weng ◽

Bingyu Chen ◽

Min Xia

Keyword(s):

Remote Sensing ◽

Land Cover ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Dual Function ◽

Context Information ◽

Remote Sensing Images ◽

Feature Aggregation ◽

Image Context

Analyzing land cover using remote sensing images has broad prospects, the precise segmentation of land cover is the key to the application of this technology. Nowadays, the Convolution Neural Network (CNN) is widely used in many image semantic segmentation tasks. However, existing CNN models often exhibit poor generalization ability and low segmentation accuracy when dealing with land cover segmentation tasks. To solve this problem, this paper proposes Dual Function Feature Aggregation Network (DFFAN). This method combines image context information, gathers image spatial information, and extracts and fuses features. DFFAN uses residual neural networks as backbone to obtain different dimensional feature information of remote sensing images through multiple downsamplings. This work designs Affinity Matrix Module (AMM) to obtain the context of each feature map and proposes Boundary Feature Fusion Module (BFF) to fuse the context information and spatial information of an image to determine the location distribution of each image’s category. Compared with existing methods, the proposed method is significantly improved in accuracy. Its mean intersection over union (MIoU) on the LandCover dataset reaches 84.81%.

Download Full-text

A High-Density Crowd Counting Method Based on Convolutional Feature Fusion

Applied Sciences ◽

10.3390/app8122367 ◽

2018 ◽

Vol 8 (12) ◽

pp. 2367 ◽

Cited By ~ 5

Author(s):

Hongling Luo ◽

Jun Sang ◽

Weiqun Wu ◽

Hong Xiang ◽

Zhili Xiang ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Large Scale ◽

Feature Fusion ◽

High Density ◽

Counting Problem ◽

Crowd Counting ◽

Low Level ◽

Density Map ◽

High Level

In recent years, the trampling events due to overcrowding have occurred frequently, which leads to the demand for crowd counting under a high-density environment. At present, there are few studies on monitoring crowds in a large-scale crowded environment, while there exists technology drawbacks and a lack of mature systems. Aiming to solve the crowd counting problem with high-density under complex environments, a feature fusion-based deep convolutional neural network method FF-CNN (Feature Fusion of Convolutional Neural Network) was proposed in this paper. The proposed FF-CNN mapped the crowd image to its crowd density map, and then obtained the head count by integration. The geometry adaptive kernels were adopted to generate high-quality density maps which were used as ground truths for network training. The deconvolution technique was used to achieve the fusion of high-level and low-level features to get richer features, and two loss functions, i.e., density map loss and absolute count loss, were used for joint optimization. In order to increase the sample diversity, the original images were cropped with a random cropping method for each iteration. The experimental results of FF-CNN on the ShanghaiTech public dataset showed that the fusion of low-level and high-level features can extract richer features to improve the precision of density map estimation, and further improve the accuracy of crowd counting.

Download Full-text

Water Areas Segmentation from Remote Sensing Images Using a Separable Residual SegNet Network

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9040256 ◽

2020 ◽

Vol 9 (4) ◽

pp. 256 ◽

Cited By ~ 1

Author(s):

Liguo Weng ◽

Yiming Xu ◽

Min Xia ◽

Yonghong Zhang ◽

Jia Liu ◽

...

Keyword(s):

Remote Sensing ◽

Feature Extraction ◽

Spatial Information ◽

Global Climate ◽

Semantic Segmentation ◽

Water Area ◽

Remote Sensing Images ◽

The One ◽

High Level ◽

Almost All

Changes on lakes and rivers are of great significance for the study of global climate change. Accurate segmentation of lakes and rivers is critical to the study of their changes. However, traditional water area segmentation methods almost all share the following deficiencies: high computational requirements, poor generalization performance, and low extraction accuracy. In recent years, semantic segmentation algorithms based on deep learning have been emerging. Addressing problems associated to a very large number of parameters, low accuracy, and network degradation during training process, this paper proposes a separable residual SegNet (SR-SegNet) to perform the water area segmentation using remote sensing images. On the one hand, without compromising the ability of feature extraction, the problem of network degradation is alleviated by adding modified residual blocks into the encoder, the number of parameters is limited by introducing depthwise separable convolutions, and the ability of feature extraction is improved by using dilated convolutions to expand the receptive field. On the other hand, SR-SegNet removes the convolution layers with relatively more convolution kernels in the encoding stage, and uses the cascading method to fuse the low-level and high-level features of the image. As a result, the whole network can obtain more spatial information. Experimental results show that the proposed method exhibits significant improvements over several traditional methods, including FCN, DeconvNet, and SegNet.

Download Full-text

Locational privacy-preserving distance computations with intersecting sets of randomly labeled grid points

International Journal of Health Geographics ◽

10.1186/s12942-021-00268-y ◽

2021 ◽

Vol 20 (1) ◽

Author(s):

Rainer Schnell ◽

Jonas Klingwort ◽

James M. Farrow

Keyword(s):

Programming Languages ◽

Real World ◽

Spatial Data ◽

Large Scale ◽

Privacy Preserving ◽

Real World Data ◽

Data Set ◽

Additional Information ◽

Grid Points ◽

High Level

Abstract Background We introduce and study a recently proposed method for privacy-preserving distance computations which has received little attention in the scientific literature so far. The method, which is based on intersecting sets of randomly labeled grid points, is henceforth denoted as ISGP allows calculating the approximate distances between masked spatial data. Coordinates are replaced by sets of hash values. The method allows the computation of distances between locations L when the locations at different points in time t are not known simultaneously. The distance between $$L_1$$ L 1 and $$L_2$$ L 2 could be computed even when $$L_2$$ L 2 does not exist at $$t_1$$ t 1 and $$L_1$$ L 1 has been deleted at $$t_2$$ t 2 . An example would be patients from a medical data set and locations of later hospitalizations. ISGP is a new tool for privacy-preserving data handling of geo-referenced data sets in general. Furthermore, this technique can be used to include geographical identifiers as additional information for privacy-preserving record-linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation within the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals ($$n=850$$ n = 850 ) and residential locations ($$n=13,000$$ n = 13 , 000 ). The method has already been used in a real-world application. Results ISGP yields very accurate results. Our simulation study showed that—with appropriately chosen parameters – 99 % accuracy in the approximated distances is achieved. Conclusion We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate, fast, has low computational burden, and does not require excessive storage.

Download Full-text

JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6994 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12951-12958 ◽

Cited By ~ 3

Author(s):

Lin Zhao ◽

Wenbing Tao

Keyword(s):

Point Cloud ◽

Large Scale ◽

Feature Fusion ◽

Mean Shift ◽

Semantic Segmentation ◽

Point Clouds ◽

Semantic Features ◽

Backbone Network ◽

3D Point Clouds ◽

Instance Segmentation

In this paper, we propose a novel joint instance and semantic segmentation approach, which is called JSNet, in order to address the instance and semantic segmentation of 3D point clouds simultaneously. Firstly, we build an effective backbone network to extract robust features from the raw point clouds. Secondly, to obtain more discriminative features, a point cloud feature fusion module is proposed to fuse the different layer features of the backbone network. Furthermore, a joint instance semantic segmentation module is developed to transform semantic features into instance embedding space, and then the transformed features are further fused with instance features to facilitate instance segmentation. Meanwhile, this module also aggregates instance features into semantic feature space to promote semantic segmentation. Finally, the instance predictions are generated by applying a simple mean-shift clustering on instance embeddings. As a result, we evaluate the proposed JSNet on a large-scale 3D indoor point cloud dataset S3DIS and a part dataset ShapeNet, and compare it with existing approaches. Experimental results demonstrate our approach outperforms the state-of-the-art method in 3D instance segmentation with a significant improvement in 3D semantic prediction and our method is also beneficial for part segmentation. The source code for this work is available at https://github.com/dlinzhao/JSNet.

Download Full-text

MS-AFF: A Novel Semantic Segmentation Approach for Buried Object Based on Multi-scale Attentional Feature Fusion

10.21203/rs.3.rs-193757/v1 ◽

2021 ◽

Author(s):

Chao Lu ◽

Fansheng Chen ◽

Xiaofeng Su ◽

Dan Zeng

Keyword(s):

Deep Learning ◽

Spatial Information ◽

Feature Fusion ◽

Infrared Image ◽

Semantic Segmentation ◽

Target Object ◽

Infrared Images ◽

Feature Maps ◽

Multi Scale ◽

Visible Images

Abstract Infrared technology is a widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to get the infrared image of the buried obje cts. Compared to the visible images, infrared images present poor resolution, low contrast, and fuzzy visual effect, which make it difficult to segment the target object, specifically in the complex backgrounds. In this condition, traditional segmentation methods cannot perform well in infrared images since they are easily disturbed by the noise and non-target objects in the images. With the advance of deep convolutional neural network (CNN), the deep learning-based methods have made significant improvements in semantic segmentation task. However, few of them research Infrared image semantic segmentation, which is a more challenging scenario compared to visible images. Moreover, the lack of an Infrared image dataset is also a problem for current methods based on deep learning. We raise a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Precisely, we integrate a series of feature maps from different levels by an atrous spatial pyramid structure. In this way, the model can obtain rich representation ability on the infrared images. Besides, a global spatial information attention module is employed to let the model focus on the target region and reduce disturbance in infrared images' background. In addition, we propose an infrared segmentation dataset based on the infrared thermal imaging system. Extensive experiments conducted in the infrared image segmentation dataset show the superiority of our method.

Download Full-text

Top-Down Pyramid Fusion Network for High-Resolution Remote Sensing Semantic Segmentation

Remote Sensing ◽

10.3390/rs13204159 ◽

2021 ◽

Vol 13 (20) ◽

pp. 4159

Author(s):

Yuhang Gu ◽

Jie Hao ◽

Bing Chen ◽

Hai Deng

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Feature Fusion ◽

Semantic Segmentation ◽

Semantic Knowledge ◽

Surface Model ◽

Top Down ◽

Segmentation Accuracy ◽

Fusion Methods ◽

High Level

In recent years, high-resolution remote sensing semantic segmentation based on data fusion has gradually become a research focus in the field of land classification, which is an indispensable task of a smart city. However, the existing feature fusion methods with bottom-up structures can achieve limited fusion results. Alternatively, various auxiliary fusion modules significantly increase the complexity of the models and make the training process intolerably expensive. In this paper, we propose a new lightweight model called top-down pyramid fusion network (TdPFNet) including a multi-source feature extractor, a top-down pyramid fusion module and a decoder. It can deeply fuse features from different sources in a top-down structure using high-level semantic knowledge guiding the fusion of low-level texture information. Digital surface model (DSM) data and open street map (OSM) data are used as auxiliary inputs to the Potsdam dataset for the proposed model evaluation. Experimental results show that the network proposed in this paper not only notably improves the segmentation accuracy, but also reduces the complexity of the multi-source semantic segmentation model.

Download Full-text

HA-MPPNet: Height Aware-Multi Path Parallel Network for High Spatial Resolution Remote Sensing Image Semantic Seg-Mentation

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100672 ◽

2021 ◽

Vol 10 (10) ◽

pp. 672

Author(s):

Suting Chen ◽

Chaoqun Wu ◽

Mithun Mukherjee ◽

Yujie Zheng

Keyword(s):

Remote Sensing ◽

Spatial Resolution ◽

Spatial Information ◽

Feature Fusion ◽

Semantic Segmentation ◽

Remote Sensing Image ◽

Surface Model ◽

Semantic Features ◽

Low Level ◽

Parallel Network

Semantic segmentation of remote sensing images (RSI) plays a significant role in urban management and land cover classification. Due to the richer spatial information in the RSI, existing convolutional neural network (CNN)-based methods cannot segment images accurately and lose some edge information of objects. In addition, recent studies have shown that leveraging additional 3D geometric data with 2D appearance is beneficial to distinguish the pixels’ category. However, most of them require height maps as additional inputs, which severely limits their applications. To alleviate the above issues, we propose a height aware-multi path parallel network (HA-MPPNet). Our proposed MPPNet first obtains multi-level semantic features while maintaining the spatial resolution in each path for preserving detailed image information. Afterward, gated high-low level feature fusion is utilized to complement the lack of low-level semantics. Then, we designed the height feature decode branch to learn the height features under the supervision of digital surface model (DSM) images and used the learned embeddings to improve semantic context by height feature guide propagation. Note that our module does not need a DSM image as additional input after training and is end-to-end. Our method outperformed other state-of-the-art methods for semantic segmentation on publicly available remote sensing image datasets.

Download Full-text

Point Cloud Semantic Segmentation with Cross-Correction Features

10.21203/rs.3.rs-1218117/v1 ◽

2022 ◽

Author(s):

Yuehua Zhao ◽

Ma Jie ◽

Chong Nannan ◽

Wen Junjie

Keyword(s):

Real Time ◽

Point Cloud ◽

Large Scale ◽

Spatial Information ◽

Semantic Segmentation ◽

Autonomous Driving ◽

Basic Unit ◽

Semantic Features ◽

Point Cloud Segmentation ◽

Scale Point

Abstract Real time large scale point cloud segmentation is an important but challenging task for practical application like autonomous driving. Existing real time methods have achieved acceptance performance by aggregating local information. However, most of them only exploit local spatial information or local semantic information dependently, few considering the complementarity of both. In this paper, we propose a model named Spatial-Semantic Incorporation Network (SSI-Net) for real time large scale point cloud segmentation. A Spatial-Semantic Cross-correction (SSC) module is introduced in SSI-Net as a basic unit. High quality contextual features can be learned through SSC by correct and update semantic features using spatial cues, and vice verse. Adopting the plug-and-play SSC module, we design SSI-Net as an encoder-decoder architecture. To ensure efficiency, it also adopts a random sample based hierarchical network structure. Extensive experiments on several prevalent datasets demonstrate that our method can achieve state-of-the-art performance.

Download Full-text