BUILDING SEGMENTATION FROM AIRBORNE VHR IMAGES USING MASK R-CNN

Author(s):  
K. Zhou ◽  
Y. Chen ◽  
I. Smal ◽  
R. Lindenbergh

Abstract. Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, offer an opportunity to create up-to-date 3D models. Building segmentation is often the first and most important step. Convolutional neural networks (CNNs) have drawn much attention for interpreting VHR images, as they can learn very effective features for complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurately delineated edges. Mask R-CNN starts from a feature pyramid network (FPN) to create semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate object proposals at various scales from the corresponding optimal scale of features. Features carrying high- and low-level information are further used for better classification of small objects and for mask prediction at edges. The method is tested on the ISPRS benchmark dataset by comparing results with fully convolutional networks (FCNs), which merge high- and low-level features through a skip layer to create a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN performs much better in edge regions than FCN. The results also show that the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects of different scales. This paper provides insight into how a good anchor scale should be chosen for different datasets.
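The anchor-scale choice highlighted in the abstract can be made concrete. A minimal sketch (not the paper's code) of the standard FPN anchor convention, where the anchor edge length doubles at each coarser pyramid level and each aspect ratio preserves the anchor area; the base size of 32 pixels and the ratios (0.5, 1, 2) are common defaults, not values taken from the paper:

```python
import numpy as np

def fpn_anchor_sizes(base_size=32, num_levels=5):
    # Edge length doubles at each coarser pyramid level
    # (e.g. 32, 64, 128, 256, 512 pixels for levels P2-P6).
    return [base_size * 2 ** i for i in range(num_levels)]

def anchors_at(cx, cy, size, aspect_ratios=(0.5, 1.0, 2.0)):
    # For each aspect ratio r = h/w, keep the anchor area equal to size**2:
    # w = size / sqrt(r), h = size * sqrt(r). Returns (x1, y1, x2, y2) boxes.
    boxes = []
    for r in aspect_ratios:
        w = size / np.sqrt(r)
        h = size * np.sqrt(r)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

A dataset with many small buildings would call for a smaller `base_size`, which is exactly the tuning question the paper studies.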

Author(s):  
Qian Wu ◽  
Jinan Gu ◽  
Chen Wu ◽  
Jin Li

Semantic segmentation classifies each pixel in an image, yielding pixel-level detection results that closely follow the contour of the target object. However, segmentation results produced by fully convolutional networks often lose detail. This paper proposes a CRF-FCN model based on CRF optimization. First, the original image is processed with feature pyramid networks to extract target-region information, which is used to train the higher-order potential function of a CRF. Then, the higher-order CRF is used as the back end of the fully convolutional network to optimize the semantic segmentation. Comparative experiments show that our algorithm renders target details more clearly and improves both the accuracy and efficiency of semantic segmentation.
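The CRF back end described above uses trained higher-order potentials. As a much simpler illustration of CRF-style refinement of FCN scores, here is a toy mean-field update with only a pairwise Potts-like term over 4-neighbours; the function names and the pairwise weight are hypothetical and this does not model the paper's higher-order terms:

```python
import numpy as np

def mean_field_refine(unary, n_iters=5, pairwise_weight=1.0):
    """unary: (H, W, C) per-pixel class scores from an FCN.
    Iteratively re-weights each pixel's label belief toward
    agreement with its 4-neighbours (a Potts-style smoothing)."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    q = softmax(unary)
    for _ in range(n_iters):
        # Message from neighbours: sum of their current label beliefs.
        msg = np.zeros_like(q)
        msg[1:] += q[:-1]
        msg[:-1] += q[1:]
        msg[:, 1:] += q[:, :-1]
        msg[:, :-1] += q[:, 1:]
        # Agreement with neighbours raises a label's score.
        q = softmax(unary + pairwise_weight * msg)
    return q.argmax(axis=-1)
```

On a noisy interior pixel surrounded by confident predictions, the update flips the outlier to match its neighbourhood, which is the intuition behind CRF post-processing.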


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Meijun Yang ◽  
Xiaoyan Xiao ◽  
Zhi Liu ◽  
Longkun Sun ◽  
Wei Guo ◽  
...  

Background. Echocardiography has become an essential technology for the diagnosis of cardiovascular diseases. Accurate classification of apical two-chamber (A2C), apical three-chamber (A3C), and apical four-chamber (A4C) views and precise detection of the left ventricle can significantly reduce the workload of clinicians and improve the reproducibility of left ventricle segmentation. In addition, left ventricle detection is important for three-dimensional reconstruction of the heart chambers. Method. RetinaNet is a one-stage object detection algorithm that achieves high accuracy and efficiency at the same time. It is composed mainly of a residual network (ResNet), a feature pyramid network (FPN), and two fully convolutional networks (FCNs): one for the classification task and the other for the bounding-box regression task. Results. In this paper, we use the classification subnetwork to classify A2C, A3C, and A4C images and the regression subnetwork to detect the left ventricle simultaneously. We display not only the position of the left ventricle on the test image but also the view category, which facilitates diagnosis. We use the mean intersection-over-union (mIOU) to measure left ventricle detection performance and accuracy to measure the classification of the three views. Our study shows that both the classification and detection results are noteworthy. The classification accuracies for A2C, A3C, and A4C are 1.000, 0.935, and 0.989, respectively. The mIOU values for A2C, A3C, and A4C are 0.858, 0.794, and 0.838, respectively.
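The mIOU figures above reduce to intersection-over-union computed on pairs of predicted and ground-truth boxes, averaged over a test set. A minimal sketch (helper names are illustrative, not from the paper):

```python
def box_iou(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mean_iou(pred_boxes, gt_boxes):
    # Average IoU over matched prediction/ground-truth pairs.
    return sum(box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)
```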


Author(s):  
S. Schmitz ◽  
M. Weinmann ◽  
A. Thiele

Abstract. Inspired by the application of state-of-the-art Fully Convolutional Networks (FCNs) to the semantic segmentation of high-resolution optical imagery, recent works have successfully transferred this methodology to pixel-wise land use and land cover (LULC) classification of PolSAR data. So far, mainly single PolSAR images have been included in FCN-based classification processes. To further increase classification accuracy, this paper presents an approach for integrating interferometric coherence derived from co-registered image pairs into an FCN-based classification framework. A network based on an encoder-decoder structure with two separate encoder branches is presented for this task: it extracts features from polarimetric backscattering intensities on the one hand and from interferometric coherence on the other. Based on a joint representation of the complementary features, pixel-wise classification is performed. To overcome the scarcity of labelled SAR data for training and testing, annotations are generated automatically by fusing available LULC products. Experimental evaluation is performed on high-resolution airborne SAR data captured over the German Wadden Sea. The results demonstrate that the proposed model produces smooth and accurate classification maps. A comparison with a single-branch FCN model indicates that appropriate integration of interferometric coherence improves classification performance.
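The two-branch idea can be sketched schematically: features are extracted from each input separately, concatenated into a joint representation, and classified per pixel. A toy sketch, assuming 1x1 convolutions (per-pixel matrix products) as stand-in encoder branches; the actual network is a full encoder-decoder and these names are illustrative:

```python
import numpy as np

def branch(x, w):
    # Stand-in for an encoder branch: a 1x1 convolution
    # (per-pixel matmul over channels) followed by ReLU.
    return np.maximum(x @ w, 0.0)

def two_branch_classify(pol, coh, w_pol, w_coh, w_cls):
    # Extract features separately from polarimetric intensities (pol)
    # and interferometric coherence (coh), concatenate along the
    # channel axis as the joint representation, then classify per pixel.
    f = np.concatenate([branch(pol, w_pol), branch(coh, w_coh)], axis=-1)
    return (f @ w_cls).argmax(axis=-1)
```

Keeping the branches separate lets each modality develop its own features before fusion, which is the design motivation stated in the abstract.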


IEEE Access ◽  
2020 ◽  
pp. 1-1
Author(s):  
Jeremy M. Webb ◽  
Duane D. Meixner ◽  
Shaheeda A. Adusei ◽  
Eric C. Polley ◽  
Mostafa Fatemi ◽  
...  

2021 ◽  
Vol 10 (8) ◽  
pp. 523
Author(s):  
Nicholus Mboga ◽  
Stefano D’Aronco ◽  
Tais Grippa ◽  
Charlotte Pelletier ◽  
Stefanos Georganos ◽  
...  

Multitemporal environmental and urban studies are essential to guide policy making and ultimately improve human wellbeing in the Global South. Land-cover products derived from historical aerial orthomosaics acquired decades ago can provide important evidence to inform long-term studies. To reduce the manual labelling effort by human experts and to scale to large, meaningful regions, we investigate in this study how domain adaptation techniques and deep learning can help to efficiently map land cover in Central Africa. We propose and evaluate a methodology based on unsupervised adaptation to reduce the cost of generating reference data for several cities and across different dates. We present the first application of domain adaptation based on fully convolutional networks for semantic segmentation of a dataset of historical panchromatic orthomosaics, generating land cover for the two focus cities Goma-Gisenyi and Bukavu. Our experimental evaluation shows that the domain adaptation methods can reach an overall accuracy between 60% and 70% for different regions. If a small amount of labelled data from the target domain is added as well, further performance gains can be achieved.
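The abstract does not spell out its adaptation method here. As one lightweight, commonly used illustration of narrowing the appearance gap between source and target panchromatic images (not necessarily the paper's technique), input-level histogram matching remaps source intensities so their empirical distribution follows the target's:

```python
import numpy as np

def histogram_match(source, target):
    # Remap source intensities so their empirical CDF follows the
    # target's; a simple input-level adaptation for single-band images.
    s_vals, s_counts = np.unique(source.ravel(), return_counts=True)
    t_vals, t_counts = np.unique(target.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / target.size
    # For each source quantile, look up the target value at that quantile.
    mapped = np.interp(s_cdf, t_cdf, t_vals)
    return np.interp(source.ravel(), s_vals, mapped).reshape(source.shape)
```

Feature-level methods (e.g. adversarial alignment inside the FCN) are the deeper alternative; this input-level variant is shown only because it is self-contained.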


2020 ◽  
Vol 10 (7) ◽  
pp. 2528 ◽  
Author(s):  
Lu Deng ◽  
Hong-Hu Chu ◽  
Peng Shi ◽  
Wei Wang ◽  
Xuan Kong

Cracks are often the most intuitive indicators for assessing the condition of in-service structures. Intelligent detection methods based on regular convolutional neural networks (CNNs) have been widely applied to crack detection in recent years; however, these methods exhibit unsatisfactory performance on the detection of out-of-plane cracks. To overcome this drawback, a new type of region-based CNN (R-CNN) crack detector with deformable modules is proposed in the present study. The core idea of the method is to replace the traditional regular convolution and pooling operations with a deformable convolution operation and a deformable pooling operation. The idea is implemented on three different regular detectors, namely Faster R-CNN, region-based fully convolutional networks (R-FCN), and feature pyramid network (FPN)-based Faster R-CNN. To examine the advantages of the proposed method, the results obtained from the proposed detector and the corresponding regular detectors are compared. The results show that the addition of deformable modules improves the mean average precisions (mAPs) achieved by Faster R-CNN, R-FCN, and FPN-based Faster R-CNN for crack detection. More importantly, adding deformable modules enables these detectors to detect the out-of-plane cracks that are difficult for regular detectors to find.
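The deformable convolution at the core of this method samples each kernel tap at a learned (dy, dx) offset from its regular grid position, using bilinear interpolation to handle fractional coordinates. A minimal single-output-position sketch for a 3x3 kernel (function names are illustrative; real implementations vectorize this over the whole feature map):

```python
import numpy as np

def bilinear(img, y, x):
    # Bilinearly interpolate img (H, W) at fractional location (y, x),
    # with zero padding outside the image.
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    dy, dx = y - y0, x - x0
    def px(yy, xx):
        return img[yy, xx] if 0 <= yy < h and 0 <= xx < w else 0.0
    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x1)
            + dy * (1 - dx) * px(y1, x0) + dy * dx * px(y1, x1))

def deformable_conv_at(img, kernel, cy, cx, offsets):
    # One output position of a 3x3 deformable convolution: each of the
    # nine taps samples at its regular grid location plus a learned offset.
    out, k = 0.0, 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1, j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out
```

With all offsets at zero this reduces to a regular convolution; learned non-zero offsets let the sampling grid bend along a crack, which is why the modules help with out-of-plane geometry.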


2020 ◽  
Vol 9 (10) ◽  
pp. 571
Author(s):  
Jinglun Li ◽  
Jiapeng Xiu ◽  
Zhengqiu Yang ◽  
Chen Liu

Semantic segmentation plays an important role in understanding the content of remote sensing images. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved effective for the semantic segmentation of remote sensing images. However, the rich information and complex content of such images make training segmentation networks challenging, and the available datasets are limited. In this paper, we propose a Convolutional Neural Network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other on channel information. The outputs of these two attention modules are then fused to further improve the network’s ability to extract features, contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small number of datasets and their uneven distribution. The proposed network was tested on the Gaofen Image Dataset (GID). The results show that the network outperformed U-Net, PSP-Net, and DeepLab V3+ in terms of mean IoU by 0.84%, 2.54%, and 1.32%, respectively.
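The two attention paths can be sketched with very simple pooling-based gates. The real DPA-Net modules are learned, so the sigmoid-of-mean gates and additive fusion below are illustrative simplifications, not the paper's architecture:

```python
import numpy as np

def channel_attention(f):
    # Gate each channel by a sigmoid of its global average response.
    w = 1.0 / (1.0 + np.exp(-f.mean(axis=(0, 1))))
    return f * w

def spatial_attention(f):
    # Gate each spatial position by a sigmoid of its mean activation
    # across channels.
    w = 1.0 / (1.0 + np.exp(-f.mean(axis=-1, keepdims=True)))
    return f * w

def dual_path(f):
    # Fuse the two attention paths by summation (one simple fusion choice).
    return channel_attention(f) + spatial_attention(f)
```

Because both paths preserve the feature map's shape, the fused module can be dropped into an existing segmentation network, which is the modularity the abstract emphasizes.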

