BUILDING SEGMENTATION FROM AIRBORNE VHR IMAGES USING MASK R-CNN

Author(s):  
K. Zhou ◽  
Y. Chen ◽  
I. Smal ◽  
R. Lindenbergh

Abstract. Up-to-date 3D building models are important for many applications. Airborne very high resolution (VHR) images, often acquired annually, offer an opportunity to create up-to-date 3D models. Building segmentation is often the first and most important step. Convolutional neural networks (CNNs) have drawn much attention for interpreting VHR images, as they can learn very effective features for complex scenes. This paper employs Mask R-CNN to address two problems in building segmentation: detecting buildings at different scales and segmenting buildings with accurately delineated edges. Mask R-CNN starts from a feature pyramid network (FPN) to create semantically rich features at different scales. The FPN is integrated with a region proposal network (RPN) to generate object proposals at various scales from the corresponding optimal scale of features. Features carrying high- and low-level information are further used for better classification of small objects and for mask prediction at edges. The method is tested on the ISPRS benchmark dataset by comparing results with fully convolutional networks (FCNs), which merge high- and low-level features through a skip layer to create a single feature map for semantic segmentation. The results show that Mask R-CNN outperforms FCN by around 15% in detecting objects, especially small objects. Moreover, Mask R-CNN performs much better in edge regions than FCN. The results also show that the range of anchor scales in Mask R-CNN is a critical factor in segmenting objects of different scales. This paper provides insight into how a good anchor scale should be chosen for different datasets.
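The anchor-scale choice highlighted in the abstract can be made concrete. A minimal sketch (not the paper's code) of the standard FPN anchor convention, where the anchor edge length doubles at each coarser pyramid level and each aspect ratio preserves the anchor area; the base size of 32 pixels and the ratios (0.5, 1, 2) are common defaults, not values taken from the paper:

```python
import numpy as np

def fpn_anchor_sizes(base_size=32, num_levels=5):
    # Edge length doubles at each coarser pyramid level
    # (e.g. 32, 64, 128, 256, 512 pixels for levels P2-P6).
    return [base_size * 2 ** i for i in range(num_levels)]

def anchors_at(cx, cy, size, aspect_ratios=(0.5, 1.0, 2.0)):
    # For each aspect ratio r = h/w, keep the anchor area equal to size**2:
    # w = size / sqrt(r), h = size * sqrt(r). Returns (x1, y1, x2, y2) boxes.
    boxes = []
    for r in aspect_ratios:
        w = size / np.sqrt(r)
        h = size * np.sqrt(r)
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

A dataset with many small buildings would call for a smaller `base_size`, which is exactly the tuning question the paper studies.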

Author(s):  
Qian Wu ◽  
Jinan Gu ◽  
Chen Wu ◽  
Jin Li

Semantic segmentation classifies each pixel in an image, yielding pixel-level detection results that closely follow the contour of the target object. However, segmentation results produced by fully convolutional networks often lose detail. This paper proposes a CRF-FCN model based on CRF optimization. First, the original image is processed with feature pyramid networks to extract target-region information, which is used to train the higher-order potential function of a CRF. Then, the higher-order CRF is used as the back end of the fully convolutional network to optimize the semantic segmentation. Comparative experiments show that our algorithm renders target details more clearly and improves both the accuracy and efficiency of semantic segmentation.
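The CRF back end described above uses trained higher-order potentials. As a much simpler illustration of CRF-style refinement of FCN scores, here is a toy mean-field update with only a pairwise Potts-like term over 4-neighbours; the function names and the pairwise weight are hypothetical and this does not model the paper's higher-order terms:

```python
import numpy as np

def mean_field_refine(unary, n_iters=5, pairwise_weight=1.0):
    """unary: (H, W, C) per-pixel class scores from an FCN.
    Iteratively re-weights each pixel's label belief toward
    agreement with its 4-neighbours (a Potts-style smoothing)."""
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    q = softmax(unary)
    for _ in range(n_iters):
        # Message from neighbours: sum of their current label beliefs.
        msg = np.zeros_like(q)
        msg[1:] += q[:-1]
        msg[:-1] += q[1:]
        msg[:, 1:] += q[:, :-1]
        msg[:, :-1] += q[:, 1:]
        # Agreement with neighbours raises a label's score.
        q = softmax(unary + pairwise_weight * msg)
    return q.argmax(axis=-1)
```

On a noisy interior pixel surrounded by confident predictions, the update flips the outlier to match its neighbourhood, which is the intuition behind CRF post-processing.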


2020 ◽  
Vol 2020 ◽  
pp. 1-6 ◽  
Author(s):  
Meijun Yang ◽  
Xiaoyan Xiao ◽  
Zhi Liu ◽  
Longkun Sun ◽  
Wei Guo ◽  
...  

Background. Echocardiography has become an essential technology for the diagnosis of cardiovascular diseases. Accurate classification of apical two-chamber (A2C), apical three-chamber (A3C), and apical four-chamber (A4C) views and precise detection of the left ventricle can significantly reduce the workload of clinicians and improve the reproducibility of left ventricle segmentation. In addition, left ventricle detection is important for three-dimensional reconstruction of the heart chambers. Method. RetinaNet is a one-stage object detection algorithm that achieves high accuracy and efficiency at the same time. It is composed mainly of a residual network (ResNet), a feature pyramid network (FPN), and two fully convolutional networks (FCNs): one for the classification task and the other for the bounding-box regression task. Results. In this paper, we use the classification subnetwork to classify A2C, A3C, and A4C images and the regression subnetwork to detect the left ventricle simultaneously. We display not only the position of the left ventricle on the test image but also the view category, which facilitates diagnosis. We use the mean intersection-over-union (mIOU) to measure left ventricle detection performance and accuracy to measure the classification of the three views. Our study shows that both the classification and detection results are noteworthy. The classification accuracies for A2C, A3C, and A4C are 1.000, 0.935, and 0.989, respectively. The mIOU values for A2C, A3C, and A4C are 0.858, 0.794, and 0.838, respectively.
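The mIOU figures above reduce to intersection-over-union computed on pairs of predicted and ground-truth boxes, averaged over a test set. A minimal sketch (helper names are illustrative, not from the paper):

```python
def box_iou(a, b):
    # a, b: (x1, y1, x2, y2) axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mean_iou(pred_boxes, gt_boxes):
    # Average IoU over matched prediction/ground-truth pairs.
    return sum(box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)) / len(gt_boxes)
```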


Author(s):  
S. Schmitz ◽  
M. Weinmann ◽  
A. Thiele

Abstract. Inspired by the application of state-of-the-art Fully Convolutional Networks (FCNs) to the semantic segmentation of high-resolution optical imagery, recent works have successfully transferred this methodology to pixel-wise land use and land cover (LULC) classification of PolSAR data. So far, mainly single PolSAR images have been included in FCN-based classification processes. To further increase classification accuracy, this paper presents an approach for integrating interferometric coherence derived from co-registered image pairs into an FCN-based classification framework. A network based on an encoder-decoder structure with two separate encoder branches is presented for this task: it extracts features from polarimetric backscattering intensities on the one hand and from interferometric coherence on the other. Based on a joint representation of the complementary features, pixel-wise classification is performed. To overcome the scarcity of labelled SAR data for training and testing, annotations are generated automatically by fusing available LULC products. Experimental evaluation is performed on high-resolution airborne SAR data captured over the German Wadden Sea. The results demonstrate that the proposed model produces smooth and accurate classification maps. A comparison with a single-branch FCN model indicates that appropriate integration of interferometric coherence improves classification performance.
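The two-branch idea can be sketched schematically: features are extracted from each input separately, concatenated into a joint representation, and classified per pixel. A toy sketch, assuming 1x1 convolutions (per-pixel matrix products) as stand-in encoder branches; the actual network is a full encoder-decoder and these names are illustrative:

```python
import numpy as np

def branch(x, w):
    # Stand-in for an encoder branch: a 1x1 convolution
    # (per-pixel matmul over channels) followed by ReLU.
    return np.maximum(x @ w, 0.0)

def two_branch_classify(pol, coh, w_pol, w_coh, w_cls):
    # Extract features separately from polarimetric intensities (pol)
    # and interferometric coherence (coh), concatenate along the
    # channel axis as the joint representation, then classify per pixel.
    f = np.concatenate([branch(pol, w_pol), branch(coh, w_coh)], axis=-1)
    return (f @ w_cls).argmax(axis=-1)
```

Keeping the branches separate lets each modality develop its own features before fusion, which is the design motivation stated in the abstract.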


IEEE Access ◽  
2020 ◽  
pp. 1-1
Author(s):  
Jeremy M. Webb ◽  
Duane D. Meixner ◽  
Shaheeda A. Adusei ◽  
Eric C. Polley ◽  
Mostafa Fatemi ◽  
...  

2021 ◽  
Vol 10 (8) ◽  
pp. 523
Author(s):  
Nicholus Mboga ◽  
Stefano D’Aronco ◽  
Tais Grippa ◽  
Charlotte Pelletier ◽  
Stefanos Georganos ◽  
...  

Multitemporal environmental and urban studies are essential to guide policy making and ultimately improve human wellbeing in the Global South. Land-cover products derived from historical aerial orthomosaics acquired decades ago can provide important evidence to inform long-term studies. To reduce the manual labelling effort by human experts and to scale to large, meaningful regions, we investigate in this study how domain adaptation techniques and deep learning can help to efficiently map land cover in Central Africa. We propose and evaluate a methodology based on unsupervised adaptation to reduce the cost of generating reference data for several cities and across different dates. We present the first application of domain adaptation based on fully convolutional networks for semantic segmentation of a dataset of historical panchromatic orthomosaics, generating land cover for the two focus cities Goma-Gisenyi and Bukavu. Our experimental evaluation shows that the domain adaptation methods can reach an overall accuracy between 60% and 70% for different regions. If a small amount of labelled data from the target domain is added as well, further performance gains can be achieved.
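The abstract does not spell out its adaptation method here. As one lightweight, commonly used illustration of narrowing the appearance gap between source and target panchromatic images (not necessarily the paper's technique), input-level histogram matching remaps source intensities so their empirical distribution follows the target's:

```python
import numpy as np

def histogram_match(source, target):
    # Remap source intensities so their empirical CDF follows the
    # target's; a simple input-level adaptation for single-band images.
    s_vals, s_counts = np.unique(source.ravel(), return_counts=True)
    t_vals, t_counts = np.unique(target.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    t_cdf = np.cumsum(t_counts) / target.size
    # For each source quantile, look up the target value at that quantile.
    mapped = np.interp(s_cdf, t_cdf, t_vals)
    return np.interp(source.ravel(), s_vals, mapped).reshape(source.shape)
```

Feature-level methods (e.g. adversarial alignment inside the FCN) are the deeper alternative; this input-level variant is shown only because it is self-contained.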


2020 ◽  
Vol 10 (7) ◽  
pp. 2528 ◽  
Author(s):  
Lu Deng ◽  
Hong-Hu Chu ◽  
Peng Shi ◽  
Wei Wang ◽  
Xuan Kong

Cracks are often the most intuitive indicators for assessing the condition of in-service structures. Intelligent detection methods based on regular convolutional neural networks (CNNs) have been widely applied to crack detection in recent years; however, these methods exhibit unsatisfactory performance on the detection of out-of-plane cracks. To overcome this drawback, a new type of region-based CNN (R-CNN) crack detector with deformable modules is proposed in the present study. The core idea of the method is to replace the traditional regular convolution and pooling operations with a deformable convolution operation and a deformable pooling operation. The idea is implemented on three different regular detectors, namely Faster R-CNN, region-based fully convolutional networks (R-FCN), and feature pyramid network (FPN)-based Faster R-CNN. To examine the advantages of the proposed method, the results obtained from the proposed detector and the corresponding regular detectors are compared. The results show that the addition of deformable modules improves the mean average precisions (mAPs) achieved by Faster R-CNN, R-FCN, and FPN-based Faster R-CNN for crack detection. More importantly, adding deformable modules enables these detectors to detect the out-of-plane cracks that are difficult for regular detectors to find.
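The deformable convolution at the core of this method samples each kernel tap at a learned (dy, dx) offset from its regular grid position, using bilinear interpolation to handle fractional coordinates. A minimal single-output-position sketch for a 3x3 kernel (function names are illustrative; real implementations vectorize this over the whole feature map):

```python
import numpy as np

def bilinear(img, y, x):
    # Bilinearly interpolate img (H, W) at fractional location (y, x),
    # with zero padding outside the image.
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    dy, dx = y - y0, x - x0
    def px(yy, xx):
        return img[yy, xx] if 0 <= yy < h and 0 <= xx < w else 0.0
    return ((1 - dy) * (1 - dx) * px(y0, x0) + (1 - dy) * dx * px(y0, x1)
            + dy * (1 - dx) * px(y1, x0) + dy * dx * px(y1, x1))

def deformable_conv_at(img, kernel, cy, cx, offsets):
    # One output position of a 3x3 deformable convolution: each of the
    # nine taps samples at its regular grid location plus a learned offset.
    out, k = 0.0, 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1, j + 1] * bilinear(img, cy + i + dy, cx + j + dx)
            k += 1
    return out
```

With all offsets at zero this reduces to a regular convolution; learned non-zero offsets let the sampling grid bend along a crack, which is why the modules help with out-of-plane geometry.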


2020 ◽  
Vol 9 (10) ◽  
pp. 571
Author(s):  
Jinglun Li ◽  
Jiapeng Xiu ◽  
Zhengqiu Yang ◽  
Chen Liu

Semantic segmentation plays an important role in understanding the content of remote sensing images. In recent years, deep learning methods based on Fully Convolutional Networks (FCNs) have proved effective for the semantic segmentation of remote sensing images. However, the rich information and complex content of such images make training segmentation networks challenging, and the available datasets are limited. In this paper, we propose a Convolutional Neural Network (CNN) model called Dual Path Attention Network (DPA-Net) that has a simple modular structure and can be added to any segmentation model to enhance its ability to learn features. Two types of attention module are appended to the segmentation model, one focusing on spatial information and the other on channel information. The outputs of these two attention modules are then fused to further improve the network’s ability to extract features, contributing to more precise segmentation results. Finally, data pre-processing and augmentation strategies are used to compensate for the small number of datasets and their uneven distribution. The proposed network was tested on the Gaofen Image Dataset (GID). The results show that the network outperformed U-Net, PSP-Net, and DeepLab V3+ in terms of mean IoU by 0.84%, 2.54%, and 1.32%, respectively.
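The two attention paths can be sketched with very simple pooling-based gates. The real DPA-Net modules are learned, so the sigmoid-of-mean gates and additive fusion below are illustrative simplifications, not the paper's architecture:

```python
import numpy as np

def channel_attention(f):
    # Gate each channel by a sigmoid of its global average response.
    w = 1.0 / (1.0 + np.exp(-f.mean(axis=(0, 1))))
    return f * w

def spatial_attention(f):
    # Gate each spatial position by a sigmoid of its mean activation
    # across channels.
    w = 1.0 / (1.0 + np.exp(-f.mean(axis=-1, keepdims=True)))
    return f * w

def dual_path(f):
    # Fuse the two attention paths by summation (one simple fusion choice).
    return channel_attention(f) + spatial_attention(f)
```

Because both paths preserve the feature map's shape, the fused module can be dropped into an existing segmentation network, which is the modularity the abstract emphasizes.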

