Multi‐scale attention encoder for street‐to‐aerial image geo‐localization

Author(s):  
Songlian Li ◽  
Zhigang Tu ◽  
Yujin Chen ◽  
Tan Yu
2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from aerial images. However, shape details are often lost during down-sampling, which results in discontinuous segmentation or inaccurate segmentation boundaries. To compensate for this loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were learned jointly with the building segmentation task in our proposed network. Meanwhile, two consistency-constraint losses were designed on top of the multi-task network to exploit the duality between the mask prediction and the two shape-related predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on these multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, the segmentation mask, and the boundary. Two inter-task consistency-loss functions were constructed to ensure consistency between distance maps and masks, and between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over recent state-of-the-art models.
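To illustrate the mask–boundary duality that such inter-task consistency losses exploit, a minimal NumPy sketch follows. This is not the authors' implementation; the 4-neighbour boundary rule and the L1 penalty are simplifying assumptions made here for illustration.

```python
import numpy as np

def mask_to_boundary(mask):
    """Derive a boundary map from a binary mask: a pixel is boundary
    if it belongs to the mask and any 4-neighbour does not."""
    m = mask.astype(bool)
    interior = (np.roll(m, 1, 0) & np.roll(m, -1, 0)
                & np.roll(m, 1, 1) & np.roll(m, -1, 1))
    return (m & ~interior).astype(np.float32)

def consistency_loss(pred_mask, pred_boundary):
    """L1 disagreement between the boundary implied by the predicted
    mask and the directly predicted boundary map."""
    implied = mask_to_boundary(pred_mask > 0.5)
    return float(np.abs(implied - pred_boundary).mean())
```

When the two task heads agree (the predicted boundary is exactly the mask's outline), the loss is zero; any divergence is penalized, which is the duality the abstract describes.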


2015 ◽  
Author(s):  
Shuo Huang ◽  
Siying Chen ◽  
Yinchao Zhang ◽  
Pan Guo ◽  
He Chen

2019 ◽  
Vol 11 (19) ◽  
pp. 2219 ◽  
Author(s):  
Fatemeh Alidoost ◽  
Hossein Arefi ◽  
Federico Tombari

In this study, a deep learning (DL)-based approach is proposed for the detection and reconstruction of buildings from a single aerial image. The knowledge required to reconstruct the 3D shapes of buildings, including the height data as well as the linear elements of individual roofs, is derived from the RGB image using an optimized multi-scale convolutional–deconvolutional network (MSCDN). The proposed network is composed of two feature-extraction levels that first predict coarse features and then automatically refine them. The predicted features include the normalized digital surface models (nDSMs) and the linear elements of roofs in three classes: eave, ridge, and hip lines. The prismatic models of buildings are then generated by analyzing the eave lines. The parametric models of individual roofs are also reconstructed using the predicted ridge and hip lines. The experiments show that, even in the presence of noise in the height values, the proposed method performs well in the 3D reconstruction of buildings of different shapes and complexities. The average root mean square error (RMSE) and normalized median absolute deviation (NMAD) are about 3.43 m and 1.13 m, respectively, for the predicted nDSM. Moreover, the quality of the extracted linear elements is about 91.31% and 83.69% for the Potsdam and Zeebrugge test data, respectively. Unlike state-of-the-art methods, the proposed approach does not need any additional or auxiliary data and employs a single image to reconstruct the 3D models of buildings, with a competitive precision of about 1.2 m and 0.8 m for the horizontal and vertical RMSEs over the Potsdam data and about 3.9 m and 2.4 m over the Zeebrugge test data.
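The two height-accuracy metrics quoted above can be computed as follows; this is a generic sketch of the standard definitions (the 1.4826 factor scales the median absolute deviation to match a normal distribution's standard deviation), not code from the paper.

```python
import numpy as np

def rmse(pred, ref):
    """Root mean square error between predicted and reference heights."""
    e = pred - ref
    return float(np.sqrt(np.mean(e ** 2)))

def nmad(pred, ref):
    """Normalized median absolute deviation: 1.4826 * MAD of the
    errors, a robust accuracy measure insensitive to outliers."""
    e = pred - ref
    return float(1.4826 * np.median(np.abs(e - np.median(e))))
```

NMAD is commonly preferred over RMSE for DSM evaluation because a few gross height blunders inflate RMSE but barely move the median-based NMAD, which matches the gap between the 3.43 m RMSE and 1.13 m NMAD reported here.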


2020 ◽  
Vol 1659 ◽  
pp. 012003
Author(s):  
Yaocheng Li ◽  
Weidong Zhang ◽  
Yingming Cai ◽  
Zhe Li ◽  
Xiuchen Jiang

Author(s):  
F. Alidoost ◽  
H. Arefi ◽  
F. Tombari

Abstract. Automatic detection and extraction of buildings from aerial images are considerable challenges in many applications, including disaster management, navigation, urbanization monitoring, emergency response, and 3D city mapping and reconstruction. The most important problem, however, is to precisely localize buildings from single aerial images when there is no additional information such as LiDAR point-cloud data or high-resolution Digital Surface Models (DSMs). In this paper, a Deep Learning (DL)-based approach is proposed to localize buildings, estimate their relative height information, and extract the buildings’ boundaries using a single aerial image. To detect buildings and extract their bounding boxes, a Fully Connected Convolutional Neural Network (FC-CNN) is trained to classify building and non-building objects. We also introduce a novel Multi-Scale Convolutional-Deconvolutional Network (MS-CDN), including skip-connection layers, to predict normalized DSMs (nDSMs) from a single image. The extracted bounding boxes as well as the predicted nDSMs are then employed by an Active Contour Model (ACM) to provide precise building boundaries. The experiments show that, even with noise in the predicted nDSMs, the proposed method performs well on single aerial images with different building shapes. The quality rate for building detection is about 86%, and the RMSE for nDSM prediction is about 4 m. The accuracy of boundary extraction is about 68%. Since the proposed framework relies on a single image, it could be employed for real-time applications.
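The "quality rate" reported for building detection is commonly defined as TP / (TP + FP + FN) over IoU-matched detections; the sketch below assumes that standard definition and a 0.5 IoU threshold, neither of which is stated in the abstract itself.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def quality_rate(pred_boxes, gt_boxes, thr=0.5):
    """Quality = TP / (TP + FP + FN): a ground-truth box counts as TP
    if some prediction overlaps it with IoU >= thr; unmatched
    predictions are FP, unmatched ground truths are FN."""
    tp = sum(1 for g in gt_boxes
             if any(iou(p, g) >= thr for p in pred_boxes))
    fp = sum(1 for p in pred_boxes
             if all(iou(p, g) < thr for g in gt_boxes))
    fn = len(gt_boxes) - tp
    total = tp + fp + fn
    return tp / total if total else 1.0
```

Unlike precision or recall alone, this single figure penalizes both missed buildings and false alarms, which is why it is a common summary metric in building-extraction evaluations.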


Author(s):  
X. Zhuo ◽  
F. Kurz ◽  
P. Reinartz

Manned aircraft have long been used for capturing large-scale aerial images, yet high costs and weather dependence restrict their availability in emergency situations. In recent years, the MAV (Micro Aerial Vehicle) has emerged as a novel modality for aerial image acquisition. Its maneuverability and flexibility enable rapid awareness of the scene of interest. Since these two platforms deliver scene information at different scales and from different views, it makes sense to fuse these two types of complementary imagery to achieve a quick, accurate and detailed description of the scene, which is the main concern of real-time situation awareness. This paper proposes a method to fuse multi-view and multi-scale aerial imagery by establishing a common reference frame. In particular, common features between MAV images and geo-referenced airplane images can be extracted by a scale-invariant feature detector such as SIFT. From the tie points of the geo-referenced images we derive the coordinates of the corresponding ground points, which are then utilized as ground control points in a global bundle adjustment of the MAV images. In this way, the MAV block is aligned to the reference frame. Experimental results show that this method can achieve fully automatic geo-referencing of MAV images even if GPS/IMU acquisition has dropouts, and that the orientation accuracy is improved compared to GPS/IMU-based geo-referencing. The concept for a subsequent 3D classification method is also described in this paper.
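The step of turning a tie point in the geo-referenced airplane image into a ground control point amounts to applying the image's geo-transform. A minimal sketch, assuming a GDAL-style six-parameter affine transform (origin, pixel sizes, rotation terms) and ignoring terrain height, which the full pipeline would take from a DSM:

```python
def pixel_to_ground(col, row, gt):
    """Map pixel coordinates (col, row) in a geo-referenced image to
    planimetric ground coordinates using a GDAL-style geo-transform
    gt = (x0, px_w, rot_x, y0, rot_y, px_h)."""
    x0, px_w, rot_x, y0, rot_y, px_h = gt
    X = x0 + col * px_w + row * rot_x
    Y = y0 + col * rot_y + row * px_h
    return X, Y
```

Ground coordinates obtained this way for SIFT matches between the two image sets can then be fed as ground control points into the bundle adjustment of the MAV block, tying it to the airplane imagery's reference frame.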


2021 ◽  
Vol 11 (11) ◽  
pp. 5069
Author(s):  
Hao Bai ◽  
Tingzhu Bai ◽  
Wei Li ◽  
Xun Liu

Building segmentation is widely used in urban planning, disaster prevention, human-flow monitoring and environmental monitoring. However, due to complex landscapes and high-density settlements, automatically characterizing buildings in urban villages or cities using remote sensing images is very challenging. Inspired by recent deep learning methods, this paper proposes a novel end-to-end building segmentation network for segmenting buildings from remote sensing images. The network includes two branches: one branch uses a Widely Adaptive Spatial Pyramid (WASP) structure to extract multi-scale features, and the other branch uses a deep residual network combined with a sub-pixel up-sampling structure to enhance the detail of building boundaries. We compared our proposed method with three state-of-the-art networks: DeepLabv3+, ENet and ESPNet. Experiments were performed on the publicly available Inria Aerial Image Labelling dataset (Inria aerial dataset) and the Satellite dataset II (East Asia). The results showed that our method outperformed the other networks, with Pixel Accuracy reaching 0.8421 and 0.8738, respectively, and mIoU reaching 0.9034 and 0.8936, respectively. Compared with the baseline network, this is an improvement of about 25% or more. The method extracts not only building footprints but also, in particular, small building objects.
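The sub-pixel up-sampling in the boundary branch is conventionally a pixel-shuffle rearrangement: a convolution produces r*r times the target channel count, and those channels are interleaved into an r-times-larger spatial grid. A NumPy sketch of that rearrangement (the standard operation, not this paper's specific layer):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r): each group
    of r*r channels fills one r-by-r block of the up-sampled output."""
    c, h, w = x.shape
    assert c % (r * r) == 0, "channels must be divisible by r*r"
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(out_c, h * r, w * r)
```

Because the up-sampling weights live in the preceding convolution rather than in a fixed interpolation kernel, the network can learn to place sharp boundary detail, which is why this structure suits the detail-enhancement branch described above.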


Author(s):  
Shiyu Hu ◽  
Qian Ning ◽  
Bingcai Chen ◽  
Yinjie Lei ◽  
Xinzhi Zhou ◽  
...  
