GSCA-UNet: Towards Automatic Shadow Detection in Urban Aerial Imagery with Global-Spatial-Context Attention Module

2020, Vol 12 (17), pp. 2864
Author(s):
Yuwei Jin
Wenbo Xu
Zhongwen Hu
Haitao Jia
Xin Luo
...  

As an inevitable phenomenon in most optical remote-sensing images, shadows are especially prominent in urban scenes. Shadow detection is critical for exploiting shadows and recovering the distorted information. Unfortunately, automatic shadow detection methods for urban aerial images generally cannot achieve satisfactory performance, due to the limitation of feature patterns and the lack of consideration of non-local contextual information. To address this challenging problem, this paper develops a global-spatial-context-attention (GSCA) module that self-adaptively aggregates global contextual information over the spatial dimension for each pixel. The GSCA module was embedded into a modified U-shaped encoder–decoder network derived from UNet to output the final shadow predictions. The network was trained on a newly created shadow detection dataset, and the binary cross-entropy (BCE) loss function was modified to enhance the training procedure. The performance of the proposed method was evaluated on several typical urban aerial images. Experimental results suggest that the proposed method achieves a better trade-off between automaticity and accuracy: its F1-score, overall accuracy, balanced-error-rate, and intersection-over-union metrics were higher than those of other state-of-the-art shadow detection methods.
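The core idea of such a spatial attention module can be sketched in a few lines of NumPy: every pixel attends to every spatial position and aggregates a weighted sum of their features. This is a minimal illustration of global spatial context aggregation, not the paper's exact GSCA architecture (which sits inside a trained encoder–decoder network):

```python
import numpy as np

def global_spatial_context_attention(feat):
    """Toy global-spatial-context attention over an (H, W, C) feature map.

    Each pixel attends to every spatial position: attention weights are a
    softmax over query-key dot products, and the output is the weighted sum
    of feature vectors added back to the input (residual connection).
    """
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)                   # flatten spatial dims: (N, C)
    scores = x @ x.T / np.sqrt(c)                # pairwise similarity: (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over all positions
    out = attn @ x                               # aggregate global context per pixel
    return (x + out).reshape(h, w, c)            # residual output, same shape
```

In a real network the queries, keys, and values would come from learned projections; here the raw features play all three roles to keep the aggregation step visible.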

Author(s):  
Suhaib Musleh ◽  
Muhammad Sarfraz ◽  
Hazem Raafat

Shadows occur very frequently in digital images considered for various important applications. Shadow is a source of noise and can cause false image colors, loss of information, and false image segmentation; it is therefore necessary to detect and remove shadows from images. This chapter addresses the problem of shadow detection in high-resolution aerial images. It presents the main concepts needed to introduce the subject, the knowledge units that give the reader a better understanding of shadow detection and a basis for further research. Additionally, an overview of various shadow detection methods is provided together with a detailed comparative study. The results of these methods are also discussed extensively by investigating the main features they use to detect shadows accurately.


2019, Vol 9 (6), pp. 1128
Author(s):
Yundong Li
Wei Hu
Han Dong
Xueyan Zhang

Using aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge volumes of aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on numerous labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, highlighting the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized with the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy can improve overall accuracy by 10% compared with an SSD trained from scratch, and that the data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment was further verified on another dataset, of Hurricane Irma, and it is concluded that the proposed method is feasible.
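The listed augmentation strategies can be sketched with plain NumPy (Gaussian blur, which typically relies on scipy.ndimage or OpenCV, is omitted); the noise standard deviation of 0.05 here is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def augment(image, rng):
    """Generate augmented copies of one image (values assumed in [0, 1]):
    mirroring, rotation, and additive Gaussian noise."""
    copies = [
        np.fliplr(image),                 # horizontal mirror
        np.flipud(image),                 # vertical mirror
        np.rot90(image, k=1),             # 90-degree rotation
        np.rot90(image, k=2),             # 180-degree rotation
        np.clip(image + rng.normal(0.0, 0.05, image.shape), 0.0, 1.0),  # noise
    ]
    return copies
```

Each call thus turns one labeled sample into five additional training samples; for detection tasks the bounding-box labels would need the matching geometric transforms as well.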


Sensors, 2021, Vol 21 (8), pp. 2803
Author(s):
Rabeea Jaffari
Manzoor Ahmed Hashmani
Constantino Carlos Reyes-Aldasoro

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite the advances in deep learning-based approaches for PL segmentation, these models are still vulnerable to the class imbalance present in the data: the PLs occupy only a minimal portion (1–5%) of an aerial image, as compared to the background region (95–99%). Generally, this class imbalance problem is addressed via PL-specific detectors in conjunction with the popular class-balanced cross-entropy (BBCE) loss function. However, PL-specific detectors do not work outside their application areas, and the BBCE loss requires non-trivial hyperparameter tuning of the class-wise weights. Moreover, the BBCE loss results in low dice scores and precision values and thus fails to achieve an optimal trade-off between dice scores, model accuracy, and precision–recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC), or Phi coefficient, to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function on the vanilla U-Net model extended with an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. The evaluation on two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the aerial image area, respectively, reveals that our proposed loss function outperforms the popular BBCE loss by 16% in PL dice scores on both datasets, by 19% in precision and false detection rate (FDR) values for the Mendeley PL dataset, and by 15% in precision and FDR values for the PLDU, with a minor degradation in the accuracy and recall values. Moreover, our proposed ACU-Net outperforms the baseline vanilla U-Net by 1–10% on the characteristic evaluation parameters for both PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off for the characteristic evaluation parameters without any bells and whistles. Our code is available on GitHub.
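The idea of an MCC-based segmentation loss can be sketched by computing soft confusion-matrix entries from predicted probabilities, so that the coefficient stays differentiable; the sketch below shows plain 1 − MCC and omits the paper's focal weighting:

```python
import numpy as np

def mcc_loss(probs, labels, eps=1e-7):
    """Soft Matthews-correlation-coefficient (Phi) loss for binary
    segmentation. Confusion-matrix entries are accumulated from the
    predicted probabilities rather than hard decisions, and 1 - MCC is
    minimized; eps guards against a zero denominator."""
    p, y = probs.ravel(), labels.ravel()
    tp = np.sum(p * y)                 # soft true positives
    fp = np.sum(p * (1.0 - y))         # soft false positives
    fn = np.sum((1.0 - p) * y)         # soft false negatives
    tn = np.sum((1.0 - p) * (1.0 - y)) # soft true negatives
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn) + eps)
    return 1.0 - mcc
```

Because MCC balances all four confusion-matrix entries, the loss needs no class-wise weights even when foreground pixels are only 1–2% of the image, which is the property the abstract contrasts with the BBCE loss.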


Author(s):  
Yi-Ta Hsieh
Shou-Tsung Wu
Chaur-Tzuhn Chen
Jan-Chang Chen

The shadows in optical remote sensing images are regarded as image nuisances in numerous applications. The classification and interpretation of shadow areas in a remote sensing image are a challenge because of the reduction or total loss of spectral information in those areas. In recent years, airborne multispectral aerial imaging devices such as the Leica ADS-40 and Intergraph DMC have been developed to deliver data with 12-bit or higher radiometric resolution. The increased radiometric resolution of digital imagery provides more radiometric detail of potential use in the classification or interpretation of land cover in shadow areas. Therefore, the objectives of this study are to analyze the spectral properties of land cover in shadow areas using high-radiometric-resolution ADS-40 aerial images, and to investigate the spectral and vegetation index differences between various shadowed and non-shadowed land covers. The spectral analysis of the ADS-40 images yielded the following findings: (i) the DN values in shadow areas are much lower than those in non-shadow areas; (ii) DN values received from shadowed areas are also affected by the underlying land cover, which suggests the possibility of retrieving land cover properties as in non-shadow areas; (iii) DN values received from shadowed regions decrease in the visible bands from short to long wavelengths due to scattering; (iv) vegetation in shadow areas still shows a strong NIR reflection; and (v) in general, the normalized difference vegetation index (NDVI) retains its utility for separating vegetation from non-vegetation in shadow areas. The spectral data of high-radiometric-resolution images such as the ADS-40 thus show potential for extracting land cover information from shadow areas.
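Finding (v) relies on the standard NDVI formula, NDVI = (NIR − Red)/(NIR + Red), which can be computed per pixel as follows (the small eps guard is an implementation convenience, not part of the index definition):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red).
    Values near +1 indicate vegetation. Because shadowed vegetation keeps a
    relatively strong NIR reflection, the ratio form of the index remains
    usable for vegetation/non-vegetation separation in shadow areas."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)
```

The ratio form is what makes NDVI robust in shadow: both bands are attenuated, but their normalized difference is preserved better than the raw DN values.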


2020, Vol 12 (9), pp. 1435
Author(s):
Chengyuan Li
Bin Luo
Hailong Hong
Xin Su
Yajun Wang
...  

Unlike object detection in natural images, optical remote sensing object detection is a challenging task due to diverse meteorological conditions, complex backgrounds, varied orientations, scale variations, etc. In this paper, to address this issue, we propose a novel object detection network (the global-local saliency constraint network, GLS-Net) that makes full use of global semantic information and achieves more accurate oriented bounding boxes. More precisely, to improve the quality of the region proposals and bounding boxes, we first propose a saliency pyramid, which combines a saliency algorithm with a feature pyramid network to reduce the impact of complex backgrounds. Based on the saliency pyramid, we then propose a global attention module branch to enhance the semantic connection between the target and the global scenario. A fast feature fusion strategy is also used to combine the local object information from the saliency pyramid with the global semantic information optimized by the attention mechanism. Finally, we use an angle-sensitive intersection over union (IoU) method to obtain a more accurate five-parameter representation of the oriented bounding boxes. Experiments with a publicly available object detection dataset for aerial images demonstrate that the proposed GLS-Net achieves state-of-the-art detection performance.
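An angle-sensitive IoU for the five-parameter representation (cx, cy, w, h, θ) must account for box orientation; one standard way to do this exactly is to clip one box's polygon against the other (Sutherland–Hodgman) and measure areas with the shoelace formula. The sketch below illustrates that generic rotated-box IoU, not necessarily the paper's specific formulation:

```python
import numpy as np

def box_corners(cx, cy, w, h, theta):
    """Corners of a five-parameter oriented box (angle in radians), CCW order."""
    c, s = np.cos(theta), np.sin(theta)
    local = np.array([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
    rot = np.array([[c, -s], [s, c]])
    return local @ rot.T + np.array([cx, cy])

def polygon_area(pts):
    """Shoelace formula."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def clip(subject, a, b):
    """Keep the part of `subject` left of directed edge a->b (one
    Sutherland-Hodgman step, valid for CCW clip polygons)."""
    out = []
    n = len(subject)
    for i in range(n):
        p, q = subject[i], subject[(i + 1) % n]
        side_p = (b[0]-a[0])*(p[1]-a[1]) - (b[1]-a[1])*(p[0]-a[0])
        side_q = (b[0]-a[0])*(q[1]-a[1]) - (b[1]-a[1])*(q[0]-a[0])
        if side_p >= 0:
            out.append(p)
        if side_p * side_q < 0:          # edge crosses the clip line
            t = side_p / (side_p - side_q)
            out.append(p + t * (q - p))
    return out

def rotated_iou(box1, box2):
    """Exact IoU of two oriented boxes via polygon clipping."""
    poly1, poly2 = box_corners(*box1), box_corners(*box2)
    inter = [p for p in poly1]
    for i in range(4):
        if not inter:
            return 0.0
        inter = clip(inter, poly2[i], poly2[(i + 1) % 4])
    if len(inter) < 3:
        return 0.0
    ia = polygon_area(np.array(inter))
    union = polygon_area(poly1) + polygon_area(poly2) - ia
    return ia / union
```

Because the intersection polygon depends on θ, this IoU penalizes angle errors that an axis-aligned IoU on (cx, cy, w, h) alone would ignore.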


Author(s):
Xiang He
Sibei Yang
Guanbin Li
Haofeng Li
Huiyou Chang
...  

Recent progress in biomedical image segmentation based on deep convolutional neural networks (CNNs) has drawn much attention. However, the vulnerability of these models to adversarial samples cannot be overlooked. This paper is the first to show that state-of-the-art CNN-based biomedical image segmentation models are all sensitive to adversarial perturbations, which limits their deployment in safety-critical biomedical fields. We find that the global spatial dependencies and global contextual information in a biomedical image can be exploited to defend against adversarial attacks. To this end, a non-local context encoder (NLCE) is proposed to model short- and long-range spatial dependencies and encode global contexts, strengthening feature activations by channel-wise attention. The NLCE modules enhance the robustness and accuracy of the non-local context encoding network (NLCEN), which learns robust, enhanced pyramid feature representations with NLCE modules and then integrates the information across different levels. Experiments on both lung and skin lesion segmentation datasets demonstrate that NLCEN outperforms other state-of-the-art biomedical image segmentation methods under adversarial attacks. In addition, NLCE modules can be applied to improve the robustness of other CNN-based biomedical image segmentation methods.
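The channel-wise attention used to strengthen feature activations can be sketched as a squeeze-and-gate operation over channels; this toy NumPy version illustrates only that gating step (the matrices w1 and w2 stand in for learned parameters), not the NLCE module's non-local spatial modeling:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel-wise attention gating over an (H, W, C) feature map: global
    context is squeezed by average pooling, passed through a small two-layer
    gate, and used to rescale each channel. In a trained network w1 and w2
    would be learned; here they are plain arrays."""
    squeeze = feat.mean(axis=(0, 1))          # global context: (C,)
    hidden = np.maximum(0.0, w1 @ squeeze)    # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)               # per-channel weights in (0, 1)
    return feat * gate                        # strengthen informative channels
```

Because the gate is computed from a global pooled statistic rather than a single pixel, a localized adversarial perturbation has less leverage over the reweighting, which is the intuition behind using global context as a defense.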


Sensors, 2019, Vol 19 (7), pp. 1651
Author(s):
Suk-Ju Hong
Yunhyeok Han
Sang-Yeon Kim
Ah-Yeong Lee
Ghiseok Kim

Wild birds are monitored with the important objectives of identifying their habitats and estimating the size of their populations. Migratory birds, in particular, are recorded during specific periods of time to forecast any possible spread of animal diseases such as avian influenza. This study constructed deep-learning-based object-detection models with the aid of aerial photographs collected by an unmanned aerial vehicle (UAV). The dataset of aerial photographs includes diverse images of birds in various habitats, in the vicinity of lakes, and on farmland. In addition, aerial images of bird decoys were captured to obtain various bird patterns and more accurate bird information. Bird detection models such as the Faster Region-based Convolutional Neural Network (R-CNN), Region-based Fully Convolutional Network (R-FCN), Single Shot MultiBox Detector (SSD), RetinaNet, and You Only Look Once (YOLO) were created, and the performance of all models was estimated by comparing their computing speed and average precision. The test results show Faster R-CNN to be the most accurate and YOLO to be the fastest among the models. The combined results demonstrate that deep-learning-based detection methods, in combination with UAV aerial imagery, are well suited for bird detection in various environments.
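Average precision, one of the two comparison metrics, can be computed from ranked detections as the mean precision over the ranks at which true positives occur. Benchmarks differ in interpolation details (e.g. the PASCAL VOC 11-point variant), so the sketch below is one common definition rather than the exact protocol used in the study:

```python
import numpy as np

def average_precision(scores, is_true_positive):
    """Average precision of ranked detections: sort by confidence, then
    average the precision at each rank where a true positive occurs."""
    order = np.argsort(scores)[::-1]          # highest confidence first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)  # precision at each rank
    n_pos = tp.sum()
    if n_pos == 0:
        return 0.0
    return float((precision * tp).sum() / n_pos)
```

A full evaluation would also count undetected ground-truth objects as false negatives in the denominator and average the result over classes to obtain mAP.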


2018, Vol 10 (12), pp. 2068
Author(s):
Juha Suomalainen
Teemu Hakala
Raquel Alves de Oliveira
Lauri Markelin
Niko Viljanen
...  

In unstable atmospheric conditions, using on-board irradiance sensors is one of the only robust methods to convert unmanned aerial vehicle (UAV)-based optical remote sensing data to reflectance factors. Normally, such sensors experience significant errors due to tilting of the UAV, if not installed on a stabilizing gimbal. Unfortunately, such gimbals of sufficient accuracy are heavy, cumbersome, and cannot be installed on all UAV platforms. In this paper, we present the FGI Aerial Image Reference System (FGI AIRS) developed at the Finnish Geospatial Research Institute (FGI) and a novel method for optical and mathematical tilt correction of the irradiance measurements. The FGI AIRS is a sensor unit for UAVs that provides the irradiance spectrum, Real Time Kinematic (RTK)/Post Processed Kinematic (PPK) GNSS position, and orientation for the attached cameras. The FGI AIRS processes the reference data in real time for each acquired image and can send it to an on-board or on-cloud processing unit. The novel correction method is based on three RGB photodiodes that are tilted 10° in opposite directions. These photodiodes sample the irradiance readings at different sensor tilts, from which reading of a virtual horizontal irradiance sensor is calculated. The FGI AIRS was tested, and the method was shown to allow on-board measurement of irradiance at an accuracy better than ±0.8% at UAV tilts up to 10° and ±1.2% at tilts up to 15°. In addition, the accuracy of FGI AIRS to produce reflectance-factor-calibrated aerial images was compared against the traditional methods. In the unstable weather conditions of the experiment, both the FGI AIRS and the on-ground spectrometer were able to produce radiometrically accurate and visually pleasing orthomosaics, while the reflectance reference panels and the on-board irradiance sensor without stabilization or tilt correction both failed to do so. 
The authors recommend implementing the proposed tilt correction method in all future UAV irradiance sensors that are not installed on a stabilizing gimbal.
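The tilt-correction idea of inferring a virtual horizontal sensor from several known-tilt readings can be illustrated under a strong simplifying assumption of purely direct sunlight, where each reading is proportional to the cosine of the incidence angle. This is a toy least-squares illustration of that principle, not the FGI AIRS algorithm:

```python
import numpy as np

def virtual_horizontal_irradiance(readings, normals, sun_dir):
    """Estimate the reading of a virtual horizontal irradiance sensor from
    several tilted sensors. Assumes purely direct sunlight, so each reading
    is E * max(0, n . s); the direct irradiance E is recovered by least
    squares over all sensors and projected onto a horizontal normal."""
    normals = np.asarray(normals, dtype=float)
    sun_dir = np.asarray(sun_dir, dtype=float)
    sun_dir = sun_dir / np.linalg.norm(sun_dir)
    cosines = np.clip(normals @ sun_dir, 0.0, None)  # incidence cosine per sensor
    e_direct = np.dot(readings, cosines) / np.dot(cosines, cosines)  # least squares
    return e_direct * max(0.0, sun_dir[2])           # horizontal sensor: n = (0, 0, 1)
```

The real method also has to handle the diffuse sky component, which is why the FGI AIRS combines an optical arrangement (three RGB photodiodes tilted 10° in opposite directions) with a mathematical correction rather than a pure cosine model.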


Author(s):
N. M. S. M. Kadhim
M. Mourshed
M. T. Bray

Very-High-Resolution (VHR) satellite imagery is a powerful source of data for detecting and extracting information about urban constructions. Shadow in VHR satellite imagery provides vital information on urban construction forms, illumination direction, and the spatial distribution of objects, which can further the understanding of the built environment. However, to exploit shadows, the automated detection of shadows from images must be accurate. This paper reviews current automatic approaches for shadow detection from VHR satellite images and comprises two main parts. In the first part, shadow concepts are presented in terms of shadow appearance in VHR satellite imagery, current shadow detection methods, and the usefulness of shadow detection in urban environments. In the second part, we evaluate two current state-of-the-art shadow detection and segmentation approaches using WorldView-3 and QuickBird images. In the first approach, the ratios between the NIR and visible bands are computed on a pixel-by-pixel basis, which allows disambiguation between shadows and dark objects; to obtain an accurate shadow candidate map, we further refine the shadow map after applying the ratio algorithm to the QuickBird image. The second approach is GrabCut segmentation, whose performance in detecting the shadow regions of urban objects is examined using the true-colour image from WorldView-3, with further refinement applied to attain a segmented shadow map. Although detecting shadow regions is very difficult when they are derived from a VHR satellite image comprising only the visible spectrum (RGB true colour), the results demonstrate that shadow regions in the WorldView-3 image can be reasonably separated from other objects by applying the GrabCut algorithm. In addition, the shadow map derived from the QuickBird image indicates the strong performance of the ratio algorithm. The differences between the two satellite images in spatial and spectral resolution can play an important role in the estimation and detection of the shadows of urban objects.
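A pixel-wise band-ratio shadow test in the spirit of the first approach can be sketched as follows; the threshold, the +1 offsets, and the assumption that shadow corresponds to a high NIR-to-visible ratio are all illustrative choices, since the paper's exact ratio formulation and refinement steps are not reproduced here:

```python
import numpy as np

def ratio_shadow_map(nir, visible, threshold=1.0):
    """Pixel-wise NIR/visible ratio test for shadow candidates. The ratio
    separates shadowed pixels from merely dark objects because the two
    bands are attenuated differently in shadow; whether shadow maps to a
    high or low ratio depends on the sensor and normalization, so both the
    direction and the threshold here are placeholders."""
    nir = nir.astype(np.float64)
    visible = visible.astype(np.float64)
    ratio = (nir + 1.0) / (visible + 1.0)   # +1 offsets avoid division by zero
    return ratio > threshold                # boolean shadow candidate map
```

In the paper's pipeline the raw candidate map produced by such a test is then refined before it is used, so this thresholding is only the first stage.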

