A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs

Author(s):  
Zhaokun Li ◽  
Xueliang Liu ◽  
Ye Zhao ◽  
Bo Liu ◽  
Zhen Huang ◽  
...  
2021 ◽  
Vol 13 (14) ◽  
pp. 2656
Author(s):  
Furong Shi ◽  
Tong Zhang

Deep-learning technologies, especially convolutional neural networks (CNNs), have achieved great success in building extraction from areal images. However, shape details are often lost during the down-sampling process, which results in discontinuous segmentation or inaccurate segmentation boundary. In order to compensate for the loss of shape information, two shape-related auxiliary tasks (i.e., boundary prediction and distance estimation) were jointly learned with building segmentation task in our proposed network. Meanwhile, two consistency constraint losses were designed based on the multi-task network to exploit the duality between the mask prediction and two shape-related information predictions. Specifically, an atrous spatial pyramid pooling (ASPP) module was appended to the top of the encoder of a U-shaped network to obtain multi-scale features. Based on the multi-scale features, one regression loss and two classification losses were used for predicting the distance-transform map, segmentation, and boundary. Two inter-task consistency-loss functions were constructed to ensure the consistency between distance maps and masks, and the consistency between masks and boundary maps. Experimental results on three public aerial image data sets showed that our method achieved superior performance over the recent state-of-the-art models.


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1426
Author(s):  
Chuanyang Liu ◽  
Yiquan Wu ◽  
Jingjing Liu ◽  
Jiaming Han

Insulator detection is an essential task for the safety and reliable operation of intelligent grids. Owing to insulator images including various background interferences, most traditional image-processing methods cannot achieve good performance. Some You Only Look Once (YOLO) networks are employed to meet the requirements of actual applications for insulator detection. To achieve a good trade-off among accuracy, running time, and memory storage, this work proposes the modified YOLO-tiny for insulator (MTI-YOLO) network for insulator detection in complex aerial images. First of all, composite insulator images are collected in common scenes and the “CCIN_detection” (Chinese Composite INsulator) dataset is constructed. Secondly, to improve the detection accuracy of different sizes of insulator, multi-scale feature detection headers, a structure of multi-scale feature fusion, and the spatial pyramid pooling (SPP) model are adopted to the MTI-YOLO network. Finally, the proposed MTI-YOLO network and the compared networks are trained and tested on the “CCIN_detection” dataset. The average precision (AP) of our proposed network is 17% and 9% higher than YOLO-tiny and YOLO-v2. Compared with YOLO-tiny and YOLO-v2, the running time of the proposed network is slightly higher. Furthermore, the memory usage of the proposed network is 25.6% and 38.9% lower than YOLO-v2 and YOLO-v3, respectively. Experimental results and analysis validate that the proposed network achieves good performance in both complex backgrounds and bright illumination conditions.


2020 ◽  
Vol 12 (5) ◽  
pp. 784 ◽  
Author(s):  
Wei Guo ◽  
Weihong Li ◽  
Weiguo Gong ◽  
Jinkai Cui

Multi-scale object detection is a basic challenge in computer vision. Although many advanced methods based on convolutional neural networks have succeeded in natural images, the progress in aerial images has been relatively slow mainly due to the considerably huge scale variations of objects and many densely distributed small objects. In this paper, considering that the semantic information of the small objects may be weakened or even disappear in the deeper layers of neural network, we propose a new detection framework called Extended Feature Pyramid Network (EFPN) for strengthening the information extraction ability of the neural network. In the EFPN, we first design the multi-branched dilated bottleneck (MBDB) module in the lateral connections to capture much more semantic information. Then, we further devise an attention pathway for better locating the objects. Finally, an augmented bottom-up pathway is conducted for making shallow layer information easier to spread and further improving performance. Moreover, we present an adaptive scale training strategy to enable the network to better recognize multi-scale objects. Meanwhile, we present a novel clustering method to achieve adaptive anchors and make the neural network better learn data features. Experiments on the public aerial datasets indicate that the presented method obtain state-of-the-art performance.


Author(s):  
Jiajia Liao ◽  
Yingchao Piao ◽  
Guorong Cai ◽  
Yundong Wu ◽  
Jinhe Su

2019 ◽  
Vol 11 (19) ◽  
pp. 2219 ◽  
Author(s):  
Fatemeh Alidoost ◽  
Hossein Arefi ◽  
Federico Tombari

In this study, a deep learning (DL)-based approach is proposed for the detection and reconstruction of buildings from a single aerial image. The pre-required knowledge to reconstruct the 3D shapes of buildings, including the height data as well as the linear elements of individual roofs, is derived from the RGB image using an optimized multi-scale convolutional–deconvolutional network (MSCDN). The proposed network is composed of two feature extraction levels to first predict the coarse features, and then automatically refine them. The predicted features include the normalized digital surface models (nDSMs) and linear elements of roofs in three classes of eave, ridge, and hip lines. Then, the prismatic models of buildings are generated by analyzing the eave lines. The parametric models of individual roofs are also reconstructed using the predicted ridge and hip lines. The experiments show that, even in the presence of noises in height values, the proposed method performs well on 3D reconstruction of buildings with different shapes and complexities. The average root mean square error (RMSE) and normalized median absolute deviation (NMAD) metrics are about 3.43 m and 1.13 m, respectively for the predicted nDSM. Moreover, the quality of the extracted linear elements is about 91.31% and 83.69% for the Potsdam and Zeebrugge test data, respectively. Unlike the state-of-the-art methods, the proposed approach does not need any additional or auxiliary data and employs a single image to reconstruct the 3D models of buildings with the competitive precision of about 1.2 m and 0.8 m for the horizontal and vertical RMSEs over the Potsdam data and about 3.9 m and 2.4 m over the Zeebrugge test data.


2010 ◽  
Vol 19 (01) ◽  
pp. 173-189
Author(s):  
SEUNG-HUN YOO ◽  
CHANG-SUNG JEONG

Graphics processing unit (GPU) has surfaced as a high-quality platform for computer vision-related systems. In this paper, we propose a straightforward system consisting of a registration and a fusion method over GPU, which generates good results at high speed, compared to non-GPU-based systems. Our GPU-accelerated system utilizes existing methods through converting the methods into the GPU-based platform. The registration method uses point correspondences to find a registering transformation estimated with the incremental parameters in a coarse-to-fine way, while the fusion algorithm uses multi-scale methods to fuse the results from the registration stage. We evaluate performance with the same methods that are executed over both CPU-only and GPU-mounted environment. The experiment results present convincing evidences of the efficiency of our system, which is tested on a few pairs of aerial images taken by electro-optical and infrared sensors to provide visual information of a scene for environmental observatories.


Author(s):  
A. C. Carrilho ◽  
M. Galo

<p><strong>Abstract.</strong> Recent advances in machine learning techniques for image classification have led to the development of robust approaches to both object detection and extraction. Traditional CNN architectures, such as LeNet, AlexNet and CaffeNet, usually use as input images of fixed sizes taken from objects and attempt to assign labels to those images. Another possible approach is the Fast Region-based CNN (or Fast R-CNN), which works by using two models: (i) a Region Proposal Network (RPN) which generates a set of potential Regions of Interest (RoI) in the image; and (ii) a traditional CNN which assigns labels to the proposed RoI. As an alternative, this study proposes an approach to automatic object extraction from aerial images similar to the Fast R-CNN architecture, the main difference being the use of the Simple Linear Iterative Clustering (SLIC) algorithm instead of an RPN to generate the RoI. The dataset used is composed of high-resolution aerial images and the following classes were considered: house, sport court, hangar, building, swimming pool, tree, and street/road. The proposed method can generate RoI with different sizes by running a multi-scale SLIC approach. The overall accuracy obtained for object detection was 89% and the major advantage is that the proposed method is capable of semantic segmentation by assigning a label to each selected RoI. Some of the problems encountered are related to object proximity, in which different instances appeared merged in the results.</p>


Author(s):  
F. Alidoost ◽  
H. Arefi ◽  
F. Tombari

Abstract. Automatic detection and extraction of buildings from aerial images are considerable challenges in many applications, including disaster management, navigation, urbanization monitoring, emergency responses, 3D city mapping and reconstruction. However, the most important problem is to precisely localize buildings from single aerial images where there is no additional information such as LiDAR point cloud data or high resolution Digital Surface Models (DSMs). In this paper, a Deep Learning (DL)-based approach is proposed to localize buildings, estimate the relative height information, and extract the buildings’ boundaries using a single aerial image. In order to detect buildings and extract the bounding boxes, a Fully Connected Convolutional Neural Network (FC-CNN) is trained to classify building and non-building objects. We also introduced a novel Multi-Scale Convolutional-Deconvolutional Network (MS-CDN) including skip connection layers to predict normalized DSMs (nDSMs) from a single image. The extracted bounding boxes as well as predicted nDSMs are then employed by an Active Contour Model (ACM) to provide precise boundaries of buildings. The experiments show that, even having noises in the predicted nDSMs, the proposed method performs well on single aerial images with different building shapes. The quality rate for building detection is about 86% and the RMSE for nDSM prediction is about 4 m. Also, the accuracy of boundary extraction is about 68%. Since the proposed framework is based on a single image, it could be employed for real time applications.


2020 ◽  
Vol 12 (11) ◽  
pp. 1760 ◽  
Author(s):  
Wang Zhang ◽  
Chunsheng Liu ◽  
Faliang Chang ◽  
Ye Song

With the advantage of high maneuverability, Unmanned Aerial Vehicles (UAVs) have been widely deployed in vehicle monitoring and controlling. However, processing the images captured by UAV for the extracting vehicle information is hindered by some challenges including arbitrary orientations, huge scale variations and partial occlusion. In seeking to address these challenges, we propose a novel Multi-Scale and Occlusion Aware Network (MSOA-Net) for UAV based vehicle segmentation, which consists of two parts including a Multi-Scale Feature Adaptive Fusion Network (MSFAF-Net) and a Regional Attention based Triple Head Network (RATH-Net). In MSFAF-Net, a self-adaptive feature fusion module is proposed, which can adaptively aggregate hierarchical feature maps from multiple levels to help Feature Pyramid Network (FPN) deal with the scale change of vehicles. The RATH-Net with a self-attention mechanism is proposed to guide the location-sensitive sub-networks to enhance the vehicle of interest and suppress background noise caused by occlusions. In this study, we release a large comprehensive UAV based vehicle segmentation dataset (UVSD), which is the first public dataset for UAV based vehicle detection and segmentation. Experiments are conducted on the challenging UVSD dataset. Experimental results show that the proposed method is efficient in detecting and segmenting vehicles, and outperforms the compared state-of-the-art works.


Sign in / Sign up

Export Citation Format

Share Document