ISDNet: AI-enabled Instance Segmentation of Aerial Scenes for Smart Cities

2021 ◽  
Vol 21 (3) ◽  
pp. 1-18
Author(s):  
Prateek Garg ◽  
Anirudh Srinivasan Chakravarthy ◽  
Murari Mandal ◽  
Pratik Narang ◽  
Vinay Chamola ◽  
...  

Aerial scenes captured by UAVs have immense potential in IoT applications related to urban surveillance, road and building segmentation, land cover classification, and so on, which are necessary for the evolution of smart cities. Advancements in deep learning have greatly enhanced visual understanding, but the domain of aerial vision remains largely unexplored. Aerial images pose many unique challenges for scene parsing, such as high-resolution data, small-scale objects, a large number of objects in the camera view, dense clustering of objects, and background clutter, all of which hinder the performance of existing deep learning methods. In this work, we propose ISDNet (Instance Segmentation and Detection Network), a novel network that performs instance segmentation and object detection on visual data captured by UAVs, enabling aerial image analytics for various needs of a smart city. In particular, we use dilated convolutions to generate improved spatial context, leading to better discrimination between foreground and background features. The proposed network efficiently reuses segment-mask features by propagating them from early stages through residual connections. Furthermore, ISDNet uses effective anchors to accommodate varying object scales and sizes. The proposed method obtains state-of-the-art results in the aerial context.
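
To make the dilated-convolution idea concrete, here is a minimal PyTorch sketch (not the authors' code; channel sizes and the exact residual wiring are illustrative assumptions) of a context block that widens the receptive field while reusing earlier features through a residual connection:

```python
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    def __init__(self, channels: int = 256, dilation: int = 2):
        super().__init__()
        # padding = dilation keeps spatial resolution unchanged for a 3x3 kernel
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection propagates early features, as the abstract describes
        return self.relu(x + self.bn(self.conv(x)))

feats = torch.randn(1, 256, 64, 64)     # backbone feature map (assumed shape)
out = DilatedContextBlock()(feats)      # same shape, wider receptive field
```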

Author(s):  
L. Madhuanand ◽  
F. Nex ◽  
M. Y. Yang

Abstract. Depth is an essential component for various scene understanding tasks and for reconstructing the 3D geometry of the scene. Estimating depth from stereo images requires multiple views of the same scene to be captured, which is often not possible when exploring new environments with a UAV. To overcome this, monocular depth estimation has become a topic of interest with recent advancements in computer vision and deep learning techniques. Research to date has focused largely on indoor scenes or outdoor scenes captured at ground level. Single image depth estimation from aerial images has been limited due to additional complexities arising from increased camera distance and wider area coverage with many occlusions. A new aerial image dataset is prepared specifically for this purpose, combining Unmanned Aerial Vehicle (UAV) images covering different regions, features, and points of view. The single image depth estimation is based on image reconstruction techniques which use stereo images for learning to estimate depth from single images. Among the various available models for ground-level single image depth estimation, two models, a Convolutional Neural Network (CNN) and a Generative Adversarial Network (GAN), are used to learn depth from aerial images from UAVs. These models generate pixel-wise disparity images which can be converted into depth information. The generated disparity maps from these models are evaluated for internal quality using various error metrics. The results show that the CNN model generates smoother images with higher disparity ranges, while the GAN model generates sharper images with smaller disparity ranges. The produced disparity images are converted to depth information and compared with point clouds obtained using Pix4D. It is found that the CNN model performs better than the GAN and produces depth similar to that of Pix4D. This comparison helps in streamlining the efforts to produce depth from a single aerial image.
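
As a concrete illustration of the disparity-to-depth step the abstract mentions, a minimal sketch follows using the standard pinhole-stereo relation depth = focal_length × baseline / disparity; the calibration constants below are placeholders, not the paper's values:

```python
import numpy as np

def disparity_to_depth(disparity: np.ndarray,
                       focal_px: float = 1200.0,   # assumed focal length (pixels)
                       baseline_m: float = 0.5) -> np.ndarray:
    """Convert a disparity map (pixels) to a depth map (metres)."""
    eps = 1e-6                                     # avoid division by zero
    return focal_px * baseline_m / np.maximum(disparity, eps)
```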


Smart Cities ◽  
2021 ◽  
Vol 4 (3) ◽  
pp. 1220-1243
Author(s):  
Hafiz Suliman Munawar ◽  
Fahim Ullah ◽  
Siddra Qayyum ◽  
Amirhossein Heravi

Floods are one of the most fatal and devastating disasters, instigating an immense loss of human lives and damage to property, infrastructure, and agricultural lands. To address this, there is a need to develop and implement real-time flood management systems that can instantly detect flooded regions to initiate relief activities as early as possible. Current imaging systems, relying on satellites, have demonstrated low accuracy and delayed response, making them unreliable and impractical for use in emergency responses to natural disasters such as flooding. This research employs Unmanned Aerial Vehicles (UAVs) to develop an automated imaging system that can identify inundated areas from aerial images. The Haar cascade classifier was explored in the case study to detect landmarks such as roads and buildings from the aerial images captured by UAVs and to identify flooded areas. The extracted landmarks are added to the training dataset that is used to train a deep learning algorithm. Experimental results show that buildings and roads can be detected from the images with 91% and 94% accuracy, respectively. An overall accuracy of 91% is recorded in classifying flooded and non-flooded regions from the input case study images. The system has shown promising results on test images belonging to both pre- and post-flood classes. Flood relief and rescue workers can quickly locate flooded regions and rescue stranded people using this system. Such real-time flood inundation systems will help transform disaster management systems in line with modern smart city initiatives.
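
For reference, a minimal OpenCV sketch of Haar-cascade detection, the classifier family used in this study, is shown below; the cascade file name is hypothetical, since a custom cascade must first be trained on aerial imagery:

```python
import cv2

# 'buildings_cascade.xml' is a hypothetical custom-trained cascade;
# OpenCV only ships face/eye cascades out of the box.
cascade = cv2.CascadeClassifier("buildings_cascade.xml")
img = cv2.imread("uav_frame.jpg")                 # placeholder input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale slides the cascade over the image at several scales
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```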


2021 ◽  
Vol 13 (8) ◽  
pp. 1440
Author(s):  
Yi Zhang ◽  
Lei Fu ◽  
Ying Li ◽  
Yanning Zhang

Accurate change detection in optical aerial images using deep learning techniques has attracted considerable research effort in recent years. Correct change-detection results usually involve both global and local deep learning features, and existing deep learning approaches have achieved good performance on this task. However, when a bi-temporal image pair contains change areas at multiple scales, existing methods still struggle to adapt to these areas, producing false detections and incomplete detected regions. To deal with these problems, we design a hierarchical dynamic fusion network (HDFNet) for the optical aerial image-change detection task. Specifically, we propose a change-detection framework with a hierarchical fusion strategy that provides sufficient information for change detection, and we introduce dynamic convolution modules that self-adaptively learn from this information. We also use a multilevel supervision strategy with multiscale loss functions to supervise the training process. Comprehensive experiments are conducted on two benchmark datasets, LEBEDEV and LEVIR-CD, to verify the effectiveness of the proposed method, and the experimental results show that our model achieves state-of-the-art performance.
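
To illustrate the dynamic-convolution idea, here is a minimal PyTorch sketch in the spirit of (but not identical to) HDFNet's modules: K candidate kernels are mixed per input sample via attention weights derived from global pooling, so the effective filter adapts to each image:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, num_kernels: int = 4):
        super().__init__()
        # K candidate kernels, mixed per sample by attention weights
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_kernels))
        self.pad = k // 2

    def forward(self, x):
        b = x.size(0)
        alpha = F.softmax(self.attn(x), dim=1)             # (B, K) mixing weights
        # Aggregate the K kernels per sample, then run one grouped convolution
        w = torch.einsum('bk,koihw->boihw', alpha, self.weight)
        w = w.reshape(-1, *self.weight.shape[2:])          # (B*out, in, k, k)
        x = x.reshape(1, -1, *x.shape[2:])                 # (1, B*in, H, W)
        out = F.conv2d(x, w, padding=self.pad, groups=b)
        return out.reshape(b, -1, *out.shape[2:])

y = DynamicConv(64, 64)(torch.randn(2, 64, 32, 32))       # -> (2, 64, 32, 32)
```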


2019 ◽  
Author(s):  
Alan Bauer ◽  
Aaron George Bostrom ◽  
Joshua Ball ◽  
Christopher Applegate ◽  
Tao Cheng ◽  
...  

Abstract. Aerial imagery is regularly used by farmers and growers to monitor crops during the growing season. To extract meaningful phenotypic information from large-scale aerial images collected regularly from the field, high-throughput analytic solutions are required, which not only produce high-quality measures of key crop traits, but also support agricultural practitioners in making reliable crop management decisions. Here, we report AirSurf-Lettuce, an automated and open-source aerial image analysis platform that combines modern computer vision, up-to-date machine learning, and modular software engineering to measure yield-related phenotypes of millions of lettuces across the field. Utilising ultra-large normalized difference vegetation index (NDVI) images acquired by fixed-wing light aircraft together with a deep-learning classifier trained with over 100,000 labelled lettuce signals, the platform is capable of scoring and categorising iceberg lettuces with high accuracy (>98%). Furthermore, novel analysis functions have been developed to map lettuce size distribution in the field, from which global positioning system (GPS) tagged harvest regions can be derived to enable growers and farmers to plan precise harvest strategies and estimate marketability before the harvest.
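
As background, NDVI, the vegetation index underlying the platform's input imagery, is computed per pixel as (NIR − Red) / (NIR + Red), yielding values in [-1, 1] where higher values indicate denser vegetation. A minimal sketch, with the band arrays as placeholders:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Normalized difference vegetation index from NIR and red bands."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / np.maximum(nir + red, 1e-6)   # guard against 0/0
```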


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5240
Author(s):  
Anis Koubaa ◽  
Adel Ammar ◽  
Mahmoud Alahdab ◽  
Anas Kanhouch ◽  
Ahmad Taher Azar

Unmanned Aerial Vehicles (UAVs) have been very effective in collecting aerial image data for various Internet-of-Things (IoT)/smart cities applications such as search and rescue, surveillance, vehicle detection and counting, and intelligent transportation systems, to name a few. However, the real-time processing of collected data at the edge in the context of the Internet-of-Drones remains an open challenge because UAVs have limited energy capabilities, while computer vision techniques consume excessive energy and require abundant resources. This fact is even more critical when deep learning algorithms, such as convolutional neural networks (CNNs), are used for classification and detection. In this paper, we first propose a system architecture of computation offloading for Internet-connected drones. Then, we conduct a comprehensive experimental study to evaluate the performance in terms of energy, bandwidth, and delay of the cloud computation offloading approach versus the edge computing approach for deep learning applications in the context of UAVs. In particular, we experimentally investigate the tradeoff between communication cost and computation for the two candidate approaches. The main results demonstrate that the computation offloading approach provides much higher throughput (i.e., frames per second) than the edge computing approach, despite the larger communication delays.
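
A back-of-the-envelope sketch of the tradeoff being measured: offloading pays off when transmission delay plus cloud inference beats on-board inference. All timing constants below are illustrative assumptions, not the paper's measurements:

```python
def should_offload(frame_bytes: int,
                   uplink_bps: float = 5e6,      # assumed UAV uplink bandwidth
                   cloud_infer_s: float = 0.03,  # assumed cloud CNN latency
                   rtt_s: float = 0.05,          # assumed network round trip
                   edge_infer_s: float = 0.4) -> bool:
    """True if cloud offloading yields lower per-frame delay than edge compute."""
    offload_delay = frame_bytes * 8 / uplink_bps + rtt_s + cloud_infer_s
    return offload_delay < edge_infer_s

# e.g. a 150 kB frame: 0.24 s transmit + 0.08 s overhead vs 0.4 s on-board
print(should_offload(150_000))   # True -> offload
```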


2020 ◽  
Vol 18 (1) ◽  
pp. 35-46
Author(s):  
Felipe X. Viana ◽  
Gabriel M. Araujo ◽  
Milena F. Pinto ◽  
Jefferson Colares ◽  
Diego B. Haddad

Author(s):  
J. Liu ◽  
S. Ji ◽  
C. Zhang ◽  
Z. Qin

Dense stereo matching has been extensively studied in photogrammetry and computer vision. In this paper, we evaluate deep learning based stereo methods, which emerged around 2016 and spread rapidly, on aerial stereo pairs rather than the ground-level images commonly used in the computer vision community. Two popular methods are evaluated. One learns matching cost with a convolutional neural network (known as MC-CNN); the other produces a disparity map in an end-to-end manner by utilizing both geometry and context (known as GC-Net). First, we evaluate the performance of the deep learning based methods on aerial stereo images by direct model reuse: models pre-trained separately on the KITTI 2012, KITTI 2015, and Driving datasets are applied directly to three aerial datasets. We also give the results of direct training on the target aerial datasets. Second, the deep learning based methods are compared to the classic stereo matching method, Semi-Global Matching (SGM), and a photogrammetric software package, SURE, on the same aerial datasets. Third, a transfer learning strategy is introduced to aerial image matching under the assumption that a few target samples are available for model fine-tuning. The experiments show that the conventional and deep learning based methods perform similarly, with the latter having greater potential to be explored.
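
For readers who want to reproduce the SGM baseline, a minimal sketch using OpenCV's StereoSGBM implementation follows; the parameter values are common defaults, not the paper's configuration, and the image paths are placeholders:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # rectified stereo pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,        # search range; must be divisible by 16
    blockSize=5,
    P1=8 * 5 * 5,              # penalty for small disparity changes
    P2=32 * 5 * 5)             # penalty for large disparity changes

# compute() returns 16.4 fixed-point disparities; divide by 16 for pixels
disparity = sgbm.compute(left, right).astype("float32") / 16.0
```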


2019 ◽  
Vol 11 (10) ◽  
pp. 1157 ◽  
Author(s):  
Jorge Fuentes-Pacheco ◽  
Juan Torres-Olivares ◽  
Edgar Roman-Rangel ◽  
Salvador Cervantes ◽  
Porfirio Juarez-Lopez ◽  
...  

Crop segmentation is an important task in Precision Agriculture, where the use of aerial robots with an on-board camera has contributed to the development of new solution alternatives. We address the problem of fig plant segmentation in top-view RGB (Red-Green-Blue) images of a crop grown under difficult open-field conditions, with complex lighting and non-ideal crop maintenance practices defined by local farmers. We present a Convolutional Neural Network (CNN) with an encoder-decoder architecture that classifies each pixel as crop or non-crop using only raw colour images as input. Our approach achieves a mean accuracy of 93.85% despite the complexity of the background and the highly variable visual appearance of the leaves. We make our CNN code available to the research community, along with the aerial image dataset and a hand-made, pixel-precise ground-truth segmentation, to facilitate comparison among different algorithms.
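
As a structural illustration (not the released code; layer sizes and depths are assumptions), a minimal PyTorch encoder-decoder that maps an RGB image to per-pixel crop/non-crop logits might look like:

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # downsample to H/2 x W/2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),  # back to H x W
            nn.Conv2d(32, 2, 1))                    # 2 classes: crop / non-crop

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))     # per-pixel class logits

logits = TinySegNet()(torch.randn(1, 3, 256, 256))  # -> (1, 2, 256, 256)
```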

