Effectiveness of Human Detection from Aerial Images Taken from Different Heights

TEM Journal ◽  
2021 ◽  
pp. 522-530
Author(s):  
Muhammad Shahir Hakimy Salem ◽  
Fadhlan Hafizhelmi Kamaru Zaman

Recently, drones have been regularly used to aid search and rescue in places where it is normally difficult to carry out early victim localization. Many human detectors are suitable for drone use, such as Histogram of Oriented Gradients (HOG), You Only Look Once (YOLO), and Aggregate Channel Features (ACF). In this paper, the height at which aerial images are taken is analyzed for its effect on detection accuracy. This work compares ACF, YOLO MobileNet, and YOLO ResNet50 using sets of aerial images taken at heights of 10 m, 20 m, and 30 m. The results show that in a single-model test, with our proposed bounding-box standardization, YOLO MobileNet achieves a significant increase in average precision (AP), with 0.7 AP recorded. In the single-model test, YOLO MobileNet yields the best AP when trained on the 20 m data, obtaining an AP of 0.88 (10 m test height), 0.82 (20 m test height), and 0.91 (30 m test height).
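Since AP is the yardstick used throughout this comparison, a minimal sketch of a single-class AP computation may help make the reported numbers concrete. The IoU threshold of 0.5 and the trapezoidal integration of the precision-recall curve below are illustrative assumptions, not necessarily the authors' exact evaluation protocol:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for one class.

    detections: list of (image_id, score, box); ground_truth: {image_id: [box, ...]}.
    """
    detections = sorted(detections, key=lambda d: -d[1])       # by confidence, descending
    matched = {img: [False] * len(b) for img, b in ground_truth.items()}
    n_gt = sum(len(b) for b in ground_truth.values())
    tp = np.zeros(len(detections)); fp = np.zeros(len(detections))
    for i, (img, _, box) in enumerate(detections):
        gts = ground_truth.get(img, [])
        ious = [iou(box, g) for g in gts]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and not matched[img][best]:
            tp[i] = 1; matched[img][best] = True              # first match of a ground-truth box
        else:
            fp[i] = 1                                          # duplicate or low-IoU detection
    rec = np.cumsum(tp) / max(n_gt, 1)
    prec = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-9)
    return float(np.trapz(prec, rec))                          # area under the PR curve
```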

2021 ◽  
Vol 13 (23) ◽  
pp. 4903
Author(s):  
Tomasz Niedzielski ◽  
Mirosława Jurecka ◽  
Bartłomiej Miziński ◽  
Wojciech Pawul ◽  
Tomasz Motyl

Recent advances in search and rescue methods include the use of unmanned aerial vehicles (UAVs) to carry out aerial monitoring of terrain to spot lost individuals. To date, such searches have been conducted by human observers who view UAV-acquired videos or images. Alternatively, lost persons may be detected by automated algorithms. Although some algorithms are implemented in software to support search and rescue activities, no successful rescue case using automated human detectors has been reported thus far in the scientific literature. This paper presents a report from a search and rescue mission carried out by the Bieszczady Mountain Rescue Service near the village of Cergowa in SE Poland, where a 65-year-old man was rescued after being detected with the SARUAV software. This software uses convolutional neural networks to automatically locate people in close-range nadir aerial images. The missing man, who suffered from Alzheimer’s disease (as well as a stroke the previous day), spent more than 24 h in open terrain. The SARUAV software was deployed to support the search, and its task was to process 782 nadir and near-nadir JPG images collected during four photogrammetric flights. After 4 h 31 min of analysis, the system successfully detected the missing person and provided his coordinates (uploading the 121 photos from the flight over the lost person, processing them, and verifying the hits took 5 min 48 s). The presented case study proves that the use of a UAV assisted by the SARUAV software can speed up a search mission.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, and unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge numbers of aerial images is inefficient and can be replaced by machine learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some deep learning-based target detection methods, such as the single-shot multibox detector (SSD) algorithm, achieve better results than traditional methods. However, the impressive performance of machine learning-based methods rests on numerous labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of a disaster is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, highlighting the following aspects. (1) Objects are detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized with the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with an SSD trained from scratch. They also demonstrate that the data augmentation strategies improve mAP and mF1 by 72% and 20%, respectively. Finally, the method is further verified on another dataset, from Hurricane Irma, confirming that it is feasible.
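As a rough illustration of point (3), the sketch below applies the four augmentations named in the abstract to a single image. The parameter values (rotation range, blur kernel, noise sigma) are illustrative assumptions, and for detection the bounding boxes would of course have to be transformed along with the pixels:

```python
import cv2
import numpy as np

def augment(image, rng=None):
    """Yield augmented variants of a post-disaster aerial image (H x W x 3, uint8)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    yield cv2.flip(image, 1)                                   # horizontal mirror
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), float(rng.uniform(-15, 15)), 1.0)
    yield cv2.warpAffine(image, rot, (w, h))                   # small random rotation
    yield cv2.GaussianBlur(image, (5, 5), 1.5)                 # Gaussian blur
    noise = rng.normal(0.0, 10.0, image.shape)                 # additive Gaussian noise
    yield np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```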


2021 ◽  
Vol 13 (3) ◽  
pp. 364
Author(s):  
Han Gao ◽  
Jinhui Guo ◽  
Peng Guo ◽  
Xiuwan Chen

Recently, deep learning has become the most innovative trend for a variety of high-spatial-resolution remote sensing applications. However, large-scale land cover classification via traditional convolutional neural networks (CNNs) with sliding windows is computationally expensive and produces coarse results. Additionally, although such supervised learning approaches have performed well, collecting and annotating datasets for every task is extremely laborious, especially in fully supervised cases where dense pixel-level ground-truth labels are required. In this work, we propose a new object-oriented deep learning framework that leverages residual networks of different depths to learn adjacent feature representations by embedding a multibranch architecture in the deep learning pipeline. The idea is to exploit limited training data at different neighboring scales to make a tradeoff between weak semantics and strong feature representations for operational land cover mapping tasks. We draw on established geographic object-based image analysis (GEOBIA) as an auxiliary module to reduce the computational burden of spatial reasoning and to optimize the classification boundaries. We evaluated the proposed approach on two subdecimeter-resolution datasets covering both urban and rural landscapes. It achieved better classification accuracy (88.9%) than traditional object-based deep learning methods, along with an excellent inference time (11.3 s/ha).
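One way to picture the multibranch idea is a classifier whose branches are residual networks of different depths, each fed a crop of the same object at a different neighboring scale, with the pooled features concatenated before the decision layer. The PyTorch sketch below is only an outline under those assumptions (branch depths, fusion by concatenation), not the authors' exact network:

```python
import torch
import torch.nn as nn
from torchvision import models

class MultiBranchClassifier(nn.Module):
    """Residual branches of different depths; fused features classify one object."""
    def __init__(self, n_classes):
        super().__init__()
        # truncate each backbone just before its fully connected head
        self.branches = nn.ModuleList([
            nn.Sequential(*list(m.children())[:-1])            # ends in global avg-pool
            for m in (models.resnet18(), models.resnet34(), models.resnet50())
        ])
        self.head = nn.Linear(512 + 512 + 2048, n_classes)     # concatenated feature dims

    def forward(self, patches):
        # patches: one crop per neighboring scale, all resized to the same input size
        feats = [b(x).flatten(1) for b, x in zip(self.branches, patches)]
        return self.head(torch.cat(feats, dim=1))

# usage: logits = MultiBranchClassifier(6)([torch.randn(2, 3, 224, 224)] * 3)
```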


Author(s):  
D. Gritzner ◽  
J. Ostermann

Abstract. Modern machine learning, especially deep learning, which is used in a variety of applications, requires a lot of labelled data for model training. Having an insufficient number of training examples leads to models which do not generalize well to new input instances. This is a particularly significant problem for tasks involving aerial images: often, training data is only available for a limited geographical area and a narrow time window, leading to models which perform poorly in different regions, at different times of day, or during different seasons. Domain adaptation can mitigate this issue by using labelled source domain training examples and unlabelled target domain images to train a model which performs well on both domains. Modern adversarial domain adaptation approaches use unpaired data. We propose using pairs of semantically similar images, i.e., images whose segmentations are accurate predictions of each other, for improved model performance. In this paper we show that, as an upper limit based on ground truth, using semantically paired aerial images during training almost always increases model performance, with an average improvement of 4.2% accuracy and 0.036 mean intersection-over-union (mIoU). Using a practical estimate of semantic similarity, we still achieve improvements in more than half of all cases, with average improvements of 2.5% accuracy and 0.017 mIoU in those cases.
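How the "practical estimate of semantic similarity" is realised is the crux of the approach. Below is a minimal sketch of one plausible realisation, pairing images greedily by the mIoU between their predicted label maps; the greedy matching and the 0.5 threshold are illustrative assumptions, not the paper's actual estimator:

```python
import numpy as np

def miou(seg_a, seg_b, n_classes):
    """Mean intersection-over-union between two label maps of equal shape."""
    ious = []
    for c in range(n_classes):
        a, b = seg_a == c, seg_b == c
        union = np.logical_or(a, b).sum()
        if union:                                   # skip classes absent from both maps
            ious.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

def pair_images(pred_src, pred_tgt, n_classes, threshold=0.5):
    """Greedily pair source/target images whose predicted segmentations agree.

    pred_src, pred_tgt: lists of predicted label maps of identical shape.
    Returns a list of (source_index, target_index) pairs.
    """
    pairs, used = [], set()
    for i, ps in enumerate(pred_src):
        scored = [(miou(ps, pt, n_classes), j)
                  for j, pt in enumerate(pred_tgt) if j not in used]
        if scored:
            score, j = max(scored)
            if score >= threshold:                  # only keep sufficiently similar pairs
                pairs.append((i, j)); used.add(j)
    return pairs
```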


2020 ◽  
Vol 12 (21) ◽  
pp. 3630
Author(s):  
Jin Liu ◽  
Haokun Zheng

Object detection and recognition in aerial and remote sensing images has become a hot topic in the field of computer vision in recent years. As these images are usually taken from a bird’s-eye view, the targets often have varied shapes and are densely arranged, so using an oriented bounding box to mark a target is the mainstream choice. However, general detectors are designed for horizontal box annotations, while improved methods that detect oriented bounding boxes have high computational complexity. In this paper, we propose a method called ellipse field network (EFN) to organically integrate semantic segmentation and object detection. It predicts the probability distribution of the target and obtains accurate oriented bounding boxes through a post-processing step. We tested our method on the HRSC2016 and DOTA data sets, achieving mAP values of 0.863 and 0.701, respectively. We also tested the performance of EFN on natural images and obtained an mAP of 84.7 on the VOC2012 data set. These extensive experiments demonstrate that EFN achieves state-of-the-art results on aerial images and obtains a good score on natural images.
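Once a per-pixel probability field for the target has been predicted, a post-processing step can be as simple as thresholding the field and fitting a minimum-area rotated rectangle around each connected region. The OpenCV sketch below shows this generic recovery step; the threshold and minimum-area values are assumptions, and EFN's actual post-processing may be more elaborate:

```python
import cv2
import numpy as np

def oriented_boxes(prob_map, threshold=0.5, min_area=20):
    """Fit oriented bounding boxes to regions of a per-pixel probability map."""
    mask = (prob_map >= threshold).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        if cv2.contourArea(c) >= min_area:          # discard speckle regions
            (cx, cy), (w, h), angle = cv2.minAreaRect(c)  # center, size, rotation
            boxes.append(((cx, cy), (w, h), angle))
    return boxes
```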


Sensors ◽  
2019 ◽  
Vol 19 (16) ◽  
pp. 3542
Author(s):  
Eleftherios Lygouras ◽  
Nicholas Santavas ◽  
Anastasios Taitzoglou ◽  
Konstantinos Tarchanidis ◽  
Athanasios Mitropoulos ◽  
...  

Unmanned aerial vehicles (UAVs) play a primary role in a plethora of technical and scientific fields owing to their wide range of applications. In particular, the provision of emergency services during a crisis event is a vital application domain where such aerial robots can contribute, providing valuable assistance to both distressed humans and rescue teams. Bearing in mind that time constraints are a crucial parameter in search and rescue (SAR) missions, the timely and precise detection of humans in peril is of paramount importance. This paper deals with real-time human detection onboard a fully autonomous rescue UAV. Using deep learning techniques, the implemented embedded system was capable of detecting open water swimmers. This allowed the UAV to provide assistance accurately and in a fully unsupervised manner, thus enhancing first responder operational capabilities. The novelty of the proposed system is the combination of global navigation satellite system (GNSS) techniques and computer vision algorithms for both precise human detection and rescue apparatus release. Details about the hardware configuration as well as the system’s performance evaluation are fully discussed.


Author(s):  
S. Warnke ◽  
D. Bulatov

For the extraction of road pixels from combined image and elevation data, Wegner et al. (2015) proposed classifying superpixels into road and non-road, followed by a refinement of the classification results using minimum cost paths and non-local optimization methods. We believed that the variable set used for classification was to a certain extent suboptimal, because many variables were redundant while several features known to be useful in photogrammetry and remote sensing were missing. This motivated us to implement a variable selection approach which builds a model for classification using portions of the training data and subsets of features, evaluates this model, updates the feature set, and terminates when a stopping criterion is satisfied. The choice of classifier is flexible; we tested the approach with Logistic Regression and Random Forests, and tailored the evaluation module to the chosen classifier. To guarantee a fair comparison, we kept the segment-based approach and most of the variables from the related work, but extended them with additional, mostly higher-level features. Applying these superior features, removing the redundant ones, and using more accurately acquired 3D data allowed us to keep the misclassification error stable, or even to reduce it, on a challenging dataset.
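The build-evaluate-update loop described above can be outlined in a few lines with scikit-learn. In the sketch below, the subset proposal rule (randomly dropping one feature), cross-validated accuracy as the evaluation module, and a fixed iteration budget as the stopping criterion are all simplifying assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def select_features(X, y, n_iter=50, rng=None):
    """Iteratively shrink the feature set, keeping subsets that do not hurt CV accuracy.

    X: (n_samples, n_features) array of superpixel features; y: road / non-road labels.
    """
    rng = rng or np.random.default_rng(0)
    best = list(range(X.shape[1]))                 # start with all features
    best_score = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3).mean()
    for _ in range(n_iter):                        # stopping criterion: fixed budget
        candidate = sorted(rng.choice(best, size=max(2, len(best) - 1), replace=False))
        score = cross_val_score(LogisticRegression(max_iter=1000),
                                X[:, candidate], y, cv=3).mean()
        if score >= best_score:                    # drop a redundant feature if no loss
            best, best_score = candidate, score
    return best, best_score
```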


Author(s):  
A. Paul ◽  
F. Rottensteiner ◽  
C. Heipke

Domain adaptation techniques in transfer learning try to reduce the amount of training data required for classification by adapting a classifier trained on samples from a source domain to a new data set (the target domain) where the features may have different distributions. In this paper, we propose a new technique for domain adaptation based on logistic regression. Starting with a classifier trained on samples from the source domain, we iteratively include target domain samples whose class labels are obtained from the current state of the classifier, while at the same time removing source domain samples. In each iteration the classifier is re-trained, so that the decision boundaries are slowly transferred to the distribution of the target features. To make the transfer procedure more robust, we introduce weights as a function of the distance from the decision boundary, as well as a new regularisation scheme. Our methodology is evaluated on a benchmark data set consisting of aerial images and digital surface models. The experimental results show that in the majority of cases our domain adaptation approach leads to an improvement in classification accuracy without additional training data, but they also indicate remaining problems if the difference between the feature distributions becomes too large.
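A binary-class sketch of this iterative transfer is given below. The margin-based confidence weight and the fixed batch size are assumptions made for illustration; the paper's actual weighting function and regularisation scheme are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def adapt(X_src, y_src, X_tgt, n_iter=10, batch=50):
    """Drift a logistic-regression classifier from source to target domain.

    Each iteration re-trains on the current mix, pseudo-labels the most
    confident target samples, adds them, and retires source samples.
    """
    src_X, src_y = X_src.copy(), y_src.astype(float).copy()
    tgt_X = np.empty((0, X_src.shape[1]))
    tgt_y, tgt_w = np.empty(0), np.empty(0)
    pool = X_tgt.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(n_iter):
        X = np.vstack([src_X, tgt_X])
        y = np.concatenate([src_y, tgt_y])
        w = np.concatenate([np.ones(len(src_y)), tgt_w])   # distance-based weights
        clf.fit(X, y, sample_weight=w)
        if len(pool) < batch or len(src_y) < batch:
            break
        margin = np.abs(clf.decision_function(pool))       # distance from boundary
        pick = np.argsort(-margin)[:batch]                 # most confident target samples
        tgt_X = np.vstack([tgt_X, pool[pick]])
        tgt_y = np.concatenate([tgt_y, clf.predict(pool[pick])])
        tgt_w = np.concatenate([tgt_w, margin[pick] / (margin.max() + 1e-9)])
        pool = np.delete(pool, pick, axis=0)
        src_X, src_y = src_X[:-batch], src_y[:-batch]      # retire source samples
    return clf
```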

