A general deep learning model for bird detection in high resolution airborne imagery

Advances in artificial intelligence for image processing hold great promise for increasing the scales at which ecological systems can be studied. The distribution and behavior of individuals is central to ecology, and computer vision using deep neural networks can learn to detect individual objects in imagery. However, developing computer vision for ecological monitoring is challenging because it needs large amounts of human-labeled training data, requires advanced technical expertise and computational infrastructure, and is prone to overfitting. This limits application across space and time. One solution is developing generalized models that can be applied across species and ecosystems. Using over 250,000 annotations from 13 projects from around the world, we develop a general bird detection model that achieves over 65% recall and 50% precision on novel aerial data without any local training despite differences in species, habitat, and imaging methodology. Fine-tuning this model with only 1000 local annotations increases these values to an average of 84% recall and 69% precision by building on the general features learned from other data sources. Retraining from the general model improves local predictions even when moderately large annotation sets are available and makes model training faster and more stable. Our results demonstrate that general models for detecting broad classes of organisms using airborne imagery are achievable. These models can reduce the effort, expertise, and computational resources necessary for automating the detection of individual organisms across large scales, helping to transform the scale of data collection in ecology and the questions that can be addressed.

Download Full-text

Methodology for Collecting a Training Dataset for an Intrusion Detection Model

Proceedings of the Institute for System Programming of RAS ◽

10.15514/ispras-2021-33(5)-5 ◽

2021 ◽

Vol 33 (5) ◽

pp. 83-104

Author(s):

Aleksandr Igorevich Getman ◽

Maxim Nikolaevich Goryunov ◽

Andrey Georgievich Matskevich ◽

Dmitry Aleksandrovich Rybolovlev

Keyword(s):

Attack Detection ◽

Training Data ◽

Training Dataset ◽

Training Models ◽

The Public ◽

Detection Model ◽

Computer Attacks ◽

Model Training ◽

Public Datasets

The paper discusses the issues of training models for detecting computer attacks based on the use of machine learning methods. The results of the analysis of publicly available training datasets and tools for analyzing network traffic and identifying features of network sessions are presented sequentially. The drawbacks of existing tools and possible errors in the datasets formed with their help are noted. It is concluded that it is necessary to collect own training data in the absence of guarantees of the public datasets reliability and the limited use of pre-trained models in networks with characteristics that differ from the characteristics of the network in which the training traffic was collected. A practical approach to generating training data for computer attack detection models is proposed. The proposed solutions have been tested to evaluate the quality of model training on the collected data and the quality of attack detection in conditions of real network infrastructure.

Download Full-text

Semi-Supervised Deep Learning for Lunar Crater Detection Using CE-2 DOM

Remote Sensing ◽

10.3390/rs13142819 ◽

2021 ◽

Vol 13 (14) ◽

pp. 2819

Author(s):

Sudong Zang ◽

Lingli Mu ◽

Lina Xian ◽

Wei Zhang

Keyword(s):

Deep Learning ◽

Landing Site ◽

Training Data ◽

The Moon ◽

Detection Model ◽

Processing Times ◽

Crater Detection ◽

High Resolution Imagery ◽

Model Training ◽

Digital Orthophoto

Lunar craters are very important for estimating the geological age of the Moon, studying the evolution of the Moon, and for landing site selection. Due to a lack of labeled samples, processing times due to high-resolution imagery, the small number of suitable detection models, and the influence of solar illumination, Crater Detection Algorithms (CDAs) based on Digital Orthophoto Maps (DOMs) have not yet been well-developed. In this paper, a large number of training data are labeled manually in the Highland and Maria regions, using the Chang’E-2 (CE-2) DOM; however, the labeled data cannot cover all kinds of crater types. To solve the problem of small crater detection, a new crater detection model (Crater R-CNN) is proposed, which can effectively extract the spatial and semantic information of craters from DOM data. As incomplete labeled samples are not conducive for model training, the Two-Teachers Self-training with Noise (TTSN) method is used to train the Crater R-CNN model, thus constructing a new model—called Crater R-CNN with TTSN—which can achieve state-of-the-art performance. To evaluate the accuracy of the model, three other detection models (Mask R-CNN, no-Mask R-CNN, and Crater R-CNN) based on semi-supervised deep learning were used to detect craters in the Highland and Maria regions. The results indicate that Crater R-CNN with TTSN achieved the highest precision (of 91.4% and 88.5%, respectively) in the Highland and Maria regions, even obtaining the highest recall and F1 score. Compared with Mask R-CNN, no-Mask R-CNN, and Crater R-CNN, Crater R-CNN with TTSN had strong robustness and better generalization ability for crater detection within 1 km in different terrains, making it possible to detect small craters with high accuracy when using DOM data.

Download Full-text

Automatic Identification of Overpass Structures: A Method of Deep Learning

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8090421 ◽

2019 ◽

Vol 8 (9) ◽

pp. 421 ◽

Cited By ~ 1

Author(s):

Hao Li ◽

Maosheng Hu ◽

Youxin Huang

Keyword(s):

Deep Learning ◽

Target Detection ◽

Road Network ◽

Large Scale ◽

Training Data ◽

Fine Tuning ◽

Automatic Identification ◽

Detection Model ◽

Network Pattern ◽

Accuracy Performance

The identification of overpass structures in road networks has great significance for multi-scale modeling of roads, congestion analysis, and vehicle navigation. The traditional vector-based methods identify overpasses by the methodologies coming from computational geometry and graph theory, and they overly rely on the artificially designed features and have poor adaptability to complex scenes. This paper presents a novel method of identifying overpasses based on a target detection model (Faster-RCNN). This method utilizes raster representation of vector data and convolutional neural networks (CNNs) to learn task adaptive features from raster data, then identifies the location of an overpass by a Region Proposal network (RPN). The contribution of this paper is: (1) An overpass labelling geodatabase (OLGDB) for the OpenStreetMap (OSM) road network data of six typical cities in China is established; (2) Three different CNNs (ZF-net, VGG-16, Inception-ResNet V2) are integrated into Faster-RCNN and evaluated by accuracy performance; (3) The optimal combination of learning rate and batchsize is determined by fine-tuning; and (4) Five geometric metrics (perimeter, area, squareness, circularity, and W/L) are synthetized into image bands to enhance the training data, and their contribution to the overpass identification task is determined. The experimental results have shown that the proposed method has good accuracy performance (around 90%), and could be improved with the expansion of OLGDB and switching to more sophisticated target detection models. The deep learning target detection model has great application potential in large-scale road network pattern recognition, it can task-adaptively learn road structure features and easily extend to other road network patterns.

Download Full-text

Edge Learning

ACM Computing Surveys ◽

10.1145/3464419 ◽

2021 ◽

Vol 54 (7) ◽

pp. 1-36

Author(s):

Jie Zhang ◽

Zhihao Qu ◽

Chenxi Chen ◽

Haozhao Wang ◽

Yufeng Zhan ◽

...

Keyword(s):

Big Data Analytics ◽

Training Data ◽

Future Research ◽

Great Promise ◽

Communication Overhead ◽

Comprehensive Overview ◽

Training Models ◽

Comprehensive Survey ◽

Model Training ◽

Privacy Issues

Machine Learning ( ML ) has demonstrated great promise in various fields, e.g., self-driving, smart city, which are fundamentally altering the way individuals and organizations live, work, and interact. Traditional centralized learning frameworks require uploading all training data from different sources to a remote data server, which incurs significant communication overhead, service latency, and privacy issues. To further extend the frontiers of the learning paradigm, a new learning concept, namely, Edge Learning ( EL ) is emerging. It is complementary to the cloud-based methods for big data analytics by enabling distributed edge nodes to cooperatively training models and conduct inferences with their locally cached data. To explore the new characteristics and potential prospects of EL, we conduct a comprehensive survey of the recent research efforts on EL. Specifically, we first introduce the background and motivation. We then discuss the challenging issues in EL from the aspects of data, computation, and communication. Furthermore, we provide an overview of the enabling technologies for EL, including model training, inference, security guarantee, privacy protection, and incentive mechanism. Finally, we discuss future research opportunities on EL. We believe that this survey will provide a comprehensive overview of EL and stimulate fruitful future research in this field.

Download Full-text

Detecting OpenStreetMap missing buildings by transferring pre-trained deep neural networks

AGILE: GIScience Series ◽

10.5194/agile-giss-2-39-2021 ◽

2021 ◽

Vol 2 ◽

pp. 1-7

Author(s):

Jan Pisl ◽

Hao Li ◽

Sven Lautenbach ◽

Benjamin Herfort ◽

Alexander Zipf

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Training Data ◽

Fine Tuning ◽

Target Area ◽

Building Detection ◽

Detection Model ◽

Training Samples ◽

Training Examples ◽

Limited Training Samples

Abstract. Accurate and complete geographic data of human settlements is crucial for effective emergency response, humanitarian aid and sustainable development. Open- StreetMap (OSM) can serve as a valuable source of this data. As there are still many areas missing in OSM, deep neural networks have been trained to detect such areas from satellite imagery. However, in regions where little or no training data is available, training networks is problematic. In this study, we proposed a method of transferring a building detection model, which was previously trained in an area wellmapped in OSM, to remote data-scarce areas. The transferring was achieved via fine-tuning the model on limited training samples from the original training area and the target area. We validated the method by transferring deep neural networks trained in Tanzania to a site in Cameroon with straight distance of over 2600 km, and tested multiple variants of the proposed method. Finally, we applied the fine-tuned model to detect 1192 buildings missing OSM in a selected area in Cameroon. The results showed that the proposed method led to a significant improvement in f1-score with as little as 30 training examples from the target area. This is a crucial quality of the proposed method as it allows to fine-tune models to regions where OSM data is scarce.

Download Full-text

Effect of data-augmentation on fine-tuned CNN model performance

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp84-92 ◽

2021 ◽

Vol 10 (1) ◽

pp. 84

Author(s):

Ramaprasad Poojary ◽

Roma Raina ◽

Amit Kumar Mondal

Keyword(s):

Neural Network ◽

Computer Vision ◽

Deep Learning ◽

High Performance ◽

Data Augmentation ◽

Model Performance ◽

Training Data ◽

Fine Tuning ◽

Test Accuracy ◽

Training Time

<span id="docs-internal-guid-cdb76bbb-7fff-978d-961c-e21c41807064"><span>During the last few years, deep learning achieved remarkable results in the field of machine learning when used for computer vision tasks. Among many of its architectures, deep neural network-based architecture known as convolutional neural networks are recently used widely for image detection and classification. Although it is a great tool for computer vision tasks, it demands a large amount of training data to yield high performance. In this paper, the data augmentation method is proposed to overcome the challenges faced due to a lack of insufficient training data. To analyze the effect of data augmentation, the proposed method uses two convolutional neural network architectures. To minimize the training time without compromising accuracy, models are built by fine-tuning pre-trained networks VGG16 and ResNet50. To evaluate the performance of the models, loss functions and accuracies are used. Proposed models are constructed using Keras deep learning framework and models are trained on a custom dataset created from Kaggle CAT vs DOG database. Experimental results showed that both the models achieved better test accuracy when data augmentation is employed, and model constructed using ResNet50 outperformed VGG16 based model with a test accuracy of 90% with data augmentation & 82% without data augmentation.</span></span>

Download Full-text

Domain randomization-enhanced deep learning models for bird detection

Scientific Reports ◽

10.1038/s41598-020-80101-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Xin Mao ◽

Jun Kang Chow ◽

Pin Siang Tan ◽

Kuan-fu Liu ◽

Jimmy Wu ◽

...

Keyword(s):

Deep Learning ◽

Continuous Monitoring ◽

Bird Species ◽

Training Data ◽

Learning Models ◽

Fine Grained ◽

Bird Detection ◽

Relationship Of ◽

The Relationship

AbstractAutomatic bird detection in ornithological analyses is limited by the accuracy of existing models, due to the lack of training data and the difficulties in extracting the fine-grained features required to distinguish bird species. Here we apply the domain randomization strategy to enhance the accuracy of the deep learning models in bird detection. Trained with virtual birds of sufficient variations in different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracies. Based on the 100 terabytes of 2-month continuous monitoring data of egrets, our results cover the findings using conventional manual observations, e.g., vertical stratification of egrets according to body size, and also open up opportunities of long-term bird surveys requiring intensive monitoring that is impractical using conventional methods, e.g., the weather influences on egrets, and the relationship of the migration schedules between the great egrets and little egrets.

Download Full-text

Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Diagnostics ◽

10.3390/diagnostics11061052 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1052

Author(s):

Leang Sim Nguon ◽

Kangwon Seo ◽

Jung-Hyun Lim ◽

Tae-Jun Song ◽

Sung-Hyun Cho ◽

...

Keyword(s):

Decision Making ◽

Deep Learning ◽

Network Model ◽

Endoscopic Ultrasonography ◽

Data Augmentation ◽

Clinical Information ◽

Training Data ◽

Fine Tuning ◽

Cystic Neoplasm ◽

Cystic Neoplasms

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.

Download Full-text

Listen2Cough

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies ◽

10.1145/3448124 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-22

Author(s):

Xuhai Xu ◽

Ebrahim Nemati ◽

Korosh Vatanparvar ◽

Viswam Nathan ◽

Tousif Ahmed ◽

...

Keyword(s):

Health Assessment ◽

Ground Truth ◽

Health Condition ◽

Fine Tuning ◽

Detection Model ◽

Assessment Tasks ◽

Augmentation Techniques ◽

In The Wild ◽

The Rich ◽

Lung Health

The prevalence of ubiquitous computing enables new opportunities for lung health monitoring and assessment. In the past few years, there have been extensive studies on cough detection using passively sensed audio signals. However, the generalizability of a cough detection model when applied to external datasets, especially in real-world implementation, is questionable and not explored adequately. Beyond detecting coughs, researchers have looked into how cough sounds can be used in assessing lung health. However, due to the challenges in collecting both cough sounds and lung health condition ground truth, previous studies have been hindered by the limited datasets. In this paper, we propose Listen2Cough to address these gaps. We first build an end-to-end deep learning architecture using public cough sound datasets to detect coughs within raw audio recordings. We employ a pre-trained MobileNet and integrate a number of augmentation techniques to improve the generalizability of our model. Without additional fine-tuning, our model is able to achieve an F1 score of 0.948 when tested against a new clean dataset, and 0.884 on another in-the-wild noisy dataset, leading to an advantage of 5.8% and 8.4% on average over the best baseline model, respectively. Then, to mitigate the issue of limited lung health data, we propose to transform the cough detection task to lung health assessment tasks so that the rich cough data can be leveraged. Our hypothesis is that these tasks extract and utilize similar effective representation from cough sounds. We embed the cough detection model into a multi-instance learning framework with the attention mechanism and further tune the model for lung health assessment tasks. Our final model achieves an F1-score of 0.912 on healthy v.s. unhealthy, 0.870 on obstructive v.s. non-obstructive, and 0.813 on COPD v.s. asthma classification, outperforming the baseline by 10.7%, 6.3%, and 3.7%, respectively. Moreover, the weight value in the attention layer can be used to identify important coughs highly correlated with lung health, which can potentially provide interpretability for expert diagnosis in the future.

Download Full-text

Fully automated contrast and non-contrast cardiac view detection in echocardiography a multi-centre, multi-vendor study

European Heart Journal ◽

10.1093/ehjci/ehaa946.0078 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

S Gao ◽

D Stojanovski ◽

A Parker ◽

P Marques ◽

S Heitner ◽

...

Keyword(s):

Neural Network ◽

Training Data ◽

Classification Model ◽

Validation Dataset ◽

Funding Source ◽

Private Company ◽

Validation Data ◽

Independent Test ◽

Model Training ◽

Confusion Matrices

Abstract Background Correctly identifying views acquired in a 2D echocardiographic examination is paramount to post-processing and quantification steps often performed as part of most clinical workflows. In many exams, particularly in stress echocardiography, microbubble contrast is used which greatly affects the appearance of the cardiac views. Here we present a bespoke, fully automated convolutional neural network (CNN) which identifies apical 2, 3, and 4 chamber, and short axis (SAX) views acquired with and without contrast. The CNN was tested in a completely independent, external dataset with the data acquired in a different country than that used to train the neural network. Methods Training data comprised of 2D echocardiograms was taken from 1014 subjects from a prospective multisite, multi-vendor, UK trial with the number of frames in each view greater than 17,500. Prior to view classification model training, images were processed using standard techniques to ensure homogenous and normalised image inputs to the training pipeline. A bespoke CNN was built using the minimum number of convolutional layers required with batch normalisation, and including dropout for reducing overfitting. Before processing, the data was split into 90% for model training (211,958 frames), and 10% used as a validation dataset (23,946 frames). Image frames from different subjects were separated out entirely amongst the training and validation datasets. Further, a separate trial dataset of 240 studies acquired in the USA was used as an independent test dataset (39,401 frames). Results Figure 1 shows the confusion matrices for both validation data (left) and independent test data (right), with an overall accuracy of 96% and 95% for the validation and test datasets respectively. The accuracy for the non-contrast cardiac views of >99% exceeds that seen in other works. The combined datasets included images acquired across ultrasound manufacturers and models from 12 clinical sites. Conclusion We have developed a CNN capable of automatically accurately identifying all relevant cardiac views used in “real world” echo exams, including views acquired with contrast. Use of the CNN in a routine clinical workflow could improve efficiency of quantification steps performed after image acquisition. This was tested on an independent dataset acquired in a different country to that used to train the model and was found to perform similarly thus indicating the generalisability of the model. Figure 1. Confusion matrices Funding Acknowledgement Type of funding source: Private company. Main funding source(s): Ultromics Ltd.

Download Full-text