Deep Learning Case Study on Imbalanced Training Data for Automatic Bird Identification

Author(s):  
Juha Niemi ◽  
Juha T. Tanttu
2019 ◽  
Vol 11 (4) ◽  
pp. 1-22 ◽  
Author(s):  
Junhua Ding ◽  
Xinchuan Li ◽  
Xiaojun Kang ◽  
Venkat N. Gudivada

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5428
Author(s):  
Suzanna Cuypers ◽  
Maarten Bassier ◽  
Maarten Vergauwen

With recent advancements in deep learning models for image interpretation, it has finally become possible to automate construction site monitoring processes that rely on remote sensing. However, the major drawback of these models is their dependency on large datasets of training images labeled at pixel level, which have to be produced manually by skilled personnel. To alleviate the need for training data, this study evaluates weakly- and semi-supervised semantic segmentation models for construction site imagery to efficiently automate monitoring tasks. As a case study, we compare fully-, weakly- and semi-supervised methods for the detection of rebar covers, which are useful for quality control. In the experiments, recent models, i.e. IRNet, DeepLabv3+ and the cross-consistency training model, are compared for their ability to segment rebar covers from construction site imagery with minimal manual input. The results show that weakly- and semi-supervised models can indeed approach the performance of fully-supervised models, with the majority of the target objects being properly found. Through this study, construction site stakeholders are provided with detailed information on how tp leverage deep learning for efficient construction site monitoring and weigh preprocessing, training and testing efforts against each other in order to decide between fully-, weakly- and semi-supervised training.


2018 ◽  
Vol 10 (12) ◽  
pp. 2067 ◽  
Author(s):  
Lingcao Huang ◽  
Lin Liu ◽  
Liming Jiang ◽  
Tingjun Zhang

Thawing of ice-rich permafrost causes thermokarst landforms on the ground surface. Obtaining the distribution of thermokarst landforms is a prerequisite for understanding permafrost degradation and carbon exchange at local and regional scales. However, because of their diverse types and characteristics, it is challenging to map thermokarst landforms from remote sensing images. We conducted a case study towards automatically mapping a type of thermokarst landforms (i.e., thermo-erosion gullies) in a local area in the northeastern Tibetan Plateau from high-resolution images by the use of deep learning. In particular, we applied the DeepLab algorithm (based on Convolutional Neural Networks) to a 0.15-m-resolution Digital Orthophoto Map (created using aerial photographs taken by an Unmanned Aerial Vehicle). Here, we document the detailed processing flow with key steps including preparing training data, fine-tuning, inference, and post-processing. Validating against the field measurements and manual digitizing results, we obtained an F1 score of 0.74 (precision is 0.59 and recall is 1.0), showing that the proposed method can effectively map small and irregular thermokarst landforms. It is potentially viable to apply the designed method to mapping diverse thermokarst landforms in a larger area where high-resolution images and training data are available.


2019 ◽  
Vol 9 (22) ◽  
pp. 4749
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Chi Zhang ◽  
Jian Chen ◽  
...  

Decoding human brain activities, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for the reconstruction method with deep learning requiring huge amounts of labelled samples. When compared with the deep learning method, humans can recognize a new image because our human visual system is naturally capable of extracting features from any object and comparing them. Inspired by this visual mechanism, we introduced the mechanism of comparison into deep learning method to realize better visual reconstruction by making full use of each sample and the relationship of the sample pair by learning to compare. In this way, we proposed a Siamese reconstruction network (SRN) method. By using the SRN, we improved upon the satisfying results on two fMRI recording datasets, providing 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this manner can increase the training data about from n samples to 2n sample pairs, which takes full advantage of the limited quantity of training samples. The SRN learns to converge sample pairs of the same class or disperse sample pairs of different class in feature space.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Xin Mao ◽  
Jun Kang Chow ◽  
Pin Siang Tan ◽  
Kuan-fu Liu ◽  
Jimmy Wu ◽  
...  

AbstractAutomatic bird detection in ornithological analyses is limited by the accuracy of existing models, due to the lack of training data and the difficulties in extracting the fine-grained features required to distinguish bird species. Here we apply the domain randomization strategy to enhance the accuracy of the deep learning models in bird detection. Trained with virtual birds of sufficient variations in different environments, the model tends to focus on the fine-grained features of birds and achieves higher accuracies. Based on the 100 terabytes of 2-month continuous monitoring data of egrets, our results cover the findings using conventional manual observations, e.g., vertical stratification of egrets according to body size, and also open up opportunities of long-term bird surveys requiring intensive monitoring that is impractical using conventional methods, e.g., the weather influences on egrets, and the relationship of the migration schedules between the great egrets and little egrets.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2611
Author(s):  
Andrew Shepley ◽  
Greg Falzon ◽  
Christopher Lawson ◽  
Paul Meek ◽  
Paul Kwan

Image data is one of the primary sources of ecological data used in biodiversity conservation and management worldwide. However, classifying and interpreting large numbers of images is time and resource expensive, particularly in the context of camera trapping. Deep learning models have been used to achieve this task but are often not suited to specific applications due to their inability to generalise to new environments and inconsistent performance. Models need to be developed for specific species cohorts and environments, but the technical skills required to achieve this are a key barrier to the accessibility of this technology to ecologists. Thus, there is a strong need to democratize access to deep learning technologies by providing an easy-to-use software application allowing non-technical users to train custom object detectors. U-Infuse addresses this issue by providing ecologists with the ability to train customised models using publicly available images and/or their own images without specific technical expertise. Auto-annotation and annotation editing functionalities minimize the constraints of manually annotating and pre-processing large numbers of images. U-Infuse is a free and open-source software solution that supports both multiclass and single class training and object detection, allowing ecologists to access deep learning technologies usually only available to computer scientists, on their own device, customised for their application, without sharing intellectual property or sensitive data. It provides ecological practitioners with the ability to (i) easily achieve object detection within a user-friendly GUI, generating a species distribution report, and other useful statistics, (ii) custom train deep learning models using publicly available and custom training data, (iii) achieve supervised auto-annotation of images for further training, with the benefit of editing annotations to ensure quality datasets. Broad adoption of U-Infuse by ecological practitioners will improve ecological image analysis and processing by allowing significantly more image data to be processed with minimal expenditure of time and resources, particularly for camera trap images. Ease of training and use of transfer learning means domain-specific models can be trained rapidly, and frequently updated without the need for computer science expertise, or data sharing, protecting intellectual property and privacy.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1052
Author(s):  
Leang Sim Nguon ◽  
Kangwon Seo ◽  
Jung-Hyun Lim ◽  
Tae-Jun Song ◽  
Sung-Hyun Cho ◽  
...  

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.


2021 ◽  
Vol 11 (6) ◽  
pp. 2866
Author(s):  
Damheo Lee ◽  
Donghyun Kim ◽  
Seung Yun ◽  
Sanghun Kim

In this paper, we propose a new method for code-switching (CS) automatic speech recognition (ASR) in Korean. First, the phonetic variations in English pronunciation spoken by Korean speakers should be considered. Thus, we tried to find a unified pronunciation model based on phonetic knowledge and deep learning. Second, we extracted the CS sentences semantically similar to the target domain and then applied the language model (LM) adaptation to solve the biased modeling toward Korean due to the imbalanced training data. In this experiment, training data were AI Hub (1033 h) in Korean and Librispeech (960 h) in English. As a result, when compared to the baseline, the proposed method improved the error reduction rate (ERR) by up to 11.6% with phonetic variant modeling and by 17.3% when semantically similar sentences were applied to the LM adaptation. If we considered only English words, the word correction rate improved up to 24.2% compared to that of the baseline. The proposed method seems to be very effective in CS speech recognition.


Sign in / Sign up

Export Citation Format

Share Document