Species recognition technology based on migration learning and data augmentation

Author(s):  
Yisheng Song ◽  
Zhijie Lin
2020 ◽  
Vol 10 (22) ◽  
pp. 8257


Author(s):  
Adam Gauci ◽  
Alan Deidun ◽  
John Abela

In recent years, citizen science campaigns have provided a very good platform for widespread data collection. Within the marine domain, jellyfish are among the species most commonly targeted for citizen reporting purposes. The timely validation of submitted jellyfish reports remains challenging, given the sheer volume of reports being submitted and the relative paucity of trained staff familiar with the taxonomic identification of jellyfish. In this work, hundreds of photos that were submitted to the “Spot the Jellyfish” initiative are used to train a group of region-based convolutional neural networks. The main aim is to develop models that can classify, and distinguish between, the five most commonly recorded species of jellyfish within Maltese waters. In particular, images of Pelagia noctiluca, Cotylorhiza tuberculata, Carybdea marsupialis, Velella velella and salps were considered. The reliability of the digital architecture is quantified through the precision, recall, F1 score and κ score metrics. Improvements gained through the application of data augmentation and transfer learning techniques are also discussed. Very promising results were obtained, supporting upcoming aspirations to embed automated classification methods within online services, including smartphone apps. These can reduce, and potentially eliminate, the need for human expert intervention in validating citizen science reports for the five jellyfish species in question, thus providing prompt feedback to the citizen scientist submitting the report.
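The label-preserving augmentation mentioned above can be illustrated with a minimal sketch. The helper names (`hflip`, `rot90`, `augment`) and the toy 2×2 "image" are hypothetical, not taken from the paper; the point is only that geometric transforms multiply the training set while the species label stays valid:

```python
# Minimal label-preserving image augmentation sketch (hypothetical helpers).
def hflip(img):
    """Mirror each row of a 2D image given as a list of lists."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2D image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(dataset):
    """Expand (image, label) pairs with flipped and rotated copies.
    The label is preserved because the depicted species is unchanged."""
    out = []
    for img, label in dataset:
        out.append((img, label))
        out.append((hflip(img), label))
        out.append((rot90(img), label))
    return out

data = [([[1, 2], [3, 4]], "Pelagia noctiluca")]
augmented = augment(data)  # one labelled photo becomes three training samples
```

In practice such transforms would be applied to real photos (e.g. via a deep learning framework's augmentation pipeline) before fine-tuning a pretrained network, but the principle is the same.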


AI ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 274-289
Author(s):  
Anirban Jyoti Hati ◽  
Rajiv Ranjan Singh

This paper analyses the contribution of residual network (ResNet) based convolutional neural network (CNN) architectures employed in two tasks related to plant phenotyping. Among contemporary works on species recognition (SR) and infection detection in plants, the majority have performed experiments on balanced datasets and used accuracy as the evaluation parameter. In contrast, this work used an imbalanced dataset with an unequal number of images per class, applied data augmentation to increase accuracy, organised the data into multiple test cases and classes, and, most importantly, employed multiclass classifier evaluation parameters suited to asymmetric class distributions. Additionally, the work addresses typical issues faced in practice, such as selecting the size of the dataset, the depth of the classifiers, the training time needed, and analysing the classifier’s performance when various test cases are deployed. In this work, the ResNet 20 (V2) architecture performed significantly well in the tasks of SR and Identification of Healthy and Infected Leaves (IHIL), with a Precision of 91.84% and 84.00%, Recall of 91.67% and 83.14% and F1 Score of 91.49% and 83.19%, respectively.
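The macro-averaged metrics reported above (Precision, Recall, F1) are exactly the kind of evaluation parameters suited to asymmetric class distributions, because every class contributes equally regardless of its frequency. A minimal sketch, with a hypothetical `macro_scores` helper and toy labels:

```python
def macro_scores(y_true, y_pred):
    """Macro-averaged precision, recall and F1 for multiclass labels.
    Each class is scored separately and the scores are averaged, so
    rare classes weigh as much as frequent ones."""
    classes = sorted(set(y_true) | set(y_pred))
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precs) / n, sum(recs) / n, sum(f1s) / n
```

On an imbalanced dataset, plain accuracy can look high while a minority class is entirely misclassified; the macro average exposes that failure.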


Author(s):  
Mario Lasseck

The detection and identification of individual species based on images or audio recordings has shown a significant performance increase over the last few years, thanks to recent advances in deep learning. Reliable automatic species recognition provides a promising tool for biodiversity monitoring, research and education. Image-based plant identification, for example, now comes close to the most advanced human expertise (Bonnet et al. 2018, Lasseck 2018a). Besides improved machine learning algorithms, neural network architectures, deep learning frameworks and computer hardware, a major reason for the gain in performance is the increasing abundance of biodiversity training data, either from observational networks and data providers like GBIF, Xeno-canto, iNaturalist, etc. or natural history museum collections like the Animal Sound Archive of the Museum für Naturkunde. However, in many cases, this occurrence data is still insufficient for data-intensive deep learning approaches and is often unbalanced, with only a few examples for very rare species. To overcome these limitations, data augmentation can be used. This technique synthetically creates more training samples by applying various subtle random manipulations to the original data in a label-preserving way without changing the content. In the talk, we will present augmentation methods for images and audio data. The positive effect on identification performance will be evaluated on different large-scale data sets from recent plant and bird identification (LifeCLEF 2017, 2018) and detection (DCASE 2018) challenges (Lasseck 2017, Lasseck 2018b, Lasseck 2018c).
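For audio data, "subtle random manipulations" typically means operations such as shifting a recording in time or adding low-level noise. A minimal sketch on a raw waveform represented as a list of samples; the helper names (`time_shift`, `add_noise`) are hypothetical, not from the talk:

```python
import random

def time_shift(wave, n):
    """Circularly shift a waveform by n samples (label-preserving:
    the species in the recording does not change)."""
    n %= len(wave)
    return wave[-n:] + wave[:-n] if n else list(wave)

def add_noise(wave, scale=0.01, seed=0):
    """Add subtle Gaussian noise to each sample. The perturbation is
    kept small so the content of the recording is unchanged."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, scale) for s in wave]
```

Image augmentations (flips, crops, colour jitter) follow the same principle: perturb the input, keep the label.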


2020 ◽  
Vol 43 ◽  
Author(s):  
Myrthe Faber

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.


Author(s):  
Alex Hernández-García ◽  
Johannes Mehrer ◽  
Nikolaus Kriegeskorte ◽  
Peter König ◽  
Tim C. Kietzmann

2002 ◽  
Vol 7 (1) ◽  
pp. 31-42
Author(s):  
J. Šaltytė ◽  
K. Dučinskas

The Bayesian classification rule used for the classification of observations of (second-order) stationary Gaussian random fields with different means and a common factorised covariance matrix is investigated. The influence of observed data augmentation on the Bayesian risk is examined for three different widely applicable nonlinear spatial correlation models. An explicit expression of the Bayesian risk for the classification of augmented data is derived. A numerical comparison of these models, based on the variability of the Bayesian risk in the case of the first-order neighbourhood scheme, is performed.
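For context, the Bayesian classification rule for two Gaussian classes with means $\mu_1, \mu_2$, common covariance $\Sigma$ and prior probabilities $\pi_1, \pi_2$ reduces to a linear discriminant (a standard result; the paper's risk expressions for augmented spatial observations build on this form):

```latex
\text{assign } x \text{ to class 1} \iff
\left(x - \tfrac{1}{2}(\mu_1 + \mu_2)\right)^{\top} \Sigma^{-1} (\mu_1 - \mu_2)
\;\ge\; \ln\frac{\pi_2}{\pi_1}
```

Augmenting the observed data changes $\Sigma$ (through the spatial correlation model) and hence the Bayesian risk of this rule.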


2020 ◽  
Vol 64 (4) ◽  
pp. 40412-1-40412-11
Author(s):  
Kexin Bai ◽  
Qiang Li ◽  
Ching-Hsin Wang

Abstract To address the issues of the relatively small size of brain tumor image datasets, severe class imbalance, and low precision in existing segmentation algorithms for brain tumor images, this study proposes a two-stage segmentation algorithm integrating convolutional neural networks (CNNs) and conventional methods. Four modalities of the original magnetic resonance images were first preprocessed separately. Next, preliminary segmentation was performed using an improved U-Net CNN incorporating deep supervision, residual structures, dense connections, and dense skip connections. The authors adopted a multiclass Dice loss function to deal with class imbalance and successfully prevented overfitting using data augmentation. The preliminary segmentation results subsequently served as a priori knowledge for a continuous maximum flow algorithm for fine segmentation of target edges. Experiments revealed that the mean Dice similarity coefficients of the proposed algorithm in whole tumor, tumor core, and enhancing tumor segmentation were 0.9072, 0.8578, and 0.7837, respectively. The proposed algorithm presents higher accuracy and better stability in comparison with some of the more advanced segmentation algorithms for brain tumor images.
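The multiclass Dice loss used above counteracts class imbalance by averaging a per-class overlap score, so small structures (e.g. enhancing tumor) weigh as much as large ones. A minimal sketch on flat probability maps; the helper names are hypothetical and the paper's exact formulation may differ:

```python
def dice_coeff(pred, target, eps=1e-6):
    """Soft Dice coefficient between a predicted probability map and a
    binary target mask, both given as flat lists of equal length."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * inter + eps) / (total + eps)

def multiclass_dice_loss(preds, targets):
    """1 minus the mean per-class Dice score. Averaging over classes
    means a rare class influences the loss as much as a frequent one."""
    scores = [dice_coeff(p, t) for p, t in zip(preds, targets)]
    return 1.0 - sum(scores) / len(scores)
```

A perfect prediction drives the loss to zero; a completely wrong one drives it toward one, per class.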


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented by conventional SMILES randomization when used for training the baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as “attentional gain”: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
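The similarity measure behind the method's name is the Levenshtein (edit) distance between SMILES strings. A minimal sketch; the `closest_reactant` helper is a hypothetical illustration of similarity-based pairing, not the paper's exact training-pair construction:

```python
def levenshtein(a, b):
    """Classic edit distance between two strings (here, SMILES),
    computed with a rolling single-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def closest_reactant(product, reactants):
    """Toy illustration: pick the reactant whose SMILES is most
    similar to the product (hypothetical, for intuition only)."""
    return min(reactants, key=lambda r: levenshtein(r, product))
```

Pairing sequences with small edit distance keeps reactant and product representations locally aligned, which is the intuition behind the reported attentional gain.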


Author(s):  
Gabriel Ribeiro ◽  
Marcos Yamasaki ◽  
Helon Vicente Hultmann Ayala ◽  
Leandro Coelho ◽  
Viviana Mariani
