Data augmentation for deep learning based semantic segmentation and crop-weed classification in agricultural robotics

2021 ◽  
Vol 190 ◽  
pp. 106418
Author(s):  
Daobilige Su ◽  
He Kong ◽  
Yongliang Qiao ◽  
Salah Sukkarieh
2021 ◽  
Vol 12 ◽  
Author(s):  
Prabhakar Maheswari ◽  
Purushothaman Raja ◽  
Orly Enrique Apolo-Apolo ◽  
Manuel Pérez-Ruiz

Smart farming applies intelligent systems across every domain of agriculture to achieve sustainable economic growth from the available resources using advanced technologies. Deep Learning (DL), a family of sophisticated artificial neural network architectures, provides state-of-the-art results in smart farming applications. One of the main tasks in this domain is yield estimation. Manual yield estimation faces many hurdles: it is labor-intensive, time-consuming, and imprecise. These issues motivate the development of intelligent fruit yield estimation systems, which help farmers decide on harvesting, marketing, and related activities. Semantic segmentation combined with DL yields promising results in fruit detection and localization by performing pixel-wise prediction. This paper reviews the literature on fruit yield estimation using DL-based semantic segmentation architectures. It also discusses the challenges that arise in intelligent fruit yield estimation, such as sampling, collection, annotation, data augmentation, fruit detection, and counting. Results show that fruit yield estimation employing DL-based semantic segmentation outperforms earlier techniques because of the human-like visual cognition incorporated into these architectures. Future directions are also discussed, such as customizing DL architectures for smartphone-based yield prediction and developing more comprehensive models that handle challenging situations such as occlusion, overlap, and illumination variation.
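
The pixel-wise prediction described above typically ends in a counting step: once a network labels each pixel as fruit or background, instances can be counted from the resulting mask. The sketch below is a minimal, hypothetical illustration (not taken from any of the reviewed papers) using connected-component analysis in OpenCV; the mask source, minimum-area threshold, and toy example are all assumptions.

```python
# Hypothetical sketch: counting fruit instances from a binary segmentation mask.
# Assumes a trained segmentation model already produced `mask` (1 = fruit pixel).
import numpy as np
import cv2

def count_fruits(mask: np.ndarray, min_area: int = 50) -> int:
    """Count connected fruit regions in a binary mask, ignoring tiny blobs."""
    mask_u8 = (mask > 0).astype(np.uint8)
    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask_u8, connectivity=8)
    # Label 0 is the background; drop regions smaller than min_area pixels.
    areas = stats[1:, cv2.CC_STAT_AREA]
    return int((areas >= min_area).sum())

# Toy example: a mask containing two circular "fruits".
toy = np.zeros((64, 64), dtype=np.uint8)
cv2.circle(toy, (16, 16), 8, 1, -1)
cv2.circle(toy, (48, 48), 10, 1, -1)
print(count_fruits(toy, min_area=20))  # -> 2
```

The minimum-area filter is one simple guard against the noise and occlusion issues the review raises; instance-aware methods go further but need richer annotations.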


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Dominik Müller ◽  
Frank Kramer

Abstract
Background: The increased availability and use of modern medical imaging has created a strong need for automatic medical image segmentation. However, current image segmentation platforms do not provide the functionality required for straightforward setup of medical image segmentation pipelines. Existing pipelines are commonly standalone software optimized for a specific public data set. This paper therefore introduces the open-source Python library MIScnn.
Implementation: The aim of MIScnn is to provide an intuitive API for rapidly building medical image segmentation pipelines, including data I/O, preprocessing, data augmentation, patch-wise analysis, metrics, a library of state-of-the-art deep learning models, and model utilization such as training, prediction, and fully automatic evaluation (e.g. cross-validation). High configurability and multiple open interfaces also allow full pipeline customization.
Results: Running a cross-validation with MIScnn on the Kidney Tumor Segmentation Challenge 2019 data set (multi-class semantic segmentation with 300 CT scans) resulted in a powerful predictor based on the standard 3D U-Net model.
Conclusions: With this experiment, we show that the MIScnn framework enables researchers to rapidly set up a complete medical image segmentation pipeline with just a few lines of code. The source code for MIScnn is available in the Git repository: https://github.com/frankkramer-lab/MIScnn.
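
For orientation, a condensed pipeline sketch following the usage shown in the project's documentation is given below; the data path, file pattern, and hyperparameters are placeholders, and the exact import paths and signatures should be verified against the current repository.

```python
# Sketch of a MIScnn pipeline, condensed from the library's documented usage.
from miscnn.data_loading.interfaces import NIFTI_interface
from miscnn import Data_IO, Preprocessor, Neural_Network
from miscnn.neural_network.architecture.unet.standard import Architecture

# Data I/O: NIfTI interface for CT volumes with one channel and three classes
# (background, kidney, tumor), as in the KiTS19 data set.
interface = NIFTI_interface(pattern="case_00[0-9]*", channels=1, classes=3)
data_io = Data_IO(interface, "/path/to/kits19/data")  # placeholder path

# Preprocessor: patch-wise analysis with 128^3 patches, batches of 2.
pp = Preprocessor(data_io, batch_size=2, analysis="patchwise-crop",
                  patch_shape=(128, 128, 128))

# Standard 3D U-Net model, then training on all available samples.
model = Neural_Network(preprocessor=pp, architecture=Architecture())
samples = data_io.get_indiceslist()
model.train(samples, epochs=1)  # placeholder epoch count
```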


2020 ◽  
Vol 58 (12) ◽  
pp. 3049-3061
Author(s):  
Christoph Hoog Antink ◽  
Joana Carlos Mesquita Ferreira ◽  
Michael Paul ◽  
Simon Lyra ◽  
Konrad Heimann ◽  
...  

Abstract
Photoplethysmography imaging (PPGI) for non-contact monitoring of preterm infants in the neonatal intensive care unit (NICU) is a promising technology, as it could reduce medical adhesive-related skin injuries and associated complications. For practical implementations of PPGI, a region of interest has to be detected automatically in real time. As neonates' body proportions differ significantly from those of adults, existing approaches cannot be used in a straightforward way, and color-based skin detection requires RGB data, thus prohibiting the use of less-intrusive near-infrared (NIR) acquisition. In this paper, we present a deep learning-based method for segmentation of neonatal video data. We augmented an existing encoder-decoder semantic segmentation method with a modified version of the ResNet-50 encoder. This reduced the computational time by a factor of 7.5, so that 30 frames per second can be processed at 960 × 576 pixels. The method was developed and optimized on publicly available databases with segmentation data from adults. For evaluation, a comprehensive dataset consisting of RGB and NIR video recordings from 29 neonates with various skin tones, recorded in two NICUs in Germany and India, was used. From all recordings, 643 frames were manually segmented. After pre-training the model on the public adult data, parts of the neonatal data were used for additional learning, and left-out neonates were used for cross-validated evaluation. On the RGB data, the head is segmented well (82% intersection over union, 88% accuracy), and performance is comparable with that achieved on large, public, non-neonatal datasets. Performance on the NIR data, on the other hand, was inferior. By employing data augmentation to generate additional virtual NIR data for training, results could be improved, and the head could be segmented with 62% intersection over union and 65% accuracy. The method is in principle capable of real-time segmentation and may thus provide a useful tool for future PPGI applications.
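
The paper modifies the ResNet-50 encoder inside an existing encoder-decoder method; the sketch below is not that architecture, only a generic Keras illustration of the same idea, with the input shape matching the 960 × 576 frames and an assumed five-stage transposed-convolution decoder.

```python
# Illustrative sketch (not the authors' exact network): an encoder-decoder
# segmentation model with a ResNet-50 backbone. Decoder depth, filter counts,
# and the two-class head are assumptions for demonstration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def resnet50_segmenter(input_shape=(576, 960, 3), n_classes=2):
    encoder = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = encoder.output                      # spatially downsampled 32x
    for filters in (256, 128, 64, 32, 16):  # five 2x upsampling stages
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return Model(encoder.input, out)

model = resnet50_segmenter()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

A lighter backbone (or a pruned ResNet-50, as the authors describe) is what buys the reported 7.5x speed-up; the decoder shown here is deliberately minimal.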


2021 ◽  
Author(s):  
Run Zhou Ye ◽  
Christophe Noll ◽  
Gabriel Richard ◽  
Martin Lepage ◽  
Eric E. Turcotte ◽  
...  

Objectives: The advent of deep learning has set new standards in an array of image translation applications. At present, using these methods often requires computer programming experience, and non-commercial programs with a graphical interface usually do not allow users to fully customize their deep learning pipeline. Our primary objective was therefore to provide a simple graphical interface that allows students and researchers with no programming experience to easily create, train, and evaluate custom deep learning models for image translation. We also aimed to test the applicability of our tool (the DeepImageTranslator) in two different tasks: semantic segmentation and noise reduction of CT images.
Methods: The DeepImageTranslator was implemented using the Tkinter library; backend computations were implemented using the Pillow, Numpy, OpenCV, Augmentor, Tensorflow, and Keras libraries. Convolutional neural networks (CNNs) were trained using DeepImageTranslator and assessed with three-way cross-validation. The effects of data augmentation, deep supervision, and sample size on model accuracy were also systematically assessed.
Results: The DeepImageTranslator is a simple tool that allows users to customize all aspects of their deep learning pipeline, including the CNN, the training optimizer, the loss function, and the type of training image augmentation scheme. We showed that DeepImageTranslator can be used to achieve state-of-the-art accuracy and generalizability in semantic segmentation and noise reduction. Highly accurate 3D segmentation models for body composition can be obtained using training sample sizes as small as 17 images. For studies with small datasets, researchers can therefore randomly select a very small subset of images for manual labeling, train a specialized CNN model with DeepImageTranslator, and use it to fully automate segmentation of the entire dataset, saving tremendous time and effort.
Conclusions: An open-source deep learning tool for accurate image translation with a user-friendly graphical interface was presented and evaluated. This standalone software is freely available for Windows 10 at: https://sourceforge.net/projects/deepimagetranslator/
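
DeepImageTranslator configures augmentation through its GUI; as a rough command-line analogue, the following sketch uses the Augmentor library (one of the tool's listed dependencies) to generate paired image/mask augmentations. The directories and operation parameters are placeholders, not settings from the paper.

```python
# Hedged sketch of paired image/mask augmentation with the Augmentor library.
import Augmentor

p = Augmentor.Pipeline("data/train_images")  # placeholder image directory
p.ground_truth("data/train_masks")           # paired segmentation masks

# Geometric augmentations applied identically to image and mask.
p.rotate(probability=0.5, max_left_rotation=10, max_right_rotation=10)
p.flip_left_right(probability=0.5)
p.zoom_random(probability=0.3, percentage_area=0.9)

p.sample(200)  # write 200 augmented image/mask pairs to disk
```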


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models over non-augmented baselines. Here, we propose a novel data augmentation method we call "Levenshtein augmentation", which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train the baseline models, Levenshtein augmentation increased performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as "attentional gain": an enhancement in the underlying network's capability to recognize molecular motifs.
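
The abstract does not specify how the sub-sequence similarity is computed, so the sketch below only illustrates the two named ingredients rather than the authors' method: conventional SMILES randomization via RDKit, and a plain Levenshtein edit distance between a reactant string and its product string.

```python
# Illustrative sketch: SMILES randomization (RDKit) plus Levenshtein distance.
from rdkit import Chem

def randomize_smiles(smiles: str) -> str:
    """Return a randomly ordered but chemically equivalent SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

reactant, product = "CCO", "CC=O"  # toy ethanol -> acetaldehyde pair
print(randomize_smiles(reactant), levenshtein(reactant, product))
```

A plausible use of the distance, under this reading, is to prefer reactant renderings whose strings stay close to the product, so the network sees locally aligned pairs; the paper itself should be consulted for the actual pairing rule.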


2020 ◽  
Vol 17 (3) ◽  
pp. 299-305 ◽  
Author(s):  
Riaz Ahmad ◽  
Saeeda Naz ◽  
Muhammad Afzal ◽  
Sheikh Rashid ◽  
Marcus Liwicki ◽  
...  

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper contributes in three main aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning extra white space and de-skewing skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes, and fine inflections. Combining data augmentation with the deep learning approach yields a clear improvement, raising the Character Recognition (CR) rate from a baseline of 75.08% to 80.02%.
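
As an illustration of the de-skewing step, the following OpenCV sketch estimates a text-line's skew angle from its ink pixels and rotates the image back. This is a generic recipe, not the paper's implementation: the Otsu binarization and angle normalization are assumptions, and minAreaRect's angle convention varies across OpenCV versions, so the correction may need adjusting.

```python
# Hedged sketch: de-skew a grayscale text-line image with OpenCV.
import cv2
import numpy as np

def deskew_text_line(gray: np.ndarray) -> np.ndarray:
    # Binarize with Otsu so the ink becomes the foreground (non-zero) pixels.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:               # normalize into (-45, 45]; convention varies
        angle -= 90              # across OpenCV versions, verify on your build
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderValue=255)
```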


Impact ◽  
2020 ◽  
Vol 2020 (2) ◽  
pp. 9-11
Author(s):  
Tomohiro Fukuda

Mixed reality (MR) is rapidly becoming a vital tool, not just in gaming, but also in education, medicine, construction and environmental management. The term refers to systems in which computer-generated content is superimposed over objects in a real-world environment across one or more sensory modalities. Although most of us have heard of the use of MR in computer games, it also has applications in military and aviation training, as well as tourism, healthcare and more. In addition, it has potential in architecture and design, where proposed buildings can be superimposed on existing locations to render 3D visualizations of plans. However, one major challenge that remains in MR development is real-time occlusion: hiding 3D virtual objects behind real objects in the scene. Dr Tomohiro Fukuda, based at the Division of Sustainable Energy and Environmental Engineering, Graduate School of Engineering at Osaka University in Japan, is an expert in this field. Researchers led by Dr Fukuda are tackling the occlusion problem in MR. They are currently developing an MR system that achieves real-time occlusion by harnessing deep learning, using a semantic segmentation technique for outdoor landscape design simulation. This methodology can be used to automatically estimate the visual environment before and after construction projects.
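
To make the occlusion idea concrete, the toy sketch below (not Dr Fukuda's system) composites a rendered virtual object into a camera frame only where a semantic-segmentation mask reports no occluding real object, such as a tree in the foreground.

```python
# Toy illustration of mask-based occlusion handling in MR compositing.
import numpy as np

def composite_with_occlusion(frame: np.ndarray,
                             render: np.ndarray,
                             fg_mask: np.ndarray) -> np.ndarray:
    """fg_mask is 1 where a real object must occlude the virtual render.

    frame:  (H, W, 3) camera image; render: (H, W, 3) virtual layer,
    all-zero where no virtual content exists.
    """
    out = frame.copy()
    visible = (fg_mask == 0) & (render.sum(axis=-1) > 0)  # rendered, unoccluded
    out[visible] = render[visible]
    return out
```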

