Data augmentation for deep learning based semantic segmentation and crop-weed classification in agricultural robotics

2021 ◽  
Vol 190 ◽  
pp. 106418
Author(s):  
Daobilige Su ◽  
He Kong ◽  
Yongliang Qiao ◽  
Salah Sukkarieh
2021 ◽  
Vol 12 ◽  
Author(s):  
Prabhakar Maheswari ◽  
Purushothaman Raja ◽  
Orly Enrique Apolo-Apolo ◽  
Manuel Pérez-Ruiz

Smart farming applies intelligent systems across every domain of agriculture to achieve sustainable economic growth from the available resources using advanced technologies. Deep Learning (DL), a family of sophisticated artificial neural network architectures, provides state-of-the-art results in smart farming applications. One of the main tasks in this domain is yield estimation. Manual yield estimation faces many hurdles: it is labor-intensive, time-consuming, and imprecise. These issues motivate the development of intelligent fruit yield estimation systems, which help farmers decide on harvesting, marketing, and related activities. Semantic segmentation combined with DL yields promising results in fruit detection and localization by performing pixel-wise prediction. This paper reviews the literature on fruit yield estimation using DL-based semantic segmentation architectures. It also discusses the challenges that arise in intelligent fruit yield estimation, such as sampling, collection, annotation, data augmentation, fruit detection, and counting. Results show that fruit yield estimation employing DL-based semantic segmentation outperforms earlier techniques because of the human-like visual cognition incorporated into these architectures. Future directions are also discussed, such as customizing DL architectures for smartphone-based yield prediction and developing more comprehensive models that handle challenging situations such as occlusion, overlap, and illumination variation.
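
The pixel-wise prediction described above typically ends in a counting step: once a network labels each pixel as fruit or background, instances can be counted from the resulting mask. The sketch below is a minimal, hypothetical illustration (not taken from any of the reviewed papers) using connected-component analysis in OpenCV; the mask source, minimum-area threshold, and toy example are all assumptions.

```python
# Hypothetical sketch: counting fruit instances from a binary segmentation mask.
# Assumes a trained segmentation model already produced `mask` (1 = fruit pixel).
import numpy as np
import cv2

def count_fruits(mask: np.ndarray, min_area: int = 50) -> int:
    """Count connected fruit regions in a binary mask, ignoring tiny blobs."""
    mask_u8 = (mask > 0).astype(np.uint8)
    num_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask_u8, connectivity=8)
    # Label 0 is the background; drop regions smaller than min_area pixels.
    areas = stats[1:, cv2.CC_STAT_AREA]
    return int((areas >= min_area).sum())

# Toy example: a mask containing two circular "fruits".
toy = np.zeros((64, 64), dtype=np.uint8)
cv2.circle(toy, (16, 16), 8, 1, -1)
cv2.circle(toy, (48, 48), 10, 1, -1)
print(count_fruits(toy, min_area=20))  # -> 2
```

The minimum-area filter is one simple guard against the noise and occlusion issues the review raises; instance-aware methods go further but need richer annotations.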


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Dominik Müller ◽  
Frank Kramer

Abstract
Background: The increased availability and use of modern medical imaging has created a strong need for automatic medical image segmentation. However, current image segmentation platforms do not provide the functionality required for straightforward setup of medical image segmentation pipelines. Existing pipelines are commonly standalone software optimized for a specific public data set. This paper therefore introduces the open-source Python library MIScnn.
Implementation: The aim of MIScnn is to provide an intuitive API for rapidly building medical image segmentation pipelines, including data I/O, preprocessing, data augmentation, patch-wise analysis, metrics, a library of state-of-the-art deep learning models, and model utilization such as training, prediction, and fully automatic evaluation (e.g. cross-validation). High configurability and multiple open interfaces also allow full pipeline customization.
Results: Running a cross-validation with MIScnn on the Kidney Tumor Segmentation Challenge 2019 data set (multi-class semantic segmentation with 300 CT scans) resulted in a powerful predictor based on the standard 3D U-Net model.
Conclusions: With this experiment, we show that the MIScnn framework enables researchers to rapidly set up a complete medical image segmentation pipeline with just a few lines of code. The source code for MIScnn is available in the Git repository: https://github.com/frankkramer-lab/MIScnn.
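
For orientation, a condensed pipeline sketch following the usage shown in the project's documentation is given below; the data path, file pattern, and hyperparameters are placeholders, and the exact import paths and signatures should be verified against the current repository.

```python
# Sketch of a MIScnn pipeline, condensed from the library's documented usage.
from miscnn.data_loading.interfaces import NIFTI_interface
from miscnn import Data_IO, Preprocessor, Neural_Network
from miscnn.neural_network.architecture.unet.standard import Architecture

# Data I/O: NIfTI interface for CT volumes with one channel and three classes
# (background, kidney, tumor), as in the KiTS19 data set.
interface = NIFTI_interface(pattern="case_00[0-9]*", channels=1, classes=3)
data_io = Data_IO(interface, "/path/to/kits19/data")  # placeholder path

# Preprocessor: patch-wise analysis with 128^3 patches, batches of 2.
pp = Preprocessor(data_io, batch_size=2, analysis="patchwise-crop",
                  patch_shape=(128, 128, 128))

# Standard 3D U-Net model, then training on all available samples.
model = Neural_Network(preprocessor=pp, architecture=Architecture())
samples = data_io.get_indiceslist()
model.train(samples, epochs=1)  # placeholder epoch count
```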


2020 ◽  
Vol 58 (12) ◽  
pp. 3049-3061
Author(s):  
Christoph Hoog Antink ◽  
Joana Carlos Mesquita Ferreira ◽  
Michael Paul ◽  
Simon Lyra ◽  
Konrad Heimann ◽  
...  

Abstract
Photoplethysmography imaging (PPGI) for non-contact monitoring of preterm infants in the neonatal intensive care unit (NICU) is a promising technology, as it could reduce medical adhesive-related skin injuries and associated complications. For practical implementations of PPGI, a region of interest has to be detected automatically in real time. As neonates' body proportions differ significantly from those of adults, existing approaches cannot be used in a straightforward way, and color-based skin detection requires RGB data, thus prohibiting the use of less-intrusive near-infrared (NIR) acquisition. In this paper, we present a deep learning-based method for segmentation of neonatal video data. We augmented an existing encoder-decoder semantic segmentation method with a modified version of the ResNet-50 encoder. This reduced the computational time by a factor of 7.5, so that 30 frames per second can be processed at 960 × 576 pixels. The method was developed and optimized on publicly available databases with segmentation data from adults. For evaluation, a comprehensive dataset consisting of RGB and NIR video recordings from 29 neonates with various skin tones, recorded in two NICUs in Germany and India, was used. From all recordings, 643 frames were manually segmented. After pre-training the model on the public adult data, parts of the neonatal data were used for additional learning, and left-out neonates were used for cross-validated evaluation. On the RGB data, the head is segmented well (82% intersection over union, 88% accuracy), and performance is comparable with that achieved on large, public, non-neonatal datasets. Performance on the NIR data, on the other hand, was inferior. By employing data augmentation to generate additional virtual NIR data for training, results could be improved, and the head could be segmented with 62% intersection over union and 65% accuracy. The method is in principle capable of real-time segmentation and may thus provide a useful tool for future PPGI applications.
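
The paper modifies the ResNet-50 encoder inside an existing encoder-decoder method; the sketch below is not that architecture, only a generic Keras illustration of the same idea, with the input shape matching the 960 × 576 frames and an assumed five-stage transposed-convolution decoder.

```python
# Illustrative sketch (not the authors' exact network): an encoder-decoder
# segmentation model with a ResNet-50 backbone. Decoder depth, filter counts,
# and the two-class head are assumptions for demonstration.
import tensorflow as tf
from tensorflow.keras import layers, Model

def resnet50_segmenter(input_shape=(576, 960, 3), n_classes=2):
    encoder = tf.keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = encoder.output                      # spatially downsampled 32x
    for filters in (256, 128, 64, 32, 16):  # five 2x upsampling stages
        x = layers.Conv2DTranspose(filters, 3, strides=2, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return Model(encoder.input, out)

model = resnet50_segmenter()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

A lighter backbone (or a pruned ResNet-50, as the authors describe) is what buys the reported 7.5x speed-up; the decoder shown here is deliberately minimal.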


2021 ◽  
Author(s):  
Run Zhou Ye ◽  
Christophe Noll ◽  
Gabriel Richard ◽  
Martin Lepage ◽  
Eric E. Turcotte ◽  
...  

Objectives: The advent of deep learning has set new standards in an array of image translation applications. At present, using these methods often requires computer programming experience, and non-commercial programs with a graphical interface usually do not allow users to fully customize their deep learning pipeline. Our primary objective was therefore to provide a simple graphical interface that allows students and researchers with no programming experience to easily create, train, and evaluate custom deep learning models for image translation. We also aimed to test the applicability of our tool (the DeepImageTranslator) in two different tasks: semantic segmentation and noise reduction of CT images.
Methods: The DeepImageTranslator was implemented using the Tkinter library; backend computations were implemented using the Pillow, Numpy, OpenCV, Augmentor, Tensorflow, and Keras libraries. Convolutional neural networks (CNNs) were trained using DeepImageTranslator and assessed with three-way cross-validation. The effects of data augmentation, deep supervision, and sample size on model accuracy were also systematically assessed.
Results: The DeepImageTranslator is a simple tool that allows users to customize all aspects of their deep learning pipeline, including the CNN, the training optimizer, the loss function, and the type of training image augmentation scheme. We showed that DeepImageTranslator can be used to achieve state-of-the-art accuracy and generalizability in semantic segmentation and noise reduction. Highly accurate 3D segmentation models for body composition can be obtained using training sample sizes as small as 17 images. For studies with small datasets, researchers can therefore randomly select a very small subset of images for manual labeling, train a specialized CNN model with DeepImageTranslator, and use it to fully automate segmentation of the entire dataset, saving tremendous time and effort.
Conclusions: An open-source deep learning tool for accurate image translation with a user-friendly graphical interface was presented and evaluated. This standalone software is freely available for Windows 10 at: https://sourceforge.net/projects/deepimagetranslator/
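
DeepImageTranslator configures augmentation through its GUI; as a rough command-line analogue, the following sketch uses the Augmentor library (one of the tool's listed dependencies) to generate paired image/mask augmentations. The directories and operation parameters are placeholders, not settings from the paper.

```python
# Hedged sketch of paired image/mask augmentation with the Augmentor library.
import Augmentor

p = Augmentor.Pipeline("data/train_images")  # placeholder image directory
p.ground_truth("data/train_masks")           # paired segmentation masks

# Geometric augmentations applied identically to image and mask.
p.rotate(probability=0.5, max_left_rotation=10, max_right_rotation=10)
p.flip_left_right(probability=0.5)
p.zoom_random(probability=0.3, percentage_area=0.9)

p.sample(200)  # write 200 augmented image/mask pairs to disk
```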


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models over non-augmented baselines. Here, we propose a novel data augmentation method we call "Levenshtein augmentation", which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train the baseline models, Levenshtein augmentation increased performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as "attentional gain": an enhancement in the underlying network's capability to recognize molecular motifs.
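
The abstract does not specify how the sub-sequence similarity is computed, so the sketch below only illustrates the two named ingredients rather than the authors' method: conventional SMILES randomization via RDKit, and a plain Levenshtein edit distance between a reactant string and its product string.

```python
# Illustrative sketch: SMILES randomization (RDKit) plus Levenshtein distance.
from rdkit import Chem

def randomize_smiles(smiles: str) -> str:
    """Return a randomly ordered but chemically equivalent SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

reactant, product = "CCO", "CC=O"  # toy ethanol -> acetaldehyde pair
print(randomize_smiles(reactant), levenshtein(reactant, product))
```

A plausible use of the distance, under this reading, is to prefer reactant renderings whose strings stay close to the product, so the network sees locally aligned pairs; the paper itself should be consulted for the actual pairing rule.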


2020 ◽  
Vol 17 (3) ◽  
pp. 299-305 ◽  
Author(s):  
Riaz Ahmad ◽  
Saeeda Naz ◽  
Muhammad Afzal ◽  
Sheikh Rashid ◽  
Marcus Liwicki ◽  
...  

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper contributes in three main aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning extra white space and de-skewing skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes, and fine inflections. Combining data augmentation with the deep learning approach yields a clear improvement, raising the Character Recognition (CR) rate from a baseline of 75.08% to 80.02%.
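
As an illustration of the de-skewing step, the following OpenCV sketch estimates a text-line's skew angle from its ink pixels and rotates the image back. This is a generic recipe, not the paper's implementation: the Otsu binarization and angle normalization are assumptions, and minAreaRect's angle convention varies across OpenCV versions, so the correction may need adjusting.

```python
# Hedged sketch: de-skew a grayscale text-line image with OpenCV.
import cv2
import numpy as np

def deskew_text_line(gray: np.ndarray) -> np.ndarray:
    # Binarize with Otsu so the ink becomes the foreground (non-zero) pixels.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(binary > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:               # normalize into (-45, 45]; convention varies
        angle -= 90              # across OpenCV versions, verify on your build
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h),
                          flags=cv2.INTER_CUBIC, borderValue=255)
```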


Impact ◽  
2020 ◽  
Vol 2020 (2) ◽  
pp. 9-11
Author(s):  
Tomohiro Fukuda

Mixed reality (MR) is rapidly becoming a vital tool, not just in gaming, but also in education, medicine, construction and environmental management. The term refers to systems in which computer-generated content is superimposed over objects in a real-world environment across one or more sensory modalities. Although most of us have heard of the use of MR in computer games, it also has applications in military and aviation training, as well as tourism, healthcare and more. In addition, it has potential in architecture and design, where proposed buildings can be superimposed on existing locations to render 3D visualizations of plans. However, one major challenge that remains in MR development is real-time occlusion: hiding 3D virtual objects behind real objects in the scene. Dr Tomohiro Fukuda, based at the Division of Sustainable Energy and Environmental Engineering, Graduate School of Engineering at Osaka University in Japan, is an expert in this field. Researchers led by Dr Fukuda are tackling the occlusion problem in MR. They are currently developing an MR system that achieves real-time occlusion by harnessing deep learning, using a semantic segmentation technique for outdoor landscape design simulation. This methodology can be used to automatically estimate the visual environment before and after construction projects.
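
To make the occlusion idea concrete, the toy sketch below (not Dr Fukuda's system) composites a rendered virtual object into a camera frame only where a semantic-segmentation mask reports no occluding real object, such as a tree in the foreground.

```python
# Toy illustration of mask-based occlusion handling in MR compositing.
import numpy as np

def composite_with_occlusion(frame: np.ndarray,
                             render: np.ndarray,
                             fg_mask: np.ndarray) -> np.ndarray:
    """fg_mask is 1 where a real object must occlude the virtual render.

    frame:  (H, W, 3) camera image; render: (H, W, 3) virtual layer,
    all-zero where no virtual content exists.
    """
    out = frame.copy()
    visible = (fg_mask == 0) & (render.sum(axis=-1) > 0)  # rendered, unoccluded
    out[visible] = render[visible]
    return out
```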

