Albumentations: Fast and Flexible Image Augmentations

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 125 ◽  
Author(s):  
Alexander Buslaev ◽  
Vladimir I. Iglovikov ◽  
Eugene Khvedchenya ◽  
Alex Parinov ◽  
Mikhail Druzhinin ◽  
...  

Data augmentation is a commonly used technique for increasing both the size and the diversity of labeled training sets by leveraging input transformations that preserve the corresponding output labels. In computer vision, image augmentations have become a common implicit regularization technique to combat overfitting in deep learning models and are ubiquitously used to improve performance. While most deep learning frameworks implement basic image transformations, the list is typically limited to some variations of flipping, rotating, scaling, and cropping. Moreover, image processing speed varies across existing image augmentation libraries. We present Albumentations, a fast and flexible open source library for image augmentation that offers many image transform operations and also serves as an easy-to-use wrapper around other augmentation libraries. We discuss the design principles that drove the implementation of Albumentations and give an overview of its key features and distinctive capabilities. Finally, we provide examples of image augmentations for different computer vision tasks and demonstrate that Albumentations is faster than other commonly used image augmentation tools on most image transform operations.
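The abstract stresses two ideas: augmentations must preserve labels, and transforms should compose flexibly. The sketch below illustrates both under stated assumptions. It is loosely modelled on the Albumentations calling style, where a pipeline built with `A.Compose([...])` is called as `pipeline(image=..., mask=...)` and returns a dict, but it is a hypothetical pure-Python stand-in, not the library's implementation. The helpers `hflip`, `vflip`, and `RandomApply` are illustrative names. The key point is that one random decision is applied to the image and its segmentation mask together, so the label stays aligned.

```python
import random
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]  # a single-channel image or mask stored as rows of pixels


def hflip(grid: Grid) -> Grid:
    """Mirror a 2D grid left-to-right."""
    return [row[::-1] for row in grid]


def vflip(grid: Grid) -> Grid:
    """Mirror a 2D grid top-to-bottom."""
    return [list(row) for row in grid[::-1]]


class RandomApply:
    """Hypothetical helper: apply a spatial function to every target with probability p."""

    def __init__(self, fn: Callable[[Grid], Grid], p: float = 0.5) -> None:
        self.fn = fn
        self.p = p

    def __call__(self, rng: random.Random, sample: Dict[str, Grid]) -> Dict[str, Grid]:
        if rng.random() < self.p:
            # One coin flip, applied to image and mask alike, keeps the labels aligned.
            return {key: self.fn(value) for key, value in sample.items()}
        return sample


class Compose:
    """Run transforms in order; called with keyword targets such as image= and mask=."""

    def __init__(self, transforms: List[RandomApply], seed: Optional[int] = None) -> None:
        self.transforms = transforms
        self.rng = random.Random(seed)

    def __call__(self, **sample: Grid) -> Dict[str, Grid]:
        for transform in self.transforms:
            sample = transform(self.rng, sample)
        return sample


pipeline = Compose([RandomApply(hflip, p=0.5), RandomApply(vflip, p=0.5)], seed=0)
augmented = pipeline(image=[[1, 2], [3, 4]], mask=[[0, 1], [0, 1]])
```

Keeping the random state inside `Compose` rather than inside each transform means a single seed reproduces a whole pipeline run, which is convenient when debugging augmentation-induced label errors.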

Author(s):  
Du Chunqi ◽  
Shinobu Hasegawa

In computer vision and computer graphics, 3D reconstruction is the process of capturing the shapes and appearances of real objects. 3D models can be constructed either by active methods, which use high-quality scanning equipment, or by passive methods, which learn from a dataset. However, both approaches aim only to construct the 3D models, without showing which elements affect their generation. The goal of this research is therefore to apply deep learning to generate 3D models automatically and to find the latent variables that affect the reconstruction process. Existing research shows that generative adversarial networks (GANs) can be trained on little data using two networks, a generator and a discriminator. The generator produces synthetic data, and the discriminator distinguishes between the generator's output and real data. Existing research also shows that InFoGAN can maximize the mutual information between latent variables and observations. In our approach, we will generate 3D models based on InFoGAN and design two constraints: a shape-constraint and a parameters-constraint. The shape-constraint uses data augmentation to limit the synthetic data generated to the models' profiles. At the same time, we will employ the parameters-constraint to find the relationship between the 3D models and the latent variables. Furthermore, building an architecture that generates 3D models on top of InFoGAN will itself be a challenge in our approach. Finally, during generation, we may discover how the latent variables that influence the 3D models contribute to the whole network.
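The abstract relies on two standard objectives without stating them. For reference, the original GAN minimax game between a generator $G$ and a discriminator $D$, and the InfoGAN extension that adds a latent code $c$ and an auxiliary network $Q$, are usually written as follows. These are the general published formulations; the shape-constraint and parameters-constraint proposed in this work are not represented here.

```latex
\begin{aligned}
\min_{G}\max_{D}\; V(D,G) &= \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \\[4pt]
\min_{G,Q}\max_{D}\; V_{\text{InfoGAN}}(D,G,Q) &= V(D,G) - \lambda\, L_I(G,Q) \\[4pt]
L_I(G,Q) &= \mathbb{E}_{c\sim P(c),\; x\sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c)
  \;\le\; I\big(c;\, G(z,c)\big)
\end{aligned}
```

Because $L_I$ is a variational lower bound on the mutual information $I(c; G(z,c))$, maximizing it (weighted by $\lambda$) pushes the generator to make the latent code $c$ recoverable from its output, which is what makes individual latent variables interpretable.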


2021 ◽  
Vol 7 (10) ◽  
pp. 204
Author(s):  
Vatsa S. Patel ◽  
Zhongliang Nie ◽  
Trung-Nghia Le ◽  
Tam V. Nguyen

Face recognition with wearable items is a challenging computer vision task; one instance is identifying people who wear face masks. Masked face analysis via multi-task learning could effectively improve performance in many areas of face analysis. In this paper, we propose a unified framework for predicting the age, gender, and emotions of people wearing face masks. We first construct FGNET-MASK, a masked face dataset for the problem. We then propose a multi-task deep learning model to tackle the problem. In particular, the model takes the face images as input and shares weights across tasks to predict the age, expression, and gender of the masked face. Extensive experiments show that the proposed framework performs better than other existing methods.
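The phrase "shares their weight" refers to multi-task learning with a shared backbone. A minimal sketch of hard parameter sharing is shown below, assuming a toy fully connected trunk, age as a regression target, gender as a binary label, and a hypothetical number of emotion classes; the actual architecture, losses, and task weights used for FGNET-MASK are not given in the abstract. The shared features are computed once and reused by every head, and the combined loss is a weighted sum, so all tasks update the shared trunk during training.

```python
import math
import random
from typing import Any, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskNet:
    """Hypothetical toy model: one shared trunk feeding three task-specific heads."""

    def __init__(self, in_dim: int, hidden: int, n_emotions: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.trunk_w, self.trunk_b = init_matrix(hidden, in_dim, rng), [0.0] * hidden
        self.age_w, self.age_b = init_matrix(1, hidden, rng), [0.0]
        self.gender_w, self.gender_b = init_matrix(1, hidden, rng), [0.0]
        self.emotion_w, self.emotion_b = init_matrix(n_emotions, hidden, rng), [0.0] * n_emotions

    def forward(self, x: Vector) -> Dict[str, Any]:
        shared = relu(linear(self.trunk_w, self.trunk_b, x))  # computed once, used by all heads
        age = linear(self.age_w, self.age_b, shared)[0]  # assumed regression output
        gender_logit = linear(self.gender_w, self.gender_b, shared)[0]
        p_gender = 1.0 / (1.0 + math.exp(-gender_logit))  # sigmoid for an assumed binary label
        p_emotion = softmax(linear(self.emotion_w, self.emotion_b, shared))
        return {"age": age, "gender": p_gender, "emotion": p_emotion}


def multitask_loss(out: Dict[str, Any], age_true: float, gender_true: int, emotion_true: int,
                   weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of per-task losses; each term would send gradients into the shared trunk."""
    eps = 1e-12
    l_age = (out["age"] - age_true) ** 2
    p = out["gender"]
    l_gender = -(gender_true * math.log(p + eps) + (1 - gender_true) * math.log(1 - p + eps))
    l_emotion = -math.log(out["emotion"][emotion_true] + eps)
    return weights[0] * l_age + weights[1] * l_gender + weights[2] * l_emotion
```

The task weights in `multitask_loss` are a design lever: age regression errors can be numerically much larger than the classification losses, so in practice they are often rescaled so that one task does not dominate the shared features.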


Author(s):  
Ahmed R. Luaibi ◽  
Tariq M. Salman ◽  
Abbas Hussein Miry

Plant diseases, such as those affecting citrus, are a major threat to food security, so identifying them early is very important. Timely disease recognition can help users respond immediately and plan precautionary actions. This recognition can be performed without human involvement by using images of plant leaves. Many machine learning (ML) methods have been employed for classification and detection, but increasing advances in computer vision have shown that deep learning (DL) has great potential for increasing accuracy. In this paper, two convolutional neural networks, AlexNet and ResNet, are used with and without data augmentation, which creates new data points by manipulating the original data. This process increases the number of training images in DL without the need to add new photos, which makes it appropriate for small datasets. A custom dataset of 200 images of diseased and healthy citrus leaves was collected. The models trained with data augmentation give the best results: 95.83% for ResNet and 97.92% for AlexNet.
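The abstract says augmentation increases the number of training images without new photos but does not list the transforms used. One simple offline scheme, shown here as an assumption rather than the authors' method, stores all eight rotations and reflections of each image; with 200 leaf images this would yield 200 × 8 = 1,600 training samples. It presumes that rotating or flipping a leaf photo does not change whether the leaf is diseased, so every variant inherits its source label.

```python
from typing import List, Tuple

Grid = List[List[int]]  # a single-channel image stored as rows of pixels


def rot90(grid: Grid) -> Grid:
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def hflip(grid: Grid) -> Grid:
    """Mirror a 2D grid left-to-right."""
    return [row[::-1] for row in grid]


def dihedral_variants(grid: Grid) -> List[Grid]:
    """All 8 rotations/reflections: the original image plus 7 new ones."""
    variants = []
    current = grid
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants


def expand_dataset(samples: List[Tuple[Grid, str]]) -> List[Tuple[Grid, str]]:
    """Offline augmentation: each variant keeps the label of the image it came from."""
    return [(variant, label) for grid, label in samples for variant in dihedral_variants(grid)]
```

Offline expansion like this multiplies storage by eight; the alternative is to sample one random variant per image each epoch, which gives the model the same variety without materializing every copy.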


Recently, demand for computer vision techniques has been rising continuously because of developments in decision-making techniques for the health sector. Image processing is a subset of computer vision that uses algorithms to emulate vision and recognize objects. In this study, a novel deep-learning-based convolutional neural network is configured to classify chest X-ray images into five major classes. It addresses the scarcity of medical images available for deep-learning-based image classification. A new augmentation technique, image superimposition, generates new samples from the available images using label-preserving transformations. Data augmentation can generate new sample data from the original data using various transformation strategies, and therefore helps accumulate enough data to obtain better performance. The main objective of superimposing two images is to minimize redundancy and uncertainty in the output image. Superimposition is therefore carried out between the original image and a set of different augmented images to obtain better accuracy. The results of the various superimposition techniques are then compared and evaluated to identify the best ones. It is concluded that the proposed techniques can improve performance on medical image classification problems.
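The abstract does not define how two images are superimposed. A common interpretation, assumed here, is a pixel-wise weighted blend (alpha blending) of the original X-ray with one of its augmented copies, `alpha * original + (1 - alpha) * augmented`; the paper's "various superimposing techniques" may use other operators. Because both inputs come from the same image, the blended sample keeps the original class label.

```python
from typing import List

Grid = List[List[float]]  # grayscale intensities, e.g. in [0, 255]


def superimpose(original: Grid, augmented: Grid, alpha: float = 0.5) -> Grid:
    """Pixel-wise weighted blend: alpha * original + (1 - alpha) * augmented."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if len(original) != len(augmented) or any(
        len(row_o) != len(row_a) for row_o, row_a in zip(original, augmented)
    ):
        raise ValueError("images must have the same shape")
    return [
        [alpha * o + (1.0 - alpha) * a for o, a in zip(row_o, row_a)]
        for row_o, row_a in zip(original, augmented)
    ]


def superimposed_set(original: Grid, augmented_set: List[Grid], alpha: float = 0.5) -> List[Grid]:
    """One new sample per augmented copy; each inherits the original image's class label."""
    return [superimpose(original, aug, alpha) for aug in augmented_set]
```

An `alpha` close to 1.0 keeps the result near the original, while lower values let more of the augmented copy through; comparing several `alpha` values is one way to compare superimposition variants.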


2022 ◽  
Author(s):  
Ms. Aayushi Bansal ◽  
Dr. Rewa Sharma ◽  
Dr. Mamta Kathuria

Recent advancements in deep learning architectures have increased their utility in real-life applications. Deep learning models require a large amount of training data to avoid overfitting. In many application domains, such as marketing, computer vision, and medical science, only a limited set of data is available for training neural networks, because collecting new data is either infeasible or requires additional resources. Data augmentation is one data-space solution to the problem of limited data. This study focuses on various data augmentation techniques that can be used to further improve the accuracy of a neural network. Augmenting the available data saves the cost and time required to collect new data for training deep neural networks. It also regularizes the model and improves its ability to generalize. This survey also covers the need for large datasets in different fields, such as computer vision, natural language processing, security, and healthcare. The goal of this paper is to provide a comprehensive survey of recent advancements in data augmentation techniques and their application in various domains.


2021 ◽  
Vol 13 (2) ◽  
pp. 260
Author(s):  
Ha Trang Nguyen ◽  
Maximo Larry Lopez Caceres ◽  
Koma Moritake ◽  
Sarah Kentsch ◽  
Hase Shu ◽  
...  

Insect outbreaks are a recurrent natural phenomenon in forest ecosystems and are expected to increase due to climate change. Recent advances in Unmanned Aerial Vehicles (UAV) and Deep Learning (DL) networks provide tools to monitor them. In this study, we used nine orthomosaics and normalized Digital Surface Models (nDSM) to detect and classify healthy and sick Maries fir trees as well as deciduous trees. This study aims to classify treetops automatically by means of a novel computer vision treetop detection algorithm and the adaptation of existing DL architectures. For detection alone, the accuracy was 85.70%. For detection and classification combined, we correctly detected and classified 78.59% of all tree classes (39.64% for sick fir). With data augmentation, however, the detection/classification rate for the sick fir class rose to 73.01%, at the cost of the accuracy for all tree classes, which dropped to 63.57%. Combining UAV, computer vision, and DL techniques contributes to a new approach for evaluating the impact of insect outbreaks in forests.
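The treetop detection algorithm in this study is described as novel and is not detailed in the abstract. For orientation only, a widely used baseline for finding treetops in an nDSM is local-maximum filtering: a cell is a candidate treetop if it is taller than every neighbour within a window and above a minimum height. The sketch below implements that baseline; the window radius and the 2.0 height threshold are hypothetical defaults, not values from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]  # nDSM heights above ground, one value per cell


def detect_treetops(ndsm: Grid, window: int = 1, min_height: float = 2.0) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are strict maxima of their neighbourhood.

    window is the neighbourhood radius in cells; min_height (in the nDSM's units)
    discards ground and low vegetation. Ties are rejected, so flat plateaus yield no peak.
    """
    rows, cols = len(ndsm), len(ndsm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            height = ndsm[r][c]
            if height < min_height:
                continue
            neighbours = [
                ndsm[rr][cc]
                for rr in range(max(0, r - window), min(rows, r + window + 1))
                for cc in range(max(0, c - window), min(cols, c + window + 1))
                if (rr, cc) != (r, c)
            ]
            if all(height > n for n in neighbours):
                tops.append((r, c))
    return tops
```

The window radius trades off missed and spurious detections: a small window can split one broad crown into several peaks, while a large one can merge neighbouring trees, so it is usually tuned to the expected crown size at the image resolution.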


Author(s):  
Ramaprasad Poojary ◽  
Roma Raina ◽  
Amit Kumar Mondal

During the last few years, deep learning has achieved remarkable results in machine learning when applied to computer vision tasks. Among its many architectures, convolutional neural networks (CNNs), a deep neural network architecture, have recently been widely used for image detection and classification. Although CNNs are a powerful tool for computer vision tasks, they demand a large amount of training data to yield high performance. In this paper, a data augmentation method is proposed to overcome the challenges caused by insufficient training data. To analyze the effect of data augmentation, the proposed method uses two convolutional neural network architectures. To minimize training time without compromising accuracy, the models are built by fine-tuning the pre-trained networks VGG16 and ResNet50. Model performance is evaluated using loss and accuracy. The proposed models are constructed with the Keras deep learning framework and trained on a custom dataset created from the Kaggle CAT vs DOG database. Experimental results showed that both models achieved better test accuracy when data augmentation was employed, and the ResNet50-based model outperformed the VGG16-based model, with a test accuracy of 90% with data augmentation and 82% without it.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Jacqueline A. Valeri ◽  
Katherine M. Collins ◽  
Pradeep Ramesh ◽  
Miguel A. Alcantar ◽  
Bianca A. Lepe ◽  
...  

While synthetic biology has revolutionized our approaches to medicine, agriculture, and energy, the design of completely novel biological circuit components beyond naturally-derived templates remains challenging due to poorly understood design rules. Toehold switches, which are programmable nucleic acid sensors, face an analogous design bottleneck; our limited understanding of how sequence impacts functionality often necessitates expensive, time-consuming screens to identify effective switches. Here, we introduce Sequence-based Toehold Optimization and Redesign Model (STORM) and Nucleic-Acid Speech (NuSpeak), two orthogonal and synergistic deep learning architectures to characterize and optimize toeholds. Applying techniques from computer vision and natural language processing, we ‘un-box’ our models using convolutional filters, attention maps, and in silico mutagenesis. Through transfer-learning, we redesign sub-optimal toehold sensors, even with sparse training data, experimentally validating their improved performance. This work provides sequence-to-function deep learning frameworks for toehold selection and design, augmenting our ability to construct potent biological circuit components and precision diagnostics.
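In silico mutagenesis, one of the interpretation tools named above, scores every single-nucleotide substitution with a trained model and records how much the prediction changes. A minimal sketch of saturation mutagenesis is shown below. The scoring function `gc_fraction` is a hypothetical stand-in for a trained model such as STORM, and the RNA alphabet `ACGU` is an assumption (use `ACGT` if sequences are encoded as DNA).

```python
from typing import Callable, Dict, List

NUCLEOTIDES = "ACGU"  # assumed RNA alphabet; swap to "ACGT" for DNA-encoded sequences


def in_silico_mutagenesis(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """Score every single-nucleotide substitution relative to the unmutated sequence.

    Returns one dict per position, mapping each base to score(mutant) - score(seq);
    the reference base at that position maps to 0.0.
    """
    base_score = score(seq)
    effects = []
    for i, ref in enumerate(seq):
        row = {}
        for alt in NUCLEOTIDES:
            if alt == ref:
                row[alt] = 0.0
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = score(mutant) - base_score
        effects.append(row)
    return effects


def gc_fraction(seq: str) -> float:
    """Hypothetical stand-in for a trained model: the fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)
```

The result is a position-by-base effect matrix; plotted as a heatmap, large positive or negative entries point to positions where the model's prediction is most sensitive to sequence changes.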

