Data Augmentation to Stabilize Image Caption Generation Models in Deep Learning

Author(s): Hamza Aldabbas, Muhammad Asad, Mohammad Hashem, Kaleem Razzaq, Muhammad Zubair

Author(s): Kota Akshith Reddy, Satish C J, Jahnavi Polsani, Teja Naveen Chintapalli, ...

Automatic image caption generation is one of the core problems in deep learning. Data augmentation increases the amount of training data at hand by transforming existing images with techniques such as flipping, rotating, zooming, and brightening. In this work, we build an image captioning model and check its robustness against all the major types of image augmentation. The results expose the model's instability: the same image processed with different augmentation techniques yields a different caption each time. We also report how the model's performance changes after these augmentation techniques are applied. The Flickr8k dataset is used for this study, with the BLEU score as the evaluation metric for the captioning model.
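As a rough illustration of this experimental setup, the sketch below applies the major augmentation types named above (flipping, rotating, zooming, brightening) to a single image and shows where a trained captioning model would be queried for each variant. The file path and the `captioning_model` reference are hypothetical placeholders, not part of the paper.

```python
import tensorflow as tf

def augment_variants(image):
    """Return the original image plus flipped, rotated, zoomed and brightened copies."""
    zoomed = tf.image.resize(tf.image.central_crop(image, 0.8), tf.shape(image)[:2])
    return {
        "original":   image,
        "flipped":    tf.image.flip_left_right(image),
        "rotated":    tf.image.rot90(image),                        # 90-degree rotation
        "zoomed":     zoomed,                                       # centre-crop then upscale
        "brightened": tf.image.adjust_brightness(image, delta=0.3),
    }

raw = tf.io.read_file("path/to/flickr8k_image.jpg")                 # hypothetical path
image = tf.cast(tf.io.decode_jpeg(raw, channels=3), tf.float32) / 255.0
for name, variant in augment_variants(image).items():
    # caption = captioning_model.predict(variant)   # plug in the trained captioner here
    print(name, variant.shape)
```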


2018, Vol 06 (10), pp. 53-55
Author(s): Sailee P. Pawaskar, J. A. Laxminarayana

2020, Vol 392, pp. 132-141
Author(s): Xianhua Zeng, Li Wen, Banggui Liu, Xiaojun Qi

Author(s): Jafar A. Alzubi, Rachna Jain, Preeti Nagrath, Suresh Satapathy, Soham Taneja, ...

The paper is concerned with the problem of image caption generation. The purpose of this paper is to create a deep learning model that generates captions for a given image by decoding the information available in it. For this purpose, a custom ensemble model is used, consisting of an Inception model and a 2-layer LSTM model whose outputs are concatenated and followed by dense layers. The CNN part encodes the images, and the LSTM part derives insights from the given captions. For a comparative study, GRU- and bidirectional-LSTM-based models are also used for caption generation to analyse and compare the results. The Flickr8k dataset is used for training on images, and GloVe embeddings are used to generate word vectors for each word in the sequence. After vectorization, images are fed into the trained model and inference is run to create new auto-generated captions. The results are evaluated using BLEU scores; the BLEU-4 score obtained in the paper is 55.8%, with results reported for the LSTM, GRU, and bidirectional LSTM variants.
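A minimal Keras sketch of this kind of two-branch architecture is given below, assuming pre-extracted 2048-dimensional InceptionV3 image features, a GloVe embedding matrix (here replaced by a random stand-in), and illustrative vocabulary and sequence sizes; none of these hyperparameters are taken from the paper.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, Concatenate
from tensorflow.keras.models import Model

vocab_size, max_len, embed_dim = 8000, 40, 200                      # illustrative sizes
glove_matrix = np.random.normal(size=(vocab_size, embed_dim))       # stand-in for real GloVe vectors

# Image branch: pre-extracted InceptionV3 bottleneck features.
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

# Text branch: GloVe-initialised embedding feeding a 2-layer LSTM.
txt_in = Input(shape=(max_len,))
emb = Embedding(vocab_size, embed_dim, weights=[glove_matrix], trainable=False)(txt_in)
seq = LSTM(256, return_sequences=True)(emb)
seq = LSTM(256)(seq)

# Concatenate the two branches and predict the next word through dense layers.
merged = Concatenate()([img_vec, seq])
out = Dense(vocab_size, activation="softmax")(Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, txt_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.summary()
```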


2020
Author(s): Dean Sumner, Jiazhen He, Amol Thakkar, Ola Engkvist, Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method, which we call "Levenshtein augmentation", that considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation increased performance over both non-augmented and conventionally SMILES-randomization-augmented data when used for training the baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the underlying network's ability to recognise molecular motifs.
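The abstract does not spell out the exact pairing procedure, so the sketch below only shows the building block the method's name refers to: a plain Levenshtein (edit) distance between a reactant SMILES string and a product SMILES string, which could in principle be used to rank or select augmented training pairs.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two SMILES strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca from the reactant string
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if characters match)
        prev = curr
    return prev[-1]

print(levenshtein("CCO", "CC=O"))   # ethanol vs. acetaldehyde -> 1
```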


2020, Vol 17 (3), pp. 299-305
Author(s): Riaz Ahmad, Saeeda Naz, Muhammad Afzal, Sheikh Rashid, Marcus Liwicki, ...

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT dataset consists of complex patterns of handwritten Arabic text-lines. The paper contributes in three main aspects: (1) pre-processing, (2) a deep learning based approach, and (3) data augmentation. The pre-processing step includes pruning extra white space and de-skewing the skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflections. Combining data augmentation with the deep learning approach yields a clear and promising improvement, raising the Character Recognition (CR) rate to 80.02% from the 75.08% baseline.
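Common frameworks ship no multi-dimensional LSTM, so the sketch below is a simplified stand-in rather than the paper's MDLSTM: it scans a normalised text-line image column by column with a bidirectional LSTM and trains with the same CTC transcription objective the paper uses. All sizes and the dummy data are illustrative.

```python
import torch
import torch.nn as nn

class LineRecognizer(nn.Module):
    def __init__(self, img_height=48, hidden=128, num_classes=100):   # illustrative sizes
        super().__init__()
        self.rnn = nn.LSTM(img_height, hidden, num_layers=2,
                           bidirectional=True, batch_first=False)
        self.fc = nn.Linear(2 * hidden, num_classes)                   # classes include the CTC blank (index 0)

    def forward(self, lines):                                          # lines: (width, batch, img_height)
        features, _ = self.rnn(lines)
        return self.fc(features).log_softmax(dim=-1)                   # (width, batch, num_classes)

model = LineRecognizer()
ctc = nn.CTCLoss(blank=0)

lines = torch.randn(200, 4, 48)                                        # 4 dummy text-lines, 200 columns each
targets = torch.randint(1, 100, (4, 30))                               # dummy label sequences
log_probs = model(lines)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((4,), 200, dtype=torch.long),
           target_lengths=torch.full((4,), 30, dtype=torch.long))
loss.backward()
```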

