Image Caption Generation Using Deep Learning

2018 ◽ Vol 06 (10) ◽ pp. 53-55
Author(s):  
Sailee P. Pawaskar ◽  
J. A. Laxminarayana

2020 ◽ Vol 392 ◽ pp. 132-141
Author(s):  
Xianhua Zeng ◽  
Li Wen ◽  
Banggui Liu ◽  
Xiaojun Qi

Author(s):  
Hamza Aldabbas ◽  
Muhammad Asad ◽  
Mohammad Hashem ◽  
Kaleem Razzaq ◽  
Muhammad Zubair

Author(s):  
Kota Akshith Reddy ◽  
Satish C J ◽  
Jahnavi Polsani ◽  
Teja Naveen Chintapalli ◽  
...  

Automatic image caption generation is one of the core problems in the field of deep learning. Data augmentation is a technique for increasing the amount of data at hand by transforming the training data with operations such as flipping, rotating, zooming, and brightening. In this work, we build an image captioning model and check its robustness against all the major types of image augmentation. The results expose the fuzziness of the model: given the same image, a different caption is produced every time a different augmentation technique is applied. We also report how the model's performance changes after these augmentation techniques are employed. The Flickr8k dataset is used for this study, with the BLEU score as the evaluation metric for the image captioning model.
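
To make the experimental setup concrete, here is a minimal sketch of such a robustness probe; it is an assumed reconstruction, not the authors' code. Each of the augmentations named above is applied to one image, the model captions every variant, and each caption is scored against the reference captions with BLEU. `generate_caption` is a hypothetical stand-in for the trained model's inference call; Pillow and NLTK are assumed for the augmentations and the metric.

```python
from PIL import Image, ImageEnhance, ImageOps
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def augment_variants(img):
    """Return the major augmentation variants of a single image."""
    return {
        "original":   img,
        "flipped":    ImageOps.mirror(img),          # horizontal flip
        "rotated":    img.rotate(15, expand=True),   # 15-degree rotation
        "zoomed":     img.crop((32, 32, img.width - 32,
                                img.height - 32)).resize(img.size),
        "brightened": ImageEnhance.Brightness(img).enhance(1.5),
    }

def probe_robustness(img_path, references, generate_caption):
    """Caption every augmented variant and score it against the references."""
    img = Image.open(img_path)
    smooth = SmoothingFunction().method1
    for name, variant in augment_variants(img).items():
        caption = generate_caption(variant)  # hypothetical model inference
        score = sentence_bleu([r.split() for r in references],
                              caption.split(),
                              smoothing_function=smooth)
        print(f"{name:>10}  BLEU={score:.3f}  caption={caption!r}")
```

Running this against a single trained model makes the abstract's observation directly visible: the caption string, and hence the BLEU score, shifts with each augmentation even though the underlying scene is unchanged.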


Author(s):  
Jafar A. Alzubi ◽  
Rachna Jain ◽  
Preeti Nagrath ◽  
Suresh Satapathy ◽  
Soham Taneja ◽  
...  

The paper is concerned with the problem of image caption generation. Its purpose is to create a deep learning model that generates captions for a given image by decoding the information available in the image. For this purpose, a custom ensemble model is used, consisting of an Inception model and a 2-layer LSTM model, whose outputs are concatenated and followed by dense layers. The CNN part encodes the images, and the LSTM part derives insights from the given captions. For a comparative study, GRU-based and bidirectional-LSTM-based models are also used for caption generation, and their results are analyzed and compared. The images are drawn from the Flickr8k dataset, and pre-trained GloVe embeddings are used to generate a word vector for each word in the sequence. After vectorization, images are fed into the trained model and inference is run to create new auto-generated captions. The results are evaluated using BLEU scores; a BLEU-4 score of 55.8% is obtained, and the LSTM, GRU, and bidirectional LSTM variants are compared against one another.
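
The architecture described above can be sketched in a few lines of Keras. This is an assumed reconstruction rather than the authors' code: pre-extracted 2048-dimensional InceptionV3 features form the image branch, GloVe-initialised embeddings feed a 2-layer LSTM on the text branch, and the two branches are concatenated and passed through dense layers to predict the next caption word. The vocabulary size, sequence length, and layer widths are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import (Concatenate, Dense, Dropout,
                                     Embedding, Input, LSTM)
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed vocabulary size for Flickr8k
MAX_LEN = 34        # assumed maximum caption length
EMBED_DIM = 200     # assumed GloVe embedding dimension

def build_caption_model(glove_matrix: np.ndarray) -> Model:
    # Image branch: pre-extracted 2048-d InceptionV3 pooled features.
    img_in = Input(shape=(2048,), name="image_features")
    img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

    # Text branch: frozen GloVe embeddings feeding a 2-layer LSTM.
    txt_in = Input(shape=(MAX_LEN,), name="caption_prefix")
    emb = Embedding(VOCAB_SIZE, EMBED_DIM,
                    embeddings_initializer=Constant(glove_matrix),
                    trainable=False, mask_zero=True)(txt_in)
    seq = LSTM(256, return_sequences=True)(emb)
    seq = LSTM(256)(seq)

    # Merge: concatenate the two branches, then dense layers ending in
    # a softmax over the vocabulary to predict the next caption word.
    merged = Concatenate()([img_vec, seq])
    hidden = Dense(256, activation="relu")(merged)
    out = Dense(VOCAB_SIZE, activation="softmax")(hidden)

    model = Model(inputs=[img_in, txt_in], outputs=out)
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model
```

Swapping the two LSTM layers for GRU or Bidirectional-wrapped LSTM layers yields the comparison variants mentioned in the abstract.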


Author(s):  
Feng Chen ◽  
Songxian Xie ◽  
Xinyi Li ◽  
Jintao Tang ◽  
Kunyuan Pang ◽  
...  

Author(s):  
Xinyuan Qi ◽  
Zhiguo Cao ◽  
Yang Xiao ◽  
Jian Wang ◽  
Chao Zhang
