A novel automatic image caption generation using bidirectional long short-term memory framework

Automatically describing the content of an image is an interesting and challenging task in artificial intelligence. In this paper, an enhanced image captioning model, comprising object detection, color analysis, and image captioning, is proposed to automatically generate textual descriptions of images. In the encoder–decoder captioning model, VGG16 is used as the encoder and a long short-term memory (LSTM) network with attention is used as the decoder. In addition, Mask R-CNN and OpenCV are used for object detection and color analysis. The generated caption is then integrated with the color-recognition results to provide more descriptive details of the image. Finally, the generated sentence is converted into speech. The validation results show that the proposed method can provide more accurate descriptions of images.
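To illustrate what "LSTM with attention" means at each decoding step, the sketch below computes soft attention weights over a set of encoder region features and the resulting context vector. It is a minimal, dependency-free illustration under stated assumptions: the abstract does not specify the attention scoring function, so plain dot-product scoring is swapped in here, and the toy 2-dimensional "region features" stand in for the spatial feature vectors a VGG16 encoder would produce. The names `soft_attention`, `softmax`, and `dot` are hypothetical helpers, not the authors' code.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_attention(features: List[Vector], hidden: Vector) -> Tuple[List[float], Vector]:
    """Return (weights over image regions, weighted context vector).

    ASSUMPTION: dot-product scoring between each region feature and the
    decoder's current hidden state; the paper's actual scoring is not stated.
    """
    scores = [dot(f, hidden) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector = attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy encoder features
    h = [2.0, 0.0]  # toy decoder hidden state
    w, c = soft_attention(regions, h)
    print(w)  # roughly [0.665, 0.090, 0.245]
    print(c)
```

The region most aligned with the hidden state receives the largest weight; in a real decoder the context vector would be fed, together with the previous word embedding, into the next LSTM step.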


Image Caption Generator using CNN-LSTM Deep Neural Network

International Journal for Research in Applied Science and Engineering Technology, Vol 9 (VI), 2021, pp. 2968-2974. DOI: 10.22214/ijraset.2021.35663

Author(s):

Vaibhav Julakanti

Keyword(s): Neural Network, Deep Neural Network, Short Term Memory, The Other, Short Term, Term Memory, Long Short Term Memory, The Individual, General Climate, Image Caption

Automatically captioning images is an important problem closely related to the human visual system. A model that automatically captions the scene or environment surrounding a person and returns the caption as plain text would have many benefits. In this paper, we present a model based on CNN-LSTM neural networks that automatically detects the objects in images and generates captions for them. It uses the pre-trained Inception v3 model to perform object detection and an LSTM to generate the captions, applying transfer learning on pre-trained models for the object-detection task. The model therefore performs two tasks: the first is to recognize objects in the image using convolutional neural networks, and the second is to caption the image using an RNN-based LSTM (long short-term memory) network. It also uses beam search to predict the captions, i.e., to choose the best words from the available corpus: the top k predictions are taken, fed back into the model, and then sorted by the probabilities the model returns. The software requirements of this project include TensorFlow 2.0, pandas, NumPy, pickle, PIL, and OpenCV. A small GUI is provided for uploading an image to the model to generate its caption. The main use case of this project is to help visually impaired people understand their surroundings and act accordingly. Caption generation is an interesting and active field of artificial intelligence with many challenges still to overcome. It involves several complex stages: choosing the dataset, training the model, validating the model, creating pre-trained models to test images, recognizing the images, and finally generating the individual image-based captions.
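The beam-search procedure described above (keep the top k partial captions, extend each with the model's next-word predictions, re-rank by probability) can be sketched as follows. This is a minimal illustration, not the author's implementation: the "model" is a hypothetical bigram table (`TOY_TABLE`, `toy_model`) standing in for the LSTM decoder conditioned on image features, and scores are summed log-probabilities without the length normalization many captioning systems add.

```python
import math
from typing import Callable, Dict, List, Tuple

# Any function mapping a partial caption to next-word probabilities.
NextWordFn = Callable[[List[str]], Dict[str, float]]


def beam_search(next_word_probs: NextWordFn, k: int = 3, max_len: int = 10,
                start: str = "<start>", end: str = "<end>") -> List[Tuple[List[str], float]]:
    """Return hypotheses as (tokens, cumulative log-prob), best first."""
    beams = [([start], 0.0)]
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, p in next_word_probs(tokens).items():
                if p > 0:
                    candidates.append((tokens + [word], score + math.log(p)))
        # Keep only the k highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:k]:
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # include unfinished hypotheses if max_len was hit
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# HYPOTHETICAL toy "decoder": next-word probabilities depend only on the last word.
TOY_TABLE = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.4, "cat": 0.35, "man": 0.25},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"<end>": 1.0},
    "cat": {"<end>": 1.0},
    "man": {"<end>": 1.0},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return TOY_TABLE[tokens[-1]]


if __name__ == "__main__":
    print(beam_search(toy_model, k=1)[0][0])  # greedy: ['<start>', 'a', 'dog', '<end>']
    print(beam_search(toy_model, k=2)[0][0])  # beam:   ['<start>', 'the', 'dog', '<end>']
```

With k = 1 the search is greedy and commits to "a" (0.6), ending at "a dog" with probability 0.24; with k = 2 it keeps "the" alive and finds "the dog" with probability 0.36, which is the benefit of re-feeding the top k predictions.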
