Image Captioning using CNN and LSTM

Anish Banda

doi:10.22214/ijraset.2021.37846


International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37846 ◽

2021 ◽

Vol 9 (8) ◽

pp. 2666-2669

Author(s):

Anish Banda

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Image Description ◽

Image Captioning ◽

Training Images ◽

Long Short Term Memory ◽

Standard Calculation

Abstract: In the proposed model, we examine an image caption generation technique based on deep neural networks. Given an image as input, the model produces output in three forms: a sentence describing the image in three different languages, an MP3 audio file, and an image file. The model combines techniques from computer vision and natural language processing. We aim to build a caption-generation model using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The convolutional neural network compares the target image with the training images, which come from a large dataset. The model then generates a reasonable description using the trained data. A CNN serves as the encoder that extracts features from the image, and an LSTM serves as the decoder that generates the description. The accuracy of the generated caption is evaluated with the BLEU metric, which grades the quality of the generated text. Performance is measured with standard evaluation metrics. Keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
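
The abstract evaluates captions with BLEU but does not say which variant it uses (n-gram order, smoothing, sentence- or corpus-level). The following is a minimal sketch of the common sentence-level formulation. It assumes uniform weights up to 4-grams and no smoothing, and the function names `ngrams` and `bleu` are hypothetical. It is not the author's implementation. The score combines clipped n-gram precisions through a geometric mean and multiplies by a brevity penalty, which punishes captions shorter than the reference.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and a brevity penalty (no smoothing)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

A caption identical to its reference scores 1.0. A caption that shares no words with the reference scores 0.0. Short captions often have no matching 4-grams, which is why practical toolkits usually offer smoothing options.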


Estimation of municipal solid waste amount based on one-dimension convolutional neural network and long short-term memory with attention mechanism model: A case study of Shanghai

The Science of The Total Environment ◽

10.1016/j.scitotenv.2021.148088 ◽

2021 ◽

Vol 791 ◽

pp. 148088

Author(s):

Kunsen Lin ◽

Youcai Zhao ◽

Lu Tian ◽

Chunlong Zhao ◽

Meilan Zhang ◽

...

Keyword(s):

Neural Network ◽

Municipal Solid Waste ◽

Convolutional Neural Network ◽

Short Term Memory ◽

One Dimension ◽

Short Term ◽

Term Memory ◽

Mechanism Model ◽

Long Short Term Memory


Multiple Pedestrians and Vehicles Tracking in Aerial Imagery Using a Convolutional Neural Network

Remote Sensing ◽

10.3390/rs13101953 ◽

2021 ◽

Vol 13 (10) ◽

pp. 1953

Author(s):

Seyed Majid Azimi ◽

Maximilian Kraus ◽

Reza Bahmanyar ◽

Peter Reinartz

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Object Tracking ◽

Short Term Memory ◽

Aerial Imagery ◽

Future Research ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory

In this paper, we address various challenges in multi-pedestrian and vehicle tracking in high-resolution aerial imagery through an intensive evaluation of a number of traditional and Deep Learning-based Single- and Multi-Object Tracking methods. We also describe our proposed Deep Learning-based Multi-Object Tracking method, AerialMPTNet. It fuses appearance, temporal, and graphical information using a Siamese Neural Network, a Long Short-Term Memory, and a Graph Convolutional Neural Network module for more accurate and stable tracking. Moreover, we investigate the influence of Squeeze-and-Excitation layers and Online Hard Example Mining on the performance of AerialMPTNet. To the best of our knowledge, we are the first to use these two for regression-based Multi-Object Tracking. Additionally, we study and compare the L1 and Huber loss functions. In our experiments, we extensively evaluate AerialMPTNet on three aerial Multi-Object Tracking datasets, namely AerialMPT and the KIT AIS pedestrian and vehicle datasets. Qualitative and quantitative results show that AerialMPTNet outperforms all previous methods on the pedestrian datasets and achieves competitive results on the vehicle dataset. In addition, the Long Short-Term Memory and Graph Convolutional Neural Network modules enhance tracking performance. Using Squeeze-and-Excitation and Online Hard Example Mining helps significantly in some cases but degrades the results in others. According to the results, the L1 loss yields better results than the Huber loss in most scenarios. The presented results provide deep insight into the challenges and opportunities of the aerial Multi-Object Tracking domain, paving the way for future research.
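
The abstract compares the L1 and Huber losses for regression-based tracking. The following is a minimal sketch of both losses as they are commonly defined, averaged over paired predicted and target values such as regressed box coordinates. It assumes a Huber threshold of `delta=1.0`, and the function names are hypothetical. It is not AerialMPTNet's code.

```python
def l1_loss(pred, target):
    """Mean absolute error over paired predicted and target values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2                # small errors: squared penalty
        else:
            total += delta * (err - 0.5 * delta)   # large errors: linear penalty
    return total / len(pred)


# Example: one small error (0.5) and one large error (3.0).
print(l1_loss([0.0, 3.0], [0.5, 0.0]))     # 1.75
print(huber_loss([0.0, 3.0], [0.5, 0.0]))  # 1.3125
```

Huber behaves like L2 near zero and like L1 for large errors. L1 keeps a constant gradient magnitude everywhere, so small localization errors are not down-weighted. This is one plausible reason the two losses can rank differently across scenarios.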


Sentiment Analysis on Twitter Data by Using Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM)

Wireless Personal Communications ◽

10.1007/s11277-021-08580-3 ◽

2021 ◽

Author(s):

Usha Devi Gandhi ◽

Priyan Malarvizhi Kumar ◽

Gokulnath Chandra Babu ◽

Gayathri Karthick

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Sentiment Analysis ◽

Short Term Memory ◽

Short Term ◽

Term Memory ◽

Twitter Data ◽

Long Short Term Memory


Sentence similarity evaluation using Sent2Vec and siamese neural network with parallel structure

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189593 ◽

2021 ◽

pp. 1-10

Author(s):

Hye-Jeong Song ◽

Tak-Sung Heo ◽

Jong-Dae Kim ◽

Chan-Young Park ◽

Yu-Seop Kim

Keyword(s):

Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Parallel Structure ◽

Short Term ◽

Similarity Estimation ◽

Accurate Judgment ◽

Proposed Model ◽

Sentence Similarity ◽

Long Short Term Memory

Sentence similarity evaluation is a significant task in natural language processing, used in machine translation, classification, and information extraction. Given two sentences, an accurate judgment must be made about whether their meanings are equivalent, even if their words and contexts differ. To this end, existing studies have measured sentence similarity by analyzing words, morphemes, and letters. To measure sentence similarity, this study uses Sent2Vec, a sentence embedding, together with morpheme word embeddings. The word vectors are fed into a one-dimensional convolutional neural network (1D-CNN) with kernels of various sizes and into a bidirectional long short-term memory (Bi-LSTM). Self-attention is applied to the features produced by the Bi-LSTM. The outputs of the 1D-CNN and of self-attention are then reduced by global max pooling and global average pooling, respectively, to extract salient values. The resulting vectors are concatenated with the vector generated by Sent2Vec into a single vector. This vector is fed into a softmax layer, which finally determines the similarity between the two sentences. The proposed model improves accuracy by up to 5.42 percentage points compared with conventional sentence similarity estimation models.
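
The abstract reduces the 1D-CNN output with global max pooling and the self-attention output with global average pooling, then concatenates both with the Sent2Vec vector. The following is a minimal sketch of that pooling-and-concatenation step using plain lists. It assumes each feature sequence has shape `[T][D]` (time steps by feature dimension), and the function names are hypothetical. The abstract does not specify the actual dimensions or how the two sentences of a pair are combined, so those parts are left out.

```python
def global_max_pool(seq):
    """Element-wise max over the time axis of a [T][D] feature sequence."""
    return [max(col) for col in zip(*seq)]


def global_avg_pool(seq):
    """Element-wise mean over the time axis of a [T][D] feature sequence."""
    return [sum(col) / len(seq) for col in zip(*seq)]


def combine(cnn_features, attn_features, sent2vec_vector):
    """Concatenate max-pooled CNN features, avg-pooled attention features, and Sent2Vec."""
    return global_max_pool(cnn_features) + global_avg_pool(attn_features) + list(sent2vec_vector)


cnn_out = [[1.0, 5.0], [3.0, 2.0]]    # 2 time steps, 2 CNN channels
attn_out = [[1.0, 2.0], [3.0, 4.0]]   # 2 time steps, 2 attention dims
print(combine(cnn_out, attn_out, [0.5]))  # [3.0, 5.0, 2.0, 3.0, 0.5]
```

Both pooling operations remove the time axis, so sentences of different lengths map to fixed-size vectors that can be concatenated and passed to a softmax classifier.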


1D Convolutional Neural Network with Long Short-Term Memory for Human Activity Recognition

10.1109/iicaiet51634.2021.9573979 ◽

2021 ◽

Author(s):

Jia Xin Goh ◽

Kian Ming Lim ◽

Chin Poo Lee

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Activity Recognition ◽

Human Activity ◽

Short Term Memory ◽

Human Activity Recognition ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory


Gait-Based Human Identification by Combining Shallow Convolutional Neural Network-Stacked Long Short-Term Memory and Deep Convolutional Neural Network

IEEE Access ◽

10.1109/access.2018.2876890 ◽

2018 ◽

Vol 6 ◽

pp. 63164-63186 ◽

Cited By ~ 12

Author(s):

Ganbayar Batchuluun ◽

Hyo Sik Yoon ◽

Jin Kyu Kang ◽

Kang Ryoung Park

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Human Identification ◽

Deep Convolutional Neural Network ◽

Short Term ◽

Term Memory ◽

Long Short Term Memory


Utilizing Asynchronous Advantage Actor-Critic to Build an AI Game Bot for Arcade Games

Journal of Intelligent System and Computation ◽

10.52985/insyst.v1i2.82 ◽

2019 ◽

Vol 1 (2) ◽

pp. 74-84

Author(s):

Evan Kusuma Susanto ◽

Yosi Kristian

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Reinforcement Learning ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Trial And Error ◽

Short Term ◽

Term Memory ◽

Memory Network ◽

Long Short Term Memory

Asynchronous Advantage Actor-Critic (A3C) is a deep reinforcement learning algorithm developed by Google DeepMind. It can be used to create an artificial intelligence architecture that masters many different kinds of games through trial and error. It learns from the game's screen display and the scores obtained from its actions, without human intervention. An A3C network consists of a Convolutional Neural Network (CNN) at the front, a Long Short-Term Memory network (LSTM) in the middle, and an Actor-Critic network at the back. The CNN summarizes the screen output image by extracting the important features present on the screen. The LSTM serves as a memory of previous game states. The Actor-Critic network determines the best action to take in a given situation. The experimental results show that this method is quite effective and can beat novice players in the 5 games used for testing.
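
In A3C, the Actor-Critic head learns from an advantage estimate. A short rollout of rewards is discounted and bootstrapped with the critic's value of the last state, and the critic's value of each visited state is subtracted. The following is a minimal sketch of that n-step return and advantage computation, with the CNN and LSTM omitted. The rollout length, the discount `gamma`, and the function name are assumptions, not details from this paper.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns R_t and advantages A_t = R_t - V(s_t).

    rewards:         rewards r_t collected during one rollout
    values:          critic estimates V(s_t) for the same time steps
    bootstrap_value: critic estimate V(s_{t+n}) after the rollout (0.0 if terminal)
    """
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running  # R_t = r_t + gamma * R_{t+1}
        returns.append(running)
    returns.reverse()
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


returns, adv = n_step_advantages([1.0, 0.0, 2.0], [1.0, 1.0, 1.0], 0.0, gamma=0.5)
print(returns)  # [1.5, 1.0, 2.0]
print(adv)      # [0.5, 0.0, 1.0]
```

The actor's policy gradient is weighted by these advantages, and the critic is regressed toward the returns. A3C runs this update asynchronously in several parallel workers.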


A hybrid of convolutional neural network and long short-term memory network approach to predictive maintenance

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v12i1.pp721-730 ◽

2022 ◽

Vol 12 (1) ◽

pp. 721

Author(s):

Ahmed Nasser ◽

Huthaifa AL-Khazraji

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Predictive Maintenance ◽

Fault Prediction ◽

Gradient Boosting ◽

Short Term ◽

Term Memory ◽

Memory Network ◽

Long Short Term Memory

Predictive maintenance (PdM) is a successful strategy for reducing cost by minimizing breakdown stoppages and production loss. Integrating the physical and digital systems of the production process produces a massive amount of data, which makes it possible to apply deep learning (DL) algorithms to fault prediction and diagnosis. This paper presents a hybrid convolutional neural network and long short-term memory network (CNN-LSTM) approach to a predictive maintenance problem. The proposed CNN-LSTM approach improves predictive accuracy and also reduces model complexity. To evaluate the proposed model, it was compared with regular LSTM and gradient boosting decision tree (GBDT) methods on a freely available dataset. The CNN-LSTM-based PdM model achieves better prediction accuracy than the regular LSTM: the average F-score increases from 93.34% for the regular LSTM to 97.48% for the proposed CNN-LSTM. Compared with related work, the proposed hybrid CNN-LSTM PdM approach achieves better results in terms of accuracy.
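
The abstract reports an "average F-score" but does not say what is averaged (classes, folds, or runs). The following is a minimal sketch of the F1 score and its macro average over classes, which is one common reading. The function names are hypothetical, and the choice of macro averaging is an assumption.

```python
def f1_score(y_true, y_pred, positive):
    """F1 = 2PR / (P + R) for one class treated as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_score(y_true, y_pred, c) for c in labels) / len(labels)


# 1 = failure, 0 = normal; one failure is missed.
print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))  # mean of 0.667 (class 1) and 0.8 (class 0)
```

For imbalanced maintenance data, where failures are rare, macro averaging keeps the minority failure class from being swamped by the majority class.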


Convolutional Neural Network Audio Classifier for Alarm Sound Detection

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8866.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 4554-4557

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Sound Recognition ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Differential Network ◽

Sound Detection ◽

Long Short Term Memory ◽

LSTM Network

Artificial Neural Networks (ANNs) have evolved through many stages over the last three decades, with many researchers contributing to this challenging field. With the help of mathematics, ANNs can also solve complex problems. ANNs such as the Convolutional Neural Network (CNN), Deep Neural Network, Generative Adversarial Network (GAN), Long Short-Term Memory (LSTM) network, Recurrent Neural Network (RNN), and Ordinary Differential Network play promising roles in many multinational companies (MNCs) and IT firms because of their predictive accuracy. In this paper, a Convolutional Neural Network is used to predict beep sounds at high noise levels. Using supervised learning, this research develops the best-performing CNN architecture for beep sound recognition in noisy conditions. The proposed method achieves an accuracy of 96%. The prototype was tested with several architectures on the training and test data, and a two-layer CNN classifier gave the best predictions.


Online reliability time series prediction via convolutional neural network and long short term memory for service-oriented systems

Knowledge-Based Systems ◽

10.1016/j.knosys.2018.07.006 ◽

2018 ◽

Vol 159 ◽

pp. 132-147 ◽

Cited By ~ 17

Author(s):

Hongbing Wang ◽

Zhengping Yang ◽

Qi Yu ◽

Tianjing Hong ◽

Xin Lin

Keyword(s):

Neural Network ◽

Time Series ◽

Convolutional Neural Network ◽

Short Term Memory ◽

Time Series Prediction ◽

Short Term ◽

Term Memory ◽

Service Oriented ◽

Long Short Term Memory
