Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention

2020, Vol 2020, pp. 1-7
Author(s): Yan Chu, Xiao Yue, Lei Yu, Mikhailov Sergei, Zhengkui Wang

Automatically captioning images with proper descriptions has become an interesting and challenging problem. In this paper, we present a joint model, AICRL, which performs automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder adopts ResNet50, a convolutional neural network, which creates an extensive representation of the given image by embedding it into a fixed-length vector. The decoder is designed with LSTM, a recurrent neural network, and a soft attention mechanism that selectively focuses attention on certain parts of an image to predict the next sentence. We trained AICRL on a large dataset, MS COCO 2014, to maximize the likelihood of the target description sentence given the training images, and evaluated it with various metrics such as BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective in generating captions for the images.
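The soft attention step mentioned in this abstract can be illustrated with a minimal sketch. It is not the AICRL implementation: the dot-product scoring function, the names `soft_attention`, `region_features`, and `hidden_state`, and the toy values are all assumptions made for illustration. The idea shown is only that each image region gets a score against the decoder state, the scores are normalized with a softmax, and the context vector is the weighted sum of the region features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    Assumption: regions are scored by a plain dot product with the
    decoder hidden state; real models usually learn this scoring function.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy input: three image regions with 2-D features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(regions, hidden)
```

Because every region receives a nonzero weight, the operation stays differentiable, which is what distinguishes soft attention from hard (sampled) attention.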

Author(s): Qiannan Zhu, Xiaofei Zhou, Zeliang Song, Jianlong Tan, Li Guo

With the rapid information explosion of news, making personalized news recommendations for users has become an increasingly challenging problem. Many existing recommendation methods that regard the recommendation procedure as a static process have achieved good recommendation performance. However, they usually fail to handle the dynamic diversity of news and users' interests, or ignore the importance of the sequential information in users' click selections. In this paper, taking full advantage of the convolutional neural network (CNN), recurrent neural network (RNN), and attention mechanism, we propose a deep attention neural network, DAN, for news recommendation. Our DAN model uses an attention-based parallel CNN to aggregate users' interest features and an attention-based RNN to capture richer hidden sequential features of users' clicks, and combines these features for news recommendation. We conduct experiments on real-world news data sets, and the experimental results demonstrate the superiority and effectiveness of our proposed DAN model.


2019, Vol 11 (12), pp. 247
Author(s): Xin Zhou, Peixin Dong, Jianping Xing, Peijia Sun

Accurate prediction of bus arrival times is a challenging problem in the public transportation field. Previous studies have shown that to improve prediction accuracy, more heterogeneous measurements provide better results. So what other factors should be added into the prediction model? Traditional prediction methods mainly use the arrival time and the distance between stations, but do not make full use of dynamic factors such as passenger number, dwell time, bus driving efficiency, etc. We propose a novel approach that takes full advantage of dynamic factors. Our approach is based on a Recurrent Neural Network (RNN). The experimental results indicate that a variety of prediction algorithms (such as Support Vector Machine, Kalman filter, Multilayer Perceptron, and RNN) have significantly improved performance after using dynamic factors. Further, we introduce RNN with an attention mechanism to adaptively select the most relevant input factors. Experiments demonstrate that the prediction accuracy of RNN with an attention mechanism is better than RNN with no attention mechanism when there are heterogeneous input factors. The experimental results show the superior performances of our approach on the data set provided by Jinan Public Transportation Corporation.


Sensors, 2020, Vol 20 (21), pp. 6350
Author(s): Bin Wu, Shibo Yuan, Peng Li, Zehuan Jing, Shao Huang, ...

As the real electromagnetic environment grows more complex and the quantity of radar signals becomes massive, traditional methods, which require a large amount of prior knowledge, are time-consuming and ineffective for radar emitter signal recognition. In recent years, the convolutional neural network (CNN) has shown its superiority in recognition, so experts have applied it to radar signal recognition. However, in the field of radar emitter signal recognition, the data are usually one-dimensional (1-D), so using the original two-dimensional CNN model directly takes more time and storage space. Moreover, the features extracted from convolutional layers are redundant, so the recognition accuracy is low. To solve these problems, this paper proposes a novel one-dimensional convolutional neural network with an attention mechanism (CNN-1D-AM) to extract more discriminative features and recognize radar emitter signals. In this method, features of the given 1-D signal sequences are extracted directly by the 1-D convolutional layers and are weighted by the attention unit in accordance with their importance to recognition. Experiments based on seven different radar emitter signals indicate that the proposed CNN-1D-AM has the advantages of high accuracy and superior performance in radar emitter signal recognition.
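The pipeline described in this abstract (1-D convolution followed by importance weighting) can be sketched roughly as follows. This is not the CNN-1D-AM architecture: the kernels are fixed rather than learned, and scoring each feature map by its mean absolute activation is an assumption chosen only to keep the example small. The names `conv1d` and `attention_weights` are hypothetical.

```python
import math


def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the operation CNN layers compute."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def attention_weights(feature_maps):
    """Softmax over one importance score per feature map.

    Assumption: importance is the mean absolute activation; a trained
    attention unit would learn this score instead.
    """
    scores = [sum(abs(v) for v in fm) / len(fm) for fm in feature_maps]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy signal and two hand-picked kernels.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]

feature_maps = [conv1d(signal, kernel) for kernel in kernels]
weights = attention_weights(feature_maps)
# Re-weight each feature map by its attention weight.
weighted_maps = [[w * v for v in fm] for w, fm in zip(weights, feature_maps)]
```

Operating on the 1-D sequence directly avoids reshaping the signal into a 2-D input, which is the time and storage cost the abstract refers to.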


CONVERTER, 2021, pp. 579-590
Author(s): Weirong Xiu

A tandem joint algorithm (CATIR) that combines a convolutional neural network based on an attention mechanism with a bidirectional independent recurrent neural network is proposed. Using natural language processing techniques, word vector features are extracted from URLs, and the extracted URL information features and host information features are merged. The proposed CATIR algorithm uses a CNN (Convolutional Neural Network) to obtain the deep local features in the data, uses the attention mechanism to adjust the weights, and uses an IndRNN (Independent Recurrent Neural Network) to obtain the global features in the data. The experimental results show that the CATIR algorithm significantly improves the accuracy of malicious URL detection over traditional algorithms, reaching 96.9%.


Author(s): Xiaoyang Zheng, Zeyu Ye, Jinliang Wu

The gearbox is a key part of modern industrial machinery, and many fault diagnosis methods have been developed for it. However, traditional fault diagnosis methods depend on prior knowledge. This paper proposes an end-to-end method based on a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and an attention mechanism. The BiGRU not only makes good use of the time sequence of the signal, but also saves computing resources compared with networks of the same type because of its low amount of calculation. To verify the effectiveness and generalization performance of the proposed method, experiments are carried out on two datasets, and the accuracy is calculated by ten-fold cross-validation. The experimental results show that the proposed model has higher accuracy than existing fault diagnosis methods.
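The ten-fold cross-validation mentioned in this abstract can be sketched as below. The abstract does not say how the folds were built, so the shuffled, non-stratified split here is an assumption, and `k_fold_indices`, `cross_validate`, and `dummy_evaluate` are hypothetical names. In practice `evaluate` would train the model on the training indices and return its accuracy on the held-out fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Strided slicing gives k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        ]
        yield train_idx, test_idx


def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


def dummy_evaluate(train_idx, test_idx):
    """Placeholder scorer: returns the held-out fraction, not an accuracy."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


mean_score = cross_validate(100, dummy_evaluate)
```

Every sample is held out exactly once, so the averaged score uses all of the data for testing without ever testing on a sample the model was trained on in that fold.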


Author(s): Anish Banda

In the proposed model, we examine a deep neural network-based image caption generation technique. Given an image as input, the technique produces output in three different forms: a sentence describing the image in three different languages, an mp3 audio file, and an image file. The model uses techniques from both computer vision and natural language processing. We aim to build a caption-generation model using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The convolutional neural network compares the target image with a large dataset of training images. The model generates a decent description using the trained data. To extract features from images we need an encoder, and we use the CNN as the encoder. To decode the generated description of the image we use the LSTM. To evaluate the accuracy of the generated caption we use the BLEU metric, which grades the quality of the generated content. Performance is calculated using standard evaluation metrics. Keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
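The BLEU metric named in this abstract can be sketched in a simplified form. The version below is an assumption-laden reduction: it uses a single reference, n-grams only up to `max_n=2`, and no smoothing, whereas BLEU as usually reported uses up to 4-grams and may use several references. The function names and the example captions are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0


def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU with uniform n-gram weights."""
    precisions = [
        modified_precision(candidate, reference, n)
        for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty discourages captions shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "a dog runs on the grass".split()
candidate = "a dog runs on grass".split()
score = bleu(candidate, reference)
```

The score is the geometric mean of the clipped precisions multiplied by the brevity penalty, so a caption identical to the reference scores 1 and one that shares no words with it scores 0.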


2021, Vol 2021, pp. 1-12
Author(s): Haiyan Wang, Kaiming Yao, Jian Luo, Yi Lin

Sequential recommendation systems have received widespread attention due to their good performance in solving data overload. However, most sequential recommendation methods assume that a user's preferences depend only on specific items in the current sequence and do not consider the user's implicit interests. In addition, most previous works focus mainly on exploiting relationships between items in the sequence and seldom consider quantifying the degree of preference for items implied by a user's different behaviors. To address these two problems, we propose an implicit preference-aware sequential recommendation method based on a knowledge graph (IPAKG). Firstly, this method introduces a knowledge graph to exploit the user's implicit preference representations. Secondly, we integrate a recurrent neural network and an attention mechanism to capture the user's evolving interests and the relationships between different items in the sequence. Thirdly, we introduce the concept of behavior intensity and design a behavior activation unit to exploit the degree of preference for items implied by a user's different behaviors. Through the activation unit, the user's preferences for different items are further quantified. Finally, we conduct experiments on an Amazon electronics dataset and a Tmall dataset to evaluate the performance of our method. Experimental results demonstrate that our proposed method performs better than the baseline methods.


Electronics, 2021, Vol 10 (1), pp. 81
Author(s): Jianbin Xiong, Dezheng Yu, Shuangyin Liu, Lei Shu, Xiaochan Wang, ...

Plant phenotypic image recognition (PPIR) is an important branch of smart agriculture. In recent years, deep learning has achieved significant breakthroughs in image recognition. Consequently, PPIR technology that is based on deep learning is becoming increasingly popular. First, this paper introduces the development and application of PPIR technology, followed by its classification and analysis. Second, it presents the theory of four types of deep learning methods and their applications in PPIR. These methods include the convolutional neural network, deep belief network, recurrent neural network, and stacked autoencoder, and they are applied to identify plant species, diagnose plant diseases, etc. Finally, the difficulties and challenges of deep learning in PPIR are discussed.

