Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning

Although attribute-based and attention-based approaches have both proven effective in image captioning, most attribute-based approaches simply predict attributes independently, without taking the co-occurrence dependencies among attributes into account. In addition, most attention-based captioning models directly use the feature map extracted from a CNN, in which many features may be redundant with respect to the image content. In this paper, we focus on training a good attribute-inference model via a recurrent neural network (RNN) for image captioning, in which the co-occurrence dependencies among attributes can be maintained. The uniqueness of our inference model lies in its use of an RNN with a visual attention mechanism to "observe" the image before generating captions. Additionally, we note that compact, attribute-driven features are more useful for the attention-based captioning model. To this end, we extract a context feature for each attribute and guide the captioning model to adaptively attend to these context features. We verify the effectiveness and superiority of the proposed approach over other captioning approaches through extensive experiments and comparisons on the MS COCO image captioning dataset.


Integrating Scene Semantic Knowledge into Image Captioning

ACM Transactions on Multimedia Computing Communications and Applications ◽ 10.1145/3439734 ◽ 2021 ◽ Vol 17 (2) ◽ pp. 1-22

Author(s): Haiyang Wei ◽ Zhixin Li ◽ Feicheng Huang ◽ Canlong Zhang ◽ Huifang Ma ◽ ...

Keyword(s): Visual Attention ◽ Visual Information ◽ Semantic Information ◽ Language Model ◽ Semantic Knowledge ◽ Attention Mechanism ◽ Image Captioning ◽ Attention Model ◽ Visual Attention Mechanism ◽ Focus Intensity

Most existing image captioning methods use only the visual information of the image to guide caption generation and lack the guidance of effective scene semantic information; moreover, current visual attention mechanisms cannot adjust the focus intensity on the image. In this article, we first propose an improved visual attention model. At each timestep, we calculate the focus intensity coefficient of the attention mechanism from the context information of the model, then automatically adjust the focus intensity of the attention mechanism through this coefficient to extract more accurate visual information. In addition, we represent the scene semantic knowledge of the image through topic words related to the image scene, then add them to the language model. We use the attention mechanism to determine the visual information and scene semantic information that the model attends to at each timestep and combine them to enable the model to generate more accurate and scene-specific captions. Finally, we evaluate our model on the Microsoft COCO (MSCOCO) and Flickr30k standard datasets. The experimental results show that our approach generates more accurate captions and outperforms many recent advanced models on various evaluation metrics.
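
The abstract does not give the formula for the focus intensity coefficient. One common way to obtain an adjustable focus is to scale the attention scores before the softmax, and the sketch below assumes that formulation. The names `focused_attention`, `attend`, and `focus` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def focused_attention(scores, focus):
    """Attention weights with scores scaled by a focus coefficient.

    Assumption (not from the paper): the coefficient multiplies the raw
    scores, so focus > 1 sharpens the distribution and 0 < focus < 1
    flattens it.
    """
    if focus <= 0:
        raise ValueError("focus must be positive")
    return softmax([focus * s for s in scores])


def attend(features, weights):
    """Weighted sum of region feature vectors (the attended context)."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
```

In a full model the coefficient would be predicted at each timestep from the decoder state; here it is passed in directly so that its effect on the weights can be inspected in isolation.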


Developing a seq2seq neural network using visual attention to transform mathematical expressions from images to LaTeX.

Doklady BGUIR ◽ 10.35596/1729-7648-2021-19-8-40-44 ◽ 2022 ◽ Vol 19 (8) ◽ pp. 40-44

Author(s): P. A. Vyaznikov ◽ I. D. Kotilevets

Keyword(s): Neural Network ◽ Visual Attention ◽ Network Architecture ◽ Network Models ◽ Neural Network Architecture ◽ Neural Network Models ◽ The Neural Network ◽ Mathematical Expressions ◽ Series Of Experiments ◽ Visual Attention Mechanism

The paper presents the development methods and experimental results for a seq2seq neural network architecture with a visual attention mechanism applied to the im2latex problem. The task is to create a neural network capable of converting an image of a mathematical expression into the equivalent expression in the LaTeX markup language. This problem belongs to the image captioning family: the neural network scans the image and, based on the extracted features, generates a description in natural language. The proposed solution uses the seq2seq architecture, which contains encoder and decoder components, together with Bahdanau attention. A series of experiments was conducted to train several neural network models and measure their effectiveness.
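
Bahdanau (additive) attention scores each encoder output against the current decoder state as v · tanh(W_q q + W_k k), then normalizes the scores with a softmax. The sketch below is a minimal pure-Python illustration of that scoring rule, not the authors' implementation. The function and parameter names are hypothetical, and the weights are plain nested lists rather than trained tensors.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def additive_attention(query, keys, w_query, w_keys, v):
    """Return (weights, context) for additive attention.

    query: decoder state; keys: encoder outputs, one vector per image region.
    score_i = v . tanh(w_query @ query + w_keys @ keys[i])
    """
    q_proj = matvec(w_query, query)
    scores = []
    for key in keys:
        k_proj = matvec(w_keys, key)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, k_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over the scores, shifted by the maximum for stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(keys[0])
    context = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
    return weights, context
```

In an im2latex decoder, the context vector would be combined with the previous token embedding at each step to predict the next LaTeX token.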


A Recurrent Neural Network Approach to Image Captioning in Braille for Blind-Deaf People

2019 IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON) ◽ 10.1109/spicscon48833.2019.9065144 ◽ 2019

Author(s): Sameia Zaman ◽ M. Abid Abrar ◽ M. Muntasir Hassan ◽ A.N.M. Nafiul Islam

Keyword(s): Neural Network ◽ Recurrent Neural Network ◽ Deaf People ◽ Network Approach ◽ Image Captioning ◽ Neural Network Approach


Deep Learning Model for Real-Time Prediction of Intradialytic Hypotension

Clinical Journal of the American Society of Nephrology ◽ 10.2215/cjn.09280620 ◽ 2021 ◽ pp. CJN.09280620

Author(s): Hojun Lee ◽ Donghwan Yun ◽ Jayeon Yoo ◽ Kiyoon Yoo ◽ Yong Chul Kim ◽ ...

Keyword(s): Neural Network ◽ Deep Learning ◽ Receiver Operating Characteristic ◽ Network Model ◽ Recurrent Neural Network ◽ Neural Network Model ◽ Operating Characteristic ◽ The Other ◽ Intradialytic Hypotension ◽ Deep Learning Model

Background and objectives: Intradialytic hypotension has high clinical significance. However, predicting it using conventional statistical models may be difficult because several factors have interactive and complex effects on the risk. Herein, we applied a deep learning model (recurrent neural network) to predict the risk of intradialytic hypotension using a timestamp-bearing dataset. Design, setting, participants, & measurements: We obtained 261,647 hemodialysis sessions with 1,600,531 independent timestamps (i.e., time-varying vital signs) and randomly divided them into training (70%), validation (5%), calibration (5%), and testing (20%) sets. Intradialytic hypotension was defined when nadir systolic BP was <90 mm Hg (termed intradialytic hypotension 1) or when a decrease in systolic BP ≥20 mm Hg and/or a decrease in mean arterial pressure ≥10 mm Hg on the basis of the initial BPs (termed intradialytic hypotension 2) or prediction time BPs (termed intradialytic hypotension 3) occurred within 1 hour. The area under the receiver operating characteristic curves, the area under the precision-recall curves, and F1 scores obtained using the recurrent neural network model were compared with those obtained using multilayer perceptron, Light Gradient Boosting Machine, and logistic regression models. Results: The recurrent neural network model for predicting intradialytic hypotension 1 achieved an area under the receiver operating characteristic curve of 0.94 (95% confidence intervals, 0.94 to 0.94), which was higher than those obtained using the other models (P<0.001). The recurrent neural network model for predicting intradialytic hypotension 2 and intradialytic hypotension 3 achieved area under the receiver operating characteristic curves of 0.87 (interquartile range, 0.87–0.87) and 0.79 (interquartile range, 0.79–0.79), respectively, which were also higher than those obtained using the other models (P≤0.001). The area under the precision-recall curve and F1 score were higher using the recurrent neural network model than they were using the other models. The recurrent neural network models for intradialytic hypotension were highly calibrated. Conclusions: Our deep learning model can be used to predict the real-time risk of intradialytic hypotension.
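
The three outcome definitions in the abstract can be read as a labeling rule over blood pressure readings. The sketch below is one possible reading, not the study's code. It assumes each reading is a `(systolic, mean_arterial)` pair in mm Hg and that a decrease is measured from the reference reading to the lowest value observed in the following hour. The function name `label_idh` and its arguments are hypothetical.

```python
def label_idh(initial, at_prediction, next_hour):
    """Label the three intradialytic hypotension (IDH) definitions.

    initial, at_prediction: (systolic, mean_arterial) reference readings.
    next_hour: non-empty list of readings within 1 hour after prediction time.
    Assumption: drops are measured against the lowest value in next_hour.
    """
    nadir_sbp = min(sbp for sbp, _ in next_hour)
    nadir_map = min(mean_ap for _, mean_ap in next_hour)

    def dropped(reference):
        ref_sbp, ref_map = reference
        # "and/or": either drop alone is enough to meet the definition.
        return (ref_sbp - nadir_sbp >= 20) or (ref_map - nadir_map >= 10)

    return {
        "idh1": nadir_sbp < 90,         # nadir systolic BP below 90 mm Hg
        "idh2": dropped(initial),       # drop relative to initial BPs
        "idh3": dropped(at_prediction), # drop relative to prediction-time BPs
    }
```

For example, with an initial reading of (140, 100), a prediction-time reading of (120, 90), and later readings of (115, 88) and (105, 82), only the second definition is met, because systolic BP falls 35 mm Hg from the initial value but only 15 mm Hg from the prediction-time value.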


Objects Classification by Learning-Based Visual Saliency Model and Convolutional Neural Network

Computational Intelligence and Neuroscience ◽ 10.1155/2016/7942501 ◽ 2016 ◽ Vol 2016 ◽ pp. 1-12 ◽ Cited By ~ 3

Author(s): Na Li ◽ Xinbo Zhao ◽ Yongjia Yang ◽ Xiaochun Zou

Keyword(s): Neural Network ◽ Deep Learning ◽ Visual Attention ◽ Convolutional Neural Network ◽ Visual Information ◽ Local Features ◽ Classification Method ◽ Semantic Features ◽ Visual Attention Model ◽ Attention Model

Humans can easily classify different kinds of objects, whereas this is quite difficult for computers. As a hot and difficult problem, object classification has received extensive interest and has broad prospects. Deep learning was proposed with inspiration from neuroscience. The convolutional neural network (CNN), as one deep learning method, can be used to solve classification problems. However, most deep learning methods, including CNNs, ignore the human visual information processing mechanism at work when a person classifies objects. Therefore, in this paper, inspired by the complete process by which humans classify different kinds of objects, we propose a new classification method that combines a visual attention model with a CNN. First, we use the visual attention model to simulate the human visual selection mechanism. Second, we use the CNN to simulate how humans select features, and extract the local features of the selected areas. Finally, our classification method not only depends on those local features but also adds human semantic features to classify objects. Our classification method has apparent advantages from a biological standpoint. Experimental results demonstrate that our method significantly improves classification efficiency.


Image Content Extraction Using a Bottom-Up Visual Attention Model

10.1109/icds.2009.32 ◽ 2009 ◽ Cited By ~ 1

Author(s): Ionut Pirnog ◽ Cristina Oprea ◽ Constantin Paleologu

Keyword(s): Visual Attention ◽ Image Content ◽ Bottom Up ◽ Content Extraction ◽ Visual Attention Model ◽ Attention Model


An Improved Neural Network Model Based on Visual Attention Mechanism for Object Detection

Proceedings of the 2019 International Conference on Big Data, Electronics and Communication Engineering (BDECE 2019) ◽ 10.2991/acsr.k.191223.035 ◽ 2019

Author(s): Zeren Jiang

Keyword(s): Neural Network ◽ Visual Attention ◽ Object Detection ◽ Network Model ◽ Neural Network Model ◽ Attention Mechanism ◽ Model Based ◽ Visual Attention Mechanism


Predicting Electric Power Energy, Using Recurrent Neural Network Forecasting Model

Journal of University of Human Development ◽ 10.21928/juhd.v4n2y2018.pp53-60 ◽ 2018 ◽ Vol 4 (2) ◽ pp. 53

Author(s): Nawzad M. Ahmed ◽ Ayad O. Hamdeen

Keyword(s): Neural Network ◽ Time Series ◽ Electric Power ◽ Recurrent Neural Network ◽ The Other ◽ Optimal Time ◽ Main Role ◽ Complex Data ◽ Suggested Model ◽ Great Ability

Electricity is counted as one of the most important energy sources in the world. It has played a main role in developing several sectors. In this study, two types of electricity variables were used: the first was the demand for electric power, and the second was the consumption, or energy load, in Sulaimani city. The main goal of the study was to construct an analytic recurrent neural network (RNN) model for both variables. This model has a great ability to detect complex patterns in time series data, which makes it well suited to the data under consideration. The model is also more sensitive and reliable than other artificial neural network (ANN) models, so it can deal with more complex data that might be chaotic, seismic, etc. The model can also deal with nonlinear data, which are common in time series, and it handles them differently from the other models. This research determined and defined the best RNN model for electricity demand and consumption, to be run at two levels. The first level is to predict the complexity of the suggested model (1-5-10-1), with the error functions being MSE (mean square error), AIC, and R2 (coefficient of determination). The second level uses the suggested model to forecast the demand for electric power and the value of each unit. Another result of this study is the identification of a suitable algorithm for such complex data. The Levenberg-Marquardt algorithm was found to be the most reliable and to take the most optimal time to give accurate and reliable results in this study.
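
The three error functions named in the abstract can be computed from a forecast and the observed series as sketched below. This is an illustration, not the study's code. The AIC uses the common least-squares form n · ln(MSE) + 2k, which is an assumption because the abstract does not say which form was used. Reading "1-5-10-1" as layer sizes is also an assumption, and the hypothetical helper `dense_param_count` counts only feedforward weights and biases, not recurrent connections.

```python
import math


def mse(actual, predicted):
    """Mean square error between two equal-length series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def aic_least_squares(actual, predicted, n_params):
    """AIC in the least-squares form n * ln(MSE) + 2k (assumed form).

    Undefined for a perfect fit, where MSE is zero.
    """
    return len(actual) * math.log(mse(actual, predicted)) + 2 * n_params


def dense_param_count(layer_sizes):
    """Weights plus biases of a fully connected stack, e.g. [1, 5, 10, 1]."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))
```

Under these assumptions, `dense_param_count([1, 5, 10, 1])` gives 81 parameters, which can be passed as `n_params` so that a larger network is penalized when it does not reduce the error enough.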
