Vision and Language Navigation using Multi-head Attention Mechanism

2021 ◽

Author(s):

Chenyu Gao ◽

Qi Zhu ◽

Peng Wang ◽

Qi Wu

Keyword(s):

Question Answering ◽

Original Model ◽

Attention Mechanism ◽

Visual Question Answering ◽

Question Types ◽

Different Types ◽

Different Levels ◽

Vision And Language

Vision-and-Language (VL) pre-training has shown great potential on many related downstream tasks, such as Visual Question Answering (VQA), one of the most popular problems in the VL field. All of these pre-trained models (such as VisualBERT, ViLBERT, LXMERT and UNITER) are built on the Transformer, which extends the classical attention mechanism to multiple layers and heads. To investigate why and how these models work so well on VQA, in this paper we explore the roles of individual heads and layers in Transformer models when handling 12 different types of questions. Specifically, we manually remove (chop) heads or layers from a pre-trained VisualBERT model one at a time, and test the chopped model on different levels of questions to record its performance. As shown by the echelon shape of the result matrices, the experiments reveal that different heads and layers are responsible for different question types, with higher-level layers activated by higher-level visual reasoning questions. Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of VisualBERT at the instance level when dealing with different questions. Our dynamic chopping module can effectively reduce the parameters of the original model by 50%, while reducing accuracy on the VQA task by less than 1%.
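The idea of chopping heads can be illustrated with a small NumPy sketch of multi-head self-attention that takes a binary head mask. This is not the authors' implementation or the VisualBERT code: the function `multi_head_attention`, the `head_mask` argument, the toy dimensions, and the random weights are all assumptions made for illustration. It only shows that a masked head contributes nothing to the layer output.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, head_mask):
    """Self-attention with per-head chopping (illustrative only).

    x:         (seq_len, d_model) input tokens
    wq/wk/wv:  (n_heads, d_model, d_head) per-head projections
    wo:        (n_heads, d_head, d_model) per-head slices of the output projection
    head_mask: (n_heads,) array of 0/1; 0 means the head is chopped
    """
    out = np.zeros_like(x)
    for h in range(wq.shape[0]):
        if head_mask[h] == 0:
            continue  # a chopped head is skipped entirely
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        out += (attn @ v) @ wo[h]
    return out

# Toy setup with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
wo = rng.normal(size=(n_heads, d_head, d_model))

full = multi_head_attention(x, wq, wk, wv, wo, np.array([1, 1]))
only_h0 = multi_head_attention(x, wq, wk, wv, wo, np.array([1, 0]))
only_h1 = multi_head_attention(x, wq, wk, wv, wo, np.array([0, 1]))
```

Because the layer output is a sum of per-head contributions, `full` equals `only_h0 + only_h1`. An instance-level chopping module of the kind the abstract describes would presumably predict `head_mask` from each question rather than fixing it by hand.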

Fooling Vision and Language Models Despite Localization and Attention Mechanism

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition ◽

10.1109/cvpr.2018.00520 ◽

2018 ◽

Cited By ~ 10

Author(s):

Xiaojun Xu ◽

Xinyun Chen ◽

Chang Liu ◽

Anna Rohrbach ◽

Trevor Darrell ◽

...

Keyword(s):

Attention Mechanism ◽

Language Models ◽

Vision And Language

An Attention Mechanism Extension of Automatic HTML Generation from Web Page Design Images

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.140.1393 ◽

2020 ◽

Vol 140 (12) ◽

pp. 1393-1401

Author(s):

Hiroki Chinen ◽

Hidehiro Ohki ◽

Keiji Gyohten ◽

Toshiya Takami

Keyword(s):

Attention Mechanism ◽

Web Page ◽

Web Page Design ◽

Page Design

Online Speaker Adaptation for LVCSR Based on Attention Mechanism

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽

10.23919/apsipa.2018.8659609 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jia Pan ◽

Diyuan Liu ◽

Genshun Wan ◽

Jun Du ◽

Qingfeng Liu ◽

...

Keyword(s):

Attention Mechanism

Mobile agents' dynamic small-world network based on attention mechanism

2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS) ◽

10.23919/apnoms50412.2020.9237037 ◽

2020 ◽

Author(s):

Rong Xie

Keyword(s):

Mobile Agents ◽

Small World ◽

Attention Mechanism ◽

Small World Network

A New Time-Frequency Attention Mechanism for TDNN and CNN-LSTM-TDNN, with Application to Language Identification

10.21437/interspeech.2019-1256 ◽

2019 ◽

Cited By ~ 3

Author(s):

Xiaoxiao Miao ◽

Ian McLoughlin ◽

Yonghong Yan

Keyword(s):

Attention Mechanism ◽

Language Identification ◽

Time Frequency ◽

New Time

Dam Deformation Interpretation and Prediction Based on a Long Short-Term Memory Model Coupled with an Attention Mechanism

Applied Sciences ◽

10.3390/app11146625 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6625

Author(s):

Yan Su ◽

Kailiang Weng ◽

Chuan Lin ◽

Zeqin Chen

Keyword(s):

Short Term Memory ◽

Attention Mechanism ◽

Impact Factors ◽

Nonlinear Prediction ◽

Time Dimension ◽

Short Term ◽

Deformation Prediction ◽

Term Memory ◽

Long Short Term Memory ◽

Dam Deformation

An accurate dam deformation prediction model is vital to a dam safety monitoring system, as it helps assess and manage dam risks. Most traditional dam deformation prediction algorithms ignore the interpretation and evaluation of variables and lack qualitative measures. This paper proposes a data processing framework that uses a long short-term memory (LSTM) model coupled with an attention mechanism to predict the deformation response of a dam structure. First, a random forest (RF) model is introduced to assess the relative importance of impact factors and to screen input variables. Second, the density-based spatial clustering of applications with noise (DBSCAN) method is used to identify and filter equipment-based abnormal values, reducing random error in the measurements. Finally, the coupled model is used to focus on important factors in the time dimension in order to obtain more accurate nonlinear prediction results. The results of the case study show that, of all tested methods, the proposed coupled method performed best. In addition, it was found that temperature and water level both have significant impacts on dam deformation and can serve as reliable metrics for dam management.
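As a rough illustration of what "focusing on important factors in the time dimension" could look like, the sketch below applies additive attention over a sequence of recurrent hidden states. The abstract does not give the paper's attention formulation, so the scoring function, the names `temporal_attention`, `w`, and `v`, and the toy dimensions are assumptions. The hidden states are random placeholders rather than outputs of a trained LSTM.

```python
import numpy as np

def temporal_attention(hidden, w, v):
    """Additive attention over time steps (illustrative assumption).

    hidden: (T, H) hidden states, one row per time step
    w:      (H, A) projection into an attention space
    v:      (A,)   scoring vector
    Returns the attention-weighted context vector (H,) and the weights (T,).
    """
    scores = np.tanh(hidden @ w) @ v          # one score per time step
    scores = scores - scores.max()            # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden                # weighted sum of hidden states
    return context, weights

# Placeholder data: 6 time steps, hidden size 5, attention size 3.
rng = np.random.default_rng(0)
T, H, A = 6, 5, 3
hidden = rng.normal(size=(T, H))
context, weights = temporal_attention(
    hidden, rng.normal(size=(H, A)), rng.normal(size=(A,))
)
```

The `weights` vector sums to one and indicates how much each time step contributes to `context`, which a downstream regression layer could map to a predicted deformation value.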
