Fooling Vision and Language Models Despite Localization and Attention Mechanism

Author(s): Xiaojun Xu, Xinyun Chen, Chang Liu, Anna Rohrbach, Trevor Darrell, ...
Author(s): Jian Li, Yue Wang, Michael R. Lyu, Irwin King

Intelligent code completion has become an essential research task for accelerating modern software development. To facilitate effective code completion for dynamically-typed programming languages, we apply neural language models learned from large codebases and develop a tailored attention mechanism for code completion. However, standard neural language models, even with an attention mechanism, cannot correctly predict out-of-vocabulary (OoV) words, which restricts code completion performance. In this paper, inspired by the prevalence of locally repeated terms in program source code and the recently proposed pointer copy mechanism, we propose a pointer mixture network for better predicting OoV words in code completion. Based on the context, the pointer mixture network learns to either generate a within-vocabulary word through an RNN component or copy an OoV word from the local context through a pointer component. Experiments on two benchmark datasets demonstrate the effectiveness of our attention mechanism and pointer mixture network on the code completion task.
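The mixture idea is straightforward to express in code: an RNN yields a distribution over the vocabulary, an attention head over the local context yields a distribution over positions, and a learned sigmoid gate mixes the two. Below is a minimal PyTorch-style sketch of that mechanism; the names (PointerMixture, vocab_proj, and so on) are illustrative assumptions, not the paper's implementation, and real OoV handling (predicting a context position rather than a vocabulary id) is simplified.

```python
# Minimal sketch of a pointer mixture network for code completion.
# Hypothetical PyTorch implementation; names are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerMixture(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.vocab_proj = nn.Linear(hidden_dim, vocab_size)  # RNN component
        self.attn = nn.Linear(hidden_dim, hidden_dim)        # attention scores
        self.gate = nn.Linear(hidden_dim, 1)                 # mixture switch

    def forward(self, context_ids):
        # context_ids: (batch, seq_len) token ids of the local context window
        emb = self.embed(context_ids)
        outs, (h, _) = self.rnn(emb)        # outs: (batch, seq, hidden)
        query = h[-1]                       # last hidden state as query

        # Pointer component: attention over local context positions.
        scores = torch.bmm(self.attn(outs), query.unsqueeze(2)).squeeze(2)
        ptr_dist = F.softmax(scores, dim=1)  # (batch, seq) over positions

        # RNN component: within-vocabulary distribution.
        vocab_dist = F.softmax(self.vocab_proj(query), dim=1)

        # Learned gate decides: generate from vocabulary vs. copy from context.
        g = torch.sigmoid(self.gate(query))  # (batch, 1)

        # Fold pointer probabilities onto vocabulary ids so both distributions
        # share one space, then mix. (The actual model predicts OoV tokens by
        # context position instead; this simplification keeps the sketch short.)
        copy_dist = torch.zeros_like(vocab_dist)
        copy_dist.scatter_add_(1, context_ids, ptr_dist)
        return g * vocab_dist + (1.0 - g) * copy_dist
```

Because the gate is computed from the same hidden state that drives both distributions, the network can learn, per step, whether the next token is more likely generated from the vocabulary or copied from nearby code.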


Author(s): Jize Cao, Zhe Gan, Yu Cheng, Licheng Yu, Yen-Chun Chen, ...

Author(s): Chenyu Gao, Qi Zhu, Peng Wang, Qi Wu

Vision-and-Language (VL) pre-training has shown great potential on many related downstream tasks, such as Visual Question Answering (VQA), one of the most popular problems in the VL field. All of these pre-trained models (such as VisualBERT, ViLBERT, LXMERT, and UNITER) are built with Transformers, which extend the classical attention mechanism to multiple layers and heads. To investigate why and how these models work so well on VQA, in this paper we explore the roles of individual heads and layers in Transformer models when handling 12 different types of questions. Specifically, we manually remove (chop) one head (or layer) at a time from a pre-trained VisualBERT model and test it on different levels of questions to record its performance. As shown by the interesting echelon shape of the resulting matrices, the experiments reveal that different heads and layers are responsible for different question types, with higher-level layers activated by higher-level visual reasoning questions. Based on this observation, we design a dynamic chopping module that can automatically remove heads and layers of VisualBERT at the instance level when dealing with different questions. Our dynamic chopping module reduces the parameters of the original model by 50% while lowering accuracy by less than 1% on the VQA task.
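To make instance-level chopping concrete, here is a minimal PyTorch sketch: a small policy network reads a pooled question representation and emits a binary keep/chop decision per head and layer, which is applied multiplicatively to the per-head attention outputs. The names (DynamicChopper, apply_head_mask) and the straight-through gating trick are assumptions for illustration; the paper's actual module and VisualBERT internals are not reproduced here.

```python
# Sketch of instance-level head chopping. Hypothetical; not the paper's module.
import torch
import torch.nn as nn

class DynamicChopper(nn.Module):
    """Predict, per input instance, which attention heads to keep."""
    def __init__(self, feat_dim, num_layers, num_heads):
        super().__init__()
        self.policy = nn.Linear(feat_dim, num_layers * num_heads)
        self.num_layers, self.num_heads = num_layers, num_heads

    def forward(self, question_feat):
        # question_feat: (batch, feat_dim) pooled question representation
        probs = torch.sigmoid(self.policy(question_feat))
        # Hard 0/1 keep-decisions with a straight-through estimator, so the
        # discrete chopping choice stays differentiable during training.
        hard = (probs > 0.5).float()
        mask = hard + probs - probs.detach()
        return mask.view(-1, self.num_layers, self.num_heads)

def apply_head_mask(per_head_out, mask_l):
    # per_head_out: (batch, num_heads, seq_len, head_dim) for one layer
    # mask_l:       (batch, num_heads) keep/chop decisions for that layer
    return per_head_out * mask_l.unsqueeze(-1).unsqueeze(-1)
```

A chopped head contributes nothing to the layer's output, so at inference its projection weights can simply be skipped, which is where the parameter savings come from.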


Author(s): Ramprasaath Ramasamy Selvaraju, Stefan Lee, Yilin Shen, Hongxia Jin, Shalini Ghosh, ...

2020, Vol. 140 (12), pp. 1393-1401
Author(s): Hiroki Chinen, Hidehiro Ohki, Keiji Gyohten, Toshiya Takami
