Adversarial Learning With Multi-Modal Attention for Visual Question Answering

Author(s):  
Yun Liu ◽  
Xiaoming Zhang ◽  
Feiran Huang ◽  
Lei Cheng ◽  
Zhoujun Li
2020 ◽  
pp. 1-14
Author(s):  
Yun Liu ◽  
Xiaoming Zhang ◽  
Zhiyun Zhao ◽  
Bo Zhang ◽  
Lei Cheng ◽  
...  

2020 ◽  
Author(s):  
Iqbal Chowdhury ◽  
Kien Nguyen Thanh ◽  
Clinton Fookes ◽  
Sridha Sridharan

Solving the Visual Question Answering (VQA) task is a step towards achieving human-like reasoning capability in machines. This paper proposes an approach to learning a multimodal feature representation with adversarial training, which allows the model to learn from standard fusion methods in an unsupervised manner. The discriminator is equipped with a siamese combination of two standard fusion methods, namely multimodal compact bilinear pooling and multimodal Tucker fusion. The multimodal feature representation output by the generator is produced by a graph convolutional operation. The resulting multimodal representation allows the proposed model to infer the correct answers to open-ended natural language questions from the VQA 2.0 dataset. An overall accuracy of 69.86% demonstrates the effectiveness of the proposed model.
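No implementation accompanies the abstract, but the described architecture can be sketched roughly in PyTorch as below. This is a minimal, hypothetical sketch, not the authors' code: all module names, the layer sizes (rank 256, hidden width 512), the single-step graph convolution, and the concatenation used to merge the discriminator's two fusion branches are assumptions.

import torch
import torch.nn as nn

class CountSketch(nn.Module):
    """Random count-sketch projection, the building block of compact bilinear pooling."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.out_dim = out_dim
        self.register_buffer("h", torch.randint(out_dim, (in_dim,)))        # hash bucket per input index
        self.register_buffer("s", 2.0 * torch.randint(2, (in_dim,)) - 1.0)  # random +/-1 signs

    def forward(self, x):  # x: (batch, in_dim)
        y = x.new_zeros(x.size(0), self.out_dim)
        return y.scatter_add_(1, self.h.expand_as(x), x * self.s)

class MCBFusion(nn.Module):
    """Multimodal compact bilinear pooling: the outer product of the two modality
    vectors is approximated by convolving their count sketches, computed as an
    element-wise product in the Fourier domain."""
    def __init__(self, v_dim, q_dim, out_dim):
        super().__init__()
        self.cs_v = CountSketch(v_dim, out_dim)
        self.cs_q = CountSketch(q_dim, out_dim)
        self.out_dim = out_dim

    def forward(self, v, q):
        fv = torch.fft.rfft(self.cs_v(v))
        fq = torch.fft.rfft(self.cs_q(q))
        return torch.fft.irfft(fv * fq, n=self.out_dim)

class TuckerFusion(nn.Module):
    """Low-rank bilinear (MUTAN-style Tucker) fusion of image and question features."""
    def __init__(self, v_dim, q_dim, rank, out_dim):
        super().__init__()
        self.pv = nn.Linear(v_dim, rank)
        self.pq = nn.Linear(q_dim, rank)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, v, q):
        return self.out(torch.tanh(self.pv(v)) * torch.tanh(self.pq(q)))

class GraphConvGenerator(nn.Module):
    """Generator: a single graph-convolution step over image-region features,
    gated by the question; a stand-in for the paper's graph module."""
    def __init__(self, v_dim, q_dim, feat_dim):
        super().__init__()
        self.gc = nn.Linear(v_dim, feat_dim)
        self.pq = nn.Linear(q_dim, feat_dim)

    def forward(self, regions, adj, q):  # regions: (batch, n, v_dim), adj: (batch, n, n)
        h = torch.relu(self.gc(adj @ regions))  # basic GCN propagation: A X W
        return h.mean(dim=1) * torch.tanh(self.pq(q))

class SiameseFusionDiscriminator(nn.Module):
    """Discriminator with a siamese pair of reference fusions (MCB and Tucker);
    how the branches merge into one real/fake logit is an assumption."""
    def __init__(self, v_dim, q_dim, feat_dim):
        super().__init__()
        self.mcb = MCBFusion(v_dim, q_dim, feat_dim)
        self.tucker = TuckerFusion(v_dim, q_dim, rank=256, out_dim=feat_dim)
        self.score = nn.Sequential(
            nn.Linear(3 * feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, rep, v, q):  # rep: generator output, (batch, feat_dim)
        return self.score(torch.cat([self.mcb(v, q), self.tucker(v, q), rep], dim=1))

In a GAN-style training loop, the discriminator would presumably be trained to treat the MCB and Tucker fusion outputs as real and the generator's graph-convolutional representation as fake, pushing the generator's representation toward the standard fusions without answer-level supervision, consistent with the abstract's description of unsupervised learning from standard fusion methods.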


2021 ◽  
Author(s):  
Dezhi Han ◽  
Shuli Zhou ◽  
Kuan Ching Li ◽  
Rodrigo Fernandes de Mello
