Adversarial Learning With Multi-Modal Attention for Visual Question Answering

Author(s):  
Yun Liu ◽  
Xiaoming Zhang ◽  
Feiran Huang ◽  
Lei Cheng ◽  
Zhoujun Li
2020 ◽  
pp. 1-14
Author(s):  
Yun Liu ◽  
Xiaoming Zhang ◽  
Zhiyun Zhao ◽  
Bo Zhang ◽  
Lei Cheng ◽  
...  

2020 ◽  
Author(s):  
Iqbal Chowdhury ◽  
Kien Nguyen Thanh ◽  
Clinton Fookes ◽  
Sridha Sridharan

Solving the Visual Question Answering (VQA) task is a step towards achieving human-like reasoning capability in machines. This paper proposes an approach to learning a multimodal feature representation with adversarial training, which allows the model to learn from standard fusion methods in an unsupervised manner. The discriminator is equipped with a siamese combination of two standard fusion methods, namely multimodal compact bilinear pooling and multimodal Tucker fusion. The multimodal feature representation output by the generator is produced by a graph convolutional operation. The resulting multimodal representation allows the proposed model to infer the correct answers to open-ended natural language questions from the VQA 2.0 dataset. An overall accuracy of 69.86% demonstrates the effectiveness of the proposed model.
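No implementation accompanies the abstract, but the described architecture can be sketched roughly in PyTorch as below. This is a minimal, hypothetical sketch, not the authors' code: all module names, the layer sizes (rank 256, hidden width 512), the single-step graph convolution, and the concatenation used to merge the discriminator's two fusion branches are assumptions.

import torch
import torch.nn as nn

class CountSketch(nn.Module):
    """Random count-sketch projection, the building block of compact bilinear pooling."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.out_dim = out_dim
        self.register_buffer("h", torch.randint(out_dim, (in_dim,)))        # hash bucket per input index
        self.register_buffer("s", 2.0 * torch.randint(2, (in_dim,)) - 1.0)  # random +/-1 signs

    def forward(self, x):  # x: (batch, in_dim)
        y = x.new_zeros(x.size(0), self.out_dim)
        return y.scatter_add_(1, self.h.expand_as(x), x * self.s)

class MCBFusion(nn.Module):
    """Multimodal compact bilinear pooling: the outer product of the two modality
    vectors is approximated by convolving their count sketches, computed as an
    element-wise product in the Fourier domain."""
    def __init__(self, v_dim, q_dim, out_dim):
        super().__init__()
        self.cs_v = CountSketch(v_dim, out_dim)
        self.cs_q = CountSketch(q_dim, out_dim)
        self.out_dim = out_dim

    def forward(self, v, q):
        fv = torch.fft.rfft(self.cs_v(v))
        fq = torch.fft.rfft(self.cs_q(q))
        return torch.fft.irfft(fv * fq, n=self.out_dim)

class TuckerFusion(nn.Module):
    """Low-rank bilinear (MUTAN-style Tucker) fusion of image and question features."""
    def __init__(self, v_dim, q_dim, rank, out_dim):
        super().__init__()
        self.pv = nn.Linear(v_dim, rank)
        self.pq = nn.Linear(q_dim, rank)
        self.out = nn.Linear(rank, out_dim)

    def forward(self, v, q):
        return self.out(torch.tanh(self.pv(v)) * torch.tanh(self.pq(q)))

class GraphConvGenerator(nn.Module):
    """Generator: a single graph-convolution step over image-region features,
    gated by the question; a stand-in for the paper's graph module."""
    def __init__(self, v_dim, q_dim, feat_dim):
        super().__init__()
        self.gc = nn.Linear(v_dim, feat_dim)
        self.pq = nn.Linear(q_dim, feat_dim)

    def forward(self, regions, adj, q):  # regions: (batch, n, v_dim), adj: (batch, n, n)
        h = torch.relu(self.gc(adj @ regions))  # basic GCN propagation: A X W
        return h.mean(dim=1) * torch.tanh(self.pq(q))

class SiameseFusionDiscriminator(nn.Module):
    """Discriminator with a siamese pair of reference fusions (MCB and Tucker);
    how the branches merge into one real/fake logit is an assumption."""
    def __init__(self, v_dim, q_dim, feat_dim):
        super().__init__()
        self.mcb = MCBFusion(v_dim, q_dim, feat_dim)
        self.tucker = TuckerFusion(v_dim, q_dim, rank=256, out_dim=feat_dim)
        self.score = nn.Sequential(
            nn.Linear(3 * feat_dim, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, rep, v, q):  # rep: generator output, (batch, feat_dim)
        return self.score(torch.cat([self.mcb(v, q), self.tucker(v, q), rep], dim=1))

In a GAN-style training loop, the discriminator would presumably be trained to treat the MCB and Tucker fusion outputs as real and the generator's graph-convolutional representation as fake, pushing the generator's representation toward the standard fusions without answer-level supervision, consistent with the abstract's description of unsupervised learning from standard fusion methods.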


2021 ◽  
Author(s):  
Dezhi Han ◽  
Shuli Zhou ◽  
Kuan Ching Li ◽  
Rodrigo Fernandes de Mello
