MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
2019 ◽ Vol 33 ◽ pp. 8102-8109