KVQA: Knowledge-Aware Visual Question Answering

Author(s):  
Sanket Shah ◽  
Anand Mishra ◽  
Naganand Yadati ◽  
Partha Pratim Talukdar

Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image which can be answered purely based on its content. For example, given an image with people in it, a typical VQA question may inquire about the number of people in the image. More recently, there is growing interest in answering questions which require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap in this paper, and introduce KVQA – the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KG) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KG. Further, we also provide baseline performances using state-of-the-art methods on KVQA.
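To make the multi-hop reasoning concrete, here is a minimal sketch (not from the paper) of answering a question by walking relations in a KG once the named entity in the image has been recognized; the tiny KG, the relation names, and the example question are all invented for illustration.

```python
# Minimal sketch of multi-hop reasoning over a knowledge graph (KG).
# The tiny KG and the question are illustrative, not from KVQA.

KG = {
    ("Barack Obama", "spouse"): "Michelle Obama",
    ("Michelle Obama", "place_of_birth"): "Chicago",
}

def multi_hop(start_entity, relation_path):
    """Follow a chain of relations from a recognized entity."""
    entity = start_entity
    for relation in relation_path:
        entity = KG.get((entity, relation))
        if entity is None:
            return None  # path breaks: the KG lacks this fact
    return entity

# "Where was the spouse of the person in the image born?"
# Face recognition (not shown) would ground the image to "Barack Obama".
print(multi_hop("Barack Obama", ["spouse", "place_of_birth"]))  # Chicago
```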

Author(s):  
Fei Liu ◽  
Jing Liu ◽  
Zhiwei Fang ◽  
Richang Hong ◽  
Hanqing Lu

Learning effective interactions between multi-modal features is at the heart of visual question answering (VQA). A common defect of existing VQA approaches is that they consider only a very limited amount of interactions, which may not be enough to model the latent complex image-question relations necessary for accurately answering questions. Therefore, in this paper, we propose a novel DCAF (Densely Connected Attention Flow) framework for modeling dense interactions. It densely connects all pairwise layers of the network via Attention Connectors, capturing fine-grained interplay between image and question across all hierarchical levels. The proposed Attention Connector efficiently connects the multi-modal features at any two layers with symmetric co-attention, and produces interaction-aware attention features. Experimental results on three publicly available datasets show that the proposed method achieves state-of-the-art performance.
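As a rough illustration of symmetric co-attention between the two modalities, the sketch below uses plain dot-product affinities; the actual Attention Connector may parameterize and normalize the interaction differently.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(img, que):
    """Symmetric co-attention: each modality attends to the other
    through a shared affinity matrix. Shapes: img (R, d), que (T, d)."""
    affinity = img @ que.T                  # (R, T) region-word affinities
    img_ctx = softmax(affinity, 1) @ que    # image attends to question
    que_ctx = softmax(affinity.T, 1) @ img  # question attends to image
    return img_ctx, que_ctx

img = np.random.randn(36, 512)   # 36 region features
que = np.random.randn(14, 512)   # 14 word features
i_ctx, q_ctx = co_attention(img, que)
print(i_ctx.shape, q_ctx.shape)  # (36, 512) (14, 512)
```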


Author(s):  
Yunshi Lan ◽  
Shuohang Wang ◽  
Jing Jiang

Knowledge base question answering (KBQA) is an important task in natural language processing. Existing methods for KBQA usually start with entity linking, which considers mostly named entities found in a question as the starting points in the KB to search for answers to the question. However, relying only on entity linking to look for answer candidates may not be sufficient. In this paper, we propose to perform topic unit linking where topic units cover a wider range of units of a KB. We use a generation-and-scoring approach to gradually refine the set of topic units. Furthermore, we use reinforcement learning to jointly learn the parameters for topic unit linking and answer candidate ranking in an end-to-end manner. Experiments on three commonly used benchmark datasets show that our method consistently works well and outperforms the previous state of the art on two datasets.
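The generation-and-scoring refinement can be pictured as an expand-and-prune loop; the sketch below uses toy stand-ins for the learned generator and scorer (which the paper trains jointly with answer ranking via reinforcement learning).

```python
def refine_topic_units(question, generate, score, rounds=2, keep=5):
    """Generation-and-scoring: alternately expand and prune the
    candidate topic-unit set. `generate` and `score` are stand-ins
    for learned components."""
    units = generate(question, seed=None)
    for _ in range(rounds):
        expanded = set(units)
        for u in units:
            expanded.update(generate(question, seed=u))
        ranked = sorted(expanded, key=lambda u: score(question, u), reverse=True)
        units = ranked[:keep]
    return units

# Toy stand-ins: word overlap as "score", fixed neighbors as "generate".
neighbors = {"obama": ["barack obama", "michelle obama"], None: ["obama"]}
gen = lambda q, seed: neighbors.get(seed, [])
sco = lambda q, u: sum(w in q.lower() for w in u.split())
print(refine_topic_units("Who is Barack Obama's wife?", gen, sco))
```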


2021 ◽  
Vol 47 (05) ◽  
Author(s):  
NGUYỄN CHÍ HIẾU

Knowledge Graphs have been applied in many fields in recent years, such as search engines, semantic analysis, and question answering. However, there are many obstacles to building knowledge graphs, including methodologies, data, and tools. This paper introduces a novel methodology for building a knowledge graph from heterogeneous documents. We use Natural Language Processing and deep learning methodologies to build this graph. The knowledge graph can be used in question answering systems and information retrieval, especially in the computing domain.
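As an illustration of the kind of pipeline described, the sketch below extracts (subject, relation, object) triples from text with naive patterns; it is a stand-in for the NLP and deep-learning extractors the paper actually uses, and the patterns and sentences are invented.

```python
import re

def extract_triples(sentence):
    """Naive pattern-based triple extraction, a stand-in for the
    learned extractors the paper employs."""
    patterns = [
        (r"(.+?) is a (.+)", "is_a"),
        (r"(.+?) was developed by (.+)", "developed_by"),
    ]
    for pattern, relation in patterns:
        m = re.match(pattern, sentence.rstrip("."))
        if m:
            return (m.group(1).strip(), relation, m.group(2).strip())
    return None

docs = ["Python is a programming language.",
        "Python was developed by Guido van Rossum."]
graph = [t for t in (extract_triples(s) for s in docs) if t]
print(graph)
```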


2020 ◽  
Vol 34 (07) ◽  
pp. 13041-13049 ◽  
Author(s):  
Luowei Zhou ◽  
Hamid Palangi ◽  
Lei Zhang ◽  
Houdong Hu ◽  
Jason Corso ◽  
...  

This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that (1) it can be fine-tuned for either vision-language generation (e.g., image captioning) or understanding (e.g., visual question answering) tasks, and (2) it uses a shared multi-layer transformer network for both encoding and decoding, which differs from many existing methods where the encoder and decoder are implemented using separate models. The unified VLP model is pre-trained on a large number of image-text pairs using the unsupervised learning objectives of two tasks: bidirectional and sequence-to-sequence (seq2seq) masked vision-language prediction. The two tasks differ solely in what context the prediction conditions on. This is controlled by utilizing specific self-attention masks for the shared transformer network. To the best of our knowledge, VLP is the first reported model that achieves state-of-the-art results on both vision-language generation and understanding tasks, as disparate as image captioning and visual question answering, across three challenging benchmark datasets: COCO Captions, Flickr30k Captions, and VQA 2.0. The code and the pre-trained models are available at https://github.com/LuoweiZhou/VLP.
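The mask-controlled objectives can be sketched concretely: with image tokens placed before text tokens, the bidirectional objective allows full attention, while the seq2seq objective restricts each text token to the image tokens and earlier text tokens. The ordering and shapes below are simplifying assumptions, not the paper's exact configuration.

```python
import numpy as np

def vlp_masks(n_img, n_txt):
    """Self-attention masks (1 = may attend). Image tokens come first.
    Bidirectional: everything attends to everything.
    Seq2seq: image tokens attend among themselves; each text token
    attends to all image tokens and only to earlier text tokens."""
    n = n_img + n_txt
    bidirectional = np.ones((n, n), dtype=int)
    seq2seq = np.zeros((n, n), dtype=int)
    seq2seq[:n_img, :n_img] = 1                       # image -> image
    seq2seq[n_img:, :n_img] = 1                       # text  -> image
    causal = np.tril(np.ones((n_txt, n_txt), dtype=int))
    seq2seq[n_img:, n_img:] = causal                  # text  -> earlier text
    return bidirectional, seq2seq

bi, s2s = vlp_masks(n_img=3, n_txt=4)
print(s2s)
```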


Author(s):  
Xiangpeng Li ◽  
Jingkuan Song ◽  
Lianli Gao ◽  
Xianglong Liu ◽  
Wenbing Huang ◽  
...  

Most recent progress on visual question answering is based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often time-consuming and have difficulty modeling long-range dependencies due to the sequential nature of RNNs. We propose a new architecture, Positional Self-Attention with Co-attention (PSAC), which does not require RNNs for video question answering. Specifically, inspired by the success of self-attention in the machine translation task, we propose a Positional Self-Attention that calculates the response at each position by attending to all positions within the same sequence, and then adds representations of absolute positions. Therefore, PSAC can exploit the global dependencies of the question and the temporal information in the video, and allows question and video encoding to be executed in parallel. Furthermore, in addition to attending to the video features relevant to the given questions (i.e., video attention), we utilize a co-attention mechanism that simultaneously models “what words to listen to” (question attention). To the best of our knowledge, this is the first work to replace RNNs with self-attention for the task of visual question answering. Experimental results on four tasks of the benchmark dataset show that our model significantly outperforms the state of the art on three tasks and attains comparable results on the Count task. Our model requires less computation time and achieves better performance than the RNN-based methods. An additional ablation study demonstrates the effect of each component of our proposed model.
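A condensed, single-head sketch of positional self-attention: every position attends to all positions within its own sequence (so encoding is fully parallel, with no recurrence), and absolute-position representations are then added. The dimensions and the single-head form are simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def positional_self_attention(x, pos_emb):
    """Single-head self-attention within one sequence, followed by
    adding absolute-position representations (a simplification of PSAC).
    x: (T, d) frame or word features; pos_emb: (T, d)."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)    # every position attends to all
    out = softmax(scores, 1) @ x     # no recurrence: fully parallel
    return out + pos_emb             # inject absolute position

T, d = 20, 64
video = np.random.randn(T, d)
pos = np.random.randn(T, d)
print(positional_self_attention(video, pos).shape)  # (20, 64)
```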


2020 ◽  
Vol 34 (05) ◽  
pp. 7578-7585
Author(s):  
Ting-Rui Chiang ◽  
Hao-Tong Ye ◽  
Yun-Nung Chen

Alongside extensive work on context-free question answering systems, there is an emerging trend of conversational question answering models in the natural language processing field. Thanks to recently collected datasets, including QuAC and CoQA, there has been more work on conversational question answering, and recent models have achieved competitive performance on both datasets. However, to the best of our knowledge, two important questions for conversational comprehension research have not been well studied: 1) How well can the benchmark datasets reflect models' content understanding? 2) Do the models make good use of the conversation content when answering questions? To investigate these questions, we design different training settings, testing settings, and an attack to verify the models' capability of content understanding on QuAC and CoQA. The experimental results indicate some potential hazards in the benchmark datasets, QuAC and CoQA, for conversational comprehension research. Our analysis also sheds light on both what models may learn and how the datasets may bias the models. With this deep investigation of the task, we believe this work can benefit future progress in conversational comprehension. The source code is available at https://github.com/MiuLab/CQA-Study.
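One generic way to probe whether a model actually uses the conversation content (a sketch in the spirit of the paper's tests, not its exact attack) is to replace the gold history with irrelevant turns and check how much the predictions change:

```python
import random

def corrupt_history(example, distractor_pool, rng=random.Random(0)):
    """Replace the gold conversation history with random distractor
    turns of the same length. A model that truly uses the history
    should degrade; one that ignores it should not. This is a generic
    probe, not the paper's exact attack."""
    k = len(example["history"])
    corrupted = dict(example)
    corrupted["history"] = rng.sample(distractor_pool, k)
    return corrupted

example = {"history": ["Q1: Who wrote it?", "A1: Twain"],
           "question": "When was it published?"}
pool = ["Q: Is it raining?", "A: No.", "Q: Favorite color?", "A: Blue."]
print(corrupt_history(example, pool)["history"])
```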


2013 ◽  
Vol 39 (4) ◽  
pp. 847-884 ◽  
Author(s):  
Emili Sapena ◽  
Lluís Padró ◽  
Jordi Turmo

This work is focused on research in machine learning for coreference resolution. Coreference resolution is a natural language processing task that consists of determining the expressions in a discourse that refer to the same entity. The main contributions of this article are (i) a new approach to coreference resolution based on constraint satisfaction, using a hypergraph to represent the problem and solving it by relaxation labeling; and (ii) research towards improving coreference resolution performance using world knowledge extracted from Wikipedia. The developed approach is able to use an entity-mention classification model with more expressiveness than pair-based ones, and to overcome the weaknesses of previous state-of-the-art approaches, such as linking contradictions, classification without context, and lack of information when evaluating pairs. Furthermore, the approach allows the incorporation of new information by adding constraints, and research has been done on using world knowledge to improve performance. RelaxCor, the implementation of the approach, achieved results at the state-of-the-art level and participated in the SemEval-2010 and CoNLL-2011 international competitions, achieving second place in CoNLL-2011.
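Relaxation labeling can be illustrated on a toy instance: each mention keeps a probability distribution over candidate entity labels, iteratively updated by constraint-weighted support from the other mentions. The constraints, weights, and update rule below are invented for illustration and do not reproduce RelaxCor's exact formulation.

```python
import numpy as np

def relaxation_labeling(p, support, iters=20):
    """p: (n_mentions, n_labels) label probabilities.
    support[i, j] > 0 encourages mentions i and j to share a label,
    < 0 discourages it (constraint weights; here toy values)."""
    for _ in range(iters):
        s = support @ p                  # support each label receives
        p = p * (1.0 + np.tanh(s))       # multiplicative update
        p = p / p.sum(1, keepdims=True)  # renormalize per mention
    return p

# Three mentions, two candidate entities; mentions 0 and 1 are pushed
# together, mention 2 is pushed away from both.
support = np.array([[ 0.0,  1.0, -1.0],
                    [ 1.0,  0.0, -1.0],
                    [-1.0, -1.0,  0.0]])
p0 = np.array([[0.6, 0.4], [0.5, 0.5], [0.5, 0.5]])
print(relaxation_labeling(p0, support).round(2))
```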


Author(s):  
Hao Zhou ◽  
Tom Young ◽  
Minlie Huang ◽  
Haizhou Zhao ◽  
Jingfang Xu ◽  
...  

Commonsense knowledge is vital to many natural language processing tasks. In this paper, we present a novel open-domain conversation generation model to demonstrate how large-scale commonsense knowledge can facilitate language understanding and generation. Given a user post, the model retrieves relevant knowledge graphs from a knowledge base and then encodes the graphs with a static graph attention mechanism, which augments the semantic information of the post and thus supports better understanding of the post. Then, during word generation, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph to facilitate better generation through a dynamic graph attention mechanism. This is the first attempt to use large-scale commonsense knowledge in conversation generation. Furthermore, unlike existing models that use knowledge triples (entities) separately and independently, our model treats each knowledge graph as a whole, which encodes more structured, connected semantic information in the graphs. Experiments show that the proposed model can generate more appropriate and informative responses than state-of-the-art baselines.
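The static graph attention can be sketched as follows: the triples of one retrieved graph are attended jointly, conditioned on the post, to produce a single graph vector, so the graph is encoded as a whole rather than as independent triples. The vectors and shapes below are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def static_graph_attention(post_vec, triple_vecs):
    """Encode one retrieved graph as a single vector: attention
    weights over its triples, conditioned on the post (a simplified
    form of static graph attention)."""
    scores = triple_vecs @ post_vec  # (n_triples,)
    weights = softmax(scores)
    return weights @ triple_vecs     # weighted graph vector

d = 32
post_vec = np.random.randn(d)
# Each row stands for an encoded (head, relation, tail) triple.
triple_vecs = np.random.randn(5, d)
print(static_graph_attention(post_vec, triple_vecs).shape)  # (32,)
```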


Semantic Web ◽  
2021 ◽  
pp. 1-17
Author(s):  
Lucia Siciliani ◽  
Pierpaolo Basile ◽  
Pasquale Lops ◽  
Giovanni Semeraro

Question Answering (QA) over Knowledge Graphs (KG) aims to develop systems capable of answering users’ questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata, and so on. Question Answering systems need to translate the user’s question, written in natural language, into a query formulated through a specific data query language that is compliant with the underlying KG. This translation process is already non-trivial when trying to answer simple questions that involve a single triple pattern. It becomes even more troublesome when trying to cope with questions that require modifiers in the final query, i.e., aggregate functions, query forms, and so on. Attention to this last aspect is growing, but it has never been thoroughly addressed in the existing literature. Starting from the latest advances in this field, we take a further step in this direction. This work aims to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems.
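To make the role of modifiers concrete: a question such as “How many films did Steven Spielberg direct?” cannot be answered by a single triple pattern, since the target query needs an aggregate function. Below is a hypothetical Wikidata-style SPARQL query (held in a Python string for uniformity); P31, P57, Q11424, and Q8877 are real Wikidata identifiers, but the question-query pairing is an illustration, not taken from the dataset.

```python
# Hypothetical target query for "How many films did Steven Spielberg
# direct?"; the COUNT aggregate is the modifier that simple
# triple-pattern translation misses.
sparql = """
SELECT (COUNT(?film) AS ?count) WHERE {
  ?film wdt:P31 wd:Q11424 .   # instance of: film
  ?film wdt:P57 wd:Q8877 .    # director: Steven Spielberg
}
"""
print(sparql)
```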


2019 ◽  
Vol 5 (5) ◽  
pp. 212-215
Author(s):  
Abeer AlArfaj

Semantic relation extraction is an important component of ontologies that can support many applications, e.g., text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and is one of the main challenges in the Natural Language Processing (NLP) field. The Arabic language has complex morphological, grammatical, and semantic aspects, since it is a highly inflectional and derivational language, which makes the task even more challenging. In this paper, we present a review of the state of the art for relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering both taxonomic and non-taxonomic relation extraction methods. The majority of relation extraction approaches implement a combination of statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of work on relation extraction from Arabic texts, which needs further progress.

