Large-Scale Relation Learning for Question Answering over Knowledge Bases with Pre-trained Language Models

Author(s):  
Yuanmeng Yan ◽  
Rumei Li ◽  
Sirui Wang ◽  
Hongzhi Zhang ◽  
Zan Daoguang ◽  
...  


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8439
Author(s):  
Shukan Liu ◽  
Ruilin Xu ◽  
Li Duan ◽  
Mingjie Li ◽  
Yiming Liu

Commonly used large-scale knowledge bases face challenges in open-domain question answering caused by the loose knowledge association and weak structural logic of triplet-based knowledge. To address this, this work proposes a novel metaknowledge-enhanced approach for open-domain question answering. We design an automatic approach to extract metaknowledge and build a metaknowledge network from Wiki documents. To represent the directional weighted graph with hierarchical and semantic features, we present an original graph encoder, GE4MK, to model the metaknowledge network. Then, a metaknowledge-enhanced graph reasoning model, MEGr-Net, is proposed for question answering, which aggregates both relational and neighboring interactions, compared with R-GCN and GAT. Experiments demonstrate the improvement of metaknowledge over mainstream triplet-based knowledge. We also find that the choice of graph reasoning model and pre-trained language model influences the performance of metaknowledge-enhanced question answering approaches.
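The abstract does not give implementation details for MEGr-Net; as a rough, hypothetical sketch, the following PyTorch snippet shows one way a graph layer could combine R-GCN-style relation-specific transforms with GAT-style attention over neighbours. All class and variable names here are invented for illustration.

```python
# Hypothetical sketch: a graph layer mixing R-GCN-style relation-specific
# transforms with GAT-style attention over neighbours. Names (RelationalAttnLayer,
# edges, rel) are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RelationalAttnLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        # One weight matrix per relation type, as in R-GCN.
        self.rel_weights = nn.Parameter(torch.randn(num_relations, in_dim, out_dim) * 0.1)
        self.self_weight = nn.Linear(in_dim, out_dim)
        # GAT-style attention scorer over (target, transformed neighbour) pairs.
        self.attn = nn.Linear(2 * out_dim, 1)

    def forward(self, h, edges):
        """h: [num_nodes, in_dim]; edges: list of (src, dst, rel) triples."""
        self_h = self.self_weight(h)                        # self-loop contribution
        num_nodes = h.size(0)
        inbox = [[] for _ in range(num_nodes)]
        for src, dst, rel in edges:
            msg = h[src] @ self.rel_weights[rel]            # relation-specific transform
            score = self.attn(torch.cat([self_h[dst], msg]))  # attention logit
            inbox[dst].append((score, msg))
        rows = []
        for dst in range(num_nodes):
            if inbox[dst]:
                scores = torch.stack([s for s, _ in inbox[dst]]).squeeze(-1)
                alpha = F.softmax(scores, dim=0)            # normalise over neighbours
                agg = sum(a * m for (_, m), a in zip(inbox[dst], alpha))
                rows.append(self_h[dst] + agg)
            else:
                rows.append(self_h[dst])
        return F.relu(torch.stack(rows))


if __name__ == "__main__":
    layer = RelationalAttnLayer(in_dim=8, out_dim=8, num_relations=3)
    h = torch.randn(4, 8)                                   # 4 nodes
    edges = [(0, 1, 0), (2, 1, 1), (3, 1, 2)]               # (src, dst, relation)
    print(layer(h, edges).shape)                            # torch.Size([4, 8])
```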


Author(s):  
Shiquan Yang ◽  
Rui Zhang ◽  
Sarah M. Erfani ◽  
Jey Han Lau

Knowledge bases (KBs) are usually essential for building practical dialogue systems. Recently, we have seen rapidly growing interest in integrating knowledge bases into dialogue systems. However, existing approaches mostly deal with knowledge bases of a single modality, typically textual information. As today's knowledge bases become abundant with multimodal information such as images, audio, and video, this limitation greatly hinders the development of dialogue systems. In this paper, we focus on task-oriented dialogue systems and address this limitation by proposing a novel model that integrates external multimodal KB reasoning with pre-trained language models. We further enhance the model via a novel multi-granularity fusion mechanism to capture multi-grained semantics in the dialogue history. To validate the effectiveness of the proposed model, we collect MMDialKB, a new large-scale (14K) dialogue dataset built upon a multimodal KB. Both automatic and human evaluation results on MMDialKB demonstrate the superiority of our proposed framework over strong baselines.
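The paper's fusion mechanism is not specified in the abstract; purely as an illustration of the general idea, the sketch below gates between token-level (fine-grained) and utterance-level (coarse-grained) representations of the dialogue history. Names and shapes are assumptions.

```python
# Illustrative sketch only: one plausible "multi-granularity fusion" of token-level
# and utterance-level dialogue representations via a learned gate. All names
# (GranularityFusion, token_repr, utt_repr) are assumptions, not from the paper.
import torch
import torch.nn as nn


class GranularityFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, token_repr, utt_repr):
        """token_repr: [seq_len, dim] fine-grained; utt_repr: [dim] coarse-grained."""
        utt = utt_repr.unsqueeze(0).expand_as(token_repr)    # broadcast utterance vector
        g = torch.sigmoid(self.gate(torch.cat([token_repr, utt], dim=-1)))
        return g * token_repr + (1 - g) * utt                # gated mixture per token


if __name__ == "__main__":
    fusion = GranularityFusion(dim=16)
    tokens = torch.randn(10, 16)            # e.g. encoder token states for one utterance
    utterance = torch.randn(16)             # pooled utterance representation
    print(fusion(tokens, utterance).shape)  # torch.Size([10, 16])
```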


2007 ◽  
Vol 13 (4) ◽  
pp. 317-351
Author(s):  
HANS-ULRICH KRIEGER

We present a simple and intuitive unsound corpus-driven approximation method for turning unification-based grammars, such as HPSG, CLE, or PATR-II, into context-free grammars (CFGs). Our research is motivated by the idea that we can exploit (large-scale), hand-written unification grammars not only for the purpose of describing natural language and obtaining a syntactic structure (and perhaps a semantic form), but also to address several other very practical topics. Firstly, to speed up deep parsing by having a cheap recognition pre-filter (the approximated CFG). Secondly, to obtain an indirect stochastic parsing model for the unification grammar through a trained PCFG, obtained from the approximated CFG. This gives us an efficient disambiguation model for the unification-based grammar. Thirdly, to generate domain-specific subgrammars for application areas such as information extraction or question answering. And finally, to compile context-free language models which assist the acoustic model of a speech recognizer. The approximation method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is a corpus-driven method in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically come close to the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
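As a toy illustration of the corpus-driven idea (the actual system is implemented in Java), the following Python sketch reads context-free rules off a small treebank by projecting each node's feature structure onto an atomic symbol; widening the projection shifts more information into the symbols and tightens the approximation, and the rule counts can seed the PCFG mentioned above.

```python
# Minimal illustration of the corpus-driven approximation (not Krieger's actual code):
# read context-free rules off a treebank of parses produced by the unification
# grammar, mapping each node's feature structure to an atomic symbol.
from collections import Counter

def node_symbol(node, detail=("cat",)):
    """Project a feature structure onto an atomic CF symbol.
    Adding more features to `detail` shifts information into the symbols and
    makes the approximation tighter (closer to the original parse trees)."""
    feats = node["feats"]
    return "|".join(f"{k}={feats[k]}" for k in detail if k in feats)

def extract_rules(tree, detail, rules):
    """Recursively collect one CF rule per local tree (mother -> daughters)."""
    if tree.get("children"):
        lhs = node_symbol(tree, detail)
        rhs = tuple(node_symbol(c, detail) for c in tree["children"])
        rules[(lhs, rhs)] += 1
        for child in tree["children"]:
            extract_rules(child, detail, rules)

if __name__ == "__main__":
    # Toy parsed sentence; a real treebank would come from parsing a corpus
    # with the unification grammar (HPSG, CLE, PATR-II, ...).
    parse = {"feats": {"cat": "S"}, "children": [
        {"feats": {"cat": "NP", "num": "sg"}, "children": []},
        {"feats": {"cat": "VP", "num": "sg"}, "children": []},
    ]}
    rules = Counter()
    extract_rules(parse, detail=("cat",), rules=rules)
    for (lhs, rhs), count in rules.items():
        print(f"{lhs} -> {' '.join(rhs)}   # seen {count}x")
```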


2021 ◽  
Vol 14 (13) ◽  
pp. 3420-3420
Author(s):  
Matei Zaharia

Building production ML applications is difficult because of their resource cost and complex failure modes. I will discuss these challenges from two perspectives: the Stanford DAWN Lab and experience with large-scale commercial ML users at Databricks. I will then present two emerging ideas to help address these challenges. The first is "ML platforms", an emerging class of software systems that standardize the interfaces used in ML applications to make them easier to build and maintain. I will give a few examples, including the open-source MLflow system from Databricks [3]. The second idea is models that are more "production-friendly" by design. As a concrete example, I will discuss retrieval-based NLP models such as Stanford's ColBERT [1, 2], which query documents from an updateable corpus to perform tasks such as question answering; this gives multiple practical advantages, including low computational cost, high interpretability, and very fast updates to the model's "knowledge". These models are an exciting alternative to large language models such as GPT-3.
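For readers unfamiliar with the MLflow system mentioned above, here is a minimal experiment-tracking example using its Python API; the run name, parameters, and metric values are made up for illustration.

```python
# A minimal example of the kind of standardized interface the talk refers to:
# experiment tracking with the open-source MLflow library.
import mlflow

with mlflow.start_run(run_name="toy-experiment"):
    # Record the configuration of this training run...
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_param("batch_size", 32)
    # ...and the resulting quality metrics, so runs are comparable later.
    for epoch, accuracy in enumerate([0.71, 0.78, 0.81]):
        mlflow.log_metric("val_accuracy", accuracy, step=epoch)
```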


Author(s):  
Yuncheng Hua ◽  
Yuan-Fang Li ◽  
Gholamreza Haffari ◽  
Guilin Qi ◽  
Wei Wu

A compelling approach to complex question answering is to convert the question to a sequence of actions, which can then be executed on the knowledge base to yield the answer, aka the programmer-interpreter approach. Using training questions similar to the test question, meta-learning enables the programmer to quickly adapt to unseen questions and tackle potential distributional biases. However, this comes at the cost of manually labeling similar questions to learn a retrieval model, which is tedious and expensive. In this paper, we present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision, i.e., the system's performance with respect to the produced answers. To the best of our knowledge, this is the first attempt to train the retrieval model jointly with the programmer. Our system achieves state-of-the-art performance on a large-scale task for complex question answering over knowledge bases. We have released our code at https://github.com/DevinJake/MARL.
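The abstract only outlines the training scheme; the schematic sketch below (with dummy components, not the released MARL code) illustrates the alternating loop in which an answer-level reward weakly supervises both the retrieval model and the programmer.

```python
# Schematic sketch of the alternating training loop the abstract describes:
# a retrieval model and a programmer are updated in turn, using only the
# correctness of produced answers (weak supervision) as the reward signal.
# All components below are dummies; the real system is at
# https://github.com/DevinJake/MARL.
import random

def retrieve(retriever, question, pool):
    """Score support questions and return the current top-ranked one."""
    scored = sorted(pool, key=lambda q: retriever["scores"].get((question, q), 0.0),
                    reverse=True)
    return scored[0]

def run_programmer(programmer, question, support):
    """Stand-in for generating and executing a program on the KB."""
    return random.random() < programmer["skill"]          # True if answer was correct

def train(questions, pool, epochs=3):
    retriever = {"scores": {}}
    programmer = {"skill": 0.3}
    for _ in range(epochs):
        for q in questions:
            support = retrieve(retriever, q, pool)
            reward = 1.0 if run_programmer(programmer, q, support) else 0.0
            # Alternate updates: the answer-level reward supervises both modules.
            key = (q, support)
            retriever["scores"][key] = retriever["scores"].get(key, 0.0) + reward
            programmer["skill"] = min(1.0, programmer["skill"] + 0.05 * reward)
    return retriever, programmer

if __name__ == "__main__":
    train(["who directed Inception?"], ["who directed Titanic?", "capital of France?"])
```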


2020 ◽  
Vol 34 (05) ◽  
pp. 9346-9353
Author(s):  
Bingcong Xue ◽  
Sen Hu ◽  
Lei Zou ◽  
Jiashu Cheng

Paraphrase, i.e., differing textual realizations of the same meaning, has proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is key to comprehending the RDF triples in KBs. Existing works have published paraphrase datasets automatically extracted from large corpora, but these contain too many redundant pairs or do not cover enough predicates, shortcomings that cannot be remedied by machines alone and require human involvement. This paper presents a full process for collecting large-scale, high-quality paraphrase dictionaries for predicates in knowledge bases, which takes advantage of existing datasets and combines machine mining with crowdsourcing. Our dataset comprises 2284 distinct predicates in DBpedia and 31130 paraphrase pairs in total, a great leap in quality over previous works. We then demonstrate that such paraphrase dictionaries are of great help to natural language processing tasks such as question answering and language generation. We also publish our own dictionary for further research.
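To make the intended use of such a dictionary concrete, here is a toy example of mapping a question phrase onto a DBpedia predicate via paraphrase lookup; the entries are invented and not taken from the released dataset.

```python
# Toy illustration of how a predicate-paraphrase dictionary of the kind described
# here could be used to map question phrases onto KB predicates. The entries below
# are invented examples, not the published data.
PARAPHRASES = {
    "dbo:birthPlace": ["born in", "birthplace of", "place of birth"],
    "dbo:spouse":     ["is married to", "wife of", "husband of"],
}

# Invert the dictionary: surface phrase -> predicate, for question interpretation.
PHRASE_TO_PREDICATE = {p: pred for pred, phrases in PARAPHRASES.items() for p in phrases}

def link_predicate(question):
    """Return the first KB predicate whose paraphrase appears in the question."""
    for phrase, predicate in PHRASE_TO_PREDICATE.items():
        if phrase in question.lower():
            return predicate
    return None

if __name__ == "__main__":
    print(link_predicate("Which city was Marie Curie born in?"))  # dbo:birthPlace
```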


Author(s):  
Xiaoyan Wang ◽  
Pavan Kapanipathi ◽  
Ryan Musa ◽  
Mo Yu ◽  
Kartik Talamadupula ◽  
...  

Natural Language Inference (NLI) is fundamental to many Natural Language Processing (NLP) applications, including semantic search and question answering. The NLI problem has gained significant attention due to the release of large-scale, challenging datasets. Present approaches to the problem largely focus on learning-based methods that use only textual information to classify whether a given premise entails, contradicts, or is neutral with respect to a given hypothesis. Surprisingly, the use of methods based on structured knowledge, a central topic in artificial intelligence, has not received much attention vis-à-vis the NLI problem. While there are many open knowledge bases that contain various types of reasoning information, their use for NLI has not been well explored. To address this, we present a combination of techniques that harness external knowledge to improve performance on the NLI problem in the science questions domain. We present the results of applying our techniques on text, graph, and text-and-graph based models, and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.
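As a minimal illustration of injecting structured knowledge into NLI (not the paper's actual pipeline), the sketch below looks up knowledge-base relations that link premise and hypothesis terms and exposes them as extra features.

```python
# Rough sketch of the general idea of using structured knowledge for NLI:
# look up KB relations connecting premise and hypothesis terms and expose them
# as extra features to the classifier. The tiny KB and feature scheme below are
# illustrative assumptions only.
TOY_KB = {
    ("dog", "animal"): "IsA",
    ("hot", "cold"): "Antonym",
}

def kb_features(premise_tokens, hypothesis_tokens):
    """Collect relations linking any premise token to any hypothesis token."""
    feats = []
    for p in premise_tokens:
        for h in hypothesis_tokens:
            rel = TOY_KB.get((p, h)) or TOY_KB.get((h, p))
            if rel:
                feats.append((p, rel, h))
    return feats

if __name__ == "__main__":
    print(kb_features(["a", "dog", "barks"], ["an", "animal", "makes", "noise"]))
    # [('dog', 'IsA', 'animal')] -> e.g. evidence toward entailment
```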


Author(s):  
Mohamed Minhaj

Wikipedia is among the most prominent and comprehensive sources of information available on the WWW. However, its unstructured form impedes direct interpretation by machines. Knowledge Base (KB) creation is a line of research that enables interpretation of Wikipedia's concealed knowledge by machines. In light of the efficacy of KBs for the storage and efficient retrieval of the semantic information required to power IT applications such as question-answering systems, many large-scale knowledge bases have been developed. These KBs have employed different approaches to data curation and storage, facilitate different retrieval mechanisms, and differ in their depth and breadth of knowledge. This paper endeavours to explicate the process of KB creation using Wikipedia and to compare the prominent KBs developed from the big data of Wikipedia.
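As a simplified illustration of one step in DBpedia-style KB creation, the snippet below turns a toy Wikipedia infobox into subject-attribute-value triples; real pipelines handle templates, links, and typing far more carefully.

```python
# Toy illustration of the kind of step the paper describes: turning the
# semi-structured part of a Wikipedia article (an infobox) into KB triples.
# The regex and the sample wikitext are simplified assumptions.
import re

INFOBOX = """{{Infobox scientist
| name        = Alan Turing
| birth_place = Maida Vale, London
| field       = Computer science
}}"""

def infobox_to_triples(subject, wikitext):
    """Extract (subject, attribute, value) triples from infobox key-value lines."""
    triples = []
    for key, value in re.findall(r"\|\s*(\w+)\s*=\s*(.+)", wikitext):
        triples.append((subject, key.strip(), value.strip()))
    return triples

if __name__ == "__main__":
    for triple in infobox_to_triples("Alan_Turing", INFOBOX):
        print(triple)
    # ('Alan_Turing', 'name', 'Alan Turing') ...
```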

