Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

Author(s):  
Xiaoyan Wang
Pavan Kapanipathi
Ryan Musa
Mo Yu
Kartik Talamadupula
...  

Natural Language Inference (NLI) is fundamental to many Natural Language Processing (NLP) applications, including semantic search and question answering. The NLI problem has gained significant attention due to the release of large-scale, challenging datasets. Present approaches to the problem largely focus on learning-based methods that use only textual information in order to classify whether a given premise entails, contradicts, or is neutral with respect to a given hypothesis. Surprisingly, the use of methods based on structured knowledge – a central topic in artificial intelligence – has not received much attention vis-à-vis the NLI problem. While there are many open knowledge bases that contain various types of reasoning information, their use for NLI has not been well explored. To address this, we present a combination of techniques that harness external knowledge to improve performance on the NLI problem in the science questions domain. We present the results of applying our techniques to text, graph, and text-and-graph based models, and discuss the implications of using external knowledge to solve the NLI problem. Our model achieves close to state-of-the-art performance for NLI on the SciTail science questions dataset.
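For a concrete (and deliberately simplified) picture of how external knowledge can enter an NLI pipeline, the sketch below counts knowledge-base relations connecting premise and hypothesis tokens and exposes them as extra features; the tiny `KB` set and the feature scheme are illustrative assumptions, not the paper's text-, graph-, or text-and-graph-based models.

```python
# Minimal sketch: deriving external-knowledge features for an NLI input.
# The toy KB and feature scheme are illustrative only.
from itertools import product

# Hypothetical knowledge base of (term, relation, term) triples.
KB = {
    ("puppy", "IsA", "dog"),
    ("dog", "IsA", "animal"),
    ("cat", "Antonym", "dog"),
}

def kb_features(premise_tokens, hypothesis_tokens):
    """Count, per relation type, how many premise-hypothesis token pairs
    are connected in the external knowledge base."""
    counts = {}
    for p_tok, h_tok in product(premise_tokens, hypothesis_tokens):
        for a, rel, b in KB:
            if {p_tok, h_tok} == {a, b}:
                counts[rel] = counts.get(rel, 0) + 1
    return counts

premise = "a puppy is sleeping on the porch".split()
hypothesis = "a dog is resting outside".split()
print(kb_features(premise, hypothesis))  # {'IsA': 1}
```

Features of this kind could then be concatenated with the representations produced by a text or graph encoder before the entail/contradict/neutral classification layer.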

2020
Vol 34 (05)
pp. 9346-9353
Author(s):  
Bingcong Xue
Sen Hu
Lei Zou
Jiashu Cheng

Paraphrase, i.e., differing textual realizations of the same meaning, has proven useful for many natural language processing (NLP) applications. Collecting paraphrases for predicates in knowledge bases (KBs) is key to comprehending the RDF triples they contain. Existing work has published paraphrase datasets automatically extracted from large corpora, but these contain many redundant pairs or do not cover enough predicates, shortcomings that cannot be fixed by automatic methods alone and require human input. This paper presents a complete process for collecting large-scale, high-quality paraphrase dictionaries for predicates in knowledge bases, which builds on existing datasets and combines machine mining with crowdsourcing. Our dataset comprises 2284 distinct DBpedia predicates and 31130 paraphrase pairs in total, with quality that is a substantial improvement over previous work. We then demonstrate that such paraphrase dictionaries are of considerable help to natural language processing tasks such as question answering and language generation. We also publish our own dictionary for further research.
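As a toy illustration of how such a predicate paraphrase dictionary might be consulted at question time (the entries below are invented for the example and are not taken from the published dataset):

```python
# Hypothetical predicate-paraphrase dictionary in the spirit of the
# described resource (the real dataset covers 2284 DBpedia predicates
# and 31130 paraphrase pairs).
PARAPHRASES = {
    "dbo:spouse":     ["is married to", "husband of", "wife of"],
    "dbo:birthPlace": ["was born in", "birthplace of", "place of birth"],
}

def match_predicates(question: str):
    """Return the DBpedia predicates whose paraphrases occur in the question."""
    q = question.lower()
    return [pred for pred, phrases in PARAPHRASES.items()
            if any(phrase in q for phrase in phrases)]

print(match_predicates("Who is married to Marie Curie?"))  # ['dbo:spouse']
```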


2020
Vol 12 (3)
pp. 45
Author(s):  
Wenqing Wu
Zhenfang Zhu
Qiang Lu
Dianyuan Zhang
Qiangqiang Guo

Knowledge base question answering (KBQA) aims to analyze the semantics of natural language questions and return accurate answers from the knowledge base (KB). More and more studies apply knowledge bases to question answering systems, and when a KB is used to answer a natural language question, some words imply tense (e.g., "original" and "previous") and play a limiting role in the question. However, most existing methods for KBQA cannot model a question with implicit temporal constraints. In this work, we propose a model based on a bidirectional attentive memory network, which obtains the temporal information in the question through attention mechanisms and external knowledge. Specifically, we encode the external knowledge as vectors and use additive attention between the question and the external knowledge to obtain the temporal information, then further enhance the question vector to increase accuracy. On the WebQuestions benchmark, our method not only performs better on the overall data, but also performs particularly well on the subset of questions with implicit temporal constraints. Because we use attention mechanisms, our method also offers better interpretability.
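The sketch below illustrates additive attention between an encoded question and a set of encoded external-knowledge items, in the spirit of the description above; the dimensions, random weights, and the simple additive enhancement step are assumptions for illustration, not the paper's bidirectional attentive memory network.

```python
# Additive (Bahdanau-style) attention between a question vector and
# external-knowledge vectors; all sizes and weights are placeholders.
import numpy as np

rng = np.random.default_rng(0)
d = 8                        # hidden size (hypothetical)
q = rng.normal(size=d)       # encoded question
K = rng.normal(size=(5, d))  # five encoded external-knowledge items

W_q = rng.normal(size=(d, d))
W_k = rng.normal(size=(d, d))
v = rng.normal(size=d)

# Score each knowledge item: e_i = v . tanh(W_q q + W_k k_i)
scores = np.tanh(q @ W_q + K @ W_k) @ v
weights = np.exp(scores - scores.max())
weights /= weights.sum()          # softmax over knowledge items

knowledge_summary = weights @ K   # attention-weighted knowledge vector
enhanced_question = q + knowledge_summary  # enrich the question representation
print(weights.round(3), enhanced_question.shape)
```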


2013
Vol 21 (1)
pp. 113-138
Author(s):  
MUHUA ZHU
JINGBO ZHU
HUIZHEN WANG

Shift-reduce parsing has been studied extensively for diverse grammars due to its simplicity and running efficiency. However, in the field of constituency parsing, shift-reduce parsers lag behind state-of-the-art parsers. In this paper we propose a semi-supervised approach for advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers, which provide better part-of-speech tags to subsequent shift-reduce parsers. Second, we enhance shift-reduce parsing models with novel features defined on lexical dependency information. Both stages depend on the use of large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data respectively. Moreover, the final parsing accuracies reach 90.9 percent and 82.2 percent respectively, which are comparable with the accuracy of state-of-the-art parsers.
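To make the shift-reduce mechanism concrete, here is a toy trace with hard-coded rules for a single sentence; a real constituency parser instead scores SHIFT/REDUCE actions with a learned model (augmented, in this work, with uptrained part-of-speech tags and lexical dependency features).

```python
# Toy shift-reduce loop: SHIFT moves the next word from the buffer onto the
# stack; REDUCE replaces the top of the stack with a constituent label.
# The grammar below is hard-coded for illustration only.
buffer = ["the", "dog", "barks"]
stack = []

RULES = {("the", "dog"): "NP", ("NP", "barks"): "S"}

def try_reduce():
    if len(stack) >= 2 and (stack[-2], stack[-1]) in RULES:
        label = RULES[(stack[-2], stack[-1])]
        stack[-2:] = [label]
        print("REDUCE ->", label, "| stack:", stack)
        return True
    return False

while True:
    if try_reduce():
        continue
    if buffer:
        stack.append(buffer.pop(0))
        print("SHIFT", stack[-1], "| stack:", stack)
    else:
        break

print("parse result:", stack)  # ['S'] when the sentence is fully reduced
```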


2020
pp. 259-269
Author(s):  
H.I. Hoherchak

The article describes ways of applying knowledge bases to the analysis of natural language texts and to solving some of their processing tasks. The basic problems of natural language processing that underlie semantic analysis are considered: tokenization, part-of-speech tagging, dependency parsing, and coreference resolution. The basic concepts of knowledge base theory are presented, and an approach to populating knowledge bases based on the Universal Dependencies framework and coreference resolution is proposed. Examples of practical applications of knowledge bases populated from natural language texts are given, including consistency checking of the constructed syntactic and semantic models and question answering.
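A minimal sketch of the preprocessing steps listed above (tokenization, part-of-speech tagging, dependency parsing), using spaCy as a stand-in toolkit; the model name is an assumption, and coreference resolution would require an additional component.

```python
# Tokenization, POS tagging, and dependency parsing with spaCy.
# "en_core_web_sm" is an assumed model name; install it separately
# (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Ada Lovelace wrote the first published algorithm.")

# Emit (token, POS tag, dependency relation, head) tuples -- essentially the
# Universal Dependencies information a knowledge base could be populated from.
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)
```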


2018
Vol 18 (1)
pp. 93-94
Author(s):  
Kiril Simov
Petya Osenova

With the availability of large language data online, cross-linked lexical resources (such as BabelNet, Predicate Matrix, and UBY), and semantically annotated corpora (SemCor, OntoNotes, etc.), more and more applications in Natural Language Processing (NLP) have started to exploit various semantic models. These semantic models have been created on the basis of LSA, clustering, word embeddings, deep learning, neural networks, etc., and abstract logical forms such as Minimal Recursion Semantics (MRS) or Abstract Meaning Representation (AMR). Additionally, the Linguistic Linked Open Data Cloud (LLOD Cloud) has been initiated, which interlinks linguistic data to improve NLP tasks; this cloud has been expanding enormously over the last four to five years. It includes corpora, lexicons, thesauri, and knowledge bases of various kinds, organized around appropriate ontologies such as LEMON. The semantic models behind the data organization, as well as the representation of the semantic resources themselves, are a challenge to the NLP community. NLP applications that rely extensively on the models discussed above include Machine Translation, Information Extraction, Question Answering, Text Simplification, etc.


Information
2021
Vol 12 (11)
pp. 452
Author(s):  
Ammar Arbaaeen
Asadullah Shah

Within the space of question answering (QA) systems, the most critical module for improving overall performance is question analysis. Extracting the lexical semantics of a Natural Language (NL) question presents challenges at the syntactic and semantic levels for most QA systems, due to the difference between the words posed by a user and the terms stored in the knowledge bases. Many studies have achieved encouraging results in lexical semantic resolution on the topic of word sense disambiguation (WSD), and several other works consider these challenges in the context of QA applications. Additionally, few scholars have examined the role of WSD in returning potential answers corresponding to particular questions. However, natural language processing (NLP) still faces several challenges in determining the precise meaning of various ambiguities. The motivation of this work is therefore to propose a novel knowledge-based sense disambiguation (KSD) method for resolving the lexical ambiguity of questions posed to QA systems. The major contribution is the proposed method, which incorporates multiple knowledge sources (the question's metadata such as date and GPS location, context knowledge, and a domain ontology) into a shallow NLP pipeline. The proposed KSD method is developed into a tool for a mobile QA application that aims to determine the intended meaning of questions expressed by pilgrims. The experimental results reveal that our method achieves accuracy comparable to or better than the baselines in the pilgrimage domain.
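The KSD method itself is not reproduced here; as a generic illustration of knowledge-based sense disambiguation, the sketch below applies NLTK's Lesk algorithm, which selects the WordNet sense whose gloss overlaps most with the question context.

```python
# Simplified knowledge-based WSD via the Lesk algorithm (NLTK + WordNet);
# this is a generic illustration, not the paper's KSD method, which also
# uses question metadata, context knowledge, and a domain ontology.
# Requires: nltk.download('punkt'); nltk.download('wordnet')
from nltk.tokenize import word_tokenize
from nltk.wsd import lesk

question = "Where can pilgrims exchange currency at a bank near the mosque?"
sense = lesk(word_tokenize(question), "bank", pos="n")
print(sense, "-", sense.definition() if sense else "no sense found")
```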


Author(s):  
Pierre-Alexandre Murena
Marie Al-Ghossein
Jean-Louis Dessalles
Antoine Cornuéjols

Analogies are 4-ary relations of the form "A is to B as C is to D". When A, B, and C are fixed, we call the problem of finding the correct D an analogical equation. A direct application domain is Natural Language Processing, where this formulation has been shown to be successful for word inflections such as conjugation or declension. While most approaches rely on the axioms of proportional analogy to solve these equations, those axioms are known to have limitations, in particular regarding the nature of the inflections considered. In this paper, we propose an alternative approach based on the assumption that optimal word inflections are transformations of minimal complexity. We propose a rough estimation of complexity for word analogies and an algorithm to find the optimal transformations. We illustrate our method on a large-scale benchmark dataset and compare it with state-of-the-art approaches to demonstrate the value of using complexity to solve analogies on words.
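For contrast with the complexity-based method proposed here, the following baseline solves an analogical equation over character strings by transferring the suffix change from A to B onto C; it is a simple proportional-analogy heuristic and fails on inflections that do not reduce to suffix substitution.

```python
# Baseline solver for "A : B :: C : ?" on character strings: copy the
# suffix change observed from A to B onto C. Illustrative only; not the
# minimal-complexity method described in the paper.
import os

def solve_analogy(a: str, b: str, c: str):
    prefix = os.path.commonprefix([a, b])
    suffix_a, suffix_b = a[len(prefix):], b[len(prefix):]
    if suffix_a and not c.endswith(suffix_a):
        return None  # the A -> B transformation does not apply to C
    stem = c[:len(c) - len(suffix_a)] if suffix_a else c
    return stem + suffix_b

print(solve_analogy("walk", "walked", "talk"))  # talked
print(solve_analogy("bake", "baking", "ride"))  # riding
```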


2021
Vol 26 (jai2021.26(2))
pp. 88-95
Author(s):  
Hlybovets A
Tsaruk A

Within the framework of this paper, question-answering software systems and their basic architectures are analyzed. With the development of machine learning technologies, the creation of natural language processing (NLP) engines, and the rising popularity of virtual personal assistants that use speech synthesis (text-to-speech), there is a growing need for question-answering systems that can provide personalized answers to users' questions. All modern cloud providers offer frameworks for building question-answering systems, but personalized dialogue remains a problem. Personalization is important: it places additional demands on a question-answering system's ability to take such information into account while processing users' questions. Traditionally, a question-answering system (QAS) is developed as an application that contains a knowledge base and a user interface, which provides a user with answers to questions and a means of interaction with an expert. In this article we analyze modern approaches to architecture development and build a system from building blocks that already exist on the market. The main criteria for the NLP modules were: support for the Ukrainian language, natural language understanding, automatic identification of entities (attributes), the ability to construct a dialogue flow, quality and completeness of documentation, API capabilities and integration with external systems, and the possibility of integrating external knowledge bases. Based on this analysis, the article proposes a detailed architecture for a question-answering subsystem with elements of self-learning in the Ukrainian language, and provides a detailed description of the system's main semantic components (architecture components).


2019
Vol 9 (1)
pp. 88-106
Author(s):  
Irphan Ali
Divakar Yadav
Ashok Kumar Sharma

A question answering system aims to provide correct and quick answers to users' queries from a knowledge base. Due to the growth of digital information on the web, information retrieval systems are in high demand. Most recent question answering systems consult knowledge bases to answer a question, after parsing and transforming natural language queries into knowledge-base-executable forms. In this article, the authors propose a semantic-web-based approach to question answering that uses natural language processing to analyze and understand the user query. It employs a "Total Answer Relevance Score" to measure the relevance of each answer returned by the system. The results obtained are quite promising. The real-time performance of the system has been evaluated on the answers extracted from the knowledge base.


2008
Vol 02 (03)
pp. 343-364
Author(s):  
BRIAN HARRINGTON
STEPHEN CLARK

Extracting semantic information from multiple natural language sources and combining that information into a single unified resource is an important and fundamental goal for natural language processing. Large scale resources of this kind can be useful for a wide variety of tasks including question answering, word sense disambiguation and knowledge discovery. A single resource representing the information in multiple documents can provide significantly more semantic information than is available from the documents considered independently. The ASKNet system utilises existing NLP tools and resources, together with spreading activation based techniques, to automatically extract semantic information from a large number of English texts, and combines that information into a large scale semantic network. The initial emphasis of the ASKNet system is on wide-coverage, robustness and speed of construction. In this paper we show how a network consisting of over 1.5 million nodes and 3.5 million edges, more than twice as large as any network currently available, can be created in less than 3 days. Evaluation of large-scale semantic networks is a difficult problem. In order to evaluate ASKNet we have developed a novel evaluation metric based on the notion of a network "core" and employed human evaluators to determine the precision of various components of that core. We have applied this evaluation to networks created from randomly chosen articles used by DUC (Document Understanding Conference). The results are highly promising: almost 80% precision in the semantic core of the networks.
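A minimal sketch of spreading activation over a toy semantic network; the graph, edge weights, decay factor, and number of iterations are invented for the example, and ASKNet's actual network construction and update scheme are considerably more elaborate.

```python
# Spreading activation on a toy weighted semantic network: activation flows
# from active nodes to their neighbours, attenuated by a decay factor.
GRAPH = {
    "bank":  {"river": 0.3, "money": 0.7},
    "money": {"loan": 0.8},
    "river": {"water": 0.9},
    "loan":  {},
    "water": {},
}
DECAY = 0.5

def spread(activation, steps=2):
    for _ in range(steps):
        updated = dict(activation)
        for node, act in activation.items():
            for neighbour, weight in GRAPH.get(node, {}).items():
                updated[neighbour] = updated.get(neighbour, 0.0) + DECAY * weight * act
        activation = updated
    return activation

# Activate "bank" in a monetary context and watch related nodes light up.
print({k: round(v, 3) for k, v in spread({"bank": 1.0, "money": 0.5}).items()})
```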

