Design and realization of a modular architecture for textual entailment

2013 ◽  
Vol 21 (2) ◽  
pp. 167-200 ◽  
Author(s):  
SEBASTIAN PADÓ ◽  
TAE-GIL NOH ◽  
ASHER STERN ◽  
RUI WANG ◽  
ROBERTO ZANOLI

Abstract
A key challenge at the core of many Natural Language Processing (NLP) tasks is the ability to determine which conclusions can be inferred from a given natural language text. This problem, called the Recognition of Textual Entailment (RTE), has initiated the development of a range of algorithms, methods, and technologies. Unfortunately, research on Textual Entailment (TE), like semantics research more generally, is fragmented into studies focussing on various aspects of semantics such as world knowledge, lexical and syntactic relations, or more specialized kinds of inference. This fragmentation has problematic practical consequences. Notably, interoperability among the existing RTE systems is poor, and reuse of resources and algorithms is mostly infeasible. This also makes systematic evaluations very difficult to carry out. Finally, textual entailment presents a wide array of approaches to potential end users with little guidance on which to pick. Our contribution to this situation is the novel EXCITEMENT architecture, which was developed to enable and encourage the consolidation of methods and resources in the textual entailment area. It decomposes RTE into components with strongly typed interfaces. We specify (a) a modular linguistic analysis pipeline and (b) a decomposition of the ‘core’ RTE methods into top-level algorithms and subcomponents. We identify four major subcomponent types, including knowledge bases and alignment methods. The architecture was developed with a focus on generality, supporting all major approaches to RTE and encouraging language independence. We illustrate the feasibility of the architecture by constructing mappings of major existing systems onto the architecture. The practical implementation of this architecture forms the EXCITEMENT open platform.
It is a suite of textual entailment algorithms and components which contains the three systems named above, including linguistic-analysis pipelines for three languages (English, German, and Italian), and comprises a number of linguistic resources. By addressing the problems outlined above, the platform provides a comprehensive and flexible basis for research and experimentation in textual entailment and is available as open source software under the GNU General Public License.
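The decomposition described above (an analysis pipeline feeding a top-level entailment algorithm built from typed subcomponents such as knowledge bases) can be sketched in miniature. The platform itself is a Java suite; the Python class names, the toy coverage rule, and the synonym pair below are illustrative assumptions, not the platform's actual API.

```python
# Illustrative sketch (not the platform's actual Java API) of the
# decomposition described above: a linguistic-analysis pipeline feeding
# a top-level entailment algorithm built from typed subcomponents.
from abc import ABC, abstractmethod

class AnalysisPipeline(ABC):
    """Linguistic preprocessing: raw text -> annotated representation."""
    @abstractmethod
    def annotate(self, text: str) -> list: ...

class KnowledgeComponent(ABC):
    """Subcomponent type: answers whether one term entails another."""
    @abstractmethod
    def entails(self, lhs: str, rhs: str) -> bool: ...

class EntailmentAlgorithm(ABC):
    """Top-level algorithm: decides entailment for a Text/Hypothesis pair."""
    @abstractmethod
    def decide(self, text: str, hypothesis: str) -> bool: ...

class WhitespaceTokenizer(AnalysisPipeline):
    def annotate(self, text):
        return text.lower().split()

class SynonymKB(KnowledgeComponent):
    def __init__(self, pairs):
        self.pairs = set(pairs)
    def entails(self, lhs, rhs):
        return lhs == rhs or (lhs, rhs) in self.pairs

class CoverageEDA(EntailmentAlgorithm):
    """Toy decision rule: every hypothesis token must be covered by
    some text token according to the knowledge component."""
    def __init__(self, pipeline, kb):
        self.pipeline, self.kb = pipeline, kb
    def decide(self, text, hypothesis):
        t = self.pipeline.annotate(text)
        h = self.pipeline.annotate(hypothesis)
        return all(any(self.kb.entails(tw, hw) for tw in t) for hw in h)

eda = CoverageEDA(WhitespaceTokenizer(), SynonymKB({("purchased", "bought")}))
print(eda.decide("John purchased a car", "John bought a car"))  # True
```

The point of the typed interfaces is that any pipeline, knowledge base, or algorithm implementing them can be swapped without touching the rest of the system.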

2009 ◽  
Vol 34 ◽  
pp. 443-498 ◽  
Author(s):  
E. Gabrilovich ◽  
S. Markovitch

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
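The core of ESA can be illustrated in a few lines: each Wikipedia article is a concept, a text is mapped to a tf-idf-weighted vector over those concepts, and relatedness is the cosine between concept vectors. The three-article "Wikipedia" below is a toy stand-in; a real ESA index is built from hundreds of thousands of articles.

```python
import math
from collections import Counter

# Toy "Wikipedia": each concept is an article; a real ESA index is
# built from the full encyclopedia.
CONCEPTS = {
    "Cat": "cat feline pet whiskers purr",
    "Dog": "dog canine pet bark loyal",
    "Economy": "market inflation trade finance money",
}

def build_esa_index(concepts):
    """Inverted index: word -> {concept: tf-idf weight}."""
    tokenized = {name: text.split() for name, text in concepts.items()}
    df = Counter(w for toks in tokenized.values() for w in set(toks))
    n = len(concepts)
    index = {}
    for name, toks in tokenized.items():
        for w, tf in Counter(toks).items():
            weight = tf * math.log(n / df[w])
            if weight > 0:
                index.setdefault(w, {})[name] = weight
    return index

def esa_vector(text, index):
    """Represent a text as weights over Wikipedia-style concepts."""
    vec = Counter()
    for w in text.split():
        for concept, weight in index.get(w, {}).items():
            vec[concept] += weight
    return vec

def relatedness(a, b, index):
    """Semantic relatedness = cosine of the two concept vectors."""
    u, v = esa_vector(a, index), esa_vector(b, index)
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

index = build_esa_index(CONCEPTS)
print(relatedness("cat whiskers", "feline purr", index))   # ~1.0 on this toy index
print(relatedness("cat whiskers", "market money", index))  # 0.0
```

Because the dimensions are named articles, the representation is directly inspectable, which is what makes the model easy to explain to human users.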


2013 ◽  
Vol 846-847 ◽  
pp. 1376-1379
Author(s):  
Li Fei Geng ◽  
Hong Lian Li

Syntactic analysis is a core technology of natural language processing and the cornerstone for further linguistic analysis. This paper first introduces the basic grammatical systems and surveys current parsing technology. It then analyzes the characteristics of probabilistic context-free grammars in depth and introduces methods for improving probabilistic context-free parsing. Finally, we point out the difficulties of Chinese parsing.
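The standard decoding algorithm for probabilistic context-free grammars is Viterbi CKY, sketched below for a grammar in Chomsky normal form. The toy English grammar and probabilities are invented for brevity; the same machinery applies to grammars induced from Chinese treebanks.

```python
# Minimal Viterbi-CKY parser for a PCFG in Chomsky normal form.
LEXICAL = {  # word -> [(nonterminal, probability)]
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "saw": [("V", 1.0)],
}
BINARY = [  # (lhs, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("NP", "Det", "N", 1.0),
    ("VP", "V", "NP", 1.0),
]

def cky_viterbi(words):
    """chart[i][j][A] = best probability that A derives words[i:j]."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):           # fill diagonal from the lexicon
        for nt, p in LEXICAL.get(w, []):
            chart[i][i + 1][nt] = p
    for span in range(2, n + 1):            # combine adjacent spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, b, c, p in BINARY:
                    if b in chart[i][k] and c in chart[k][j]:
                        prob = p * chart[i][k][b] * chart[k][j][c]
                        if prob > chart[i][j].get(lhs, 0.0):
                            chart[i][j][lhs] = prob
    return chart[0][n].get("S", 0.0)        # best probability of a full parse

print(cky_viterbi("the dog saw the cat".split()))  # 0.25
```

Adding backpointers to the chart recovers the most probable tree itself rather than just its probability.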


Author(s):  
TIAN-SHUN YAO

With the word-based theory of natural language processing, a word-based Chinese language understanding system has been developed. In the light of psychological language analysis and the features of the Chinese language, this theory of natural language processing is presented, with a description of the computer programs based on it. The heart of the system is the definition of a Total Information Dictionary and the World Knowledge Source used in the system. The purpose of this research is to develop a system which can understand not only Chinese sentences but also whole texts.


2015 ◽  
Vol 21 (5) ◽  
pp. 699-724 ◽  
Author(s):  
LILI KOTLERMAN ◽  
IDO DAGAN ◽  
BERNARDO MAGNINI ◽  
LUISA BENTIVOGLI

Abstract
In this work, we present a novel type of graph for natural language processing (NLP), namely textual entailment graphs (TEGs). We describe the complete methodology we developed for the construction of such graphs and provide some baselines for this task by evaluating relevant state-of-the-art technology. We situate our research in the context of text exploration, since it was motivated by joint work with industrial partners in the text analytics area. Accordingly, we present our motivating scenario and the first gold-standard dataset of TEGs. However, while our own motivation and the dataset focus on the text exploration setting, we suggest that TEGs can have other usages and that the automatic creation of such graphs is an interesting task for the community.
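The structure of a TEG can be sketched simply: nodes are statements and a directed edge T → H records that T entails H. The token-subset judge below is a deliberately crude stand-in for a real RTE system, used only to make the example self-contained.

```python
# Sketch of building a textual entailment graph (TEG). The token-subset
# entailment judge is a toy stand-in for an actual RTE engine.
def entails(text, hypothesis):
    return set(hypothesis.split()) < set(text.split())

def build_teg(statements):
    return {(t, h) for t in statements for h in statements
            if t != h and entails(t, h)}

nodes = [
    "the soup was cold and salty",
    "the soup was cold",
    "the soup was salty",
]
edges = build_teg(nodes)
# Entailment is transitive, so a well-formed TEG's edge set should be
# transitively closed; with this toy judge, closure holds by construction.
assert all((a, c) in edges
           for a, b1 in edges for b2, c in edges if b1 == b2)
print(sorted(edges))
```

In a text-exploration setting, such a graph lets users drill down from general statements (graph sinks) to the specific customer comments that entail them.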


2018 ◽  
Vol 24 (3) ◽  
pp. 393-413 ◽  
Author(s):  
STELLA FRANK ◽  
DESMOND ELLIOTT ◽  
LUCIA SPECIA

Abstract
Two studies on multilingual multimodal image description provide empirical evidence towards two questions at the core of the task: (i) whether target language speakers prefer descriptions generated directly in their native language over descriptions translated from a different language; (ii) whether images improve human translation of descriptions. These results provide guidance for future work in multimodal natural language processing, first by showing that, on the whole, translations are not distinguished from native language descriptions, and second by delineating and quantifying the information gained from the image during the human translation task.


2021 ◽  
Vol 7 ◽  
pp. e508
Author(s):  
Sara Renjit ◽  
Sumam Idicula

Natural language inference (NLI) is an essential subtask in many natural language processing applications. It is a directional relationship from a premise to a hypothesis: a pair of texts is entailed if the meaning of one can be inferred from the other. NLI is also known as textual entailment recognition, and it recognizes entailed and contradictory sentences in various NLP systems such as question answering, summarization, and information retrieval. This paper addresses the NLI problem for a low-resource Indian language, Malayalam, the regional language of Kerala, spoken by more than 30 million people. It presents a Malayalam NLI dataset, named the MaNLI dataset, and applies different models to NLI in Malayalam, namely Doc2Vec (paragraph vector), fastText, BERT (Bidirectional Encoder Representations from Transformers), and LASER (Language-Agnostic Sentence Representations). Our work attempts NLI in two ways: as binary classification and as multiclass classification. For both classifications, LASER outperformed the other techniques. For multiclass classification, the LASER-based sentence-embedding technique outperformed the others by a significant margin of 12% accuracy; for binary classification, the LASER-based system showed a 9% accuracy improvement over the other techniques.
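The shape of such an embedding-based NLI pipeline is: embed the premise and hypothesis as vectors, derive features from the pair, and classify. LASER itself is not reproduced here; a bag-of-words embedding stands in for its multilingual sentence vectors, and a cosine threshold stands in for the trained classifier, so this is only a sketch of the pipeline's structure.

```python
# Shape of an embedding-based NLI classifier. A bag-of-words embedding
# stands in for LASER sentence vectors, and a cosine threshold stands
# in for the trained classifier, to keep the example self-contained.
import math
from collections import Counter

def embed(sentence):
    return Counter(sentence.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0) for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def classify(premise, hypothesis, threshold=0.5):
    """Binary NLI decision: 'entailment' vs everything else."""
    sim = cosine(embed(premise), embed(hypothesis))
    return "entailment" if sim >= threshold else "not entailment"

print(classify("a man is playing a guitar on stage",
               "a man is playing a guitar"))       # entailment
print(classify("a man is playing a guitar on stage",
               "two dogs run through a field"))    # not entailment
```

For the multiclass case, the threshold rule would be replaced by a classifier over pair features (for example, the element-wise difference and product of the two embeddings) trained on labeled entailment, contradiction, and neutral pairs.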


Author(s):  
L.A. Zadeh

I feel honored by the dedication of the Special Issue of IJCCC to me. I should like to express my deep appreciation to the distinguished Co-Editors and my good friends, Professors Balas, Dzitac and Teodorescu, and to the distinguished contributors, for honoring me. The subjects which are addressed in the Special Issue are on the frontiers of fuzzy logic.

The Foreword gives me an opportunity to share with the readers of the Journal my recent thoughts regarding a subject which I have been pondering about for many years: fuzzy logic and natural languages. The first step toward linking fuzzy logic and natural languages was my 1973 paper, "Outline of a New Approach to the Analysis of Complex Systems and Decision Processes." Two key concepts were introduced in that paper: first, the concept of a linguistic variable, a variable which takes words as values; and second, the concept of a fuzzy if-then rule, a rule in which the antecedent and consequent involve linguistic variables. Today, close to forty years later, these concepts are widely used in most applications of fuzzy logic.

The second step was my 1978 paper, "PRUF - a Meaning Representation Language for Natural Languages." This paper laid the foundation for a series of papers in the eighties in which a fairly complete theory of fuzzy-logic-based semantics of natural languages was developed. My theory did not attract many followers either within the fuzzy logic community or within the linguistics and philosophy of languages communities. There is a reason. The fuzzy logic community is largely a community of engineers, computer scientists and mathematicians, a community which has always shied away from semantics of natural languages. Symmetrically, the linguistics and philosophy of languages communities have shied away from fuzzy logic.

In the early nineties, a thought that began to crystallize in my mind was that in most applications of fuzzy logic, linguistic concepts play an important, if not very visible, role. It is this thought that motivated the concept of Computing with Words (CW or CWW), introduced in my 1996 paper "Fuzzy Logic = Computing with Words." In essence, Computing with Words is a system of computation in which the objects of computation are words, phrases and propositions drawn from a natural language. The same can be said about Natural Language Processing (NLP). In fact, CW and NLP have little in common and have altogether different agendas.

In large measure, CW is concerned with the solution of computational problems which are stated in a natural language. A simple example. Given: Probably John is tall. What is the probability that John is short? What is the probability that John is very short? What is the probability that John is not very tall? A less simple example. Given: Usually Robert leaves the office at about 5 pm. Typically it takes Robert about an hour to get home from work. What is the probability that Robert is home at 6:15 pm? What should be noted is that CW is the only system of computation which has the capability to deal with problems of this kind. The problem-solving capability of CW rests on two key ideas: first, the employment of so-called restriction-based semantics (RS) for translation of a natural language into a mathematical language in which the concept of a restriction plays a pivotal role; and second, the employment of a calculus of restrictions, a calculus which is centered on the Extension Principle of fuzzy logic.

What is thought-provoking is that neither traditional mathematics nor standard probability theory has the capability to deal with computational problems which are stated in a natural language. Not having this capability, it is traditional to dismiss such problems as ill-posed. In this perspective, perhaps the most remarkable contribution of CW is that it opens the door to empowering mathematics with a fascinating capability: the capability to construct mathematical solutions of computational problems which are stated in a natural language. The basic importance of this capability derives from the fact that much of human knowledge, and especially world knowledge, is described in natural language.

In conclusion, only recently did I begin to realize that the formalism of CW suggests a new and challenging direction in mathematics: the mathematical solution of computational problems which are stated in a natural language. For mathematics, this is an unexplored territory.
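The linguistic variables and hedges mentioned above can be made concrete in a few lines. Modeling "very" as squaring a membership value (concentration) and "not" as the complement is standard in fuzzy logic; the piecewise-linear membership curve for "tall" is an illustrative assumption, not a canonical definition.

```python
# A linguistic variable in miniature: "tall" as a fuzzy set over heights,
# with the hedge "very" (concentration, i.e. squaring) and negation as
# the complement. The membership curve itself is an assumed toy shape.
def tall(height_cm):
    # 0 below 160 cm, rising linearly to 1 at 190 cm
    return min(1.0, max(0.0, (height_cm - 160.0) / 30.0))

def very(mu):
    return mu ** 2

def not_(mu):
    return 1.0 - mu

mu = tall(175)                       # 0.5: somewhat tall
print(mu, very(mu), not_(very(mu)))  # 0.5 0.25 0.75
```

With such memberships in place, a fuzzy if-then rule is simply a mapping whose antecedent and consequent are evaluated through these functions.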


Author(s):  
Sebastião Pais ◽  
Gaël Dias

In this work we present a new unsupervised and language-independent methodology to detect relations of textual generality. To this end, we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of text, the Text (T), entails the meaning of another text, the Hypothesis (H). Several novel approaches and improvements in TE technologies demonstrated in the RTE Challenges signal renewed interest in a more in-depth and better understanding of the core phenomena involved in TE. In line with this direction, in this work we focus on a particular case of entailment, entailment by generality. In text, there are different kinds of entailment, yielded by different types of implicative reasoning (lexical, syntactic, common-sense based), but here we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T→GH whenever the premise T entails the hypothesis H, with H also being more general than T. We propose an unsupervised and language-independent method to recognize TEGs from a pair ⟨T,H⟩ having an entailment relation. For this purpose, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAMs). In this work, we hypothesize the existence of a particular mode of TE, namely TEG; thus, the main contribution of our study is to highlight the importance of this inference mechanism. Consequently, the new annotation data seems to be a valuable resource for the community.
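The paper's Simplified Asymmetric InfoSimba measure is not reproduced here; the sketch below only illustrates the underlying intuition of an asymmetric, corpus-statistics-based criterion: given a pair already known to stand in an entailment relation, the more general statement tends to use less informative (more frequent) terms. The document-frequency table is invented for the example.

```python
# Toy illustration of orienting a TEG edge. Given an entailment pair,
# compare mean inverse document frequency: the statement with the less
# informative vocabulary is taken to be the more general one. The
# document frequencies below are invented toy values.
import math

N_DOCS = 10
DOC_FREQ = {"i": 9, "saw": 7, "a": 9, "dog": 8, "dachshund": 1}

def mean_idf(text):
    words = text.lower().split()
    return sum(math.log(N_DOCS / DOC_FREQ.get(w, 1)) for w in words) / len(words)

def teg_direction(t, h):
    """Assuming t and h already stand in an entailment relation, orient
    the edge from the more specific towards the more general statement."""
    return (t, h) if mean_idf(h) < mean_idf(t) else (h, t)

print(teg_direction("I saw a dachshund", "I saw a dog"))
```

Here "dachshund" is rare (high IDF) and "dog" is common (low IDF), so the edge runs from the specific statement to the general one, matching the T→GH relation defined above.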


Author(s):  
Louis Massey ◽  
Wilson Wong

This chapter explores the problem of topic identification from text. It is first argued that the conventional representation of text as bag-of-words vectors will always have limited success in arriving at the underlying meaning of text until the more fundamental issues of feature independence in vector-space and ambiguity of natural language are addressed. Next, a groundbreaking approach to text representation and topic identification that deviates radically from current techniques used for document classification, text clustering, and concept discovery is proposed. This approach is inspired by human cognition, which allows ‘meaning’ to emerge naturally from the activation and decay of unstructured text information retrieved from the Web. This paradigm shift allows for the exploitation rather than avoidance of dependence between terms to derive meaning without the complexity introduced by conventional natural language processing techniques. Using the unstructured texts in Web pages as a source of knowledge alleviates the laborious handcrafting of formal knowledge bases and ontologies that are required by many existing techniques. Some initial experiments have been conducted, and the results are presented in this chapter to illustrate the power of this new approach.
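The activation-and-decay mechanism described above can be sketched as spreading activation over a term co-occurrence web. The co-occurrence table, decay and spread parameters below are invented stand-ins for statistics that would, in the chapter's approach, be retrieved from Web pages.

```python
# Sketch of the activation-and-decay idea: input terms activate
# co-occurring terms; each cycle, existing activation decays, and
# whatever remains most active is taken as the emergent topic. The
# co-occurrence web and parameters are toy assumptions.
CO_OCCUR = {  # toy co-occurrence web, standing in for retrieved Web text
    "jaguar": ["car", "speed", "cat"],
    "engine": ["car", "speed"],
    "speed": ["car"],
}

def emergent_topic(terms, cycles=3, decay=0.5, spread=0.4):
    activation = {t: 1.0 for t in terms}
    for _ in range(cycles):
        incoming = {}
        for term, level in activation.items():
            for neighbour in CO_OCCUR.get(term, []):
                incoming[neighbour] = incoming.get(neighbour, 0.0) + spread * level
        activation = {t: level * decay for t, level in activation.items()}
        for t, extra in incoming.items():
            activation[t] = activation.get(t, 0.0) + extra
    return max(activation, key=activation.get)

print(emergent_topic(["jaguar", "engine"]))  # "car": reinforced by both terms
```

The term dependence that bag-of-words models discard is exactly what drives the result here: "car" wins because both input terms, ambiguous on their own, reinforce it.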


Author(s):  
A. Egemen Yilmaz ◽  
I. Berk Yilmaz

Requirement analysis is the very first and a crucial step in the software development process. Stating the requirements in a clear manner not only eases the following steps in the process but also reduces the number of potential errors. In this chapter, techniques for the improvement of requirements expressed in natural language are revisited. These techniques check requirement quality attributes via lexical and syntactic analysis methods, sometimes with generic and sometimes with domain- and application-specific knowledge bases.
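A minimal lexical check of the kind described can be sketched as scanning each requirement against a wordlist of known-vague terms. The wordlist below is a small hand-picked stand-in for a generic quality knowledge base; syntactic checks and multi-word phrase matching are omitted for brevity.

```python
# Minimal lexical requirement-quality check: flag terms that a generic
# (here, hand-picked toy) knowledge base marks as vague or untestable.
VAGUE_TERMS = {"fast", "user-friendly", "appropriate", "adequate",
               "easy", "flexible"}

def check_requirement(requirement):
    """Return the sorted list of vague terms found in a requirement."""
    words = requirement.lower().replace(",", " ").split()
    return sorted(VAGUE_TERMS.intersection(words))

print(check_requirement("The system shall respond fast and be user-friendly"))
# ['fast', 'user-friendly']
```

A requirement that trips such checks ("respond fast") is then rewritten with a measurable criterion ("respond within 200 ms"), which is the quality improvement the chapter targets.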

