Benchmarking Applied Semantic Inference: The PASCAL Recognising Textual Entailment Challenges

Besides formal approaches to semantic inference that rely on logical representation of meaning, the notion of Textual Entailment (TE) has been proposed as an applied framework to capture major semantic inference needs across applications in Computational Linguistics. Although several approaches have been tried and evaluation campaigns have shown improvements in TE, renewed interest is rising in the research community in a deeper and better understanding of the core phenomena involved in textual inference. Pursuing this direction, we are convinced that crucial progress will derive from a focus on decomposing the complexity of the TE task into basic phenomena and on their combination. In this paper, we carry out a deep analysis of TE data sets, investigating the relations between two relevant aspects of semantic inferences: the logical dimension, i.e. the capacity of the inference to prove the conclusion from its premises, and the linguistic dimension, i.e. the linguistic devices used to accomplish the goal of the inference. We propose a decomposition approach over TE pairs, where single linguistic phenomena are isolated in what we have called atomic inference pairs, and we show that at this granularity level the actual correlation between the linguistic and the logical dimensions of semantic inferences emerges and can be empirically observed.

Asymmetric Attributional Word Similarities Measures to detect Relations of Textual Generality

10.20944/preprints202008.0210.v1 ◽

2020 ◽

Author(s):

Sebastião Pais ◽

Gaël Dias

Keyword(s):

Language Processing ◽

Independent Method ◽

Inference Mechanism ◽

Semantic Inference ◽

The Core ◽

Annotation Data ◽

Entailment Relation ◽

Different Types ◽

Textual Entailment ◽

Task Systems

In this work, we present a new unsupervised and language-independent methodology to detect relations of textual generality. For this, we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of the text, the Text (T), entails the meaning of another text, the Hypothesis (H). Several novel approaches and improvements in TE technologies demonstrated in RTE Challenges signal renewed interest in a more in-depth and better understanding of the core phenomena involved in TE. In line with this direction, we focus on a particular case of entailment, entailment by generality, to detect relations of textual generality. In text, there are different kinds of entailment, yielded from different types of implicative reasoning (lexical, syntactical, common sense based), but here we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T→GH whenever the premise T entails the hypothesis H, the hypothesis also being more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, from a pair ⟨T,H⟩ having an entailment relation. To this end, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAM). In this work, we hypothesize the existence of a particular mode of TE, namely TEG. Thus, the main contribution of our study is to highlight the importance of this inference mechanism. Consequently, the new annotation data seems to be a valuable resource for the community.

Asymmetric Attributional Word Similarity Measures to Detect the Relations of Textual Generality

Computers ◽

10.3390/computers9040081 ◽

2020 ◽

Vol 9 (4) ◽

pp. 81

Author(s):

Sebastião Pais ◽

Gaël Dias

Keyword(s):

Language Processing ◽

Similarity Measures ◽

Inference Mechanism ◽

Word Similarity ◽

Semantic Inference ◽

Annotation Data ◽

Entailment Relation ◽

Different Types ◽

Textual Entailment ◽

Task Systems

In this work, we present a new unsupervised and language-independent methodology to detect the relations of textual generality. For this, we introduce a particular case of Textual Entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE Recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of the text, the Text (T), entails the meaning of another text, the Hypothesis (H). Several novel approaches and improvements in TE technologies demonstrated in RTE Challenges are signaling renewed interest towards a more in-depth and better understanding of the core phenomena involved in TE. In line with this direction, in this work, we focus on a particular case of entailment, entailment by generality, to detect the relations of textual generality. In text, there are different kinds of entailments, yielded from different types of implicative reasoning (lexical, syntactical, common sense based), but here, we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T→GH whenever the premise T entails the hypothesis H, this also being more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, from a pair ⟨T,H⟩ having an entailment relation. To this end, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAM). In this work, we hypothesize about the existence of a particular mode of TE, namely TEG. Thus, the main contribution of our study is highlighting the importance of this inference mechanism. Consequently, the new annotation data seem to be a valuable resource for the community.
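
The abstract above relies on asymmetric association measures without defining one. As a rough illustration only, the sketch below uses conditional probability estimated from document co-occurrence, which is one simple asymmetric measure. It is not the authors' AISs measure; the toy corpus and the helper names `cooccurrence_counts` and `conditional_probability` are assumptions made for this example.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_counts(documents):
    """Count, per word and per ordered word pair, how many documents contain them."""
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        vocab = set(doc.lower().split())
        word_df.update(vocab)
        # Ordered pairs in both directions, so lookups work for (x, y) and (y, x).
        pair_df.update(permutations(sorted(vocab), 2))
    return word_df, pair_df


def conditional_probability(x, y, word_df, pair_df):
    """Estimate P(x | y): the share of documents containing y that also contain x."""
    if word_df[y] == 0:
        return 0.0
    return pair_df[(x, y)] / word_df[y]


# Hypothetical toy corpus, chosen only to make the asymmetry visible.
docs = [
    "the dog barked",
    "the poodle is a dog",
    "a dog and a cat",
    "the cat slept",
]
word_df, pair_df = cooccurrence_counts(docs)

p_general_given_specific = conditional_probability("dog", "poodle", word_df, pair_df)
p_specific_given_general = conditional_probability("poodle", "dog", word_df, pair_df)
print(p_general_given_specific, p_specific_given_general)
```

In this toy corpus, every document that mentions "poodle" also mentions "dog", but not the reverse, so the two directions of the measure differ. A generality heuristic could read the larger value of P(dog | poodle) as a hint that "dog" is the more general term, though a real system would need far more data and a more robust measure.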

Textual Entailment

The Oxford Handbook of Computational Linguistics 2nd edition ◽

10.1093/oxfordhb/9780199573691.013.024 ◽

2016 ◽

Author(s):

Sebastian Padó ◽

Ido Dagan

Keyword(s):

Binary Relation ◽

Language Processing ◽

Common Sense ◽

Real World ◽

World Language ◽

Semantic Inference ◽

Common Sense Reasoning ◽

Textual Entailment ◽

History Of ◽

Defeasible Inferences

Textual entailment is a binary relation between two natural-language texts (called ‘text’ and ‘hypothesis’), where readers of the ‘text’ would agree the ‘hypothesis’ is most likely true (Peter is snoring → A man sleeps). Its recognition requires an account of linguistic variability (an event may be realized in different ways, e.g. Peter buys the car ↔ The car is purchased by Peter) and of relationships between events (e.g. Peter buys the car → Peter owns the car). Unlike logic-based inference, textual entailment also covers cases of probable but still defeasible entailment (A hurricane hit Peter’s town → Peter’s town was damaged). Since human common-sense reasoning often involves such defeasible inferences, textual entailment is of considerable interest for real-world language processing tasks, as a generic, application-independent framework for semantic inference. This chapter discusses the history of textual entailment, approaches to recognizing it, and its integration in various NLP tasks.

Semantic inference at the lexical-syntactic level for textual entailment recognition

10.3115/1654536.1654563 ◽

2007 ◽

Cited By ~ 9

Author(s):

Roy Bar-Haim ◽

Ido Dagan ◽

Iddo Greental ◽

Idan Szpektor ◽

Moshe Friedman

Keyword(s):

Semantic Inference ◽

Textual Entailment

Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing - RTE '07

10.3115/1654536 ◽

2007 ◽

Cited By ~ 5

Keyword(s):

Textual Entailment

Leveraging Road Characteristics and Contributor Behaviour for Assessing Road Type Quality in OSM

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070436 ◽

2021 ◽

Vol 10 (7) ◽

pp. 436

Author(s):

Amerah Alghanim ◽

Musfira Jilani ◽

Michela Bertolotto ◽

Gavin McArdle

Keyword(s):

Machine Learning ◽

Spatial Data ◽

Classification Accuracy ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Data Set ◽

Semantic Inference ◽

Road Type ◽

The Impact

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. 
Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.
