Asymmetric Attributional Word Similarity Measures to Detect the Relations of Textual Generality

Computers ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 81
Author(s):  
Sebastião Pais ◽  
Gaël Dias

In this work, we present a new unsupervised and language-independent methodology to detect the relations of textual generality. For this, we introduce a particular case of Textual Entailment (TE), namely Textual Entailment by Generality (TEG). TE aims to capture primary semantic inference needs across applications in Natural Language Processing (NLP). Since 2005, in the TE Recognition (RTE) task, systems have been asked to automatically judge whether the meaning of a portion of the text, the Text (T), entails the meaning of another text, the Hypothesis (H). Several novel approaches and improvements in TE technologies demonstrated in RTE Challenges are signaling renewed interest towards a more in-depth and better understanding of the core phenomena involved in TE. In line with this direction, in this work, we focus on a particular case of entailment, entailment by generality, to detect the relations of textual generality. In text, there are different kinds of entailments, yielded from different types of implicative reasoning (lexical, syntactical, common sense based), but here, we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have $T \xrightarrow{G} H$ whenever the premise T entails the hypothesis H, this also being more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, from a pair ⟨T,H⟩ having an entailment relation. To this end, we introduce an Informative Asymmetric Measure (IAM) called Simplified Asymmetric InfoSimba (AISs), which we combine with different Asymmetric Association Measures (AAM). In this work, we hypothesize about the existence of a particular mode of TE, namely TEG. Thus, the main contribution of our study is highlighting the importance of this inference mechanism. Consequently, the new annotation data seem to be a valuable resource for the community.
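To make the notion of an asymmetric association measure concrete, the following is a minimal sketch that uses conditional probability estimated from document co-occurrence counts. The abstract does not say which AAMs the authors combine with AISs, so the choice of conditional probability, the function names, and the toy corpus are all assumptions made for illustration. The point is only that P(y | x) and P(x | y) can differ, and the direction of the difference hints at which word is more general.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, per word and per unordered word pair, the documents they occur in."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc)  # presence per document, not raw frequency
        word_counts.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


def conditional_probability(x, y, word_counts, pair_counts):
    """Estimate P(y | x): the share of documents containing x that also contain y."""
    if word_counts[x] == 0:
        return 0.0
    if x == y:
        return 1.0
    joint = pair_counts[tuple(sorted((x, y)))]
    return joint / word_counts[x]


# Hypothetical toy corpus; each inner list is one tokenized document.
toy_corpus = [
    ["dog", "animal", "bark"],
    ["dog", "animal"],
    ["cat", "animal"],
    ["bird", "animal", "fly"],
    ["dog", "leash"],
]

word_counts, pair_counts = cooccurrence_counts(toy_corpus)
p_animal_given_dog = conditional_probability("dog", "animal", word_counts, pair_counts)
p_dog_given_animal = conditional_probability("animal", "dog", word_counts, pair_counts)
```

In this toy corpus, "dog" predicts "animal" more strongly than "animal" predicts "dog", which is the kind of asymmetry a generality detector can exploit.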


2010 ◽  
Vol 16 (4) ◽  
pp. 359-389 ◽  
Author(s):  
LILI KOTLERMAN ◽  
IDO DAGAN ◽  
IDAN SZPEKTOR ◽  
MAAYAN ZHITOMIRSKY-GEFFET

Distributional word similarity is most commonly perceived as a symmetric relation. Yet, directional relations are abundant in lexical semantics and in many Natural Language Processing (NLP) settings that require lexical inference, making symmetric similarity measures less suitable for their identification. This paper investigates the nature of directional (asymmetric) similarity measures that aim to quantify distributional feature inclusion. We identify desired properties of such measures for lexical inference, specify a particular measure based on Average Precision that addresses these properties, and demonstrate the empirical benefit of directional measures for two different NLP datasets.
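The idea of distributional feature inclusion can be sketched with a simple weighted-inclusion score: the share of one word's feature weight that falls on features also seen with the other word. This is not the Average Precision based measure the abstract refers to; it is a simpler stand-in chosen for illustration, and the function name and the toy feature vectors are assumptions.

```python
def inclusion_score(narrow, broad):
    """Share of `narrow`'s total feature weight carried by features also present in `broad`.

    Both arguments map context features to non-negative weights.
    The score is directional: swapping the arguments generally changes it.
    """
    total = sum(narrow.values())
    if total == 0:
        return 0.0
    shared = sum(weight for feature, weight in narrow.items() if feature in broad)
    return shared / total


# Hypothetical feature vectors (context word -> association weight).
dog = {"bark": 3.0, "eat": 2.0, "run": 1.0}
animal = {"eat": 4.0, "run": 3.0, "bark": 1.0, "fly": 2.0, "swim": 2.0}

dog_in_animal = inclusion_score(dog, animal)   # all of dog's features occur with animal
animal_in_dog = inclusion_score(animal, dog)   # only part of animal's weight is covered
```

Here every feature of "dog" is included in "animal" but not the other way around, so the score is higher in the dog-to-animal direction, which a symmetric measure could not express.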


Author(s):  
Sebastian Padó ◽  
Ido Dagan

Textual entailment is a binary relation between two natural-language texts (called ‘text’ and ‘hypothesis’), where readers of the ‘text’ would agree the ‘hypothesis’ is most likely true (Peter is snoring → A man sleeps). Its recognition requires an account of linguistic variability (an event may be realized in different ways, e.g. Peter buys the car ↔ The car is purchased by Peter) and of relationships between events (e.g. Peter buys the car → Peter owns the car). Unlike logics-based inference, textual entailment also covers cases of probable but still defeasible entailment (A hurricane hit Peter’s town → Peter’s town was damaged). Since human common-sense reasoning often involves such defeasible inferences, textual entailment is of considerable interest for real-world language processing tasks, as a generic, application-independent framework for semantic inference. This chapter discusses the history of textual entailment, approaches to recognizing it, and its integration in various NLP tasks.


2021 ◽  
Vol 10 (7) ◽  
pp. 474
Author(s):  
Bingqing Wang ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Jian Liu

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed, which was used to classify and extract more than 210,000 residents’ festival activities from 1.13 million items of Sina Weibo (Chinese “Twitter”) data collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences of different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressing feelings during the festival is mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, which provides a new research paradigm for studying residents’ festival activities and adds residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.


2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging and open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity measures. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods beginning from traditional NLP techniques such as kernel-based methods to the most recent research work on transformer-based models, categorizing them based on their underlying principles as knowledge-based, corpus-based, deep neural network–based methods, and hybrid methods. Discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems in place for new researchers to experiment and develop innovative ideas to address the issue of semantic similarity.


2017 ◽  
Author(s):  
Dat Duong ◽  
Wasi Uddin Ahmad ◽  
Eleazar Eskin ◽  
Kai-Wei Chang ◽  
Jingyi Jessica Li

The Gene Ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. In this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this paper, we introduce two new solutions for this problem, by focusing instead on the definitions of the GO terms. We apply neural network based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word-ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model’s ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model in identifying orthologs from randomly-matched genes in human, mouse, and fly. In both experiments, a hybrid of the NLP and GO-tree based methods achieves the best classification accuracy. Availability: github.com/datduong/NLPMethods2CompareGOterms
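The first approach, comparing two definitions as unordered sets of words with word-level similarity taken from an embedding model, can be sketched as follows. The abstract does not specify how word similarities are aggregated into a definition-level score, so the best-match averaging used here is an assumption, as are the function names and the two-dimensional toy embeddings; this is not the code from the linked repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_set_similarity(source_words, target_words, embeddings):
    """Average, over source words, of the best cosine match among target words."""
    source = [w for w in set(source_words) if w in embeddings]
    target = [w for w in set(target_words) if w in embeddings]
    if not source or not target:
        return 0.0
    best_matches = [
        max(cosine(embeddings[s], embeddings[t]) for t in target) for s in source
    ]
    return sum(best_matches) / len(source)


def set_similarity(words_a, words_b, embeddings):
    """Symmetric score: mean of the two directed best-match averages."""
    return 0.5 * (
        directed_set_similarity(words_a, words_b, embeddings)
        + directed_set_similarity(words_b, words_a, embeddings)
    )


# Hypothetical 2-D embeddings; a real model would use many more dimensions.
toy_embeddings = {
    "protein": (1.0, 0.0),
    "binding": (0.0, 1.0),
    "enzyme": (0.8, 0.6),
    "transport": (-1.0, 0.0),
}

definition_a = ["protein", "binding"]
definition_b = ["enzyme", "binding"]
definition_c = ["transport"]

sim_ab = set_similarity(definition_a, definition_b, toy_embeddings)
sim_ac = set_similarity(definition_a, definition_c, toy_embeddings)
```

Because the definitions are treated as sets, reordering the words in either definition leaves the score unchanged, which is the property the second, sentence-encoder approach gives up in order to account for word order.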


2018 ◽  
Author(s):  
Maria Montefinese ◽  
Erin Michelle Buchanan ◽  
David Vinson

Models of semantic representation predict that automatic priming is determined by associative and co-occurrence relations (i.e., spreading activation accounts), or by similarity in words' semantic features (i.e., featural models). Although these three factors are correlated in characterizing semantic representation, they seem to tap different aspects of meaning. We designed two lexical decision experiments to dissociate these three different types of meaning similarity. For unmasked primes, we observed priming only due to association strength and not the other two measures, and no evidence for differences in priming for concrete and abstract concepts. For masked primes, there was no priming regardless of the semantic relation. These results challenge theoretical accounts of automatic priming. Rather, they are in line with the idea that priming may be due to participants’ controlled strategic processes. These results provide important insight about the nature of priming and how association strength, as determined from word-association norms, relates to the nature of semantic representation.

