Distributional Similarity in the Varied Order of Syntactic Spaces

Author(s): Dongqiang Yang, D.M.W. Powers

2013, Vol 103, pp. 210-221
Author(s): Feng Zheng, Zhan Song, Ling Shao, Ronald Chung, Kui Jia, ...

2010, Vol 16 (4), pp. 417-437
Author(s): Tim Van de Cruys

Distributional similarity methods have proven to be a valuable tool for the induction of semantic similarity. Until now, most algorithms have used two-way co-occurrence data to compute the meaning of words. Co-occurrence frequencies, however, need not be pairwise: one can easily imagine situations where it is desirable to investigate co-occurrence frequencies of three modes and beyond. This paper investigates tensor factorization methods as a way to build a model of three-way co-occurrences. The approach is applied to the problem of selectional preference induction and automatically evaluated in a pseudo-disambiguation task. The results show that tensor factorization, and non-negative tensor factorization in particular, is a promising tool for Natural Language Processing (NLP).
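To make the three-way setting concrete, the sketch below factorizes a toy verb × subject × object count tensor with non-negative CP decomposition via multiplicative updates. The toy data, the rank, and the update scheme are illustrative assumptions, not the paper's actual model or corpus.

```python
# Minimal sketch: non-negative CP decomposition of a 3-way co-occurrence
# tensor via multiplicative updates. Toy data and rank are assumptions
# for illustration, not the paper's model or corpus.
import numpy as np

def unfold(X, mode):
    """Mode-n matricization of a 3-way tensor (C-order columns)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(P, Q):
    """Column-wise Kronecker product of P (m x R) and Q (n x R) -> (m*n x R)."""
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def ntf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative tensor X into rank-R components A, B, C."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((d, rank)) + 0.1 for d in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [f for m, f in enumerate(factors) if m != mode]
            kr = khatri_rao(others[0], others[1])  # earlier mode varies slowest
            Xn = unfold(X, mode)
            factors[mode] *= (Xn @ kr) / (factors[mode] @ (kr.T @ kr) + eps)
    return factors

# Toy verb x subject x object counts: "eat" takes edible objects,
# "drive" takes vehicles; the factors should recover that split.
X = np.array([[[8., 0.], [5., 0.]],    # eat:   person/dog x apple/car
              [[0., 6.], [0., 1.]]])   # drive: person/dog x apple/car

A, B, C = ntf(X, rank=2)
print(np.round(np.einsum("ir,jr,kr->ijk", A, B, C), 2))  # ~= X
```

The rank-R components play the role of latent selectional classes: each component ties together verbs, subjects, and objects that tend to co-occur.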


2005, Vol 31 (4), pp. 439-475
Author(s): Julie Weeds, David Weir

Techniques that exploit knowledge of distributional similarity between words have been proposed in many areas of Natural Language Processing. For example, in language modeling, the sparse data problem can be alleviated by estimating the probabilities of unseen co-occurrences of events from the probabilities of seen co-occurrences of similar events. In other applications, distributional similarity is taken to be an approximation to semantic similarity. However, because of the wide range of potential applications and the lack of a strict definition of the concept of distributional similarity, many methods of calculating distributional similarity have been proposed or adopted. In this work, a flexible, parameterized framework for calculating distributional similarity is proposed. Within this framework, the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR), for which precision and recall can be measured by analogy with the way they are measured in document retrieval. As will be shown, a number of popular existing measures of distributional similarity can be simulated with particular parameter settings within the CR framework. The CR framework is then used to systematically investigate three fundamental questions concerning distributional similarity. First, is the relationship of lexical similarity necessarily symmetric, or are there advantages to be gained from considering it as an asymmetric relationship? Second, are some co-occurrences inherently more salient than others in the calculation of distributional similarity? Third, is it necessary to consider the difference in the extent to which each word occurs in each co-occurrence type? Two application-based tasks are used for evaluation: automatic thesaurus generation and pseudo-disambiguation. It is possible to achieve significantly better results on both of these tasks by varying the parameters within the CR framework than by using other existing distributional similarity measures; it will also be shown that any single unparameterized measure is unlikely to be able to do better on both tasks. This is due to an inherent asymmetry in lexical substitutability and therefore also in lexical distributional similarity.
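As an illustration of the retrieval analogy, here is a simplified sketch that scores a candidate neighbour by precision and recall over shared co-occurrence features. Raw counts as weights and a plain F-score combination are stand-in assumptions; the framework itself parameterizes both choices.

```python
# Simplified sketch of co-occurrence retrieval (CR): the target word's
# features are the "relevant documents" and the candidate's features are
# the "retrieved documents". Raw counts stand in for the framework's
# parameterized weight functions.

def cr_precision_recall(target, candidate):
    """target, candidate: dicts mapping co-occurrence feature -> weight."""
    shared = target.keys() & candidate.keys()
    precision = sum(candidate[f] for f in shared) / sum(candidate.values())
    recall = sum(target[f] for f in shared) / sum(target.values())
    return precision, recall

def f_score(p, r, beta=1.0):
    """Harmonic-mean combination, as in document retrieval."""
    return (1 + beta**2) * p * r / (beta**2 * p + r) if p and r else 0.0

# Toy grammatical-relation features: (relation, co-occurring word) -> count
dog = {("subj-of", "bark"): 10, ("obj-of", "walk"): 5, ("mod", "big"): 3}
cat = {("subj-of", "meow"): 8, ("obj-of", "walk"): 4, ("mod", "big"): 2}

p, r = cr_precision_recall(dog, cat)
print(f"precision={p:.2f}  recall={r:.2f}  F1={f_score(p, r):.2f}")
```

Note the built-in asymmetry: swapping target and candidate swaps precision and recall, which mirrors the asymmetric view of lexical substitutability that the article argues for.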


Author(s): Julie Weeds, David Weir, Diana McCarthy

2006, Vol 32 (1), pp. 13-47
Author(s): Alexander Budanitsky, Graeme Hirst

The quantification of lexical semantic relatedness has many applications in NLP, and many different measures have been proposed. We evaluate five of these measures, all of which use WordNet as their central resource, by comparing their performance in detecting and correcting real-word spelling errors. An information-content-based measure proposed by Jiang and Conrath is found superior to those proposed by Hirst and St-Onge, Leacock and Chodorow, Lin, and Resnik. In addition, we explain why distributional similarity is not an adequate proxy for lexical semantic relatedness.
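For reference, the Jiang-Conrath measure can be tried directly through NLTK's WordNet interface. This is a separate implementation from the one evaluated in the article, and it assumes the wordnet and wordnet_ic datasets have been installed via nltk.download.

```python
# Sketch: Jiang-Conrath similarity via NLTK. The JCN distance is
# IC(s1) + IC(s2) - 2*IC(LCS); NLTK returns its reciprocal as a
# similarity. Requires nltk.download("wordnet") and
# nltk.download("wordnet_ic").
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content from the Brown corpus

def jcn(word1, word2):
    """Best Jiang-Conrath similarity over the two words' noun senses."""
    best = 0.0
    for s1 in wn.synsets(word1, pos=wn.NOUN):
        for s2 in wn.synsets(word2, pos=wn.NOUN):
            best = max(best, s1.jcn_similarity(s2, brown_ic))
    return best

# A real-word spelling corrector can rank candidate corrections by their
# relatedness to the surrounding context; related pairs should score higher.
print(jcn("doctor", "nurse"), jcn("doctor", "turnip"))
```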


2009, Vol 35 (3), pp. 435-461
Author(s): Maayan Zhitomirsky-Geffet, Ido Dagan

This article presents a novel bootstrapping approach for improving the quality of feature vector weighting in distributional word similarity. The method was motivated by attempts to utilize distributional similarity for identifying the concrete semantic relationship of lexical entailment. Our analysis revealed that a major reason for the rather loose semantic similarity obtained by distributional similarity methods is the insufficient quality of the word feature vectors, caused by deficient feature weighting. This observation led to the definition of a bootstrapping scheme which yields improved feature weights, and hence higher-quality feature vectors. The underlying idea of our approach is that features which are common to similar words are also the most characteristic of their meanings, and thus should be promoted. This idea is realized via a bootstrapping step applied to an initial standard approximation of the similarity space. The superior performance of the bootstrapping method was assessed in two different experiments, one based on direct human gold-standard annotation and the other based on an automatically created disambiguation dataset. These results are further supported by applying a novel quantitative measurement of the quality of feature weighting functions. Improved feature weighting also allows massive feature reduction, which indicates that the most characteristic features of a word are indeed concentrated at the top ranks of its vector. Finally, experiments with three prominent similarity measures and two feature weighting functions showed that the bootstrapping scheme is robust and independent of the original functions over which it is applied.
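The bootstrapping step itself is easy to sketch: compute an initial similarity space from standard feature vectors, then promote each feature of a word by the similarity mass of the nearest neighbours that share it. Cosine over raw counts and the toy vocabulary below are placeholder assumptions; the article's own weighting and similarity functions differ.

```python
# Condensed sketch of the bootstrapping idea: features shared with a
# word's most similar neighbours are promoted. Cosine over raw counts
# is a placeholder for the article's actual weighting/similarity choices.
import math

def cosine(u, v):
    shared = u.keys() & v.keys()
    dot = sum(u[f] * v[f] for f in shared)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def bootstrap_weights(vectors, k=10):
    """Re-weight each word's features by the similarity mass of its
    top-k neighbours that also carry the feature."""
    reweighted = {}
    for w, feats in vectors.items():
        sims = sorted(((cosine(feats, v), u) for u, v in vectors.items()
                       if u != w), reverse=True)[:k]
        reweighted[w] = {f: sum(s for s, u in sims if f in vectors[u])
                         for f in feats}
    return reweighted

vectors = {
    "dog": {"bark": 5, "tail": 3, "walk": 2},
    "cat": {"meow": 6, "tail": 4, "walk": 1},
    "car": {"drive": 7, "engine": 4, "walk": 1},
}
# Features dog shares with its nearest neighbour "cat" ("tail", "walk")
# are promoted; "bark", shared with no neighbour, drops to zero.
print(bootstrap_weights(vectors, k=1)["dog"])
```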

