Unsupervised Induction of Meaningful Semantic Classes through Selectional Preferences

Author(s):  
Henry Anaya-Sánchez ◽  
Anselmo Peñas
2014 ◽  
Vol 40 (3) ◽  
pp. 587-631 ◽  
Author(s):  
Diarmuid Ó Séaghdha ◽  
Anna Korhonen

We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning. Our framework uses Bayesian latent-variable models inspired by, and extending, the well-known Latent Dirichlet Allocation (LDA) model of topical structure in documents; when applied to predicate–argument data, topic models automatically induce semantic classes of arguments and assign each predicate a distribution over those classes. We consider LDA and a number of extensions to the model and evaluate them on a variety of semantic prediction tasks, demonstrating that our approach attains state-of-the-art performance. More generally, we argue that probabilistic methods provide an effective and flexible methodology for distributional semantics.
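As a rough illustration of the idea in this abstract, an LDA-style model scores an argument for a predicate by mixing over latent semantic classes: P(argument | predicate) = Σ_z P(z | predicate) · P(argument | z). The sketch below assumes this standard mixture form; the class names, vocabulary, probabilities, and the `preference` helper are all invented for illustration and are not taken from the paper, where such distributions would be inferred from predicate–argument data rather than written by hand.

```python
# Toy sketch of an LDA-style selectional preference score.
# Every name and number below is hypothetical, chosen only for illustration.

# P(class | predicate): each predicate has a distribution over latent classes.
THETA = {
    "eat":   {"FOOD": 0.90, "VEHICLE": 0.10},
    "drive": {"FOOD": 0.05, "VEHICLE": 0.95},
}

# P(argument | class): each latent class is a distribution over argument words.
PHI = {
    "FOOD":    {"apple": 0.50, "bread": 0.40, "car": 0.05, "truck": 0.05},
    "VEHICLE": {"apple": 0.05, "bread": 0.05, "car": 0.50, "truck": 0.40},
}


def preference(predicate: str, argument: str) -> float:
    """Return P(argument | predicate) by summing over the latent classes."""
    return sum(
        p_class * PHI[cls].get(argument, 0.0)
        for cls, p_class in THETA[predicate].items()
    )


if __name__ == "__main__":
    for pred in THETA:
        for arg in ("apple", "car"):
            print(f"P({arg} | {pred}) = {preference(pred, arg):.3f}")
```

With these toy numbers, "apple" scores higher than "car" as an argument of "eat", and the reverse holds for "drive", which is the kind of graded preference the latent classes are meant to capture.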


Languages ◽  
2021 ◽  
Vol 6 (3) ◽  
pp. 126
Author(s):  
Catherine E. Travis ◽  
Rena Torres Cacoullos

Are semantic classes of verbs genuine or do they merely mask idiosyncrasies of frequent verbs? Here, we examine the interplay between semantic classes and frequent verb-form combinations, providing new evidence from variation patterns in spontaneous speech that linguistic categories are centered on high frequency members to which other members are similar. We offer an account of the well-known favoring effect of cognition verbs on Spanish subject pronoun expression by considering the role of high-frequency verbs (e.g., creer ‘think’ and saber ‘know’) and particular expressions ((yo) creo ‘I think’, (yo) no sé ‘I don’t know’). Analysis of variation in nearly 3000 tokens of unexpressed and pronominal subjects in conversational data replicates well-established predictors, but highlights that the cognition verb effect is really one of 1sg cognition verbs. In addition, particular expressions stand out for their high frequency relative to their component parts (for (yo) creo, proportion of lexical type, and proportion of pronoun). Further analysis of 1sg verbs with frequent expressions as fixed effects reveals shared patterns with other cognition verbs, including an association with non-coreferential contexts. Thus, classes can be identified by variation constraints and contextual distributions that are shared among class members and are measurably different from those of the more general variable structure. Cognition verbs in variable Spanish subject expression form a class anchored in lexically particular constructions.


2021 ◽  
Vol 11 (12) ◽  
pp. 5743
Author(s):  
Pablo Gamallo

This article describes a compositional model based on syntactic dependencies which has been designed to build contextualized word vectors, by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in the current work has been evaluated on a syntactically controlled and multilingual dataset, and compared with Transformer BERT-like models, such as Sentence BERT, the state-of-the-art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish on the basis of that defined for the English language, containing expressions with noun-verb-noun transitive constructions. The results we have obtained show that the linguistic-based compositional approach turns out to be competitive with Transformer models.


2020 ◽  
Vol 65 (1) ◽  
pp. 96-133
Author(s):  
Christina Clasmeier

Summary: This paper investigates the position of Polish color adjectives in their attributive function in the noun phrase. In general, Polish attributive adjectives may precede the noun (AN) or follow it (NA). There is a rich literature on this issue, especially on the motivation for AN or NA order in particular semantic classes of adjectives or types of adjective-noun constructions. However, most of the contributions are theoretical in nature and account for only a part of linguistic reality, failing to capture the entire scope of the data. One of the reasons for this might be that, so far, no systematic empirical analysis of this specific syntactic phenomenon has been conducted. This paper presents the results of a corpus analysis (NKJP) of 203 noun-with-color-adjective constructions and their AN/NA distributions. These constructions were classified based on the color adjective’s function (qualifying, classificatory, or part of an idiom). The results show that, regardless of their function, Polish color adjectives tend to appear in the AN order.


Author(s):  
Niek Van Wettere

Abstract This paper examines the productivity of the subject complement slot in a set of French and Dutch (semi-)copular micro-constructions. The presumed counterpart of productivity, conventionalization in the form of high token frequency, will also be taken into account in the analysis of the productivity complex. On the one hand, it will be shown that prototypical copulas generally have a higher productivity than semi-copulas, although there are some semi-copulas that can rival the productivity of prototypical copulas. On the other hand, it will be demonstrated that high token frequency is in general detrimental to productivity, on the level of the entire subject complement slot and on the level of the different semantic classes. However, the shape of the frequency distribution also seems to play a role: multiple highly frequent types are in my data more detrimental to productivity than one extremely frequent type, although the semantic connectedness of the types in the distribution might also be an explanatory factor.


Author(s):  
Rubén Izquierdo ◽  
Sonia Vázquez ◽  
Andrés Montoyo

2017 ◽  
Vol 16 (2) ◽  
pp. 313-333
Author(s):  
Lydia Catedral

Abstract This study investigates the relationship between Russian language use and language planning in the context of newly independent, post-Soviet Uzbekistan (1991–1992). It is guided by the question: In what ways does the use of Russian loanwords in Uzbek-language newspapers accomplish language planning in newly independent Uzbekistan? The main finding from this analysis is that post-independence use of Russian loanwords from particular semantic classes in particular contexts reinforces overtly stated ideologies about Russian and constructs difference between Soviet Uzbekistan and independent Uzbekistan. These findings demonstrate the need to reexamine the role of the Russian language in post-Soviet contexts, and they contribute a unique approach to analyzing links between lexical items and ideology in language planning.


Corpora ◽  
2006 ◽  
Vol 1 (2) ◽  
pp. 187-216 ◽  
Author(s):  
May L-Y Wong

This paper presents a corpus-based approach to investigating the distribution of adverbial clauses and their subjects (overt vs. non-overt) in spoken and written Mandarin Chinese. It argues that the choice of subject type is determined by three variables, namely, given-new information, semantic function of adverbial clause and text type. In written Chinese, the distribution of subject types varies across semantic classes of adverbial clauses, but not across text categories. The influence of semantic classes on the distribution of subject types, however, depends on text type. For the same semantic function, the decision as to whether to include a subject is governed by given and new information. In contrasting the distribution of subject types of adverbial clauses across speech and writing, it was found that both spoken and written Chinese use more overt subjects in clauses of reason. Methodologically, this study demonstrates how quantitative corpus-linguistic methods can be used to supplement introspective theoretical assumptions with authentic, observable evidence in order to gain better insights into the behaviour of adverbial clauses in speech and writing.

