Unsupervised Induction of Meaningful Semantic Classes through Selectional Preferences

We describe a probabilistic framework for acquiring selectional preferences of linguistic predicates and for using the acquired representations to model the effects of context on word meaning. Our framework uses Bayesian latent-variable models inspired by, and extending, the well-known Latent Dirichlet Allocation (LDA) model of topical structure in documents; when applied to predicate–argument data, topic models automatically induce semantic classes of arguments and assign each predicate a distribution over those classes. We consider LDA and a number of extensions to the model and evaluate them on a variety of semantic prediction tasks, demonstrating that our approach attains state-of-the-art performance. More generally, we argue that probabilistic methods provide an effective and flexible methodology for distributional semantics.
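The core idea of treating each predicate as a "document" whose "words" are its observed arguments can be sketched with a plain collapsed Gibbs sampler for basic LDA. This is a minimal sketch under stated assumptions: predicate-argument pairs are already extracted, and the function names (`lda_gibbs`, `preference`), hyperparameter values and toy pairs are hypothetical. None of the extended models mentioned above are shown.

```python
import random
from collections import defaultdict


def lda_gibbs(pairs, num_classes, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for basic LDA over (predicate, argument) pairs.

    Each predicate plays the role of a document and each argument token the
    role of a word, so the latent topics act as induced argument classes.
    """
    rng = random.Random(seed)
    args = sorted({a for _, a in pairs})
    V, K = len(args), num_classes
    n_pk = defaultdict(lambda: [0] * K)          # predicate -> class counts
    n_ka = [defaultdict(int) for _ in range(K)]  # class -> argument counts
    n_k = [0] * K                                # tokens per class
    z = []                                       # class assignment per token

    # Random initial class assignment for every argument token.
    for p, a in pairs:
        k = rng.randrange(K)
        z.append(k)
        n_pk[p][k] += 1
        n_ka[k][a] += 1
        n_k[k] += 1

    for _ in range(iters):
        for i, (p, a) in enumerate(pairs):
            # Remove token i from the counts, then resample its class from
            # the collapsed conditional P(z_i = k | all other assignments).
            k = z[i]
            n_pk[p][k] -= 1
            n_ka[k][a] -= 1
            n_k[k] -= 1
            weights = [(n_pk[p][j] + alpha) * (n_ka[j][a] + beta) / (n_k[j] + V * beta)
                       for j in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            z[i] = k
            n_pk[p][k] += 1
            n_ka[k][a] += 1
            n_k[k] += 1

    # Point estimates: theta (predicate -> class) and phi (class -> argument).
    theta = {p: [(c + alpha) / (sum(cs) + K * alpha) for c in cs] for p, cs in n_pk.items()}
    phi = [{a: (n_ka[k][a] + beta) / (n_k[k] + V * beta) for a in args} for k in range(K)]
    return theta, phi


def preference(theta, phi, predicate, argument):
    """P(argument | predicate) = sum over classes k of P(k | predicate) * P(argument | k)."""
    return sum(t * phi[k][argument] for k, t in enumerate(theta[predicate]))


# Hypothetical toy predicate-argument pairs (verb, direct object).
pairs = [("eat", "apple"), ("eat", "bread"), ("eat", "apple"),
         ("drink", "water"), ("drink", "wine"), ("drink", "water"),
         ("devour", "bread"), ("sip", "wine")]
theta, phi = lda_gibbs(pairs, num_classes=2)
print(sorted(phi[0], key=phi[0].get, reverse=True)[:2])  # top arguments of class 0
```

Because the class distributions are shared across predicates, a rarely seen predicate such as "sip" still receives a smoothed preference over every argument through the classes it shares with "drink".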

Categories and Frequency: Cognition Verbs in Spanish Subject Expression

Languages

10.3390/languages6030126

2021

Vol 6 (3)

pp. 126

Author(s):

Catherine E. Travis

Rena Torres Cacoullos

Keyword(s):

High Frequency

Fixed Effects

Variable Structure

Semantic Classes

Variation Patterns

Subject Pronoun

Subject Expression

Linguistic Categories

Analysis Of Variation

New Evidence

Are semantic classes of verbs genuine or do they merely mask idiosyncrasies of frequent verbs? Here, we examine the interplay between semantic classes and frequent verb-form combinations, providing new evidence from variation patterns in spontaneous speech that linguistic categories are centered on high frequency members to which other members are similar. We offer an account of the well-known favoring effect of cognition verbs on Spanish subject pronoun expression by considering the role of high-frequency verbs (e.g., creer ‘think’ and saber ‘know’) and particular expressions ((yo) creo ‘I think’, (yo) no sé ‘I don’t know’). Analysis of variation in nearly 3000 tokens of unexpressed and pronominal subjects in conversational data replicates well-established predictors, but highlights that the cognition verb effect is really one of 1sg cognition verbs. In addition, particular expressions stand out for their high frequency relative to their component parts (for (yo) creo, proportion of lexical type, and proportion of pronoun). Further analysis of 1sg verbs with frequent expressions as fixed effects reveals shared patterns with other cognition verbs, including an association with non-coreferential contexts. Thus, classes can be identified by variation constraints and contextual distributions that are shared among class members and are measurably different from those of the more general variable structure. Cognition verbs in variable Spanish subject expression form a class anchored in lexically particular constructions.
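The observation that the cognition-verb effect is really a 1sg cognition-verb effect comes down to comparing pronoun rates at two levels of grouping. A minimal cross-tabulation sketch follows; the tokens are hypothetical toy data, not data from the study, and the sketch does not reproduce the authors' multivariate analysis.

```python
from collections import Counter


def pronoun_rate(tokens, group_key):
    """Proportion of pronominal (expressed) subjects within each group."""
    totals, pronouns = Counter(), Counter()
    for tok in tokens:
        group = group_key(tok)
        totals[group] += 1
        if tok["subject"] == "pronoun":
            pronouns[group] += 1
    return {group: pronouns[group] / totals[group] for group in totals}


# Hypothetical toy tokens (NOT data from the study).
tokens = [
    {"verb": "creer", "cls": "cognition", "person": "1sg", "subject": "pronoun"},
    {"verb": "creer", "cls": "cognition", "person": "1sg", "subject": "pronoun"},
    {"verb": "saber", "cls": "cognition", "person": "1sg", "subject": "unexpressed"},
    {"verb": "saber", "cls": "cognition", "person": "3sg", "subject": "unexpressed"},
    {"verb": "decir", "cls": "other", "person": "1sg", "subject": "unexpressed"},
    {"verb": "ir", "cls": "other", "person": "3sg", "subject": "pronoun"},
    {"verb": "ir", "cls": "other", "person": "3sg", "subject": "unexpressed"},
]

by_class = pronoun_rate(tokens, lambda t: t["cls"])
by_class_person = pronoun_rate(tokens, lambda t: (t["cls"], t["person"]))
print(by_class)
print(by_class_person)
```

In this toy set, the cognition class's overall rate (0.5) comes entirely from its 1sg cell (2/3), while its 3sg cell is 0, which is the kind of pattern a class-level comparison alone would hide.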

Compositional Distributional Semantics with Syntactic Dependencies and Selectional Preferences

Applied Sciences

10.3390/app11125743

2021

Vol 11 (12)

pp. 5743

Author(s):

Pablo Gamallo

Keyword(s):

English Language

State Of The Art

Current Work

Distributional Semantics

Compositional Model

Compositional Approach

Sentence Similarity

Selectional Preferences

Syntactic Dependencies

Compositional Distributional Semantics

This article describes a compositional model based on syntactic dependencies, designed to build contextualized word vectors by following linguistic principles related to the concept of selectional preferences. The compositional strategy proposed in this work has been evaluated on a syntactically controlled, multilingual dataset and compared with BERT-like Transformer models such as Sentence-BERT, the state of the art in sentence similarity. For this purpose, we created two new test datasets for Portuguese and Spanish, based on the existing English dataset and containing expressions with noun-verb-noun transitive constructions. The results show that the linguistically based compositional approach is competitive with Transformer models.
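The general idea of contextualizing a word through a head's selectional preference can be sketched as follows: build a preference vector for a dependency slot from the words observed filling it, then take the componentwise product with the dependent's own vector. This is a sketch of the general technique, not necessarily the article's exact operations. Only one direction (head constraining dependent) is shown, and the vectors, relation labels and observed fillers are hypothetical.

```python
import math


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def mult(u, v):
    return [a * b for a, b in zip(u, v)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def selectional_preference(head, rel, observed, vectors):
    """Sum of the vectors of the words observed filling slot `rel` of `head`."""
    dim = len(next(iter(vectors.values())))
    pref = [0.0] * dim
    for filler in observed[(head, rel)]:
        pref = add(pref, vectors[filler])
    return pref


def contextualize(word, head, rel, observed, vectors):
    """Keep the dimensions of `word` that the head's slot prefers (componentwise product)."""
    return mult(vectors[word], selectional_preference(head, rel, observed, vectors))


# Hypothetical 3-d vectors; dimensions loosely "finance", "water", "other".
vectors = {
    "bank":    [1.0, 1.0, 0.0],
    "account": [0.9, 0.1, 0.0],
    "safe":    [1.0, 0.0, 0.2],
    "river":   [0.0, 1.0, 0.1],
    "lake":    [0.1, 0.9, 0.0],
}
# Hypothetical fillers observed for "deposit ... in X" and "fish ... in X".
observed = {("deposit", "in"): ["account", "safe"], ("fish", "in"): ["river", "lake"]}

bank_deposit = contextualize("bank", "deposit", "in", observed, vectors)
bank_fish = contextualize("bank", "fish", "in", observed, vectors)
print(round(cosine(bank_deposit, vectors["account"]), 3),
      round(cosine(bank_fish, vectors["river"]), 3))
```

The ambiguous "bank" ends up close to "account" under "deposit ... in" and close to "river" under "fish ... in", because the product suppresses the dimensions the slot's fillers do not share.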

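Niebieski ptak und cukier biały – Eine Klassifikation und Korpusanalyse der Funktion und Wortfolge polnischer Farbadjektive [Niebieski ptak ('blue bird') and cukier biały ('white sugar'): A classification and corpus analysis of the function and word order of Polish color adjectives]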

Zeitschrift für Slawistik

10.1515/slaw-2020-0005

2020

Vol 65 (1)

pp. 96-133

Author(s):

Christina Clasmeier

Keyword(s):

Empirical Analysis

Noun Phrase

Corpus Analysis

Semantic Classes

Respective Function

Attributive Adjectives

Linguistic Reality

This paper investigates the position of Polish color adjectives in their attributive function in the noun phrase. In general, Polish attributive adjectives may precede the noun (AN) or follow it (NA). There is a rich literature on this issue, especially on the motivation for AN or NA order in particular semantic classes of adjectives or types of adjective-noun constructions. However, most of the contributions are theoretical in nature and account for only part of linguistic reality, failing to capture the full scope of the data. One reason for this might be that, so far, no systematic empirical analysis of this specific syntactic phenomenon has been conducted. This paper presents the results of a corpus analysis (National Corpus of Polish, NKJP) of 203 noun-with-color-adjective constructions and their AN/NA distributions. These constructions were classified by the color adjective's function (qualifying, classificatory, or part of an idiom). The results show that, regardless of their function, Polish color adjectives tend to appear in AN order.

Induction of Semantic Classes Based on Coordinate Patterns

2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology

10.1109/wi-iat.2011.66

2011

Author(s):

Likun Qiu

Yunfang Wu

Jing Shi

Yanqiu Shao

Zhiyi Long

Keyword(s):

Semantic Classes

Productivity of French and Dutch (semi-)copular constructions and the adverse impact of high token frequency

International Journal of Corpus Linguistics

10.1075/ijcl.19043.van

2021

Author(s):

Niek Van Wettere

Keyword(s):

Frequency Distribution

The Other

Adverse Impact

Frequent Type

Token Frequency

Semantic Classes

Other Hand

The Subject

The One

This paper examines the productivity of the subject complement slot in a set of French and Dutch (semi-)copular micro-constructions. The presumed counterpart of productivity, conventionalization in the form of high token frequency, will also be taken into account in the analysis of the productivity complex. On the one hand, it will be shown that prototypical copulas generally have a higher productivity than semi-copulas, although some semi-copulas can rival the productivity of prototypical copulas. On the other hand, it will be demonstrated that high token frequency is in general detrimental to productivity, both at the level of the entire subject complement slot and at the level of the different semantic classes. However, the shape of the frequency distribution also seems to play a role: in my data, multiple highly frequent types are more detrimental to productivity than one extremely frequent type, although the semantic connectedness of the types in the distribution might also be an explanatory factor.
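The abstract does not name its productivity measure. One common corpus-based measure is Baayen's potential productivity, P = V1 / N, the number of hapax legomena in a slot divided by the slot's token count; the sketch below uses it as an assumption to show how a highly frequent filler depresses the score. The slot fillers are hypothetical toy data.

```python
from collections import Counter


def potential_productivity(fillers):
    """Baayen's P = V1 / N: hapax legomena in the slot divided by slot tokens."""
    counts = Counter(fillers)
    n_tokens = sum(counts.values())
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / n_tokens if n_tokens else 0.0


# Hypothetical subject-complement fillers for two copular micro-constructions.
even_slot = ["ready", "ill", "sure", "happy", "tired", "calm"]
skewed_slot = ["sure"] * 7 + ["ill", "happy", "tired"]

print(potential_productivity(even_slot))    # 1.0
print(potential_productivity(skewed_slot))  # 0.3
```

Because P depends only on the hapax count and the token total, two distributions with the same V1 and N score identically; capturing the difference between one extremely frequent type and several highly frequent types requires looking at the shape of the distribution as well.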

Semantic Classes and Relevant Domains on WSD

Text, Speech and Dialogue - Lecture Notes in Computer Science

10.1007/978-3-319-10816-2_21

2014

pp. 166-172

Author(s):

Rubén Izquierdo

Sonia Vázquez

Andrés Montoyo

Keyword(s):

Semantic Classes

Uzbek re-modeled

Journal of Language and Politics

10.1075/jlp.15025.cat

2017

Vol 16 (2)

pp. 313-333

Author(s):

Lydia Catedral

Keyword(s):

Language Use

Language Planning

Russian Language

Semantic Classes

Unique Approach

Lexical Items

The Relationship

This study investigates the relationship between Russian language use and language planning in the context of newly independent, post-Soviet Uzbekistan (1991–1992). It is guided by the question: In what ways does the use of Russian loanwords in Uzbek-language newspapers accomplish language planning in newly independent Uzbekistan? The main finding of this analysis is that the post-independence use of Russian loanwords from particular semantic classes in particular contexts reinforces overtly stated ideologies about Russian and constructs a difference between Soviet Uzbekistan and independent Uzbekistan. These findings demonstrate the need to reexamine the role of the Russian language in post-Soviet contexts, and they contribute a unique approach to analyzing links between lexical items and ideology in language planning.

Corpora and intuition: a study of Mandarin Chinese adverbial clauses and subjecthood

Corpora

10.3366/cor.2006.1.2.187

2006

Vol 1 (2)

pp. 187-216

Author(s):

May L-Y Wong

Keyword(s):

Mandarin Chinese

Semantic Function

Text Type

Corpus Linguistic

Semantic Classes

New Information

Theoretical Assumptions

Adverbial Clauses

Adverbial Clause

Linguistic Methods

This paper presents a corpus-based approach to investigating the distribution of adverbial clauses and their subjects (overt vs. non-overt) in spoken and written Mandarin Chinese. It argues that the choice of subject type is determined by three variables: given-new information, the semantic function of the adverbial clause, and text type. In written Chinese, the distribution of subject types varies across semantic classes of adverbial clauses but not across text categories. The influence of semantic classes on the distribution of subject types, however, depends on text type. For the same semantic function, whether a subject is included is governed by given and new information. A comparison of subject types in adverbial clauses across speech and writing shows that both spoken and written Chinese use more overt subjects in clauses of reason. Methodologically, this study demonstrates how quantitative corpus-linguistic methods can be used to supplement introspective theoretical assumptions with authentic, observable evidence in order to gain better insights into the behaviour of adverbial clauses in speech and writing.

Leveraging Selectional Preferences for Anomaly Detection in Newswire Events

Applied Cloud Deep Semantic Recognition

10.1201/9781351119023-2

2018

pp. 25-35

Author(s):

Pradeep Dasigi

Eduard Hovy

Keyword(s):

Anomaly Detection

Selectional Preferences
