Building Corpus-Based Semantic Classifications of Some Tatar Affixes

10.29007/kcnh ◽  
2018 ◽  
Author(s):  
Olga Nevzorova ◽  
Alfiia Galieva ◽  
Dzhavdet Suleymanov

This study explores the semantic properties of Tatar affixes. Turkic languages have complicated morphology and syntax, which is a challenge for language processing. The fundamental principle of inflection and derivation in Tatar, as in other Turkic languages, is agglutination, in which postpositive affixes join the stem in a strictly determined order. The Tatar language has affixes of different types: a) derivational affixes expressing only lexical meaning and forming new words; b) inflectional affixes changing the word form (for example, case affixes); c) affixes serving as means of derivation as well as inflection. The current study is devoted to the ambiguous, polyfunctional Tatar affix -lık, which may be joined to nominal, adjectival and verbal stems and forms derivatives of different types depending on the contextual environment, the meaning of the stem and the composition of the affixal chain of a derivative. The -lık affix is productive in modern Tatar and builds nominal, adjectival and verbal derivatives. The number of types of derivatives and word forms produced with the -lık affix is not a trivial question, and different researchers distinguish different types of derivatives. Based on a thorough analysis of Tatar derivatives containing the -lık affix, we identified some empirical features of these constructs and then performed their manual and automatic classification. Four classes were distinguished. For our experiments we used data from the Tatar National Corpus “Tugan Tel” (http://corpus.antat.ru). The results obtained may be used for disambiguation in the Tatar National Corpus and for analyzing other ambiguous Tatar affixes.

Author(s):  
Boris A. Musukov

The article is devoted to a comprehensive study of the lexico-semantic, derivational, word-composition and inflectional features of the word ala «potty, spotted, piebald» (about the color of animals) (Bashkir); «motley, multicolored, spotted; light» (Karachay-Balkarian) in the Bashkir and Karachay-Balkarian languages. The word serves as the attributive component of descriptive phrases, an indicator of variegated color designation and a term-forming unit, and it participates in the categorization of mixed segments of the lingua-color space. The article examines the features of free and lexicalized phrases, idioms of a phraseological type, and paired-repeated constructions formed by combining the word «ala» with denotative and abstract nouns and with asemantic analogues that repeat the second syllable of the reference word and are its phonetically modified version without lexical meaning. It examines the distinctive and integrating features of these syntactic structures from the point of view of their functioning at various language levels and their lexicographic interpretation. The article also analyzes the selective combination of «double names» with inventory units of morphology in the process of further morphologization of two-syllable attributive-nominal bases, which, depending on the contextual environment, express an adverbial feature and belong to the category of «rolling» words.


2021 ◽  
pp. 161-173
Author(s):  
Paulina Pycia-Košćak

The article explores the semantics and use of two lexemes: periphery and margin. In dictionaries, both lexemes are explicitly or implicitly defined in opposition to the center and denote the surface, area or space that lies away from it, that is, ‘outside’. The first part analyzes their definitions in Croatian language dictionaries, their primary and secondary meanings, and the similarities and differences between those meanings. The second part studies the contexts in which they have been recorded and the correspondence of lexical meaning with a specified situation. The analyzed lexemes have a similar range of meaning, so the article also questions their possible substitutability. Both lexemes are of foreign origin, and in the original languages they refer to neutral categories and have a denotative meaning. However, in the Croatian language they also have a secondary, marked meaning; the research therefore takes into account the (in)direct evaluation that indicates how these lexemes work in the mind of the language user. The study addresses the problem of their marking and tries to answer the question of whether they are always stigmatized as a negative sign of concepts, as Croatian phrases can suggest (for example to be on / at the periphery of something, to be on the margins), or whether they can also be related to positive features and affirm certain phenomena. The analysis is carried out on examples from the Croatian Language Corpus and the Croatian National Corpus, which allowed an overview of different types of discourses and texts.


2020 ◽  
Author(s):  
Julia Villalva ◽  
Belén Nieto-Ortega ◽  
Manuel Melle-Franco ◽  
Emilio Pérez

The motion of molecular fragments in close contact with atomically flat surfaces is still not fully understood. Does a more favourable interaction imply a larger barrier towards motion even if there are no obvious minima? Here, we use mechanically interlocked rotaxane-type derivatives of SWNTs (MINTs) featuring four different types of macrocycles with significantly different affinities for the SWNT thread as models to study this problem. Using molecular dynamics, we find that there is no direct correlation between the interaction energy of the macrocycle with the SWNT and its ability to move along or around it. Density functional tight-binding calculations reveal small (<2.5 kcal·mol⁻¹) activation barriers, the height of which correlates with the commensurability of the aromatic moieties in the macrocycle with the SWNT. Our results show that macrocycles in MINTs rotate and translate freely around and along SWNTs at room temperature, with an energetic cost lower than the rotation around the C−C bond in ethane.


2021 ◽  
Vol 10 (7) ◽  
pp. 474
Author(s):  
Bingqing Wang ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Jian Liu

Social media data contains information expressed in real time, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed and used to classify and extract more than 210,000 residents’ festival activities from the 1.13 million items of Sina Weibo (Chinese “Twitter”) data collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis and other methods were used to perceive different types of festival activities and to quantitatively analyze the spatial differences between different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressions of feelings during festivals are mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It also broadens the application field of social media data, especially text data, providing a new research paradigm for studying residents’ festival activities and adding residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.


Author(s):  
Dominika Kováříková ◽  
Michal Škrabal ◽  
Václav Cvrček ◽  
Lucie Lukešová ◽  
Jiří Milička

When compiling a list of headwords, every lexicographer comes across words with an unattested representative dictionary form in the data. This study focuses on how to distinguish between the cases when this form is missing due to a lack of data and when there are some systemic or linguistic reasons. We have formulated lexicographic recommendations for different types of such ‘lacunas’ based on our research carried out on Czech written corpora. As a prerequisite, we calculated a frequency threshold to find words that should have the representative form attested in the data. Based on a manual analysis of 2,700 nouns, adjectives and verbs that do not, we drew up a classification of lacunas. The reasons for a missing dictionary form are often associated with limited collocability and non-preference for the representative grammatical category. Findings on unattested word forms also have significant implications for language potentiality.


2021 ◽  
Vol 12 ◽  
Author(s):  
Rundi Guo ◽  
Nick C. Ellis

A large body of psycholinguistic research demonstrates that both language processing and language acquisition are sensitive to the distributions of linguistic constructions in usage. Here we investigate how statistical distributions at different linguistic levels – morphological and lexical (Experiments 1 and 2), and phrasal (Experiment 2) – contribute to the ease with which morphosyntax is processed and produced by second language learners. We analyze Chinese ESL learners’ knowledge of four English inflectional morphemes: -ed, -ing, and third-person -s on verbs, and plural -s on nouns. In Elicited Imitation Tasks, participants listened to length- and difficulty-matched sentences each containing one target morpheme and typed the whole sentence as accurately as they could after a short delay. Experiment 1 investigated lexical and morphemic levels, testing the hypotheses that a morpheme is expected to be more easily processed when it is (1) highly available (i.e., occurring in frequent word-forms), and (2) highly reliable (i.e., occurring in lemma words that are consistently conjugated in the form containing this morpheme). Thirty sentences were made for each morpheme, divided into three Availability-Reliability Distribution (ARD) groups on the basis of corpus analysis in the Corpus of Contemporary American English (COCA; Davies, 2008-): 10 target words high in availability, 10 high in reliability, and 10 low in both reliability and availability. Responses were scored on whether the target morpheme was accurately reproduced given the provision of the correct lemma. A generalized linear mixed-effects logit model (GLMM) revealed fixed effects of morpheme type, availability, and reliability on the accuracy of morpheme provision. There were no effects of lemma frequency. 
Experiment 2 successfully replicated these results and extended the investigation to explore phrasal formulaicity by manipulating the frequency of the four-word strings in which the morpheme was embedded. GLMMs replicated the effects of word-form availability and reliability and additionally revealed independent phrase-superiority effects where morphemes were better reproduced in contexts of higher string-frequency. Taken together, these findings demonstrate that morpheme acquisition reflects the distributional properties of learners’ experience and the mappings therein between lexis, morphology, phraseology, and semantics. These conclusions support an emergentist view of the statistical symbolic learning of morphology where language acquisition involves the satisfaction of competing constraints across multiple grain-sizes of units.


Author(s):  
Martin Maiden

The historical morphology of the verb ‘snow’ in Francoprovençal presents a conundrum, in that it is clearly analogically influenced by the verb ‘rain’, for obvious reasons of lexical semantic similarity, but the locus of that influence is not the ‘root’ (the ostensible bearer of lexical meaning) but desinential inflexion-class members, which are in principle independent of any lexical meaning. Similar morphological changes are also identified for other Gallo-Romance verbs. It seems, in effect, that speakers can identify exponents of the lexical meaning of word-forms in linear sequences larger than the apparent ‘morphemic’ composition of those word-forms, even when such a composition may seem prima facie transparent and obvious. It is argued that these facts are inherently incompatible with ‘constructivist’, morpheme-based, models of morphology, and strongly compatible with what have been called ‘abstractivist’ (‘word-and-paradigm’) approaches, which generally take entire word-forms as the primary units of morphological analysis.


2017 ◽  
Vol 61 (1) ◽  
pp. 3-30 ◽  
Author(s):  
Odile Bagou ◽  
Ulrich Hans Frauenfelder

This study examines how French listeners segment and learn new words of artificial languages varying in the presence of different combinations of sublexical segmentation cues. The first experiment investigated the contribution of three different types of sublexical cues (acoustic-phonetic, phonological and prosodic cues) to word learning. The second experiment explored how participants specifically exploited sublexical prosodic cues. Whereas complementary cues signaling word-initial and word-final boundaries had synergistic effects on word learning in the first experiment, the two manipulated prosodic cues redundantly signaling word-final boundaries in the second experiment were rank-ordered with final pitch variations being more weighted than final lengthening. These results are discussed in light of the notions of cue type, cue position and cue efficiency.

