Electronic dictionary of variants based on the text corpus

Author(s):  
Svetlana O. Savchuk ◽  

An electronic dictionary of variants is being created on the basis of 18th- to 20th-century texts in the Russian National Corpus. It is compiled from a list of unidentified word forms, each supplied with a hypothetical lemma. The database of these word forms provides valuable material for the analysis of orthographic, phonetic and grammatical variation.

Author(s):  
Iryna Bundza

This article discusses the peculiarities of the category of number in Polish and Ukrainian nouns. To identify the problem areas in teaching the category of number to Ukrainian speakers, the author analysed Polish and Ukrainian lexemes in terms of how they realise this grammatical category. The article presents contexts that may trigger errors, which in turn may cause a comical effect or distort communication. The data were collected from Polish and Ukrainian dictionaries, as well as the National Corpus of Polish and the Ukrainian Text Corpus.


2020 ◽  
Vol 11 (3) ◽  
pp. 72-84
Author(s):  
S. T. Zolyan

The concept “sootechestvenniki” is one of the key tools for the self-description of society; it is an instrument for drawing borderlines between “we” and “they”. The article describes the development of the meaning of this word since its coinage. The word appeared in the 18th century as a merger of the Old Slavic and Old Russian ‘otechestvo’ (fatherland, understood as one’s place of origin) and the French ‘compatriot’. This merger resulted in the formation of two new prototypical meanings: one is civic, collective and elevated, and the other gravitates to ethnicity since it is used to refer to Russians. With the strengthening of state institutions in Russia, the first meaning was bound to dominate, and it did at the beginning of the 19th century. However, one should speak not about the synthesis, but rather about the discordance of the two meanings. In the 19th century, another meaning developed in the semantic structure of the word: ethnic Russians living abroad. Gradually, the word acquired new evaluative meanings, while negative connotations still prevailed. The basic oppositions (we/they, here/there, ours/alien) interacted in an ambiguous way, substituting each other. A variety of hybrid “compatriots” arose: we are there, they are here, etc. The heterogeneity of the semantics of the word reflects collisions within society, which faced a tragic internal split in the 20th century.


10.29007/kcnh ◽  
2018 ◽  
Author(s):  
Olga Nevzorova ◽  
Alfiia Galieva ◽  
Dzhavdet Suleymanov

This study explores the semantic properties of Tatar affixes. Turkic languages have complicated morphology and syntax, which is a challenge for language processing. The fundamental principle of inflection and derivation in Tatar, as in other Turkic languages, is agglutination: the stem joins postpositive affixes in a strictly determined order. The Tatar language has affixes of different types: a) derivational affixes expressing only lexical meaning and forming new words; b) inflectional affixes changing the word form (for example, case affixes); c) affixes serving as means of derivation as well as inflection. The current study is devoted to the ambiguous, polyfunctional Tatar affix -lık, which may be joined to nominal, adjectival and verbal stems and form derivatives of different types depending on the contextual environment, the meaning of the stem and the composition of the affixal chain of a derivative. The -lık affix is productive in modern Tatar and builds nominal, adjectival and verbal derivatives. The question of how many types of derivatives and word forms are produced with the -lık affix is not trivial, and different researchers distinguish different types of derivatives. Based on a thorough analysis of Tatar derivatives containing the -lık affix, we identified some empirical features of these constructs and then performed their manual and automatic classification. Four classes were distinguished. For our experiments we used data from the Tatar National Corpus “Tugan Tel” (http://corpus.antat.ru). The results obtained may be used for disambiguation in the Tatar National Corpus and for analyzing other ambiguous Tatar affixes.


Author(s):  
Anna A. Zalizniak ◽  

The article considers the semantics of the Russian word kak by. It demonstrates that there are three main types of use of this word that are relevant for the modern Russian language: 1) as an approximation indicator, i.e. the marker of an approximative, indirect or metaphorical use of the linguistic unit it introduces (cf. lёd na reke sluzhil kak by mostom ‘ice on the river served as a kind of bridge’; on kak by veduschij specialist v dannoj oblasti ‘he is sort of a leading specialist in this field’); 2) as an indicator of epistemic indefiniteness (cf. infljatsii kak by net ‘there is <kak by> no inflation’); 3) as an illocutionary operator (“illocutionary mitigator”), mitigating the illocutionary force of the assertive speech act (cf. Ja kak by ispolnitel’nyj director kompanii ‘I am <kak by> the chief executive officer of the company’, uttered by the actual CEO of the company). We suggest that the initial meaning of kak by is that of a marker of descriptive indefiniteness (in an outdated use after verbs of fuzzy perception), which has served as a source both for the approximation meaning, which is the main function of this word in contemporary Russian, and for that of epistemic indefiniteness. In its function as an “illocutionary mitigator”, which emerged at the very end of the 20th century in the course of pragmaticalisation, the word kak by belongs to the class of discourse markers that ensure the success of a communicative act. The study was based on the Russian National Corpus (www.ruscorpora.ru), including its oral and parallel subcorpora.


2019 ◽  
Vol 37 (2) ◽  
pp. 47-63
Author(s):  
T. N. Amiryan

In this article, a number of works by Sergei Parajanov are considered in the context of autofiction, understood as both a type of writing and a movement in fiction and the arts. The objective of the article is to identify the various types of autofictionality common to Parajanov's feature films (“Shadows of Forgotten Ancestors”, “The Color of Pomegranates”, “Kyiv Frescoes”) and to his performative projects, collages, assemblages, scenarios (“The Confession”), epistolary text corpus, etc. The analysis of Parajanov’s artistic heritage through the prism of visual autofiction contributes to a more precise definition of the place and significance of the filmmaker’s works in the world artistic culture of the second half of the 20th century.


Author(s):  
O.Yu. Vasilyeva ◽  
M.V. Komarova

The article considers naming units that denote means of transportation within the framework of the linguoculturological approach. This approach makes it possible to identify the specifics of how both the regional nominative fund and national thinking took shape in a particular epoch in the development of the Russian literary language. The analysis is supported by comparing occasional word usage with the data of the National Corpus of the Russian language. Lexical-semantic, historical-etymological, word-forming and functional features of the analysed language units are revealed. The material for the study consisted of texts of different genres created and published in the Siberian region from the late 19th century to the first half of the 20th century.


2014 ◽  
Vol 49 ◽  
pp. 59-71
Author(s):  
Sonja Wölkowa

The Upper Sorbian text corpus and further sources of information with regard to Upper Sorbian on the Internet

In the present era of globalisation and the omnipresence of the Internet, Sorbian linguistics faces new challenges along the lines of “What is not on the Internet does not exist”. There is increasing demand for digital, online-accessible sources of information on Upper and Lower Sorbian, both as working tools and reference points for language practice and as a source for academic research. As a result of this ongoing development, the Foundation for the Sorbian People established a workgroup called “Sorbian in the new media” at the end of 2012, which has identified the creation of an online German-Upper Sorbian dictionary as the major task in this field of activities. The focus of this article, however, is the Upper Sorbian text corpus HoTKo, which has been created by the Sorbian Institute and made available in co-operation with the Institute of the Czech National Corpus at the Charles University in Prague. The article presents the history and development of the corpus, its extent and shape, as well as its link to or incorporation into further planned digital projects of the Sorbian Institute with regard to the Upper Sorbian language.


Corpora ◽  
2008 ◽  
Vol 3 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Michael Pearce

In this paper, I examine the representation of men and women in the British National Corpus (BNC) by focussing on the collocational and grammatical behaviour of the noun lemmas man and woman (i.e., the nouns man/men and woman/women). Using Sketch Engine (a powerful corpus query tool, which is described), I explore the functional distribution of the target lemmas, and reveal the structured and systematic nature of the differences in the way these terms for adult male and female human beings pattern with other word forms in different grammatical relations.


Author(s):  
Nataliia Bober ◽  
Yan Kapranov ◽  
Anna Kukarina ◽  
Tetiana Tron ◽  
Tamara Nasalevych

The article deals with the application of a corpus-based approach, suggested by Ukrainian scholars, to teaching English to university students. The most representative corpus for English language teaching (ELT) is the British National Corpus (BNC), which offers many opportunities (e.g. search for specific word forms, search for word forms by lemmas, search for groups of word forms in the form of syntagms, etc.). The article presents a methodological algorithm for university students' work with the BNC during English classes, based on verbs denoting human emotional states. The methodology of work with the BNC consists of three stages: 1) a student compiles the initial lexicographic register of basic verbs denoting emotional states; 2) a student measures the frequency of each unit in corpus usage; and 3) a student analyses, describes and records all corpus calculations. The main benefit of the findings for future studies is that work with corpus tools in ELT guides students through the following successive steps: 1) processing concordances, 2) calculating the absolute frequency, 3) analysing the left and right valence, and 4) modelling clusters to build cognitive-semantic profiles of the studied units, which will allow university students to understand the essence of every grammatical, lexical, and syntactical unit.
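
The first three steps listed above (concordances, absolute frequency, left and right neighbours) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the sample text, the `LEMMA_FORMS` table and all function names are hypothetical, and it works on a short in-memory string rather than on the BNC or its query interface.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for corpus data (not taken from the BNC).
TEXT = ("She loves music. He fears storms. "
        "They love rain and she loves tea. I fear nothing.")

# Hypothetical register of emotion verbs: lemma -> set of word forms.
LEMMA_FORMS = {
    "love": {"love", "loves", "loved", "loving"},
    "fear": {"fear", "fears", "feared", "fearing"},
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, forms, window=2):
    """Return (left context, hit, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok in forms:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines

def absolute_frequency(tokens, forms):
    """Count all occurrences of any form of the lemma."""
    return sum(1 for tok in tokens if tok in forms)

def neighbours(tokens, forms):
    """Count the words immediately to the left and right of each hit."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok in forms:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right

tokens = tokenize(TEXT)
for lemma, forms in LEMMA_FORMS.items():
    left, right = neighbours(tokens, forms)
    print(lemma, absolute_frequency(tokens, forms), dict(left), dict(right))
    for l, hit, r in concordance(tokens, forms):
        print("   ", " ".join(l), f"[{hit}]", " ".join(r))
```

Immediate neighbours are used here as a rough proxy for left and right valence; the sketch also ignores sentence boundaries, which a real classroom exercise on the BNC would not.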


2011 ◽  
Vol 20 (03) ◽  
pp. 401-424
Author(s):  
THEOLOGOS ATHANASELIS ◽  
KONSTANTINOS MAMOURAS ◽  
STELIOS BAKAMIDIS ◽  
IOANNIS DOLOGLOU

There are several reasons to expect that recognising word order errors in a text will be a difficult problem, and recognition rates reported in the literature are in fact low. Although grammatical rules constructed by computational linguists improve the performance of a grammar checker in word order diagnosis, the repairing task is still very difficult. This paper describes a method to repair any sentence with wrong word order using a statistical language model (LM). A good indicator of whether a person really knows a language is the ability to use the appropriate words in a sentence in correct word order. The "scrambled" words in a sentence produce a meaningless sentence. Most languages have a fairly fixed word order. This paper introduces a method, which is language independent, for repairing word order errors in sentences using the probabilities of most typical trigrams and bigrams extracted from a large text corpus such as the British National Corpus (BNC).
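
The general idea of ranking candidate word orders by bigram and trigram statistics can be sketched in Python as follows. This is not the authors' system: the four-sentence `CORPUS` is a toy stand-in for a large corpus such as the BNC, the score is a simple sum of `log(1 + count)` rather than a properly smoothed language model, and the search enumerates every permutation, which is feasible only for short sentences. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

# Toy stand-in for a large corpus such as the BNC (assumption for illustration).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
    "a dog sat on the mat",
]

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train(sentences):
    """Count bigrams and trigrams, with sentence boundary markers."""
    bigrams, trigrams = Counter(), Counter()
    for sentence in sentences:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        bigrams.update(ngrams(toks, 2))
        trigrams.update(ngrams(toks, 3))
    return bigrams, trigrams

def score(tokens, bigrams, trigrams):
    """Sum log(1 + count) over the bigrams and trigrams of a candidate order."""
    toks = ["<s>"] + list(tokens) + ["</s>"]
    total = sum(math.log(1 + trigrams[g]) for g in ngrams(toks, 3))
    total += sum(math.log(1 + bigrams[g]) for g in ngrams(toks, 2))
    return total

def repair(sentence, bigrams, trigrams):
    """Return the permutation of the input words with the highest score."""
    words = sentence.split()
    best = max(permutations(words), key=lambda p: score(p, bigrams, trigrams))
    return " ".join(best)

bigrams, trigrams = train(CORPUS)
print(repair("sat cat the mat on the", bigrams, trigrams))
```

Because nothing in the scoring depends on a particular language, the same sketch runs on any tokenised corpus, which mirrors the language independence claimed in the abstract. Exhaustive enumeration grows factorially with sentence length, so a practical system would need to prune the candidate space.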

