Journal of Linguistics/Jazykovedný časopis
Latest Publications


TOTAL DOCUMENTS: 335 (FIVE YEARS 130)

H-INDEX: 2 (FIVE YEARS 1)

Published by Walter de Gruyter GmbH

ISSN: 1338-4287, 0021-5597

2021, Vol 72 (2), pp. 520-530
Author(s): Marie Kopřivová, Zuzana Laubeová, David Lukeš

Abstract ORATOR v2 is a new 1.5M-word corpus of Czech monologues delivered to a live audience in semi-formal to formal settings. It was designed to chart the space of naturally occurring monologues which can be obtained for corpus processing. As such, it aims for diversity but does not attempt any balancing of subcategories, recognizing that some types of data are inherently easier to obtain in high volume than others. The transcription guidelines and annotation tools employed are the same as those used for other recent spoken corpora published by the CNC, which facilitates interesting comparisons between various types of spoken Czech. The present paper sketches out three case studies, comparing ORATOR to the informal conversations of ORTOFON v2 in terms of the frequencies of demonstratives and hesitations, as well as lexical richness.


2021, Vol 72 (2), pp. 545-555
Author(s): Zuzana Laubeová, Michal Škrabal

Abstract The paper introduces a new section separated from journalistic texts in Czech corpora, namely interviews. This genre is highly specific; of the texts that can be found in newspapers and magazines, it is probably the closest to spoken language. In two case studies, we present possible applications of the interviews subcorpus in linguistic research. The first one deals with the role of paralinguistic behaviour, especially laughter, in written interviews vs. spoken dialogues. The second one investigates the specifics of the demonstrative ten in the function of a nominal attribute, again in both written and spoken data.


2021, Vol 72 (2), pp. 319-329
Author(s): Aleksei Dobrov, Maria Smirnova

Abstract This article presents the current results of an ongoing study of how automatic morphosyntactic and semantic annotation can be fine-tuned by improving the underlying formal grammar and ontology, using one Tibetan text as an example. The ultimate purpose of the work at this stage was to improve linguistic software developed for natural-language processing and understanding, in order to achieve complete annotation of a specific text and a state of the formal model in which all linguistic phenomena observed in the text are explained. This purpose includes the following tasks: analysing error cases in the annotation of the text from the corpus; eliminating these errors in automatic annotation; developing the formal grammar and updating the dictionaries. Along with the morphosyntactic analysis, the current approach involves simultaneous semantic analysis. The article describes the semantic annotation of the corpus, required by grammar revision and development, which was made with the use of a computer ontology. The work is carried out on one of the corpus texts, the grammatical poetic treatise Sum-cu-pa (7th century).


2021, Vol 72 (2), pp. 353-370
Author(s): Martina Ivanová, Miroslava Kyseľová, Anna Gálisová

Abstract The paper deals with the acquisition of Slovak word order in written texts of students of Slovak as a foreign language. Its attention is focused on identifying the correct and incorrect placement of enclitic components, and their erroneous usage is analysed with respect to different investigated variables (types of enclitic components, types of syntactic construction, distance from lexical/syntactic anchor, and realization in pre- or post-verbal position). The paper also pays attention to the error rate regarding individual proficiency levels of students, and error distribution in two language groups, Slavic and Non-Slavic learners, is compared.


2021, Vol 72 (2), pp. 705-718
Author(s): Miroslav Zumrík

Abstract The paper follows the tradition of research in legal linguistics and into formulaic language, specifically into lexical bundles. The aim of the paper is to describe lexical bundles in samples from the corpus of Slovak judicial decisions OD-JUSTICE by means of quantitative characteristics of the identified bundles and by their comparison with bundles found in two other specialized corpora: the corpus of Slovak legal regulations and the corpus of annual reports by Slovak public institutions. For the identification of bundles, the concept of the h-point was used. Identified bundles are described with respect to their maximal, minimal, average, median and mode values, distributions and ratios. A further aim is to outline an interpretation of these bundle characteristics with regard to the communicative function(s) of the compared document genres.
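
The abstract does not say how the h-point was computed. As a rough illustration only, the sketch below uses one common reading: in a rank-frequency list sorted by descending frequency, the h-point is the largest rank whose frequency is at least as large as the rank. The helper names `bundle_frequencies` and `h_point` and the default bundle length of 3 are assumptions for this sketch, not taken from the paper.

```python
from collections import Counter


def bundle_frequencies(tokens, n=3):
    """Count contiguous n-token sequences (candidate lexical bundles)."""
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(ngrams)


def h_point(frequencies):
    """Largest rank r such that the r-th highest frequency is >= r.

    Simplified integer version; interpolated variants exist but are
    not implemented here.
    """
    ranked = sorted(frequencies, reverse=True)
    h = 0
    for rank, freq in enumerate(ranked, start=1):
        if freq >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical frequency list, not data from the paper.
    print(h_point([10, 8, 5, 4, 3, 1]))
```

Under this reading, n-grams ranked at or above the h-point would be treated as bundles, and the rest as the low-frequency tail.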


2021, Vol 72 (2), pp. 690-704
Author(s): Jana Lokajová

Abstract The phenomenon of political evasiveness in the genre of the political interview has been the focus of several discourse studies employing conversation analysis, critical discourse analysis and the social psychology approach. Most of these studies focus on a detailed qualitative analysis of political discourse, identifying a wide range of communication strategies that permit politicians to ambiguate their agency and at the same time boost their positive face. Since these strategies may change over time and also be subject to a culture-specific environment, the aim of this paper is to discover a) which evasive communicative strategies were employed by Slovak politicians in 2012–2016, b) which lexical substitutions were most frequently used by them to avoid negative connotations of face-threatening questions, and finally, c) which cognitive frames formed a frequent conceptual background of their evasive political argumentation. The paper draws on a combination of quantitative and qualitative approaches to the analysis of non-replies devised by Bull and Mayer (1993) and on critical discourse analysis, applied to a sample of five Slovak radio interviews aired on Rádio Express. The selection of interviews was not random: in each interview the politician was asked highly conflictual questions about bribery, embezzlement or disputes in the coalition. Based on Dulebová's qualitative research of Russian-Slovak political discourse (2009), it is hypothesized that a) the evasive strategies of ‘attack on the opposition’ and ‘attack on the interviewer’ would occur in our sample with the highest prominence in the speech of the former Prime Minister Fico, and b) the politicians accused of direct involvement in scandals would be the most evasive ones.


2021, Vol 72 (2), pp. 556-567
Author(s): Olga Lyashevskaya, Ilia Afanasev

Abstract We present a hybrid HMM-based PoS tagger for Old Church Slavonic. The training corpus is a portion of one text, Codex Marianus (40k), annotated with the Universal Dependencies UPOS tags in the UD-PROIEL treebank. We perform a number of experiments in within-domain and out-of-domain settings, in which the remaining part of Codex Marianus serves as a within-domain test set, and Kiev Folia is used as an out-of-domain test set. Analysing by-PoS-class precision and sensitivity in each run, we combine a simple context-free n-gram-based approach with a Hidden Markov Model (HMM) and add linguistic rules for specific cases such as punctuation and digits. While the model achieves a rather unimpressive accuracy of 81% in in-domain settings, we observe an accuracy of 51% in out-of-domain evaluation, which is comparable to the results of large neural architectures based on pre-trained contextual embeddings.
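
The abstract does not give the tagger's internals, so the following is only a generic sketch of the two ingredients it names: Viterbi decoding over an HMM, followed by a rule layer that overrides the tags of punctuation and digit tokens. The function names, the probability floor for unseen events, and the toy probability tables are all assumptions made for illustration; `PUNCT` and `NUM` are standard UPOS tags.

```python
import math
import string


def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence under a first-order HMM."""
    if not tokens:
        return []

    def lp(p):
        # Log-probability with a small floor for unseen events (assumption).
        return math.log(p if p > 0 else floor)

    # Each column maps tag -> (best log score, backpointer to previous tag).
    best = [{t: (lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(tokens[0], 0.0)), None)
             for t in tags}]
    for tok in tokens[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                ((best[-1][p][0] + lp(trans_p[p].get(t, 0.0)), p) for p in tags),
                key=lambda pair: pair[0],
            )
            column[t] = (score + lp(emit_p[t].get(tok, 0.0)), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    path.reverse()
    return path


def tag_with_rules(tokens, tags, start_p, trans_p, emit_p):
    """Viterbi tags, with rule-based overrides for punctuation and digits."""
    hmm_tags = viterbi(tokens, tags, start_p, trans_p, emit_p)
    result = []
    for tok, tag in zip(tokens, hmm_tags):
        if tok and all(ch in string.punctuation for ch in tok):
            result.append("PUNCT")
        elif tok.isdigit():
            result.append("NUM")
        else:
            result.append(tag)
    return result


# Toy model with placeholder tokens "x" and "y"; not trained on any corpus.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"x": 0.6, "y": 0.1}, "VERB": {"x": 0.1, "y": 0.7}}

if __name__ == "__main__":
    print(tag_with_rules(["x", "y", "."], TAGS, START, TRANS, EMIT))
```

In a real tagger the three probability tables would be estimated from the annotated training portion; here they are hand-written so that the decoding step can be followed by hand.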


2021, Vol 72 (2), pp. 342-352
Author(s): Jakob Horsch

Abstract Inspired by earlier work on typological profiling of English by Benedikt Szmrecsányi and Bernd Kortmann ([1], [2], [3]), this paper investigates typological profiles of English, Spanish, German, and Slovak, applying Szmrecsányi and Kortmann’s methodology of calculating the SYNTHETICITY INDEX and the ANALYTICITY INDEX based on 1,000-word corpus samples. The results show that Szmrecsányi and Kortmann’s methodology is replicable, and confirm claims in the literature about degrees of analyticity and syntheticity of these languages. Instead of a simple analytic-synthetic continuum, Szmrecsányi and Kortmann’s “typological space” [3] is used to visualize results, showing that languages can be both synthetic and analytic to varying degrees.
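
The abstract does not define the two indices. The sketch below assumes the usual reading of this methodology: the analyticity index is the number of tokens carrying a free grammatical marker per 1,000 tokens, and the syntheticity index is the number of tokens carrying a bound grammatical marker per 1,000 tokens. The annotation scheme (a set of labels per token, with the labels `"free"` and `"bound"`) and the function name are hypothetical.

```python
def typological_indices(annotated_tokens):
    """Return (analyticity_index, syntheticity_index), each per 1,000 tokens.

    annotated_tokens: list of (token, labels) pairs, where labels is a set
    that may contain "free" and/or "bound" (hypothetical annotation scheme).
    """
    total = len(annotated_tokens)
    if total == 0:
        raise ValueError("sample must contain at least one token")
    analytic = sum(1 for _, labels in annotated_tokens if "free" in labels)
    synthetic = sum(1 for _, labels in annotated_tokens if "bound" in labels)
    return 1000 * analytic / total, 1000 * synthetic / total


# Hand-annotated toy sample, for illustration only.
SAMPLE = [
    ("the", {"free"}),
    ("dogs", {"bound"}),
    ("have", {"free"}),
    ("barked", {"bound"}),
    ("loudly", set()),
]

if __name__ == "__main__":
    print(typological_indices(SAMPLE))
```

Because a token may carry both labels or neither, the two indices vary independently, which is what allows a language to be placed in a two-dimensional space rather than on a single continuum.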


2021, Vol 72 (2), pp. 502-509
Author(s): Hana Goláňová, Martina Waclawičová

Abstract A new interactive map-based web application named Mapka was published by the Institute of the Czech National Corpus in 2020. It aims to serve linguists, as well as schools and the general public, and it features various functions described in this paper. Mapka was designed as a supplement to the CNC spoken corpora, starting with the DIALEKT corpus (more to come in the future). Its main function is to display various types of territorial division (primarily in terms of dialect, but also administrative) and networks of localities associated with the corpus. The main dialect regions are provided with overviews of their typical dialectal features and two samples of dialectal discourse – one slightly historical and one contemporary. The application offers the possibility of searching for municipalities, plotting the points on the map and creating a custom map. The paper concludes with future prospects concerning an enhanced and improved version of the application.


2021, Vol 72 (2), pp. 477-487
Author(s): Klára Bendová

Abstract Text readability metrics assess how much effort a reader must put into comprehending a given text. They are used, for example, to choose appropriate readings for different student proficiency levels, or to make sure that crucial information is efficiently conveyed (e.g., in an emergency). Flesch Reading Ease is so widely used that it is even integrated into MS Word. However, its constants are language-dependent. The original formula was created for English. So far it has been adapted to several European languages, Bangla, and Hindi. This paper describes the Czech adaptation, with the language-dependent constants optimized by a machine-learning algorithm working on parallel corpora of Czech and English, Russian, Italian, and French, respectively.
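
For orientation, the original English formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The sketch below exposes the three constants as parameters, since those are the language-dependent part; the defaults are the English values. The Czech constants are not given in the abstract and are not reproduced here, and the function name and signature are assumptions for this sketch.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables,
                        base=206.835, sentence_weight=1.015,
                        syllable_weight=84.6):
    """Flesch Reading Ease score; defaults are the English constants.

    A language adaptation would pass its own base, sentence_weight and
    syllable_weight instead of the defaults.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return (base
            - sentence_weight * words_per_sentence
            - syllable_weight * syllables_per_word)


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))
```

Higher scores indicate easier text; longer sentences and longer words both lower the score.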

