comparable corpora
Recently Published Documents


TOTAL DOCUMENTS

260
(FIVE YEARS 43)

H-INDEX

17
(FIVE YEARS 1)

2021 ◽  
Vol 18 (2) ◽  
pp. 51-68
Author(s):  
Dragana Vuković Vojnović

In this paper, we investigate the main characteristics underlying noun + noun collocations in the English and Serbian language of tourism. Their morpho-syntactic, semantic and communicative features are contrasted and compared in the two languages. Firstly, we compiled two comparable corpora in English and Serbian from the tourism websites of Great Britain and Serbia. Based on their normalized frequencies per 10,000 words, key noun + noun collocations were extracted, using TermoStat Web 3.0 and AntConc. The results showed certain similarities in terms of the prevailing topics in the two corpora, based on the analysis of key noun + noun collocations. However, we found major differences in the two languages in terms of their morpho-syntactic features, communicative focus and the relationship of the collocates. The results of the study have implications for English for Tourism education, tourism discourse studies, language typology and lexicography.
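As a rough illustration of the normalization step mentioned in the abstract above, the sketch below converts a raw collocation count into a frequency per 10,000 words, so that counts from corpora of different sizes can be compared. The function name and the sample figures are hypothetical and are not taken from the paper.

```python
def normalized_frequency(raw_count: int, corpus_size: int, base: int = 10_000) -> float:
    """Return occurrences per `base` words (hypothetical helper, not from the paper)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count / corpus_size * base


# Invented figures: the same raw count in two corpora of different sizes.
english_rate = normalized_frequency(raw_count=45, corpus_size=150_000)
serbian_rate = normalized_frequency(raw_count=45, corpus_size=90_000)
print(round(english_rate, 2), round(serbian_rate, 2))
```

With identical raw counts, the smaller corpus yields the higher normalized rate, which is why raw counts alone are not comparable across corpora of unequal size.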


Tradterm ◽  
2021 ◽  
Vol 40 ◽  
pp. 347-377
Author(s):  
Carlos Eduardo Piazentine Costa

Collocations have great potential for the understanding of the meanings, senses and concepts of the words or terms they include. In this article, we study the collocations of the terms “segurança” in Portuguese, and “safety” and “security” in English, in aviation language. Our aims were to reach a better understanding of the conceptual units studied and to discuss the role of collocations in Terminology. We built two comparable corpora, one with Portuguese texts from the Brazilian National Civil Aviation Agency and the Brazilian Air Force, and the other with English texts from the International Civil Aviation Organization. We focused on the main collocations of the three terms in Portuguese and in English. The theoretical framework we adopted was the Communicative Theory of Terminology (CABRÉ, 1999), and we used the techniques of Corpus Linguistics (TAGNIN, 2010, 2013; BERBER SARDINHA, 2000) to design our methodology, assisted by the software WordSmith Tools, version 6.0 (SCOTT, 2012). We found noun and adjective collocations. The study is addressed to professionals, teachers and translators who make use of aviation language. KEYWORDS: Terms of Aviation; Terminology; Corpus Linguistics; Collocations.


2021 ◽  
pp. 85-92
Author(s):  
Sigita Rackevičienė ◽  
Liudmila Mockienė ◽  
Andrius Utka ◽  
Aivaras Rokas

The aim of the paper is to present a methodological framework for the development of an English-Lithuanian bilingual termbase in the cybersecurity domain, which can be applied as a model for other language pairs and other specialised domains. It is argued that the presented methodological approach can ensure the creation of high-quality bilingual termbases even with limited available resources. The paper touches upon the methods and problems of dataset (corpora) compilation, terminology annotation, automatic bilingual term extraction (BiTE) and alignment, knowledge-rich context extraction, and linguistic linked open data (LLOD) technologies. The paper presents theoretical considerations as well as arguments for the effectiveness of the described methods. The theoretical analysis and a pilot study suggest that: 1) a combination of parallel and comparable corpora makes it possible to considerably expand the amount and variety of data sources that can be used for terminology extraction; this methodology is especially important for less-resourced languages, which often lack parallel data; 2) deep learning systems trained on manually annotated data (gold standard corpora) allow effective automation of the extraction of terminological data and metadata, which makes it possible to update termbases regularly with minimal manual input; 3) LLOD technologies make it possible to integrate the terminological data into the global linguistic data ecosystem and make it reusable, searchable and discoverable across the Web.


2021 ◽  
pp. 1-24
Author(s):  
Mohamed Chebel ◽  
Chiraz Latiri ◽  
Eric Gaussier

Bilingual corpora are an essential resource used to cross the language barrier in multilingual natural language processing tasks. Among bilingual corpora, comparable corpora have been the subject of many studies as they are both frequent and easily available. In this paper, we propose to make use of formal concept analysis to first construct concept vectors which can be used to enhance comparable corpora through clustering techniques. We then show how one can extract bilingual lexicons of improved quality from these enhanced corpora. We finally show that the bilingual lexicons obtained can complement existing bilingual dictionaries and improve cross-language information retrieval systems.


2021 ◽  
Author(s):  
Jana Kegalj ◽  
Mirjana Borucinsky

Translation research focuses mainly on parallel and comparable corpora, whereby it is constantly faced with issues of representativeness, balance and comparability as its main constraints. This research aims to introduce the concept of genre as a way of observing linguistic features under controlled conditions. The study analyses the application of external and internal criteria with particular focus on the genre criterion in selecting texts for the compilation of a highly-specialized bilingual maritime legal corpus, consisting of source texts in English and their translations into Croatian. The main advantages and constraints of genre as a criterion are discussed. The main benefits of such an approach are found in its application in translator training and practice. In addition, genre-based approaches to corpus analysis may raise awareness of generic features specific to a target language, ultimately improving the quality of translation.


PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0253454
Author(s):  
Kanglong Liu ◽  
Muhammad Afzaal

This study approaches the investigation of the simplification hypothesis in corpus-based translation studies from a syntactic complexity perspective. The research is based on two comparable corpora, the English monolingual part of COCE (Corpus of Chinese-English) and the native English corpus of FLOB (Freiburg-LOB Corpus of British English). Using 13 syntactic complexity measures falling into five subconstructs (i.e. length of production unit, amount of subordination, amount of coordination, phrasal complexity and overall sentence complexity), our results show that translation as a whole is less complex than non-translation, reflected most prominently in the amount of subordination and overall sentence complexity. Further pairwise comparison of the four subgenres of the corpora shows mixed results. Specifically, translated news is homogeneous with native news, as evidenced by the complexity measures; the translated genres of general prose and academic writing are less complex than their native counterparts, while translated fiction is more complex than non-translated fiction. It was found that mean sentence length always produced a significant effect on syntactic complexity, with higher syntactic complexity for longer sentences in both corpora. An ANOVA test shows a highly significant main effect of translation status, with higher syntactic complexity in the non-translated texts (FLOB) than in the translated texts (COCE), which supports the simplification hypothesis in translation. It is also found that, apart from translation status, genre is an important variable affecting the complexity level of translated texts. Our study offers new insights into the investigation of the simplification hypothesis from the perspective of translation from English into Chinese.


Author(s):  
Brahim Khartite ◽  
Bendaoud Nadif ◽  
Ismail Benfilali

This study investigates the extent to which rhetorical comparisons of persuasive essays by US native speakers of English and by Moroccan advanced EFL students provide empirical evidence for Kaplan’s (1966) contrastive rhetoric hypothesis, in particular the claim that EFL students’ writing problems are a byproduct of the negative transfer of rhetorical strategies from their first language (L1). This hypothesis is tested by comparing 20 EFL and Arabic L1 persuasive essays by the same EFL students to essays in English as L1 by native speakers, to identify the extent to which the language of composition and the writer’s cultural background affect the writing quality of the essays. The study hypothesizes that if Kaplan’s contrastive rhetoric claims were accurate, then Moroccan advanced EFL writers would produce essays that tend to be rhetorically less accurate when judged by standard English rhetorical criteria. Moreno’s (2005) approach to matching comparable corpora of persuasive essays from two different cultural and linguistic backgrounds was adopted. As for the study participants, 40 advanced student-writers from two different language and cultural backgrounds were recruited. While the results of a stepwise multiple regression analysis provide further evidence corroborating the validity of the rhetorical measures used in the study, group mean score comparisons and a multiple discriminant analysis of the data indicate that writers from different cultural backgrounds seem to face far more similar than different rhetorical problems, and that their writing inadequacies are equally distributed regardless of which language the participants used to write their essays.


2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Sonia Vaupot

Considering the lack of specialised dictionaries in certain fields, a creative way of teaching through corpora-based work was proposed in a seminar for master’s students of translation studies (University of Ljubljana, Slovenia). Since phraseology and terminology play an important role both in specialised translation and in the learning path of students of translation studies, this article presents an active approach aimed at creating an online lexicographic resource in languages for specific purposes by using the didactic tool and database ARTES (Aide à la Rédaction de TExtes Scientifiques/Dictionary-assisted writing tool for scientific communication) previously developed at the Université de Paris (France). About thirty Slovene students enrolled in the first year of master’s study have been participating in the bilateral project since 2018. The aims of such an activity are multiple: students learn in a practical way how to compile comparable corpora from the internet, using the online corpus software Sketch Engine, to find similar linguistic constructions in the source and target languages. They also learn to create an online bilingual phraseological and terminological dictionary to facilitate the translation of specialised texts. In this way, they acquire skills and develop some knowledge in translation, terminology, and discourse phraseology. The article first describes the ARTES online database. Then, we present the teaching methodology and the students’ work, which consists of compiling corpora, extracting and translating collocations for the language pair French-Slovene, and entering them in the ARTES database. Finally, we propose an analysis of the most frequent collocation structures in both languages. The language pair considered here is French and Slovene, but the methodology can be applied to any other language pair.


Author(s):  
Darya Filippova ◽  
Burcu Can ◽  
Gloria Corpas Pastor ◽  
...  
