Using Small Parallel Corpora to Develop Collocation-Centred Activities in Specialized Translation Classes

Linguaculture ◽  
2016 ◽  
Vol 2016 (2) ◽  
pp. 53-72
Author(s):  
Sorina Postolea ◽  
Teodora Ghivirigă

The research devoted to special languages as well as the activities carried out in specialized translation classes tend to focus primarily on one-word or multi-word terminological units. However, a very important part in the making of specialist registers and texts is played by specialised collocations, i.e. relatively stable word combinations that do not designate concepts but are nevertheless of frequent use in a given field of activity. This is why helping students acquire competences relative to the identification and processing of collocations should become an important objective in specialised translation classes. An easily accessible and dependable resource that may be successfully used for this purpose is represented by corpora and corpus analysis tools, whose usefulness in translator training has been highlighted by numerous studies. This article proposes a series of practical, task-based activities (developed with the help of a small-size parallel corpus of specialised texts) that aim to raise the translation trainees' awareness of the collocations present in specialised texts and to provide suggestions about their processing in translation.

Tradterm ◽  
2021 ◽  
Vol 37 (2) ◽  
pp. 370-396
Author(s):  
Silvia Bernardini

This contribution describes the potential of parallel corpus analysis for the development of research skills in the translation classroom. Using culinary texts as a case in point, and more specifically a culturally salient Italian text from the turn of the 19th century (Pellegrino Artusi's La scienza in cucina e l'arte del mangiar bene/Science in the kitchen and the art of eating well, in its original Italian and translated English version), a detailed analysis is offered of the Italian equivalents of the verb lemma COOK and of the English equivalents of the Italian verb lemma CUOCERE, showing how the paradigmatic and syntagmatic insights thus obtained could be used to construct bilingual lexical/terminological profiles of units of meaning, and/or to shed light on translation strategies and norms, depending on one's research hypotheses and variables. Two more unorthodox uses of parallel corpora to tap into translators' domain-specific knowledge, and as aids in the interpretation of culturally salient historical and literary texts, are also discussed.


2021 ◽  
Vol 11 (1) ◽  
pp. 29
Author(s):  
Qingliang Meng

With the advancement of corpus linguistics, there has been an increasing interest in using corpora as a tool for translator training and translation practice. Despite the usefulness of corpora in translation pedagogy, the growing reliance on parallel corpora in translating activities has diminished the ability to determine the meaning of words within different contexts using dictionaries, which in turn has hampered the development of trainee translators' translation competence. This study investigates the necessity of adopting critical and creative thinking in the teaching of corpus-aided English-Chinese translation. It first examines the increasing importance of corpora in aiding translator training and translating practice. A critical analysis was then applied to a translation case using a parallel corpus. Thirteen Chinese versions of the opening remark of Pride and Prejudice were compared and analyzed critically and creatively with the aid of different corpora. Pedagogical implications for translation teaching were summarized.


2014 ◽  
pp. 85-100
Author(s):  
Violetta Koseska

Semantics, contrastive linguistics and parallel corpora

In view of the ambiguity of the term “semantics”, the author shows the differences between traditional lexical semantics and contemporary semantics in the light of various semantic schools. She examines semantics differently in connection with contrastive studies, where the description must necessarily go from the meaning towards the linguistic form, whereas in traditional contrastive studies the description proceeded from the form towards the meaning. This requirement regarding theoretical contrastive studies necessitates the construction of a semantic interlanguage, rather than only singling out universal semantic categories expressed with various language means. Such studies can be strongly supported by parallel corpora. However, in order to make them useful for linguists in manual and computer translations, as well as in the development of dictionaries, including online ones, we need not only formal, often automatic, annotation of texts, but also semantic annotation, which is unfortunately manual. In the article we focus on semantic annotation concerning time, aspect and quantification of names and predicates in the whole semantic structure of the sentence, using the example of the “Polish-Bulgarian-Russian parallel corpus”.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resource languages to improve machine translation (MT) performance remains an open problem. This research contributes to the domain with work on low-resource English-Twi translation based on filtered synthetic-parallel corpora. It is often unclear what a good-quality corpus looks like in low-resource conditions, particularly where the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic-parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we make extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results on varying amounts of available parallel data demonstrate that injecting a pseudo-parallel corpus and filtering extensively with sentence-level similarity metrics significantly improves the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial improvements in BLEU and TER scores.
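
The abstract above does not say which features the authors feed into the Mahalanobis filter, so the following is only a minimal sketch of the general idea. It assumes, purely for illustration, that each sentence pair is described by two numeric features (for example a length ratio and some similarity score) and that pairs lying far from the bulk of the data are dropped. The function names, the two-feature restriction, the placeholder pair labels and the threshold are all hypothetical and not taken from the paper.

```python
from statistics import mean


def squared_mahalanobis_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean.

    `points` is a list of (x, y) feature tuples; the two-feature layout is an
    assumption made for this sketch only.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to estimate a 2x2 covariance")
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    # Entries of the sample covariance matrix (denominator n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = v^T * inverse(S) * v, with the 2x2 inverse written out by hand.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def filter_pairs(pairs, features, threshold):
    """Keep the pairs whose squared distance does not exceed `threshold`."""
    distances = squared_mahalanobis_2d(features)
    return [pair for pair, d2 in zip(pairs, distances) if d2 <= threshold]


# Hypothetical data: placeholder labels stand in for real sentence pairs,
# and the last pair is deliberately unlike the others.
pairs = ["pair-1", "pair-2", "pair-3", "pair-4", "pair-5"]
features = [(1.0, 0.9), (1.1, 0.85), (0.9, 0.88), (1.05, 0.92), (3.0, 0.2)]
kept = filter_pairs(pairs, features, threshold=3.0)
print(kept)
```

With so few points a single outlier also distorts the estimated covariance, so in practice the threshold would need to be chosen on a much larger sample; the value used here is arbitrary.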


2017 ◽  
Author(s):  
Arab World English Journal ◽  
Hind M. Alotaibi

Parallel corpora can be defined as collections of aligned, translated texts of two or more languages. They play a major role in translation and contrastive studies, and are also becoming popular in translation training and language teaching, with the advent of the data-driven learning (DDL) approach. Despite their significance, however, Arabic seems to lack a satisfactory general-use parallel corpus resource. The literature describes few Arabic–English parallel corpora, and these few are usually inaccurate and/or expensive. Some are small in size, while others are restricted in terms of genre, failing to meet the requirements of many academics and researchers. This paper describes an ongoing project at the College of Languages and Translation, King Saud University, to compile a 10-million-word Arabic–English parallel corpus to be used as a resource for translation training and language teaching. The bidirectional corpus can be used to compare translated and source language and identify differences. The corpus has been manually verified at different stages, including translation, text segmentation, alignment, and file preparation; it is available as full-text in XML format and through a user-friendly web interface that provides a concordancer to support bilingual search queries and several filtering options.


Author(s):  
Sanne Van Vuuren ◽  
Janine Berns

This paper examines the use of clause-initial adverbials in English novice writing. Previous research has identified frequent use of such adverbials as characteristic of Dutch EFL writing. Our contrastive corpus analysis of novice writing by Dutch and Francophone learners as well as native speakers allows us to determine whether this use of initial adverbials is (a) a V2 transfer effect, (b) a general interlanguage feature, independent of learners’ L1, or (c) a characteristic of novice writing in general, holding true for both native and non-native writers. We will show that both learner groups are ‘equally different’ from the native-speaker novice writers in their frequent use of initial adverbials, but appear to have distinct underlying reasons for this linguistic behaviour: Francophone writers place adverbials in initial position more often for stylistic purposes, while Dutch writers have a stronger tendency to use initial adverbials for local discourse linking.


Literator ◽  
2016 ◽  
Vol 37 (1) ◽  
Author(s):  
Ketiwe Ndhlovu

The development of African languages into languages of science and technology is dependent on action being taken to promote the use of these languages in specialised fields such as technology, commerce, administration, media, law, science and education among others. One possible way of developing African languages is the compilation of specialised dictionaries (Chabata 2013). This article explores how parallel corpora can be interrogated using a bilingual concordancer (ParaConc) to extract bilingual terminology that can be used to create specialised bilingual dictionaries. An English–Ndebele Parallel Corpus was used as a resource and through ParaConc, an alphabetic list was compiled from which headwords and possible translations were sought. These translations provided possible terms for entry in a bilingual dictionary. The frequency feature and ‘hot words’ tool in ParaConc were used to determine the suitability of terms for inclusion in the dictionary and for identifying possible synonyms, respectively. Since parallel corpora are aligned and data are presented in context (Key Word in Context), it was possible to draw examples showing how headwords are used. Using this approach produced results quickly and accurately, whilst minimising the process of translating terms manually. It was noted that the quality of the dictionary is dependent on the quality of the corpus, hence the need for creating a representative and clean corpus needs to be emphasised. Although technology has multiple benefits in dictionary making, the research underscores the importance of collaboration between lexicographers, translators, subject experts and target communities so that representative dictionaries are created.
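
The Key Word in Context display over an aligned corpus that this abstract mentions can be illustrated roughly as follows. This is not ParaConc and does not reproduce its behaviour; it is a minimal sketch that assumes the corpus is already available as a list of (source sentence, target sentence) tuples. The function name `parallel_kwic`, the `width` parameter and the placeholder target strings are hypothetical.

```python
import re


def parallel_kwic(aligned_pairs, keyword, width=30):
    """Return (left context, match, right context, aligned target) per hit.

    `aligned_pairs` is a list of (source, target) sentence tuples; `width`
    is the number of context characters kept on each side of the keyword.
    """
    # Whole-word, case-insensitive match of the search term in the source text.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for source, target in aligned_pairs:
        for m in pattern.finditer(source):
            left = source[max(0, m.start() - width):m.start()]
            right = source[m.end():m.end() + width]
            hits.append((left, m.group(0), right, target))
    return hits


# Placeholder target segments stand in for real translations.
corpus = [
    ("The court shall hear the appeal.", "TARGET SEGMENT 1"),
    ("An appeal must be filed.", "TARGET SEGMENT 2"),
    ("No hearing was held.", "TARGET SEGMENT 3"),
]

for left, match, right, target in parallel_kwic(corpus, "appeal"):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>30} [{match}] {right:<30} | {target}")
```

Reading the aligned target segment next to each hit is what lets a lexicographer look for candidate equivalents of a headword and collect usage examples in context.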


2003 ◽  
Vol 29 (3) ◽  
pp. 349-380 ◽  
Author(s):  
Philip Resnik ◽  
Noah A. Smith

Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.


2000 ◽  
Vol 5 (1) ◽  
pp. 17-52 ◽  
Author(s):  
Lynne Bowker

Specialized target language (TL) corpora constitute an extremely valuable resource for translators, and although no specialized tools have been developed for extracting translation data from such corpora, this paper argues that translators would be remiss not to consult such resources. We describe the advantages of using specialized TL corpora and outline a number of techniques that translators can use in order to extract translation data from such corpora with the aid of generic corpus analysis tools. These advantages and techniques are demonstrated with reference to two translations, one of which was done using only conventional resources and the other with the help of a corpus.


2005 ◽  
Vol 31 (4) ◽  
pp. 477-504 ◽  
Author(s):  
Dragos Stefan Munteanu ◽  
Daniel Marcu

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.

