Functionally-defined recurrent multi-word units in English-to-Polish translation

Author(s):  
Łukasz Grabowski ◽  
Nicholas Groom

Abstract This study uses both parallel and comparable reference corpora in the English-Polish language pair to explore how translators deal with recurrent multi-word items performing specific discoursal functions. We also consider whether the observed tendencies overlap with those found in native texts, and the extent to which the discoursal functions realised by the multi-word items under scrutiny are “preserved” in translation. Capitalizing on findings from earlier research (Granger, 2014; Grabar & Lefer, 2015), we analyzed a pre-selected set of phrases signaling stance-taking and those functioning as textual, discourse-structuring devices originally found in the European Parliament proceedings corpus (Koehn, 2005) and included in the English-Polish parallel corpus Paralela (Pęzik, 2016). Our goal was to explore whether and to what extent English functionally-defined phrases reflect the same level of formulaicity and regularity in both Polish translations and native Polish texts. The findings provided insights into the translation tendencies of such items and revealed, using inter-rater agreement metrics, that the discoursal functions of recurrent n-grams may change in translation.
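The abstract does not say which inter-rater agreement metric was used. As a minimal sketch, assuming two annotators and Cohen's kappa (one common choice, not necessarily the authors'), agreement on the discoursal function assigned to each n-gram could be computed as follows. The function name `cohen_kappa` and the labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one and the same label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical function labels assigned to six translated n-grams.
rater_a = ["stance", "stance", "textual", "textual", "stance", "textual"]
rater_b = ["stance", "textual", "textual", "textual", "stance", "stance"]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

A kappa near 1 would suggest that annotators largely agree on the function of the translated items, while values near 0 indicate agreement no better than chance.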

2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is the translation of text from one language into another, and it is the biggest application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach, which helps in translating English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% in translating English to Dogri and 87% in translating Dogri to English.


1999 ◽  
Vol 2 (1) ◽  
pp. 115-130 ◽  
Author(s):  
Diana Santos ◽  
Signe Oksefjell

This paper examines the results from two corpus-based contrastive studies. Both studies offer cross-linguistic claims about the language pair English-Portuguese. We attempt to replicate the studies and check the findings against a different corpus, viz. the English-Portuguese part of the English-Norwegian Parallel Corpus, to see whether the regularities observed in the original corpora can be confirmed. After a brief presentation of each study, we describe how we gathered equivalent data, present our findings in the new corpus, and discuss some possible reasons for discrepancies in relation to the earlier studies. The topics investigated are boundary-crossing movement descriptions (after Slobin 1997) and perception verbs (after Santos 1998).


2017 ◽  
Vol 16 (3) ◽  
pp. 412-433 ◽  
Author(s):  
Maria Calzada Pérez

The present paper proposes a CADS-based analysis of European Parliament speeches, by merging (C)DA theoretical constructs (inspired by Laclau and Mouffe 1985) and CL tools. In this fashion, the European Comparable and Parallel Corpus of Parliamentary Speeches Archive (ECPC) is examined along synchronic and diachronic, quantitative and qualitative lines, in an inductive study that commutes from the micro-text to the macro-context.


2006 ◽  
Vol 51 (2) ◽  
pp. 343-367 ◽  
Author(s):  
Ho-Jeong Cheong

Abstract This paper contradicts the prevailing assumption among the advocates of translation universals (TUs) that explicitation, a translation behavior which consists of spelling things out rather than leaving them implicit in translation, is a potential TU, irrespective of the specific language pairs involved in the process of translation. Specifically, via a study employing a newly built 517,609-word parallel corpus, it is shown that both implicitation with its subsequent target-text (TT) contraction and explicitation with its entailed TT expansion were observed in translations involving Korean and English. The significance of the direction of language combinations in translations employing the same language pair was identified. The study also introduces and verifies the validity of the four measurement units devised to capture diverse aspects of explicitation/implicitation, which in turn entail TT expansion/contraction.


2021 ◽  
Vol 14 (2) ◽  
pp. 494-508
Author(s):  
Francina Sole-Mauri ◽  
Pilar Sánchez-Gijón ◽  
Antoni Oliver

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and includes unique features, since it is composed of documents that are legally equivalent in both languages but not the result of a translation. The corpus is built upon enactments co-drafted by two jurists to ensure legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and a baseline Bilingual Evaluation Understudy (BLEU) score, which compares a candidate translation of a text to a gold-standard translation, is reported for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts which convey the same meaning in this language pair, and it is freely available for non-commercial use.
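To make the BLEU comparison concrete, here is a simplified sketch of a sentence-level BLEU score against a single reference, using clipped n-gram precision and a brevity penalty. It is an illustration under stated assumptions, not the evaluation setup used for Cadlaws: whitespace tokenisation, uniform n-gram weights, and no smoothing are all assumptions, and the example sentences are hypothetical. Reported BLEU scores are normally computed at corpus level with an established tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference (no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, one empty precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical candidate and gold-standard sentences.
score = bleu("the act comes into force today",
             "the act comes into force on that day")
print(round(score, 3))
```

A candidate identical to the reference scores 1.0, and a candidate sharing no words with it scores 0.0.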


10.29007/gxv3 ◽  
2018 ◽  
Author(s):  
Shuyuan Cao ◽  
Iria Da-Cunha ◽  
Mikel Iruskieta

Spanish and Chinese are two very different languages at all linguistic levels. Therefore, translation (both human and machine translation) from one to the other and learning one of them as a foreign language are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is still considerable room to improve the translation quality between Spanish and Chinese. In addition, accessible resources for studying and understanding this language pair, such as parallel corpora, are still few. In this paper, we present how we have created a Spanish-Chinese parallel corpus designed for language learning and translation tasks at the discourse level. This corpus has been enriched automatically with part-of-speech (POS) tags, and several queries based on morpho-syntactic information can be run. We have made the parallel corpus available to the academic community.


Target ◽  
2018 ◽  
Vol 30 (1) ◽  
pp. 112-136
Author(s):  
Noelia Ramón ◽  
Camino Gutiérrez-Lanza

Abstract This paper presents a corpus-based descriptive research procedure for the identification of significant divergences between original Spanish and Spanish translated from English. When considering the language pair English-Spanish, personal pronouns seem to be good markers of significant differences (anchor phenomena), since they must obligatorily occur in English, but not in Spanish. To test this hypothesis, empirical data have been extracted from a large reference corpus in Spanish (CREA) and from an English-Spanish parallel corpus (P-ACTRES), in both cases from the fiction subcorpora. Statistically significant differences have been found in some of the uses of personal pronouns, having textual and pragmatic implications in the target texts. The aim is to use the results obtained in the case of personal pronouns, together with results from other linguistic areas, to build a semi-automated tool for the post-editing of Spanish translations of texts written originally in English.


2017 ◽  
Vol 62 (1) ◽  
pp. 19-44
Author(s):  
María Azahara Veroz

In this paper we study European Parliament technical texts in three languages, namely English, Spanish and French, focusing on their discursive features and, more concretely, on the way in which the ideational function is expressed in them. To this end, we have followed the framework of Systemic Functional Grammar, compiling a trilingual parallel corpus composed of technical texts downloaded from the European Parliament (EP) website. The analysis shows that these texts are characterized by a predominance of material processes, in particular actions linked to the legal, administrative and economic world such as adopt, approve, modify, create, transmit, publish, establish and sign, with their respective equivalents in Spanish and French. However, certain processes, namely mental (cognition) and verbal ones, could have a mixed nature, as we have observed that their equivalents are exchanged with each other in the different language versions studied. Regarding the participants (agent, affected, recipient, beneficiary and sayer), most of them are institutions and acts/documents, whereas participants in these processes are usually human. We conclude that knowledge of these features could be useful for EU translators and should be used in the training of future translators as a guide in the translation process.


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 02) ◽  
pp. 208-222
Author(s):  
Vikas Pandey ◽  
Dr. M.V. Padmavati ◽  
Dr. Ramesh Kumar

Machine translation is a subfield of Natural Language Processing (NLP) that is used to translate a source language into a target language. In this paper, an attempt has been made to build a Hindi-Chhattisgarhi machine translation system based on a statistical approach. In the state of Chhattisgarh there is a long-awaited need for a Hindi-to-Chhattisgarhi machine translation system, especially for non-Chhattisgarhi-speaking people. To develop the system, open-source software called Moses is used. Moses is a statistical machine translation system that is used to automatically train the translation model for the Hindi-Chhattisgarhi language pair from a parallel corpus. A corpus is a collection of structured text used to study linguistic properties. This machine translation system works on a parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences, extracted from various domains such as stories, novels, textbooks and newspapers. To overcome translation problems related to proper nouns and unknown words, a transliteration system is also embedded in it. The system was tested on 1,000 sentences to check the grammatical correctness of its output, and an accuracy of 75% was achieved.


2015 ◽  
Vol 47 ◽  
pp. 247-261
Author(s):  
Elżbieta Kaczmarska

The Czech Verb zdát se in Translation into Polish Language (Based on Studies Using the Parallel Corpus „InterCorp”)

The article presents the possibilities of translating the Czech verb zdát se into the Polish language and introduces the parallel corpus (InterCorp) as a tool for searching equivalents. The analysis of the data from a parallel corpus shows a series of possibilities of understanding and translating the verb zdát se (wydawać się, zdawać się, mieć wrażenie, wyglądać, widzieć, widać, myśleć, mniemać, podejrzewać, pomyśleć, rozumieć, sądzić, uświadamiać sobie, uważać, uznać, czuć, poczuć, doznać uczucia, mieć uczucie, wynikać, okazywać się, chyba, najwyraźniej, pewnie, prawdopodobnie, śnić się, przyśnić się, przywidzieć się, podobać się, być zadowolonym). The verb zdát se seems to be polysemantic and to cause lexical and stylistic problems. The results of the analyses based on the InterCorp may also open the discussion about the contents of modern dictionaries.

