Target Text Contraction in English-into-Korean Translations: A Contradiction of Presumed Translation Universals?

2006 ◽  
Vol 51 (2) ◽  
pp. 343-367 ◽  
Author(s):  
Ho-Jeong Cheong

Abstract This paper challenges the assumption, prevalent among advocates of translation universals (TUs), that explicitation, a translation behavior that consists of spelling things out rather than leaving them implicit, is a potential TU irrespective of the language pair involved. Specifically, a study using a newly built 517,609-word parallel corpus shows that both implicitation, with the target text (TT) contraction it entails, and explicitation, with the TT expansion it entails, occur in translations involving Korean and English. The study identifies the significance of translation direction within the same language pair, and it introduces and validates four measurement units devised to capture diverse aspects of explicitation/implicitation, which in turn entail TT expansion/contraction.
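
As a rough illustration of how TT expansion and contraction could be quantified, the sketch below compares word counts of aligned source and target segments. It is not the paper's method: the four measurement units are not described in this abstract, so whitespace-delimited words are an assumed stand-in, and the function names and the 5% tolerance are hypothetical. Whitespace tokens are a particularly crude unit for an agglutinative language such as Korean.

```python
def length_ratio(source_text: str, target_text: str) -> float:
    """Return the target word count divided by the source word count.

    Whitespace-delimited words are an assumed measurement unit,
    not one of the units used in the paper.
    """
    source_words = source_text.split()
    target_words = target_text.split()
    if not source_words:
        raise ValueError("source text must contain at least one word")
    return len(target_words) / len(source_words)


def classify(ratio: float, tolerance: float = 0.05) -> str:
    """Label a segment pair; the tolerance band is an arbitrary choice."""
    if ratio > 1 + tolerance:
        return "expansion"
    if ratio < 1 - tolerance:
        return "contraction"
    return "neutral"
```

A ratio below 1 on a given unit points to contraction on that unit only; a different unit (characters, morphemes) could give a different picture for the same segment pair.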

2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri translation based on a parallel corpus. Machine translation is the automatic translation of text from one language into another and is one of the largest applications of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


Author(s):  
Łukasz Grabowski ◽  
Nicholas Groom

Abstract This study uses both parallel and comparable reference corpora in the English-Polish language pair to explore how translators deal with recurrent multi-word items performing specific discoursal functions. We also consider whether the observed tendencies overlap with those found in native texts, and the extent to which the discoursal functions realised by the multi-word items under scrutiny are “preserved” in translation. Capitalizing on findings from earlier research (Granger, 2014; Grabar & Lefer, 2015), we analyzed a pre-selected set of phrases signaling stance-taking and those functioning as textual, discourse-structuring devices originally found in the European Parliament proceedings corpus (Koehn, 2005) and included in the English-Polish parallel corpus Paralela (Pęzik, 2016). Since our goal was to explore whether and to what extent English functionally-defined phrases reflect the same level of formulaicity and regularity in both Polish translations and native Polish texts, the findings provided insights into the translation tendencies of such items, and revealed – using inter-rater agreement metrics – that the discoursal functions of recurrent n-grams may change in translation.
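
The abstract does not name the inter-rater agreement metric that was used. As one common possibility, the sketch below computes Cohen's kappa for two annotators who each assign a discourse-function label to the same items. The function name and the label strings are illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, so it is more informative than a simple percentage when one function label dominates the data.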


1999 ◽  
Vol 2 (1) ◽  
pp. 115-130 ◽  
Author(s):  
Diana Santos ◽  
Signe Oksefjell

This paper examines the results from two corpus-based contrastive studies. Both studies offer cross-linguistic claims about the language pair English-Portuguese. We attempt to replicate the studies and check the findings against a different corpus, viz. the English-Portuguese part of the English-Norwegian Parallel Corpus, to see whether the regularities observed in the original corpora can be confirmed. After a brief presentation of each study, we describe how we gathered equivalent data, present our findings in the new corpus, and discuss some possible reasons for discrepancies in relation to the earlier studies. The topics investigated are boundary-crossing movement descriptions (after Slobin 1997) and perception verbs (after Santos 1998).


Author(s):  
Elżbieta Sękowska

This article discusses a thematic dictionary for a specific language pair (Chinese-Polish in this case) as a teaching aid useful in learning a language and its culture. In seeking a place for the thematic dictionary in language teaching, the author refers to the notion of cultural competence and to notions associated with introducing it into the teaching process, since cultural competence covers both vocabulary and knowledge (linguaculture) and the interpretative rules and elements of the study of the reality of people and institutions within the area of socioculture. The contents of those areas of a community's culture are discussed in theoretical studies devoted to teaching Polish as a non-native language and in curricula for teaching Polish as a foreign language. Those activities are brought into focus, e.g. in the required standards that refer to individual levels of language proficiency. A thematic dictionary may serve as a source for introducing elements of culture, given that its stock of vocabulary illustrates differences between languages and cultures. In discussing the Chinese-Polish thematic dictionary, the author focuses on the description of macrostructures and the inclusion of selected thematic fields, and indicates its utility in increasing one's proficiency in the lexis that illustrates extra-linguistic reality.


2021 ◽  
Vol 14 (2) ◽  
pp. 494-508
Author(s):  
Francina Sole-Mauri ◽  
Pilar Sánchez-Gijón ◽  
Antoni Oliver

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and is unusual in that it is composed of documents that are legally equivalent in both languages but are not the result of translation. The corpus is built upon enactments co-drafted by two jurists to ensure the legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and baseline Bilingual Evaluation Understudy (BLEU) scores, which compare a candidate translation of a text against a gold-standard translation, are reported for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts that convey the same meaning in this language pair, and it is freely available for non-commercial use.
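
To make the BLEU idea concrete, here is a minimal sketch of a sentence-level score against a single reference, with clipped n-gram precision up to 4-grams and a brevity penalty. It is a simplification for illustration and not the evaluation setup used for Cadlaws: published BLEU figures are normally computed at corpus level with an established tool, and the whitespace tokenization, the absence of smoothing, and the function names here are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU for one sentence pair."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision gives a zero score.
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

Because the score is a geometric mean of the n-gram precisions, short sentences often score zero without smoothing, which is one reason corpus-level aggregation is the usual practice.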


10.29007/gxv3 ◽  
2018 ◽  
Author(s):  
Shuyuan Cao ◽  
Iria Da-Cunha ◽  
Mikel Iruskieta

Spanish and Chinese differ greatly at all linguistic levels. Therefore, translating from one to the other (by humans or by machine) and learning either as a foreign language are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is still considerable room to improve translation quality between Spanish and Chinese. In addition, accessible resources for studying and understanding this language pair, such as parallel corpora, are still scarce. In this paper, we present how we created a Spanish-Chinese parallel corpus designed for language learning and translation tasks at the discourse level. The corpus has been automatically enriched with part-of-speech (POS) information, and several queries based on morpho-syntactic information can be run on it. We have made the parallel corpus available to the academic community.


Target ◽  
2018 ◽  
Vol 30 (1) ◽  
pp. 112-136
Author(s):  
Noelia Ramón ◽  
Camino Gutiérrez-Lanza

Abstract This paper presents a corpus-based descriptive research procedure for the identification of significant divergences between original Spanish and Spanish translated from English. When considering the language pair English-Spanish, personal pronouns seem to be good markers of significant differences (anchor phenomena), since they must obligatorily occur in English, but not in Spanish. To test this hypothesis, empirical data have been extracted from a large reference corpus in Spanish (CREA) and from an English-Spanish parallel corpus (P-ACTRES), in both cases from the fiction subcorpora. Statistically significant differences have been found in some of the uses of personal pronouns, having textual and pragmatic implications in the target texts. The aim is to use the results obtained in the case of personal pronouns, together with results from other linguistic areas, to build a semi-automated tool for the post-editing of Spanish translations of texts written originally in English.
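
The abstract does not say which significance test was applied. As one test often used for comparing frequencies between two corpora, the sketch below computes the log-likelihood statistic (G2) for an item's count in each corpus. The function name is hypothetical, and any counts passed to it in examples are invented rather than taken from CREA or P-ACTRES.

```python
import math


def log_likelihood(count_a: int, size_a: int, count_b: int, size_b: int) -> float:
    """G2 statistic for one item's frequency in two corpora.

    count_* is the item's frequency, size_* the corpus size in tokens.
    """
    total = count_a + count_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With one degree of freedom, a value above 3.84 corresponds to p < 0.05, so a pronoun whose G2 exceeds that threshold would count as significantly over- or under-represented in the translated texts.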


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 02) ◽  
pp. 208-222
Author(s):  
Vikas Pandey ◽  
Dr. M.V. Padmavati ◽  
Dr. Ramesh Kumar

Machine translation is a subfield of Natural Language Processing (NLP) that translates text from a source language into a target language. This paper describes an attempt to build a Hindi-Chhattisgarhi machine translation system based on a statistical approach. In the state of Chhattisgarh there is a long-awaited need for a Hindi-to-Chhattisgarhi machine translation system, especially for people who do not speak Chhattisgarhi. The system is developed with Moses, an open-source statistical machine translation system, which automatically trains the translation model for the Hindi-Chhattisgarhi language pair from a parallel corpus. A corpus is a collection of structured text used to study linguistic properties. The system works on a parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences, extracted from various domains such as stories, novels, textbooks and newspapers. To overcome translation problems related to proper nouns and unknown words, a transliteration system is also embedded in it. The system was tested on 1,000 sentences to check the grammatical correctness of its output, and an accuracy of 75% was achieved.


2020 ◽  
Vol 17 (1) ◽  
pp. 54-60
Author(s):  
B. S. Sowmya Lakshmi ◽  
B. R. Shambhavi

Visvesvaraya Technological University, Belagavi, Karnataka, India

Parallel corpora are considered one of the most promising resources for dictionary extraction. Most substantial work is based on parallel corpora, but for resource-scarce language pairs building a parallel corpus is a challenging task. To overcome this issue, researchers have found that comparable corpora can serve as an alternative source for dictionary extraction. The proposed approach extracts a dictionary for the low-resource language pair English and Kannada using comparable corpora obtained from Wikipedia dumps and a corpus received from the Indian Language Corpus Initiative (ILCI). The constructed dictionary comprises both translation and transliteration entities with term-level associations from English to Kannada. The resulting dictionary contains 77,545 tokens, with a precision score of 0.79. The proposed work is language-independent and could be extended to other language pairs.
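
The precision score mentioned above can be read as the share of extracted dictionary entries that are correct. The sketch below shows that calculation under the assumption that entries are (English, Kannada) pairs checked against a gold-standard set; the abstract does not describe the actual evaluation procedure, and the function name and the placeholder entries in the usage example are hypothetical.

```python
def precision(extracted_pairs, gold_pairs) -> float:
    """Fraction of extracted (source, target) pairs found in the gold set."""
    extracted = set(extracted_pairs)
    if not extracted:
        raise ValueError("no extracted pairs to evaluate")
    correct = len(extracted & set(gold_pairs))
    return correct / len(extracted)


# Placeholder entries for illustration only; not real dictionary data.
extracted = [("water", "kn_water"), ("house", "kn_house"),
             ("book", "kn_wrong"), ("tree", "kn_tree")]
gold = [("water", "kn_water"), ("house", "kn_house"),
        ("book", "kn_book"), ("tree", "kn_tree")]
score = precision(extracted, gold)
```

Precision says nothing about entries the extractor missed; measuring that would require recall against the same gold set.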

