Target Text Contraction in English-into-Korean Translations: A Contradiction of Presumed Translation Universals?*

Abstract This paper contradicts the prevailing assumption among advocates of translation universals (TUs) that explicitation, a translation behavior that consists of spelling things out rather than leaving them implicit, is a potential TU irrespective of the specific language pair involved. Specifically, a study employing a newly built 517,609-word parallel corpus shows that both implicitation, with the target text (TT) contraction it entails, and explicitation, with the TT expansion it entails, were observed in translations involving Korean and English. The study identifies the significance of translation direction within the same language pair, and it introduces and verifies the validity of four measurement units devised to capture diverse aspects of explicitation/implicitation, which in turn entail TT expansion/contraction.

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽ 10.22632/ccs-2016-251-25 ◽ 2016 ◽ Vol 1 (1) ◽ pp. 45-49

Author(s): Avinash Singh ◽ Asmeet Kour ◽ Shubhnandan S. Jamwal

Keyword(s): Natural Language Processing ◽ Machine Translation ◽ Language Processing ◽ Statistical Machine Translation ◽ Translation System ◽ Parallel Corpus ◽ English System ◽ Machine Translation System ◽ Translation Machine ◽ Language Pair

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is the automatic translation of text from one language into another and is one of the biggest applications of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.

Functionally-defined recurrent multi-word units in English-to-Polish translation

Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics ◽ 10.1075/resla.19037.gra ◽ 2021

Author(s): Łukasz Grabowski ◽ Nicholas Groom

Keyword(s): European Parliament ◽ Rater Agreement ◽ Parallel Corpus ◽ Polish Language ◽ Language Pair

Abstract This study uses both parallel and comparable reference corpora in the English-Polish language pair to explore how translators deal with recurrent multi-word items performing specific discoursal functions. We also consider whether the observed tendencies overlap with those found in native texts, and the extent to which the discoursal functions realised by the multi-word items under scrutiny are “preserved” in translation. Capitalizing on findings from earlier research (Granger, 2014; Grabar & Lefer, 2015), we analyzed a pre-selected set of phrases signaling stance-taking and those functioning as textual, discourse-structuring devices originally found in the European Parliament proceedings corpus (Koehn, 2005) and included in the English-Polish parallel corpus Paralela (Pęzik, 2016). Since our goal was to explore whether and to what extent English functionally-defined phrases reflect the same level of formulaicity and regularity in both Polish translations and native Polish texts, the findings provided insights into the translation tendencies of such items, and revealed – using inter-rater agreement metrics – that the discoursal functions of recurrent n-grams may change in translation.

Using a Parallel Corpus to Validate Independent Claims

Languages in Contrast ◽ 10.1075/lic.2.1.07san ◽ 1999 ◽ Vol 2 (1) ◽ pp. 115-130 ◽ Cited By ~ 2

Author(s): Diana Santos ◽ Signe Oksefjell

Keyword(s): Boundary Crossing ◽ Parallel Corpus ◽ Perception Verbs ◽ Language Pair

This paper examines the results from two corpus-based contrastive studies. Both studies offer cross-linguistic claims about the language pair English-Portuguese. We attempt to replicate the studies and check the findings against a different corpus, viz. the English–Portuguese part of the English–Norwegian Parallel Corpus, to see whether the regularities observed in the original corpora can be confirmed. After a brief presentation of each study, we describe how we gathered equivalent data, present our findings in the new corpus, and discuss some possible reasons for discrepancies in relation to the earlier studies. The topics investigated are boundary-crossing movement descriptions (after Slobin 1997) and perception verbs (after Santos 1998).

A thematic dictionary as a source of cultural competence (using the example of a Chinese-Polish dictionary)

Acta Universitatis Lodziensis Kształcenie Polonistyczne Cudzoziemców ◽ 10.18778/0860-6587.27.19 ◽ 2020 ◽ Vol 27 ◽ pp. 339-348

Author(s): Elżbieta Sękowska

Keyword(s): Cultural Competence ◽ Foreign Language ◽ Native Language ◽ Language Teaching ◽ Theoretical Studies ◽ Teaching Process ◽ Specific Language ◽ Teaching Aid ◽ Language Pair ◽ Linguistic Reality

This article discusses a thematic dictionary of a specific language pair (Chinese-Polish in this case) as a teaching aid useful in learning a language and its culture. In the search for a place for the thematic dictionary in language teaching, the author refers to the notion of cultural competence and to notions associated with introducing it into the teaching process, as cultural competence covers both vocabulary and knowledge (linguaculture) and the interpretative rules and elements of the study of the reality of people and institutions within the area of socioculture. The contents of those areas of a community's culture are discussed in the theoretical studies devoted to teaching Polish as a non-native language and in the curricula for teaching Polish as a foreign language. Those activities become focussed, e.g. in the required standards which refer to individual levels of language proficiency. A thematic dictionary may serve as a source for introducing elements of culture, given its stock of vocabulary, which illustrates the differences between languages and cultures. In the discussion of the Chinese-Polish thematic dictionary, the author focusses on the description of macrostructures and the inclusion of selected thematic fields, and indicates its utility in increasing one’s proficiency in the lexis which illustrates the extra-linguistic reality.

Cadlaws – An English–French Parallel Corpus of Legally Equivalent Documents

Mutatis Mutandis Revista Latinoamericana de Traducción ◽ 10.17533/udea.mut.v14n2a10 ◽ 2021 ◽ Vol 14 (2) ◽ pp. 494-508

Author(s): Francina Sole-Mauri ◽ Pilar Sánchez-Gijón ◽ Antoni Oliver

Keyword(s): Machine Translation ◽ Translation System ◽ Neural Machine Translation ◽ Parallel Corpus ◽ Legal Documents ◽ Legal Traditions ◽ Corpus Construction ◽ Machine Translation System ◽ French Corpus ◽ Language Pair

This article presents Cadlaws, a new English–French corpus built from Canadian legal documents, and describes the corpus construction process and preliminary statistics obtained from it. The corpus contains over 16 million words in each language and has unique features, since it is composed of documents that are legally equivalent in both languages but are not the result of a translation. The corpus is built upon enactments co-drafted by two jurists to ensure the legal equality of each version and to reflect the concepts, terms and institutions of two legal traditions. The article also discusses why the corpus is defined as a parallel corpus rather than a comparable one. Cadlaws has been pre-processed for machine translation, and a baseline Bilingual Evaluation Understudy (BLEU) score, a metric that compares a candidate translation of a text to a gold-standard translation, is given for a neural machine translation system. To the best of our knowledge, this is the largest parallel corpus of texts which convey the same meaning in this language pair, and it is freely available for non-commercial use.
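To make the BLEU metric mentioned in this abstract concrete, here is a minimal sketch of the underlying computation: clipped n-gram precision combined with a brevity penalty. It is a simplified, unsmoothed, sentence-level variant with a single reference and whitespace tokenization, all of which are assumptions made for illustration. It is not the evaluation setup used for Cadlaws, and the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference (illustrative only)."""
    cand = candidate.split()  # whitespace tokenization is an assumption
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision yields BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions with uniform weights.
    return bp * math.exp(sum(log_precisions) / max_n)
```

In practice BLEU is usually reported at corpus level, with counts pooled over all sentences and often with several references, so scores from this sketch are not comparable with published figures.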

Toward the Elaboration of a Spanish-Chinese Parallel Annotated Corpus

10.29007/gxv3 ◽ 2018

Author(s): Shuyuan Cao ◽ Iria Da-Cunha ◽ Mikel Iruskieta

Keyword(s): Language Learning ◽ Academic Community ◽ Parallel Corpus ◽ Translation Quality ◽ Automatic Translation ◽ Part Of Speech ◽ Challenging Tasks ◽ Syntactic Information ◽ Translation Systems ◽ Language Pair

Spanish and Chinese are two very different languages at all linguistic levels. Therefore, translation (both human and machine) from one to the other, and learning either of them as a foreign language, are challenging tasks. Some automatic translation systems exist for this pair of languages, but there is still considerable room to improve translation quality between Spanish and Chinese. In addition, accessible resources for studying and understanding this language pair, such as parallel corpora, are still few. In this paper, we present how we have created a Spanish-Chinese parallel corpus designed for language learning and translation tasks at the discourse level. This corpus has been automatically enriched with part-of-speech (POS) information, and several queries based on morpho-syntactic information can be run on it. We have made the parallel corpus available to the academic community.

Translation description for assessment and post-editing

Target ◽ 10.1075/target.15098.ram ◽ 2018 ◽ Vol 30 (1) ◽ pp. 112-136

Author(s): Noelia Ramón ◽ Camino Gutiérrez-Lanza

Keyword(s): Empirical Data ◽ Personal Pronouns ◽ Parallel Corpus ◽ Descriptive Research ◽ Reference Corpus ◽ Research Procedure ◽ Automated Tool ◽ Language Pair

Abstract This paper presents a corpus-based descriptive research procedure for the identification of significant divergences between original Spanish and Spanish translated from English. When considering the language pair English-Spanish, personal pronouns seem to be good markers of significant differences (anchor phenomena), since they must obligatorily occur in English, but not in Spanish. To test this hypothesis, empirical data have been extracted from a large reference corpus in Spanish (CREA) and from an English-Spanish parallel corpus (P-ACTRES), in both cases from the fiction subcorpora. Statistically significant differences have been found in some of the uses of personal pronouns, having textual and pragmatic implications in the target texts. The aim is to use the results obtained in the case of personal pronouns, together with results from other linguistic areas, to build a semi-automated tool for the post-editing of Spanish translations of texts written originally in English.

Hindi Chhattisgarhi Machine Translation System Using Statistical Approach

Webology ◽ 10.14704/web/v18si02/web18067 ◽ 2021 ◽ Vol 18 (Special Issue 02) ◽ pp. 208-222

Author(s): Vikas Pandey ◽ Dr. M.V. Padmavati ◽ Dr. Ramesh Kumar

Keyword(s): Machine Translation ◽ Language Processing ◽ Statistical Approach ◽ Statistical Machine Translation ◽ Target Language ◽ Translation System ◽ Parallel Corpus ◽ Machine Translation System ◽ Unknown Words ◽ Language Pair

Machine Translation is a subfield of Natural Language Processing (NLP) that is used to translate a source language into a target language. In this paper an attempt has been made to build a Hindi-Chhattisgarhi machine translation system based on a statistical approach. In the state of Chhattisgarh there is a long-awaited need for a Hindi-to-Chhattisgarhi machine translation system, especially for non-Chhattisgarhi-speaking people. To develop the system, an open-source software package called Moses is used. Moses is a statistical machine translation system that automatically trains the translation model from a collection of sentence pairs in the Hindi-Chhattisgarhi language pair, called a parallel corpus. A corpus is a collection of structured text used to study linguistic properties. This machine translation system works on a parallel corpus of 40,000 Hindi-Chhattisgarhi bilingual sentences, extracted from various domains such as stories, novels, textbooks and newspapers. To overcome translation problems related to proper nouns and unknown words, a transliteration system is also embedded in it. The system was tested on 1000 sentences to check the grammatical correctness of its output, and an accuracy of 75% was achieved.

Bilingual lexicon extraction for a distant language pair using a small parallel corpus

10.3115/v1/n15-2021 ◽ 2015 ◽ Cited By ~ 1

Author(s): Ximena Gutierrez-Vasques

Keyword(s): Parallel Corpus ◽ Bilingual Lexicon ◽ Language Pair

Extraction of Bilingual Dictionary from Comparable Corpora for Resource Scarce Languages

Journal of Computational and Theoretical Nanoscience ◽ 10.1166/jctn.2020.8629 ◽ 2020 ◽ Vol 17 (1) ◽ pp. 54-60

Author(s): B. S. Sowmya Lakshmi ◽ B. R. Shambhavi

Keyword(s): Indian Language ◽ Parallel Corpora ◽ Parallel Corpus ◽ Comparable Corpora ◽ Bilingual Dictionary ◽ Low Resource ◽ Technological University ◽ Language Corpus ◽ Language Pair

Visvesvaraya Technological University, Belagavi, Karnataka, India. Parallel corpora are said to be among the most promising resources for extracting dictionaries. Most substantial work is based on parallel corpora, but for resource-scarce language pairs, building a parallel corpus is a challenging task. To overcome this issue, researchers have found that comparable corpora can serve as an alternative source for dictionary extraction. The proposed approach extracts a dictionary for the low-resource language pair English and Kannada using comparable corpora obtained from Wikipedia dumps and a corpus received from the Indian Language Corpus Initiative (ILCI). The constructed dictionary comprises both translation and transliteration entities with term-level associations from English to Kannada. The resulting dictionary contains 77,545 tokens, with a precision score of 0.79. The proposed work is language-independent and could be extended to other language pairs.
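For readers unfamiliar with the precision figure quoted above, the following sketch shows one common way such a score can be computed for an extracted bilingual dictionary: the share of extracted entries whose proposed translation appears in a gold-standard lexicon. The data structures, the function name, and the sample entries are hypothetical; the abstract does not describe the authors' actual evaluation procedure.

```python
def dictionary_precision(extracted, gold):
    """Fraction of extracted entries confirmed by a gold-standard lexicon.

    extracted: dict mapping a source term to one proposed translation.
    gold: dict mapping a source term to a set of accepted translations.
    Both structures are illustrative assumptions.
    """
    if not extracted:
        raise ValueError("no extracted entries to evaluate")
    correct = sum(
        1 for source, target in extracted.items()
        if target in gold.get(source, set())
    )
    return correct / len(extracted)


# Hypothetical toy data: two of the three proposed pairs are in the gold lexicon.
extracted = {"house": "mane", "water": "niru", "book": "mara"}
gold = {"house": {"mane"}, "water": {"niru", "jala"}, "book": {"pustaka"}}
score = dictionary_precision(extracted, gold)
```

Precision says nothing about coverage, so it is normally read alongside the dictionary size, as the abstract does.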
