Machine translation using bilingual term entries extracted from parallel texts

Abstract: We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction, and show how a discriminative model can be used to combine various signals of translation equivalence (like contextual similarity, temporal similarity, orthographic similarity and topic similarity). Our discriminative model produces higher accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features on a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems in addition to allowing end-to-end machine translation without parallel corpora.

Improving Statistical Machine Translation by Adapting Translation Models to Translationese

Computational Linguistics ◽

10.1162/coli_a_00159 ◽

2013 ◽

Vol 39 (4) ◽

pp. 999-1023 ◽

Cited By ~ 5

Author(s):

Gennadi Lembersky ◽

Noam Ordan ◽

Shuly Wintner

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Target Language ◽

Common Assumption ◽

Parallel Corpora ◽

The Common ◽

The Right ◽

Translation Systems ◽

Parallel Texts

Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the “wrong” direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the “right” and the “wrong” directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
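The first adaptation technique in the abstract above, interpolating two phrase tables with a weight chosen to minimize perplexity, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy probabilities, the names `table_right`, `table_wrong` and `best_weight`, the probability floor, and the use of a simple grid search. The abstract does not say how the authors carry out the minimization.

```python
import math

def interpolate(p_right, p_wrong, lam):
    """Linear mixture of two phrase-translation probabilities."""
    return lam * p_right + (1.0 - lam) * p_wrong

def perplexity(dev_pairs, table_right, table_wrong, lam, floor=1e-9):
    """Perplexity of the mixture on held-out (source, target) phrase pairs.

    Pairs missing from a table get probability 0 from that table; the
    mixture is floored (an assumption) so the logarithm is always defined.
    """
    log_sum = 0.0
    for pair in dev_pairs:
        p = interpolate(table_right.get(pair, 0.0),
                        table_wrong.get(pair, 0.0), lam)
        log_sum += math.log(max(p, floor))
    return math.exp(-log_sum / len(dev_pairs))

def best_weight(dev_pairs, table_right, table_wrong, steps=100):
    """Grid search over lam in [0, 1] for the lowest held-out perplexity."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda lam: perplexity(dev_pairs, table_right,
                                          table_wrong, lam))

# Hypothetical toy phrase tables: "right" = same direction as the task,
# "wrong" = opposite direction. The numbers are invented.
table_right = {("maison", "house"): 0.7, ("chat", "cat"): 0.6}
table_wrong = {("maison", "house"): 0.5, ("chien", "dog"): 0.8}
dev_pairs = [("maison", "house"), ("chat", "cat"), ("chien", "dog")]

lam = best_weight(dev_pairs, table_right, table_wrong)
print(f"interpolation weight for the 'right' table: {lam:.2f}")
```

In this toy setup each table covers a phrase pair that the other lacks, so the selected weight falls strictly between 0 and 1. This mirrors the abstract's point that texts translated in the “wrong” direction still contribute useful information.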

Making the Most of Synthetic Parallel Texts: Portuguese-Chinese Neural Machine Translation Enhanced with Back-Translation

Lecture Notes in Computer Science - Computational Processing of the Portuguese Language ◽

10.1007/978-3-030-41505-1_12 ◽

2020 ◽

pp. 121-130

Author(s):

Rodrigo Santos ◽

João Silva ◽

António Branco

Keyword(s):

Machine Translation ◽

Neural Machine Translation ◽

Back Translation ◽

Parallel Texts

Aligning Turkish and English Parallel Texts for Statistical Machine Translation

Computer and Information Sciences - ISCIS 2005 - Lecture Notes in Computer Science ◽

10.1007/11569596_64 ◽

2005 ◽

pp. 616-625

Author(s):

İlknur D. El-Kahlout ◽

Kemal Oflazer

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Parallel Texts

Factored Statistical Machine Translation for German-English

Journal of Applied Information, Communication and Technology ◽

10.33555/ejaict.v5i1.47 ◽

2018 ◽

Vol 5 (1) ◽

pp. 37-45

Author(s):

Darryl Yunus Sulistyan

Keyword(s):

Machine Translation ◽

English Language ◽

Statistical Machine Translation ◽

New Model ◽

Language Pair

Machine translation is the automatic translation of sentences from one language into another. This paper tests the effectiveness of factored machine translation, a newer model of machine translation. We compare an unfactored system, used as our baseline, with the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The tool we use is MOSES, which is freely available to download and use. We found, however, that the unfactored model scored over 24 BLEU and outperformed the factored models, which scored below 24 BLEU in all cases. In terms of the number of words translated, however, all of the factored models outperformed the unfactored model.

On (Not) Translating Lacan: Barbara Cassin's Sophistico-Analytical Performances

Paragraph ◽

10.3366/para.2020.0323 ◽

2020 ◽

Vol 43 (1) ◽

pp. 98-113

Author(s):

Michael Syrotinski

Keyword(s):

Machine Translation ◽

Reading And Writing ◽

The Relationship ◽

The Way

Barbara Cassin's Jacques the Sophist: Lacan, Logos, and Psychoanalysis, recently translated into English, constitutes an important rereading of Lacan, and a sustained commentary not only on his interpretation of Greek philosophers, notably the Sophists, but more broadly the relationship between psychoanalysis and sophistry. In her study, Cassin draws out the sophistic elements of Lacan's own language, or the way that Lacan ‘philosophistizes’, as she puts it. This article focuses on the relation between Cassin's text and her better-known Dictionary of Untranslatables, and aims to show how and why both ‘untranslatability’ and ‘performativity’ become keys to understanding what this book is not only saying, but also doing. It ends with a series of reflections on machine translation, and how the intersubjective dynamic as theorized by Lacan might open up the possibility of what is here termed a ‘translatorly’ mode of reading and writing.

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this gap in knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Relation of the Saccasaṅkhepaṭīkā Called Sāratthasālinī to the Vinayavinicchayaṭīkā Called Vinayasāratthasandīpanī

Buddhist Studies Review ◽

10.1558/bsrv.36760 ◽

2018 ◽

Vol 35 (1-2) ◽

pp. 189-223

Author(s):

Petra Kieffer-Pülz

Keyword(s):

Thirteenth Century ◽

Present Contribution ◽

The Common ◽

Literary Histories ◽

High Degree ◽

Parallel Texts

The present contribution suggests the common authorship of three Pāli commentaries of the twelfth/thirteenth centuries CE, namely the Vinayavinicchayaṭīkā called Vinayasāratthasandīpanī (less probably Vinayatthasārasandīpanī), the Uttaravinicchayaṭīkā called Līnatthappakāsanī, and the Saccasaṅkhepaṭīkā called Sāratthasālinī. The information collected from these three commentaries themselves and from Pāli literary histories concerning these three texts leads to the second quarter of the thirteenth century CE as the period of their origination. The data from parallel texts explicitly stated in the texts themselves to have been written by Vācissara Thera render it possible to establish with a high degree of probability Vācissara Thera as their author.

A study on the ambiguity problems in Korean machine translation

Journal of Korean Linguistics ◽

10.15811/jkl.2007..50.009 ◽

2007 ◽

Issue 50 ◽

pp. 241-267

Author(s):

Hong, Jongseon

Keyword(s):

Machine Translation