The Source-Target Domain Mismatch Problem in Machine Translation

Author(s):  
Jiajun Shen ◽  
Peng-Jen Chen ◽  
Matthew Le ◽  
Junxian He ◽  
Jiatao Gu ◽  
...  
2017 ◽  
Vol 108 (1) ◽  
pp. 283-294
Author(s):  
Álvaro Peris ◽  
Mara Chinea-Ríos ◽  
Francisco Casacuberta

Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of domain adaptation that aims to extract, from an out-of-domain corpus, the sentences that are most useful for translating a given target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.
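A minimal sketch of the classification view of data selection described above, assuming scikit-learn; the network, features, and function names are illustrative, not the authors' architecture. An in-domain vs. out-of-domain classifier scores each out-of-domain sentence, and the highest-scoring sentences are kept for training:

```python
# Illustrative classifier-based data selection (not the authors' exact network).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier

def select_sentences(in_domain, out_of_domain, keep_fraction=0.2):
    """Keep the out-of-domain sentences scored as most in-domain-like."""
    # Character n-grams keep the features language-agnostic.
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    X = vec.fit_transform(in_domain + out_of_domain)
    y = [1] * len(in_domain) + [0] * len(out_of_domain)

    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    clf.fit(X, y)

    # P(in-domain | sentence) for every out-of-domain candidate.
    scores = clf.predict_proba(vec.transform(out_of_domain))[:, 1]
    ranked = sorted(zip(scores, out_of_domain), reverse=True)
    return [sent for _, sent in ranked[: int(len(ranked) * keep_fraction)]]
```

Character-level features make the sketch consistent with the abstract's observation that results hold across different language pairs.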


Author(s):  
Hoang Cuong ◽  
Khalil Sima’an ◽  
Ivan Titov

Existing work on domain adaptation for statistical machine translation has consistently assumed access to a small sample from the test distribution (target domain) at training time. In practice, however, the target domain may not be known at training time or it may change to match user needs. In such situations, it is natural to push the system to make safer choices, giving higher preference to domain-invariant translations, which work well across domains, over risky domain-specific alternatives. We encode this intuition by (1) inducing latent subdomains from the training data only; (2) introducing features which measure how specialized phrases are to individual induced subdomains; (3) estimating feature weights on out-of-domain data (rather than on the target domain). We conduct experiments on three language pairs and a number of different domains. We observe consistent improvements over a baseline which does not explicitly reward domain invariance.
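One way to realize point (2) above, sketched under the assumption that phrases carry occurrence counts per induced subdomain; the entropy-based measure and the counts are illustrative, not necessarily the paper's exact features:

```python
# Illustrative domain-invariance feature: the normalized entropy of a phrase's
# distribution over induced subdomains. High entropy = the phrase is used
# evenly across subdomains (safe, domain-invariant); low entropy = the phrase
# is specialized to one subdomain (risky). Counts below are hypothetical.
import math
from collections import Counter

def invariance(subdomain_counts, num_subdomains):
    """Normalized entropy in [0, 1]; 1 = fully domain-invariant."""
    total = sum(subdomain_counts.values())
    probs = [c / total for c in subdomain_counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(num_subdomains) if num_subdomains > 1 else 0.0

# A phrase seen almost only in one induced subdomain scores near 0 ...
print(invariance(Counter({"sub1": 98, "sub2": 1, "sub3": 1}), 3))   # ~0.10
# ... while one spread evenly across subdomains scores near 1.
print(invariance(Counter({"sub1": 33, "sub2": 34, "sub3": 33}), 3))  # ~1.00
```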


Author(s):  
Xiangpeng Wei ◽  
Yue Hu ◽  
Luxi Xing ◽  
Yipeng Wang ◽  
Li Gao

The dominant neural machine translation (NMT) models based on the encoder-decoder architecture have recently achieved state-of-the-art performance. Traditionally, NMT models depend only on the representations learned during training for mapping a source sentence into the target domain. However, the learned representations often suffer from being implicit and inadequately informed. In this paper, we propose a novel bilingual topic enhanced NMT (BLT-NMT) model to improve translation performance by incorporating bilingual topic knowledge into NMT. Specifically, the bilingual topic knowledge is incorporated into the hidden states of both the encoder and the decoder, as well as into the attention mechanism. With this new setting, the proposed BLT-NMT has access to the background knowledge implied in bilingual topics, which lies beyond the sequential context, and enables the attention mechanism to attend to topic-level information for generating accurate target words during translation. Experimental results show that the proposed model consistently outperforms the traditional RNNsearch and the previous topic-informed NMT on Chinese-English and English-German translation tasks. We also introduce the bilingual topic knowledge into the recently emerged Transformer base model on English-German translation and achieve a notable improvement.
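A hedged PyTorch sketch of the general idea of topic-aware attention; the fusion point, dimensions, and class names are assumptions for illustration, not the published BLT-NMT implementation:

```python
# Illustrative topic-aware additive attention (not the published BLT-NMT code):
# a bilingual topic vector is concatenated to the decoder query so that
# alignment scores can condition on topic context beyond the sequence itself.
import torch
import torch.nn as nn

class TopicAwareAttention(nn.Module):
    def __init__(self, hidden_dim, topic_dim):
        super().__init__()
        self.query_proj = nn.Linear(hidden_dim + topic_dim, hidden_dim)
        self.key_proj = nn.Linear(hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, decoder_state, encoder_states, topic_vector):
        # decoder_state: (B, H), encoder_states: (B, S, H), topic_vector: (B, T)
        query = self.query_proj(torch.cat([decoder_state, topic_vector], dim=-1))
        keys = self.key_proj(encoder_states)                         # (B, S, H)
        scores = self.score(torch.tanh(keys + query.unsqueeze(1)))   # (B, S, 1)
        weights = torch.softmax(scores, dim=1)
        context = (weights * encoder_states).sum(dim=1)              # (B, H)
        return context, weights.squeeze(-1)

# Shape check with random tensors (batch 4, source length 9, 50 topics).
attn = TopicAwareAttention(hidden_dim=256, topic_dim=50)
ctx, w = attn(torch.randn(4, 256), torch.randn(4, 9, 256), torch.randn(4, 50))
```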


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Bin Li ◽  
Jianmin Yao

The performance of a machine translation system (MTS) depends on the quality and size of the training data. Effective methods for extending the training dataset of an MTS in a specific domain, so as to enhance translation performance, remain to be explored. We propose a method for selecting in-domain bilingual sentence pairs based on topic information. Using the topic relevance of bilingual sentence pairs to the target domain, subsets of sentence pairs related to the texts to be translated are selected from a large-scale bilingual corpus and used to train the translation system for the specific domain, improving translation quality for in-domain texts. In our experiments, bilingual sentence pairs are selected with the proposed method and the MTS is then trained on them, which greatly enhances translation performance.
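A small sketch of topic-relevance selection under stated assumptions (scikit-learn LDA, cosine similarity to an in-domain topic centroid); the paper's actual topic model and scoring may differ, and all hyperparameters here are illustrative:

```python
# Illustrative topic-relevance selection (the paper's scoring may differ):
# fit LDA on in-domain source text, then keep the candidate pairs whose
# source-side topic distribution is closest to the in-domain centroid.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def select_pairs(in_domain_src, candidate_pairs, n_topics=20, keep=1000):
    vec = CountVectorizer(max_features=30000)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(vec.fit_transform(in_domain_src))

    centroid = lda.transform(vec.transform(in_domain_src)).mean(axis=0)
    topics = lda.transform(vec.transform([src for src, _ in candidate_pairs]))

    # Cosine similarity between each candidate and the in-domain centroid.
    sims = topics @ centroid / (
        np.linalg.norm(topics, axis=1) * np.linalg.norm(centroid) + 1e-12)
    return [candidate_pairs[i] for i in np.argsort(-sims)[:keep]]
```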


2018 ◽  
Vol 5 (1) ◽  
pp. 37-45
Author(s):  
Darryl Yunus Sulistyan

Machine translation automatically translates sentences from one language into another. This paper aims to test the effectiveness of a newer model of machine translation, factored machine translation. We compare the performance of an unfactored system, as our baseline, against the factored model in terms of BLEU score. We test the models on the German-English language pair using the Europarl corpus. The toolkit we use is called MOSES; it is freely available for download and use. We found, however, that the unfactored model scored over 24 BLEU and outperforms the factored model, which scored below 24 BLEU, in all cases. In terms of the number of words translated, however, all of the factored models outperform the unfactored model.
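The MOSES training pipeline itself is command-line driven, but the BLEU comparison step can be sketched in Python with sacrebleu; the file names below are hypothetical placeholders for the factored and unfactored system outputs:

```python
# Hedged sketch of the evaluation step only: comparing factored vs. unfactored
# Moses output with BLEU. File names are hypothetical.
import sacrebleu

def bleu(hyp_path, ref_path):
    with open(hyp_path, encoding="utf-8") as h, open(ref_path, encoding="utf-8") as r:
        hyps = [line.strip() for line in h]
        refs = [line.strip() for line in r]
    return sacrebleu.corpus_bleu(hyps, [refs]).score

print("unfactored:", bleu("unfactored.out", "test.ref"))
print("factored:  ", bleu("factored.out", "test.ref"))
```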


Paragraph ◽  
2020 ◽  
Vol 43 (1) ◽  
pp. 98-113
Author(s):  
Michael Syrotinski

Barbara Cassin's Jacques the Sophist: Lacan, Logos, and Psychoanalysis, recently translated into English, constitutes an important rereading of Lacan, and a sustained commentary not only on his interpretation of Greek philosophers, notably the Sophists, but more broadly the relationship between psychoanalysis and sophistry. In her study, Cassin draws out the sophistic elements of Lacan's own language, or the way that Lacan ‘philosophistizes’, as she puts it. This article focuses on the relation between Cassin's text and her better-known Dictionary of Untranslatables, and aims to show how and why both ‘untranslatability’ and ‘performativity’ become keys to understanding what this book is not only saying, but also doing. It ends with a series of reflections on machine translation, and how the intersubjective dynamic as theorized by Lacan might open up the possibility of what is here termed a ‘translatorly’ mode of reading and writing.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to filling this knowledge gap, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. Then we apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a Transformer-based neural machine translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
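For illustration, a deliberately small Transformer NMT configuration of the kind that suits a very limited parallel data set; every hyperparameter and vocabulary size here is an assumption, not the paper's reported setup:

```python
# Illustrative low-resource Transformer NMT (assumed hyperparameters, not the
# paper's setup): few layers, few heads, and strong dropout, since a very
# limited parallel data set invites overfitting in larger configurations.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, D_MODEL, MAX_LEN = 8000, 8000, 256, 128

class TinyNMT(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        self.pos = nn.Embedding(MAX_LEN, D_MODEL)  # learned positions
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4, num_encoder_layers=3,
            num_decoder_layers=3, dim_feedforward=512,
            dropout=0.3, batch_first=True)
        self.out = nn.Linear(D_MODEL, TGT_VOCAB)

    def forward(self, src_ids, tgt_ids):
        s_pos = torch.arange(src_ids.size(1), device=src_ids.device)
        t_pos = torch.arange(tgt_ids.size(1), device=tgt_ids.device)
        src = self.src_emb(src_ids) + self.pos(s_pos)
        tgt = self.tgt_emb(tgt_ids) + self.pos(t_pos)
        # Causal mask so each target position only sees earlier positions.
        mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        return self.out(self.transformer(src, tgt, tgt_mask=mask))
```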

