Dynamic Models in Moses for Online Adaptation

2014 ◽  
Vol 101 (1) ◽  
pp. 7-28 ◽  
Author(s):  
Nicola Bertoldi

Abstract: How to effectively integrate machine translation (MT) within computer-assisted translation (CAT) software is a pressing issue for both research and industry. This paper focuses on this issue and, more generally, on how to dynamically adapt phrase-based statistical machine translation (SMT) by exploiting external knowledge, such as the post-edits of professional translators. We present an enhancement of the Moses SMT toolkit that dynamically adapts to external information which becomes available during the translation process and which can depend on the previously translated text. We have equipped Moses with two new elements: a new phrase-table implementation and a new LM-like feature. Both can be dynamically modified by adding and removing entries and by re-scoring entries according to a time-decaying scoring function. The goal of these two dynamically adaptable features is twofold: to create additional translation alternatives and to reward those composed of entries previously inserted therein. The implemented dynamic system is highly configurable, flexible, and applicable to many tasks, such as online MT adaptation, interactive MT, and context-aware MT. When exploited in a real-world CAT scenario in which online adaptation is applied to repetitive texts, it has proven very effective in improving translation quality and reducing post-editing effort.
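
A minimal sketch, in Python, of the kind of time-decaying scoring the abstract describes for dynamically inserted phrase pairs; the class, the exponential decay, and its parameters are illustrative assumptions, not the actual Moses implementation.

```python
# Sketch (not the Moses code) of a time-decaying score for phrase pairs
# harvested from post-edits: recent entries are rewarded, old ones fade.
import math


class DynamicPhraseTable:
    """Stores phrase pairs with the 'time' (segments processed) at which
    they were inserted, and scores them with an exponential decay."""

    def __init__(self, decay_rate=0.3):
        self.decay_rate = decay_rate
        self.entries = {}   # (src, tgt) -> insertion time
        self.clock = 0      # number of segments processed so far

    def tick(self):
        """Advance the clock after each translated or post-edited segment."""
        self.clock += 1

    def insert(self, src, tgt):
        """Add (or refresh) a phrase pair extracted from a post-edit."""
        self.entries[(src, tgt)] = self.clock

    def remove(self, src, tgt):
        self.entries.pop((src, tgt), None)

    def score(self, src, tgt):
        """Fresh entries score close to 1.0; older ones decay towards 0."""
        if (src, tgt) not in self.entries:
            return 0.0
        age = self.clock - self.entries[(src, tgt)]
        return math.exp(-self.decay_rate * age)


# Usage: reward phrase pairs seen in the latest post-edits.
table = DynamicPhraseTable(decay_rate=0.3)
table.insert("la riunione", "the meeting")
table.tick()
table.insert("verbale", "minutes")
print(table.score("verbale", "the minutes"))      # 0.0: unknown pair
print(table.score("la riunione", "the meeting"))  # decayed, about 0.74
print(table.score("verbale", "minutes"))          # fresh, 1.0
```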

2017 ◽  
Vol 2 (119) ◽  
pp. 142-24
Author(s):  
Benjamin Screen

Abstract: This article reports on a controlled study carried out to examine the possible benefits of editing Machine Translation and Translation Memory outputs when translating from English to Welsh. Using software capable of timing the translation process per segment, 8 professional translators each translated 75 sentences of differing match percentage and post-edited a further 25 segments of Machine Translation. Basing the final analysis on 800 sentences and 17,440 words, the use of Fuzzy Matches in the 70-99% match range, Exact Matches and Statistical Machine Translation was found to significantly speed up the translation process. Significant correlations were also found between the processing-time data of Exact Matches and Machine Translation post-editing, rather than between Fuzzy Matches and Machine Translation as expected. Two experienced translators were then asked to rate all translations for fidelity, grammaticality and style; it was found that the use of translation technology either did not negatively affect translation quality compared to manual translation or, in some cases, actually improved final quality. As well as confirming the findings of previous research on translation technology, these findings also contradict supposed similarities between translation quality, in terms of style, and the post-editing of Machine Translation.
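
The study above rests on per-segment processing times and on correlations between conditions. The sketch below, with made-up numbers, illustrates the sort of throughput and correlation computation involved; it is not the study's analysis code or data.

```python
# Toy per-segment analysis: words-per-minute throughput per condition and
# the Pearson correlation between the processing times of two conditions.
import math


def words_per_minute(words, seconds):
    return [w / (s / 60) for w, s in zip(words, seconds)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented per-segment processing times (seconds) for two conditions.
exact_match_times = [12, 18, 9, 25, 14]
mt_post_edit_times = [15, 22, 11, 30, 16]
segment_lengths = [14, 21, 10, 28, 16]   # words per segment

print(words_per_minute(segment_lengths, exact_match_times))
print(pearson(exact_match_times, mt_post_edit_times))  # close to 1 here
```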


2015 ◽  
Vol 41 (2) ◽  
pp. 185-214 ◽  
Author(s):  
Nadir Durrani ◽  
Helmut Schmid ◽  
Alexander Fraser ◽  
Philipp Koehn ◽  
Hinrich Schütze

In this article, we present a novel machine translation model, the Operation Sequence Model (OSM), which combines the benefits of phrase-based and N-gram-based statistical machine translation (SMT) and remedies their drawbacks. The model represents the translation process as a linear sequence of operations that includes not only translation operations but also reordering operations. As in N-gram-based SMT, the model (i) is based on minimal translation units, (ii) takes both source and target information into account, (iii) does not make a phrasal independence assumption, and (iv) avoids the spurious phrasal segmentation problem. As in phrase-based SMT, the model (i) can memorize lexical reordering triggers, (ii) builds the search graph dynamically, and (iii) decodes with large translation units during search. The unique properties of the model are (i) its strong coupling of reordering and translation, where translation and reordering decisions are conditioned on n previous translation and reordering decisions, and (ii) its ability to model local and long-range reorderings consistently. Using BLEU as a metric of translation accuracy, we found that our system performs significantly better than state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems (Ncode) on standard translation tasks. We compare the reordering component of the OSM to the Moses lexicalized reordering model by integrating it into Moses. Our results show that the OSM outperforms lexicalized reordering on all translation tasks. Translation quality is improved further by learning generalized representations with a POS-based OSM.
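
A minimal sketch of the core OSM idea described above: a word-aligned sentence pair becomes one joint sequence of translation and reordering operations, scored by a Markov model over the n previous operations. The operation names, the toy aligned examples, and the maximum-likelihood estimation are illustrative assumptions, not the authors' implementation.

```python
# Toy Markov model over joint translation/reordering operation sequences.
from collections import defaultdict
import math

N = 3  # condition each operation on the previous N-1 operations


def train_operation_model(operation_sequences):
    """Estimate P(op | history) by relative frequency over training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for ops in operation_sequences:
        padded = ["<s>"] * (N - 1) + ops + ["</s>"]
        for i in range(N - 1, len(padded)):
            history = tuple(padded[i - N + 1:i])
            counts[history][padded[i]] += 1
    model = {}
    for history, next_ops in counts.items():
        total = sum(next_ops.values())
        model[history] = {op: c / total for op, c in next_ops.items()}
    return model


def log_score(model, ops, floor=1e-6):
    """Log-probability of an operation sequence under the trained model."""
    padded = ["<s>"] * (N - 1) + ops + ["</s>"]
    logp = 0.0
    for i in range(N - 1, len(padded)):
        history = tuple(padded[i - N + 1:i])
        logp += math.log(model.get(history, {}).get(padded[i], floor))
    return logp


# Toy data: GEN(src,tgt) couples translation with reordering operations
# (INSERT_GAP, JUMP_BACK) in a single joint sequence.
train = [
    ["GEN(er,he)", "GEN(hat,has)", "INSERT_GAP", "GEN(gekauft,bought)",
     "JUMP_BACK(1)", "GEN(eine,a)", "GEN(pistole,gun)"],
    ["GEN(sie,she)", "GEN(hat,has)", "INSERT_GAP", "GEN(gelesen,read)",
     "JUMP_BACK(1)", "GEN(ein,a)", "GEN(buch,book)"],
]
model = train_operation_model(train)
print(log_score(model, train[0]))
```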


Proceedings ◽  
2020 ◽  
Vol 63 (1) ◽  
pp. 56
Author(s):  
Bianca Han

This paper reflects on the technology-induced novelty of translation, which is perceived as a bridge between languages and cultures. We debate the extent to which the translation process maintains its specificity in light of the new technology-enhanced working methods afforded by a large variety of computer-assisted translation (CAT) and machine translation (MT) tools, which aim to enhance the process, involving the translation itself as well as the translator, the translation project manager, the linguist, the terminologist, the reviewer, and the client. The paper also approaches the topic from the perspective of the translation teacher, who needs to provide students with transversal competencies suitable for the digital era, supported by the ability to handle cloud-based translation tools, in view of Industry 4.0 requirements.


Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1589
Author(s):  
Yongkeun Hwang ◽  
Yanghoon Kim ◽  
Kyomin Jung

Neural machine translation (NMT) is a text generation task that has achieved significant improvements with the rise of deep neural networks. However, language-specific problems, such as handling the translation of honorifics, have received little attention. In this paper, we propose a context-aware NMT model to improve the translation of Korean honorifics. By exploiting information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data are required; we therefore also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and in honorific translations.
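
A minimal sketch of a rule-based labeller in the spirit of the heuristic described above, tagging Korean sentences as honorific or plain by their sentence-final endings; the ending lists are illustrative and incomplete, and the authors' actual heuristic may differ.

```python
# Toy labeller: inspect the sentence-final ending to decide speech level.
import re

# Common polite/formal endings vs. plain endings (far from exhaustive).
HONORIFIC_ENDINGS = ("습니다", "니다", "니까", "세요", "어요", "아요", "지요", "에요", "예요", "요")
PLAIN_ENDINGS = ("는다", "었다", "았다", "한다", "다", "어", "아", "지", "니", "자")


def label_honorific(sentence: str) -> str:
    """Return 'honorific', 'plain', or 'unknown' based on the final ending."""
    # Strip trailing whitespace and punctuation before inspecting the ending.
    core = re.sub(r"[\s.!?~…]+$", "", sentence)
    for ending in HONORIFIC_ENDINGS:
        if core.endswith(ending):
            return "honorific"
    for ending in PLAIN_ENDINGS:
        if core.endswith(ending):
            return "plain"
    return "unknown"


print(label_honorific("안녕하세요?"))            # honorific
print(label_honorific("회의는 내일 시작합니다."))  # honorific
print(label_honorific("나는 어제 책을 읽었다."))   # plain
```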


2021 ◽  
Vol 54 (2) ◽  
pp. 1-36
Author(s):  
Sameen Maruf ◽  
Fahimeh Saleh ◽  
Gholamreza Haffari

Machine translation (MT) is an important task in natural language processing (NLP), as it automates the translation process and reduces reliance on human translators. With the resurgence of neural networks, translation quality has surpassed that of statistical techniques for most language pairs. Until a few years ago, however, almost all neural translation models translated sentences independently, without incorporating the wider document context and the inter-dependencies among sentences. The aim of this survey article is to highlight the major works undertaken in document-level machine translation after the neural revolution, so that researchers can recognize the current state and future directions of this field. We provide an organization of the literature based on novelties in modelling and architectures as well as training and decoding strategies. In addition, we cover evaluation strategies introduced to account for the improvements in document-level MT, including automatic metrics and discourse-targeted test sets. We conclude by presenting possible avenues for future exploration in this research field.


2015 ◽  
Author(s):  
Jinsong Su ◽  
Deyi Xiong ◽  
Yang Liu ◽  
Xianpei Han ◽  
Hongyu Lin ◽  
...  

2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled nicely by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
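
A minimal sketch of source-side syntactic reordering in the spirit of the approach above: an English (SVO) parse tree is rearranged towards Urdu-like verb-final (SOV) order before phrase-based translation. The tree encoding and the single rule shown are illustrative assumptions, not the authors' full rule set.

```python
# Toy source-side reordering: move the verbal head to the end of each VP.

def reorder_sov(tree):
    """tree = (label, children) for nonterminals, (POS, word) for leaves.
    Inside each VP, move the verbal head after its complements."""
    label, children = tree
    if isinstance(children, str):         # leaf: (POS, word)
        return tree
    children = [reorder_sov(child) for child in children]
    if label == "VP":
        verbs = [c for c in children if c[0].startswith("VB")]
        rest = [c for c in children if not c[0].startswith("VB")]
        children = rest + verbs           # verb-final, as in Urdu
    return (label, children)


def leaves(tree):
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [w for child in children for w in leaves(child)]


# "He reads the newspaper" is reordered to "He the newspaper reads".
sentence = ("S", [
    ("NP", [("PRP", "He")]),
    ("VP", [("VBZ", "reads"),
            ("NP", [("DT", "the"), ("NN", "newspaper")])]),
])
print(" ".join(leaves(reorder_sov(sentence))))
```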


Author(s):  
Maxim Roy

Machine translation (MT) from Bangla to English has recently become a priority task for the Bangla natural language processing (NLP) community. Statistical machine translation (SMT) systems require a significant amount of bilingual data between language pairs to achieve good translation accuracy, but because Bangla is a low-density language, such resources are not available. In this chapter, the authors discuss how machine learning approaches can help to improve translation quality within an SMT system without requiring a huge increase in resources. They provide a novel semi-supervised learning and active learning framework for SMT that utilizes both labeled and unlabeled data. The authors discuss sentence selection strategies in detail and perform detailed experimental evaluations of the sentence selection methods. In the semi-supervised setting, the reversed model approach outperformed all other approaches for Bangla-English SMT, and in the active learning setting, the geometric 4-gram and geometric phrase sentence selection strategies proved most useful in terms of BLEU score over the baseline approaches. Overall, the authors demonstrate that, for a low-density language like Bangla, these machine learning approaches can improve translation quality.
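
A minimal sketch, under stated assumptions, of an n-gram-based sentence selection score loosely in the spirit of the "geometric 4-gram" strategy named above: unlabeled sentences whose n-grams are poorly covered by the already-labeled data score higher. The exact scoring used by the authors may differ.

```python
# Toy active-learning selection: prefer sentences with many unseen n-grams.
import math


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def geometric_ngram_score(sentence, labelled_ngrams, max_order=4):
    """Geometric mean, over orders 1..max_order, of the fraction of the
    sentence's n-grams that are unseen in the labelled pool."""
    tokens = sentence.split()
    per_order = []
    for n in range(1, max_order + 1):
        grams = ngrams(tokens, n)
        if not grams:
            continue
        unseen = sum(1 for g in grams if g not in labelled_ngrams)
        per_order.append((unseen + 1e-9) / len(grams))
    return math.exp(sum(math.log(x) for x in per_order) / len(per_order))


# Build the n-gram pool from the (small) labelled bitext's source side...
labelled = ["the meeting starts tomorrow", "we will review the report"]
pool = set()
for sent in labelled:
    toks = sent.split()
    for n in range(1, 5):
        pool.update(ngrams(toks, n))

# ...then pick the unlabelled sentence that adds the most new n-grams.
candidates = ["the meeting starts tomorrow morning",
              "parliament adopted the new budget resolution"]
best = max(candidates, key=lambda s: geometric_ngram_score(s, pool))
print(best)   # the sentence with the least n-gram overlap wins
```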


2017 ◽  
Vol 108 (1) ◽  
pp. 283-294 ◽  
Author(s):  
Álvaro Peris ◽  
Mara Chinea-Ríos ◽  
Francisco Casacuberta

Abstract: Corpora are precious resources, as they allow for a proper estimation of statistical machine translation models. Data selection is a variant of domain adaptation aimed at extracting those sentences from an out-of-domain corpus that are the most useful for translating a given target domain. We address the data selection problem in statistical machine translation as a classification task. We present a new method, based on neural networks, that is able to deal with monolingual and bilingual corpora. Empirical results show that our data selection method provides slightly better translation quality than a state-of-the-art method (cross-entropy) while requiring substantially less data. Moreover, the results obtained are coherent across different language pairs, demonstrating the robustness of our proposal.
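
A minimal sketch of the cross-entropy baseline mentioned above (Moore-Lewis-style cross-entropy difference): out-of-domain sentences are ranked by the difference between their per-word cross-entropy under an in-domain language model and under an out-of-domain language model, and the lowest-scoring sentences are selected. The toy unigram models keep the sketch small and are not the paper's setup.

```python
# Toy cross-entropy-difference data selection with add-one unigram LMs.
import math
from collections import Counter


def unigram_lm(corpus):
    counts = Counter(w for sent in corpus for w in sent.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda w: (counts[w] + 1) / (total + vocab)   # add-one smoothing


def cross_entropy(sentence, lm):
    words = sentence.split()
    return -sum(math.log2(lm(w)) for w in words) / len(words)


in_domain = ["the patient received a second dose",
             "symptoms improved after treatment"]
out_domain = ["the parliament adopted the budget",
              "the patient was discharged after treatment",
              "stock markets fell sharply today"]

lm_in, lm_out = unigram_lm(in_domain), unigram_lm(out_domain)
ranked = sorted(out_domain,
                key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_out))
print(ranked[0])   # most in-domain-like out-of-domain sentence comes first
```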


2013 ◽  
Vol 39 (4) ◽  
pp. 1067-1108 ◽  
Author(s):  
Sara Stymne ◽  
Nicola Cancedda ◽  
Lars Ahrenberg

In this article we investigate statistical machine translation (SMT) into Germanic languages, with a focus on compound processing. Our main goal is to enable the generation of novel compounds that have not been seen in the training data. We adopt a split-merge strategy, in which compounds are split before training the SMT system and merged after the translation step. This approach reduces sparsity in the training data but runs the risk of placing translations of compound parts in non-consecutive positions. It also requires a post-processing step of compound merging, in which compounds are reconstructed in the translation output. We present a method for increasing the chances that components that should be merged are translated into contiguous positions and in the right order, and we show that it can lead to improvements both by direct inspection and in terms of standard translation evaluation metrics. We also propose several new methods for compound merging, based on heuristics and machine learning, which outperform previously suggested algorithms. These methods can produce novel compounds and a translation with at least the same overall quality as the baseline. For all subtasks, we show that it is useful to include part-of-speech-based information in the translation process in order to handle compounds.
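
A minimal sketch of the split-merge idea described above: compound parts carry a marker after splitting, and a post-processing pass merges marked parts back together in the translation output. The "@@" marker and the hand-supplied split are illustrative assumptions; the article's actual splitting and merging use corpus statistics, heuristics, machine learning, and part-of-speech information.

```python
# Toy compound split-merge with an explicit merge marker.

MARKER = "@@"   # attached to compound parts that should merge with the next token


def split_compound(word, parts):
    """Mark all but the last part so the merger can reassemble them."""
    return [p + MARKER for p in parts[:-1]] + [parts[-1]]


def merge_compounds(tokens):
    """Join any token ending in MARKER with the token that follows it."""
    out = []
    for tok in tokens:
        if out and out[-1].endswith(MARKER):
            out[-1] = out[-1][:-len(MARKER)] + tok
        else:
            out.append(tok)
    return out


# Training side: "tomatsoppa" (Swedish, "tomato soup") is split and marked.
print(split_compound("tomatsoppa", ["tomat", "soppa"]))
# -> ['tomat@@', 'soppa']

# Output side: the decoder produced the parts contiguously; merge them back.
print(" ".join(merge_compounds(["jag", "lagar", "tomat@@", "soppa", "idag"])))
# -> "jag lagar tomatsoppa idag"
```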

