iBLEU: Interactively Debugging and Scoring Statistical Machine Translation Systems

The quality of machine translation is evolving rapidly. Several machine translation systems available on the web now provide reasonable, though imperfect, translations. In some specific domains, however, the quality may decrease. A recently proposed approach to such domains is neural machine translation, which aims to build a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural network-based and statistical translation systems. The main focus of our experiments is the implementation and comparison of a medical translator.

Bagging and Boosting statistical machine translation systems

Artificial Intelligence ◽ 10.1016/j.artint.2012.11.005 ◽ 2013 ◽ Vol 195 ◽ pp. 496-527 ◽ Cited By ~ 10

Author(s): Tong Xiao ◽ Jingbo Zhu ◽ Tongran Liu

Keyword(s): Machine Translation ◽ Statistical Machine Translation ◽ Translation Systems

Extracting parallel phrases from comparable data for machine translation

Natural Language Engineering ◽ 10.1017/s1351324916000139 ◽ 2016 ◽ Vol 22 (4) ◽ pp. 549-573 ◽ Cited By ~ 3

Author(s): Sanjika Hewavitharana ◽ Stephan Vogel

Keyword(s): Machine Translation ◽ Language Processing ◽ Statistical Machine Translation ◽ Word Alignment ◽ Data Set ◽ Comparable Corpora ◽ Alignment Algorithms ◽ Extraction Algorithm ◽ Phrase Alignment ◽ Translation Systems

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only the parallel sections, bypassing the non-parallel sections of the sentence. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches using a manually aligned data set, and show that the proposed approach outperforms the other two approaches. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which resulted in improvements of up to 1.2 BLEU over the baseline. The main contributions of this paper are two-fold: (1) novel phrase alignment algorithms to extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases by using them directly in the MT decoder.

The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics ◽ 10.1162/0891201042544884 ◽ 2004 ◽ Vol 30 (4) ◽ pp. 417-449 ◽ Cited By ~ 212

Author(s): Franz Josef Och ◽ Hermann Ney

Keyword(s): Machine Translation ◽ Word Order ◽ Search Algorithm ◽ Statistical Machine Translation ◽ Target Language ◽ Linear Modeling ◽ Translation Model ◽ Log Linear ◽ Translation Systems ◽ English Canadian

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.
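
The relationship between the log-linear model and the source-channel approach mentioned in the abstract above can be written out as a short worked equation. This is a sketch in the standard textbook notation, which the abstract itself does not give: the symbols $f$ (source sentence), $e$ (target sentence), $h_m$ (feature functions), $\lambda_m$ (feature weights), and $M$ (number of features) are assumptions introduced here for illustration.

```latex
\begin{align*}
% Log-linear model: a weighted sum of M feature functions, normalized over all candidate translations e'
\Pr(e \mid f) &= \frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                      {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)} \\[4pt]
% Decision rule: the normalization does not depend on e, so it drops out of the search
\hat{e} &= \operatorname*{arg\,max}_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f) \\[4pt]
% Source-channel special case: M = 2, h_1 = log Pr(e), h_2 = log Pr(f | e), lambda_1 = lambda_2 = 1
\hat{e} &= \operatorname*{arg\,max}_{e} \bigl[\log \Pr(e) + \log \Pr(f \mid e)\bigr]
         = \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)
\end{align*}
```

Read this way, extending the model amounts to adding another feature function $h_m$ with its own weight, which is one way to understand the abstract's claim that the log-linear formulation is easier to extend.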

Making sense of neural machine translation

Translation Spaces ◽ 10.1075/ts.6.2.06for ◽ 2017 ◽ Vol 6 (2) ◽ pp. 291-309 ◽ Cited By ~ 11

Author(s): Mikel L. Forcada

Keyword(s): Machine Translation ◽ New Technology ◽ Statistical Machine Translation ◽ Software Requirements ◽ Word Embeddings ◽ Neural Machine Translation ◽ Training Time ◽ Making Sense ◽ Translation Systems ◽ New Machine

The last few years have witnessed a surge of interest in a new machine translation paradigm: neural machine translation (NMT). Neural machine translation is starting to displace its corpus-based predecessor, statistical machine translation (SMT). In this paper, I introduce NMT and explain in detail, without the mathematical complexity, how neural machine translation systems work, how they are trained, and how they differ from SMT systems. The paper will try to decipher NMT jargon such as “distributed representations”, “deep learning”, “word embeddings”, “vectors”, “layers”, “weights”, “encoder”, “decoder”, and “attention”, and build upon these concepts, so that individual translators and professionals working for the translation industry, as well as students and academics in translation studies, can make sense of this new technology and know what to expect from it. Aspects such as how NMT output differs from SMT output, and the effect on the translation industry of NMT's hardware and software requirements, both at training time and at run time, will be discussed.

Hybrid System Combination Framework for Uyghur–Chinese Machine Translation

Information ◽ 10.3390/info12030098 ◽ 2021 ◽ Vol 12 (3) ◽ pp. 98 ◽ Cited By ~ 1

Author(s): Yajuan Wang ◽ Xiao Li ◽ Yating Yang ◽ Azmat Anwar ◽ Rui Dong

Keyword(s): Machine Translation ◽ Statistical Machine Translation ◽ Chinese Translation ◽ Multiple Systems ◽ System Combination ◽ Combination Methods ◽ Individual System ◽ Final Layer ◽ Combination Approach ◽ Translation Systems

Statistical machine translation (SMT) and neural machine translation (NMT) models are both representative models for Uyghur–Chinese machine translation tasks, each with its own merits. Combining their advantages is therefore a promising direction for further improving translation performance. In this paper, we present a hybrid system combination framework for a Uyghur–Chinese machine translation task that works in three layers to achieve better translation results. In the first layer, we construct various machine translation systems, including SMT and NMT systems. In the second layer, the outputs of multiple systems are combined to leverage the advantages of SMT and NMT models by using a multi-source-based system combination approach and voting-based system combination approaches. Moreover, instead of selecting an individual system’s combined outputs as the final results, we transmit the outputs of the first layer and the second layer into the final layer to make a better prediction. Experimental results on the Uyghur–Chinese translation task show that the proposed framework significantly outperforms the baseline systems in terms of both accuracy and fluency, improving by 1.75 BLEU points over the best individual system and by 0.66 BLEU points over conventional system combination methods.
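
To make the idea of voting-based combination in the abstract above more concrete, here is a minimal Python sketch. It is not the method of the paper, whose details the abstract does not give. It assumes a much simpler sentence-level scheme in which each system's output "votes" for the other outputs it overlaps with, and the output with the most support is selected. The function names `overlap_f1` and `select_consensus` and the token-level F1 agreement measure are hypothetical choices made for illustration.

```python
from collections import Counter


def overlap_f1(hyp: str, other: str) -> float:
    """Token-level F1 between two whitespace-tokenized sentences (assumed agreement measure)."""
    h, o = Counter(hyp.split()), Counter(other.split())
    common = sum((h & o).values())  # multiset intersection of tokens
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(o.values())
    return 2 * precision * recall / (precision + recall)


def select_consensus(hypotheses: list[str]) -> str:
    """Return the hypothesis that agrees most with the other systems' outputs."""
    if not hypotheses:
        raise ValueError("need at least one hypothesis")

    def support(i: int) -> float:
        # Sum of agreement scores against every other system's output.
        return sum(
            overlap_f1(hypotheses[i], other)
            for j, other in enumerate(hypotheses)
            if j != i
        )

    best = max(range(len(hypotheses)), key=support)
    return hypotheses[best]


# Hypothetical outputs from three systems for the same source sentence.
outputs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "a dog is here",
]
print(select_consensus(outputs))
```

Selecting a whole sentence keeps the sketch short. Combination methods described in the literature often work at the word or phrase level instead, which this example does not attempt.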

Parallel FDA5 for Fast Deployment of Accurate Statistical Machine Translation Systems

10.3115/v1/w14-3303 ◽ 2014 ◽ Cited By ~ 4

Author(s): Ergun Bicici ◽ Qun Liu ◽ Andy Way

Keyword(s): Machine Translation ◽ Statistical Machine Translation ◽ Translation Systems

Word Reordering Alignment for Combination of Statistical Machine Translation Systems

ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics

NICT’s Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task

Translation of Medical Texts using Neural Networks
