Mining Parallel Knowledge from Comparable Patents

The extracted parallel sentences and technical terms could be a good basis for further acquisition of term relations and the translation of monolingual ontologies, as well as for statistical machine translation systems and other cross-lingual information access applications.


A classification approach for detecting cross-lingual biomedical term translations

Natural Language Engineering, Vol 23 (1), pp. 31-51, 2015. DOI: 10.1017/s1351324915000431. Cited By ~ 3

Author(s): H. Hakami, D. Bollegala

Keyword(s): Machine Translation, Feature Space, Target Language, Average Precision, Common Features, Target Languages, Cross Lingual, Translation Accuracy, Translation Systems, Technical Terms

Finding translations for technical terms is an important problem in machine translation. In highly specialized domains such as biology or medicine in particular, it is difficult to find bilingual experts to annotate enough cross-lingual text to train machine translation systems. Moreover, new terms are constantly being coined in the biomedical community, which makes it difficult to keep translation dictionaries up to date for all language pairs of interest. Given a biomedical term in one language (the source language), we propose a method for detecting its translations in a different language (the target language). Specifically, we train a binary classifier to determine whether two biomedical terms written in two languages are translations of each other. Training such a classifier is often complicated by the lack of common features between the source and target languages. We propose several feature space concatenation methods to overcome this problem. We also study the effectiveness of contextual and character n-gram features for detecting term translations. Experiments on a standard dataset for biomedical term translation show that the proposed method outperforms several competitive baseline methods in terms of mean average precision and top-k translation accuracy.
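The abstract does not spell out the concatenation methods, so the following is only a minimal sketch of the general idea under our own assumptions: a (source term, target term) pair is represented by conjunction features that pair every source-side character trigram with every target-side trigram, so that a linear classifier can learn weights linking the two otherwise disjoint feature spaces. The function names, the choice of a perceptron, and the toy English–Spanish pairs are hypothetical illustrations, not the authors' setup.

```python
from collections import Counter
from itertools import product


def char_ngrams(term, n=3):
    """Set of character n-grams of a term, padded with '#' boundary markers."""
    padded = f"#{term.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def pair_features(src, tgt, n=3):
    """Conjoin every source n-gram with every target n-gram, so that a single
    feature (e.g. 'ins|ins') spans both languages' feature spaces."""
    return Counter(f"{s}|{t}" for s, t in product(char_ngrams(src, n), char_ngrams(tgt, n)))


def pair_score(weights, src, tgt):
    """Linear score; a value > 0 means 'predicted translation pair'."""
    return sum(weights[f] * v for f, v in pair_features(src, tgt).items())


def train_perceptron(examples, epochs=10):
    """Standard binary perceptron over sparse features (label 1 = translation)."""
    weights = Counter()  # missing features read as weight 0
    for _ in range(epochs):
        for src, tgt, label in examples:
            predicted = 1 if pair_score(weights, src, tgt) > 0 else 0
            if predicted != label:
                step = 1 if label == 1 else -1
                for f, v in pair_features(src, tgt).items():
                    weights[f] += step * v
    return weights


# Hypothetical toy training pairs (English source, Spanish target).
train = [
    ("insulin", "insulina", 1),
    ("insulin", "fiebre", 0),
    ("fever", "fiebre", 1),
    ("fever", "insulina", 0),
]
weights = train_perceptron(train)
print(pair_score(weights, "insulin", "insulina") > pair_score(weights, "insulin", "fiebre"))  # True
```

Ranking all candidate target terms for a source term by `pair_score` gives the kind of ranked list on which mean average precision and top-k accuracy would be computed.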


Source Language Adaptation Approaches for Resource-Poor Machine Translation

Computational Linguistics, Vol 42 (2), pp. 277-306, 2016. DOI: 10.1162/coli_a_00248. Cited By ~ 8

Author(s): Pidong Wang, Preslav Nakov, Hwee Tou Ng

Keyword(s): Machine Translation, Statistical Machine Translation, Target Language, Source Language, World Languages, Word Level, Resource Poor, Morphological Variants, Cross Lingual, Translation Systems

Most of the world's languages are resource-poor for statistical machine translation; still, many of them are related to some resource-rich language. We therefore propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext, from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is important for resource-poor machine translation because it can provide useful guidelines for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields an improvement of 7.26 BLEU points over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.
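As a much-simplified illustration of adapting the RICH side of a bitext, the sketch below rewrites Malay (RICH) sentences word by word toward Indonesian (POOR) using a word-level paraphrase table, leaving the English (TGT) side untouched. The table is hand-written here purely for illustration; in the approach described above, paraphrases are learned from the small POOR–TGT bitext, and phrase-level paraphrases and morphological variants are used as well. The function name `adapt_bitext` is hypothetical.

```python
def adapt_bitext(rich_tgt_pairs, paraphrases):
    """Rewrite the RICH side of each (RICH, TGT) sentence pair word by word,
    replacing words that have a POOR-language paraphrase; TGT stays as is."""
    adapted = []
    for rich, tgt in rich_tgt_pairs:
        words = [paraphrases.get(word, word) for word in rich.split()]
        adapted.append((" ".join(words), tgt))
    return adapted


# Hand-written Malay -> Indonesian word paraphrases (illustrative only).
malay_to_indonesian = {"kerana": "karena", "kereta": "mobil", "wang": "uang"}

bitext = [("saya beli kereta kerana murah", "I bought a car because it was cheap")]
print(adapt_bitext(bitext, malay_to_indonesian))
# [('saya beli mobil karena murah', 'I bought a car because it was cheap')]
```

The adapted pairs would then be combined with the small POOR–TGT bitext when training the translation model.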


Why Translation is Difficult: A Corpus-Based Study of Non-Literality in Post-Editing and From-Scratch Translation

HERMES - Journal of Language and Communication in Business, pp. 43, 2017. DOI: 10.7146/hjlcb.v0i56.97201. Cited By ~ 5

Author(s): Michael Carl, Moritz Jonas Schaeffer

Keyword(s): Empirical Evidence, Machine Translation, Semantic Similarity, Statistical Machine Translation, Semantic Distance, Multilingual Corpus, Definition Of, Cross Lingual, Translation Systems

The paper develops a definition of translation literality based on the syntactic and semantic similarity of the source and target texts. We provide theoretical and empirical evidence that absolutely literal translations are easy to produce. Based on a multilingual corpus of alternative translations, we investigate the effects of cross-lingual syntactic and semantic distance on translation production times and find that non-literality makes both from-scratch translation and post-editing difficult. We show that statistical machine translation systems have even more difficulty with non-literality.
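One way to make the two similarity dimensions concrete, offered here as an assumption rather than the paper's exact metrics: semantic variation can be measured as the entropy of the target words that different translators chose for the same source word, and syntactic distance as the number of crossing links in a word alignment. Under these simplified measures, an absolutely literal translation has zero entropy and zero crossings. The helper names and data below are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def translation_entropy(renderings):
    """Entropy (bits) of the target words different translators chose for one
    source word; 0.0 means every translator chose the same rendering."""
    counts = Counter(renderings)
    total = len(renderings)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def crossing_links(alignment):
    """Number of pairs of (source_index, target_index) links that cross,
    i.e. whose relative order differs between source and target."""
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignment, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


print(translation_entropy(["house"] * 5))                          # 0.0
print(translation_entropy(["house", "home", "house", "building"]))  # 1.5
print(crossing_links([(0, 0), (1, 1), (2, 2)]))                    # 0 (monotone)
print(crossing_links([(0, 2), (1, 1), (2, 0)]))                    # 3 (fully reversed)
```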


Word Reordering Alignment for Combination of Statistical Machine Translation Systems

2008 6th International Symposium on Chinese Spoken Language Processing, 2008. DOI: 10.1109/chinsl.2008.ecp.80. Cited By ~ 1

Author(s): Maoxi Li, Chengqing Zong

Keyword(s): Machine Translation, Statistical Machine Translation, Translation Systems


ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics

DOI: 10.18653/v1/w15-3005, 2015. Cited By ~ 4

Author(s): Ergun Bicici, Qun Liu, Andy Way

Keyword(s): Machine Translation, Statistical Machine Translation, Translation Systems


NICT’s Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task

DOI: 10.18653/v1/w18-6419, 2018. Cited By ~ 3

Author(s): Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

Keyword(s): Machine Translation, Statistical Machine Translation, Translation Systems


Translation of Medical Texts using Neural Networks

International Journal of Reliable and Quality E-Healthcare, Vol 5 (4), pp. 51-66, 2016. DOI: 10.4018/ijrqeh.2016100104. Cited By ~ 5

Author(s): Krzysztof Wolk, Krzysztof P. Marasek

Keyword(s): Machine Translation, Statistical Machine Translation, European Medicines Agency, Translation System, Training Methods, Neural Machine Translation, Machine Translation System, Source Sentence, Parallel Text, Translation Systems

The quality of machine translation is evolving rapidly. Several machine translation systems available on the web today provide reasonable translations, although they are not perfect, and in some specialized domains their quality can decrease. A recently proposed approach to this problem is neural machine translation, which aims to build a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system for medical data. The European Medicines Agency parallel text corpus was used to train both neural-network-based and statistical translation systems. The main focus of our experiments is the implementation and comparison of such medical translation systems.
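The encoder-decoder idea can be illustrated with a toy recurrent network in plain Python. This is an untrained sketch with random weights, so its output is not a meaningful translation; the vocabularies, dimensions, and function names are hypothetical and are not taken from the study. It only demonstrates the structural point made above: `encode` compresses a source sentence of any length into one fixed-length vector, and `decode` generates target words from that vector alone.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy weights are reproducible

HIDDEN = 4  # size of the fixed-length sentence vector (hypothetical)
SRC_VOCAB = ["<unk>", "lek", "dawka", "pacjent", "dzienna"]  # toy Polish words
TGT_VOCAB = ["<eos>", "drug", "dose", "patient", "daily"]    # toy English words


def rand_matrix(rows, cols):
    """Random weight matrix; a real system would learn these by training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


SRC_EMB = rand_matrix(len(SRC_VOCAB), HIDDEN)
TGT_EMB = rand_matrix(len(TGT_VOCAB), HIDDEN)
ENC_IN, ENC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
DEC_IN, DEC_REC = rand_matrix(HIDDEN, HIDDEN), rand_matrix(HIDDEN, HIDDEN)
W_OUT = rand_matrix(len(TGT_VOCAB), HIDDEN)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_step(x, h, w_in, w_rec):
    """Simple recurrent update: h' = tanh(w_in @ x + w_rec @ h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_rec, h))]


def encode(tokens):
    """Read the whole source sentence into one HIDDEN-sized vector."""
    h = [0.0] * HIDDEN
    for tok in tokens:
        idx = SRC_VOCAB.index(tok) if tok in SRC_VOCAB else 0  # unknown -> <unk>
        h = rnn_step(SRC_EMB[idx], h, ENC_IN, ENC_REC)
    return h


def decode(h, max_len=5):
    """Greedy decoding from the sentence vector until <eos> or max_len words."""
    out, prev = [], [0.0] * HIDDEN
    for _ in range(max_len):
        h = rnn_step(prev, h, DEC_IN, DEC_REC)
        scores = matvec(W_OUT, h)
        best = max(range(len(TGT_VOCAB)), key=lambda i: scores[i])
        if TGT_VOCAB[best] == "<eos>":
            break
        out.append(TGT_VOCAB[best])
        prev = TGT_EMB[best]  # feed the chosen word back into the decoder
    return out


vector = encode("pacjent dawka dzienna".split())
print(len(vector), decode(vector))
```

Because the whole source sentence must pass through `vector`, long sentences are squeezed into the same small representation as short ones, which is the bottleneck of this model family.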


Bagging and Boosting statistical machine translation systems

Artificial Intelligence, Vol 195, pp. 496-527, 2013. DOI: 10.1016/j.artint.2012.11.005. Cited By ~ 10

Author(s): Tong Xiao, Jingbo Zhu, Tongran Liu

Keyword(s): Machine Translation, Statistical Machine Translation, Translation Systems


iBLEU: Interactively Debugging and Scoring Statistical Machine Translation Systems

2011 IEEE Fifth International Conference on Semantic Computing, 2011. DOI: 10.1109/icsc.2011.36. Cited By ~ 4

Author(s): Nitin Madnani

Keyword(s): Machine Translation, Statistical Machine Translation, Translation Systems


Extracting parallel phrases from comparable data for machine translation

Natural Language Engineering, Vol 22 (4), pp. 549-573, 2016. DOI: 10.1017/s1351324916000139. Cited By ~ 3

Author(s): Sanjika Hewavitharana, Stephan Vogel

Keyword(s): Machine Translation, Language Processing, Statistical Machine Translation, Word Alignment, Data Set, Comparable Corpora, Alignment Algorithms, Extraction Algorithm, Phrase Alignment, Translation Systems

Mining parallel data from comparable corpora is a promising approach for overcoming data sparseness in statistical machine translation and other natural language processing applications. In this paper, we address the task of detecting parallel phrase pairs embedded in comparable sentence pairs. We present a novel phrase alignment approach that is designed to align only the parallel sections of a sentence, bypassing the non-parallel sections. We compare the proposed approach with two other alignment methods: (1) the standard phrase extraction algorithm, which relies on the Viterbi path of the word alignment, and (2) a binary classifier that detects parallel phrase pairs when presented with a large collection of phrase pair candidates. We evaluate the accuracy of these approaches on a manually aligned data set and show that the proposed approach outperforms the other two. Finally, we demonstrate the effectiveness of the extracted phrase pairs by using them in Arabic–English and Urdu–English translation systems, which yielded improvements of up to 1.2 BLEU points over the baseline. The main contributions of this paper are two-fold: (1) novel phrase alignment algorithms to extract parallel phrase pairs from comparable sentences, and (2) an evaluation of the utility of the extracted phrases when used directly in the MT decoder.
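The standard phrase extraction baseline mentioned above can be sketched as follows: given a word alignment (for example, a Viterbi alignment), it takes every source span and its minimal covering target span, and keeps the pair only if no alignment link connects a word inside one span to a word outside the other. This simplified version, with hypothetical names and a toy German–English sentence, omits the usual extension of target spans over unaligned boundary words; it is not the paper's proposed approach.

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=3):
    """Collect phrase pairs consistent with a word alignment.

    alignment: set of (source_index, target_index) links, e.g. from a Viterbi
    word alignment. Simplification: unaligned target words at span edges are
    not used to form additional, wider pairs.
    """
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(len(src), s1 + max_len)):
            # Smallest target span covering every link from the source span.
            linked = [t for s, t in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] may link outside [s1, s2].
            if any((t1 <= t <= t2) and not (s1 <= s <= s2) for s, t in alignment):
                continue
            pairs.add((" ".join(src[s1:s2 + 1]), " ".join(tgt[t1:t2 + 1])))
    return pairs


src = "das Haus ist klein".split()
tgt = "the house is small".split()
links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in sorted(extract_phrase_pairs(src, tgt, links)):
    print(pair)
```

With this monotone one-to-one alignment, every contiguous span of up to three words is extracted, nine pairs in total; on comparable (only partly parallel) sentence pairs, noisy links make such extraction unreliable, which motivates the dedicated phrase alignment approach described above.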
