Are Ellipses important for Machine Translation?

2021 ◽  
pp. 1-10
Author(s):  
Payal Khullar

Abstract This article describes an experiment to evaluate the impact of different types of ellipses discussed in theoretical linguistics on Neural Machine Translation (NMT), using English to Hindi/Telugu as source and target languages. Manual evaluation shows that most of the errors made by Google NMT are located in the clause containing the ellipsis, that the frequency of such errors is slightly higher in Telugu than in Hindi, and that translation adequacy improves when ellipses are reconstructed with their antecedents. These findings not only confirm the importance of ellipses and their resolution for MT, but also hint at a possible correlation between the translation of discourse devices like ellipses and the morphological incongruity of the source and target languages. We also observe that not all ellipses are translated poorly and benefit from reconstruction, advocating differentiated treatment of different ellipsis types in MT research.
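The intervention evaluated above, reconstructing an ellipsis with its antecedent before translation, can be sketched as a simple preprocessing step. The example sentence and the `reconstruct_ellipsis` helper below are illustrative assumptions, not from the article; a real pipeline would need an ellipsis resolver to locate the gap and its antecedent automatically.

```python
# A minimal sketch of the preprocessing idea the article evaluates:
# reconstructing an ellipsis with its antecedent before translation.
# The sentence and helper are illustrative, not from the article.

def reconstruct_ellipsis(sentence: str, elided_span: str, antecedent: str) -> str:
    """Replace an elliptical gap with its resolved antecedent."""
    return sentence.replace(elided_span, antecedent)

# VP ellipsis: "does too" stands in for the antecedent "likes coffee".
source = "Mary likes coffee, and John does too."
resolved = reconstruct_ellipsis(source, "does too", "likes coffee too")

print(resolved)  # Mary likes coffee, and John likes coffee too.
# Translating `resolved` instead of `source` is what improved adequacy
# in the article's manual evaluation.
```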

Author(s):  
Rongxiang Weng ◽  
Hao Zhou ◽  
Shujian Huang ◽  
Lei Li ◽  
Yifan Xia ◽  
...  

State-of-the-art machine translation models are still not on a par with human translators. Previous work incorporates human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our method works with critical revision instructions, allowing humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT significantly improves machine translation results while requiring fewer revision instructions from humans than previous methods.
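How avoiding repeated mistakes might work can be illustrated with a context-keyed revision cache. This is a deliberately simplified sketch: CAMIT memorizes revision actions softly inside the neural decoder, whereas the hypothetical `RevisionMemory` class below only mimics that behaviour at the surface level.

```python
# A simplified, illustrative sketch of the "memorize revisions" idea:
# store human corrections keyed on local context so the same mistake is
# not repeated. The real model does this softly inside the NMT decoder.

from collections import defaultdict

class RevisionMemory:
    def __init__(self):
        # (left-context word, wrong word) -> counts of applied corrections
        self.memory = defaultdict(lambda: defaultdict(int))

    def record(self, left_context: str, wrong: str, corrected: str) -> None:
        self.memory[(left_context, wrong)][corrected] += 1

    def suggest(self, left_context: str, word: str) -> str:
        candidates = self.memory.get((left_context, word))
        if candidates:
            # Return the most frequently applied correction.
            return max(candidates, key=candidates.get)
        return word

mem = RevisionMemory()
mem.record("river", "bank", "shore")  # a human fixed this once
tokens = ["the", "river", "bank"]
fixed = [mem.suggest(tokens[i - 1] if i else "", t) for i, t in enumerate(tokens)]
print(fixed)  # ['the', 'river', 'shore']
```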


Informatics ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 7
Author(s):  
Arda Tezcan ◽  
Bram Bulté ◽  
Bram Vanroy

We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
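The core NFR idea, augmenting the NMT input with the target side of a retrieved fuzzy match, can be sketched as follows. The similarity metric (difflib's sequence ratio), the `@@@` separator token, and the threshold are illustrative stand-ins for the fuzzy matching techniques and features the paper actually compares.

```python
# A minimal sketch of the core NFR input augmentation: retrieve the
# closest translation-memory match for a source sentence and append its
# *target* side to the NMT input behind a separator token.

from difflib import SequenceMatcher

translation_memory = [
    ("the contract ends in march", "le contrat se termine en mars"),
    ("the meeting starts at noon", "la réunion commence à midi"),
]

def fuzzy_score(a: str, b: str) -> float:
    # Word-level sequence similarity; one of many possible metrics.
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def augment(source: str, tm, threshold: float = 0.5) -> str:
    best_src, best_tgt = max(tm, key=lambda pair: fuzzy_score(source, pair[0]))
    if fuzzy_score(source, best_src) >= threshold:
        return f"{source} @@@ {best_tgt}"
    return source  # no usable match: plain NMT input

print(augment("the contract ends in april", translation_memory))
# the contract ends in april @@@ le contrat se termine en mars
```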


2017 ◽  
Vol 108 (1) ◽  
pp. 171-182 ◽  
Author(s):  
Jinhua Du ◽  
Andy Way

Abstract Pre-reordering, a preprocessing step that brings source-side word order closer to that of the target side, has proven very helpful in improving translation quality for statistical machine translation (SMT). However, is this also the case for neural machine translation (NMT)? In this paper, we first investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate the features of the SMT pre-reordering model as input factors into NMT (factored NMT). The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT. Pre-reordering experiments conducted on Japanese↔English and Chinese↔English show that pre-reordering the source-side data for NMT is redundant and that NMT models trained on pre-reordered data suffer degraded translation performance. However, factored NMT using SMT-based pre-reordering features on Japanese→English and Chinese→English is beneficial and improves translation by 4.48 and 5.89 relative BLEU points, respectively, compared to the baseline NMT system.
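The factored-NMT input described above can be illustrated schematically: embeddings for POS tags, word classes and reordered indices are concatenated to the word embeddings before entering the encoder. The vocabulary sizes and embedding dimensions in this NumPy sketch are arbitrary assumptions, not the paper's settings.

```python
# Schematic sketch of factored NMT input: feature embeddings are
# concatenated to word embeddings token by token.

import numpy as np

rng = np.random.default_rng(0)
word_emb  = rng.normal(size=(1000, 512))  # word vocabulary x d_word
pos_emb   = rng.normal(size=(50, 16))     # POS tags x d_pos
class_emb = rng.normal(size=(100, 16))    # word classes x d_class
reord_emb = rng.normal(size=(200, 16))    # reordered positions x d_reord

def factored_input(word_ids, pos_ids, class_ids, reord_ids):
    """Per-token input vector of size d_word + d_pos + d_class + d_reord."""
    return np.concatenate(
        [word_emb[word_ids], pos_emb[pos_ids],
         class_emb[class_ids], reord_emb[reord_ids]],
        axis=-1,
    )

x = factored_input([5, 17, 3], [1, 2, 1], [40, 7, 7], [2, 0, 1])
print(x.shape)  # (3, 560): three tokens, 512 + 3*16 features each
```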


2016 ◽  
Vol 55 ◽  
pp. 209-248 ◽  
Author(s):  
Jörg Tiedemann ◽  
Zeljko Agić

How do we parse languages for which no treebanks are available? This contribution addresses the cross-lingual viewpoint on statistical dependency parsing, in which we attempt to make use of resource-rich source-language treebanks to build and adapt models for under-resourced target languages. We outline the benefits and drawbacks of the current major approaches. We emphasize synthetic treebanking: the automatic creation of target-language treebanks by means of annotation projection and machine translation. We present competitive results in cross-lingual dependency parsing using a combination of various techniques that contribute to the overall success of the method. We further include a detailed discussion of the impact of part-of-speech label accuracy on parsing results, which provides guidance for practical applications of cross-lingual methods to truly under-resourced languages.
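Annotation projection, the synthetic-treebanking step the contribution emphasizes, can be sketched in a few lines: dependency heads annotated on a resource-rich source sentence are carried over to its translation through word alignments. The toy sentence and the given alignment below are illustrative; in practice alignments come from a word aligner or the MT system itself.

```python
# Illustrative sketch of annotation projection for dependency parsing.
# Source: "she reads books"; heads use 1-based indices, 0 = root.

src_heads = {1: 2, 2: 0, 3: 2}   # she<-reads, reads=root, books<-reads
alignment = {1: 1, 2: 2, 3: 3}   # source position -> target position

def project(src_heads, alignment):
    """Project each source dependency edge through the word alignment."""
    tgt_heads = {}
    for dep, head in src_heads.items():
        if dep in alignment and (head == 0 or head in alignment):
            tgt_heads[alignment[dep]] = 0 if head == 0 else alignment[head]
    return tgt_heads

print(project(src_heads, alignment))  # {1: 2, 2: 0, 3: 2}
# Unaligned target words receive no head and must be handled
# heuristically, one source of noise in synthetic treebanks.
```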


2021 ◽  
Vol 11 (22) ◽  
pp. 10860
Author(s):  
Mengtao Sun ◽  
Hao Wang ◽  
Mark Pasquine ◽  
Ibrahim A. Hameed

Existing Sequence-to-Sequence (Seq2Seq) Neural Machine Translation (NMT) shows strong capability with High-Resource Languages (HRLs). However, this approach poses serious challenges when processing Low-Resource Languages (LRLs), because the model's expressiveness is limited by the training scale of parallel sentence pairs. This study utilizes adversarial and transfer learning techniques to mitigate the lack of sentence pairs in LRL corpora. We propose a new Low-resource, Adversarial, Cross-lingual (LAC) model for NMT. On the adversarial side, the LAC model consists of a generator and a discriminator. The generator is a Seq2Seq model that produces translations from the source to the target language, while the discriminator measures the gap between machine and human translations. In addition, we introduce transfer learning into the LAC model to help capture features from scarce resources, since some languages share the same subject-verb-object grammatical structure. Rather than using the entire pretrained LAC model, we separately utilize the pretrained generator and discriminator. The pretrained discriminator exhibited better performance in all experiments. Experimental results demonstrate that the LAC model achieves higher Bilingual Evaluation Understudy (BLEU) scores and has good potential to augment LRL translations.
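The adversarial setup described above can be sketched structurally: the generator proposes translations, the discriminator scores how human-like they are, and that score serves as a training signal for the generator. Everything below is a placeholder skeleton under that assumption; the stub classes are not the authors' implementation, which uses neural networks and, per the study, transfer-learned initializations from higher-resource pairs.

```python
# Structural sketch of the generator/discriminator training loop.
# Both models are stubs; a real system would use neural networks.

import random

random.seed(0)

class Generator:                       # stands in for the Seq2Seq model
    def translate(self, source: str) -> str:
        return source[::-1]            # placeholder "translation"
    def reinforce(self, source: str, reward: float) -> None:
        pass                           # policy-gradient update would go here

class Discriminator:                   # scores machine vs. human output
    def score(self, source: str, target: str) -> float:
        return random.random()         # placeholder human-likeness score
    def update(self, source: str, machine: str, human: str) -> None:
        pass                           # binary-classification update

gen, disc = Generator(), Discriminator()
parallel = [("hello", "bonjour"), ("thanks", "merci")]  # tiny LRL corpus

for epoch in range(2):
    for src, human_tgt in parallel:
        machine_tgt = gen.translate(src)
        disc.update(src, machine_tgt, human_tgt)  # sharpen the critic
        reward = disc.score(src, machine_tgt)     # human-likeness reward
        gen.reinforce(src, reward)                # push generator toward it
```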

