Graded Acceptance in Corpus-Based English-to-Spanish Machine Translation Evaluation

10.29007/r819 ◽  
2018 ◽  
Author(s):  
Mario Crespo Miguel ◽  
Marta Sanchez-Saus Laserna

Traditionally, texts produced by machine translation have been evaluated with a binary criterion: right or wrong. In certain cases, however, it is difficult to draw a clear-cut line between fully acceptable and fully unacceptable texts. In this paper we selected a group of bilingual, human-translated English-to-Spanish sentence pairs from parallel corpora (Linguee) and a group of machine-translated texts featuring linguistic phenomena that are problematic in English-to-Spanish translation: polysemy, semantic equivalents, passive voice, anaphora, etc. We presented the translations to a group of native speakers, who rated them on different levels of acceptability. The results show the degree of applicability of this approach.
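
The graded approach replaces a right/wrong verdict with acceptability judgments aggregated across raters. A minimal sketch of this kind of aggregation is shown below, assuming a hypothetical 1-5 rating scale and invented rater scores; the paper's actual scale and aggregation procedure may differ.

```python
# A minimal sketch of graded (non-binary) translation evaluation.
# The 1-5 scale, sentence IDs, and scores are hypothetical placeholders.
from statistics import mean, stdev

# Each translation is rated by several native speakers on acceptability.
ratings = {
    "mt_polysemy_01": [4, 5, 4, 3],
    "mt_passive_02": [2, 1, 2, 2],
    "human_linguee_03": [5, 5, 4, 5],
}

for sentence_id, scores in ratings.items():
    # The mean places each translation on a gradient rather than a
    # right/wrong binary; the standard deviation indicates rater agreement.
    print(f"{sentence_id}: mean={mean(scores):.2f}, sd={stdev(scores):.2f}")
```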

2020 ◽  
Vol 10 (11) ◽  
pp. 3904
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Joon-Choul Shin ◽  
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (i.e., statistical-based and deep learning-based) and achieved great results for popular languages. However, research involving low-resource languages such as Korean often suffers from a lack of openly available bilingual language resources. In this research, we built open, extensive parallel corpora for training MT models, named the Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora: a Korean-English dataset with over 969 thousand sentence pairs and a Korean-Vietnamese dataset with over 412 thousand sentence pairs. Furthermore, the high rate of homographs in Korean causes word-ambiguity issues in MT. To address this problem, we developed a powerful word-sense annotation system, named UTagger, based on a combination of sub-word conditional probability and knowledge-based methods. We applied UTagger to UPC and used the corpora to train both statistical and deep learning-based neural MT systems. The experimental results demonstrate that high-quality MT systems (in terms of Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores) can be built with UPC. Both UPC and UTagger are available for free download and use.
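
Since the paper reports quality in BLEU and TER, a short sketch of how such scores are typically computed follows, using the sacrebleu library's compat API; the hypothesis and reference sentences are hypothetical stand-ins, not UPC test data.

```python
# A minimal sketch of scoring MT output with BLEU and TER via sacrebleu.
# Sentences are invented examples, not the paper's evaluation data.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he reads a book"]
# One reference stream, with one reference per hypothesis.
references = [["the cat is sitting on the mat", "he is reading a book"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

# Higher BLEU is better; lower TER (edits per reference word) is better.
print(f"BLEU = {bleu.score:.2f}")
print(f"TER  = {ter.score:.2f}")
```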


2016 ◽  
Vol 22 (4) ◽  
pp. 517-548 ◽  
Author(s):  
ANN IRVINE ◽  
CHRIS CALLISON-BURCH

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction and show how a discriminative model can be used to combine various signals of translation equivalence (such as contextual similarity, temporal similarity, orthographic similarity, and topic similarity). Our discriminative model produces higher-accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems, in addition to allowing end-to-end machine translation without parallel corpora.
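
A discriminative combination of equivalence signals can be sketched as a binary classifier over per-pair feature vectors. The example below uses scikit-learn's logistic regression with invented feature values and labels; the authors' actual model, features, and training data are not reproduced here.

```python
# A minimal sketch: combine monolingual signals of translation equivalence
# into one score with logistic regression. All values are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: candidate word pairs; columns: contextual, temporal, orthographic,
# and topic similarity (each signal pre-computed from monolingual corpora).
X_train = np.array([
    [0.82, 0.71, 0.60, 0.75],   # a correct translation pair
    [0.15, 0.22, 0.05, 0.30],   # a wrong pair
    [0.78, 0.65, 0.90, 0.70],   # a correct (cognate) pair
    [0.40, 0.10, 0.12, 0.25],   # a wrong pair
])
y_train = np.array([1, 0, 1, 0])  # 1 = translations, 0 = not

model = LogisticRegression().fit(X_train, y_train)

# Score a new candidate pair; such a probability can rank induced
# translations or serve as a feature in a phrase-based SMT system.
candidate = np.array([[0.70, 0.55, 0.20, 0.68]])
print(model.predict_proba(candidate)[0, 1])
```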


2021 ◽  
Author(s):  
Vânia Mendonça ◽  
Ricardo Rei ◽  
Luisa Coheur ◽  
Alberto Sardinha ◽  
Ana Lúcia Santos

Author(s):  
Raj Dabre ◽  
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, it also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way of improving NMT, i.e., extending the training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing, where we share even the parameters between the encoder and decoder, in addition to recurrent stacking of layers.
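
The core idea, applying one layer's parameters repeatedly instead of stacking separately parameterized layers, can be sketched in a few lines of PyTorch. The dimensions and the use of nn.TransformerEncoderLayer below are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of recurrent stacking: one encoder layer's parameters
# are reused for all 6 "layers" instead of allocating 6 separate layers.
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_steps=6):
        super().__init__()
        # A single layer holds all the parameters ...
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_steps = num_steps

    def forward(self, x):
        # ... and is applied num_steps times, so effective depth grows
        # without adding any parameters.
        for _ in range(self.num_steps):
            x = self.layer(x)
        return x

encoder = RecurrentlyStackedEncoder()
out = encoder(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
print(out.shape)  # torch.Size([2, 10, 512])
```

The parameter count stays that of a single layer no matter how many times it is applied, which is the saving the abstract describes.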


1999 ◽  
Vol 42 (3) ◽  
pp. 281-294
Author(s):  
Y. Toukourou ◽  
K.-J. Peters

Abstract. Title of the paper: Impact of feed restriction on the growth performance of goat kids. The influence of differential feeding levels on growth performance in 72 goat kids of the breed "Bunte Deutsche Edelziege" during the pre-weaning period was examined. The 72 animals were assigned to a control group and two experimental groups that received, respectively, 20% and 40% less milk/less concentrate compared to the control (fed at 2.4 times the energy demand for maintenance). The experimental animals gained significantly less than the control group. However, during the subsequent realimentation period, when all animals received the same treatment (fed at an energy level of 2.4 times maintenance), the daily weight gain among the kids was in inverse proportion to the level of milk deprivation in the pre-weaning phase. The rapid growth among the experimental animals was such that the initial differences in body weight between the experimental and control groups were fully compensated. Growth performance of kids with respect to different levels of concentrated feed was less clear-cut and differed significantly only between the group that received the lowest feed level and all the other groups.


2020 ◽  
Author(s):  
Wei Zhao ◽  
Goran Glavaš ◽  
Maxime Peyrard ◽  
Yang Gao ◽  
Robert West ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resource languages to improve machine translation (MT) performance remains challenging. This research contributes to the domain with a low-resource English-Twi translation system based on filtered synthetic parallel corpora. It is often difficult to determine what a good-quality corpus looks like in low-resource conditions, especially when the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we made extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results on diverse amounts of available parallel corpora demonstrate that injecting a pseudo-parallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial gains in BLEU and TER scores.
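
As a rough illustration of the Mahalanobis-based filtering step, the sketch below scores each synthetic sentence pair by its squared Mahalanobis distance from the corpus centroid over a pair-level feature vector; the features, values, and cutoff are hypothetical assumptions, not the paper's actual setup.

```python
# A minimal sketch of filtering synthetic pairs by squared Mahalanobis
# distance. Feature choices and the threshold are hypothetical.
import numpy as np

# One feature vector per synthetic sentence pair (e.g. length ratio,
# cross-lingual similarity, round-trip similarity).
pairs = np.array([
    [1.02, 0.91, 0.88],
    [0.98, 0.85, 0.90],
    [2.50, 0.10, 0.20],   # an outlier: likely a bad synthetic pair
    [1.10, 0.80, 0.86],
])

mu = pairs.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(pairs, rowvar=False))  # pinv for stability

# Squared Mahalanobis distance of each pair from the centroid.
diff = pairs - mu
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

threshold = 3.0  # hypothetical cutoff; would be tuned on held-out data
kept = pairs[d2 < threshold]
print("distances:", d2.round(2), "| kept", len(kept), "of", len(pairs))
```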

