Graded Acceptance in Corpus-Based English-to-Spanish Machine Translation Evaluation

10.29007/r819 ◽  
2018 ◽  
Author(s):  
Mario Crespo Miguel ◽  
Marta Sanchez-Saus Laserna

Traditionally, texts produced by machine translation have been evaluated with a binary criterion: right or wrong. In certain cases, however, it is difficult to draw a clear-cut line between fully acceptable and fully unacceptable texts. In this paper we selected a group of bilingual, human-translated English-to-Spanish sentence pairs from parallel corpora (Linguee) and a group of machine-translated texts featuring linguistic phenomena that are problematic in English-to-Spanish translation: polysemy, semantic equivalents, passive voice, anaphora, etc. We presented the translations to a group of native speakers, who rated them on different levels of acceptability. The results show the degree of applicability of this approach.
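
The graded approach replaces a right/wrong verdict with acceptability judgments aggregated across raters. A minimal sketch of this kind of aggregation is shown below, assuming a hypothetical 1-5 rating scale and invented rater scores; the paper's actual scale and aggregation procedure may differ.

```python
# A minimal sketch of graded (non-binary) translation evaluation.
# The 1-5 scale, sentence IDs, and scores are hypothetical placeholders.
from statistics import mean, stdev

# Each translation is rated by several native speakers on acceptability.
ratings = {
    "mt_polysemy_01": [4, 5, 4, 3],
    "mt_passive_02": [2, 1, 2, 2],
    "human_linguee_03": [5, 5, 4, 5],
}

for sentence_id, scores in ratings.items():
    # The mean places each translation on a gradient rather than a
    # right/wrong binary; the standard deviation indicates rater agreement.
    print(f"{sentence_id}: mean={mean(scores):.2f}, sd={stdev(scores):.2f}")
```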

2020 ◽  
Vol 10 (11) ◽  
pp. 3904
Author(s):  
Van-Hai Vu ◽  
Quang-Phuoc Nguyen ◽  
Joon-Choul Shin ◽  
Cheol-Young Ock

Machine translation (MT) has recently attracted much research on various advanced techniques (i.e., statistical-based and deep learning-based) and achieved great results for popular languages. However, research involving low-resource languages such as Korean often suffers from a lack of openly available bilingual language resources. In this research, we built open, extensive parallel corpora for training MT models, named the Ulsan parallel corpora (UPC). Currently, UPC contains two parallel corpora: a Korean-English dataset with over 969 thousand sentence pairs and a Korean-Vietnamese dataset with over 412 thousand sentence pairs. Furthermore, the high rate of homographs in Korean causes word-ambiguity issues in MT. To address this problem, we developed a powerful word-sense annotation system, named UTagger, based on a combination of sub-word conditional probability and knowledge-based methods. We applied UTagger to UPC and used the corpora to train both statistical and deep learning-based neural MT systems. The experimental results demonstrate that high-quality MT systems (in terms of Bi-Lingual Evaluation Understudy (BLEU) and Translation Error Rate (TER) scores) can be built with UPC. Both UPC and UTagger are available for free download and use.
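
Since the paper reports quality in BLEU and TER, a short sketch of how such scores are typically computed follows, using the sacrebleu library's compat API; the hypothesis and reference sentences are hypothetical stand-ins, not UPC test data.

```python
# A minimal sketch of scoring MT output with BLEU and TER via sacrebleu.
# Sentences are invented examples, not the paper's evaluation data.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he reads a book"]
# One reference stream, with one reference per hypothesis.
references = [["the cat is sitting on the mat", "he is reading a book"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
ter = sacrebleu.corpus_ter(hypotheses, references)

# Higher BLEU is better; lower TER (edits per reference word) is better.
print(f"BLEU = {bleu.score:.2f}")
print(f"TER  = {ter.score:.2f}")
```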


2016 ◽  
Vol 22 (4) ◽  
pp. 517-548 ◽  
Author(s):  
ANN IRVINE ◽  
CHRIS CALLISON-BURCH

We use bilingual lexicon induction techniques, which learn translations from monolingual texts in two languages, to build an end-to-end statistical machine translation (SMT) system without the use of any bilingual sentence-aligned parallel corpora. We present a detailed analysis of the accuracy of bilingual lexicon induction and show how a discriminative model can be used to combine various signals of translation equivalence (such as contextual similarity, temporal similarity, orthographic similarity, and topic similarity). Our discriminative model produces higher-accuracy translations than previous bilingual lexicon induction techniques. We reuse these signals of translation equivalence as features in a phrase-based SMT system. These monolingually estimated features enhance low-resource SMT systems, in addition to allowing end-to-end machine translation without parallel corpora.
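
A discriminative combination of equivalence signals can be sketched as a binary classifier over per-pair feature vectors. The example below uses scikit-learn's logistic regression with invented feature values and labels; the authors' actual model, features, and training data are not reproduced here.

```python
# A minimal sketch: combine monolingual signals of translation equivalence
# into one score with logistic regression. All values are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: candidate word pairs; columns: contextual, temporal, orthographic,
# and topic similarity (each signal pre-computed from monolingual corpora).
X_train = np.array([
    [0.82, 0.71, 0.60, 0.75],   # a correct translation pair
    [0.15, 0.22, 0.05, 0.30],   # a wrong pair
    [0.78, 0.65, 0.90, 0.70],   # a correct (cognate) pair
    [0.40, 0.10, 0.12, 0.25],   # a wrong pair
])
y_train = np.array([1, 0, 1, 0])  # 1 = translations, 0 = not

model = LogisticRegression().fit(X_train, y_train)

# Score a new candidate pair; such a probability can rank induced
# translations or serve as a feature in a phrase-based SMT system.
candidate = np.array([[0.70, 0.55, 0.20, 0.68]])
print(model.predict_proba(candidate)[0, 1])
```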


2021 ◽  
Author(s):  
Vânia Mendonça ◽  
Ricardo Rei ◽  
Luisa Coheur ◽  
Alberto Sardinha ◽  
Ana Lúcia Santos

Author(s):  
Raj Dabre ◽  
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, it also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study on neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way of improving NMT, i.e., extending the training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing, where we share even the parameters between the encoder and decoder, in addition to recurrent stacking of layers.
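
The core idea, applying one layer's parameters repeatedly instead of stacking separately parameterized layers, can be sketched in a few lines of PyTorch. The dimensions and the use of nn.TransformerEncoderLayer below are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of recurrent stacking: one encoder layer's parameters
# are reused for all 6 "layers" instead of allocating 6 separate layers.
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_steps=6):
        super().__init__()
        # A single layer holds all the parameters ...
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.num_steps = num_steps

    def forward(self, x):
        # ... and is applied num_steps times, so effective depth grows
        # without adding any parameters.
        for _ in range(self.num_steps):
            x = self.layer(x)
        return x

encoder = RecurrentlyStackedEncoder()
out = encoder(torch.randn(2, 10, 512))  # (batch, sequence, d_model)
print(out.shape)  # torch.Size([2, 10, 512])
```

The parameter count stays that of a single layer no matter how many times it is applied, which is the saving the abstract describes.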


1999 ◽  
Vol 42 (3) ◽  
pp. 281-294
Author(s):  
Y. Toukourou ◽  
K.-J. Peters

Abstract. Title of the paper: Impact of feed restriction on the growth performance of goat kids. The influence of differential feeding levels on growth performance in 72 goat kids of the breed "Bunte Deutsche Edelziege" during the pre-weaning period was examined. The 72 animals were assigned to a control group and two experimental groups that received, respectively, 20% and 40% less milk/less concentrate compared to the control (fed at 2.4 times the energy demand for maintenance). The experimental animals gained significantly less than the control group. However, during the subsequent realimentation period, when all animals received the same treatment (fed at an energy level of 2.4 times maintenance), the daily weight gain among the kids was in inverse proportion to the level of milk deprivation in the pre-weaning phase. The rapid growth among the experimental animals was such that the initial differences in body weight between the experimental and control groups were fully compensated. Growth performance of kids with respect to different levels of concentrated feed was less clear-cut and differed significantly only between the group that received the lowest feed level and all the other groups.


2020 ◽  
Author(s):  
Wei Zhao ◽  
Goran Glavaš ◽  
Maxime Peyrard ◽  
Yang Gao ◽  
Robert West ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Michael Adjeisah ◽  
Guohua Liu ◽  
Douglas Omwenga Nyabuga ◽  
Richard Nuetey Nortey ◽  
Jinling Song

Scaling natural language processing (NLP) to low-resource languages to improve machine translation (MT) performance remains challenging. This research contributes to the domain with a low-resource English-Twi translation system based on filtered synthetic parallel corpora. It is often difficult to determine what a good-quality corpus looks like in low-resource conditions, especially when the target corpus is the only sample text of the parallel language. To improve MT performance for such low-resource language pairs, we propose to expand the training data by injecting a synthetic parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we performed unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we made extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results on diverse amounts of available parallel corpora demonstrate that injecting a pseudo-parallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial gains in BLEU and TER scores.
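
As a rough illustration of the Mahalanobis-based filtering step, the sketch below scores each synthetic sentence pair by its squared Mahalanobis distance from the corpus centroid over a pair-level feature vector; the features, values, and cutoff are hypothetical assumptions, not the paper's actual setup.

```python
# A minimal sketch of filtering synthetic pairs by squared Mahalanobis
# distance. Feature choices and the threshold are hypothetical.
import numpy as np

# One feature vector per synthetic sentence pair (e.g. length ratio,
# cross-lingual similarity, round-trip similarity).
pairs = np.array([
    [1.02, 0.91, 0.88],
    [0.98, 0.85, 0.90],
    [2.50, 0.10, 0.20],   # an outlier: likely a bad synthetic pair
    [1.10, 0.80, 0.86],
])

mu = pairs.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(pairs, rowvar=False))  # pinv for stability

# Squared Mahalanobis distance of each pair from the centroid.
diff = pairs - mu
d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

threshold = 3.0  # hypothetical cutoff; would be tuned on held-out data
kept = pairs[d2 < threshold]
print("distances:", d2.round(2), "| kept", len(kept), "of", len(pairs))
```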

