Post-editing Machine Translation in MateCat: a classroom experiment

2021 ◽  
Author(s):  
Katrin Herget

Advances in machine translation have increased both the volume and the quality of machine-translated texts. However, machine translation output still requires human post-editing. This paper presents a product-based post-editing (PE) experiment carried out with a total of 10 MA translation students. The goal of this study comprised both the analysis of the post-editing results produced by student translators working on a machine-translated text in MateCat and the subsequent error markup. By comparing the quality reports generated at the end of the post-editing process, we analysed the linguistic quality results and observed a heterogeneous error distribution, considerable divergence in severity-level ratings, and a wide span of TTE (time to edit). This study aims to contribute to the integration of post-editing activities into the translation technology classroom for students without prior experience in PE.

Author(s):  
Arief Gilang Ramadhan ◽  
Teguh Susyanto ◽  
Iwan Ady Prabowo

Ducks are widely farmed poultry, and their large population brings a correspondingly large disease burden. Avian influenza is an infectious animal disease caused by a virus that can also infect humans. It reduces the quality of meat and eggs and can even cause death in ducks. Diseases in ducks are difficult to recognize for people without prior experience, so communities and breeders find it difficult to take appropriate action when ducks are affected by avian influenza, which can be fatal. From 2012 to 2017 there were 1,856 cases of bird flu in Indonesia; in October 2017 alone there were 3 cases of avian influenza in which 900 ducks died. An expert system is therefore needed to diagnose avian influenza and help laypeople and farmers who have no experience in dealing with the disease. The disease data and avian influenza symptoms entered into this expert system were obtained from interviews with experts and from reference literature on duck symptoms, so that the expert system can help in dealing with the disease. In validity testing, the manual calculation, the system's calculation, and the expert's diagnosis agreed, so the application is considered suitable for use by the community and breeders.
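The abstract does not name the inference method used by the expert system, so the following is only a minimal sketch, assuming a weighted symptom-matching rule base; the symptom names and weights are hypothetical placeholders, not the knowledge base elicited from the experts.

```python
# Minimal rule-based diagnosis sketch. The symptoms and weights below are
# illustrative placeholders, not the expert knowledge base from the study.

RULES = {
    "avian_influenza": {
        "sudden_death": 0.9,
        "bluish_comb": 0.8,
        "swollen_head": 0.7,
        "drop_in_egg_production": 0.6,
    },
}

def diagnose(observed_symptoms):
    """Score each disease by the weighted share of its symptoms observed."""
    scores = {}
    for disease, symptoms in RULES.items():
        matched = [w for s, w in symptoms.items() if s in observed_symptoms]
        scores[disease] = sum(matched) / sum(symptoms.values()) if matched else 0.0
    return scores

if __name__ == "__main__":
    # Two of four hypothetical symptoms observed -> partial confidence score.
    print(diagnose({"sudden_death", "bluish_comb"}))
```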


2011 ◽  
Vol 95 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Bushra Jawaid ◽  
Daniel Zeman

Word-Order Issues in English-to-Urdu Statistical Machine Translation

We investigate phrase-based statistical machine translation between English and Urdu, two Indo-European languages that differ significantly in their word-order preferences. Reordering of words and phrases is thus a necessary part of the translation process. While local reordering is modeled well by phrase-based systems, long-distance reordering is known to be a hard problem. We perform experiments using the Moses SMT system and discuss the reordering models available in Moses. We then present our novel, Urdu-aware, yet generalizable approach based on reordering phrases in the syntactic parse tree of the source English sentence. Our technique significantly improves the quality of English-Urdu translation with Moses, both in terms of BLEU score and of subjective human judgments.
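The paper's actual reordering rules are not reproduced in the abstract; the sketch below, using NLTK, illustrates the general idea of pre-ordering the source parse tree toward Urdu's verb-final (SOV) order with a single illustrative rule.

```python
# Sketch of syntax-based pre-ordering for SOV targets such as Urdu.
# The single rule below (move the verb to the end of its VP) is illustrative;
# the paper's actual rule set is richer and Urdu-specific.
from nltk import Tree

def preorder(tree):
    """Recursively reorder a parse tree toward SOV-like word order."""
    if isinstance(tree, str):          # leaf: a word
        return tree
    children = [preorder(c) for c in tree]
    if tree.label() == "VP" and children and not isinstance(children[0], str) \
            and children[0].label().startswith("VB"):
        children = children[1:] + children[:1]   # verb goes last
    return Tree(tree.label(), children)

sent = Tree.fromstring("(S (NP (PRP He)) (VP (VBD read) (NP (DT the) (NN book))))")
print(" ".join(preorder(sent).leaves()))   # He the book read
```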


Author(s):  
Raj Dabre ◽  
Atsushi Fujita

In encoder-decoder based sequence-to-sequence modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in the encoder and decoder. While the addition of each new layer improves the sequence generation quality, it also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all layers, leading to a recurrently stacked sequence-to-sequence model. We report on an extensive case study of neural machine translation (NMT) using our proposed method, experimenting with a variety of datasets. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 different layers. We also show how our method can benefit from a prevalent way of improving NMT, i.e., extending the training data with pseudo-parallel corpora generated by back-translation. We then analyze the effects of recurrently stacked layers by visualizing the attention of models that use recurrently stacked layers and models that do not. Finally, we explore the limits of parameter sharing by sharing even the parameters between the encoder and decoder, in addition to recurrent stacking of layers.
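A minimal sketch of the core idea in PyTorch: one encoder layer whose parameters are reused across all passes, so the parameter count stays that of a single layer (the hyperparameters here are illustrative, not the paper's configuration).

```python
# Sketch of recurrent layer stacking: one Transformer encoder layer whose
# parameters are reused num_passes times, versus stacking distinct layers.
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    def __init__(self, d_model=512, nhead=8, num_passes=6):
        super().__init__()
        # A single layer's parameters are shared across all passes.
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_passes = num_passes

    def forward(self, x):
        for _ in range(self.num_passes):   # same weights applied repeatedly
            x = self.layer(x)
        return x

enc = RecurrentlyStackedEncoder()
out = enc(torch.randn(2, 10, 512))         # (batch, seq_len, d_model)
print(out.shape, sum(p.numel() for p in enc.parameters()))
```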


Author(s):  
Sapta Sari

In line with the times, television news centers have been equipped with a variety of sophisticated tools, such as those used in the production and editing process at the Company Bureau (PERJAN) TVRI Bengkulu. Some of the tools used were: ENG (Electronic News Gathering) equipment, a large film camera (Sound On Film), electronic editing tools, a broadcasting room, a VTR room, and various other supporting tools used to improve broadcast quality. In this research, the researchers used a qualitative descriptive approach based on interviews, observation and documentation, followed by data analysis. The research found that news delivery on TVRI Bengkulu, particularly in the program Bengkulu dalam Berita, has been carried out by adjusting the news delivery technique in two ways: the serious and formal British style and the casual American style. Both methods have been implemented in this program.

Keywords: editing process, news


2021 ◽  
Vol 18 (1) ◽  
pp. 217-234
Author(s):  
Katarina Welnitzova ◽  
Barbara Jakubickova ◽  
Roman Králik

Digitalization is one of the key distinctive features of the modern environment and social life. Nowadays more and more functions are transferred to the artificial mind. How effective is the replacement of human activity with computer activity? In this article, the problem is examined through an example of integrating digital technologies into translation activities. Emphasis is placed on the quality of machine translation (MT) output for legal texts in the English-Slovak language pair. The study uses a Criminal Code formulated in Slovak, which was translated by a human translator into English and then, via the machine translation system Google Translate (GT), back into Slovak. Back-translation, the translation of a translated text back into its original language, was used as a quality assessment tool to detect discrepancies, mistranslations and inevitable differences between the source text and the target text. The quality of the MT output was evaluated according to Multidimensional Quality Metrics (MQM) standards, focusing on the Fluency dimension. Multiple comparisons were applied to determine which issues (errors) in the Fluency dimension differ from the others. A statistically significant difference was found between Agreement and the other issues, as well as between Ambiguity and the other issues. The Agreement errors reflect the differences between the languages: English is considered a mostly analytic language, while Slovak represents a synthetic one. The Ambiguity issues correlate with the type of text examined, since legal texts are characterized by relatively complicated wording and numerous terms; moreover, accuracy and unambiguity must be preserved. Overall, the MT output can provide users with basic information about the text. On the other hand, most of the segments need revision and/or correction; in such cases, human intervention and post-editing are necessary.
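The abstract does not give the test statistic or the raw counts; the sketch below illustrates one plausible form of the multiple-comparison step, pairwise chi-square tests on per-category error counts with a Bonferroni correction, using invented placeholder numbers.

```python
# Sketch of the multiple-comparison step over MQM Fluency issue counts.
# The counts and segment total are invented placeholders, not study data.
from itertools import combinations
from scipy.stats import chi2_contingency

errors = {"Agreement": 120, "Ambiguity": 15, "Word order": 48, "Spelling": 40}
segments = 500                                   # hypothetical segment count

pairs = list(combinations(errors, 2))
alpha = 0.05 / len(pairs)                        # Bonferroni correction
for a, b in pairs:
    table = [[errors[a], segments - errors[a]],
             [errors[b], segments - errors[b]]]
    _, p, _, _ = chi2_contingency(table)
    print(f"{a} vs {b}: p={p:.4f} {'*' if p < alpha else ''}")
```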


2018 ◽  
Vol 8 (6) ◽  
pp. 3512-3514
Author(s):  
D. Chopra ◽  
N. Joshi ◽  
I. Mathur

Machine translation (MT) has been a topic of intensive research for the last sixty years, but improving its quality is still considered an open problem. In the current paper, we discuss improvements in MT quality achieved through an ensemble approach. We performed MT from English to Hindi using the six different MT engines described in this paper. We found that the quality of MT improves when a combination of approaches is used, compared with a simple baseline approach to translating source to target text.
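The abstract does not detail how the six engines' outputs are combined; the sketch below shows one common combination strategy, a consensus (minimum-Bayes-risk-flavoured) selection of the hypothesis most similar to the other engines' outputs, which may differ from the paper's actual method.

```python
# Sketch of consensus-based system combination: for each sentence, keep the
# engine output that agrees most with the other engines' outputs.
def token_f1(a, b):
    """Rough token-overlap similarity between two hypotheses."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    p, r = len(sa & sb) / len(sa), len(sa & sb) / len(sb)
    return 2 * p * r / (p + r) if p + r else 0.0

def consensus_pick(hypotheses):
    """Return the hypothesis with the highest total similarity to the rest."""
    return max(hypotheses,
               key=lambda h: sum(token_f1(h, o) for o in hypotheses if o is not h))

outputs = ["वह किताब पढ़ता है", "वह किताब पढ़ता", "वह पुस्तक पढ़ता है"]  # from 3 engines
print(consensus_pick(outputs))
```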


2020 ◽  
Vol 34 (05) ◽  
pp. 9282-9289
Author(s):  
Qingyang Wu ◽  
Lei Li ◽  
Hao Zhou ◽  
Ying Zeng ◽  
Zhou Yu

Many social media news writers are not professionally trained. Therefore, social media platforms have to hire professional editors to adjust amateur headlines to attract more readers. We propose to automate this headline editing process with neural network models, providing more immediate writing support for these social media news writers. To train such a neural headline editing model, we collected a dataset containing articles with their original headlines and professionally edited headlines. However, collecting a large number of professionally edited headlines is expensive. To solve this low-resource problem, we design an encoder-decoder model that leverages large-scale pre-trained language models. We further improve the pre-trained model's quality by introducing headline generation as an intermediate task before the headline editing task. We also propose a Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences. With the help of pre-training, adaptation, and SIA, the model learns to generate headlines in the professional editor's style. Experimental results show that our method significantly improves the quality of headline editing compared with previous methods.
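The paper's exact SIA formulation is not given in the abstract; one plausible reading of "down-weighting easily classified tokens" is a focal-style scaling of the per-token cross-entropy, sketched below (the sentence-level weighting the paper also mentions is omitted).

```python
# Sketch of a token-level loss that down-weights easily predicted tokens,
# one plausible reading of the SIA idea; the paper's exact formulation
# (including its sentence-level weighting) may differ.
import torch
import torch.nn.functional as F

def sia_style_loss(logits, targets, gamma=2.0):
    """Cross-entropy scaled by (1 - p_correct)**gamma, as in focal loss."""
    log_probs = F.log_softmax(logits, dim=-1)                 # (tokens, vocab)
    nll = F.nll_loss(log_probs, targets, reduction="none")    # per-token CE
    p_correct = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return ((1.0 - p_correct) ** gamma * nll).mean()

logits = torch.randn(4, 1000)             # 4 tokens, vocabulary of 1000
targets = torch.tensor([3, 17, 256, 999])
print(sia_style_loss(logits, targets))
```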


Author(s):  
Yang Zhao ◽  
Jiajun Zhang ◽  
Yu Zhou ◽  
Chengqing Zong

Knowledge graphs (KGs) store a large amount of structured information about various entities, many of which are not covered by the parallel sentence pairs used to train neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KG-enhanced NMT method. Specifically, we first induce new translation results for these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo-parallel sentence pairs that contain these induced entity pairs. Finally, the NMT model is jointly trained on the original and pseudo sentence pairs. Extensive experiments on Chinese-to-English and English-to-Japanese translation tasks demonstrate that our method significantly outperforms strong baseline models in translation quality, especially in handling the induced entities.
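A minimal sketch of the pseudo-parallel generation step: induced entity translation pairs are slotted into template sentence pairs. The entity pairs and templates below are invented placeholders; the paper induces the pairs by aligning the source and target KGs in a unified semantic space.

```python
# Sketch of pseudo-parallel sentence generation from induced entity pairs.
# Entity pairs and templates are invented placeholders for illustration.
induced_entities = [("长江", "the Yangtze River"), ("黄山", "Mount Huangshan")]
templates = [
    ("{zh} 很 有名 。", "{en} is very famous ."),
    ("我 去过 {zh} 。", "I have visited {en} ."),
]

def make_pseudo_pairs(entities, templates):
    """Cross every induced entity pair with every template sentence pair."""
    return [(src.format(zh=z), tgt.format(en=e))
            for z, e in entities
            for src, tgt in templates]

for src, tgt in make_pseudo_pairs(induced_entities, templates):
    print(src, "|||", tgt)    # pseudo pairs to mix with the original corpus
```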


2019 ◽  
Vol 28 (3) ◽  
pp. 447-453 ◽  
Author(s):  
Sainik Kumar Mahata ◽  
Dipankar Das ◽  
Sivaji Bandyopadhyay

Machine translation (MT) is the automatic translation of a source language into a target language by a computer system. In the current paper, we propose an approach that uses recurrent neural networks (RNNs) on top of traditional statistical MT (SMT). We compare the performance of the SMT phrase table with that of the proposed RNN and in turn improve the quality of the MT output. This work was done as part of the shared task of MTIL2017. We constructed the traditional MT model using the Moses toolkit and additionally enriched the language model with external data sets. Thereafter, we ranked the phrase tables using an RNN encoder-decoder module created originally as part of the GroundHog project of LISA lab.
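A sketch of the rescoring step under stated assumptions: a trained seq2seq model (here a hypothetical PyTorch module returning per-position logits) scores each phrase pair by its target-given-source log-probability, and the phrase-table entries are ranked by that score. The model interface and BOS/EOS framing are assumptions, not the paper's code.

```python
# Sketch of rescoring Moses phrase-table entries with a seq2seq model.
# `model` and `encode` are assumed to exist: `model(src, tgt_prefix)` returns
# logits of shape (batch, len, vocab); `encode` maps a phrase to token ids
# framed with BOS/EOS. Both are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def phrase_log_prob(model, src_ids, tgt_ids):
    """Sum of log P(tgt_token | src, previous tgt tokens) under the model."""
    with torch.no_grad():
        logits = model(src_ids.unsqueeze(0), tgt_ids[:-1].unsqueeze(0))
        log_probs = F.log_softmax(logits, dim=-1)
        return log_probs[0].gather(1, tgt_ids[1:].unsqueeze(1)).sum().item()

def rescore_phrase_table(model, encode, entries):
    """Rank (src_phrase, tgt_phrase, features) entries by the RNN score."""
    return sorted(
        ((s, t, phrase_log_prob(model, encode(s), encode(t))) for s, t, _ in entries),
        key=lambda e: -e[2])
```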


Author(s):  
A.V. Kozina ◽  
Yu.S. Belov

Automatically assessing the quality of machine translation is an important yet challenging task for machine translation research. Translation quality assessment is understood here as predicting translation quality without access to a reference translation. Translation quality depends on the specific machine translation system and often requires post-editing. Manual editing is a long and expensive process, and since the need to determine translation quality quickly is growing, automation is required. In this paper, we propose a quality assessment method based on ensemble supervised machine learning methods. The bilingual corpus WMT 2019 for the English-Russian language pair was used as data. The data set comprises 17,089 sentences; 85% of the data was used for training and 15% for testing the model. Linguistic features extracted from the text in the source and target languages were used to train the system, since these characteristics can most accurately characterize the translation in terms of quality. The following tools were used for feature extraction: a free language modeling tool based on SRILM and the Stanford POS Tagger. Before training the system, the text was preprocessed. The model was trained using three regression methods: Bagging, Extra Trees, and Random Forest. The algorithms were implemented in the Python programming language using the scikit-learn library. The parameters of the Random Forest method were optimized using a grid search. The performance of the model was assessed by the mean absolute error (MAE) and the root mean square error (RMSE), as well as by the Pearson coefficient, which measures the correlation with human judgment. Testing was carried out using the following machine translation systems: the Google and Bing neural systems, and Moses statistical machine translation systems, both phrase-based and syntax-based. The Extra Trees method performed best, and for all categories of indicators considered, the best results were achieved with the Google machine translation system. The developed method showed good results close to human judgment, and the system can be used for further research on translation quality assessment.
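A sketch of the described setup with scikit-learn: a Random Forest regressor tuned by grid search and evaluated with MAE, RMSE, and the Pearson coefficient. Feature extraction (SRILM-based language-model features and POS features) is abstracted into a placeholder matrix, and the grid values are illustrative.

```python
# Sketch of the regression setup: Random Forest + grid search, evaluated by
# MAE, RMSE, and Pearson correlation. X and y are random placeholders standing
# in for the extracted linguistic features and quality labels.
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = np.random.rand(17089, 17), np.random.rand(17089)   # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)

grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    {"n_estimators": [100, 300], "max_depth": [None, 20]},
                    scoring="neg_mean_absolute_error", cv=3)
grid.fit(X_tr, y_tr)
pred = grid.predict(X_te)

print("MAE    :", mean_absolute_error(y_te, pred))
print("RMSE   :", mean_squared_error(y_te, pred) ** 0.5)
print("Pearson:", pearsonr(y_te, pred)[0])
```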

