Toward the Elaboration of a Spanish-Chinese Parallel Annotated Corpus

10.29007/gxv3 ◽  
2018 ◽  
Author(s):  
Shuyuan Cao ◽  
Iria Da-Cunha ◽  
Mikel Iruskieta

Spanish and Chinese are very different languages at all linguistic levels. Therefore, both human and machine translation between them, as well as learning either language as a foreign one, are challenging tasks. Some automatic translation systems exist for this language pair, but there is still considerable room to improve translation quality between Spanish and Chinese. In addition, accessible resources for studying and understanding this language pair, such as parallel corpora, are still scarce. In this paper, we present how we have created a Spanish-Chinese parallel corpus designed for language learning and translation tasks at the discourse level. The corpus has been automatically enriched with part-of-speech (POS) tags, so queries based on morpho-syntactic information can be performed. We have made the parallel corpus available to the academic community.
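
A minimal sketch of the kind of morpho-syntactic query a POS-enriched parallel corpus makes possible. The file format assumed here (one aligned pair per line, Spanish tokens tagged as word/POS, a tab, then the Chinese side) and the file name are illustrative assumptions, not the corpus's actual distribution format.

```python
# Sketch of a morpho-syntactic query over a POS-annotated parallel corpus.
# The tab-separated "word/POS" format and the file name are assumptions.

def load_parallel_corpus(path):
    """Yield (spanish, chinese) pairs, each a list of (token, pos) tuples."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            es_side, zh_side = line.rstrip("\n").split("\t")
            yield ([tuple(tok.rsplit("/", 1)) for tok in es_side.split()],
                   [tuple(tok.rsplit("/", 1)) for tok in zh_side.split()])

def contains_pos_pattern(tagged_sentence, pattern):
    """True if the sentence's POS sequence contains `pattern` as a contiguous run."""
    tags = [pos for _, pos in tagged_sentence]
    n = len(pattern)
    return any(tags[i:i + n] == pattern for i in range(len(tags) - n + 1))

if __name__ == "__main__":
    # Example query: Spanish sentences containing a NOUN followed by an ADJ,
    # printed together with their aligned Chinese counterparts.
    for es, zh in load_parallel_corpus("es_zh_corpus.txt"):  # hypothetical file
        if contains_pos_pattern(es, ["NOUN", "ADJ"]):
            print(" ".join(t for t, _ in es), "|||", " ".join(t for t, _ in zh))
```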

2017 ◽  
Author(s):  
AWEJ for Translation & Literary Studies ◽  
Zakaryia Mustafa Almahasees

Machine translation (MT) systems are widely used throughout the world, free of charge or at low cost. The spread of MT calls for a thorough analysis of the translations produced by such systems. The present study evaluates the capacity of two MT systems, Google Translate and Microsoft Bing Translator, to translate Khalil Gibran's literary masterpiece The Prophet (2000) from Arabic into English. The questions the study raises are: can we trust MT to translate literary masterpieces across languages, particularly from Arabic into English, and how close is MT output to human translation? To this end, the study adopted the Bilingual Evaluation Understudy (BLEU) metric of Papineni (2000). Analysis of the MT output showed that MT is neither accurate, intelligible, nor natural when translating literary texts, owing to their difficulty: they are full of metaphors and culture-specific elements. In addition, the output contains linguistic errors: lexical errors, syntactic errors, and misinformation. The study also found that both systems provided similar translations for the same input, either because they use similar MT approaches or because they learn from previously translated texts. Moreover, both systems in some instances achieve good results at the word level but poor results on collocations. The study also showed that automatic evaluation alone is insufficient for a full analysis of MT output, because automatic metrics can be misleading: they depend on textual similarity to a reference human translation. For future research, the study recommends a correlation study that combines manual and automatic evaluation methods to ensure the best analysis of MT output. Machine translation is still far from producing fully automatic translation of a quality comparable to that achieved by human translators.
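
A minimal sketch of how a BLEU score can be computed for MT output against a reference human translation, in the spirit of the evaluation described above. It assumes the NLTK package; the segments are illustrative placeholders, not data from the study.

```python
# Hedged sketch: scoring an MT hypothesis against a human reference with BLEU.
# The tokenised segments below are placeholders, not sentences from the study.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One reference list per segment (each segment may have several references).
references = [
    [["your", "children", "are", "not", "your", "children"]],
]
# Corresponding MT hypotheses (e.g. from Google Translate or Bing Translator).
hypotheses = [
    ["your", "children", "are", "not", "your", "own", "children"],
]

smooth = SmoothingFunction().method1  # avoids zero scores on short segments
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```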


2021 ◽  
Vol 284 ◽  
pp. 08001
Author(s):  
Ilya Ulitkin ◽  
Irina Filippova ◽  
Natalia Ivanova ◽  
Alexey Poroykov

We report on various approaches to automatic evaluation of machine translation quality and describe three widely used methods. These methods, based on string matching and n-gram models, make it possible to compare machine translation output against a reference translation. We employ modern metrics for automatic evaluation of machine translation quality, such as BLEU, F-measure, and TER, to compare translations made by the Google and PROMT neural machine translation systems with translations obtained five years ago, when statistical machine translation and rule-based machine translation algorithms were employed by Google and PROMT, respectively, as the main translation algorithms [6]. Evaluating the candidate texts generated by Google and PROMT against the reference translation with an automatic evaluation program reveals significant qualitative changes compared with the results obtained five years ago, indicating a dramatic improvement in these online translation systems. Ways to improve the quality of machine translation are discussed. It is shown that modern automatic evaluation systems allow the errors made by machine translation systems to be identified and systematized, which will enable these systems' translation quality to be improved in the future.
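
A minimal sketch of how two systems' outputs can be scored against the same reference with BLEU and TER, assuming the sacrebleu package. The system names are taken from the abstract; the sentences are placeholders, not data from the study.

```python
# Hedged sketch: comparing two MT systems with BLEU and TER via sacrebleu.
from sacrebleu.metrics import BLEU, TER

references = [["The weather is nice today.", "We will meet tomorrow morning."]]
google_out = ["The weather is good today.", "We will meet tomorrow morning."]
promt_out = ["Today the weather is nice.", "We shall meet in the morning tomorrow."]

bleu, ter = BLEU(), TER()
for name, hypotheses in (("Google", google_out), ("PROMT", promt_out)):
    print(name,
          f"BLEU = {bleu.corpus_score(hypotheses, references).score:.1f}",
          f"TER = {ter.corpus_score(hypotheses, references).score:.1f}")
```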


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Aaron L.-F. Han ◽  
Derek F. Wong ◽  
Lidia S. Chao ◽  
Liangye He ◽  
Yi Lu

With the rapid development of machine translation (MT), MT evaluation has become very important for telling us, in a timely manner, whether an MT system is making progress. Conventional MT evaluation methods calculate the similarity between hypothesis translations produced by automatic translation systems and reference translations produced by professional translators. Existing evaluation metrics have several weaknesses. Firstly, incomplete factor design leads to a language-bias problem: the metrics perform well on certain language pairs but poorly on others. Secondly, they tend to use either no linguistic features or too many; using none draws criticism from linguists, while using too many makes the model hard to replicate. Thirdly, the reference translations they rely on are expensive to produce and sometimes unavailable in practice. In this paper, the authors propose an unsupervised MT evaluation metric that uses a universal part-of-speech (POS) tagset and does not rely on reference translations. The authors also explore the performance of the metric on traditional supervised evaluation tasks. Both the supervised and unsupervised experiments show that the proposed methods yield higher correlation with human judgments.
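
A toy illustration (not the authors' exact metric) of the underlying idea: similarity can be scored over universal POS tag n-grams rather than surface words, so the comparison sequence can be a reference in the supervised setting or, with a universal tagset, the source side in the reference-free setting.

```python
# Toy sketch: F-score over universal POS tag n-grams between two tagged
# sequences. This illustrates the general idea only, not the paper's metric.
from collections import Counter

def pos_ngrams(tags, n):
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def pos_fscore(tags_a, tags_b, n=2):
    """Harmonic mean of POS n-gram precision and recall."""
    a, b = pos_ngrams(tags_a, n), pos_ngrams(tags_b, n)
    overlap = sum((a & b).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / sum(a.values()), overlap / sum(b.values())
    return 2 * precision * recall / (precision + recall)

hypothesis_tags = ["PRON", "VERB", "DET", "ADJ", "NOUN", "PUNCT"]
comparison_tags = ["PRON", "VERB", "DET", "NOUN", "ADJ", "PUNCT"]
print(f"POS bigram F-score: {pos_fscore(hypothesis_tags, comparison_tags):.2f}")
```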


Author(s):  
Yana Fedorko ◽  
Tetiana Yablonskaya

The article focuses on the peculiarities of translating English and Chinese political discourse into Ukrainian. The advantages and disadvantages of machine translation are described on the basis of a linguistic analysis of the online Google Translate and M-Translate systems. The reasons for translation errors are identified, and the need for post-editing to improve translation quality is highlighted. Key words: political discourse, automatic translation, online machine translation systems, machine translation quality assessment.


2021 ◽  
Vol 1 (1) ◽  
pp. 124-133
Author(s):  
Ani Ananyan ◽  
Roza Avagyan

Along with the development and widespread dissemination of translation by artificial intelligence, it is becoming increasingly important to continuously evaluate and improve its quality and to use it as a tool for the modern translator. In our research, we compared five sentences translated from Armenian into Russian and English by Google Translator, Yandex Translator, and two models of the translation system of the Armenian company Avromic, to find out how effective these translation systems are when working with Armenian. We also wanted to find out how effective it would be to use them as translation tools and in the learning process, with subsequent post-editing of the output. As there is currently no comprehensive and widely accepted method of human evaluation for machine translation, we developed our own evaluation method and criteria by studying the world's best-known methods for evaluating automatic translation. We also used post-editing distance as an evaluation criterion. Using one sentence as an example, the article presents the evaluation process in detail according to the selected and developed criteria. Finally, we present the results of the research and draw the corresponding conclusions.
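
One common way to operationalise post-editing distance is the character-level edit distance between the raw MT output and its human post-edited version, normalised by the length of the post-edited text; a minimal sketch follows. The authors' exact formula may differ, and the strings are placeholders.

```python
# Hedged sketch of a post-editing distance: normalised character-level
# Levenshtein distance between raw MT output and its post-edited version.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def post_edit_distance(mt_output: str, post_edited: str) -> float:
    return levenshtein(mt_output, post_edited) / max(len(post_edited), 1)

# Placeholder strings, not sentences from the study.
print(f"{post_edit_distance('The cat sit on mat.', 'The cat sat on the mat.'):.2f}")
```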


2016 ◽  
Vol 1 (1) ◽  
pp. 45-49
Author(s):  
Avinash Singh ◽  
Asmeet Kour ◽  
Shubhnandan S. Jamwal

The objective of this paper is to analyze English-Dogri parallel corpus translation. Machine translation is translation from one language into another and is one of the biggest applications of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates from English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating from English to Dogri and 87% when translating from Dogri to English.


Author(s):  
Iqra Muneer ◽  
Rao Muhammad Adeel Nawab

Cross-Lingual Text Reuse Detection (CLTRD) has recently attracted the attention of the research community due to the large amount of digital text readily available for reuse in multiple languages through online digital repositories. In addition, efficient machine translation systems are freely and readily available to translate text from one language into another, which makes it quite easy to reuse text across languages and, consequently, difficult to detect it. In the literature, the most prominent and widely used approach for CLTRD is Translation plus Monolingual Analysis (T+MA). To detect CLTR for the English-Urdu language pair, T+MA has so far been used only with lexical approaches, namely N-gram Overlap, Longest Common Subsequence, and Greedy String Tiling, which shows that T+MA has not been thoroughly explored for this language pair. To fill this gap, this study presents an in-depth and detailed comparison of 26 approaches based on T+MA. These include semantic similarity approaches (semantic-tagger-based and WordNet-based approaches), a probabilistic approach (the Kullback-Leibler distance), monolingual word-embedding-based approaches (a Siamese recurrent architecture), and monolingual sentence-transformer-based approaches for the English-Urdu language pair. The evaluation was carried out on the CLEU benchmark corpus for both the binary and the ternary classification tasks. Our extensive experimentation shows that our proposed approach, a combination of the 26 approaches, obtained F1 scores of 0.77 and 0.61 for the binary and ternary classification tasks, respectively, outperforming the previously reported approaches [41] (F1 = 0.73 for the binary and F1 = 0.55 for the ternary task) on the CLEU corpus.
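
A minimal sketch of the monolingual-analysis step in T+MA with the simplest of the lexical approaches mentioned above, word n-gram overlap: once the Urdu document has been machine-translated into English (the translation step is assumed and not shown), a containment-style overlap of its n-grams in the candidate English source gives a reuse score. This illustrates only one of the 26 compared approaches; the sentences are placeholders.

```python
# Hedged sketch: n-gram overlap (containment) for the monolingual-analysis
# step of T+MA, applied to a machine-translated text and a candidate source.
from collections import Counter

def word_ngrams(text, n):
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_containment(translated_text, candidate_source, n=3):
    """Share of the translated text's n-grams that also occur in the source."""
    a, b = word_ngrams(translated_text, n), word_ngrams(candidate_source, n)
    return sum((a & b).values()) / sum(a.values()) if a else 0.0

translated = "the government announced a new education policy on monday"
candidate = "on monday the government announced a new education policy for schools"
print(f"trigram containment: {ngram_containment(translated, candidate):.2f}")
```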


2021 ◽  
pp. 1-10
Author(s):  
Zhiqiang Yu ◽  
Yuxin Huang ◽  
Junjun Guo

It has been shown that the performance of neural machine translation (NMT) drops starkly in low-resource conditions. Thai-Lao is a typical low-resource language pair with only a tiny parallel corpus, which leads to suboptimal NMT performance. However, Thai and Lao have considerable similarities in linguistic morphology, and a bilingual lexicon for the pair is relatively easy to obtain. To exploit this, we first build a bilingual similarity lexicon composed of pairs of similar words. We then propose a novel NMT architecture that leverages the similarity between Thai and Lao. Specifically, besides the usual sentence encoder, we introduce an extra similarity lexicon encoder into the conventional encoder-decoder architecture, through which the semantic information carried by the similarity lexicon can be represented. We further add a simple mechanism in the decoder to balance the representations delivered by the input sentence and the similarity lexicon. Our approach can fully exploit the linguistic similarity carried by the similarity lexicon to improve translation quality. Experimental results demonstrate that our approach achieves significant improvements over a state-of-the-art Transformer baseline and over previous similar work.
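
A hedged PyTorch-style sketch of the kind of balancing mechanism described above: a learned gate mixes the context vector coming from the sentence encoder with the one coming from the similarity lexicon encoder. Layer names, dimensions, and the exact gating form are assumptions for illustration, not the paper's architecture.

```python
# Hedged sketch: gating two context vectors (sentence encoder vs. similarity
# lexicon encoder) inside the decoder. Shapes and names are assumptions.
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, sent_context: torch.Tensor, lex_context: torch.Tensor):
        # g in (0, 1) decides, per dimension, how much of each context to keep.
        g = torch.sigmoid(self.gate(torch.cat([sent_context, lex_context], dim=-1)))
        return g * sent_context + (1.0 - g) * lex_context

if __name__ == "__main__":
    hidden = 512
    gate = ContextGate(hidden)
    sent_ctx = torch.randn(8, hidden)  # from the sentence encoder (batch of 8)
    lex_ctx = torch.randn(8, hidden)   # from the similarity lexicon encoder
    print(gate(sent_ctx, lex_ctx).shape)  # torch.Size([8, 512])
```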


2020 ◽  
Author(s):  
Alessandro Lopopolo ◽  
Antal van den Bosch

Neural decoding of speech and language refers to the extraction of information about the stimulus and the mental state of subjects from recordings of their brain activity while they perform linguistic tasks. Recent years have seen significant progress in the decoding of speech from cortical activity. This study instead focuses on decoding linguistic information. We present a deep parallel temporal convolutional neural network (1DCNN) trained on part-of-speech (PoS) classification from magnetoencephalography (MEG) data collected during natural language reading. The network is trained separately on data from each of 15 human subjects and yields above-chance accuracy on test data for all of them. The PoS level was targeted because it offers a clean linguistic benchmark that represents syntactic information while abstracting away from semantic or conceptual representations.
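
A hedged sketch of a small temporal 1D-CNN that maps an MEG segment (sensor channels by time samples) to a PoS label, in the spirit of the decoder described above. The channel count, window length, and number of PoS classes are placeholders, not the study's configuration.

```python
# Hedged sketch: temporal 1D-CNN classifying MEG segments into PoS classes.
# All sizes below are illustrative placeholders.
import torch
import torch.nn as nn

class PosDecoder(nn.Module):
    def __init__(self, n_channels: int = 204, n_classes: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, time)
        return self.classifier(self.features(x).squeeze(-1))

if __name__ == "__main__":
    model = PosDecoder()
    meg_batch = torch.randn(16, 204, 100)  # 16 epochs, 204 sensors, 100 samples
    print(model(meg_batch).shape)          # torch.Size([16, 8])
```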


2014 ◽  
Vol 4 (2) ◽  
pp. 53-65 ◽  
Author(s):  
Kristina HMELJAK SANGAWA

Learning vocabulary is one of the most challenging tasks faced by learners with a non-kanji background when learning Japanese as a foreign language. However, learners are often not aware of the range of different aspects of word knowledge they need in order to successfully use Japanese. This includes not only the spoken and written form of a word and its meaning, but also morphological, grammatical, collocational, connotative and pragmatic knowledge as well as knowledge of social constraints to be observed. In this article, we present some background data on the use of dictionaries among students of Japanese at the University of Ljubljana, a selection of resources and a series of exercises developed with the following aims: a) to foster greater awareness of the different aspects of Japanese vocabulary, both from a monolingual and a contrastive perspective, b) to learn about tools and methods that can be applied in different contexts of language learning and language use, and c) to develop strategies for learning new vocabulary, reinforcing knowledge about known vocabulary, and effectively using this knowledge in receptive and productive language tasks.

