Translational Mismatches Involving Clitics (Illustrated from Serbian ~ Catalan Language Pair)

Author(s):  
Jasmina Milićević ◽  
Àngels Catena

Translation of sentences featuring clitics often poses a problem for machine translation systems. In this chapter, using material from a Serbian ~ Catalan parallel corpus, we illustrate a rule-based approach to resolving structural translational mismatches between the linguistic representations that underlie source- and target-language sentences containing clitics. Unlike most studies in this field, which use phrase-structure formalisms, ours has been conducted within the dependency framework of the Meaning-Text linguistic theory. We start with a brief description of the Catalan and Serbian clitic systems, then introduce the basics of our framework, and finally illustrate Serbian ~ Catalan translational mismatches involving the operations of clitic doubling, clitic climbing, and clitic possessor raising.

2012 ◽  
Vol 3 (4) ◽  
pp. 66-74 ◽  
Author(s):  
Sanjay K. Dwivedi ◽  
Pramod P. Sukhadeve

The rule-based approach to machine translation (MT) relies on grammatical rules relating the source and the target language, with the goal of producing grammatical translations for the language pair. In this paper, we describe an English stemmer and POS tagging, and design transfer rules that generate a Hindi sentence from the structural representation of the English sentence. Because of the specialized terminology of homoeopathic texts and the linguistic gap between the two languages, translating this literature from English to Hindi is a challenging task. Rule sets are used to bridge the gap between the two languages; in particular, we describe rule sets for mapping prepositions, verbs, nouns, etc. Finally, we propose a system architecture for translating homoeopathy literature from English to Hindi. The system was evaluated using the BLEU score, which is 0.7501, and the accuracy of the system is 82.23%.
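
The abstract reports a BLEU score but does not say which BLEU variant was used. As a rough illustration of what the metric measures, the sketch below computes an unsmoothed, sentence-level BLEU against a single reference: clipped (modified) n-gram precisions for n = 1..4, combined by a geometric mean and scaled by a brevity penalty. The function names `ngrams` and `sentence_bleu` are hypothetical; published scores are normally computed at corpus level and may use smoothing or multiple references, so this is not the authors' evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference (token lists)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Modified precision: clip each candidate n-gram count by its reference count.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates that are shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))                    # identical sentences
print(sentence_bleu("the cat sat on".split(), ref))  # short but precise candidate
```

The second call has perfect n-gram precision, so its score (about 0.61) comes entirely from the brevity penalty, which is why BLEU cannot be gamed by emitting only a few safe words.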


2018 ◽  
Vol 7 (4.36) ◽  
pp. 542
Author(s):  
T. K. Bijimol ◽  
John T. Abraham

Malayalam is one of the Indian languages; it is highly agglutinative and morphologically rich. These linguistic characteristics of Malayalam determine the quality of all kinds of Malayalam machine translation systems. An analysis of Malayalam-to-English and English-to-Malayalam translations of causative sentences produced by the Google Translation System showed that causative sentence translation between these languages is not up to the mark. This paper discusses the concept and method of handling causative sentences in Malayalam-to-English and English-to-Malayalam machine translation systems, and proposes a rule-based system to handle causative sentences in both languages.


2018 ◽  
Vol 34 (4) ◽  
pp. 752-771
Author(s):  
Chen-li Kuo

Statistical approaches have become the mainstream in machine translation (MT) because of their potential to produce less rigid and more natural translations than rule-based approaches. However, on closer examination, the use of function words in statistically machine-translated Chinese differs from that in original Chinese, and such differences may be associated with translationese as discussed in translation studies. This article examines the distribution of Chinese function words in a comparable corpus consisting of MT output and original Chinese texts extracted from Wikipedia. An attribute selection technique is used to investigate which types of function words are significant in discriminating between statistically machine-translated Chinese and the original texts. The results show that statistical MT overuses the most frequent function words, even when alternatives exist. To improve the quality of the end product, MT developers should pay close attention to modelling Chinese conjunctions and adverbial function words. The results also suggest that machine-translated Chinese shares some characteristics with human-translated texts, including normalization and source-language influence; however, machine-translated texts do not exhibit other characteristics of translationese, such as explicitation.


2012 ◽  
Vol 13 (1) ◽  
pp. 79-86
Author(s):  
Huda Alhusain Hebresha ◽  
Mohd Juzaiddin Ab Aziz

Author(s):  
Karunesh Kumar Arora ◽  
Shyam Sunder Agrawal

English and Hindi have significantly different word orders. English follows the subject-verb-object (SVO) order, while Hindi primarily follows the subject-object-verb (SOV) order. This difference poses challenges to modeling this language pair for translation. In phrase-based translation systems, word reordering is governed by the language model, the phrase table, and reordering models. Reordering in such systems is generally achieved during decoding by transposing words within a defined window. These systems handle local reorderings, and some phrase-level reorderings are captured when phrases are formed, but they are weak at learning long-distance reorderings. To overcome this weakness, researchers have used reordering as a pre-processing step to bring the source sentence closer to the target language in terms of word order. Such approaches rely on part-of-speech (POS) tag sequences, on reordering the syntax tree with grammatical rules, or on head finalization. This study shows that head finalization alone is not sufficient for reordering sentences in the English-Hindi language pair. It describes various grammatical constructs and presents a comparative evaluation of reorderings with the original and the head-finalized representations. The impact of reordering on translation quality is measured through the BLEU score in phrase-based statistical and neural machine translation systems. A significant gain in BLEU score was noted for reorderings in different grammatical constructs.
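
To make the baseline concrete, the sketch below shows plain head finalization on a dependency tree: every head is emitted after all of its dependents, and dependents keep their original relative order. It assumes a dependency parse is already available; the `Node` class and `head_finalize` function are hypothetical, and the sketch deliberately omits the construct-specific rules that the study argues are also needed for English-Hindi.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    position: int  # 1-based index of the word in the English sentence
    dependents: list[Node] = field(default_factory=list)


def head_finalize(node: Node) -> list[str]:
    """Linearize a dependency subtree so each head follows all of its dependents."""
    ordered: list[str] = []
    # Keep dependents in their original left-to-right order.
    for dep in sorted(node.dependents, key=lambda d: d.position):
        ordered.extend(head_finalize(dep))
    ordered.append(node.word)  # the head goes last
    return ordered


# "John ate an apple in the house"
the = Node("the", 6)
house = Node("house", 7, [the])
in_ = Node("in", 5, [house])
an = Node("an", 3)
apple = Node("apple", 4, [an])
john = Node("John", 1)
ate = Node("ate", 2, [john, apple, in_])

print(" ".join(head_finalize(ate)))
```

The result, "John an apple the house in ate", is verb-final and turns the preposition into a postposition ("the house in", compare Hindi "ghar mein"), but the position of the adjunct relative to the object is left to the English order, which is one example of why extra rules may be needed.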


Author(s):  
Arwa Hatem Alqudsi ◽  
Nazlia Omar ◽  
Rabha W. Ibrahim

It is practically impossible for a pure machine translation approach to handle all translation problems; moreover, Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT) use different architectures to perform the translation task. Lexical and syntactic analysis are handled by the rule-based component, and some of the remaining ambiguity is resolved by the Expectation–Maximization (EM) algorithm, an iterative statistical algorithm for finding maximum-likelihood estimates. In this paper, we propose an integrated Hybrid Machine Translation (HMT) system whose goal is to combine the best properties of each approach. Arabic text is first input to the RBMT component; its output is then edited by the EM algorithm to generate the final English translation. As previous work on the performance and enhancement of the EM algorithm has shown, the key to its performance is the ability to accurately transfer frequencies from one language to another. BLEU results show that the proposed method can substantially outperform both the standard rule-based approach and the EM algorithm alone in terms of frequency and accuracy, and that the BLEU score of the HMT system is higher than that of the SMT system in all cases.
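
The abstract does not specify how its EM component is formulated. As a generic illustration of EM estimating cross-language word-translation probabilities from co-occurrence counts, the sketch below implements IBM Model 1, a swapped-in textbook technique that is not necessarily what the authors used. The function name `ibm_model1` and the toy German-English sentence pairs are assumptions for illustration only; for the system described above, the source side would be Arabic.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(target word e | source word f) with EM (IBM Model 1).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # Uniform initialization over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: share each target word's count among the source words it co-occurs with.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
t = ibm_model1(corpus, iterations=20)
print(round(t[("the", "das")], 3), round(t[("house", "Haus")], 3))
```

Each iteration moves probability mass toward word pairs that co-occur consistently across sentence pairs (here "the"/"das", "book"/"Buch"), which is the sense in which EM "transfers frequencies" between languages.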


2017 ◽  
Vol 108 (1) ◽  
pp. 221-232
Author(s):  
Francis M. Tyers ◽  
Hèctor Alòs i Font ◽  
Gianfranco Fronteddu ◽  
Adrià Martín-Mor

This paper describes the creation of the first machine translation system from Italian to Sardinian, a Romance language spoken on the island of Sardinia in the Mediterranean. The project was carried out by a team of translators and computational linguists. The article focuses on the technology used (Rule-Based Machine Translation) and on some of the rules created, as well as on the orthographic model used for Sardinian.

