Otedama: Fast Rule-Based Pre-Ordering for Machine Translation

Abstract We present Otedama, a fast, open-source tool for rule-based syntactic pre-ordering, a well-established technique in statistical machine translation. Otedama implements both a learner for pre-ordering rules and a component for applying these rules to parsed sentences. Our system is compatible with several external parsers and capable of accommodating many source and all target languages in any machine translation paradigm which uses parallel training data. We demonstrate improvements on a patent translation task over a state-of-the-art English-Japanese hierarchical phrase-based machine translation system. We compare Otedama with an existing syntax-based pre-ordering system, showing comparable translation performance at a runtime speedup of a factor of 4.5-10.
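
To illustrate what applying a pre-ordering rule to a parsed sentence can look like, here is a minimal sketch. It is not Otedama's implementation or rule format: the tree encoding, the `rules` dictionary, and the function name `preorder` are all assumptions made for this example. A rule here maps a parent label and its sequence of child labels to a permutation of the children.

```python
def preorder(node, rules):
    """Return the words of a parse tree after reordering children by rule.

    node  -- (label, word) for a preterminal, or (label, [child nodes])
    rules -- hypothetical format: {(label, child_labels): permutation}
    """
    label, body = node
    if isinstance(body, str):
        # Preterminal node: (POS tag, word)
        return [body]
    key = (label, tuple(child[0] for child in body))
    # Fall back to the original child order when no rule matches.
    order = rules.get(key, range(len(body)))
    words = []
    for idx in order:
        words.extend(preorder(body[idx], rules))
    return words


# Toy rule: move the verb after its object, as a verb-final
# target language such as Japanese would require.
RULES = {("VP", ("VBD", "NP")): (1, 0)}

TREE = ("S", [
    ("NP", [("PRP", "she")]),
    ("VP", [("VBD", "read"),
            ("NP", [("DT", "the"), ("NN", "book")])]),
])
```

Calling `preorder(TREE, RULES)` reorders the English sentence toward verb-final order before it is passed to the translation system.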

An Open-Source Web-Based Tool for Resource-Agnostic Interactive Translation Prediction

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2014-0015 ◽

2014 ◽

Vol 102 (1) ◽

pp. 69-80 ◽

Cited By ~ 2

Author(s):

Torregrosa Daniel ◽

Forcada Mikel L. ◽

Pérez-Ortiz Juan Antonio

Keyword(s):

Open Source ◽

Machine Translation ◽

Web Application ◽

Statistical Machine Translation ◽

Black Box ◽

Translation System ◽

Web Tool ◽

Web Based ◽

Strongly Coupled ◽

Machine Translation System

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based computer-generated suggestions as they type. Most of the ITP systems in the literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach, and suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.
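
The resource-agnostic idea can be sketched as follows. This is an illustrative assumption about how suggestions might be drawn from a black-box resource, not Forecat's actual code: the source sentence is split into short sub-segments, each is sent to an arbitrary `translate` callable, and translations that start with what the user has typed are offered. The names `subsegments`, `suggest`, and `TOY_RESOURCE` are hypothetical.

```python
def subsegments(words, max_len):
    """Yield every contiguous sub-segment of up to max_len words."""
    for start in range(len(words)):
        stop = min(start + max_len, len(words))
        for end in range(start + 1, stop + 1):
            yield " ".join(words[start:end])


def suggest(source, typed_prefix, translate, max_len=3):
    """Return translations of source sub-segments matching the typed prefix.

    translate -- any black-box callable mapping a source string to a
                 target string (or None when it has no translation).
    """
    suggestions = []
    for segment in subsegments(source.split(), max_len):
        target = translate(segment)
        if target and target.startswith(typed_prefix) \
                and target not in suggestions:
            suggestions.append(target)
    return suggestions


# Toy stand-in for the bilingual resource (English to Spanish).
TOY_RESOURCE = {"my": "mi", "house": "casa", "my house": "mi casa"}
```

Because `suggest` only calls `translate`, the dictionary lookup `TOY_RESOURCE.get` could be swapped for a call to any translation system without changing the suggestion logic.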

Joshua 6: A phrase-based and hierarchical statistical machine translation system

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2015-0009 ◽

2015 ◽

Vol 104 (1) ◽

pp. 5-16 ◽

Cited By ~ 1

Author(s):

Matt Post ◽

Yuan Cao ◽

Gaurav Kumar

Keyword(s):

Open Source ◽

Machine Translation ◽

Large Scale ◽

Statistical Machine Translation ◽

End Users ◽

Translation System ◽

Tight Coupling ◽

Single Function ◽

Black Boxes ◽

Machine Translation System

Abstract We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools. Joshua 6 also includes a number of large-scale discriminative tuners and a simplified sparse feature function interface with reflection-based loading, which allows new features to be used by writing a single function. Finally, Joshua includes a number of simplifications and improvements focused on usability for both researchers and end-users, including the release of language packs — precompiled models that can be run as black boxes.

State-of-the-art English to Persian Statistical Machine Translation system

The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) ◽

10.1109/aisp.2012.6313739 ◽

2012 ◽

Cited By ~ 6

Author(s):

Amin Mansouri ◽

Heshaam Faili

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Machine Translation System

Vélþýðingar á íslensku og Apertium-þýðingarkerfið

Orð og tunga ◽

10.33112/ordogtunga.18.8 ◽

2016 ◽

Vol 18 ◽

pp. 131-143

Author(s):

Ingibjörg Elsa Björnsdóttir

Keyword(s):

Open Source ◽

Machine Translation ◽

Rapid Development ◽

Translation System ◽

Rule Based ◽

Language Technology ◽

Translation Rule ◽

Machine Translation System

There has been rapid development in language technology and machine translation in recent decades. There are three main types of machine translation: statistical machine translation, rule-based machine translation, and example-based machine translation. In this article the Apertium machine translation system is discussed in particular. While Apertium was originally designed to translate between closely related languages, it can now handle languages that are much more different and variable in structure. Anyone can participate in the development of the Apertium system since it is open-source software. Thus Apertium is one of the best options available for researching and developing a machine translation system for Icelandic. The Apertium system has an easy-to-use interface, and it translates almost instantly from Icelandic into English or Swedish. However, the system still has certain limitations as regards vocabulary and ambiguity.

Matxin, an open-source rule-based machine translation system for Basque

Machine Translation ◽

10.1007/s10590-011-9092-y ◽

2011 ◽

Vol 25 (1) ◽

pp. 53-82 ◽

Cited By ~ 8

Author(s):

Aingeru Mayor ◽

Iñaki Alegria ◽

Arantza Díaz de Ilarraza ◽

Gorka Labaka ◽

Mikel Lersundi ◽

...

Keyword(s):

Open Source ◽

Machine Translation ◽

Translation System ◽

Rule Based ◽

Machine Translation System

Phrase Table Induction Using In-Domain Monolingual Data for Domain Adaptation in Statistical Machine Translation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00075 ◽

2017 ◽

Vol 5 ◽

pp. 487-500

Author(s):

Benjamin Marie ◽

Atsushi Fujita

Keyword(s):

Machine Translation ◽

Domain Adaptation ◽

Statistical Machine Translation ◽

Cartesian Product ◽

Translation System ◽

General Domain ◽

Parallel Data ◽

Machine Translation System ◽

Baseline System ◽

Target Languages

We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain. Our method first compiles sets of phrases in source and target languages separately and generates candidate phrase pairs by taking the Cartesian product of the two phrase sets. It then computes inexpensive features for each candidate phrase pair and filters them using a supervised classifier in order to induce an in-domain phrase table. We experimented on the language pair English–French, in both translation directions and in two domains, and obtained consistently better results than a strong baseline system that uses an in-domain bilingual lexicon. We also conducted an error analysis showing that the induced phrase tables proposed useful translations, especially for words and phrases unseen in the parallel data used to train the general-domain baseline system.
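
The generate-then-filter pipeline described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the supervised classifier is replaced by a hand-written scoring function (`toy_score`, based on a tiny made-up word list), and the names `induce_phrase_table` and `TOY_LEXICON` are hypothetical rather than taken from the paper.

```python
from itertools import product


def induce_phrase_table(src_phrases, tgt_phrases, score, threshold=0.5):
    """Keep candidate pairs from the Cartesian product that pass the filter.

    score -- callable (src, tgt) -> float in [0, 1]; stands in for the
             supervised classifier over inexpensive features.
    """
    table = {}
    for src, tgt in product(src_phrases, tgt_phrases):
        value = score(src, tgt)
        if value >= threshold:
            table[(src, tgt)] = value
    return table


# Toy word list used only by the stand-in scorer below (assumption).
TOY_LEXICON = {"heart": "coeur", "attack": "crise", "blood": "sang"}


def toy_score(src, tgt):
    """Fraction of source words whose listed translation occurs in tgt."""
    src_words = src.split()
    tgt_words = set(tgt.split())
    hits = sum(1 for w in src_words if TOY_LEXICON.get(w) in tgt_words)
    return hits / len(src_words)
```

The Cartesian product grows as the product of the two set sizes, which is why the abstract stresses that the per-pair features must be inexpensive to compute.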

Česílko Goes Open-source

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2017-0004 ◽

2017 ◽

Vol 107 (1) ◽

pp. 57-66

Author(s):

Jernej Vičič ◽

Vladislav Kuboň ◽

Petr Homola

Keyword(s):

Open Source ◽

Machine Translation ◽

Translation System ◽

Source Language ◽

Machine Translation System ◽

Target Languages

Abstract The machine translation system Česílko has been developed in response to a growing need for translation and localization from one source language to many target languages. The system belongs to the shallow-parse, shallow-transfer RBMT paradigm and is designed primarily for translation between related languages. The paper presents the architecture, the development design and the basic installation instructions of the translation system.

English to Kurdish Rule-based Machine Translation System

UHD Journal of Science and Technology ◽

10.21928/uhdjst.v2n2y2018.pp32-39 ◽

2018 ◽

Vol 2 (2) ◽

pp. 32

Author(s):

Kanaan Mikael Kaka-Khan

Keyword(s):

Open Source ◽

Machine Translation ◽

Translation System ◽

Simple Sentence ◽

Rule Based ◽

Ongoing Effort ◽

Machine Translation System ◽

Compound Sentence ◽

Free Open Source

In this paper we present a machine translation system developed to translate simple English sentences to Kurdish. The system is based on Apertium, a free open-source engine that provides the environment and the required tools to develop a machine translation system. The developed system is used to translate simple sentences, compound sentences, phrases, and idioms from English to Kurdish. The resulting translations are then evaluated manually for accuracy and completeness and compared with the output of the popular inKurdish English-to-Kurdish machine translation system. The results show that our system is more accurate than the inKurdish system. This paper contributes to the ongoing effort to achieve full machine-based translation in general and English-to-Kurdish machine translation in particular.

A Rule Based Approach for Japanese-Uyghur Machine Translation System

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2014010104 ◽

2014 ◽

Vol 6 (1) ◽

pp. 56-69 ◽

Cited By ~ 1

Author(s):

Maimitili Nimaiti ◽

Yamamoto Izumi

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Experimental Result ◽

Rule Base ◽

Translation System ◽

Base System ◽

Rule Based ◽

Machine Translation System ◽

Word Translation ◽

Rule Based Approach

A Japanese-Uyghur machine translation system has been designed and developed using a recent rule-based approach. Although the Japanese and Uyghur languages have many similarities, some linguistic differences cause serious problems for word-for-word translation: straightforward word-for-word Japanese-Uyghur translation sometimes yields unnatural Uyghur sentences. To raise translation accuracy, the authors propose a word-for-word translation system that uses subject-verb agreement in Uyghur. After a brief introduction to the comparative study of Japanese and Uyghur grammar, morphology, and syntax, the authors explain their development of a word-for-word rule-based system. They describe the coverage of this rule-based system, its translation rules, and a comparison of experimental results between a statistical machine translation system and the rule-based machine translation system. Some practical suffix translation methods that solve problems in the Uyghur language are also proposed.

Hybrid Translation with Classification: Revisiting Rule-Based and Neural Machine Translation

Electronics ◽

10.3390/electronics9020201 ◽

2020 ◽

Vol 9 (2) ◽

pp. 201

Author(s):

Jin-Xia Huang ◽

Kyung-Soon Lee ◽

Young-Kil Kim

Keyword(s):

Machine Translation ◽

Classification Accuracy ◽

Training Data ◽

Translation System ◽

Rule Based ◽

Neural Machine Translation ◽

Machine Translation System ◽

Text Classifiers ◽

Hybrid Machine Translation ◽

Translation Accuracy

This paper proposes a hybrid machine-translation system that combines neural machine translation with well-developed rule-based machine translation to utilize the stability of the latter to compensate for the inadequacy of neural machine translation in rare-resource domains. A classifier is introduced to predict which translation from the two systems is more reliable. We explore a set of features that reflect the reliability of translation and its process, and training data is automatically expanded with a small, human-labeled dataset to solve the insufficient-data problem. A series of experiments shows that the hybrid system’s translation accuracy is improved, especially in out-of-domain translations, and classification accuracy is greatly improved when using the proposed features and the automatically constructed training set. A comparison between feature- and text-based classification is also performed, and the results show that the feature-based model achieves better classification accuracy, even when compared to neural network text classifiers.
