English-Cebuano Parallel Language Resource for Statistical Machine Translation System

English-Dogri Translation System using MOSES

Circulation in Computer Science ◽

10.22632/ccs-2016-251-25 ◽

2016 ◽

Vol 1 (1) ◽

pp. 45-49

Author(s):

Avinash Singh ◽

Asmeet Kour ◽

Shubhnandan S. Jamwal

Keyword(s):

Natural Language Processing ◽

Machine Translation ◽

Language Processing ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpus ◽

English System ◽

Machine Translation System ◽

Translation Machine ◽

Language Pair

The objective of this paper is to analyze translation over an English-Dogri parallel corpus. Machine translation is the automatic translation of text from one language into another, and it is a major application of Natural Language Processing (NLP). Moses is a statistical machine translation system that allows translation models to be trained for any language pair. We have developed a translation system using a statistical approach that translates English to Dogri and vice versa. The parallel corpus consists of 98,973 sentences. The system achieves an accuracy of 80% when translating English to Dogri and 87% when translating Dogri to English.


An Open-Source Web-Based Tool for Resource-Agnostic Interactive Translation Prediction

Prague Bulletin of Mathematical Linguistics ◽

10.2478/pralin-2014-0015 ◽

2014 ◽

Vol 102 (1) ◽

pp. 69-80 ◽

Cited By ~ 2

Author(s):

Torregrosa Daniel ◽

Forcada Mikel L. ◽

Pérez-Ortiz Juan Antonio

Keyword(s):

Open Source ◽

Machine Translation ◽

Web Application ◽

Statistical Machine Translation ◽

Black Box ◽

Translation System ◽

Web Tool ◽

Web Based ◽

Strongly Coupled ◽

Machine Translation System

We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based computer-generated suggestions as they type. Most of the ITP systems in the literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach, and suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.
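
As a rough illustration of the resource-agnostic idea described in this abstract, the sketch below treats the bilingual resource as an opaque function and filters its outputs by what the translator has typed so far. It is not Forecat's actual implementation: the names `subsegments`, `build_candidates`, `suggest`, and `toy_translate`, the word-by-word toy dictionary, and the sub-segment length limit are all assumptions made for this example.

```python
from typing import Callable, Iterator, List


def subsegments(words: List[str], max_len: int) -> Iterator[str]:
    """Yield every contiguous run of 1..max_len words from the source."""
    for start in range(len(words)):
        for end in range(start + 1, min(start + max_len, len(words)) + 1):
            yield " ".join(words[start:end])


def build_candidates(source: str,
                     translate: Callable[[str], str],
                     max_len: int = 3) -> List[str]:
    """Translate each sub-segment with the black-box resource.

    `translate` is called as-is and never inspected or modified,
    which is what makes the approach resource-agnostic.
    """
    return sorted({translate(seg) for seg in subsegments(source.split(), max_len)})


def suggest(candidates: List[str], typed_prefix: str) -> List[str]:
    """Return the candidates that extend what the translator is typing."""
    if not typed_prefix:
        return []
    return [c for c in candidates if c.startswith(typed_prefix)]


# Hypothetical stand-in for a real bilingual resource (toy data only).
TOY_DICT = {"my": "mi", "red": "roja", "house": "casa"}


def toy_translate(segment: str) -> str:
    return " ".join(TOY_DICT.get(w, w) for w in segment.split())


candidates = build_candidates("my red house", toy_translate)
print(suggest(candidates, "ro"))
```

Because `translate` is only ever called, never adapted, the toy dictionary could in principle be swapped for any other callable that maps a source string to a target string.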


Building a statistical machine translation system for French using the Europarl corpus

10.3115/1626355.1626380 ◽

2007 ◽

Cited By ~ 1

Author(s):

Holger Schwenk

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation System ◽

Machine Translation System


Translation of Medical Texts using Neural Networks

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2016100104 ◽

2016 ◽

Vol 5 (4) ◽

pp. 51-66 ◽

Cited By ~ 5

Author(s):

Krzysztof Wolk ◽

Krzysztof P. Marasek

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

European Medicines Agency ◽

Translation System ◽

Training Methods ◽

Neural Machine Translation ◽

Machine Translation System ◽

Source Sentence ◽

Parallel Text ◽

Translation Systems

The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable translations, although the systems are not perfect. In some specific domains, the quality may decrease. A recently proposed approach to this domain is neural machine translation. It aims at building a jointly-tuned single neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family in which a source sentence is encoded into a fixed length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English Machine Translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training of neural and statistical network-based translation systems. A comparison and implementation of a medical translator is the main focus of our experiments.


Improving Machine Translation Performance by Exploiting Non-Parallel Corpora

Computational Linguistics ◽

10.1162/089120105775299168 ◽

2005 ◽

Vol 31 (4) ◽

pp. 477-504 ◽

Cited By ~ 104

Author(s):

Dragos Stefan Munteanu ◽

Daniel Marcu

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation System ◽

Parallel Corpora ◽

Parallel Corpus ◽

Scarce Resources ◽

Parallel Data ◽

Machine Translation System ◽

Novel Method ◽

Arabic And English

We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system. We also show that a good-quality MT system can be built from scratch by starting with a very small parallel corpus (100,000 words) and exploiting a large non-parallel corpus. Thus, our method can be applied with great benefit to language pairs for which only scarce resources are available.
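
To make the classification step in this abstract concrete, here is a minimal sketch of a binary maximum entropy classifier (equivalent to logistic regression) that scores whether two sentences are translations of each other. It is not the authors' feature set or training procedure: the two features (lexicon coverage and length ratio), the tiny `LEXICON`, the training pairs, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical toy English-Spanish lexicon (illustrative data only).
LEXICON = {
    "the": {"la", "el"}, "house": {"casa"}, "is": {"es"},
    "red": {"roja", "rojo"}, "dog": {"perro"}, "eats": {"come"},
}


def features(src: str, tgt: str) -> List[float]:
    """Bias term, share of source words with a lexicon match, length ratio."""
    s, t = src.lower().split(), tgt.lower().split()
    if not s or not t:
        return [1.0, 0.0, 0.0]
    target_words = set(t)
    covered = sum(1 for w in s if LEXICON.get(w, set()) & target_words)
    length_ratio = min(len(s), len(t)) / max(len(s), len(t))
    return [1.0, covered / len(s), length_ratio]


def predict(weights: List[float], x: List[float]) -> float:
    """Probability that the pair is parallel under the current weights."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs: List[Tuple[str, str, int]],
          epochs: int = 500, lr: float = 0.5) -> List[float]:
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for src, tgt, label in pairs:
            x = features(src, tgt)
            error = label - predict(weights, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
    return weights


TRAINING_PAIRS = [
    ("the house is red", "la casa es roja", 1),
    ("the dog eats", "el perro come", 1),
    ("the house is red", "el perro come", 0),
    ("the dog eats", "la casa es roja", 0),
]

weights = train(TRAINING_PAIRS)
parallel_score = predict(weights, features("the dog is red", "el perro es rojo"))
unrelated_score = predict(weights, features("the dog is red", "la casa come"))
print(parallel_score, unrelated_score)
```

In a setting like the one the abstract describes, such a classifier would be applied to candidate sentence pairs drawn from comparable documents, and the pairs scoring above a chosen threshold would be kept as extracted parallel data.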


Consensus Network Decoding for Statistical Machine Translation System Combination

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07 ◽

10.1109/icassp.2007.367174 ◽

2007 ◽

Cited By ~ 23

Author(s):

K. C. Sim ◽

W. J. Byrne ◽

M. J. F. Gales ◽

H. Sahbi ◽

P. C. Woodland

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Translation System ◽

Consensus Network ◽

System Combination ◽

Machine Translation System


Joshua 6: A phrase-based and hierarchical statistical machine translation system

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2015-0009 ◽

2015 ◽

Vol 104 (1) ◽

pp. 5-16 ◽

Cited By ~ 1

Author(s):

Matt Post ◽

Yuan Cao ◽

Gaurav Kumar

Keyword(s):

Open Source ◽

Machine Translation ◽

Large Scale ◽

Statistical Machine Translation ◽

End Users ◽

Translation System ◽

Tight Coupling ◽

Single Function ◽

Black Boxes ◽

Machine Translation System

We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools. Joshua 6 also includes a number of large-scale discriminative tuners and a simplified sparse feature function interface with reflection-based loading, which allows new features to be used by writing a single function. Finally, Joshua includes a number of simplifications and improvements focused on usability for both researchers and end-users, including the release of language packs: precompiled models that can be run as black boxes.


State-of-the-art English to Persian Statistical Machine Translation system

The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012) ◽

10.1109/aisp.2012.6313739 ◽

2012 ◽

Cited By ~ 6

Author(s):

Amin Mansouri ◽

Heshaam Faili

Keyword(s):

Machine Translation ◽

State Of The Art ◽

Statistical Machine Translation ◽

Translation System ◽

Machine Translation System


Evaluating Machine Translation Quality Using Short Segments Annotations

Prague Bulletin of Mathematical Linguistics ◽

10.1515/pralin-2015-0005 ◽

2015 ◽

Vol 103 (1) ◽

pp. 85-110

Author(s):

Matouš Macháček ◽

Ondřej Bojar

Keyword(s):

Machine Translation ◽

Evaluation Method ◽

Statistical Machine Translation ◽

Translation System ◽

Translation Quality ◽

Machine Translation System

We propose a manual evaluation method for machine translation (MT), in which annotators rank only translations of short segments instead of whole sentences. This results in an easier and more efficient annotation. We have conducted an annotation experiment and evaluated a set of MT systems using this method. The obtained results are very close to the official WMT14 evaluation results. We also use the collected database of annotations to automatically evaluate new, unseen systems and to tune parameters of a statistical machine translation system. The evaluation of unseen systems, however, does not work, and we analyze the reasons.


Predicting and Using a Pragmatic Component of Lexical Aspect

Linguistic Issues in Language Technology ◽

10.33011/lilt.v13i.1389 ◽

2016 ◽

Vol 13 ◽

Author(s):

Sharid Loáiciga ◽

Cristina Grisot

Keyword(s):

Machine Translation ◽

Statistical Machine Translation ◽

Target Language ◽

Translation System ◽

Automatic Annotation ◽

Parallel Corpora ◽

Linguistic Data ◽

Lexical Aspect ◽

Machine Translation System ◽

Automatic Systems

This paper proposes a method for improving the results of a statistical Machine Translation system using boundedness, a pragmatic component of the verbal phrase’s lexical aspect. First, the paper presents manual and automatic annotation experiments for lexical aspect in English-French parallel corpora. It will be shown that this aspectual property is identified and classified with ease both by humans and by automatic systems. Second, Statistical Machine Translation experiments using the boundedness annotations are presented. These experiments show that the information regarding lexical aspect is useful to improve the output of a Machine Translation system in terms of better choices of verbal tenses in the target language, as well as better lexical choices. Ultimately, this work aims at providing a method for the automatic annotation of data with boundedness information and at contributing to Machine Translation by taking into account linguistic data.
