scholarly journals Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation

2010 ◽  
Vol 93 (1) ◽  
pp. 17-26 ◽  
Author(s):  
Yvette Graham

Sulis: An Open Source Transfer Decoder for Deep Syntactic Statistical Machine Translation In this paper, we describe an open source transfer decoder for Deep Syntactic Transfer-Based Statistical Machine Translation. Transfer decoding involves the application of transfer rules to a SL structure. The N-best TL structures are found via a beam search of TL hypothesis structures which are ranked via a log-linear combination of feature scores, such as translation model and dependency-based language model.


2006 ◽  
Vol 32 (4) ◽  
pp. 527-549 ◽  
Author(s):  
José B. Mariño ◽  
Rafael E. Banchs ◽  
Josep M. Crego ◽  
Adrià de Gispert ◽  
Patrik Lambert ◽  
...  

This article describes in detail an n-gram approach to statistical machine translation. This approach consists of a log-linear combination of a translation model based on n-grams of bilingual units, which are referred to as tuples, along with four specific feature functions. Translation performance, which happens to be in the state of the art, is demonstrated with Spanish-to-English and English-to-Spanish translations of the European Parliament Plenary Sessions (EPPS).



2004 ◽  
Vol 30 (4) ◽  
pp. 417-449 ◽  
Author(s):  
Franz Josef Och ◽  
Hermann Ney

A phrase-based statistical machine translation approach — the alignment template approach — is described. This translation approach allows for general many-to-many relations between words. Thereby, the context of words is taken into account in the translation model, and local changes in word order from source to target language can be learned explicitly. The model is described using a log-linear modeling approach, which is a generalization of the often used source-channel approach. Thereby, the model is easier to extend than classical statistical machine translation systems. We describe in detail the process for learning phrasal translations, the feature functions used, and the search algorithm. The evaluation of this approach is performed on three different tasks. For the German-English speech Verbmobil task, we analyze the effect of various system components. On the French-English Canadian Hansards task, the alignment template system obtains significantly better results than a single-word-based translation model. In the Chinese-English 2002 National Institute of Standards and Technology (NIST) machine translation evaluation it yields statistically significantly better NIST scores than all competing research and commercial translation systems.



Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Yanping Ye

At the level of English resource vocabulary, due to the lack of vocabulary alignment structure, the translation of neural machine translation has the problem of unfaithfulness. This paper proposes a framework that integrates vocabulary alignment structure for neural machine translation at the vocabulary level. Under the proposed framework, the neural machine translation decoder receives external vocabulary alignment information during each step of the decoding process to further alleviate the problem of missing vocabulary alignment structure. Specifically, this article uses the word alignment structure of statistical machine translation as the external vocabulary alignment information and introduces it into the decoding step of neural machine translation. The model is mainly based on neural machine translation, and the statistical machine translation vocabulary alignment structure is integrated on the basis of neural networks and continuous expression of words. In the model decoding stage, the statistical machine translation system provides appropriate vocabulary alignment information based on the decoding information of the neural machine translation and recommends vocabulary based on the vocabulary alignment information to guide the neural machine translation decoder to more accurately estimate its vocabulary in the target language. From the aspects of data processing methods and machine translation technology, experiments are carried out to compare the data processing methods based on language model and sentence similarity and the effectiveness of machine translation models based on fusion principles. Comparative experiment results show that the data processing method based on language model and sentence similarity effectively guarantees data quality and indirectly improves the algorithm performance of machine translation model; the translation effect of neural machine translation model integrated with statistical machine translation vocabulary alignment structure is compared with other models.



2016 ◽  
Vol 106 (1) ◽  
pp. 193-204
Author(s):  
Víctor M. Sánchez-Cartagena ◽  
Juan Antonio Pérez-Ortiz ◽  
Felipe Sánchez-Martínez

Abstract This paper presents ruLearn, an open-source toolkit for the automatic inference of rules for shallow-transfer machine translation from scarce parallel corpora and morphological dictionaries. ruLearn will make rule-based machine translation a very appealing alternative for under-resourced language pairs because it avoids the need for human experts to handcraft transfer rules and requires, in contrast to statistical machine translation, a small amount of parallel corpora (a few hundred parallel sentences proved to be sufficient). The inference algorithm implemented by ruLearn has been recently published by the same authors in Computer Speech & Language (volume 32). It is able to produce rules whose translation quality is similar to that obtained by using hand-crafted rules. ruLearn generates rules that are ready for their use in the Apertium platform, although they can be easily adapted to other platforms. When the rules produced by ruLearn are used together with a hybridisation strategy for integrating linguistic resources from shallow-transfer rule-based machine translation into phrase-based statistical machine translation (published by the same authors in Journal of Artificial Intelligence Research, volume 55), they help to mitigate data sparseness. This paper also shows how to use ruLearn and describes its implementation.



Author(s):  
Herry Sujaini

The statistical machine translation (SMT) is widely used by researchers and practitioners in recent years. SMT works with quality that is determined by several important factors, two of which are language and translation model. Research on improving the translation model has been done quite a lot, but the problem of optimizing the language model for use on machine translators has not received much attention. On translator machines, language models usually use trigram models as standard. In this paper, we conducted experiments with four strategies to analyze the role of the language model used in the Indonesian-Javanese translation machine and show improvement compared to the baseline system with the standard language model. The results of this research indicate that the use of 3-gram language models is highly recommended in SMT.



2020 ◽  
Vol 4 (3) ◽  
pp. 519
Author(s):  
Permata Permata ◽  
Zaenal Abidin

In this research, automatic translation of the Lampung dialect into Indonesian was carried out using the statistical machine translation (SMT) approach. Translation of the Lampung language to Indonesian can be done by using a dictionary. Another alternative is to use the Lampung parallel body corpus and its translation in Indonesian with the SMT approach. The SMT approach is carried out in several phases. Starting from the pre-processing phase which is the initial stage to prepare a parallel corpus. Then proceed with the training phase, namely the parallel corpus processing phase to obtain a language model and translation model. Then the testing phase, and ends with the evaluation phase. SMT testing uses 25 single sentences without out-of-vocabulary (OOV), 25 single sentences with OOV, 25 compound sentences without OOV and 25 compound sentences with OOV. The results of testing the translation of Lampung sentences into Indonesian shows the accuracy of the Bilingual Evaluation Undestudy (BLEU) obtained is 77.07% in 25 single sentences without out-of-vocabulary (OOV), 72.29% in 25 single sentences with OOV, 79.84% at 25 compound sentences without OOV and 80.84% at 25 compound sentences with OOV.



2003 ◽  
Vol 29 (1) ◽  
pp. 97-133 ◽  
Author(s):  
Christoph Tillmann ◽  
Hermann Ney

In this article, we describe an efficient beam search algorithm for statistical machine translation based on dynamic programming (DP). The search algorithm uses the translation model presented in Brown et al. (1993). Starting from a DP-based solution to the traveling-salesman problem, we present a novel technique to restrict the possible word reorderings between source and target language in order to achieve an efficient search algorithm. Word reordering restrictions especially useful for the translation direction German to English are presented. The restrictions are generalized, and a set of four parameters to control the word reordering is introduced, which then can easily be adopted to new translation directions. The beam search procedure has been successfully tested on the Verbmobil task (German to English, 8,000-word vocabulary) and on the Canadian Hansards task (French to English, 100,000-word vocabulary). For the medium-sized Verbmobil task, a sentence can be translated in a few seconds, only a small number of search errors occur, and there is no performance degradation as measured by the word error criterion used in this article.



2017 ◽  
Vol 26 (1) ◽  
pp. 65-72 ◽  
Author(s):  
Jinsong Su ◽  
Zhihao Wang ◽  
Qingqiang Wu ◽  
Junfeng Yao ◽  
Fei Long ◽  
...  


2014 ◽  
Vol 102 (1) ◽  
pp. 69-80 ◽  
Author(s):  
Torregrosa Daniel ◽  
Forcada Mikel L. ◽  
Pérez-Ortiz Juan Antonio

Abstract We present a web-based open-source tool for interactive translation prediction (ITP) and describe its underlying architecture. ITP systems assist human translators by making context-based computer-generated suggestions as they type. Most of the ITP systems in literature are strongly coupled with a statistical machine translation system that is conveniently adapted to provide the suggestions. Our system, however, follows a resource-agnostic approach and suggestions are obtained from any unmodified black-box bilingual resource. This paper reviews our ITP method and describes the architecture of Forecat, a web tool, partly based on the recent technology of web components, that eases the use of our ITP approach in any web application requiring this kind of translation assistance. We also evaluate the performance of our method when using an unmodified Moses-based statistical machine translation system as the bilingual resource.



Sign in / Sign up

Export Citation Format

Share Document