Morphology in Machine Translation Systems: Efficient Integration of Finite State Transducers and Feature Structure Descriptions

Finite-state transducers are models used in several areas of pattern recognition and computational linguistics. One of these areas is machine translation, where approaches that build models automatically from training examples are becoming increasingly attractive. Finite-state transducers are well suited to constrained tasks for which training samples of sentence pairs are available. This article proposes a technique for inferring finite-state transducers, based on formal relations between finite-state transducers and rational grammars. Given a training corpus of source-target sentence pairs, the proposed approach uses statistical alignment methods to produce a set of conventional strings from which a stochastic rational grammar (e.g., an n-gram) is inferred. This grammar is finally converted into a finite-state transducer. The proposed methods are assessed through a series of machine translation experiments within the framework of the EuTrans project.
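The pipeline summarized above (align sentence pairs, relabel each pair as a single string, infer an n-gram over those strings, convert the n-gram into a transducer) can be illustrated with a toy sketch. It is a simplified illustration under stated assumptions, not the article's implementation: the alignment format (`align[j]` is the index of the source word aligned to target word j), the rule that attaches each target word to the rightmost source position aligned so far, the unsmoothed bigram, and all function names are hypothetical choices made here.

```python
from collections import defaultdict

def to_extended(src, tgt, align):
    """Relabel an aligned pair as a string of (source word, target chunk) symbols.

    align[j] is the index of the source word aligned to target word j (assumed format).
    Each target word is attached to the rightmost source position aligned so far,
    so no target word is emitted before the source word it depends on.
    """
    attached = [[] for _ in src]
    right = 0
    for j, word in enumerate(tgt):
        right = max(right, align[j])
        attached[right].append(word)
    return [(s, tuple(t)) for s, t in zip(src, attached)]

def train_bigram(strings):
    """Maximum-likelihood bigram over extended symbols (no smoothing)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in strings:
        prev = "<s>"
        for sym in seq + ["</s>"]:
            counts[prev][sym] += 1
            prev = sym
    model = {}
    for hist, nxt in counts.items():
        total = sum(nxt.values())
        model[hist] = {sym: c / total for sym, c in nxt.items()}
    return model

def to_transducer(model):
    """Each bigram history becomes a state; each symbol (s, t) becomes an arc reading s, writing t."""
    arcs, final = defaultdict(list), {}
    for hist, nxt in model.items():
        for sym, p in nxt.items():
            if sym == "</s>":
                final[hist] = p
            else:
                arcs[hist].append((sym[0], sym[1], sym, p))
    return arcs, final

def translate(arcs, final, words):
    """Viterbi search for the most probable path that reads the whole input sentence."""
    best = {"<s>": (1.0, ())}  # state -> (path probability, output so far)
    for w in words:
        nxt = {}
        for state, (p, out) in best.items():
            for inp, emitted, dest, q in arcs.get(state, []):
                if inp == w and p * q > nxt.get(dest, (0.0, ()))[0]:
                    nxt[dest] = (p * q, out + emitted)
        best = nxt
    ended = [(p * final[s], out) for s, (p, out) in best.items() if s in final]
    return " ".join(max(ended)[1]) if ended else None

# Hypothetical Spanish-English toy corpus with word alignments
pairs = [
    ("la casa verde".split(), "the green house".split(), [0, 2, 1]),
    ("la casa".split(), "the house".split(), [0, 1]),
]
arcs, final = to_transducer(train_bigram([to_extended(*p) for p in pairs]))
print(translate(arcs, final, "la casa verde".split()))  # the green house
```

Note how the relabeling delays "green house" until "verde" has been read: the transducer learns the reordering from the training strings rather than from explicit rules.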


Statistical Approaches to Computer-Assisted Translation

Computational Linguistics, 2009, Vol 35 (1), pp. 3-28

DOI: 10.1162/coli.2008.07-055-r2-06-29

Cited By ~ 59

Author(s): Sergio Barrachina, Oliver Bender, Francisco Casacuberta, Jorge Civera, Elsa Cubel, ...

Keyword(s): European Union, Iterative Process, Computer Assisted, Search Process, The European Union, Translation Process, Finite State Transducers, Finite State, Statistical Approaches, Translation Systems

Current machine translation (MT) systems are still not perfect; in practice, their output must be edited to correct errors. One way to increase the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities into the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process with the human translator in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human, and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, adapting MT systems to the interactive scenario mainly affects the search process, which allows extensive reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) on two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, three language pairs were involved, in both translation directions: English-Spanish, English-German, and English-French.
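The prefix-validation loop described above can be sketched as follows. This is an illustrative toy, not the TransType2 systems: the n-best list stands in for a real search constrained to the validated prefix, the word-level simulated user (accept the longest correct prefix, then type the next correct word) is an assumed evaluation protocol, and the function names and example sentences are hypothetical.

```python
def complete(prefix, nbest):
    """Return the suffix of the best-scoring hypothesis that extends the validated prefix.

    nbest is a list of (score, tokens) pairs; here it stands in for a search
    constrained to hypotheses that start with the prefix.
    """
    for _, hyp in sorted(nbest, key=lambda item: item[0], reverse=True):
        if hyp[:len(prefix)] == prefix:
            return hyp[len(prefix):]
    return []

def simulate_session(reference, nbest):
    """Simulated translator: keep the longest correct prefix, type the next word, repeat.

    Returns the number of corrections the user had to make.
    """
    prefix, corrections = [], 0
    while True:
        hyp = prefix + complete(prefix, nbest)
        if hyp == reference:
            return corrections
        k = len(prefix)
        # Extend the accepted prefix while the hypothesis agrees with the reference
        while k < min(len(hyp), len(reference)) and hyp[k] == reference[k]:
            k += 1
        corrections += 1
        if k == len(reference):
            return corrections  # the user only deletes the extra trailing words
        prefix = reference[:k + 1]

# Hypothetical n-best list for one source sentence
nbest = [
    (0.6, "the printer prints slowly".split()),
    (0.3, "the printer is printing slowly".split()),
    (0.1, "the printer is slow".split()),
]
print(simulate_session("the printer is slow".split(), nbest))  # 2
```

After the user types "is", the system switches to the second hypothesis, and after "slow" to the third, so the sentence is finished with two corrections instead of retyping it.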


An Investigation into Methodology and Metrics Employed to Evaluate the (Speech-to-Speech) Way in Translation Systems

Modern Applied Science, 2017, Vol 11 (4), pp. 55

DOI: 10.5539/mas.v11n4p55

Author(s): Parnyan Bahrami Dashtaki

Keyword(s): Speech Recognition, Machine Translation, Automatic Speech Recognition, Speech Synthesis, Translation System, Speech Translation, Pattern Recognition Problem, Finite State, Training Examples, Translation Systems

Speech-to-speech translation is a challenging problem, owing both to the poor sentence planning typical of spontaneous speech and to errors introduced by automatic speech recognition. Based on a statistically trained speech translation system, this study investigates the methodologies and metrics employed to evaluate the speech-to-speech mode of translation systems. The speech translation is performed incrementally, based on partial hypotheses generated by the speech recognizer. Speech-input translation can be properly approached as a pattern recognition problem by means of statistical alignment models and stochastic finite-state transducers. Under this general framework, some specific models are presented; one of their features is the capability of learning automatically from training examples. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed; this research explores the methodologies and metrics used to evaluate the resulting speech-to-speech systems.
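A minimal sketch of the three-module cascade with incremental processing follows. The modules are passed in as plain callables (the word-for-word toy lexicon below is hypothetical, not a real MT model), and the rule of translating only the words on which two consecutive partial ASR hypotheses agree is an assumed stabilization strategy, not one the abstract specifies.

```python
from typing import Callable, Iterable, List

def agreed_prefix(a: List[str], b: List[str]) -> List[str]:
    """Words on which two consecutive partial recognizer hypotheses agree."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return a[:n]

def speech_to_speech(partials: Iterable[List[str]],
                     translate: Callable[[List[str]], List[str]],
                     speak: Callable[[List[str]], None]) -> List[str]:
    """Cascade ASR -> MT -> TTS, translating only the stable part of each partial hypothesis."""
    committed: List[str] = []  # source words already passed to MT
    output: List[str] = []
    prev: List[str] = []

    def emit(words: List[str]) -> None:
        chunk = translate(words)
        speak(chunk)
        output.extend(chunk)

    for hyp in partials:
        stable = agreed_prefix(prev, hyp)
        # Only act when the stable part extends what has already been committed
        if len(stable) > len(committed) and stable[:len(committed)] == committed:
            emit(stable[len(committed):])
            committed = stable
        prev = hyp
    # End of utterance: the last hypothesis is final, so flush whatever remains
    if prev[:len(committed)] == committed and len(prev) > len(committed):
        emit(prev[len(committed):])
    return output

# Hypothetical toy modules: a word-for-word lexicon and a "synthesizer" that records chunks
lexicon = {"el": "the", "gato": "cat", "duerme": "sleeps"}
spoken: List[List[str]] = []
partials = [["el"], ["el", "gato"], ["el", "gato", "duerme"]]
result = speech_to_speech(partials, lambda ws: [lexicon[w] for w in ws], spoken.append)
print(result)  # ['the', 'cat', 'sleeps']
```

Translating chunks independently ignores cross-chunk context and reordering, and words the recognizer later revises are never retracted; a real system would have to handle both.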


A phrase-level machine translation approach for disfluency detection using weighted finite state transducers

2006

DOI: 10.21437/interspeech.2006-262

Author(s): Sameer Maskey, Bowen Zhou, Yuqing Gao

Keyword(s): Machine Translation, Finite State Transducers, Finite State, Weighted Finite State Transducers


Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and Development

Prague Bulletin of Mathematical Linguistics, 2010, Vol 93 (1), pp. 67-76

DOI: 10.2478/v10108-010-0015-5

Cited By ~ 8

Author(s): Francis Tyers, Felipe Sánchez-Martínez, Sergio Ortiz-Rojas, Mikel Forcada

Keyword(s): Research And Development, Open Source, Machine Translation, Morphological Analysis, Translation Research, Part Of Speech, Open Source Framework, Finite State, Translation Systems, Free Open Source

This paper describes the resources available in the Apertium platform, a free/open-source framework for creating rule-based machine translation systems. Resources within the platform take the form of finite-state morphologies for morphological analysis and generation, bilingual transfer lexica, probabilistic part-of-speech taggers, and transfer rule files, all in standardised formats. These resources are described, and examples are given of their reuse and recycling in combination with other machine translation systems.
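The finite-state morphologies mentioned above map surface forms to lemma-plus-tag analyses, and back again for generation. The sketch below illustrates the idea with a trie used as an acyclic deterministic letter transducer; it is not Apertium's lttoolbox code, it performs no minimisation or paradigm compression, and the dictionary entries and tag names are hypothetical. The output string is only loosely modelled on Apertium's `^surface/analysis$` stream format.

```python
from collections import defaultdict

class LetterTransducer:
    """Acyclic deterministic letter transducer (a trie) whose final states emit output strings."""

    def __init__(self):
        self.arcs = [{}]                 # arcs[state][char] -> next state; state 0 is initial
        self.outputs = defaultdict(set)  # final state -> set of emitted strings

    def add(self, inp: str, out: str) -> None:
        state = 0
        for ch in inp:
            if ch not in self.arcs[state]:
                self.arcs.append({})
                self.arcs[state][ch] = len(self.arcs) - 1
            state = self.arcs[state][ch]
        self.outputs[state].add(out)

    def lookup(self, inp: str) -> list:
        state = 0
        for ch in inp:
            state = self.arcs[state].get(ch)
            if state is None:
                return []
        return sorted(self.outputs.get(state, ()))

# Hypothetical toy dictionary: (surface form, lemma + tags)
ENTRIES = [
    ("house", "house<n><sg>"),
    ("houses", "house<n><pl>"),
    ("house", "house<vblex><inf>"),
]

analyser, generator = LetterTransducer(), LetterTransducer()
for surface, analysis in ENTRIES:
    analyser.add(surface, analysis)   # analysis direction: surface -> lexical
    generator.add(analysis, surface)  # generation direction: lexical -> surface

def analyse(word: str) -> str:
    """Format all analyses as ^surface/analysis1/analysis2$, marking unknown words with '*'."""
    analyses = analyser.lookup(word)
    return f"^{word}/" + ("/".join(analyses) if analyses else "*" + word) + "$"

print(analyse("house"))  # ^house/house<n><sg>/house<vblex><inf>$
```

Building the generator from the same entries with input and output swapped mirrors how one dictionary can serve both directions.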


Rewriting the orthography of SMS messages

Natural Language Engineering, 2010, Vol 16 (2), pp. 133-159

DOI: 10.1017/s1351324909990258

Cited By ~ 5

Author(s): François Yvon

Keyword(s): Machine Translation, Statistical Machine Translation, Finite State Transducers, Written Texts, Computer Mediated, Translation Techniques, Finite State, Weighted Finite State Transducers, And Training

Electronic written texts used in computer-mediated interactions (emails, blogs, chats, and the like) deviate significantly from the norms of the language. This paper presents in detail a system aimed at normalizing the orthography of French SMS messages: after discussing the linguistic peculiarities of these messages and possible approaches to their automatic normalization, we present, compare, and evaluate various instantiations of a normalization device based on weighted finite-state transducers. These experiments show that, using an intermediate phonemic representation and training, our system outperforms an alternative normalization system based on phrase-based statistical machine translation techniques.
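The core of a weighted-transducer normalizer can be illustrated as a noisy-channel search: a one-state channel transducer rewrites each SMS token into candidate standard spellings, a bigram language model scores the result, and the normalization is the minimum-cost path through their composition in the tropical semiring. The sketch below computes that path by dynamic programming instead of with an FST library, omits the phonemic representation the paper uses, and its tokens, candidates, and costs are invented for illustration.

```python
# Hypothetical channel model: SMS token -> {standard spelling: cost}, costs as -log probabilities
CHANNEL = {
    "a": {"a": 0.5, "à": 0.9},
    "2m1": {"demain": 0.1},
}

# Hypothetical bigram language-model costs; unseen bigrams get a flat penalty
BIGRAM = {
    ("<s>", "à"): 1.0,
    ("<s>", "a"): 1.5,
    ("à", "demain"): 0.5,
    ("a", "demain"): 3.0,
    ("demain", "</s>"): 0.2,
}
UNSEEN = 10.0

def lm_cost(prev: str, word: str) -> float:
    return BIGRAM.get((prev, word), UNSEEN)

def normalize(tokens):
    """Minimum-cost path through channel candidates and the bigram model (tropical semiring)."""
    best = {"<s>": (0.0, [])}  # LM state (previous word) -> (cost, words so far)
    for tok in tokens:
        nxt = {}
        # Unknown tokens are copied through at zero channel cost
        for cand, ch_cost in CHANNEL.get(tok, {tok: 0.0}).items():
            for prev, (cost, words) in best.items():
                total = cost + ch_cost + lm_cost(prev, cand)
                if cand not in nxt or total < nxt[cand][0]:
                    nxt[cand] = (total, words + [cand])
        best = nxt
    cost, words = min((c + lm_cost(p, "</s>"), w) for p, (c, w) in best.items())
    return " ".join(words), cost

print(normalize(["a", "2m1"])[0])  # à demain
```

Here the channel alone prefers "a", but the language model makes "à demain" the cheaper path overall, which is the kind of contextual disambiguation the composed transducer provides.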


Machine Translation Systems Analysis and Development Prospects

2020 International Multi-Conference on Industrial Engineering and Modern Technologies (FarEastCon), 2020

DOI: 10.1109/fareastcon50210.2020.9271249

Author(s): A. A. Zhivotova, V. D. Berdonosov, E. V. Redkolis

Keyword(s): Machine Translation, Systems Analysis, Translation Systems, Development Prospects


Composition of weighted finite transducers in MapReduce

Journal of Big Data, 2021, Vol 8 (1)

DOI: 10.1186/s40537-020-00397-4

Author(s): Bilal Elghadyry, Faissal Ouardi, Sébastien Verel

Keyword(s): Speech Processing, Large Scale, Large Scale Data, Finite State Transducers, Wide Range, Finite State, Common Operation, Efficient Representation, Weighted Finite State Transducers, NP Hardness

Weighted finite-state transducers have been shown to be a general and efficient representation in many applications, such as text and speech processing, computational biology, and machine learning. Composition of weighted finite-state transducers is a fundamental operation common to these applications. The NP-hardness of the composition computation problem poses a challenge that motivates efficient large-scale algorithms when more than two transducers are considered. This paper describes a parallel computation of weighted finite transducer composition in the MapReduce framework; to the best of our knowledge, it is the first to tackle this task using MapReduce methods. First, we analyze the communication cost of this problem using the model of Afrati et al. Then, we propose three MapReduce methods based, respectively, on input alphabet mapping, state mapping, and hybrid mapping. Finally, extensive experiments on a wide range of weighted finite-state transducers are conducted to compare the proposed methods and show their efficiency on large-scale data.
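For two epsilon-free transducers, composition pairs each arc of the first with each arc of the second whose input symbol equals the first arc's output symbol, so it can be computed as a keyed join, which is the kind of operation MapReduce handles well. The sketch below simulates one map round and one reduce round in plain Python, keyed on the shared middle symbol. It is a simplified illustration, not one of the paper's three methods, and it ignores epsilon transitions, initial and final weights, and trimming of unreachable state pairs; the example transducers are hypothetical.

```python
from collections import defaultdict
from itertools import product

# An arc is (source state, input label, output label, weight, destination state).
# Weights are in the tropical semiring, so the weights of matched arcs are added.

def map_phase(t1_arcs, t2_arcs):
    """Key T1 arcs by their output label and T2 arcs by their input label."""
    buckets = defaultdict(lambda: ([], []))
    for arc in t1_arcs:
        buckets[arc[2]][0].append(arc)
    for arc in t2_arcs:
        buckets[arc[1]][1].append(arc)
    return buckets

def reduce_phase(left, right):
    """Join every T1 arc a:x with every T2 arc x:b that shares the middle symbol x."""
    for (p, a, _, w1, p_next), (q, _, b, w2, q_next) in product(left, right):
        yield ((p, q), a, b, w1 + w2, (p_next, q_next))

def compose(t1_arcs, t2_arcs):
    """Epsilon-free composition of two arc sets, computed as a keyed join."""
    buckets = map_phase(t1_arcs, t2_arcs)
    return sorted(arc for left, right in buckets.values() for arc in reduce_phase(left, right))

# Hypothetical transducers; the arc with output "z" has no partner in T2 and is dropped
T1 = [(0, "a", "x", 1.0, 1), (0, "b", "y", 2.0, 1), (1, "c", "z", 0.5, 1)]
T2 = [(0, "x", "u", 0.5, 0), (0, "y", "v", 1.0, 0)]
for arc in compose(T1, T2):
    print(arc)
```

In a real MapReduce job each bucket would be processed by a separate reducer; the cost of shipping arcs to reducers is the communication cost that the abstract says is analyzed.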
