An analysis of machine translation and speech synthesis in speech-to-speech translation system

Author(s):  
Kei Hashimoto ◽  
Junichi Yamagishi ◽  
William Byrne ◽  
Simon King ◽  
Keiichi Tokuda
2012 ◽  
Vol 54 (7) ◽  
pp. 857-866 ◽  

2017 ◽  
Vol 11 (4) ◽  
pp. 55
Author(s):  
Parnyan Bahrami Dashtaki

Speech-to-speech translation is a challenging problem, due to the poor sentence planning typically associated with spontaneous speech as well as errors introduced by automatic speech recognition. Building on a statistically trained speech translation system, this study investigates the methodologies and metrics used to assess speech-to-speech translation systems. Translation is performed incrementally, based on partial hypotheses generated by the speech recognizer. Speech-input translation can be approached as a pattern recognition problem by means of statistical alignment models and stochastic finite-state transducers, and several specific models are presented under this general framework. One feature of such models is their ability to learn automatically from training examples. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for combining speech recognition and machine translation have been proposed, and this research examines the methodologies and metrics used to evaluate them.
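A minimal Python sketch of the cascaded pipeline described in the abstract: ASR, MT, and TTS modules chained together, with translation triggered incrementally on partial recognition hypotheses. All class and function names here are hypothetical placeholders for illustration, not the system evaluated in the paper.

```python
# Sketch of a cascaded speech-to-speech translation pipeline: ASR -> MT -> TTS,
# with translation run incrementally on partial ASR hypotheses.
# Every component below is a hypothetical stand-in, not a real system.

from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class PartialHypothesis:
    text: str        # recognized source-language words so far
    is_final: bool   # True once the recognizer commits the segment


class SpeechRecognizer:
    """Placeholder ASR module emitting growing partial hypotheses."""
    def stream(self, audio_chunks: Iterator[bytes]) -> Iterator[PartialHypothesis]:
        words: List[str] = []
        for i, _chunk in enumerate(audio_chunks):
            words.append(f"word{i}")                       # stand-in decoding
            yield PartialHypothesis(" ".join(words), False)
        yield PartialHypothesis(" ".join(words), True)


class Translator:
    """Placeholder statistical MT module (e.g. alignment / FST based)."""
    def translate(self, source_text: str) -> str:
        return f"<translation of: {source_text}>"


class Synthesizer:
    """Placeholder TTS module."""
    def synthesize(self, target_text: str) -> bytes:
        return target_text.encode("utf-8")                 # stand-in waveform


def speech_to_speech(audio_chunks: Iterator[bytes]) -> bytes:
    asr, mt, tts = SpeechRecognizer(), Translator(), Synthesizer()
    translation = ""
    for hyp in asr.stream(audio_chunks):
        # Translate each partial hypothesis so output can start early;
        # only the final hypothesis is passed on to synthesis here.
        translation = mt.translate(hyp.text)
        if hyp.is_final:
            return tts.synthesize(translation)
    return b""


if __name__ == "__main__":
    dummy_audio = iter([b"\x00" * 160] * 3)                # three fake frames
    print(speech_to_speech(dummy_audio))
```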


Author(s):  
Ryo Fukuda ◽  
Sashi Novitasari ◽  
Yui Oka ◽  
Yasumasa Kano ◽  
Yuki Yano ◽  
...  

2011 ◽  
Author(s):  
Jian Xue ◽  
Xiaodong Cui ◽  
Gregg Daggett ◽  
Etienne Marcheret ◽  
Bowen Zhou

2019 ◽  
Vol 7 ◽  
pp. 313-325 ◽  
Author(s):  
Matthias Sperber ◽  
Graham Neubig ◽  
Jan Niehues ◽  
Alex Waibel

Speech translation has traditionally been approached through cascaded models consisting of a speech recognizer trained on a corpus of transcribed speech, and a machine translation system trained on parallel texts. Several recent works have shown the feasibility of collapsing the cascade into a single, direct model that can be trained in an end-to-end fashion on a corpus of translated speech. However, experiments are inconclusive on whether the cascade or the direct model is stronger, and have only been conducted under the unrealistic assumption that both are trained on equal amounts of data, ignoring other available speech recognition and machine translation corpora. In this paper, we demonstrate that direct speech translation models require more data to perform well than cascaded models, and although they allow including auxiliary data through multi-task training, they are poor at exploiting such data, putting them at a severe disadvantage. As a remedy, we propose the use of end-to-end trainable models with two attention mechanisms, the first establishing source speech to source text alignments, the second modeling source to target text alignment. We show that such models naturally decompose into multi-task–trainable recognition and translation tasks and propose an attention-passing technique that alleviates error propagation issues in a previous formulation of a model with two attention stages. Our proposed model outperforms all examined baselines and is able to exploit auxiliary training data much more effectively than direct attentional models.
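The PyTorch sketch below illustrates the two-attention idea from the abstract under stated assumptions: a first attention stage aligns source speech to source text, and a second stage attends over the first stage's attentional context vectors (rather than a discrete transcript) to produce the target text. Layer sizes, the dot-product attention, and all names are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal two-stage attention sketch: speech encoder -> transcript decoder with
# attention over speech frames -> translation decoder with attention over the
# first decoder's context vectors ("attention passing"). Illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F


def dot_attention(query, keys):
    """query: (B, H); keys: (B, T, H) -> context vector (B, H)."""
    scores = torch.bmm(keys, query.unsqueeze(2)).squeeze(2)   # (B, T)
    weights = F.softmax(scores, dim=-1)
    return torch.bmm(weights.unsqueeze(1), keys).squeeze(1)   # (B, H)


class TwoStageAttentionST(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, src_vocab=1000, tgt_vocab=1000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Stage 1: speech -> source text (auxiliary recognition task).
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.dec1 = nn.LSTMCell(2 * hidden, hidden)
        self.src_out = nn.Linear(hidden, src_vocab)
        # Stage 2: passed contexts -> target text (translation task).
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.dec2 = nn.LSTMCell(2 * hidden, hidden)
        self.tgt_out = nn.Linear(hidden, tgt_vocab)

    def forward(self, speech, src_tokens, tgt_tokens):
        enc, _ = self.encoder(speech)                          # (B, T, H)
        B, H = enc.size(0), enc.size(2)

        # Stage 1: decode the transcript with attention over speech frames,
        # keeping the context vectors to pass on to stage 2.
        h = c = enc.new_zeros(B, H)
        contexts, src_logits = [], []
        for t in range(src_tokens.size(1)):
            ctx = dot_attention(h, enc)
            contexts.append(ctx)
            step_in = torch.cat([self.src_emb(src_tokens[:, t]), ctx], dim=-1)
            h, c = self.dec1(step_in, (h, c))
            src_logits.append(self.src_out(h))
        passed = torch.stack(contexts, dim=1)                  # (B, S, H)

        # Stage 2: decode the translation with attention over the passed
        # contexts instead of a discrete source transcript.
        h = c = enc.new_zeros(B, H)
        tgt_logits = []
        for t in range(tgt_tokens.size(1)):
            ctx = dot_attention(h, passed)
            step_in = torch.cat([self.tgt_emb(tgt_tokens[:, t]), ctx], dim=-1)
            h, c = self.dec2(step_in, (h, c))
            tgt_logits.append(self.tgt_out(h))

        # Both logit stacks can be trained jointly (multi-task); the
        # recognition stack can also be trained alone on ASR-only corpora.
        return torch.stack(src_logits, 1), torch.stack(tgt_logits, 1)


if __name__ == "__main__":
    model = TwoStageAttentionST()
    speech = torch.randn(2, 50, 80)                            # (B, frames, feats)
    src = torch.randint(0, 1000, (2, 12))                      # transcript ids
    tgt = torch.randint(0, 1000, (2, 15))                      # translation ids
    asr_logits, mt_logits = model(speech, src, tgt)
    print(asr_logits.shape, mt_logits.shape)                   # sanity check
```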

