Data Augmentation for Pipeline-Based Speech Translation

Author(s):  
Diego Alves ◽  
Askars Salimbajevs ◽  
Mārcis Pinnis

Pipeline-based speech translation methods may suffer from errors found in speech recognition system output. Therefore, it is crucial that machine translation systems are trained to be robust against such noise. In this paper, we propose two methods of parallel data augmentation for pipeline-based speech translation system development. The first method utilises a speech processing workflow to introduce errors, and the second generates commonly found suffix errors using a rule-based method. We show that the two methods in combination significantly improve speech translation quality, by 1.87 BLEU points over a baseline system.
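As an illustration of the second, rule-based idea, the sketch below corrupts word suffixes on the source side of a parallel corpus to simulate ASR morphology errors. The suffix inventory, corruption rate, and function names are hypothetical placeholders, not the paper's actual rules.

```python
import random

# Hypothetical suffix confusion pairs; the paper's real rules are
# language-specific and not reproduced here.
SUFFIX_SWAPS = {"as": "a", "am": "ai", "os": "o", "u": "us"}

def corrupt_suffixes(sentence, p=0.1, seed=None):
    """Randomly replace known word suffixes to simulate ASR suffix errors."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        swapped = word
        if rng.random() < p:
            for old, new in SUFFIX_SWAPS.items():
                if word.endswith(old):
                    swapped = word[: -len(old)] + new
                    break
        out.append(swapped)
    return " ".join(out)

def augment_corpus(pairs, p=0.1, seed=1):
    """Corrupt the source side of (source, target) pairs; targets stay intact."""
    return [(corrupt_suffixes(src, p, seed + i), tgt)
            for i, (src, tgt) in enumerate(pairs)]
```

The augmented pairs would then be added to the original parallel data before MT training, so the model sees both clean and noisy source sentences mapped to the same targets.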

2018 ◽  
Vol 111 (1) ◽  
pp. 125-135
Author(s):  
Thomas Zenkel ◽  
Matthias Sperber ◽  
Jan Niehues ◽  
Markus Müller ◽  
Ngoc-Quan Pham ◽  
...  

Abstract In this paper we introduce an open-source toolkit for speech translation. While there already exists a wide variety of open-source tools for the essential tasks of a speech translation system, our goal is to provide an easy-to-use recipe for the complete pipeline of translating speech. We provide a Docker container with a ready-to-use pipeline of the following components: a neural speech recognition system, a sentence segmentation system and an attention-based translation system. We provide recipes for training and evaluating models for the task of translating English lectures and TED talks to German. Additionally, we provide pre-trained models for this task. With this toolkit we hope to facilitate the development of speech translation systems and to encourage researchers to improve their overall performance.
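The cascade packaged in the container can be pictured as a simple composition of three stages. The sketch below is a generic illustration of that structure, not the toolkit's actual API; the stage functions are placeholders to be filled with concrete components.

```python
from typing import Callable, List

def run_pipeline(audio_path: str,
                 recognize: Callable[[str], str],
                 segment: Callable[[str], List[str]],
                 translate: Callable[[str], str]) -> List[str]:
    """Chain ASR, sentence segmentation, and MT over one audio file."""
    transcript = recognize(audio_path)        # neural speech recognition
    sentences = segment(transcript)           # restore sentence boundaries
    return [translate(s) for s in sentences]  # attention-based MT per sentence

# Toy usage with stub components standing in for the real systems.
translations = run_pipeline(
    "lecture.wav",
    recognize=lambda path: "hello world this is a test",
    segment=lambda text: ["hello world", "this is a test"],
    translate=lambda sent: sent.upper(),
)
```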


2012 ◽  
Vol 5 ◽  
Author(s):  
Manny Rayner ◽  
Pierrette Bouillon ◽  
Paula Estrella ◽  
Yukie Nakao ◽  
Gwen Christian

We describe a series of experiments in which we start with English to French and English to Japanese versions of a rule-based speech translation system for a medical domain, and bootstrap corresponding statistical systems. Comparative evaluation reveals that the statistical systems are still slightly inferior to the rule-based ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; however, a hybrid system is able to deliver a small but significant improvement in performance. In conclusion, we suggest that the hybrid architecture we describe potentially allows construction of limited-domain speech translation systems which combine substantial source-language coverage with high-precision translation.
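One common way to realise such a hybrid is a coverage-based back-off: use the high-precision rule-based system whenever the input falls within its grammar, and the broader-coverage statistical system otherwise. The sketch below illustrates that strategy only as an assumption; the abstract does not specify the authors' exact combination scheme.

```python
from typing import Callable

def hybrid_translate(source: str,
                     rule_based: Callable[[str], str],
                     statistical: Callable[[str], str],
                     in_coverage: Callable[[str], bool]) -> str:
    """Prefer the rule-based system when the input is in coverage,
    otherwise fall back to the statistical system."""
    if in_coverage(source):
        return rule_based(source)
    return statistical(source)
```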


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Jing Ning ◽  
Haidong Ban

With the development of linguistics and the improvement of computer performance, machine translation has become increasingly effective and widely used. The Chinese-English automatic phrase translation method takes short phrases as the basic translation unit and makes full use of their ordering; compared with word-based statistical machine translation methods, it improves quality considerably, and the performance of machine translation is constantly improving. This article studies the design of phrase-based automatic machine translation systems by introducing machine translation methods and Chinese-English phrase translation, explores the design and testing of a machine translation system based on Chinese-English phrase combinations, and explains the role of automatic machine translation in promoting the development of translation. By combining machine translation experiments with system design methods, the study also aims to deepen understanding of language, knowledge, and intelligence, to help solve other language-processing problems, and to promote the development of corpus linguistics. The experimental results show that when the Chinese-English phrase translation probability table is changed from 82% to 51%, the BLEU score for phrase-combination translation improves. Automatic machine translation saves the time and effort of translation work, and its short development cycle and ability to process large-scale corpora demonstrate its advantages.
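For reference, the BLEU evaluation mentioned above can be computed with the sacrebleu package; the hypotheses and references below are toy data, not the paper's test set.

```python
import sacrebleu

# System outputs and one reference stream (one reference per sentence).
hypotheses = ["the cat sits on the mat", "he went to the store yesterday"]
references = [["the cat sat on the mat", "he went to the shop yesterday"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```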


2020 ◽  
Vol 8 (5) ◽  
pp. 2118-2124

The naturalness of speech comes from the emotions of the speaker. Human beings deliver and interpret messages with heavy use of emotions, so there is a need to develop a speech interface through which the emotions embedded in a speech signal can be analysed and processed. Many speech translation systems have been developed with the intent of interpreting the emotions inherent in speech signals, but they fall short in processing embedded emotions because of gaps in their modelling and representation. The main objective of any speech processing system is to retrieve useful information from speech, such as features and models, so that the extracted knowledge can be used in various speech processing applications. The scope of the present paper is to survey the attributes of speech and their respective models with the goal of distinguishing emotions by capturing precise information about them. The paper also studies various sources, such as the excitation source, the vocal tract system's shape and its dynamics, and supra-segmental attributes, to obtain a rich source of emotional information in speech. It concludes that the excitation source and its characteristics may on their own be sufficient for effective recognition of emotions.
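The three information sources discussed (excitation source, vocal tract characteristics, and supra-segmental attributes) are commonly approximated with pitch, energy, and MFCC statistics. The sketch below, using librosa, is one such approximation and is not tied to the paper's specific models.

```python
import librosa
import numpy as np

def emotion_features(path):
    """Extract excitation-related (pitch, energy) and vocal-tract-related (MFCC)
    descriptors often used in speech emotion recognition."""
    y, sr = librosa.load(path, sr=16000)
    # Excitation source: fundamental frequency contour and frame energy.
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    energy = librosa.feature.rms(y=y)[0]
    # Vocal tract shape: MFCCs summarise the spectral envelope.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    # Supra-segmental statistics pooled over the utterance.
    return {
        "f0_mean": float(np.nanmean(f0)),
        "f0_std": float(np.nanstd(f0)),
        "energy_mean": float(energy.mean()),
        "mfcc_mean": mfcc.mean(axis=1),
    }
```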


Author(s):  
Kartik Tiwari

Abstract: This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit called ESPnet-TTS, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS includes models such as Tacotron 2, Transformer TTS, and FastSpeech. It also provides recipes in the style recommended by the Kaldi automatic speech recognition (ASR) toolkit; these recipes share their structure with the ESPnet ASR recipes, which deliver high performance. The toolkit additionally provides pre-trained models and samples for all recipes so that users can use them as a baseline. It supports TTS, STT, and translation features for various Indian languages, with a strong focus on English, Marathi, and Hindi. The paper also shows that neural sequence-to-sequence models achieve state-of-the-art or near state-of-the-art results on existing databases. We further analyse some of the key design challenges in developing a multilingual business translation system, including processing bilingual business datasets and evaluating multiple translation methods. Test results obtained on tokenised data show that our models achieve performance comparable to the state of the art on the LJ Speech dataset. Index Terms: open source, end-to-end, text-to-speech
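A minimal inference sketch, assuming the espnet2 Text2Speech interface and a publicly released LJ Speech model tag; the model tag, output fields, and sampling-rate attribute are assumptions and may differ across toolkit versions.

```python
import soundfile as sf
from espnet2.bin.tts_inference import Text2Speech

# Hypothetical pre-trained model tag; substitute a tag from the model zoo.
tts = Text2Speech.from_pretrained("kan-bayashi/ljspeech_tacotron2")

# Synthesize one sentence and write the waveform to disk.
output = tts("Hello, this is a test of the text to speech pipeline.")
sf.write("out.wav", output["wav"].cpu().numpy(), tts.fs)
```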


2021 ◽  
Vol 2 (4) ◽  
Author(s):  
Bharathi Raja Chakravarthi ◽  
Priya Rani ◽  
Mihael Arcan ◽  
John P. McCrae

Abstract Machine translation is one of the applications of natural language processing which has been explored in different languages. Recently, researchers have started paying attention to machine translation for resource-poor languages and closely related languages. A widespread and underlying problem for these machine translation systems is the linguistic difference and variation in orthographic conventions, which causes many issues for traditional approaches. Two languages written in two different orthographies are not easily comparable, but orthographic information can also be used to improve the machine translation system. This article offers a survey of research regarding orthography's influence on machine translation of under-resourced languages. It introduces under-resourced languages in terms of machine translation and how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discussing what underlying assumptions were made, and showing how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and demonstrate a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts using cognate information at different levels of machine translation and the lessons that can be drawn from this. Additionally, multilingual neural machine translation of closely related languages is given a particular focus in this survey. This article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction.
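One simple way orthographic information is exploited is by mapping closely related languages to a common script before training, so that cognates become directly comparable. The toy mapping below is purely illustrative; real systems use full transliteration tables or learned transliteration models.

```python
# Toy character-to-Latin mapping; a real transliteration table covers the
# full script and handles vowel signs and conjuncts.
DEVANAGARI_TO_LATIN = {"क": "ka", "ख": "kha", "ग": "ga", "न": "na", "म": "ma"}

def to_common_script(text, table):
    """Map characters to a shared orthography so cognates in related
    languages line up for MT training and bilingual lexicon induction."""
    return "".join(table.get(ch, ch) for ch in text)

print(to_common_script("कमन", DEVANAGARI_TO_LATIN))  # -> "kamana"
```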


2017 ◽  
Vol 11 (4) ◽  
pp. 55
Author(s):  
Parnyan Bahrami Dashtaki

Speech-to-speech translation is a challenging problem, due to the poor sentence planning typically associated with spontaneous speech as well as errors caused by automatic speech recognition. Based upon a statistically trained speech translation system, this study investigates the methodologies and metrics employed to assess speech-to-speech translation systems. Translation is performed incrementally based on partial hypotheses generated by speech recognition. Speech-input translation can be approached as a pattern recognition problem by means of statistical alignment models and stochastic finite-state transducers. Under this general framework, some specific models are presented; one of their features is the capability to learn automatically from training examples. The speech translation system consists of three modules: automatic speech recognition, machine translation, and text-to-speech synthesis. Many procedures for integrating speech recognition and machine translation have been proposed, and this research explores the methodologies and metrics used to evaluate them.
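The incremental cascade described above can be sketched as follows, under the simplifying assumption that partial ASR hypotheses only grow monotonically; the stage functions are placeholders rather than components of the evaluated system.

```python
from typing import Callable, Iterable, List

def incremental_s2s(partial_hypotheses: Iterable[str],
                    translate: Callable[[str], str],
                    synthesize: Callable[[str], bytes]) -> List[bytes]:
    """Cascade ASR -> MT -> TTS, translating each newly stabilised piece of
    the partial hypothesis as soon as the recogniser emits it."""
    audio_chunks = []
    committed = ""
    for hyp in partial_hypotheses:
        # Assumes each partial hypothesis extends the previously committed text.
        new_text = hyp[len(committed):].strip()
        if new_text:
            audio_chunks.append(synthesize(translate(new_text)))
            committed = hyp
    return audio_chunks
```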


2007 ◽  
Vol 33 (1) ◽  
pp. 9-40 ◽  
Author(s):  
Nicola Ueffing ◽  
Hermann Ney

This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities as well as different ways of calculating them will be introduced and compared. They can be divided into two categories: system-based methods that explore knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contain the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs will be presented. Moreover, the application of confidence measures for rescoring of translation hypotheses will be investigated.
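A simplified version of the N-best-list computation reads as follows: each word's posterior is the normalised sum of the probabilities of the hypotheses containing it. This sketch deliberately ignores word positions and alignments, which the article's measures additionally condition on.

```python
import math
from collections import defaultdict

def word_posteriors(nbest):
    """Word posterior probabilities from an N-best list of
    (hypothesis, log_prob) pairs, ignoring word positions."""
    # Normalise sentence probabilities over the N-best list.
    max_lp = max(lp for _, lp in nbest)
    probs = [math.exp(lp - max_lp) for _, lp in nbest]
    z = sum(probs)
    posteriors = defaultdict(float)
    for (hyp, _), p in zip(nbest, probs):
        for word in set(hyp.split()):
            posteriors[word] += p / z
    return dict(posteriors)

# Example: label words in the top hypothesis as correct if posterior > 0.5.
nbest = [("the house is red", -1.0),
         ("the house is read", -1.5),
         ("a house is red", -2.0)]
post = word_posteriors(nbest)
labels = {w: post[w] > 0.5 for w in nbest[0][0].split()}
```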


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1157 ◽  
Author(s):  
Daria Vazhenina ◽  
Konstantin Markov

Despite the progress of deep neural networks over the last decade, state-of-the-art speech recognizers in noisy environment conditions are still far from reaching satisfactory performance. Methods to improve noise robustness usually include adding components to the recognition system that often need optimization. For this reason, data augmentation of the input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. However, for many speech processing tasks, there is evidence that the combination of STFT-based and Hilbert–Huang transform (HHT)-based features improves the overall performance. The Hilbert spectrum can be obtained using adaptive mode decomposition (AMD) techniques, which are noise-robust and suitable for non-linear and non-stationary signal analysis. In this study, we developed a DeepSpeech2-based recognition system that adds a combination of STFT and HHT spectrum-based features. We propose several ways to combine those features at different levels of the neural network. All evaluations were performed using the WSJ and CHiME-4 databases. Experimental results show that combining STFT and HHT spectra leads to a 5–7% relative improvement in noisy speech recognition.
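A minimal sketch of input-level fusion, assuming empirical mode decomposition (via the PyEMD package) as the adaptive mode decomposition and frame-averaged Hilbert envelopes as the HHT features; the paper's actual feature definitions and fusion points may differ.

```python
import numpy as np
import librosa
from scipy.signal import hilbert
from PyEMD import EMD

def stft_hht_features(y, sr, n_fft=512, hop=160, n_imfs=4):
    """Concatenate log-STFT magnitudes with frame-averaged Hilbert envelopes
    of the first few EMD intrinsic mode functions (early fusion)."""
    # STFT branch: log-magnitude spectrogram, shape (freq_bins, frames).
    stft_feat = np.log1p(np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)))
    n_frames = stft_feat.shape[1]
    # HHT branch: EMD is slow on long signals; this is only a sketch.
    imfs = EMD().emd(y.astype(np.float64))[:n_imfs]
    hht_feat = np.zeros((len(imfs), n_frames))
    for i, imf in enumerate(imfs):
        env = np.abs(hilbert(imf))  # Hilbert envelope of the IMF
        for t in range(n_frames):
            seg = env[t * hop: t * hop + n_fft]
            hht_feat[i, t] = seg.mean() if seg.size else 0.0
    return np.concatenate([stft_feat, hht_feat], axis=0)

# Toy usage on one utterance.
y, sr = librosa.load(librosa.example("trumpet"), sr=16000)
features = stft_hht_features(y[: sr * 2], sr)
```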


2019 ◽  
Vol 4 (12) ◽  
pp. 149-154
Author(s):  
Victor Sorochi Uko ◽  
Gachada Benard Dubukumah ◽  
Ibrahim Kafayat Ayosubomi

For effective communication, speech is a convenient channel for conveying messages and an important activity in human life. The need for better reception of voice messages between humans and the environments they interact with has become an area of interest that merits study and improvement. A speech-to-text translation system is an embedded design that converts analogue signals, particularly voice input, into digital signals that a computer or other electronic device can understand, so that it can perform a required task or display the equivalent text on a screen. Speech translation systems mitigate the bottlenecks to efficient communication caused by other communication methods. Although speech translation and recognition designs have not been widely explored for electronic integration, owing to the complexity and variability of sound signals from different sources, this low-cost, simple, and portable project was implemented to serve as a foundation and an alternative for microcontroller-based speech-to-text translation designs, in order to bridge gaps in human communication. This research paper addresses the design methodology, limitations, recommendations, and applications of the implemented speech-to-text translation system for improved communication reception.

