Open Source Toolkit for Speech to Text Translation

2018 ◽  
Vol 111 (1) ◽  
pp. 125-135
Author(s):  
Thomas Zenkel ◽  
Matthias Sperber ◽  
Jan Niehues ◽  
Markus Müller ◽  
Ngoc-Quan Pham ◽  
...  

Abstract: In this paper we introduce an open source toolkit for speech translation. While a wide variety of open source tools already exists for the essential tasks of a speech translation system, our goal is to provide an easy-to-use recipe for the complete pipeline of translating speech. We provide a Docker container with a ready-to-use pipeline of the following components: a neural speech recognition system, a sentence segmentation system, and an attention-based translation system. We provide recipes for training and evaluating models for the task of translating English lectures and TED talks to German. Additionally, we provide pre-trained models for this task. With this toolkit we hope to facilitate the development of speech translation systems and to encourage researchers to improve the overall performance of speech translation systems.
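The three-stage pipeline named in the abstract above (recognition, then sentence segmentation, then translation) can be pictured as a simple function composition. The following is only a minimal sketch under stated assumptions: `translate_speech` and the three toy stand-in functions are hypothetical names chosen for illustration and are not the toolkit's API; a real system would plug trained models into the same three slots.

```python
from typing import Callable, List


def translate_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],
    segment: Callable[[str], List[str]],
    translate: Callable[[str], str],
) -> List[str]:
    """Chain the three stages: audio -> transcript -> sentences -> translations."""
    transcript = recognize(audio)       # speech recognition: unpunctuated text
    sentences = segment(transcript)     # sentence segmentation
    return [translate(s) for s in sentences]  # sentence-by-sentence translation


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_recognize(audio: bytes) -> str:
    return "hello everyone welcome to the lecture"


def toy_segment(transcript: str) -> List[str]:
    words = transcript.split()
    # Pretend the segmenter found a sentence boundary after the second word.
    return [" ".join(words[:2]), " ".join(words[2:])]


def toy_translate(sentence: str) -> str:
    return "<de> " + sentence


result = translate_speech(b"", toy_recognize, toy_segment, toy_translate)
```

Passing the stages in as callables keeps each component replaceable on its own, which matches the abstract's framing of a recipe assembled from separate tools.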

2021 ◽  
Vol 8 (1) ◽  
pp. 164-170
Author(s):  
Mohammad Husam Alhumsi ◽  
Saleh Belhassen

Phonetic dictionaries are regarded as pivotal components of speech recognition systems. The aim of speech recognition research is to build a machine that will accurately identify and distinguish normal human speech from any other speaker. The literature affirms that Arabic phonetics is one of the major problems in Arabic speech recognition. Therefore, this paper reviews previous studies tackling the challenges faced in building an Arabic phonetic dictionary for Arabic speech recognition. It has been found that speech recognition systems have investigated areas of difference concerning Arabic phonetics. In addition, an Arabic phonetic dictionary should be built in which the phonemes of Arabic vowels are treated as a component of the consonant phonemes. Thus, the incorporation of developed machine translation systems may enhance the quality of the system. The paper concludes with the existing challenges faced by Arabic phonetic dictionaries.


Author(s):  
Diego Alves ◽  
Askars Salimbajevs ◽  
Mārcis Pinnis

Pipeline-based speech translation methods may suffer from errors found in speech recognition system output. Therefore, it is crucial that machine translation systems are trained to be robust against such noise. In this paper, we propose two methods for parallel data augmentation for pipeline-based speech translation system development. The first method utilises a speech processing workflow to introduce errors and the second method generates commonly found suffix errors using a rule-based method. We show that the two methods in combination significantly improve speech translation quality, by 1.87 BLEU points over a baseline system.
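The second augmentation method in the abstract above, rule-based suffix errors, might look roughly like the sketch below. Everything here is an assumption made for illustration: the suffix rules are invented English examples rather than the rules used by the authors, and `corrupt_suffix` and `augment` are hypothetical helper names. The idea shown is only that the source side of a parallel pair is perturbed while the target side is kept, so the translation model sees noisy input paired with a clean reference.

```python
import random
from typing import List, Tuple

# Hypothetical suffix confusions; real rules would come from an analysis
# of the errors a speech recognizer actually makes.
SUFFIX_RULES = [("ing", "in"), ("ed", ""), ("s", "")]


def corrupt_suffix(word: str, rules=SUFFIX_RULES) -> str:
    """Apply the first matching suffix rule, leaving very short words alone."""
    for old, new in rules:
        if word.endswith(old) and len(word) > len(old) + 1:
            return word[: len(word) - len(old)] + new
    return word


def augment(
    pairs: List[Tuple[str, str]], p: float = 0.3, seed: int = 0
) -> List[Tuple[str, str]]:
    """Keep every original pair and add a noisy-source copy when noise was applied."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for src, tgt in pairs:
        noisy = " ".join(
            corrupt_suffix(w) if rng.random() < p else w for w in src.split()
        )
        out.append((src, tgt))
        if noisy != src:
            out.append((noisy, tgt))  # target side stays clean
    return out
```

The parameter `p` controls how often a word is considered for corruption; tuning it against held-out recognizer output would be one way to match the noise level of a real system.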


Author(s):  
Victor Zue ◽  
James Glass ◽  
Michael Phillips ◽  
Stephanie Seneff

2012 ◽  
Vol 5 ◽  
Author(s):  
Manny Rayner ◽  
Pierrette Bouillon ◽  
Paula Estrella ◽  
Yukie Nakao ◽  
Gwen Christian

We describe a series of experiments in which we start with English to French and English to Japanese versions of a rule-based speech translation system for a medical domain, and bootstrap corresponding statistical systems. Comparative evaluation reveals that the statistical systems are still slightly inferior to the rule-based ones, despite the fact that considerable effort has been invested in tuning both the recognition and translation components; however, a hybrid system is able to deliver a small but significant improvement in performance. In conclusion, we suggest that the hybrid architecture we describe potentially allows construction of limited-domain speech translation systems which combine substantial source-language coverage with high-precision translation.


2015 ◽  
Vol 115 (18) ◽  
pp. 7-10 ◽  
Author(s):  
Nitin Washani ◽  
Sandeep Sharma

Author(s):  
Anna Fernández Torné ◽  
Anna Matamala

This article aims to compare three machine translation systems with a focus on human evaluation. The systems under analysis are a domain-adapted statistical machine translation system, a domain-adapted neural machine translation system and a generic machine translation system. The comparison is carried out on translation from Spanish into German with industrial documentation of machine tool components and processes. The focus is on the human evaluation of the machine translation output, specifically on: fluency, adequacy and ranking at the segment level; fluency, adequacy, need for post-editing, ease of post-editing, and mental effort required in post-editing at the document level; productivity (post-editing speed and post-editing effort) and attitudes. Emphasis is placed on human factors in the evaluation process.


2020 ◽  
Vol 25 (3) ◽  
pp. 93-98
Author(s):  
Kyu-Seok Kim

Real-time voice translation systems receive a speaker's voice and translate their speech into another language. However, the meaning of a whole Korean sentence can be unintentionally changed because Korean words and syllables can be merged or divided by spaces. Therefore, the spaces between the speaker's sentences are occasionally not identified by the speech recognition system, so the translated sentences are sometimes incorrect. This paper presents a methodology to enhance the accuracy of voice translation by adding intentional spaces. An Android application was implemented using Google speech recognizer for Android and Google translator for the Web. The Google speech recognizer app for Android receives the speaker's voice sentences in Korean and shows the text results. Next, the proposed Android application adds spaces when the speaker speaks the dedicated word for the space. Finally, the modified Korean sentences are translated into English by Google translator for the Web. Using this method can enhance interpretation accuracy for translation systems.
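The space-insertion step described in the abstract above can be illustrated with a short sketch. The abstract does not say which spoken word marks a space or how the recognizer's own spacing is handled, so both choices below are assumptions: the marker word "띄어쓰기" is a hypothetical placeholder, and the sketch discards the recognizer's spaces so that only the spoken markers decide word boundaries. `apply_space_marker` is likewise a hypothetical name.

```python
def apply_space_marker(recognized: str, marker: str = "띄어쓰기") -> str:
    """Rebuild spacing so that only spoken marker words produce spaces.

    Assumption: the recognizer's own spaces are unreliable, so they are
    dropped and every occurrence of the marker becomes a single space.
    """
    compact = "".join(recognized.split())            # drop recognizer spacing
    segments = [s for s in compact.split(marker) if s]  # cut at each marker
    return " ".join(segments)


# The speaker says the marker word wherever a space is intended.
spaced = apply_space_marker("아버지가 띄어쓰기 방에 띄어쓰기 들어가신다")
```

In this example the speaker's markers yield "아버지가 방에 들어가신다" ("Father enters the room"), whereas a different placement of the spaces in the same syllable sequence would read as "Father enters the bag". The corrected string would then be passed on to the translation step.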

