Open-source speech-processing platforms: An application example

2018 ◽  
Vol 143 (3) ◽  
pp. 1738-1738
Author(s):  
Arthur Boothroyd ◽  
Harinath Garudadri ◽  
Gregory Hobbs
Author(s):  
Kartik Tiwari

Abstract: This paper introduces an end-to-end text-to-speech (E2E-TTS) toolkit called ESPnet-TTS, an open-source extension of the ESPnet speech processing toolkit. ESPnet-TTS supports various models, including Tacotron 2, Transformer TTS, and FastSpeech. It also provides recipes in the style of the Kaldi automatic speech recognition (ASR) toolkit. The recipes follow a design unified with the ESPnet ASR recipes, which provides high performance. The toolkit also provides pre-trained models and samples for all recipes, which users can use as a baseline. It supports TTS, STT, and translation for various Indian languages, with a strong focus on English, Marathi, and Hindi. This paper also shows that neural sequence-to-sequence models achieve state-of-the-art or near-state-of-the-art results on existing databases. We also analyze some of the key design challenges in developing a multilingual business translation system, including processing bilingual business datasets and evaluating multiple translation methods. Test results can be obtained using tokens, and they show that our models achieve state-of-the-art performance compared with the latest toolkits on the LJ Speech data. Index Terms: open source, end-to-end, text-to-speech


2011 ◽  
Vol 22 (08) ◽  
pp. 1781-1795 ◽  
Author(s):  
Cyril Allauzen ◽
Michael Riley ◽
Johan Schalkwyk

This paper describes a weighted finite-state transducer composition algorithm that generalizes the concept of the composition filter, and presents various filters that process epsilon transitions, look ahead along paths, and push forward labels along epsilon paths. These filters, either individually or in combination, make it possible to compose some transducers much more efficiently in time and space than would otherwise be possible. We present examples drawn, in part, from demanding speech-processing applications. The generalized composition algorithm and many of these filters have been included in OpenFst, an open-source weighted transducer library.
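To make the starting point concrete, here is a minimal sketch of plain composition for epsilon-free weighted transducers over the tropical semiring, where weights are costs added along a path. It is a simplified illustration, not the generalized algorithm or the filters described above, and it does not use the OpenFst API. The `Transducer` class and `compose` function are hypothetical names chosen for this example. The result's states are pairs of input states, explored lazily from the start pair. An arc is kept only when the output label of the first transducer matches the input label of the second.

```python
from collections import deque
from typing import Dict, List, Tuple

# An arc is (input_label, output_label, weight, next_state).
Arc = Tuple[str, str, float, int]


class Transducer:
    """Hypothetical minimal weighted transducer over the tropical semiring."""

    def __init__(self, start: int):
        self.start = start
        self.arcs: Dict[int, List[Arc]] = {}
        self.final: Dict[int, float] = {}  # final state -> final weight (cost)

    def add_arc(self, src: int, ilabel: str, olabel: str, weight: float, dst: int) -> None:
        self.arcs.setdefault(src, []).append((ilabel, olabel, weight, dst))
        self.arcs.setdefault(dst, [])

    def set_final(self, state: int, weight: float = 0.0) -> None:
        self.final[state] = weight
        self.arcs.setdefault(state, [])


def compose(t1: Transducer, t2: Transducer) -> Transducer:
    """Compose two epsilon-free transducers (no epsilon handling, no filters).

    Each result state stands for a pair (q1, q2). Matching arcs are combined
    by adding their weights, which is the tropical semiring's "times".
    """
    pair_to_id: Dict[Tuple[int, int], int] = {(t1.start, t2.start): 0}
    result = Transducer(start=0)
    queue = deque([(t1.start, t2.start)])
    while queue:
        q1, q2 = queue.popleft()
        src = pair_to_id[(q1, q2)]
        result.arcs.setdefault(src, [])
        # A pair state is final only if both component states are final.
        if q1 in t1.final and q2 in t2.final:
            result.set_final(src, t1.final[q1] + t2.final[q2])
        for i1, o1, w1, n1 in t1.arcs.get(q1, []):
            for i2, o2, w2, n2 in t2.arcs.get(q2, []):
                if o1 != i2:
                    continue  # t1's output label must equal t2's input label
                pair = (n1, n2)
                if pair not in pair_to_id:
                    pair_to_id[pair] = len(pair_to_id)
                    queue.append(pair)
                result.add_arc(src, i1, o2, w1 + w2, pair_to_id[pair])
    return result
```

For example, composing a transducer that maps `a` to `x` (cost 1.0) or `y` (cost 2.0) with one that maps `x` to `1` (cost 0.5) and `y` to `2` (cost 0.1) yields two arcs from the start state: `a:1` with cost 1.5 and `a:2` with cost 2.1. Once epsilon labels are allowed, this naive pairwise matching can create redundant paths. Handling them efficiently is the job of composition filters like those the paper describes.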


Author(s):  
Piotr Kłosowski ◽  
Adam Dustor ◽  
Jacek Izydorczyk ◽  
Jan Kotas ◽  
Jacek Ślimok

Author(s):  
Fadi P. Deek ◽  
James A. M. McHugh

2009 ◽  
Vol 14 (1) ◽  
pp. 78-89 ◽  
Author(s):  
Kenneth Hugdahl ◽  
René Westerhausen

The present paper is based on a talk on hemispheric asymmetry given by Kenneth Hugdahl at the Xth European Congress of Psychology, Praha, July 2007. Here, we propose that hemispheric asymmetry evolved because of a left-hemisphere speech-processing specialization. The evolution of speech and the need for air-based communication necessitated a division of labor between the hemispheres, in order to avoid duplicate copies in both hemispheres that would increase processing redundancy. It is argued that the neuronal basis of this division of labor is the structural asymmetry observed in the peri-Sylvian region in the posterior part of the temporal lobe, where the left planum temporale area is larger than the right. This is the only example where a structural, or anatomical, asymmetry matches a corresponding functional asymmetry. The increased gray matter volume in the left planum temporale area corresponds to a functional asymmetry of speech processing, as indexed by behavioral (dichotic listening) and functional neuroimaging studies. The functional anatomy of the corpus callosum also supports this view, with regional specificity of information transfer between the hemispheres.

