Discriminative piecewise linear transformation based on deep learning for noise robust automatic speech recognition

Author(s):  
Yosuke Kashiwagi ◽  
Daisuke Saito ◽  
Nobuaki Minematsu ◽  
Keikichi Hirose
2017 ◽  
Vol 24 (4) ◽  
pp. 339-352
Author(s):  
Nattanun Thatphithakkul ◽  
Boontee Kruatrachue ◽  
Chai Wutiwiwatchai ◽  
Sanparith Marukatat ◽  
Vataya Boonpiam

This paper proposes an efficient method of simulated-data adaptation for robust speech recognition. The method is applied to tree-structured piecewise linear transformation (PLT). The original PLT selects an acoustic model using tree-structured HMMs and then adapts that model to the input speech in an unsupervised scheme. This adaptation can degrade the acoustic model if the input speech is incorrectly transcribed during the adaptation process. Moreover, adaptation may not be effective if only the input speech is used. The proposed method enlarges the adaptation data by adding noise portions from the input speech to a set of prerecorded clean speech whose correct transcriptions are known. We investigate various configurations of the proposed method. Evaluations are performed on both additive-noise and real noisy speech. The experimental results show that the proposed system achieves a higher recognition rate than MLLR, HMM-based model selection, and PLT.
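
The data-simulation step described in this abstract can be illustrated with a minimal sketch. It assumes the noise portion has already been extracted from the input speech (the abstract does not say how) and that audio is held as NumPy arrays of samples. The function names, the `gain` parameter, and the tiling of short noise segments are all hypothetical choices made for illustration, not details taken from the paper.

```python
import numpy as np


def tile_to_length(noise, length):
    """Repeat (and crop) a noise segment so it spans `length` samples."""
    if len(noise) == 0:
        raise ValueError("noise segment is empty")
    reps = -(-length // len(noise))  # ceiling division
    return np.tile(noise, reps)[:length]


def add_noise(clean, noise, gain=1.0):
    """Overlay a noise segment on a clean utterance, sample by sample.

    `gain` is an assumed knob for scaling the noise; the abstract does not
    describe any scaling.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    return clean + gain * tile_to_length(noise, len(clean))


def build_adaptation_set(clean_utterances, noise):
    """Return (noisy waveform, transcription) pairs.

    `clean_utterances` is a list of (waveform, transcription) pairs, so each
    simulated noisy utterance keeps the known transcription of its clean source.
    """
    return [(add_noise(wav, noise), text) for wav, text in clean_utterances]
```

Because every simulated utterance inherits a known transcription, an adaptation set built this way could be used in a supervised manner, which is the property the abstract relies on to avoid errors from incorrectly transcribed input speech.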


2021 ◽  
Author(s):  
Matheus Xavier Sampaio ◽  
Regis Pires Magalhães ◽  
Ticiana Linhares Coelho da Silva ◽  
Lívia Almada Cruz ◽  
Davi Romero de Vasconcelos ◽  
...  

Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, and Google Cloud Speech-to-Text. The results show that the evaluated solutions differ only slightly; however, Microsoft Azure Speech outperformed the other analyzed APIs.

