Impact of the Approaches Involved on Word-Graph Derivation from the ASR System

Author(s):  
Raquel Justo ◽  
Alicia Pérez ◽  
M. Inés Torres
2019
Author(s):  
Ivan Medennikov ◽  
Yuri Khokhlov ◽  
Aleksei Romanenko ◽  
Ivan Sorokin ◽  
Anton Mitrofanov ◽  
...  

Sensors ◽ 2021 ◽ Vol 21 (9) ◽ pp. 3063
Author(s):  
Aleksandr Laptev ◽  
Andrei Andrusenko ◽  
Ivan Podluzhny ◽  
Anton Mitrofanov ◽  
Ivan Medennikov ◽  
...  

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
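The non-deterministic tokenization described in this abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy, hand-written merge table (`TOY_MERGES`) and a hypothetical helper name (`bpe_dropout_tokenize`), and it follows the general BPE-dropout idea of skipping each candidate merge with probability `p` at every merge step.

```python
import random

# Hypothetical merge table for illustration only, ordered by priority
# (earlier pairs are merged first). A real table is learned from data.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]


def bpe_dropout_tokenize(word, merges, p=0.1, rng=None):
    """Segment `word` with ranked BPE merges, dropping each candidate
    merge with probability `p` at every step (p=0 gives plain BPE)."""
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in ranks and rng.random() >= p
        ]
        if not candidates:
            break  # nothing left to merge at this step
        _, i = min(candidates)  # highest-priority pair, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(bpe_dropout_tokenize("lower", TOY_MERGES, p=0.0))
    for _ in range(3):
        print(bpe_dropout_tokenize("lower", TOY_MERGES, p=0.5, rng=rng))
```

With `p=0.0` the function behaves like ordinary greedy BPE, while larger values of `p` produce varying, finer-grained segmentations of the same word, which is the source of the extra token contexts the abstract refers to.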


2020 ◽ Vol 15 (3/4) ◽ pp. 296
Author(s):  
Puneet Bawa ◽  
Shashi Bala ◽  
Virender Kadyan ◽  
Mohit Mittal

2009
Author(s):  
Paul Deléglise ◽  
Yannick Estève ◽  
Sylvain Meignier ◽  
Teva Merlin

2008
Author(s):  
D. Vergyri ◽  
A. Mandal ◽  
Wen Wang ◽  
Andreas Stolcke ◽  
Jing Zheng ◽  
...  

2021 ◽ Vol 2021 (1) ◽ pp. 209-228
Author(s):  
Yuantian Miao ◽  
Minhui Xue ◽  
Chao Chen ◽  
Lei Pan ◽  
Jun Zhang ◽  
...  

With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that the auditor trained on specific data can be generalized well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology developed in this paper and findings can inform privacy advocates to overhaul IoT privacy.

