Impact of the Approaches Involved on Word-Graph Derivation from the ASR System

The STC ASR System for the VOiCES from a Distance Challenge 2019

10.21437/interspeech.2019-1574 ◽

2019 ◽

Author(s):

Ivan Medennikov ◽

Yuri Khokhlov ◽

Aleksei Romanenko ◽

Ivan Sorokin ◽

Anton Mitrofanov ◽

...

Keyword(s):

ASR System


On the Impact of Gabor Phase for Spectro-Temporal Feature Extraction in Building an ASR System

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON) ◽

10.1109/iemcon51383.2020.9284872 ◽

2020 ◽

Author(s):

Anirban Dutta ◽

Gudmalwar Prabhakar ◽

Ch V Rama Rao

Keyword(s):

Feature Extraction ◽

The Impact ◽

ASR System ◽

Temporal Feature


Dynamic Acoustic Unit Augmentation with BPE-Dropout for Low-Resource End-to-End Speech Recognition

Sensors ◽

10.3390/s21093063 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3063

Author(s):

Aleksandr Laptev ◽

Andrei Andrusenko ◽

Ivan Podluzhny ◽

Anton Mitrofanov ◽

Ivan Medennikov ◽

...

Keyword(s):

Speech Recognition ◽

Error Rate ◽

Rapid Development ◽

Computational Cost ◽

Vocabulary Size ◽

Word Error Rate ◽

Low Resource ◽

Steady Improvement ◽

End To End ◽

ASR System

With the rapid development of speech assistants, adapting server-intended automatic speech recognition (ASR) solutions to a direct device has become crucial. For on-device speech recognition tasks, researchers and industry prefer end-to-end ASR systems as they can be made resource-efficient while maintaining a higher quality compared to hybrid systems. However, building end-to-end models requires a significant amount of speech data. Personalization, which is mainly handling out-of-vocabulary (OOV) words, is another challenging task associated with speech assistants. In this work, we consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate, embodied in Babel Turkish and Babel Georgian tasks. We propose a method of dynamic acoustic unit augmentation based on the Byte Pair Encoding with dropout (BPE-dropout) technique. The method non-deterministically tokenizes utterances to extend the token’s contexts and to regularize their distribution for the model’s recognition of unseen words. It also reduces the need for optimal subword vocabulary size search. The technique provides a steady improvement in regular and personalized (OOV-oriented) speech recognition tasks (at least 6% relative word error rate (WER) and 25% relative F-score) at no additional computational cost. Owing to the BPE-dropout use, our monolingual Turkish Conformer has achieved a competitive result with 22.2% character error rate (CER) and 38.9% WER, which is close to the best published multilingual system.
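The non-deterministic tokenization described in this abstract can be illustrated with a minimal sketch of BPE-dropout. This is not the authors' implementation: the function name `bpe_dropout_tokenize`, the toy merge table, and the dropout value are all assumptions made for illustration. The idea shown is that each candidate merge is skipped with probability `dropout`, so the same word can receive different subword segmentations on different passes.

```python
import random


def bpe_dropout_tokenize(word, merges, dropout=0.1, rng=None):
    """Segment `word` with BPE, skipping each candidate merge with probability `dropout`.

    `merges` is an ordered list of symbol pairs (earlier = higher priority).
    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    ranks = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)  # start from single characters
    while len(tokens) > 1:
        # Adjacent pairs that are in the merge table and survive dropout.
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(tokens, tokens[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            break  # nothing left to merge on this pass
        _, i = min(candidates)  # highest-priority merge, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    return tokens


# Toy merge table (an assumption, not learned from real data).
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]

deterministic = bpe_dropout_tokenize("lower", toy_merges, dropout=0.0)
characters = bpe_dropout_tokenize("lower", toy_merges, dropout=1.0)
sampled = bpe_dropout_tokenize("lower", toy_merges, dropout=0.5, rng=random.Random(0))
```

With `dropout=0.0` the sketch reduces to ordinary greedy BPE, and with `dropout=1.0` every merge is skipped and the word stays split into characters. Intermediate values yield a mix of segmentations, which is the source of the augmentation effect the abstract refers to.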


Developing in-vehicular noise robust children ASR system using Tandem-NN-based acoustic modelling

International Journal of Vehicle Autonomous Systems ◽

10.1504/ijvas.2020.10039663 ◽

2020 ◽

Vol 15 (3/4) ◽

pp. 296

Author(s):

Puneet Bawa ◽

Shashi Bala ◽

Virender Kadyan ◽

Mohit Mittal

Keyword(s):

Acoustic Modelling ◽

Noise Robust ◽

ASR System


Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate?

10.21437/interspeech.2009-607 ◽

2009 ◽

Author(s):

Paul Deléglise ◽

Yannick Estève ◽

Sylvain Meignier ◽

Teva Merlin

Keyword(s):

Error Rate ◽

Word Error Rate ◽

ASR System


Development of the SRI/Nightingale Arabic ASR system

10.21437/interspeech.2008-415 ◽

2008 ◽

Author(s):

D. Vergyri ◽

A. Mandal ◽

Wen Wang ◽

Andreas Stolcke ◽

Jing Zheng ◽

...

Keyword(s):

ASR System


The Audio Auditor: User-Level Membership Inference in Internet of Things Voice Services

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2021-0012 ◽

2021 ◽

Vol 2021 (1) ◽

pp. 209-228

Author(s):

Yuantian Miao ◽

Minhui Xue ◽

Chao Chen ◽

Lei Pan ◽

Jun Zhang ◽

...

Keyword(s):

Internet Of Things ◽

State Of The Art ◽

Rapid Development ◽

Black Box ◽

Problem Space ◽

Specific Data ◽

Learning Techniques ◽

Audio Data ◽

IoT Devices ◽

ASR System

With the rapid development of deep learning techniques, the popularity of voice services implemented on various Internet of Things (IoT) devices is ever increasing. In this paper, we examine user-level membership inference in the problem space of voice services, by designing an audio auditor to verify whether a specific user had unwillingly contributed audio used to train an automatic speech recognition (ASR) model under strict black-box access. With user representation of the input audio data and their corresponding translated text, our trained auditor is effective in user-level audit. We also observe that the auditor trained on specific data can be generalized well regardless of the ASR model architecture. We validate the auditor on ASR models trained with LSTM, RNNs, and GRU algorithms on two state-of-the-art pipelines, the hybrid ASR system and the end-to-end ASR system. Finally, we conduct a real-world trial of our auditor on iPhone Siri, achieving an overall accuracy exceeding 80%. We hope the methodology developed in this paper and findings can inform privacy advocates to overhaul IoT privacy.
