Dynamic Sparsity Neural Networks for Automatic Speech Recognition

The chapter begins with an overview of automatic speech recognition (ASR), covering not only the de facto standard approach of hidden Markov models (HMMs) but also the tried-and-proven techniques of dynamic time warping and artificial neural networks (ANNs). It then turns to Gluck’s (2004) draw-talk-write (DTW) process, developed over the past two decades to help non-text-literate people become literate gradually by telling and/or drawing their own stories. DTW has proved especially effective with “illiterate” people from strong oral, storytelling traditions. The chapter concludes by describing attempts to date to automate the DTW process using ANN-based pattern recognition techniques on an Apple Macintosh G4™ platform.

Combining De-noising Auto-encoder and Recurrent Neural Networks in End-to-End Automatic Speech Recognition for Noise Robustness

2018 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt.2018.8639597 ◽

2018 ◽

Author(s):

Tzu-Hsuan Ting ◽

Chia-Ping Chen

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Recurrent Neural Networks ◽

Noise Robustness ◽

End To End

Speech Assistance for Persons With Speech Impediments Using Artificial Neural Networks

Volume 3: Biomedical and Biotechnology Engineering ◽

10.1115/imece2017-71027 ◽

2017 ◽

Author(s):

Ramy Mounir ◽

Redwan Alqasemi ◽

Rajiv Dubey

Keyword(s):

Neural Networks ◽

Artificial Neural Networks ◽

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Challenging Problem ◽

Speech Impairment ◽

Recognition Model ◽

Wide Range ◽

Speech Variability

This work focuses on research aimed at enabling individuals with speech impairments to use speech-to-text software to recognize and dictate their speech. Automatic Speech Recognition (ASR) is a challenging problem for researchers because of the wide range of speech variability, including differences in accent, pronunciation, speed, and volume. It is very difficult to train an end-to-end speech recognition model on data from speakers with speech impediments, both because sufficiently large datasets are lacking and because a speech disorder pattern is hard to generalize across all users with speech impediments. This work highlights the different deep learning techniques used to achieve ASR and how they can be modified to recognize and dictate speech from individuals with speech impediments.

Neural networks for feature computations in automatic speech recognition

[Proceedings 1992] IJCNN International Joint Conference on Neural Networks ◽

10.1109/ijcnn.1992.227242 ◽

2003 ◽

Author(s):

S.A. Zahorian ◽

D. Livingston

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition

Adaptation of context-dependent deep neural networks for automatic speech recognition

2012 IEEE Spoken Language Technology Workshop (SLT) ◽

10.1109/slt.2012.6424251 ◽

2012 ◽

Cited By ~ 92

Author(s):

Kaisheng Yao ◽

Dong Yu ◽

Frank Seide ◽

Hang Su ◽

Li Deng ◽

...

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Deep Neural Networks ◽

Context Dependent

An Overview of Methods for Generating, Augmenting and Evaluating Room Impulse Response Using Artificial Neural Networks

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2021.15152 ◽

2021 ◽

Vol 13 (0) ◽

pp. 1-5

Author(s):

Mantas Tamulionis

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Impulse Response ◽

Automatic Speech Recognition ◽

Audio Signal ◽

Training Data ◽

Audio Signal Processing ◽

Artificial Neural

Methods based on artificial neural networks (ANN) are widely used in various audio signal processing tasks, where they offer opportunities to optimize processes and save computational resources. One of the main quantities needed to capture the acoustics of a room numerically is the room impulse response (RIR). Increasingly, researchers choose not to record RIRs in a real room but to generate them using ANN, as this gives them the freedom to prepare training datasets of unlimited size. Neural networks are also used to augment the generated impulse responses so that they resemble recorded ones. The widest use of ANN so far is in evaluating the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes datasets of recorded RIRs, commonly found in various studies, that are used as training data for neural networks.
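As a rough illustration of why RIRs are useful as training data, the sketch below applies a room response to a "dry" signal by convolution, which is the standard way reverberant audio is simulated. It is not the method of the review: the RIR here is a toy stand-in (exponentially decaying noise) rather than an ANN-generated or recorded response, and the names `synthetic_rir`, `reverberate`, and the chosen reverberation time and sample rate are assumptions made for the example.

```python
import numpy as np


def synthetic_rir(rt60_s, sample_rate, rng):
    """Toy RIR (assumption): white noise under an exponential decay envelope."""
    n = int(round(rt60_s * sample_rate))
    t = np.arange(n) / sample_rate
    # Envelope amplitude drops by 60 dB (a factor of 1000) over rt60_s seconds.
    decay = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # crude stand-in for the direct sound path
    return rir / np.max(np.abs(rir))  # scale so the peak magnitude is 1


def reverberate(dry, rir):
    """Simulate the room by linear convolution of the dry signal with the RIR."""
    return np.convolve(dry, rir, mode="full")


rng = np.random.default_rng(0)
sample_rate = 16000                      # Hz, assumed for the example
dry = rng.standard_normal(sample_rate)   # 1 s of noise standing in for speech
rir = synthetic_rir(0.3, sample_rate, rng)
wet = reverberate(dry, rir)              # reverberant version for augmentation
```

In an augmentation pipeline, each clean utterance would typically be convolved with a different RIR, recorded or generated, so that the recognizer sees the same speech under many simulated rooms.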
