Advanced Recurrent Neural Networks for Automatic Speech Recognition

This paper addresses the problem of speech recognition for identifying various modes of speech data. Speaker sounds are the acoustic sounds of speech. Statistical models of speech have been widely used for speech recognition with neural networks. In this paper we propose, and try to justify, a new model in which speech coarticulation, the effect of phonetic context on speech sounds, is modeled explicitly under a statistical framework. We study speech phone recognition with recurrent neural networks and SOUL neural networks. A general framework for recurrent neural networks and considerations for network training are discussed in detail. The SOUL NN clusters the large vocabulary, which compresses huge speech data sets. This project also covers different Indian languages uttered by different speakers in different modes such as aggressive, happy, sad, and angry. Many alternative energy measures and training methods are proposed and implemented. A speaker-independent phone recognition rate of 82% with a 25% frame error rate has been achieved on the neural database. Neural speech recognition experiments on the NTIMIT database result in a phone recognition rate of 68% correct. The research results in this thesis are competitive with the best results reported in the literature.


Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling

10.21437/interspeech.2014-151 ◽

2014 ◽

Author(s):

Jürgen T. Geiger ◽

Zixing Zhang ◽

Felix Weninger ◽

Björn Schuller ◽

Gerhard Rigoll

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Recurrent Neural Networks ◽

Short Term Memory ◽

Robust Speech Recognition ◽

Short Term ◽

Term Memory ◽

Acoustic Modelling ◽

Long Short Term Memory


Literacy by Way of Automatic Speech Recognition

Intelligent Information Technologies ◽

10.4018/978-1-59904-941-0.ch121 ◽

2011 ◽

pp. 2074-2118

Author(s):

Russell Gluck ◽

John Fulcher

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Markov Models ◽

Hidden Markov ◽

Time Warping ◽

Oral Storytelling ◽

Pattern Recognition Techniques ◽

Dynamic Time ◽

Over Time

The chapter commences with an overview of automatic speech recognition (ASR), which covers not only the de facto standard approach of hidden Markov models (HMMs), but also the tried-and-proven techniques of dynamic time warping and artificial neural networks (ANNs). The coverage then switches to Gluck’s (2004) draw-talk-write (DTW) process, developed over the past two decades to help non-text-literate people become gradually literate over time through telling and/or drawing their own stories. DTW has proved especially effective with “illiterate” people from strong oral storytelling traditions. The chapter concludes by relating attempts to date at automating the DTW process using ANN-based pattern recognition techniques on an Apple Macintosh G4™ platform.


Character-level incremental speech recognition with recurrent neural networks

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472696 ◽

2016 ◽

Cited By ~ 11

Author(s):

Kyuyeon Hwang ◽

Wonyong Sung

Keyword(s):

Neural Networks ◽

Speech Recognition ◽

Recurrent Neural Networks
