Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

2019 · Vol 9 (4) · pp. 235-245
Author(s): Apeksha Shewalkar, Deepika Nyavanandi, Simone A. Ludwig

Abstract Deep Neural Networks (DNNs) are neural networks with many hidden layers. DNNs are becoming popular in automatic speech recognition tasks, which combine an acoustic model with a language model. Standard feedforward neural networks cannot handle speech data well since they have no way to feed information from a later layer back to an earlier layer. Thus, Recurrent Neural Networks (RNNs) have been introduced to take temporal dependencies into account. However, the shortcoming of RNNs is that they cannot handle long-term dependencies due to the vanishing/exploding gradient problem. Therefore, Long Short-Term Memory (LSTM) networks were introduced; these are a special case of RNNs that take long-term dependencies in speech into account in addition to short-term dependencies. Similarly, Gated Recurrent Unit (GRU) networks are an improvement over LSTM networks that also take long-term dependencies into consideration. Thus, in this paper, we evaluate RNN, LSTM, and GRU to compare their performance on a reduced TED-LIUM speech data set. The results show that LSTM achieves the best word error rates; however, the GRU optimization is faster while achieving word error rates close to those of LSTM.
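As an illustration of what such a comparison involves, the three architectures differ only in the choice of recurrent cell. The following minimal PyTorch sketch uses illustrative layer sizes and output dimensions, not the hyper-parameters or data pipeline of the paper:

# Minimal sketch (illustrative hyper-parameters, not the paper's exact setup):
# the three architectures differ only in the recurrent cell used.
import torch
import torch.nn as nn

class SpeechRNN(nn.Module):
    def __init__(self, cell="lstm", n_feats=40, n_hidden=256, n_classes=30):
        super().__init__()
        cells = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}
        # Recurrent layer over the acoustic feature sequence.
        self.rnn = cells[cell](n_feats, n_hidden, num_layers=2, batch_first=True)
        # Per-frame output scores (e.g. characters/phonemes for a CTC-style loss).
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feats)
        h, _ = self.rnn(x)             # h: (batch, time, n_hidden)
        return self.out(h)             # (batch, time, n_classes)

# Identical training loop for all three; only the cell type changes.
for cell in ("rnn", "lstm", "gru"):
    model = SpeechRNN(cell)
    scores = model(torch.randn(8, 100, 40))   # dummy batch of feature frames
    print(cell, scores.shape)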

2020 · Vol 44 (3) · pp. 326-332
Author(s): Audreaiona Waters, Liye Zou, Myungjin Jung, Qian Yu, Jingyuan Lin, ...

Objective: Sustained attention is critical for various activities of daily living, including engaging in health-enhancing behaviors and inhibiting health-compromising behaviors. Sustained attention activates neural networks involved in episodic memory function, a cognition critical for healthy living. Acute exercise has been shown to activate these same neural networks. Thus, it is plausible that engaging in a sustained attention task and a bout of acute exercise may have an additive effect in enhancing memory function, which was the aim of this experiment. Methods: 23 young adults (mean age = 20.7 years) completed two visits approximately 24 hours apart, in a counterbalanced order: (1) acute exercise with sustained attention, and (2) sustained attention only. Memory was assessed using a word-list paradigm that included short- and long-term memory assessments. Sustained attention was induced via a sustained attention to response task (SART). Acute exercise involved a 15-minute bout of moderate-intensity exercise. Results: Short-term memory performance was significantly greater than long-term memory, Mdiff = 1.86, p < .001, and short-term memory for Exercise with Sustained Attention was significantly greater than short-term memory for Sustained Attention Only, Mdiff = 1.50, p = .01. Conclusion: Engaging in an acute bout of exercise before a sustained attention task additively enhanced short-term memory function.


2020 · Vol 34 (04) · pp. 4115-4122
Author(s): Kyle Helfrich, Qiang Ye

Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
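To make the intuition concrete, the hedged sketch below shows only the general idea of keeping the spectral radius of the recurrent matrix strictly inside the unit disc; it is not the paper's actual ENRNN parametrization or its gradient descent algorithm:

# Hedged sketch of the general idea only (not the ENRNN parametrization or its
# learning algorithm): rescale a recurrent matrix so its spectral radius lies
# strictly inside the unit disc, so contributions from old inputs decay.
import torch

def normalize_spectral_radius(W: torch.Tensor, target: float = 0.95) -> torch.Tensor:
    # Largest eigenvalue magnitude of the recurrent matrix.
    radius = torch.linalg.eigvals(W).abs().max()
    # Shrink the matrix only if its spectral radius exceeds the target.
    return W * (target / radius) if radius > target else W

W = torch.randn(128, 128) / 128 ** 0.5
W_hat = normalize_spectral_radius(W)
print(torch.linalg.eigvals(W_hat).abs().max())   # <= 0.95
# In a recurrence h_t = tanh(W_hat @ h_{t-1} + U @ x_t), the influence of x_{t-k}
# on h_t then shrinks roughly like 0.95**k, giving a short-term memory.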


Author(s): Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Abstract Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words on acoustic evidence using short-term context, prior to rescoring with long-term linguistic context. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert them. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading accurate transcriptions, and the gains are larger on out-of-domain and mismatched-data ASR. Our system consistently provides improvements over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that any ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the types of errors corrected by our system.
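One ingredient of such a system, choosing between existing ASR output possibilities with a long-context language model, can be sketched as an n-best re-ranking step. In the sketch below, lm_logprob is a hypothetical stand-in for a trained neural language model, and the paper's phrase-based noisy-channel correction model is not shown:

# Sketch of n-best re-ranking with a long-context language model score.
# lm_logprob is a placeholder; in practice it would be a trained neural LM.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_score: float      # log-probability assigned by the ASR decoder

def lm_logprob(text: str) -> float:
    # Placeholder: in practice, score `text` with a neural language model.
    return -2.0 * len(text.split())

def rerank(nbest, lm_weight=0.7):
    # Combine ASR and LM evidence; the winner may differ from the ASR 1-best.
    return max(nbest, key=lambda h: h.asr_score + lm_weight * lm_logprob(h.text))

nbest = [Hypothesis("recognize speech", -4.1), Hypothesis("wreck a nice beach", -3.9)]
print(rerank(nbest).text)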


2015 · Vol 40 (2) · pp. 191-195
Author(s): Łukasz Brocki, Krzysztof Marasek

Abstract This paper describes a hybrid of a Deep Belief Neural Network (DBNN) and a Bidirectional Long Short-Term Memory (BLSTM) network used as an acoustic model for speech recognition. Many independent researchers have demonstrated that DBNNs outperform other known machine learning frameworks in terms of speech recognition accuracy. Their superiority comes from the fact that they are deep learning networks. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to make use of longer context. In this paper, an experiment is performed in which a DBNN is combined with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and in some cases may suffer performance issues in real-time applications.
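A rough structural sketch of such a hybrid follows; layer sizes are illustrative and a plain feed-forward stack stands in for the pretrained DBNN. Per-frame feed-forward outputs are post-processed by a bidirectional LSTM so the acoustic model can use longer context:

# Structural sketch only (illustrative sizes; the feed-forward stack stands in
# for the trained DBNN): frame-wise outputs are post-processed by a BLSTM.
import torch
import torch.nn as nn

class DBNN_BLSTM(nn.Module):
    def __init__(self, n_feats=40, n_hidden=512, n_states=2000):
        super().__init__()
        # Feed-forward part (stand-in for the deep belief network).
        self.dbnn = nn.Sequential(
            nn.Linear(n_feats, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        # Bidirectional LSTM over the DBNN's frame-wise outputs.
        self.blstm = nn.LSTM(n_hidden, n_hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * n_hidden, n_states)  # e.g. HMM state posteriors

    def forward(self, x):               # x: (batch, time, n_feats)
        h = self.dbnn(x)
        h, _ = self.blstm(h)
        return self.out(h)

scores = DBNN_BLSTM()(torch.randn(4, 200, 40))
print(scores.shape)                     # (4, 200, 2000)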

