Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

2019 · Vol 9 (4) · pp. 235-245
Author(s): Apeksha Shewalkar, Deepika Nyavanandi, Simone A. Ludwig

Abstract Deep Neural Networks (DNNs) are neural networks with many hidden layers. DNNs are becoming popular in automatic speech recognition tasks, which combine an acoustic model with a language model. Standard feedforward neural networks cannot handle speech data well since they have no way to feed information from a later layer back to an earlier layer. Thus, Recurrent Neural Networks (RNNs) have been introduced to take temporal dependencies into account. However, the shortcoming of RNNs is that they cannot handle long-term dependencies due to the vanishing/exploding gradient problem. Therefore, Long Short-Term Memory (LSTM) networks were introduced; these are a special case of RNNs that take long-term dependencies in speech into account in addition to short-term dependencies. Similarly, Gated Recurrent Unit (GRU) networks are an improvement over LSTM networks that also take long-term dependencies into consideration. Thus, in this paper, we evaluate RNN, LSTM, and GRU to compare their performance on a reduced TED-LIUM speech data set. The results show that LSTM achieves the best word error rates; however, the GRU optimization is faster while achieving word error rates close to those of LSTM.
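As an illustration of what such a comparison involves, the three architectures differ only in the choice of recurrent cell. The following minimal PyTorch sketch uses illustrative layer sizes and output dimensions, not the hyper-parameters or data pipeline of the paper:

# Minimal sketch (illustrative hyper-parameters, not the paper's exact setup):
# the three architectures differ only in the recurrent cell used.
import torch
import torch.nn as nn

class SpeechRNN(nn.Module):
    def __init__(self, cell="lstm", n_feats=40, n_hidden=256, n_classes=30):
        super().__init__()
        cells = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}
        # Recurrent layer over the acoustic feature sequence.
        self.rnn = cells[cell](n_feats, n_hidden, num_layers=2, batch_first=True)
        # Per-frame output scores (e.g. characters/phonemes for a CTC-style loss).
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):              # x: (batch, time, n_feats)
        h, _ = self.rnn(x)             # h: (batch, time, n_hidden)
        return self.out(h)             # (batch, time, n_classes)

# Identical training loop for all three; only the cell type changes.
for cell in ("rnn", "lstm", "gru"):
    model = SpeechRNN(cell)
    scores = model(torch.randn(8, 100, 40))   # dummy batch of feature frames
    print(cell, scores.shape)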

2020 · Vol 44 (3) · pp. 326-332
Author(s): Audreaiona Waters, Liye Zou, Myungjin Jung, Qian Yu, Jingyuan Lin, ...

Objective: Sustained attention is critical for various activities of daily living, including engaging in health-enhancing behaviors and inhibiting health-compromising behaviors. Sustained attention activates neural networks involved in episodic memory function, a cognition critical for healthy living. Acute exercise has been shown to activate these same neural networks. Thus, it is plausible that engaging in a sustained attention task and a bout of acute exercise may have an additive effect in enhancing memory function, which was the aim of this experiment. Methods: 23 young adults (mean age = 20.7 years) completed two visits approximately 24 hours apart, in a counterbalanced order: (1) acute exercise with sustained attention, and (2) sustained attention only. Memory was assessed using a word-list paradigm that included short- and long-term memory assessments. Sustained attention was induced via a sustained attention to response task (SART). Acute exercise involved a 15-minute bout of moderate-intensity exercise. Results: Short-term memory performance was significantly greater than long-term memory, Mdiff = 1.86, p < .001, and short-term memory for Exercise with Sustained Attention was significantly greater than short-term memory for Sustained Attention Only, Mdiff = 1.50, p = .01. Conclusion: Engaging in an acute bout of exercise before a sustained attention task additively enhanced short-term memory function.


2020 · Vol 34 (04) · pp. 4115-4122
Author(s): Kyle Helfrich, Qiang Ye

Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacity. In this paper, we address this issue by proposing an architecture that expands upon an orthogonal/unitary RNN with a state that is generated by a recurrent matrix with eigenvalues in the unit disc. Any input to this state dissipates in time and is replaced with new inputs, simulating short-term memory. A gradient descent algorithm is derived for learning such a recurrent matrix. The resulting method, called the Eigenvalue Normalized RNN (ENRNN), is shown to be highly competitive in several experiments.
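To make the intuition concrete, the hedged sketch below shows only the general idea of keeping the spectral radius of the recurrent matrix strictly inside the unit disc; it is not the paper's actual ENRNN parametrization or its gradient descent algorithm:

# Hedged sketch of the general idea only (not the ENRNN parametrization or its
# learning algorithm): rescale a recurrent matrix so its spectral radius lies
# strictly inside the unit disc, so contributions from old inputs decay.
import torch

def normalize_spectral_radius(W: torch.Tensor, target: float = 0.95) -> torch.Tensor:
    # Largest eigenvalue magnitude of the recurrent matrix.
    radius = torch.linalg.eigvals(W).abs().max()
    # Shrink the matrix only if its spectral radius exceeds the target.
    return W * (target / radius) if radius > target else W

W = torch.randn(128, 128) / 128 ** 0.5
W_hat = normalize_spectral_radius(W)
print(torch.linalg.eigvals(W_hat).abs().max())   # <= 0.95
# In a recurrence h_t = tanh(W_hat @ h_{t-1} + U @ x_t), the influence of x_{t-k}
# on h_t then shrinks roughly like 0.95**k, giving a short-term memory.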


Author(s): Prashanth Gurunath Shivakumar, Haoqi Li, Kevin Knight, Panayiotis Georgiou

Abstract Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language and pronunciation models); for example, pruning words on acoustic evidence using short-term context, prior to rescoring with long-term linguistic context. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR and attempt to invert them. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading accurate transcriptions, and the gains are larger on out-of-domain and mismatched-data ASR. Our system consistently provides improvements over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that any ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the types of errors corrected by our system.
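One ingredient of such a system, choosing between existing ASR output possibilities with a long-context language model, can be sketched as an n-best re-ranking step. In the sketch below, lm_logprob is a hypothetical stand-in for a trained neural language model, and the paper's phrase-based noisy-channel correction model is not shown:

# Sketch of n-best re-ranking with a long-context language model score.
# lm_logprob is a placeholder; in practice it would be a trained neural LM.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    asr_score: float      # log-probability assigned by the ASR decoder

def lm_logprob(text: str) -> float:
    # Placeholder: in practice, score `text` with a neural language model.
    return -2.0 * len(text.split())

def rerank(nbest, lm_weight=0.7):
    # Combine ASR and LM evidence; the winner may differ from the ASR 1-best.
    return max(nbest, key=lambda h: h.asr_score + lm_weight * lm_logprob(h.text))

nbest = [Hypothesis("recognize speech", -4.1), Hypothesis("wreck a nice beach", -3.9)]
print(rerank(nbest).text)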


2015 · Vol 40 (2) · pp. 191-195
Author(s): Łukasz Brocki, Krzysztof Marasek

Abstract This paper describes a hybrid of a Deep Belief Neural Network (DBNN) and a Bidirectional Long Short-Term Memory (BLSTM) network used as an acoustic model for speech recognition. Many independent researchers have demonstrated that DBNNs outperform other known machine learning frameworks in terms of speech recognition accuracy. Their superiority comes from the fact that they are deep learning networks. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to make use of longer context. In this paper, an experiment is performed in which a DBNN is combined with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and in some cases may suffer performance issues in real-time applications.
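A rough structural sketch of such a hybrid follows; layer sizes are illustrative and a plain feed-forward stack stands in for the pretrained DBNN. Per-frame feed-forward outputs are post-processed by a bidirectional LSTM so the acoustic model can use longer context:

# Structural sketch only (illustrative sizes; the feed-forward stack stands in
# for the trained DBNN): frame-wise outputs are post-processed by a BLSTM.
import torch
import torch.nn as nn

class DBNN_BLSTM(nn.Module):
    def __init__(self, n_feats=40, n_hidden=512, n_states=2000):
        super().__init__()
        # Feed-forward part (stand-in for the deep belief network).
        self.dbnn = nn.Sequential(
            nn.Linear(n_feats, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_hidden), nn.Sigmoid(),
        )
        # Bidirectional LSTM over the DBNN's frame-wise outputs.
        self.blstm = nn.LSTM(n_hidden, n_hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * n_hidden, n_states)  # e.g. HMM state posteriors

    def forward(self, x):               # x: (batch, time, n_feats)
        h = self.dbnn(x)
        h, _ = self.blstm(h)
        return self.out(h)

scores = DBNN_BLSTM()(torch.randn(4, 200, 40))
print(scores.shape)                     # (4, 200, 2000)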

