Performance Analysis and Recognition of Speech using Recurrent Neural Network

2019 ◽  
Vol 1 (1) ◽  
pp. 87-95
Author(s):  
Bishon Lamichanne ◽  
Hari K.C.

Speech is one of the most natural ways for people to communicate and plays an important role in our daily lives. Enabling machines to talk with people is a challenging but very useful task, and a crucial step is enabling machines to recognize and understand what people are saying. Speech recognition is therefore a key technique for providing an interface between machines and humans. Speech recognition has a long research history. Neural networks are known for their ability to classify nonlinear problems, and much current research in speech recognition makes use of them. Although continuous study has produced positive results, research on minimizing the error rate still attracts a great deal of attention. The English language offers a number of challenges for speech recognition. This paper implements an RNN to analyze and recognize speech from a set of spoken words.
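As an illustration of the approach described above, the following is a minimal numpy sketch of a simple (Elman-style) RNN classifying a spoken word from a sequence of feature frames. The dimensions, weights, and feature choice (13 MFCC-like coefficients per frame) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def rnn_classify(x_seq, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a simple Elman RNN over a feature sequence and
    return class scores from the final hidden state."""
    h = np.zeros(W_hh.shape[0])
    for x_t in x_seq:                      # one feature frame per step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return W_hy @ h + b_y                  # logits over word classes

# Toy dimensions: 13 features per frame, 8 hidden units, 5 word classes
rng = np.random.default_rng(0)
W_xh = rng.standard_normal((8, 13)) * 0.1
W_hh = rng.standard_normal((8, 8)) * 0.1
W_hy = rng.standard_normal((5, 8)) * 0.1
b_h, b_y = np.zeros(8), np.zeros(5)

frames = rng.standard_normal((20, 13))     # 20 frames of one spoken word
scores = rnn_classify(frames, W_xh, W_hh, W_hy, b_h, b_y)
predicted = int(np.argmax(scores))         # index of the recognized word
```

In practice the weights would be trained by backpropagation through time over a labeled set of spoken words rather than sampled randomly.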

2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

Abstract This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the input-layer activation function from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to improve the effective utilization of the data, an improved Denoising Recurrent Neural Network (DRNN) model with high calculation speed and good convergence is constructed to address the problem of low speaker recognition rates in noisy environments. Using this model, random semantic speech signals from the speech library, sampled at 16 kHz with a duration of 5 seconds, are studied. The experimental signal-to-noise ratios are -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional and improved models on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement is most pronounced at low signal-to-noise ratios: at 0 dB, the speaker recognition rate increases by 40%, an 85% improvement over the traditional speech model. As the signal-to-noise ratio increases, the recognition rate gradually rises, reaching 93% at 15 dB.
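Two of the modifications described above can be sketched concretely: the Leaky ReLU activation that replaces the Sigmoid, and the zero-padding of the first and last frames. This is a minimal numpy sketch under assumed dimensions, not the paper's actual DRNN implementation.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU keeps a small slope for negative inputs,
    avoiding the saturation of the traditional Sigmoid."""
    return np.where(x > 0, x, alpha * x)

def pad_context(frames, pad=1):
    """Zero-pad before the first and after the last frame so the
    edge frames are not underused during training."""
    zeros = np.zeros((pad, frames.shape[1]))
    return np.vstack([zeros, frames, zeros])

frames = np.ones((4, 13))                 # 4 feature frames, 13 coefficients each
padded = pad_context(frames)              # now 6 frames: zero rows at both ends
act = leaky_relu(np.array([-2.0, 0.5]))   # -> [-0.02, 0.5]
```

The same `leaky_relu` would be applied to the input-layer pre-activations of the DRNN; the slope `alpha=0.01` is a common default, assumed here rather than taken from the paper.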


2018 ◽  
Vol 24 (3) ◽  
pp. 467-489 ◽  
Author(s):  
MARC TANTI ◽  
ALBERT GATT ◽  
KENNETH P. CAMILLERI

Abstract When a recurrent neural network (RNN) language model is used for caption generation, the image information can be fed to the neural network either by incorporating it directly in the RNN – conditioning the language model by ‘injecting’ image features – or in a layer following the RNN – conditioning the language model by ‘merging’ image features. While both options are attested in the literature, there has as yet been no systematic comparison between the two. In this paper, we show empirically that the choice between the two architectures makes little difference to performance. The merge architecture does have practical advantages, however, as conditioning by merging allows the RNN’s hidden state vector to shrink in size by up to a factor of four. Our results suggest that the visual and linguistic modalities for caption generation need not be jointly encoded by the RNN, as that yields large, memory-intensive models with few tangible performance advantages; rather, multimodal integration should be delayed to a subsequent stage.
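The structural difference between the two conditioning schemes can be sketched in a few lines. In the inject architecture the image vector enters the RNN step itself; in the merge architecture the RNN processes only words and the image vector is combined with the hidden state in a later layer. This is a minimal numpy sketch with illustrative dimensions and weight names, not the authors' implementation.

```python
import numpy as np

def inject_step(word_emb, img_feat, h, W_xh, W_hh):
    """'Inject': the image vector is concatenated with each word
    embedding and fed into the RNN, so the RNN encodes both modalities."""
    x = np.concatenate([word_emb, img_feat])
    return np.tanh(W_xh @ x + W_hh @ h)

def merge_step(word_emb, img_feat, h, W_xh, W_hh, W_m):
    """'Merge': the RNN sees only the word sequence; the image vector
    joins the hidden state in a subsequent layer."""
    h = np.tanh(W_xh @ word_emb + W_hh @ h)
    merged = W_m @ np.concatenate([h, img_feat])
    return h, merged

# Illustrative sizes: 4-d word embeddings, 6-d image features, 8 hidden units
rng = np.random.default_rng(1)
d_w, d_i, d_h = 4, 6, 8
W_xh_inj = rng.standard_normal((d_h, d_w + d_i)) * 0.1  # inject sees words + image
W_xh_mrg = rng.standard_normal((d_h, d_w)) * 0.1        # merge sees words only
W_hh = rng.standard_normal((d_h, d_h)) * 0.1
W_m = rng.standard_normal((5, d_h + d_i)) * 0.1

word = rng.standard_normal(d_w)
img = rng.standard_normal(d_i)
h0 = np.zeros(d_h)

h_inj = inject_step(word, img, h0, W_xh_inj, W_hh)
h_mrg, merged = merge_step(word, img, h0, W_xh_mrg, W_hh, W_m)
```

Note how `W_xh_mrg` is smaller than `W_xh_inj`: in the merge scheme the RNN never has to carry the image information, which is what lets its hidden state shrink as the abstract reports.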

