Speech Recognition using Multiscale Scattering of Audio Signals and Long Short-Term Memory of Neural Networks
Communication is one of the key elements of interaction. To understand human speech, machines use techniques that convert spoken audio into a machine-readable form, a task known as speech recognition. This paper addresses one of the most classic problems in the speech recognition domain: spoken digit recognition. Recognition is performed with a technique called wavelet scattering, which first extracts useful information from the signals and then passes this information to a Long Short-Term Memory (LSTM) network that classifies them. A major advantage of the LSTM is that it overcomes the vanishing gradient problem, and the proposed technique can be used in applications such as numerical data entry for blind people. The method achieves higher accuracy than standard approaches that use Mel-frequency Cepstral Coefficients (MFCC) with an LSTM network to recognize digits. This work thus achieves its primary purpose of validating the efficiency of the wavelet scattering technique and LSTM networks for spoken digit recognition.
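To make the feature-extraction step concrete, the following is a minimal NumPy sketch of first-order wavelet scattering: the signal is filtered by a bank of dyadic bandpass filters, the complex modulus is taken, and the result is low-pass averaged to gain local time invariance. This is an illustrative simplification under assumed filter parameters, not the paper's implementation; full scattering networks (including second-order coefficients) are provided by libraries such as Kymatio or MATLAB's Wavelet Toolbox.

```python
import numpy as np

def bandpass_filter(n, xi, sigma):
    # Frequency-domain Gabor-like bandpass filter centered at normalized frequency xi.
    omega = np.fft.fftfreq(n)
    return np.exp(-((omega - xi) ** 2) / (2 * sigma ** 2))

def scattering_first_order(x, num_filters=8, pool=64):
    """First-order scattering: low-pass average of |x * psi_lambda|."""
    n = len(x)
    X = np.fft.fft(x)
    feats = []
    for k in range(num_filters):
        xi = 0.5 * 2.0 ** (-k)                 # dyadic center frequencies
        psi = bandpass_filter(n, xi, xi / 4)   # sigma = xi/4 is an assumed bandwidth
        mod = np.abs(np.fft.ifft(X * psi))     # complex modulus of the bandpass output
        # Averaging over windows of length `pool` acts as the low-pass filter.
        m = n // pool
        feats.append(mod[: m * pool].reshape(m, pool).mean(axis=1))
    return np.stack(feats, axis=1)             # shape: (time_frames, num_filters)

# Example: a synthetic stand-in for a one-second spoken-digit recording.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
S = scattering_first_order(x)
print(S.shape)  # (125, 8)
```

The resulting feature matrix is a sequence of frame-level vectors, which is the kind of input an LSTM classifier consumes in place of MFCC frames.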