A Speech Recognition Algorithm of Speaker-Independent Chinese Isolated Words Based on RNN-LSTM and Attention Mechanism

Author(s):  
Qiuyun Hao ◽  
FuQiang Wang ◽  
XiaoFeng Ma ◽  
Peng Zhang
Author(s):  
Jia-nan Chen ◽  
Shuang Gao ◽  
Han-zhe Sun ◽  
Xiao-hui Liu ◽  
Zi-ning Wang ◽  
...  

2020 ◽  
Vol 164 ◽  
pp. 10015
Author(s):  
Irina Gurtueva ◽  
Olga Nagoeva ◽  
Inna Pshenokova

This paper proposes a new approach to the development of speech recognition systems using multi-agent neurocognitive modeling. The approach is grounded in cognitive psychology, neuroscience, and advances in computer science. The purpose of this work is to develop general theoretical principles of sound-image recognition by an intelligent robot and, as a consequence, a universal automatic speech recognition system that is robust to speech variability, not only with respect to the individual characteristics of the speaker but also with respect to the diversity of accents. Based on the analysis of experimental data from behavioral studies, together with theoretical models of speech recognition mechanisms drawn from psycholinguistics, an accent-robust machine learning algorithm has been developed that imitates the formation of human phonemic hearing.


Author(s):  
Jiahao Chen ◽  
Ryota Nishimura ◽  
Norihide Kitaoka

Many end-to-end, large-vocabulary, continuous speech recognition systems now achieve better recognition performance than conventional systems. However, most of these approaches are based on bidirectional networks and sequence-to-sequence modeling, so automatic speech recognition (ASR) systems using such techniques must wait for an entire segment of voice input before they can begin processing, resulting in a lengthy time lag that can be a serious drawback in some applications. An obvious solution to this problem is a speech recognition algorithm capable of processing streaming data. In this paper we therefore explore a streaming, online ASR system for Japanese based on unidirectional LSTMs trained with the connectionist temporal classification (CTC) criterion, with local attention. Such an approach has not been well investigated for Japanese, as most Japanese-language ASR systems employ bidirectional networks. The best result of the proposed system in experimental evaluation was a character error rate of 9.87%.
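The streaming property described above comes from using a unidirectional LSTM: each output frame depends only on past context, so decoding can begin before the utterance ends. A minimal PyTorch sketch of such a unidirectional acoustic model trained with CTC follows; all hyperparameters (feature dimension, hidden size, vocabulary size) are illustrative placeholders, not those of the paper, and the local-attention component is omitted for brevity.

```python
import torch
import torch.nn as nn

class StreamingCTCModel(nn.Module):
    """Unidirectional LSTM acoustic model for streaming CTC-based ASR.

    Hyperparameters here are illustrative, not taken from the paper.
    """
    def __init__(self, n_feats=40, hidden=256, n_layers=2, n_tokens=50):
        super().__init__()
        # Unidirectional (bidirectional=False is the default): each frame
        # sees only past context, which enables online decoding.
        self.lstm = nn.LSTM(n_feats, hidden, n_layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_tokens + 1)  # +1 for the CTC blank

    def forward(self, x):                    # x: (batch, time, n_feats)
        h, _ = self.lstm(x)
        return self.proj(h).log_softmax(-1)  # (batch, time, n_tokens + 1)

# One training step with the CTC criterion (blank index 0).
model = StreamingCTCModel()
ctc = nn.CTCLoss(blank=0)
feats = torch.randn(2, 100, 40)             # two utterances, 100 frames each
targets = torch.randint(1, 51, (2, 20))     # token IDs; 0 is reserved for blank
log_probs = model(feats).transpose(0, 1)    # CTCLoss expects (time, batch, vocab)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 100),
           target_lengths=torch.full((2,), 20))
loss.backward()
```

In a deployed streaming system, the same model would be fed fixed-size chunks of features as they arrive, carrying the LSTM hidden state across chunks rather than processing a whole utterance at once.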

