Design and Implementation of Embedded Real-Time English Speech Recognition System Based on Big Data Analysis

This article uses a Field Programmable Gate Array (FPGA) as the carrier and builds an English speech recognition system as a System on Programmable Chip (SOPC) from IP cores. The SOPC system follows a modular hardware design method: apart from the custom-developed hardware acceleration module and its control module, all other modules are implemented in software or with IP cores provided by the Xilinx development tools. The hardware acceleration IP uses a top-down design, runs multiple arithmetic units in parallel, and is pipelined, which speeds up computation so that only one operation cycle is needed to obtain each result. For the recognition algorithm, a more effective training method is proposed: the Genetic Continuous Hidden Markov Model (GA_CHMM), which uses a genetic algorithm to train the continuous hidden Markov model (CHMM) directly. It searches for the optimal model by encoding the CHMM parameter values and applying selection, crossover, and mutation guided by a fitness function. The best individual is decoded into the CHMM parameters, and English speech recognition is then performed with the CHMM algorithm. This algorithm saves considerable training time and improves both recognition rate and speed. The paper also studies optimization of the embedded system software: through fixed-point software algorithms and optimization of system storage space, the system's real-time response time was reduced from about 10 seconds to an average of 220 milliseconds. Optimizing the CHMM algorithm further improves real-time performance and significantly shortens the average time to complete a recognition. The system achieves a recognition rate above 90% for English vocabularies of fewer than 200 words.
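The GA_CHMM training loop described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes a toy continuous HMM with two states, uniform initial-state probabilities, and a single 1-D Gaussian per state (real systems would model multi-dimensional feature vectors such as MFCCs, typically with Gaussian mixtures). The chromosome layout, the names `decode`, `ga_train` and `make_synthetic_obs`, tournament selection, one-point crossover, Gaussian mutation, elitism, and all numeric settings are hypothetical choices. Fitness is taken to be the log-likelihood of the training data, computed with the scaled forward algorithm.

```python
import math
import random

N_STATES = 2  # hypothetical model size; real word models use more states

def decode(chrom, n=N_STATES):
    """Map a flat real-valued chromosome to CHMM parameters.
    Assumed layout: n*n transition logits, n means, n log standard deviations."""
    trans = []
    for i in range(n):
        row = chrom[i * n:(i + 1) * n]
        m = max(row)
        exps = [math.exp(v - m) for v in row]  # softmax keeps each row stochastic
        s = sum(exps)
        trans.append([e / s for e in exps])
    means = list(chrom[n * n:n * n + n])
    stds = [math.exp(v) for v in chrom[n * n + n:n * n + 2 * n]]  # always > 0
    return trans, means, stds

def gauss_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(obs, trans, means, stds):
    """Scaled forward algorithm: log P(obs | CHMM)."""
    n = len(means)
    total = 0.0
    alpha = [1.0 / n] * n  # uniform initial-state distribution (assumption)
    for t, x in enumerate(obs):
        if t > 0:  # propagate through the transition matrix
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) for j in range(n)]
        alpha = [alpha[j] * gauss_pdf(x, means[j], stds[j]) for j in range(n)]
        c = sum(alpha)
        if c <= 0.0:
            return float("-inf")  # observation effectively impossible under this model
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
        total += math.log(c)
    return total

def fitness(chrom, obs):
    return log_likelihood(obs, *decode(chrom))

def tournament(pop, fits, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals."""
    return pop[max(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def crossover(a, b, rng):
    """One-point crossover."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(chrom, rng, rate=0.2, scale=0.3, bound=5.0):
    """Gaussian mutation of each gene with probability `rate`, clipped to [-bound, bound]."""
    return [min(bound, max(-bound, g + rng.gauss(0.0, scale))) if rng.random() < rate else g
            for g in chrom]

def ga_train(obs, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    n_genes = N_STATES * N_STATES + 2 * N_STATES
    pop = [[rng.uniform(-2.0, 2.0) for _ in range(n_genes)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [fitness(c, obs) for c in pop]
        i = max(range(pop_size), key=lambda j: fits[j])
        if best is None or fits[i] > best_fit:
            best, best_fit = pop[i][:], fits[i]
        children = [best[:]]  # elitism: carry the best-so-far chromosome unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, fits, rng), tournament(pop, fits, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return decode(best), best_fit  # decoded optimum = trained CHMM parameters

def make_synthetic_obs(length=60, seed=1):
    """Hypothetical 1-D data switching between two regimes (not real speech features)."""
    rng = random.Random(seed)
    obs, state = [], 0
    for _ in range(length):
        if rng.random() < 0.1:
            state = 1 - state
        obs.append(rng.gauss(-1.0 if state == 0 else 2.0, 0.5))
    return obs

if __name__ == "__main__":
    obs = make_synthetic_obs()
    (trans, means, stds), ll = ga_train(obs)
    print("means:", [round(m, 2) for m in means], "log-likelihood:", round(ll, 2))
```

Because the best individual is copied unchanged into every new generation, the best fitness found never decreases. For recognition, one such model would be trained per vocabulary word and an utterance assigned to the word whose model gives the highest `log_likelihood`; that step is likewise an assumption about how the CHMM is applied.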


Speech recognition system for embedded real-time applications

2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2009. DOI: 10.1109/isspit.2009.5407487. Cited by ~5.

Author(s): Octavian Cheng, Waleed Abdulla, Zoran Salcic

Keyword(s): Speech Recognition, Real Time, Recognition System, Speech Recognition System, Real Time Applications


A real time, speaker independent, speech recognition system

COMSIG 1991 Proceedings: South African Symposium on Communications and Signal Processing, 2002. DOI: 10.1109/comsig.1991.278235. Cited by ~4.

Author(s): G. van Wyk, I.H.J. Nel, W. Coetzer

Keyword(s): Speech Recognition, Real Time, Recognition System, Speech Recognition System, Speaker Independent


User independent real-time speech recognition system and method

The Journal of the Acoustical Society of America, 1998, Vol. 103 (2), pp. 648. DOI: 10.1121/1.421159.

Author(s): C. Hal Hansen

Keyword(s): Speech Recognition, Real Time, Recognition System, Speech Recognition System


Enhanced Automatic Speech Recognition System Based on Enhancing Power-Normalized Cepstral Coefficients

Applied Sciences, 2019, Vol. 9 (10), pp. 2166. DOI: 10.3390/app9102166. Cited by ~3.

Author(s): Mohamed Tamazin, Ahmed Gouda, Mohamed Khedr

Keyword(s): Speech Recognition, Automatic Speech Recognition, Additive White Gaussian Noise, Recognition Rate, Data Entry, Recognition System, Speech Recognition System, Automatic Speech Recognition System, Different Types, A New Technique

Many new consumer applications, such as voice command interfaces, speech-to-text applications, and data entry processes, are based on automatic speech recognition (ASR) systems. Although ASR systems have improved remarkably in recent decades, their performance still degrades significantly in noisy environments. Developing a robust ASR system that works under real-world noise and other acoustically distorting conditions is an attractive research topic. Many advanced algorithms have been developed in the literature to deal with this problem; most of them are based on modeling how the human auditory system perceives noisy speech. In this research, the power-normalized cepstral coefficient (PNCC) system is modified to increase robustness against different types of environmental noise: a new technique based on gammatone channel filtering combined with channel bias minimization is used to suppress the noise effects. The TIDIGITS database is used to evaluate the proposed system against state-of-the-art techniques in the presence of additive white Gaussian noise (AWGN) and seven other types of environmental noise. In this research, one word is recognized from a set of only 11 possibilities. The experimental results show that the proposed method significantly improves recognition accuracy at low signal-to-noise ratios (SNR). For subway noise at SNR = 5 dB, the proposed method outperforms the mel-frequency cepstral coefficient (MFCC) and relative spectral (RASTA)–perceptual linear predictive (PLP) methods by 55% and 47%, respectively. Moreover, for car noise the recognition rate of the proposed method is higher than those of the gammatone frequency cepstral coefficient (GFCC) and PNCC methods: it improves on GFCC by 40% at SNR = 0 dB and on PNCC by 20% at SNR = −5 dB.
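The gammatone channel filtering mentioned above can be sketched with the standard fourth-order gammatone impulse response, whose bandwidth is set from the equivalent rectangular bandwidth, ERB(f) = 24.7(4.37f/1000 + 1) Hz, with b = 1.019·ERB. This is a generic filterbank, not the authors' modified PNCC: their channel bias minimization and the remaining PNCC stages are not shown, and the function names, the 64 ms impulse-response length, the peak normalization, and the channel centre frequencies are illustrative assumptions.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, duration_s=0.064, order=4):
    """Sampled gammatone impulse response
    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc).
    The 64 ms length and unit-peak scaling are illustrative choices."""
    b = 1.019 * erb(fc_hz)
    h = []
    for k in range(round(duration_s * fs_hz)):
        t = k / fs_hz
        h.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
                 * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in h)
    return [v / peak for v in h]  # largest absolute sample becomes 1

def fir_filter(x, h):
    """Direct-form FIR convolution, output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]

def channel_energies(x, fs_hz, centers_hz):
    """Mean output power of x in each gammatone channel."""
    energies = []
    for fc in centers_hz:
        y = fir_filter(x, gammatone_ir(fc, fs_hz))
        energies.append(sum(v * v for v in y) / len(y))
    return energies

if __name__ == "__main__":
    fs = 8000
    tone = [math.cos(2.0 * math.pi * 1000.0 * n / fs) for n in range(800)]  # 0.1 s at 1 kHz
    print(channel_energies(tone, fs, [500.0, 1000.0, 2000.0]))
```

Feeding a 1 kHz tone through channels centred at 500 Hz, 1 kHz and 2 kHz should put most of the energy in the middle channel. A practical front end would use many channels spaced on the ERB scale and a faster filter implementation than direct convolution.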
