Implementation of Embedded Unspecific Continuous English Speech Recognition Based on HMM

Author(s):  
Xiaoli Lu ◽  
Mohd Asif Shah

Background: Natural-language conversational interfaces play a vital role in human-computer interaction and improve the usability of computers. Speech recognition technology allows a machine to understand human language, and a speech recognition algorithm is used to achieve this function. Methodology: Starting from the fundamental theory of speech signals, this paper establishes an HMM model, applies speech collection and recognition methods, simulates the system in MATLAB, and then ports the integrated recognition system to ARM for debugging and running, realizing embedded HMM-based speech recognition on the ARM platform. Conclusion: The results show that the HMM-based embedded speaker-independent (unspecific) continuous English speech recognition system achieves high recognition accuracy and fast recognition speed.
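
The abstract describes MATLAB and ARM work but gives no code; as a minimal illustrative sketch of the HMM modelling step it mentions, the Python fragment below trains one Gaussian HMM per word on MFCC features and scores an utterance against each model. It assumes librosa and hmmlearn and hypothetical function names, and is not the authors' implementation.

```python
# Hypothetical sketch of per-word HMM training and recognition,
# not the paper's ARM/MATLAB implementation.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Load a waveform and return frame-wise MFCC features (T x n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_word_model(training_wavs, n_states=5):
    """Fit a Gaussian HMM on all training utterances of one word."""
    feats = [mfcc_features(p) for p in training_wavs]
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(wav_path, word_models):
    """Return the word whose HMM assigns the highest log-likelihood."""
    X = mfcc_features(wav_path)
    return max(word_models, key=lambda w: word_models[w].score(X))
```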

Author(s):  
Na Wang ◽  
Xiaohong Zhang ◽  
Ashutosh Sharma

Computer-assisted speech recognition systems, which digitize sound to recognize and understand spoken words, are widely used in education, scientific research, industry, and other fields. This article takes a technological perspective on automatic speech recognition in order to realize a spoken English speech recognition system based on MATLAB. The work designs and implements a speech recognition pipeline that collects the speech signals of a spoken English learning system and then filters them. A preprocessing module, built from MATLAB commands, handles the raw speech data, while feature extraction and modelling rest on an HMM model, codebook generation, and template training. The results show that the spoken English speech recognition system studied in this paper achieves a recognition accuracy of 98%, demonstrating that the MATLAB-based system offers high recognition accuracy and fast speed. The work addresses current research issues that need to be tackled in the speech recognition field and provides technical support and an interface for the spoken English learning system.
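
As a rough Python sketch of the preprocessing step the abstract mentions (the paper itself works with MATLAB commands), pre-emphasis, framing, and Hamming windowing could look like the fragment below; the frame sizes and pre-emphasis coefficient are illustrative assumptions.

```python
# Hypothetical preprocessing sketch: pre-emphasis, framing, Hamming window.
import numpy as np

def preprocess(signal, frame_len=400, frame_shift=160, alpha=0.97):
    """Pre-emphasize a 1-D signal and split it into windowed frames.

    Assumes the signal is at least one frame long.
    """
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Hamming window reduces spectral leakage at the frame edges
    return frames * np.hamming(frame_len)
```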


2014 ◽  
Vol 926-930 ◽  
pp. 1729-1732
Author(s):  
Sha Yang ◽  
Tian Hu ◽  
Yun Lu Zhang

After about 50 years of development, speech recognition technology has become able to support large-vocabulary, speaker-independent continuous speech recognition systems. Taking the features of Chinese pronunciation into account, we study small-vocabulary, speaker-independent Chinese speech recognition based on the continuous Hidden Markov Model (CHMM) approach. Comparing the VQ/DTW, VQ/DHMM, CHMM state-1, and CHMM state-2 recognition algorithms on the same datasets, our experimental results show that: (1) the CHMM state-2 branch method performs well in reducing recognition time; and (2) recognition accuracy is also improved.
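
For readers unfamiliar with the VQ front end used by the VQ/DTW and VQ/DHMM baselines, a minimal Python sketch of codebook generation with k-means is shown below; it is an illustration, not the authors' implementation, and the codebook size is an assumed value.

```python
# Hypothetical sketch of the VQ codebook step: a k-means codebook maps
# continuous feature frames to discrete symbols for a discrete HMM.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(feature_frames, codebook_size=64, seed=0):
    """Cluster training frames (N x D) into a VQ codebook."""
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed)
    km.fit(feature_frames)
    return km

def quantize(codebook, frames):
    """Replace each frame by the index of its nearest codeword."""
    return codebook.predict(frames)
```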


2010 ◽  
Vol 44-47 ◽  
pp. 1422-1426
Author(s):  
Mei Juan Gao ◽  
Zhi Xin Yang

In this paper, based on the study of two speech recognition algorithms, two designs of a speech recognition system are given to realize an isolated-word speech recognition mobile robot control system based on an ARM9 processor. The speech recognition process includes preprocessing of the speech signal, feature extraction, pattern matching, and post-processing. Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) are the two most common parameters; analysis and comparison show that MFCC offers better noise immunity than LPCC, so MFCC is selected as the characteristic parameter. Dynamic time warping (DTW) and the hidden Markov model (HMM) are both commonly used algorithms. Given the different characteristics of the DTW and HMM recognition algorithms, two different programs were designed for the mobile robot control system, and the recognition performance and speed of the two speech recognition systems were analyzed and compared.
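
A minimal Python sketch of the DTW template-matching step mentioned above is given here for illustration; it is not the ARM9 implementation described in the paper.

```python
# Hypothetical DTW sketch for comparing two MFCC sequences.
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between sequences a (T1 x D) and b (T2 x D)."""
    T1, T2 = len(a), len(b)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Allow diagonal (match), vertical, and horizontal moves
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[T1, T2]
```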


Author(s):  
Noboru Hayasaka

Although many noise-robust techniques have been presented, the improvement under low-SNR conditions is still insufficient. The purpose of this paper is to achieve high recognition accuracy under low-SNR conditions at low computational cost. To this end, this paper proposes a novel noise-robust speech recognition system that makes full use of spectral subtraction (SS), mean variance normalization (MVN), temporal filtering (TF), and multi-condition HMMs (MC-HMMs). First, SS with clean HMMs improved recognition accuracy from 46.61% to 65.71% under the 0 dB SNR condition. Then, SS+MVN+TF with clean HMMs improved accuracy from 65.71% to 80.97% under the same SNR condition. Finally, SS+MVN+TF with MC-HMMs achieved a further improvement from 80.97% to 92.23%.
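
As a hedged illustration of the SS and MVN front-end stages (the TF and MC-HMM stages are omitted), a minimal Python sketch might look like the fragment below; the noise-frame count and flooring factor are assumptions, not the paper's settings.

```python
# Hypothetical sketch of spectral subtraction and mean-variance normalization.
import numpy as np

def spectral_subtraction(mag_spec, noise_frames=10, floor=0.01):
    """Subtract an average noise spectrum estimated from the leading frames.

    mag_spec: magnitude spectrogram of shape (freq_bins, n_frames).
    """
    noise_est = mag_spec[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = mag_spec - noise_est
    # Flooring prevents negative magnitudes after subtraction
    return np.maximum(cleaned, floor * mag_spec)

def mean_variance_normalization(features, eps=1e-8):
    """Normalize each feature dimension to zero mean, unit variance per utterance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```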


Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4091
Author(s):  
Musong Gu ◽  
Kuan-Ching Li ◽  
Zhongwen Li ◽  
Qiyi Han ◽  
Wenjie Fan

The original approach to pattern recognition and classification of crop diseases requires collecting a large amount of data in the field and then sending it over the network to a computer server for recognition and classification. This usually takes a long time, is expensive, and makes timely monitoring of crop diseases difficult, delaying diagnosis and treatment. With the emergence of edge computing, the pattern recognition algorithm can instead be deployed in the farmland environment to monitor crop growth promptly. However, because edge devices have limited resources, the original deep recognition models are difficult to apply. For this reason, this article proposes a recognition model based on a depthwise separable convolutional neural network (DSCNN), which significantly reduces the number of parameters and the amount of computation, making the design well suited for the edge. To show its effectiveness, simulation results are compared with the mainstream convolutional neural network (CNN) models LeNet and Visual Geometry Group Network (VGGNet): while maintaining high recognition accuracy, the recognition time of the proposed model is reduced by 80.9% and 94.4%, respectively. Given its fast recognition speed and high recognition accuracy, the model is suitable for real-time monitoring and recognition of crop diseases by provisioning remote embedded equipment and deploying the proposed model with edge computing.
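
The parameter saving behind a DSCNN comes from factoring a standard convolution into a depthwise step and a pointwise step. A minimal PyTorch sketch of such a block, with illustrative channel counts, is shown below; it is not the authors' network definition.

```python
# Hypothetical depthwise separable convolution block in PyTorch.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes the channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Rough weight count for 64 -> 128 channels: a standard 3x3 conv needs
# 64*128*9 = 73,728 weights, the separable version 64*9 + 64*128 = 8,768,
# roughly an 8x reduction.
```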


2020 ◽  
Vol 13 (4) ◽  
pp. 650-656
Author(s):  
Somayeh Khajehasani ◽  
Louiza Dehyadegari

Background: Today, the demand for automatic intelligent systems has brought increasing attention to modern interactive techniques between humans and machines. These techniques generally come in two types: audio and visual. The need for algorithms that enable machines to recognize human speech is therefore of high importance and is frequently studied by researchers. Objective: Artificial intelligence methods have led to better results in human speech recognition, but the basic problem is the lack of an appropriate strategy for selecting the recognition data from the huge amount of speech information, which practically makes it impossible for the available algorithms to work. Method: In this article, to solve this problem, linear predictive coding coefficient extraction is used to summarize the data related to the pronunciation of English digits. After the database is extracted, it is fed to an Elman neural network to learn the relation between the linear coding coefficients of an audio file and the pronounced digit. Results: The results show that this method performs well compared to other methods. According to the experiments, the network training results (99% recognition accuracy) indicate that the network still outperforms RBF despite many errors. Conclusion: The experiments showed that the Elman memory neural network achieves acceptable performance in recognizing the speech signal compared to the other algorithms. The use of linear predictive coding coefficients together with the Elman neural network led to higher recognition accuracy and improved the speech recognition system.
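
A minimal PyTorch sketch of an Elman-style classifier over LPC frame sequences is shown below for illustration; torch.nn.RNN implements the Elman recurrence, and the LPC order, hidden size, and class names here are assumptions rather than the paper's configuration.

```python
# Hypothetical Elman-style recurrent classifier over LPC coefficient sequences.
import torch
import torch.nn as nn

class ElmanDigitClassifier(nn.Module):
    def __init__(self, lpc_order=12, hidden=64, n_digits=10):
        super().__init__()
        # nn.RNN with tanh is the Elman recurrence:
        # h_t = tanh(W x_t + U h_{t-1} + b)
        self.rnn = nn.RNN(input_size=lpc_order, hidden_size=hidden,
                          nonlinearity="tanh", batch_first=True)
        self.out = nn.Linear(hidden, n_digits)

    def forward(self, lpc_frames):           # (batch, T, lpc_order)
        _, h_last = self.rnn(lpc_frames)     # h_last: (1, batch, hidden)
        return self.out(h_last.squeeze(0))   # digit logits
```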


2020 ◽  
Vol 10 (13) ◽  
pp. 4602
Author(s):  
Moa Lee ◽  
Joon-Hyuk Chang

Speech recognition for intelligent robots suffers from performance degradation due to ego-noise, which is caused by the motors, fans, and mechanical parts inside the robot, especially when the robot moves or shakes its body. To overcome the problems caused by ego-noise, we propose a robust speech recognition algorithm that uses the robot's motor-state information as an auxiliary feature. For this, we use two deep neural networks (DNNs). First, we design latent features using a bottleneck layer, an internal layer with fewer hidden units than the other layers, to represent whether the motor is operating or not. The latent features that best represent the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Second, once the motor-state-dependent latent features have been produced by the first DNN, the second DNN, which performs acoustic modeling, receives the latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rate (WER) is superior to that of a conventional speech recognition system.
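
A minimal PyTorch sketch of the two-network idea, a bottleneck DNN that compresses acoustic plus motor data into a small latent vector which the acoustic-model DNN then consumes alongside the acoustic features, is given below; all layer sizes and names are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical two-DNN sketch: bottleneck motor-state features + acoustic model.
import torch
import torch.nn as nn

class MotorBottleneckNet(nn.Module):
    """First DNN: predicts motor state through a narrow bottleneck layer."""
    def __init__(self, acoustic_dim=40, motor_dim=8, bottleneck_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim + motor_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim), nn.ReLU())     # bottleneck layer
        self.motor_state_head = nn.Linear(bottleneck_dim, 2)  # operating / idle

    def forward(self, acoustic, motor):
        z = self.encoder(torch.cat([acoustic, motor], dim=-1))
        return z, self.motor_state_head(z)

class AcousticModel(nn.Module):
    """Second DNN: acoustic model that also receives the latent motor features."""
    def __init__(self, acoustic_dim=40, bottleneck_dim=16, n_targets=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + bottleneck_dim, 512), nn.ReLU(),
            nn.Linear(512, n_targets))

    def forward(self, acoustic, latent):
        return self.net(torch.cat([acoustic, latent], dim=-1))
```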


2019 ◽  
Vol 8 (3) ◽  
pp. 7827-7831

Kannada is a regional language of India spoken in Karnataka. This paper presents the development of a continuous Kannada speech recognition system using monophone and triphone modelling with HTK. Mel-frequency cepstral coefficients (MFCC) are used for feature extraction; exploiting the cepstral and perceptual frequency scales leads to good recognition accuracy. A hidden Markov model is used as the classifier, and Gaussian mixture splitting is performed to capture the variations of the phones. The paper reports the performance of the continuous Kannada automatic speech recognition (ASR) system with 2, 4, 8, 16, and 32 Gaussian mixtures under both monophone and context-dependent triphone modelling. The experimental results show that context-dependent triphone modelling achieves better recognition accuracy than monophone modelling as the number of Gaussian mixtures is increased.
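
The paper's mixture experiments are run in HTK; as a loose Python analogue of increasing the number of Gaussian mixtures per state, the sketch below trains GMM-HMMs with hmmlearn for several mixture counts and compares per-frame log-likelihoods. It is illustrative only and not equivalent to the HTK monophone/triphone pipeline.

```python
# Hypothetical sketch: GMM-HMMs with increasing numbers of mixtures per state.
import numpy as np
from hmmlearn.hmm import GMMHMM

def fit_with_mixtures(X, lengths, n_states=3, mixture_counts=(2, 4, 8, 16, 32)):
    """Train one GMM-HMM per mixture count and report per-frame log-likelihood."""
    results = {}
    for n_mix in mixture_counts:
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        results[n_mix] = model.score(X, lengths) / X.shape[0]
    return results
```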

