Implementation of Embedded Unspecific Continuous English Speech Recognition Based on HMM

Author(s):  
Xiaoli Lu ◽  
Mohd Asif Shah

Background: Natural-language conversational interfaces play a vital role in human-computer interaction and improve the usability of computers. Speech recognition technology allows a machine to understand human language, and a speech recognition algorithm is used to achieve this function. Methodology: Starting from the fundamental theory of speech signals, this paper establishes an HMM model, applies speech collection and recognition methods, simulates the system in MATLAB, and then ports the integrated recognition system to ARM for debugging and running, realizing embedded HMM-based speech recognition on the ARM platform. Conclusion: The results show that the HMM-based embedded speaker-independent (unspecific) continuous English speech recognition system achieves high recognition accuracy and fast recognition speed.
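
The abstract describes MATLAB and ARM work but gives no code; as a minimal illustrative sketch of the HMM modelling step it mentions, the Python fragment below trains one Gaussian HMM per word on MFCC features and scores an utterance against each model. It assumes librosa and hmmlearn and hypothetical function names, and is not the authors' implementation.

```python
# Hypothetical sketch of per-word HMM training and recognition,
# not the paper's ARM/MATLAB implementation.
import numpy as np
import librosa
from hmmlearn import hmm

def mfcc_features(wav_path, sr=16000, n_mfcc=13):
    """Load a waveform and return frame-wise MFCC features (T x n_mfcc)."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_word_model(training_wavs, n_states=5):
    """Fit a Gaussian HMM on all training utterances of one word."""
    feats = [mfcc_features(p) for p in training_wavs]
    X = np.vstack(feats)
    lengths = [f.shape[0] for f in feats]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)
    return model

def recognize(wav_path, word_models):
    """Return the word whose HMM assigns the highest log-likelihood."""
    X = mfcc_features(wav_path)
    return max(word_models, key=lambda w: word_models[w].score(X))
```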

Author(s):  
Na Wang ◽  
Xiaohong Zhang ◽  
Ashutosh Sharma

Computer-assisted speech recognition systems, which digitize sound to recognize and understand spoken words, are widely used in education, scientific research, industry, and other fields. This article takes a technological perspective on automatic speech recognition in order to realize a spoken English speech recognition system based on MATLAB. The work designs and implements a speech recognition pipeline that collects the speech signals of a spoken English learning system and then filters them. A preprocessing module, built from MATLAB commands, handles the raw speech data, while feature extraction and modelling rest on an HMM model, codebook generation, and template training. The results show that the spoken English speech recognition system studied in this paper achieves a recognition accuracy of 98%, demonstrating that the MATLAB-based system offers high recognition accuracy and fast speed. The work addresses current research issues that need to be tackled in the speech recognition field and provides technical support and an interface for the spoken English learning system.
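
As a rough Python sketch of the preprocessing step the abstract mentions (the paper itself works with MATLAB commands), pre-emphasis, framing, and Hamming windowing could look like the fragment below; the frame sizes and pre-emphasis coefficient are illustrative assumptions.

```python
# Hypothetical preprocessing sketch: pre-emphasis, framing, Hamming window.
import numpy as np

def preprocess(signal, frame_len=400, frame_shift=160, alpha=0.97):
    """Pre-emphasize a 1-D signal and split it into windowed frames.

    Assumes the signal is at least one frame long.
    """
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])
    # Hamming window reduces spectral leakage at the frame edges
    return frames * np.hamming(frame_len)
```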


2014 ◽  
Vol 926-930 ◽  
pp. 1729-1732
Author(s):  
Sha Yang ◽  
Tian Hu ◽  
Yun Lu Zhang

After about 50 years of development, speech recognition technology has become able to support large-vocabulary, speaker-independent continuous speech recognition systems. Taking the features of Chinese pronunciation into account, we study small-vocabulary, speaker-independent Chinese speech recognition based on the continuous Hidden Markov Model (CHMM) approach. Comparing the VQ/DTW, VQ/DHMM, CHMM state-1, and CHMM state-2 recognition algorithms on the same datasets, our experimental results show that: (1) the CHMM state-2 branch method performs well in reducing recognition time; and (2) recognition accuracy is also improved.
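
For readers unfamiliar with the VQ front end used by the VQ/DTW and VQ/DHMM baselines, a minimal Python sketch of codebook generation with k-means is shown below; it is an illustration, not the authors' implementation, and the codebook size is an assumed value.

```python
# Hypothetical sketch of the VQ codebook step: a k-means codebook maps
# continuous feature frames to discrete symbols for a discrete HMM.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(feature_frames, codebook_size=64, seed=0):
    """Cluster training frames (N x D) into a VQ codebook."""
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed)
    km.fit(feature_frames)
    return km

def quantize(codebook, frames):
    """Replace each frame by the index of its nearest codeword."""
    return codebook.predict(frames)
```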


2010 ◽  
Vol 44-47 ◽  
pp. 1422-1426
Author(s):  
Mei Juan Gao ◽  
Zhi Xin Yang

In this paper, based on the study of two speech recognition algorithms, two designs of a speech recognition system are given to realize an isolated-word speech recognition mobile robot control system based on an ARM9 processor. The speech recognition process includes preprocessing of the speech signal, feature extraction, pattern matching, and post-processing. Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) are the two most common parameters; analysis and comparison show that MFCC offers better noise immunity than LPCC, so MFCC is selected as the characteristic parameter. Dynamic time warping (DTW) and the hidden Markov model (HMM) are both commonly used algorithms. Given the different characteristics of the DTW and HMM recognition algorithms, two different programs were designed for the mobile robot control system, and the recognition performance and speed of the two speech recognition systems were analyzed and compared.
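
A minimal Python sketch of the DTW template-matching step mentioned above is given here for illustration; it is not the ARM9 implementation described in the paper.

```python
# Hypothetical DTW sketch for comparing two MFCC sequences.
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between sequences a (T1 x D) and b (T2 x D)."""
    T1, T2 = len(a), len(b)
    cost = np.full((T1 + 1, T2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, T1 + 1):
        for j in range(1, T2 + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Allow diagonal (match), vertical, and horizontal moves
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[T1, T2]
```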


Author(s):  
Noboru Hayasaka

Although many noise-robust techniques have been presented, the improvement under low-SNR conditions is still insufficient. The purpose of this paper is to achieve high recognition accuracy under low-SNR conditions at low computational cost. To this end, this paper proposes a novel noise-robust speech recognition system that makes full use of spectral subtraction (SS), mean variance normalization (MVN), temporal filtering (TF), and multi-condition HMMs (MC-HMMs). First, SS with clean HMMs improved recognition accuracy from 46.61% to 65.71% under the 0 dB SNR condition. Then, SS+MVN+TF with clean HMMs improved accuracy from 65.71% to 80.97% under the same SNR condition. Finally, SS+MVN+TF with MC-HMMs achieved a further improvement from 80.97% to 92.23%.
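
As a hedged illustration of the SS and MVN front-end stages (the TF and MC-HMM stages are omitted), a minimal Python sketch might look like the fragment below; the noise-frame count and flooring factor are assumptions, not the paper's settings.

```python
# Hypothetical sketch of spectral subtraction and mean-variance normalization.
import numpy as np

def spectral_subtraction(mag_spec, noise_frames=10, floor=0.01):
    """Subtract an average noise spectrum estimated from the leading frames.

    mag_spec: magnitude spectrogram of shape (freq_bins, n_frames).
    """
    noise_est = mag_spec[:, :noise_frames].mean(axis=1, keepdims=True)
    cleaned = mag_spec - noise_est
    # Flooring prevents negative magnitudes after subtraction
    return np.maximum(cleaned, floor * mag_spec)

def mean_variance_normalization(features, eps=1e-8):
    """Normalize each feature dimension to zero mean, unit variance per utterance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / (sigma + eps)
```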


Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4091
Author(s):  
Musong Gu ◽  
Kuan-Ching Li ◽  
Zhongwen Li ◽  
Qiyi Han ◽  
Wenjie Fan

The original approach to pattern recognition and classification of crop diseases requires collecting a large amount of data in the field and then sending it over the network to a computer server for recognition and classification. This usually takes a long time, is expensive, and makes timely monitoring of crop diseases difficult, delaying diagnosis and treatment. With the emergence of edge computing, the pattern recognition algorithm can instead be deployed in the farmland environment to monitor crop growth promptly. However, because edge devices have limited resources, the original deep recognition models are difficult to apply. For this reason, this article proposes a recognition model based on a depthwise separable convolutional neural network (DSCNN), which significantly reduces the number of parameters and the amount of computation, making the design well suited for the edge. To show its effectiveness, simulation results are compared with the mainstream convolutional neural network (CNN) models LeNet and Visual Geometry Group Network (VGGNet): while maintaining high recognition accuracy, the recognition time of the proposed model is reduced by 80.9% and 94.4%, respectively. Given its fast recognition speed and high recognition accuracy, the model is suitable for real-time monitoring and recognition of crop diseases by provisioning remote embedded equipment and deploying the proposed model with edge computing.
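
The parameter saving behind a DSCNN comes from factoring a standard convolution into a depthwise step and a pointwise step. A minimal PyTorch sketch of such a block, with illustrative channel counts, is shown below; it is not the authors' network definition.

```python
# Hypothetical depthwise separable convolution block in PyTorch.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes the channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Rough weight count for 64 -> 128 channels: a standard 3x3 conv needs
# 64*128*9 = 73,728 weights, the separable version 64*9 + 64*128 = 8,768,
# roughly an 8x reduction.
```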


2020 ◽  
Vol 13 (4) ◽  
pp. 650-656
Author(s):  
Somayeh Khajehasani ◽  
Louiza Dehyadegari

Background: Today, the demand for automatic intelligent systems has brought increasing attention to modern interactive techniques between humans and machines. These techniques generally come in two types: audio and visual. The need for algorithms that enable machines to recognize human speech is therefore of high importance and is frequently studied by researchers. Objective: Artificial intelligence methods have led to better results in human speech recognition, but the basic problem is the lack of an appropriate strategy for selecting the recognition data from the huge amount of speech information, which practically makes it impossible for the available algorithms to work. Method: In this article, to solve this problem, linear predictive coding coefficient extraction is used to summarize the data related to the pronunciation of English digits. After the database is extracted, it is fed to an Elman neural network to learn the relation between the linear coding coefficients of an audio file and the pronounced digit. Results: The results show that this method performs well compared to other methods. According to the experiments, the network training results (99% recognition accuracy) indicate that the network still outperforms RBF despite many errors. Conclusion: The experiments showed that the Elman memory neural network achieves acceptable performance in recognizing the speech signal compared to the other algorithms. The use of linear predictive coding coefficients together with the Elman neural network led to higher recognition accuracy and improved the speech recognition system.
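
A minimal PyTorch sketch of an Elman-style classifier over LPC frame sequences is shown below for illustration; torch.nn.RNN implements the Elman recurrence, and the LPC order, hidden size, and class names here are assumptions rather than the paper's configuration.

```python
# Hypothetical Elman-style recurrent classifier over LPC coefficient sequences.
import torch
import torch.nn as nn

class ElmanDigitClassifier(nn.Module):
    def __init__(self, lpc_order=12, hidden=64, n_digits=10):
        super().__init__()
        # nn.RNN with tanh is the Elman recurrence:
        # h_t = tanh(W x_t + U h_{t-1} + b)
        self.rnn = nn.RNN(input_size=lpc_order, hidden_size=hidden,
                          nonlinearity="tanh", batch_first=True)
        self.out = nn.Linear(hidden, n_digits)

    def forward(self, lpc_frames):           # (batch, T, lpc_order)
        _, h_last = self.rnn(lpc_frames)     # h_last: (1, batch, hidden)
        return self.out(h_last.squeeze(0))   # digit logits
```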


2020 ◽  
Vol 10 (13) ◽  
pp. 4602
Author(s):  
Moa Lee ◽  
Joon-Hyuk Chang

Speech recognition for intelligent robots suffers from performance degradation due to ego-noise, which is caused by the motors, fans, and mechanical parts inside the robot, especially when the robot moves or shakes its body. To overcome the problems caused by ego-noise, we propose a robust speech recognition algorithm that uses the robot's motor-state information as an auxiliary feature. For this, we use two deep neural networks (DNNs). First, we design latent features using a bottleneck layer, an internal layer with fewer hidden units than the other layers, to represent whether the motor is operating or not. The latent features that best represent the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Second, once the motor-state-dependent latent features have been produced by the first DNN, the second DNN, which performs acoustic modeling, receives the latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rate (WER) is superior to that of a conventional speech recognition system.
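
A minimal PyTorch sketch of the two-network idea, a bottleneck DNN that compresses acoustic plus motor data into a small latent vector which the acoustic-model DNN then consumes alongside the acoustic features, is given below; all layer sizes and names are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical two-DNN sketch: bottleneck motor-state features + acoustic model.
import torch
import torch.nn as nn

class MotorBottleneckNet(nn.Module):
    """First DNN: predicts motor state through a narrow bottleneck layer."""
    def __init__(self, acoustic_dim=40, motor_dim=8, bottleneck_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim + motor_dim, 256), nn.ReLU(),
            nn.Linear(256, bottleneck_dim), nn.ReLU())     # bottleneck layer
        self.motor_state_head = nn.Linear(bottleneck_dim, 2)  # operating / idle

    def forward(self, acoustic, motor):
        z = self.encoder(torch.cat([acoustic, motor], dim=-1))
        return z, self.motor_state_head(z)

class AcousticModel(nn.Module):
    """Second DNN: acoustic model that also receives the latent motor features."""
    def __init__(self, acoustic_dim=40, bottleneck_dim=16, n_targets=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acoustic_dim + bottleneck_dim, 512), nn.ReLU(),
            nn.Linear(512, n_targets))

    def forward(self, acoustic, latent):
        return self.net(torch.cat([acoustic, latent], dim=-1))
```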


2019 ◽  
Vol 8 (3) ◽  
pp. 7827-7831

Kannada is a regional language of India spoken in Karnataka. This paper presents the development of a continuous Kannada speech recognition system using monophone and triphone modelling with HTK. Mel-frequency cepstral coefficients (MFCC) are used for feature extraction; exploiting the cepstral and perceptual frequency scales leads to good recognition accuracy. A hidden Markov model is used as the classifier, and Gaussian mixture splitting is performed to capture the variations of the phones. The paper reports the performance of the continuous Kannada automatic speech recognition (ASR) system with 2, 4, 8, 16, and 32 Gaussian mixtures under both monophone and context-dependent triphone modelling. The experimental results show that context-dependent triphone modelling achieves better recognition accuracy than monophone modelling as the number of Gaussian mixtures is increased.
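
The paper's mixture experiments are run in HTK; as a loose Python analogue of increasing the number of Gaussian mixtures per state, the sketch below trains GMM-HMMs with hmmlearn for several mixture counts and compares per-frame log-likelihoods. It is illustrative only and not equivalent to the HTK monophone/triphone pipeline.

```python
# Hypothetical sketch: GMM-HMMs with increasing numbers of mixtures per state.
import numpy as np
from hmmlearn.hmm import GMMHMM

def fit_with_mixtures(X, lengths, n_states=3, mixture_counts=(2, 4, 8, 16, 32)):
    """Train one GMM-HMM per mixture count and report per-frame log-likelihood."""
    results = {}
    for n_mix in mixture_counts:
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        results[n_mix] = model.score(X, lengths) / X.shape[0]
    return results
```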

