Compact Wake-Up Word Speech Recognition on Embedded Platforms

2014 ◽  
Vol 596 ◽  
pp. 402-405 ◽  
Author(s):  
An Hao Xing ◽  
Ta Li ◽  
Jie Lin Pan ◽  
Yong Hong Yan

The wake-up word speech recognition system is a new paradigm in the field of automatic speech recognition (ASR). This paradigm is not yet widely recognized but is useful in many applications, such as mobile phones and smart home systems. In this paper we describe the development of a compact wake-up word recognizer for embedded platforms. To keep the resource cost low, a variety of simplification techniques are used: speech feature observations are compressed to a lower dimension, and a simple distance-based template matching method is used in place of complex Viterbi scoring. We apply a double-scoring method to achieve better performance, and a support vector machine classifier is used in conjunction with it. We achieved a performance improvement, with the false rejection rate reduced from 6.88% to 5.50% and the false acceptance rate reduced from 8.40% to 3.01%.
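A minimal sketch of distance-based template matching of the kind described above, as a lightweight alternative to Viterbi scoring, is shown below. The DTW formulation, the decision threshold, and the feature layout are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch: distance-based template matching for wake-up word detection.
# Assumes features are (frames x dims) matrices; threshold is illustrative.
import numpy as np


def dtw_distance(test: np.ndarray, template: np.ndarray) -> float:
    """Dynamic time warping distance between two (frames x dims) feature matrices."""
    n, m = len(test), len(template)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(test[i - 1] - template[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)   # length-normalised path cost


def is_wake_word(test_feats, templates, threshold=4.0):
    """Accept if the best template distance falls below a tuned threshold."""
    best = min(dtw_distance(test_feats, t) for t in templates)
    return best < threshold
```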

2011 ◽  
Vol 467-469 ◽  
pp. 1905-1910
Author(s):  
Jun Feng Zhao ◽  
Ye Ping Zhu

This paper introduces the characteristics and requirements of speech recognition technology on embedded platforms. It also describes the basic theory and related properties of support vector machines. The advantages and disadvantages of multiclass SVM algorithms are analyzed, providing the algorithmic principles for training and recognition when applying SVMs in embedded speech recognition systems. Finally, we propose a design strategy based on a multiclass SVM decision tree classifier, combined with the features of embedded speech recognition.
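A minimal sketch of a decision-tree multiclass SVM of the general kind discussed above follows: each internal node holds a binary SVM that splits the remaining classes into two groups. The grouping rule (simple halving of the label list) and the SVC parameters are illustrative assumptions, not the authors' design.

```python
# Sketch: multiclass classification via a binary tree of two-class SVMs.
import numpy as np
from sklearn.svm import SVC


class SVMTreeNode:
    def __init__(self, classes):
        self.classes = list(classes)
        self.svm = None
        self.left = self.right = None

    def fit(self, X, y):
        if len(self.classes) == 1:           # leaf: a single class remains
            return self
        half = len(self.classes) // 2
        left_cls, right_cls = self.classes[:half], self.classes[half:]
        mask = np.isin(y, self.classes)
        Xn, yn = X[mask], y[mask]
        target = np.isin(yn, right_cls).astype(int)   # 0 = left group, 1 = right group
        self.svm = SVC(kernel="rbf", gamma="scale").fit(Xn, target)
        self.left = SVMTreeNode(left_cls).fit(X, y)
        self.right = SVMTreeNode(right_cls).fit(X, y)
        return self

    def predict_one(self, x):
        node = self
        while node.svm is not None:
            node = node.right if node.svm.predict(x.reshape(1, -1))[0] else node.left
        return node.classes[0]
```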


To enable fast communication between humans and machines, speech recognition systems are used. A number of such systems have been developed by various researchers, for example for speech recognition, speaker verification, and speaker recognition. The basic stages of a speech recognition system are pre-processing, feature extraction, feature selection, and classification, and numerous works have improved all of these stages to obtain more accurate results. In this paper the main focus is on adding machine learning to the speech recognition system. The paper covers the architecture of ASR, which gives an overview of the basic stages of a speech recognition system, and then focuses on the use of machine learning in ASR. The work done by various researchers using support vector machines and artificial neural networks is covered in a dedicated section, along with a review of work using SVM, ELM, ANN, Naive Bayes, and kNN classifiers. The simulation results show that the best accuracy is achieved with the ELM classifier. The last section of the paper covers the results obtained with the proposed approaches, in which SVM, ANN with the cuckoo search algorithm, and ANN with back-propagation are used as classifiers. Attention is also given to improving the pre-processing and feature extraction stages.
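A minimal sketch of the kind of classifier comparison this review describes is given below, using scikit-learn baselines (SVM, kNN, Naive Bayes, and an MLP as a stand-in for the ANN). ELM and the cuckoo-search-tuned ANN are not available in scikit-learn and are omitted; the feature matrix X is assumed to hold pre-extracted speech features (e.g., MFCC statistics) with labels y.

```python
# Sketch: cross-validated accuracy comparison of common speech classifiers.
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier


def compare_classifiers(X, y):
    models = {
        "SVM": SVC(kernel="rbf", gamma="scale"),
        "kNN": KNeighborsClassifier(n_neighbors=5),
        "NaiveBayes": GaussianNB(),
        "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy = {scores.mean():.3f}")
```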


2021 ◽  
Vol 11 (19) ◽  
pp. 8842
Author(s):  
Aisha Aiman ◽  
Yao Shen ◽  
Malika Bendechache ◽  
Irum Inayat ◽  
Teerath Kumar

The ongoing development of audio datasets for numerous languages has spurred research activities towards designing smart speech recognition systems. A typical speech recognition system can be applied in many emerging applications, such as smartphone dialing, airline reservations, and automatic wheelchairs, among others. Urdu is the national language of Pakistan and is also widely spoken in many other South Asian countries (e.g., India, Afghanistan). Therefore, we present a comprehensive dataset of spoken Urdu digits ranging from 0 to 9. Our dataset has 25,518 sound samples collected from 740 participants. To test the proposed dataset, we apply different existing classification algorithms to it, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), and variants of EfficientNet. These algorithms serve as a baseline. Furthermore, we propose a convolutional neural network (CNN) for audio digit classification. We conduct experiments with these networks, and the results show that the proposed CNN is efficient and outperforms the baseline algorithms in terms of classification accuracy.
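A minimal sketch of a CNN for spoken-digit classification of the general kind the paper proposes is shown below. The input shape (1 x 40 x 100, i.e., 40 MFCCs over roughly 100 frames) and the layer sizes are illustrative assumptions, not the authors' published architecture.

```python
# Sketch: small CNN over MFCC "images" for 10-way spoken-digit classification.
import torch
import torch.nn as nn


class DigitCNN(nn.Module):
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(32 * 4 * 4, n_classes),
        )

    def forward(self, x):               # x: (batch, 1, 40, 100) MFCC features
        return self.classifier(self.features(x))


# Example forward pass on a dummy batch of MFCC features.
logits = DigitCNN()(torch.randn(8, 1, 40, 100))   # -> shape (8, 10)
```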


2014 ◽  
Vol 571-572 ◽  
pp. 205-208
Author(s):  
Guan Yu Li ◽  
Hong Zhi Yu ◽  
Yong Hong Li ◽  
Ning Ma

Speech feature extraction is discussed, and the Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) methods are analyzed. These two types of features are extracted in a Lhasa large-vocabulary continuous speech recognition system, and the recognition results are compared.
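A minimal sketch of MFCC feature extraction with librosa follows (PLP is not built into librosa and is usually computed with toolkits such as HTK or Kaldi, so it is omitted here). The file name and parameter values are illustrative assumptions.

```python
# Sketch: MFCC extraction with a 25 ms window and 10 ms hop at 16 kHz.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)     # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)   # (13, number_of_frames)
```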


Author(s):  
Youllia Indrawaty Nurhasanah ◽  
Irma Amelia Dewi ◽  
Bagus Ade Saputro

Historically, the study of the Qur'an in Indonesia evolved along with the spread of Islam. Learning methods for reading the Qur'an have been developed, ranging from al-Baghdadi, al-Barqi, Qiraati, Iqro', Human, and Tartila to others, which make it easier to learn to read the Qur'an. Currently, speech recognition technology can be used to detect the pronunciation of Iqro volume 3 readings. Speech recognition consists of two general stages: feature extraction and speech matching. Feature extraction derives features from the speech signal, and speech matching compares the test utterance against the training utterances. The method used to recognize Iqro readings extracts speech-signal features using Mel-frequency cepstral coefficients (MFCC) and classifies them using vector quantization (VQ) to obtain the recognition result. The Iqro reading recognition system was tested on a sample of 30 people; 6 utterances were recognized incorrectly, giving the system a success rate of 80%.
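A minimal sketch of MFCC plus vector quantization matching in the style described above follows: one k-means codebook is trained per reference reading, and a test utterance is assigned to the codebook with the lowest average quantization distortion. The codebook size and the data layout are illustrative assumptions, not the authors' setup.

```python
# Sketch: per-class VQ codebooks over MFCC frames, matched by distortion.
import numpy as np
from sklearn.cluster import KMeans


def train_codebooks(train_mfcc_by_class, codebook_size=32):
    """train_mfcc_by_class: {label: (frames x 13) MFCC matrix of training audio}."""
    return {label: KMeans(n_clusters=codebook_size, n_init=10).fit(feats)
            for label, feats in train_mfcc_by_class.items()}


def classify(test_mfcc, codebooks):
    """Return the label whose codebook reconstructs the test frames best."""
    def distortion(km):
        centers = km.cluster_centers_
        d = np.linalg.norm(test_mfcc[:, None, :] - centers[None, :, :], axis=2)
        return d.min(axis=1).mean()
    return min(codebooks, key=lambda label: distortion(codebooks[label]))
```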


2020 ◽  
pp. 1-11
Author(s):  
Qian Hou ◽  
Cuijuan Li ◽  
Min Kang ◽  
Xin Zhao

English feature recognition influences the development of intelligent English-language technology; in particular, speech recognition accuracy remains a problem when recognizing English features. To improve English feature recognition, this study builds a recognition system around an intelligent learning algorithm combined with support vector machines, and uses both linear and nonlinear classifiers for the recognition task. Moreover, spectral subtraction is introduced at the front end of feature extraction: the estimated noise spectral amplitude is subtracted from the spectral amplitude of the noisy signal to obtain the spectral amplitude of the clean signal. Taking advantage of speech's insensitivity to phase, the phase information from before spectral subtraction is reused to reconstruct the signal after spectral subtraction, yielding the denoised speech. In addition, this study uses a nonlinear power function that simulates the hearing characteristics of the human ear to extract features from the denoised speech signal, and combines them with English features for recognition. Finally, the performance of the proposed algorithm is analyzed through comparative experiments. The results show that the proposed algorithm is effective.
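A minimal sketch of the spectral-subtraction front end described above follows: an estimated noise magnitude spectrum is subtracted from the noisy magnitude, and the noisy phase is reused to reconstruct the signal. The noise estimate (the first 10 frames assumed to be noise-only) and the flooring constant are illustrative assumptions.

```python
# Sketch: magnitude-domain spectral subtraction with noisy-phase reconstruction.
import numpy as np
import librosa


def spectral_subtraction(noisy: np.ndarray, n_fft=512, hop=128, noise_frames=10):
    stft = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise estimate
    clean_mag = np.maximum(mag - noise_mag, 0.01 * mag)            # floor residual
    clean_stft = clean_mag * np.exp(1j * phase)                    # keep noisy phase
    return librosa.istft(clean_stft, hop_length=hop)
```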

