Effects of the Dynamic and Energy based Feature Extraction on Hindi Speech Recognition

Author(s):  
Shobha Bhatt ◽  
Amita Dev ◽  
Anurag Jain

Background: Speech recognition is the most effective and suitable way of communication. Extracted features play an important role in speech recognition. Previous research on Hindi speech recognition lacks a detailed comparative analysis of feature extraction methods that use dynamic and energy parameters. Objective: This work presents experiments exploring the effects of integrating dynamic coefficients and energy parameters with different feature extraction techniques on connected-word Hindi speech recognition. A comparative analysis shows how adding dynamic and energy parameters to the basic extracted features affects recognition. Method: A speaker-dependent system was proposed with monophone-based, five-state Hidden Markov Models (HMMs) using the HTK toolkit. A speech data set of connected words in Hindi was created. Feature extraction techniques, namely Linear Predictive Coding Cepstral Coefficients (LPCCs), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients, were applied together with delta, delta2, and energy parameters to evaluate the performance of the proposed methodology for speaker-dependent recognition. Results: Experimental results show that the system achieved its highest word recognition accuracy of 89.97% using PLP coefficients. The PLP coefficients achieved word accuracy 4% higher than the original MFCCs and 16% higher than LPCCs. Adding energy parameters to the original MFCCs increased word accuracy by only 1.5%, while adding the dynamic delta and double-delta coefficients had no significant effect on recognition accuracy. Conclusion: The findings show that PLP coefficients outperformed the other feature extraction techniques and that MFCCs with energy parameters performed better than the original MFCCs. Adding energy parameters improved the recognition score, while adding delta and delta2 coefficients to the basic features did not. These findings could be used to enhance the performance of a speech recognition system by choosing a suitable feature extraction technique and by combining different feature extraction techniques. Further, the investigations can be used to develop language resources for refining speech recognition. The work can be extended to develop a continuous Hindi speech recognition system.
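
The delta, delta2, and energy parameters compared in this abstract can be illustrated with a minimal Python sketch. It assumes the standard regression formula for delta coefficients with a window of two frames and repeated edge frames, and it assumes log frame energy as the energy parameter. The function names and the 12-coefficient static vector are hypothetical; the abstract does not give the authors' exact HTK configuration.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based delta coefficients for a (frames, coeffs) array."""
    features = np.asarray(features, dtype=float)
    num_frames = features.shape[0]
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    out = np.zeros_like(features)
    for n in range(1, window + 1):
        ahead = padded[window + n : window + n + num_frames]
        behind = padded[window - n : window - n + num_frames]
        out += n * (ahead - behind)
    return out / denom

def log_energy(frames, floor=1e-10):
    """Log of the summed squared samples in each frame."""
    frames = np.asarray(frames, dtype=float)
    return np.log(np.maximum(np.sum(frames ** 2, axis=1), floor))

def add_energy_and_dynamics(static, frames, window=2):
    """Append energy, then delta and delta2, to static features."""
    base = np.hstack([static, log_energy(frames)[:, None]])
    d1 = delta(base, window)   # delta (first-order dynamics)
    d2 = delta(d1, window)     # delta2 (delta of the deltas)
    return np.hstack([base, d1, d2])
```

With 12 static coefficients per frame, the sketch yields 13 values after adding energy and 39 after adding delta and delta2, which is the kind of feature vector growth the comparison in the abstract evaluates.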

2020 ◽  
Vol 10 (2) ◽  
pp. 5547-5553
Author(s):  
A. A. Alasadi ◽  
T. H. Aldhayni ◽  
R. R. Deshmukh ◽  
A. H. Alahmadi ◽  
A. S. Alshebami

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF), for the development of an Automated Speech Recognition (ASR) system in Arabic. The Support Vector Machine (SVM) algorithm processed the obtained features. These feature extraction algorithms extract speech or voice characteristics and process the group delay function calculated directly from the voice signal. They were deployed to extract audio features from Arabic speakers. PNCC provided the best recognition results for Arabic speech in comparison with the other methods. Simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.


2019 ◽  
Vol 29 (1) ◽  
pp. 1275-1282
Author(s):  
Shipra J. Arora ◽  
Rishipal Singh

The paper presents a Punjabi corpus in the agriculture domain. The Punjabi language has various dialects, and the present study concentrates on the major ones, i.e. Majhi, Malwai and Doabi. A speech corpus of 125 isolated words is considered. These words are uttered by 100 speakers, i.e. 60 Malwai dialect speakers (30 male and 30 female), 20 Majhi dialect speakers (10 male and 10 female) and 20 Doabi dialect speakers (10 male and 10 female). Tonemes, adhak (geminated) and nasal words are selected from the corpus. Recordings have been processed through two mediums. The paper also elaborates some distinctive features of the corpus. This corpus is of considerable significance for speech recognition systems. Prosodic characteristics such as intonation, rhythm and stress have a crucial impact on a speech recognition system. These characteristics vary from language to language as well as across the dialects of a language. This paper presents a comparative analysis of the prosodic features of isolated words in the Malwai, Majhi and Doabi dialects of the Punjabi language. The analysis is done using the PRAAT tool. Pitch, intensity, formant I and formant II values are extracted for toneme, adhak, nasal (bindi) and nasal (tippi) words. For all kinds of words, there is a significant variation in the pitch (fundamental frequency), intensity, formant I and formant II values of male and female speakers of the Malwai, Majhi and Doabi dialects. A detailed analysis is discussed throughout the paper.


Speech recognition systems are used to enable fast communication between human and machine. A number of such systems have been developed by various researchers, for example for speech recognition, speaker verification and speaker recognition. The basic stages of a speech recognition system are pre-processing, feature extraction, feature selection, and classification. Numerous works have sought to improve all of these stages to obtain more accurate results. This paper focuses on the addition of machine learning to speech recognition systems. It covers the architecture of ASR, which gives an idea of the basic stages of a speech recognition system, and then turns to the use of machine learning in ASR. One section covers the work done by various researchers using Support Vector Machines and artificial neural networks. Along with this, a review is presented of work done using SVM, ELM, ANN, Naive Bayes and kNN classifiers. The simulation results show that the best accuracy is achieved using the ELM classifier. The last section of the paper covers the results obtained with the proposed approaches, in which SVM, ANN with the Cuckoo search algorithm, and ANN with a back-propagation classifier are used. The focus is also on improving the pre-processing and feature extraction processes.
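
The stages named in this abstract (pre-processing, feature extraction, classification) can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions, not the reviewed systems: pre-emphasis stands in for pre-processing, mean log energy and zero-crossing rate stand in for the feature set, and a plain kNN vote stands in for the classifier. All function names and parameter values are hypothetical.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """Pre-processing: boost high frequencies with a first-order filter."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of equal length."""
    n = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

def toy_features(signal, frame_len=200, hop=100):
    """Feature extraction: mean log energy and mean zero-crossing rate."""
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.array([log_e.mean(), zcr.mean()])

def knn_predict(train_x, train_y, query, k=3):
    """Classification: majority vote among the k nearest training vectors."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def tone(freq_hz, rate=8000, num_samples=1600):
    """Synthetic sine tone used as stand-in audio for the example."""
    return np.sin(2 * np.pi * freq_hz * np.arange(num_samples) / rate)

# Two synthetic classes: low-pitched (label 0) and high-pitched (label 1).
train_x = np.array([toy_features(tone(f)) for f in (90, 100, 110, 1900, 2000, 2100)])
train_y = np.array([0, 0, 0, 1, 1, 1])
low_label = knn_predict(train_x, train_y, toy_features(tone(105)))
high_label = knn_predict(train_x, train_y, toy_features(tone(2050)))
```

A real system would replace the toy features with MFCC-style vectors and add a feature selection step, but the flow from raw signal to predicted label stays the same.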

