Effects of the Dynamic and Energy based Feature Extraction on Hindi Speech Recognition

Background: Speech recognition is the most effective and suitable way of communication, and the extracted features play an important role in it. Previous research on Hindi speech recognition lacks a detailed comparative analysis of feature extraction methods that use dynamic and energy parameters. Objective: This work experimentally explores the effects of integrating dynamic coefficients and energy parameters with different feature extraction techniques on connected-word Hindi speech recognition, and presents a comparative analysis of adding these parameters to the basic extracted features. Method: A speaker-dependent system was proposed using monophone-based, five-state Hidden Markov Models (HMMs) built with the HTK toolkit. A speech data set of connected words in Hindi was created. Linear Predictive Coding Cepstral Coefficients (LPCCs), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients were extracted and combined with delta, delta2 (double delta), and energy parameters to evaluate the proposed methodology for speaker-dependent recognition. Results: The system achieved its highest word recognition accuracy, 89.97%, using PLP coefficients. PLP coefficients gave a 4% increase in word accuracy over the original MFCCs and a 16% increase over LPCCs. Adding energy parameters to the original MFCCs increased word accuracy by only 1.5%, while adding the delta and double delta dynamic coefficients had no significant effect on recognition accuracy. Conclusion: PLP coefficients outperformed the other feature extraction techniques. MFCCs with energy parameters performed better than the original MFCCs, whereas adding delta and delta2 coefficients to the basic features did not improve the recognition scores. The findings could be used to enhance the performance of a speech recognition system by choosing a suitable feature extraction technique and by combining different techniques. The investigations can also be used to develop language resources for refining speech recognition. The work can be extended to develop a continuous Hindi speech recognition system.
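The delta, delta2 and energy parameters mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard regression formula for delta coefficients, d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), with the first and last frames replicated at the edges, and a per-frame log energy. The function names, the window size of 2 and the energy floor are illustrative assumptions, not the paper's actual configuration.

```python
import math


def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames.

    frames: list of equal-length lists (one list of coefficients per frame).
    window: number of frames on each side used in the regression (assumed 2).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Replicate the first/last frame when the window runs off the edge.
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def log_energy(samples, floor=1e-10):
    """Log of the summed squared samples of one frame (floored to avoid log(0))."""
    return math.log(max(sum(s * s for s in samples), floor))


def add_dynamic(frames, window=2):
    """Append delta and delta2 (delta of delta) coefficients to each frame."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

For a 13-coefficient static vector, `add_dynamic` would yield 39 values per frame; appending `log_energy` of the raw frame before calling it would add an energy term together with its own delta and delta2.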


A Comparative Study of Feature Extraction Techniques for Speech Recognition System

International Journal of Innovative Research in Science Engineering and Technology ◽ 10.15680/ijirset.2014.0312034 ◽ 2014 ◽ Vol 03 (12) ◽ pp. 18006-18016 ◽ Cited By ~ 14

Author(s): Pratik K. Kurzekar ◽ Ratnadeep R. Deshmukh ◽ Vishal B. Waghmare ◽ Pukhraj P. Shrishrimal

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Comparative Study ◽ Recognition System ◽ Speech Recognition System ◽ Extraction Techniques


Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

Engineering, Technology & Applied Science Research ◽ 10.48084/etasr.3465 ◽ 2020 ◽ Vol 10 (2) ◽ pp. 5547-5553

Author(s): A. A. Alasadi ◽ T. H. Aldhayni ◽ R. R. Deshmukh ◽ A. H. Alahmadi ◽ A. S. Alshebami

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Group Delay ◽ Recognition System ◽ Support Vector ◽ Speech Recognition System ◽ Mel Frequency Cepstral Coefficients ◽ Delay Function ◽ Cepstral Coefficients ◽ Arabic Speech Recognition

This paper studies three feature extraction methods, Mel-Frequency Cepstral Coefficients (MFCC), Power-Normalized Cepstral Coefficients (PNCC), and Modified Group Delay Function (ModGDF) for the development of an Automated Speech Recognition System (ASR) in Arabic. The Support Vector Machine (SVM) algorithm processed the obtained features. These feature extraction algorithms extract speech or voice characteristics and process the group delay functionality calculated straight from the voice signal. These algorithms were deployed to extract audio forms from Arabic speakers. PNCC provided the best recognition results in Arabic speech in comparison with the other methods. Simulation results showed that PNCC and ModGDF were more accurate than MFCC in Arabic speech recognition.


Study of robust feature extraction techniques for speech recognition system

2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE) ◽ 10.1109/ablaze.2015.7154944 ◽ 2015 ◽ Cited By ~ 5

Author(s): Usha Sharma ◽ Sushila Maheshkar ◽ A. N. Mishra

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Recognition System ◽ Speech Recognition System ◽ Extraction Techniques ◽ Robust Feature Extraction


Denoising Speech for MFCC Feature Extraction Using Wavelet Transformation in Speech Recognition System

2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE) ◽ 10.1109/iciteed.2018.8534807 ◽ 2018 ◽ Cited By ~ 7

Author(s): Risanuri Hidayat ◽ Agus Bejo ◽ Sujoko Sumaryono ◽ Anggun Winursito

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Wavelet Transformation ◽ Recognition System ◽ Speech Recognition System


Feature Extraction for a Speech Recognition System in Noisy Environment: A Study

2010 Second International Conference on Computer Engineering and Applications ◽ 10.1109/iccea.2010.76 ◽ 2010 ◽ Cited By ~ 2

Author(s): Urmila Shrawankar ◽ Vilas Thakare

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Recognition System ◽ Speech Recognition System ◽ Noisy Environment


Bottleneck Feature Extraction in Punjabi Adult Speech Recognition System

Innovations in Computer Science and Engineering - Lecture Notes in Networks and Systems ◽ 10.1007/978-981-33-4543-0_53 ◽ 2021 ◽ pp. 493-501

Author(s): Shashi Bala ◽ Virender Kadyan ◽ Vivek Bhardwaj

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Recognition System ◽ Speech Recognition System


Database Creation and Dialect-Wise Comparative Analysis of Prosodic Features for Punjabi Language

Journal of Intelligent Systems ◽ 10.1515/jisys-2019-2511 ◽ 2019 ◽ Vol 29 (1) ◽ pp. 1275-1282

Author(s): Shipra J. Arora ◽ Rishipal Singh

Keyword(s): Comparative Analysis ◽ Speech Recognition ◽ Significant Variation ◽ Recognition System ◽ Speech Recognition System ◽ Prosodic Features ◽ Speech Corpus ◽ Male And Female ◽ Distinctive Features ◽ Language Analysis

The paper presents a Punjabi corpus in the agriculture domain. The Punjabi language has various dialects, and the present study concentrates on the major ones, i.e. Majhi, Malwi and Doabi. A speech corpus of 125 isolated words is considered. These words are uttered by 100 speakers, i.e. 60 Malwi dialect speakers (30 male and 30 female), 20 Majhi dialect speakers (10 male and 10 female) and 20 Doabi dialect speakers (10 male and 10 female). Tonemes, adhak (geminated) and nasal words are selected from the corpus. Recordings have been processed through two mediums. The paper also elaborates some distinctive features of the corpus, which is of considerable significance for speech recognition systems. Prosodic characteristics such as intonation, rhythm and stress have a crucial impact on a speech recognition system. These characteristics vary from language to language as well as across the dialects of a language. The paper gives a comparative analysis of the prosodic features of isolated words in the Malwi, Majhi and Doabi dialects of Punjabi. The analysis is done using the PRAAT tool. Pitch, intensity, formant I and formant II values are extracted for toneme, adhak, nasal (bindi) and nasal (tippi) words. For all kinds of words, there is a significant variation in the pitch (fundamental frequency), intensity, formant I and formant II values of male and female speakers of the Malwi, Majhi and Doabi dialects. A detailed analysis is discussed throughout the paper.
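As a rough illustration of what a pitch (fundamental frequency) measurement involves, the sketch below estimates the pitch of a single frame by picking the lag with the largest autocorrelation inside a plausible pitch range. This is a deliberately simplified stand-in and not PRAAT's algorithm; the function names, the 75-500 Hz search range and the synthetic test tone are all assumptions made for the example.

```python
import math


def sine_frame(freq_hz, sample_rate, num_samples):
    """Synthetic test tone (a stand-in for a voiced speech frame)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def autocorr_pitch(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate pitch in Hz from the strongest autocorrelation peak.

    Only lags corresponding to frequencies between fmin and fmax are searched.
    Returns None when no lag has positive autocorrelation.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:  # strict '>' keeps the shortest lag among ties
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else None


# A 200 Hz tone sampled at 8 kHz has a period of 40 samples.
estimate = autocorr_pitch(sine_frame(200.0, 8000, 400), 8000)
```

Real speech needs windowing, voicing decisions and octave-error handling, which dedicated tools take care of; the sketch only shows the core idea of reading the period off the autocorrelation.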


A low-power, fixed-point, front-end feature extraction for a distributed speech recognition system

IEEE International Conference on Acoustics Speech and Signal Processing ◽ 10.1109/icassp.2002.1005859 ◽ 2002 ◽ Cited By ~ 8

Author(s): Delaney ◽ Jayant ◽ Hans ◽ Simunic ◽ Acquaviva

Keyword(s): Feature Extraction ◽ Fixed Point ◽ Speech Recognition ◽ Low Power ◽ Recognition System ◽ Speech Recognition System ◽ Distributed Speech Recognition ◽ Front End


Hierarchical speech recognition system using MFCC feature extraction and dynamic spiking RSOM

15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) ◽ 10.1109/snpd.2014.6888680 ◽ 2014 ◽ Cited By ~ 1

Author(s): Behi Tarek ◽ Arous Najet ◽ Ellouze Noureddine

Keyword(s): Feature Extraction ◽ Speech Recognition ◽ Recognition System ◽ Speech Recognition System


A Robust Isolated Automatic Speech Recognition System using Machine Learning Techniques

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽ 10.35940/ijitee.j8765.0881019 ◽ 2019 ◽ Vol 8 (10) ◽ pp. 2325-2331

Keyword(s): Machine Learning ◽ Feature Extraction ◽ Speech Recognition ◽ Speaker Recognition ◽ Search Algorithm ◽ Recognition System ◽ Machine Learning Techniques ◽ Support Vector ◽ Speech Recognition System ◽ Work Done

Speech recognition systems are used to enable fast communication between humans and machines. Researchers have developed a number of such systems, for example for speech recognition, speaker verification and speaker recognition. The basic stages of a speech recognition system are pre-processing, feature extraction, feature selection and classification. Numerous works have sought to improve all of these stages to obtain more accurate results. This paper focuses on adding machine learning to speech recognition systems. It covers the architecture of ASR, which gives an idea of the basic stages of a speech recognition system, and then turns to the use of machine learning in ASR. One section covers work done by various researchers using support vector machines and artificial neural networks. A review is also presented of work done using SVM, ELM, ANN, Naive Bayes and kNN classifiers. The simulation results show that the best accuracy is achieved using the ELM classifier. The last section covers the results obtained with the proposed approaches, in which SVM, ANN with the Cuckoo search algorithm, and ANN with back-propagation classifiers are used. The focus is also on improving the pre-processing and feature extraction processes.
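The staged architecture described above (pre-processing, feature extraction, classification) can be sketched end to end in a few lines. The example below is a toy under stated assumptions, not the paper's method: pre-processing is a first-order pre-emphasis filter with an assumed coefficient of 0.97, the features are just log energy and zero-crossing rate, and the classifier is a plain k-nearest-neighbour vote (one of the classifiers the review mentions). All function names and the synthetic tones are hypothetical.

```python
import math
from collections import Counter


def tone(freq_hz, num_samples=400, sample_rate=8000):
    """Synthetic signal used here in place of recorded speech."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(num_samples)]


def pre_emphasis(signal, coeff=0.97):
    """Stage 1: boost high frequencies, y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[i] - coeff * signal[i - 1]
                          for i in range(1, len(signal))]


def extract_features(signal):
    """Stage 2: [log energy, zero-crossing rate]; needs at least two samples."""
    x = pre_emphasis(signal)
    log_e = math.log(max(sum(s * s for s in x), 1e-10))
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return [log_e, crossings / (len(x) - 1)]


def knn_predict(train, query, k=3):
    """Stage 3: majority vote among the k nearest training vectors.

    train: list of (feature_vector, label) pairs.
    """
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: low-frequency versus high-frequency tones.
train = [(extract_features(tone(f)), "low") for f in (200, 250, 300)]
train += [(extract_features(tone(f)), "high") for f in (2000, 2200, 2400)]
prediction = knn_predict(train, extract_features(tone(220)))
```

The two features are left unscaled for brevity; in practice features are usually normalised before a distance-based classifier such as kNN is applied, and a feature selection step would sit between stages 2 and 3.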
