Comparative Analysis of Methods Used to Extract Speech Signal Features

Author(s):  
Ziad A. Alqadi ◽  
Sayel Shareef Rimawi

Extracting features from a speech file is one of the most important stages in building a system that identifies a person by voice. The choice of feature extraction method therefore matters, because it affects the speech recognition system for better or worse. In this paper we analyze the most popular methods of speech signal feature extraction: LPC, K-means clustering, WPT decomposition, and MLBP. These methods are implemented and tested on various speech files. The amplitude and sampling frequency are varied to see how such changes affect the extracted features. Based on the results of the analysis, some recommendations are given.
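To make the amplitude and sampling-frequency experiment concrete, here is a minimal sketch of LPC estimation using the autocorrelation method with the Levinson-Durbin recursion. It is not the authors' implementation: the function names, the synthetic two-tone test signal, the predictor order of 4, and the crude downsampling by dropping every other sample are all assumptions made for illustration.

```python
import math

def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], the biased autocorrelation of the signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(signal, order):
    """Estimate a[1..order] so that s[n] is approximated by sum_k a[k] * s[n-k]."""
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]             # prediction error energy at order 0
    for i in range(1, order + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]

def two_tone(num_samples, sample_rate):
    """Hypothetical stand-in for a speech frame: two sinusoids (200 Hz, 1300 Hz)."""
    return [math.sin(2 * math.pi * 200 * n / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 1300 * n / sample_rate)
            for n in range(num_samples)]

speech = two_tone(400, 8000)
base = lpc(speech, 4)
quieter = lpc([0.5 * s for s in speech], 4)  # amplitude halved
downsampled = lpc(speech[::2], 4)            # effective sampling rate halved

print("original    :", base)
print("half volume :", quieter)
print("half rate   :", downsampled)
```

In this sketch, scaling the amplitude multiplies every autocorrelation value by the same constant, which cancels in the recursion, so the LPC coefficients are unchanged. Changing the sampling rate changes the normalized frequencies of the signal and therefore the coefficients. Whether the other methods in the paper behave the same way is not something this example shows.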

2019 ◽  
Vol 29 (1) ◽  
pp. 1275-1282
Author(s):  
Shipra J. Arora ◽  
Rishipal Singh

The paper presents a Punjabi speech corpus in the agriculture domain. The Punjabi language has various dialects; the present study concentrates on the major ones, i.e. Majhi, Malwi and Doabi. A speech corpus of 125 isolated words is considered. These words are uttered by 100 speakers: 60 Malwi dialect speakers (30 male and 30 female), 20 Majhi dialect speakers (10 male and 10 female) and 20 Doabi dialect speakers (10 male and 10 female). Toneme, adhak (geminated) and nasal words are selected from the corpus. Recordings have been processed through two mediums. The paper also elaborates some distinctive features of the corpus, which is of considerable significance for speech recognition systems. Prosodic characteristics such as intonation, rhythm and stress have a crucial impact on a speech recognition system. These characteristics vary from language to language as well as across the dialects of a language. This paper presents a comparative analysis of the prosodic features of isolated words in the Malwi, Majhi and Doabi dialects of Punjabi. The analysis is done using the PRAAT tool. Pitch, intensity, formant I and formant II values are extracted for toneme, adhak, nasal (bindi) and nasal (tippi) words. For all kinds of words, there is a significant variation in the pitch (fundamental frequency), intensity, formant I and formant II values of male and female speakers of the Malwi, Majhi and Doabi dialects. A detailed analysis is given throughout the paper.
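As an illustration of what a pitch (fundamental frequency) measurement involves, the sketch below picks the autocorrelation peak within a plausible range of pitch periods. It is a much simplified estimate and does not reproduce the pitch tracker in PRAAT. The function name, the 75 to 500 Hz search range, and the synthetic 200 Hz test tone are assumptions chosen for the example.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=75.0, f_max=500.0):
    """Return an F0 estimate in Hz, or None if no positive peak is found."""
    min_lag = int(sample_rate / f_max)                       # shortest period considered
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)  # longest period considered
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation of the frame at this lag.
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None

sample_rate = 8000
# Hypothetical voiced frame: a pure 200 Hz tone, 100 ms long.
voiced = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
f0 = estimate_pitch(voiced, sample_rate)
print("estimated F0 in Hz:", f0)
```

Real speech needs windowing, voicing decisions, and protection against octave errors, none of which this sketch attempts.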


Author(s):  
Keshav Sinha ◽  
Rasha Subhi Hameed ◽  
Partha Paul ◽  
Karan Pratap Singh

In recent years, advances in voice-based authentication have led to numerous forensic voice authentication technologies. For verification, the speech reference model is collected from various open-source clusters. This chapter focuses on an automatic speech recognition (ASR) technique that stores, retrieves, and processes the data in a scalable manner. Various conventional techniques exist for speech recognition, such as BWT, SVD, and MFCC, but their efficiency degrades for automatic speech recognition. To overcome this problem, the authors propose a speech recognition system using E-SVD, D3-MFCC, and dynamic time warping (DTW). D3-MFCC captures the important qualities of the speech signal while discarding unimportant and distracting features.
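The dynamic time warping step mentioned above can be sketched as follows. This is the textbook DTW recurrence on one-dimensional sequences with an absolute-difference local cost, not the authors' system. In a recognizer the sequence elements would be feature vectors (for example MFCC frames) with a vector distance, and the function name and sample sequences here are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1.0, 2.0, 3.0]
slower_utterance = [1.0, 2.0, 2.0, 3.0]  # same shape, stretched in time
print(dtw_distance(template, slower_utterance))
print(dtw_distance([0.0, 0.0], [1.0, 1.0]))
```

Because the alignment may repeat elements, a time-stretched copy of a template matches it at zero cost, which is why DTW suits utterances spoken at different speeds.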


Author(s):  
Shobha Bhatt ◽  
Amita Dev ◽  
Anurag Jain

Background: Speech recognition is the most effective and suitable way of communication. Extracted features play an important role in speech recognition. Previous research on Hindi speech recognition lacks a detailed comparative analysis of feature extraction methods using dynamic and energy parameters. Objective: This work presents experiments that explore the effects of integrating dynamic coefficients and energy parameters with different feature extraction techniques on connected-word Hindi speech recognition. A comparative analysis shows the effect of adding dynamic and energy parameters to the basic extracted features. Method: A speaker-dependent system was proposed with a monophone-based five-state Hidden Markov Model (HMM) using the HTK toolkit. A speech data set of connected words in Hindi was created. Feature extraction techniques such as Linear Predictive Coding Cepstral Coefficients (LPCCs), Mel Frequency Cepstral Coefficients (MFCCs), and Perceptual Linear Prediction (PLP) coefficients were applied together with delta, delta2, and energy parameters to evaluate the performance of the proposed methodology for speaker-dependent recognition. Results: Experimental results show that the system achieved the highest word recognition accuracy, 89.97%, using PLP coefficients. The PLP coefficients achieved a 4% increase in word accuracy over the original MFCCs and a 16% increase over LPCCs. Adding energy parameters to the original MFCCs increased word accuracy by only 1.5%, while adding the dynamic coefficients delta and double delta had no significant effect on recognition accuracy. Conclusion: The findings reveal that PLP coefficients outperformed the other features. MFCCs with integrated energy parameters performed better than the original MFCCs: adding energy parameters improved the recognition score, while adding delta and delta2 coefficients to the basic features did not. The findings could be used to enhance the performance of a speech recognition system by choosing a suitable feature extraction technique and by combining different feature extraction techniques. Further, the investigations can be used to develop language resources for refining speech recognition. The work can be extended to develop a continuous Hindi speech recognition system.
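For readers unfamiliar with the dynamic and energy parameters discussed here, the sketch below shows one common way to compute them: the regression formula for delta coefficients as documented for HTK, applied twice to obtain delta2, plus a per-frame log energy. It is an illustration rather than the authors' configuration. The window size of 2, the replication of edge frames, the energy floor, and all function names are assumptions.

```python
import math

def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors."""
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]  # replicate the last frame at the end
                behind = frames[max(t - n, 0)][k]    # replicate the first frame at the start
                num += n * (ahead - behind)
            row.append(num / denom)
        deltas.append(row)
    return deltas

def add_dynamic_features(frames, window=2):
    """Append delta and delta2 (delta of delta) to each static feature vector."""
    d1 = delta(frames, window)
    d2 = delta(d1, window)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]

def log_energy(samples, floor=1e-10):
    """Log of the frame energy, floored to avoid log(0) on silent frames."""
    return math.log(max(sum(s * s for s in samples), floor))

# Hypothetical static features: one coefficient that rises by 1 per frame.
ramp = [[float(t)] for t in range(6)]
print(delta(ramp))
print(add_dynamic_features(ramp)[2])
print(log_energy([1.0, 1.0]))
```

For a feature that changes linearly over time, the delta away from the edges equals the slope, so delta coefficients act as a smoothed estimate of how fast each feature is changing.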


2016 ◽  
Vol 4 (2) ◽  
pp. 152-155
Author(s):  
Moirangthem Tiken Singh ◽  

This paper presents a report on an Automatic Speech Recognition (ASR) system for different Indian languages under different accents. It is a comparative study of the performance of the developed system, which uses a Hidden Markov Model (HMM) as the classifier and Mel-Frequency Cepstral Coefficients (MFCC) as speech features.


Author(s):  
Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting competent parties, designing and testing the system, and analyzing the test results. The voice recognition system was designed using Deep Learning Convolutional Neural Networks (DL-CNN). The CNN model was then trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to it. The system has a 100% success rate in room conditions with a background noise intensity of 24 dB (silent), 67.67% with a background noise intensity of 42 dB, and only 51.67% with a background noise intensity of 52 dB (noisy). The success rate of the system is thus strongly influenced by the intensity of background noise in the room. To obtain optimal results, the system is therefore better suited to rooms with low-intensity background noise.

