Automatic gender recognition using linear prediction coefficients and artificial neural network on speech signal

Author(s):  
M. A. Yusnita ◽  
A. M. Hafiz ◽  
M. Nor Fadzilah ◽  
Aida Zulia Zulhanip ◽  
Mohaiyedin Idris
Author(s):  
M. Yasin Pir ◽  
Mohamad Idris Wani

Speech is a significant means of communication, and the variation in pitch of a speech signal is commonly used to classify a speaker's gender as male or female. In this study, we propose a system for gender classification from speech that combines a hybrid model of the 1-D Stationary Wavelet Transform (SWT) and an artificial neural network. Features such as power spectral density, frequency, and amplitude of human voice samples were used to classify gender. We use the Daubechies wavelet at different levels for decomposition and reconstruction of the signal. The reconstructed signal is fed to a feed-forward artificial neural network for gender classification. This study uses 400 voice samples of both genders from the Michigan University database, sampled at 16000 Hz. The experimental results show that the proposed method achieves more than 94% classification accuracy on both the training and testing datasets.
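To illustrate the decomposition/reconstruction step described above, here is a minimal NumPy sketch of a one-level stationary (undecimated) wavelet transform. The abstract uses Daubechies wavelets; for brevity this sketch uses the shortest member of that family, db1 (Haar), with circular boundary handling, so the function names and filter choice are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def swt_haar(x):
    """One level of the stationary (undecimated) Haar wavelet transform.
    Unlike the decimated DWT, both outputs keep the same length as x."""
    shifted = np.roll(x, -1)               # circular extension at the boundary
    approx = (x + shifted) / np.sqrt(2)    # low-pass (scaling) coefficients
    detail = (x - shifted) / np.sqrt(2)    # high-pass (wavelet) coefficients
    return approx, detail

def iswt_haar(approx, detail):
    """Inverse transform: average the two redundant reconstructions."""
    even = (approx + detail) / np.sqrt(2)             # recovers x[n]
    odd = np.roll((approx - detail) / np.sqrt(2), 1)  # recovers x[n+1], shifted back
    return (even + odd) / 2

# A reconstructed (or denoised) signal like this would then be used for
# feature extraction before being fed to the feed-forward network.
signal = np.random.default_rng(0).normal(size=256)
cA, cD = swt_haar(signal)
reconstructed = iswt_haar(cA, cD)
```

Because the stationary transform is redundant, the reconstruction is exact up to floating-point error, which is what makes it attractive as a pre-processing step before feature extraction.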


2016 ◽  
pp. 196-212
Author(s):  
Mousmita Sarma ◽  
Kandarpa Kumar Sarma

Acoustic modeling of the sound unit is a crucial component of an Automatic Speech Recognition (ASR) system. It is the process of establishing statistical representations of the feature vector sequences for a particular sound unit, so that a classifier covering all the sound units used in the ASR system can be designed. Current ASR systems use the Hidden Markov Model (HMM) to deal with temporal variability and the Gaussian Mixture Model (GMM) for acoustic modeling. Recently, machine learning paradigms have been explored for application in the speech recognition domain; in this regard, the Multi-Layer Perceptron (MLP), the Recurrent Neural Network (RNN), etc. are extensively used. Artificial Neural Networks (ANNs) are trained by back-propagating error derivatives and therefore have the potential to learn much better models of nonlinear data. Recently, Deep Neural Networks (DNNs) with many hidden layers have gained favor among researchers and have been accepted as suitable for speech signal modeling. This chapter describes various techniques and works on ANN-based acoustic modeling.
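To make the back-propagation point concrete, the following is a minimal NumPy sketch of a single-hidden-layer MLP trained by back-propagating error derivatives. The two Gaussian clusters stand in for feature vectors of two sound-unit classes; the data, layer sizes, and learning rate are all illustrative assumptions, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian clusters standing in for two sound-unit classes.
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50).reshape(-1, 1)

# One hidden layer of 8 tanh units, one sigmoid output unit.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(500):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: propagate error derivatives layer by layer.
    d2 = (p - y) / len(X)            # dLoss/dlogits for sigmoid + cross-entropy
    dW2 = h.T @ d2; db2 = d2.sum(0)
    d1 = (d2 @ W2.T) * (1 - h ** 2)  # chain rule through the tanh layer
    dW1 = X.T @ d1; db1 = d1.sum(0)
    # Gradient-descent step
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 1.0 * grad

acc = ((p > 0.5) == y).mean()
```

The same error-derivative recursion extends layer by layer to the many hidden layers of a DNN, which is why deep acoustic models are trained with essentially this procedure at much larger scale.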




2021 ◽  
Author(s):  
Mourad Talbi ◽  
Riadh Baazaoui ◽  
Med Salim Bouhlel

In this chapter, we detail a new speech enhancement technique based on the Lifting Wavelet Transform (LWT) and an Artificial Neural Network (ANN). This technique also uses the MMSE estimate of the spectral amplitude. The first step applies the LWT to the noisy speech signal in order to obtain two noisy detail coefficients, cD1 and cD2, and one approximation coefficient, cA2. After that, cD1 and cD2 are denoised by soft thresholding; this requires suitable thresholds thr_j, 1 ≤ j ≤ 2, which are determined by an ANN. The soft thresholding of cD1 and cD2 yields two denoised coefficients, cDd1 and cDd2. Then the denoising technique based on the MMSE estimate of the spectral amplitude is applied to the noisy approximation cA2 in order to obtain a denoised coefficient, cAd2. Finally, the enhanced speech signal is obtained by applying the inverse LWT (LWT⁻¹) to cDd1, cDd2, and cAd2. The performance of the proposed speech enhancement technique is justified by computing the Signal-to-Noise Ratio (SNR), the Segmental SNR (SSNR), and the Perceptual Evaluation of Speech Quality (PESQ).
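The soft-thresholding rule and the SNR figure of merit used above can be written down directly. This is a generic NumPy sketch: the threshold values themselves would come from the ANN described in the chapter, which is not reproduced here, so the threshold passed in is an assumed placeholder.

```python
import numpy as np

def soft_threshold(coeffs, thr):
    """Soft thresholding: zero every coefficient with |c| < thr and
    shrink the survivors toward zero by thr."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thr, 0.0)

def snr_db(clean, estimate):
    """Output SNR in dB between a clean reference and an enhanced signal."""
    noise = clean - estimate
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

# Example: denoise a detail band cD1 with a threshold thr_1 (placeholder
# value standing in for the ANN-predicted threshold).
cD1 = np.array([-2.0, 0.1, 3.0, -0.4, 1.5])
cDd1 = soft_threshold(cD1, 1.0)
```

Soft (rather than hard) thresholding is the usual choice here because the shrinkage keeps the denoised coefficients continuous in the input, avoiding the audible artifacts that abrupt zeroing can introduce.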


