Emotion Recognition Using Rational Dilation Wavelet Transform For Speech Signal

Author(s):  
Ravi ◽  
Sachin Taran

2004 ◽
Vol 14 (2) ◽  
pp. 150-155 ◽  
Author(s):  
Hyoun-Joo Go ◽  
Dae-Jong Lee ◽  
Jang-Hwan Park ◽  
Myung-Geun Chun

Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Background: In this paper, we propose a secure image watermarking technique applicable to grayscale and color images. It consists of applying the Singular Value Decomposition (SVD) in the Lifting Wavelet Transform (LWT) domain to embed a speech signal (the watermark) into the host image. Methods: The technique also uses a signature in the embedding and extraction steps. Its performance is assessed by computing the PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity), SNR (Signal to Noise Ratio), SegSNR (Segmental SNR) and PESQ (Perceptual Evaluation of Speech Quality). Results: The PSNR and SSIM evaluate the perceptual quality of the watermarked image relative to the original image. The SNR, SegSNR and PESQ evaluate the perceptual quality of the reconstructed (extracted) speech signal relative to the original speech signal. Conclusion: The PSNR, SSIM, SNR, SegSNR and PESQ values obtained demonstrate the performance of the proposed technique.
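The core ideas above, SVD-based embedding of singular values and PSNR as a fidelity measure, can be sketched briefly. This is a minimal illustration with NumPy, not the paper's method: it perturbs singular values on the raw image rather than in the Lifting Wavelet Transform domain, omits the signature step, and the embedding strength `alpha` is an assumed parameter.

```python
import numpy as np

def embed_svd(host, watermark, alpha=0.05):
    # SVD-based embedding: perturb the host's singular values with the
    # watermark's singular values (the paper does this on LWT subbands;
    # here we operate on the raw image for brevity).
    Uh, Sh, Vh = np.linalg.svd(host, full_matrices=False)
    Sw = np.linalg.svd(watermark, compute_uv=False)
    return Uh @ np.diag(Sh + alpha * Sw) @ Vh

def psnr(original, distorted, peak=255.0):
    # Peak Signal to Noise Ratio in dB between two images.
    mse = np.mean((original.astype(float) - distorted.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
host = rng.integers(0, 256, (64, 64)).astype(float)
mark = rng.integers(0, 256, (64, 64)).astype(float)
marked = embed_svd(host, mark)
quality = psnr(host, marked)  # higher dB = less visible distortion
```

A small `alpha` keeps the PSNR high (the watermark stays imperceptible) at the cost of a weaker embedded signal.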


2013 ◽  
Vol 25 (12) ◽  
pp. 3294-3317 ◽  
Author(s):  
Lijiang Chen ◽  
Xia Mao ◽  
Pengfei Wei ◽  
Angelo Compare

This study proposes two classes of speech emotion features extracted from the electroglottography (EGG) and speech signals. The power-law distribution coefficients (PLDC) of voiced-segment duration, pitch-rise duration, and pitch-fall duration are obtained to capture information about vocal-fold excitation. The real discrete cosine transform coefficients of the normalized spectra of the EGG and speech signals are calculated to capture information about vocal-tract modulation. Two experiments are carried out: one compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search; the other performs comparative emotion recognition with a support vector machine. The results show that the proposed features outperform commonly used ones in speaker-independent, content-independent speech emotion recognition.
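The second feature class, real DCT coefficients of a normalized spectrum, can be sketched as follows. This is a simplified illustration, not the study's exact pipeline: the DCT-II is computed from its definition, the frame is a synthetic tone, and `n_coeffs` is an assumed truncation length.

```python
import numpy as np

def dct_ii(x):
    # Real DCT-II computed directly from its definition.
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

def spectral_dct_features(frame, n_coeffs=12):
    # Magnitude spectrum, normalized to unit sum, then DCT-II; the
    # low-order coefficients summarize the spectral envelope, which is
    # the vocal-tract-modulation information the study describes.
    spectrum = np.abs(np.fft.rfft(frame))
    spectrum /= spectrum.sum() + 1e-12
    return dct_ii(spectrum)[:n_coeffs]

fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 220 * t) * np.hamming(1024)  # toy voiced frame
feats = spectral_dct_features(frame)
```

Because the spectrum is normalized to unit sum, the zeroth coefficient is fixed near 1, and the remaining coefficients describe spectral shape independent of overall energy.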


Author(s):  
M. Yasin Pir ◽  
Mohamad Idris Wani

Speech is a significant means of communication, and the variation in pitch of a speech signal is commonly used to classify a speaker's gender as male or female. In this study, we propose a system for gender classification from speech that combines a hybrid model of the 1-D Stationary Wavelet Transform (SWT) with an artificial neural network. Features such as power spectral density, frequency, and amplitude of human voice samples are used to classify gender. We use the Daubechies wavelet at different levels for decomposition and reconstruction of the signal. The reconstructed signal is fed to a feed-forward artificial neural network for gender classification. The study uses 400 voice samples of both genders from the Michigan University database, sampled at 16000 Hz. The experimental results show that the proposed method achieves more than 94% classification accuracy on both the training and testing datasets.
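The front end of such a pipeline, an undecimated wavelet decomposition followed by simple spectral features, can be sketched as below. This is a hedged simplification: the study uses Daubechies wavelets at several levels and a trained neural network, while here a single-level Haar SWT and a hand-rolled feature vector stand in, and `frame_features` is a hypothetical helper, not the paper's feature extractor.

```python
import numpy as np

def haar_swt_level1(x):
    # One level of the undecimated (stationary) Haar wavelet transform:
    # filter without downsampling, so outputs keep the input length.
    lo = (x + np.roll(x, -1)) / np.sqrt(2)   # approximation band
    hi = (x - np.roll(x, -1)) / np.sqrt(2)   # detail band
    return lo, hi

def frame_features(x, fs):
    # Toy feature vector in the spirit of the study: mean power
    # spectral density, dominant frequency, and peak amplitude.
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return np.array([spec.mean(), freqs[spec.argmax()], np.abs(x).max()])

fs = 16000
t = np.arange(2048) / fs
voice = np.sin(2 * np.pi * 180 * t)  # toy tone in the male pitch range
lo, hi = haar_swt_level1(voice)
feats = frame_features(lo, fs)  # would be fed to the classifier
```

In the full system, feature vectors like `feats` from many frames would train the feed-forward network to separate the two pitch-driven classes.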


2007 ◽  
Vol 2007 ◽  
pp. 1-5 ◽  
Author(s):  
Aïcha Bouzid ◽  
Noureddine Ellouze

This paper describes a multiscale product method (MPM) for measuring the open quotient in voiced speech. The method is based on determining the glottal closing and opening instants. The proposed approach multiplies the wavelet transforms of the speech signal at different scales in order to enhance edge detection and parameter estimation. We show that the method is effective and robust for detecting speech singularities. Accurate estimation of glottal closing instants (GCIs) and glottal opening instants (GOIs) is important in a wide range of speech processing tasks. In this paper, these estimates are used to measure the local open quotient (Oq), the ratio of the open time to the pitch period. The multiscale product operates automatically on the speech signal; a reference electroglottogram (EGG) signal is used for performance evaluation. The rate of correct GCI detection is 95.5% and that of GOI detection is 76%. The pitch-period relative error is 2.6% and the open-phase relative error is 5.6%. The relative error of the open quotient reaches 3% over the whole Keele database.
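The multiscale product idea, multiplying wavelet details across scales so that edges which persist over scales are reinforced while noise is damped, can be sketched as follows. This is a minimal illustration under assumed choices (Haar-like difference filters at dyadic scales, a synthetic sawtooth-like signal standing in for voiced speech), not the paper's wavelet or detection pipeline.

```python
import numpy as np

def detail(x, scale):
    # Haar-like detail at a dyadic scale: difference of adjacent
    # moving averages of length `scale` (an undecimated wavelet band).
    kernel = np.concatenate([np.ones(scale), -np.ones(scale)]) / scale
    return np.convolve(x, kernel, mode="same")

def multiscale_product(x, scales=(1, 2, 4)):
    # Product of wavelet details across scales: true singularities
    # (e.g. glottal closures) persist over scales, so the product
    # sharpens them while damping scale-varying noise.
    p = np.ones_like(x)
    for s in scales:
        p *= detail(x, s)
    return p

# Toy signal: repeated ramps with abrupt drops every 100 samples,
# mimicking the sharp discontinuity at glottal closure.
x = np.tile(np.linspace(0, 1, 100), 4)
p = multiscale_product(x)
gci = int(np.argmax(np.abs(p)))  # strongest detected singularity
```

The largest peaks of `|p|` line up with the abrupt drops, which is the property the MPM exploits to locate GCIs before deriving GOIs and the open quotient.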


2006 ◽  
Author(s):  
Sheng Zhang ◽  
P. C. Ching ◽  
Fanrang Kong
