Classifying Recurrent Dynamics on Emotional Speech Signals

Author(s):  
Sudhangshu Sarkar ◽  
Anilesh Dey


2011 ◽  
Vol 121-126 ◽  
pp. 815-819 ◽  
Author(s):  
Yu Qiang Qin ◽  
Xue Ying Zhang

Ensemble empirical mode decomposition (EEMD) is a newly developed method aimed at eliminating the mode mixing present in the original empirical mode decomposition (EMD). To evaluate the performance of this new method, this paper investigates the effect of two parameters pertinent to EEMD: the emotional envelope and the number of emotional ensemble trials. The proposed technique is applied to four kinds of emotional speech signals (angry, happy, sad, and neutral), and the number of ensemble trials is computed for each emotion. We obtain an emotional envelope by transforming the IMFs of the emotional speech signals, and derive a new emotion recognition method based on the different emotional envelopes and ensemble trial counts.
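For readers who want to experiment with the two EEMD parameters discussed above (the amplitude of the added noise and the number of ensemble trials), here is a minimal sketch using the PyEMD package; the sampling rate, noise width, and trial count are illustrative assumptions, not the authors' settings.

```python
# Minimal EEMD sketch using PyEMD (pip install EMD-signal).
# `trials` is the number of ensemble trials and `noise_width` the amplitude
# of the added white noise; both values below are illustrative assumptions.
import numpy as np
from PyEMD import EEMD

fs = 16000                                  # assumed sampling rate
t = np.arange(0, 1.0, 1.0 / fs)
x = np.sin(2 * np.pi * 220 * t) + 0.3 * np.random.randn(t.size)  # stand-in for a speech frame

eemd = EEMD(trials=100, noise_width=0.05)   # the two parameters under study
imfs = eemd.eemd(x, t)                      # intrinsic mode functions (IMFs)
print(f"Extracted {imfs.shape[0]} IMFs of length {imfs.shape[1]}")
```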


Author(s):  
Lakshmi Srinivas Dendukuri ◽  
Shaik Jakeer Hussain

Extraction of voiced regions of speech is one of the latest topics in the speech domain for various speech applications. Emotional speech signals carry most of their information in the voiced regions. In this work, voiced regions are extracted from emotional speech signals using a wavelet-pitch method. The Daubechies wavelet (Db4) is applied to the speech frames after downsampling the speech signals. The autocorrelation function is computed on the extracted approximation coefficients of each frame, and the corresponding pitch values are obtained. A local threshold is defined on the obtained pitch values to extract voiced regions. The threshold values differ for male and female speakers, as male pitch values are generally lower than female pitch values. The obtained pitch values are scaled down and compared with the thresholds to extract the voiced frames. Transition frames between voiced and unvoiced frames are also extracted when the previous frame is voiced, in order to preserve the emotional content of the extracted frames. The extracted frames are reshaped to obtain the desired emotional speech signal. Signal-to-Noise Ratio (SNR), Normalized Root Mean Square Error (NRMSE), and statistical parameters are used as evaluation metrics. This work achieves better SNR and NRMSE values than the zero crossing-energy and residual-signal-based methods for voiced region extraction, and the Db4 wavelet gives better results than the Haar and Db2 wavelets when extracting voiced regions with the wavelet-pitch method.
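A minimal sketch of the wavelet-pitch idea described above: frame the signal, take the Db4 approximation coefficients, autocorrelate them, and keep frames whose normalized autocorrelation shows a clear pitch peak. The frame length, hop size, search band, and threshold are illustrative assumptions, not the paper's settings.

```python
# Wavelet-pitch voiced-frame detection sketch (all parameters assumed).
import numpy as np
import pywt

def is_voiced(frame, fs, f0_min=75.0, f0_max=400.0, threshold=0.3):
    """True if the Db4 approximation coefficients show a clear pitch peak."""
    cA, _ = pywt.dwt(frame, 'db4')          # approximation coefficients
    cA = cA - cA.mean()
    ac = np.correlate(cA, cA, mode='full')[cA.size - 1:]
    if ac[0] <= 0:                          # silent frame
        return False
    ac = ac / ac[0]                         # normalize so lag 0 equals 1
    fs_cA = fs / 2.0                        # effective rate after one DWT level
    lo, hi = int(fs_cA / f0_max), int(fs_cA / f0_min)
    return ac[lo:hi].max() > threshold      # strong periodicity => voiced

fs, frame_len, hop = 16000, 512, 256        # assumed framing
speech = np.random.randn(fs)                # stand-in for an emotional utterance
voiced = [is_voiced(speech[i:i + frame_len], fs)
          for i in range(0, speech.size - frame_len, hop)]
```

A speaker-dependent threshold, as the paper recommends, would replace the single `threshold` value with separate male and female settings.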


2016 ◽  
Author(s):  
Laura Rachman ◽  
Marco Liuni ◽  
Pablo Arias ◽  
Andreas Lind ◽  
Petter Johansson ◽  
...  

We present an open-source software platform that transforms the emotions expressed by speech signals using audio effects like pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real-time (with less than 20-millisecond latency), using live input from a microphone. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here results of a series of validation experiments showing that transformed emotions are recognized at above-chance levels in the French, English, Swedish and Japanese languages, with a naturalness comparable to natural speech. Then, we provide a list of twenty-five experimental ideas applying this new tool to important topics in the behavioral sciences.
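As an illustration of the kind of effect such a platform chains together, here is a hedged sketch of one of them, vibrato, implemented as a sinusoidally modulated delay line with linear interpolation; the rate and depth values are generic defaults, not the platform's presets.

```python
# Vibrato sketch: periodic pitch modulation via a modulated delay line.
# Rate and depth are illustrative defaults, not the platform's presets.
import numpy as np

def vibrato(x, fs, rate_hz=6.0, depth_ms=0.8):
    depth = depth_ms * 1e-3 * fs                    # modulation depth in samples
    n = np.arange(x.size)
    delay = depth * (1.0 + np.sin(2 * np.pi * rate_hz * n / fs))  # always >= 0
    idx = n - delay                                 # fractional read positions
    i0 = np.clip(np.floor(idx).astype(int), 0, x.size - 1)
    i1 = np.clip(i0 + 1, 0, x.size - 1)
    frac = np.clip(idx - np.floor(idx), 0.0, 1.0)
    return (1.0 - frac) * x[i0] + frac * x[i1]      # linear interpolation

fs = 44100
t = np.arange(0, 1.0, 1.0 / fs)
y = vibrato(np.sin(2 * np.pi * 440 * t), fs)        # 440 Hz tone with vibrato
```

Real-time use at the paper's sub-20 ms latency would process short blocks from a ring buffer rather than a whole file at once.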


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Ying Sun ◽  
Xue-Ying Zhang ◽  
Jiang-He Ma ◽  
Chun-Xiao Song ◽  
Hui-Fen Lv

Due to the shortcomings of linear feature parameters for speech signals, and the limitations of existing time- and frequency-domain features in characterizing the full information content of speech, this paper proposes a nonlinear feature extraction method based on phase space reconstruction (PSR) theory. First, the speech signal is analyzed with a nonlinear dynamic model, which is used to reconstruct the one-dimensional speech time series in phase space. Nonlinear dynamic (NLD) features based on the reconstructed phase space are then extracted as new characteristic parameters. The performance of the NLD features is verified by comparing their recognition rates with those of conventional features (prosodic and MFCC features), using the Korean isolated words database, the Berlin emotional speech database, and the CASIA emotional speech database for validation and a Support Vector Machine classifier. The results show that the NLD features not only achieve high recognition rates and strong noise robustness in speech recognition tasks but also fully characterize the different emotions contained in speech signals.
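Phase space reconstruction itself is compact: by Takens' theorem, the scalar signal x(n) is embedded as vectors [x(n), x(n+τ), ..., x(n+(m−1)τ)]. A minimal sketch follows; the delay τ and dimension m are illustrative, as in practice they are typically chosen with mutual-information and false-nearest-neighbour criteria.

```python
# Time-delay embedding (phase space reconstruction) sketch.
# tau and m are illustrative, not the paper's chosen values.
import numpy as np

def delay_embed(x, m=3, tau=8):
    """Embed a 1-D signal into m-dimensional phase space with delay tau."""
    n_vectors = x.size - (m - 1) * tau
    return np.stack([x[i * tau : i * tau + n_vectors] for i in range(m)], axis=1)

x = np.sin(np.linspace(0, 20 * np.pi, 4000))   # stand-in for a speech frame
points = delay_embed(x)                        # shape: (n_vectors, m)
```

NLD features such as the correlation dimension or the largest Lyapunov exponent are then computed over `points`.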


Author(s):  
C. Revathy ◽  
R. Sureshbabu

Speech processing is a core area of digital signal processing concerned with the analysis of speech signals, and it is applied in fields such as emotion recognition, virtual assistants, and voice identification. Among these applications, emotion recognition is a critical one, since it aims to recognize a person's exact emotional state, for instance to help identify underlying physiological issues. Several researchers combine signal processing and machine learning techniques to identify human emotions, but existing approaches struggle to achieve high accuracy at low computational complexity. This paper introduces an intelligent computational technique called the cat swarm optimized spiking neural network (CSSPNN). First, emotional speech signals are taken from the Toronto emotional speech set (TESS) dataset and processed with a wavelet approach to extract features. The derived features are then examined by the proposed CSSPNN classifier, which recognizes human emotions through an effective training and learning process. Finally, the proficiency of the system is assessed through experimental results and discussion. The proposed system recognizes speech emotions with up to 99.3% accuracy, outperforming recurrent neural networks (RNNs), deep neural networks (DNNs), and deep shallow neural networks (DSNNs).
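The wavelet front end described above is not specified in detail; one plausible reading, sketched here, is a multilevel DWT whose per-subband log-energies form the feature vector. The wavelet family, decomposition level, and energy statistic are assumptions, not the paper's choices.

```python
# Hedged sketch of a wavelet feature front end: multilevel DWT, then
# log-energy per subband. Wavelet, level, and statistic are assumptions.
import numpy as np
import pywt

def wavelet_features(signal, wavelet='db4', level=5):
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [cA5, cD5, ..., cD1]
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

x = np.random.randn(16000)      # stand-in for a TESS utterance at 16 kHz
feats = wavelet_features(x)     # (level + 1)-dimensional feature vector
```

The resulting vectors would then be passed to the CSSPNN classifier for training and recognition.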

