Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features

The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are different in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. Support vector machine (SVM) and long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. The significant classification performance improvements were achieved using a combination of baseline and the most relevant timbre acoustic features, which were found by applying SFS on a classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could characterize emotions sufficiently in a speech in the valence dimension.

Download Full-text

Mexican Emotional Speech Database Based on Semantic, Frequency, Familiarity, Concreteness, and Cultural Shaping of Affective Prosody

Data ◽

10.3390/data6120130 ◽

2021 ◽

Vol 6 (12) ◽

pp. 130

Author(s):

Mathilde Marie Duville ◽

Luz María Alonso-Valerdi ◽

David I. Ibarra-Zarate

Keyword(s):

Statistical Analysis ◽

Emotional Expression ◽

Support Vector ◽

Cultural Variation ◽

Emotional Prosody ◽

Emotional Speech ◽

Linguistic Features ◽

Male Adult ◽

Speech Database ◽

Emotional Speech Database

In this paper, the Mexican Emotional Speech Database (MESD) that contains single-word emotional utterances for anger, disgust, fear, happiness, neutral and sadness with adult (male and female) and child voices is described. To validate the emotional prosody of the uttered words, a cubic Support Vector Machines classifier was trained on the basis of prosodic, spectral and voice quality features for each case study: (1) male adult, (2) female adult and (3) child. In addition, cultural, semantic, and linguistic shaping of emotional expression was assessed by statistical analysis. This study was registered at BioMed Central and is part of the implementation of a published study protocol. Mean emotional classification accuracies yielded 93.3%, 89.4% and 83.3% for male, female and child utterances respectively. Statistical analysis emphasized the shaping of emotional prosodies by semantic and linguistic features. A cultural variation in emotional expression was highlighted by comparing the MESD with the INTERFACE for Castilian Spanish database. The MESD provides reliable content for linguistic emotional prosody shaped by the Mexican cultural environment. In order to facilitate further investigations, a corpus controlled for linguistic features and emotional semantics, as well as one containing words repeated across voices and emotions are provided. The MESD is made freely available.

Download Full-text

Nonlinear Dynamic Feature Extraction Based on Phase Space Reconstruction for the Classification of Speech and Emotion

Mathematical Problems in Engineering ◽

10.1155/2020/9452976 ◽

2020 ◽

Vol 2020 ◽

pp. 1-15

Author(s):

Ying Sun ◽

Xue-Ying Zhang ◽

Jiang-He Ma ◽

Chun-Xiao Song ◽

Hui-Fen Lv

Keyword(s):

Feature Extraction ◽

Phase Space ◽

Speech Signal ◽

Nonlinear Dynamic ◽

Phase Space Reconstruction ◽

Support Vector ◽

Speech Signals ◽

Emotional Speech ◽

Speech Database ◽

Emotional Speech Database

Due to the shortcomings of linear feature parameters in speech signals, and the limitations of existing time- and frequency-domain attribute features in characterizing the integrity of the speech information, in this paper, we propose a nonlinear method for feature extraction based on the phase space reconstruction (PSR) theory. First, the speech signal was analyzed using a nonlinear dynamic model. Then, the model was used to reconstruct a one-dimensional time speech signal. Finally, nonlinear dynamic (NLD) features based on the reconstruction of the phase space were extracted as the new characteristic parameters. Then, the performance of NLD features was verified by comparing their recognition rates with those of other features (NLD features, prosodic features, and MFCC features). Finally, the Korean isolated words database, the Berlin emotional speech database, and the CASIA emotional speech database were chosen for validation. The effectiveness of the NLD features was tested using the Support Vector Machine classifier. The results show that NLD features not only have high recognition rate and excellent antinoise performance for speech recognition tasks but also can fully characterize the different emotions contained in speech signals.

Download Full-text

Acoustic comparison of electronics disguised voice using Different semitones

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.16.11502 ◽

2018 ◽

Vol 7 (2.16) ◽

pp. 98 ◽

Cited By ~ 2

Author(s):

Mahesh K. Singh ◽

A K. Singh ◽

Narendra Singh

Keyword(s):

Support Vector Machine ◽

Acoustic Analysis ◽

Speaker Identification ◽

Support Vector ◽

Acoustic Features ◽

Acoustic Feature ◽

Mel Frequency Cepstral Coefficients ◽

Identification Rate ◽

Normal Voice ◽

Feature Based

This paper emphasizes an algorithm that is based on acoustic analysis of electronics disguised voice. Proposed work is given a comparative analysis of all acoustic feature and its statistical coefficients. Acoustic features are computed by Mel-frequency cepstral coefficients (MFCC) method and compare with a normal voice and disguised voice by different semitones. All acoustic features passed through the feature based classifier and detected the identification rate of all type of electronically disguised voice. There are two types of support vector machine (SVM) and decision tree (DT) classifiers are used for speaker identification in terms of classification efficiency of electronically disguised voice by different semitones.

Download Full-text

Kannada Emotional Speech Database: Design, Development and Evaluation

Proceedings of International Conference on Cognition and Recognition - Lecture Notes in Networks and Systems ◽

10.1007/978-981-10-5146-3_14 ◽

2017 ◽

pp. 135-143

Author(s):

A. Geethashree ◽

D J Ravi

Keyword(s):

Database Design ◽

Emotional Speech ◽

Design Development ◽

Speech Database ◽

Emotional Speech Database

Download Full-text

Assessment of spontaneous emotional speech database toward emotion recognition: Intensity and similarity of perceived emotion from spontaneously expressed emotional speech

Acoustical Science and Technology ◽

10.1250/ast.32.26 ◽

2011 ◽

Vol 32 (1) ◽

pp. 26-29 ◽

Cited By ~ 2

Author(s):

Yoshiko Arimoto ◽

Sumio Ohno ◽

Hitoshi Iida

Keyword(s):

Emotion Recognition ◽

Emotional Speech ◽

Speech Database ◽

Emotional Speech Database ◽

Perceived Emotion

Download Full-text

A combined cepstral distance method for emotional speech recognition

International Journal of Advanced Robotic Systems ◽

10.1177/1729881417719836 ◽

2017 ◽

Vol 14 (4) ◽

pp. 172988141771983 ◽

Cited By ~ 2

Author(s):

Changqin Quan ◽

Bin Zhang ◽

Xiao Sun ◽

Fuji Ren

Keyword(s):

Artificial Intelligence ◽

Speech Recognition ◽

Affective Computing ◽

Recognition Rate ◽

Support Vector ◽

Emotional Speech ◽

Emotion Classification ◽

Distance Method ◽

Speech Database ◽

Emotional Speech Recognition

Affective computing is not only the direction of reform in artificial intelligence but also exemplification of the advanced intelligent machines. Emotion is the biggest difference between human and machine. If the machine behaves with emotion, then the machine will be accepted by more people. Voice is the most natural and can be easily understood and accepted manner in daily communication. The recognition of emotional voice is an important field of artificial intelligence. However, in recognition of emotions, there often exists the phenomenon that two emotions are particularly vulnerable to confusion. This article presents a combined cepstral distance method in two-group multi-class emotion classification for emotional speech recognition. Cepstral distance combined with speech energy is well used as speech signal endpoint detection in speech recognition. In this work, the use of cepstral distance aims to measure the similarity between frames in emotional signals and in neutral signals. These features are input for directed acyclic graph support vector machine classification. Finally, a two-group classification strategy is adopted to solve confusion in multi-emotion recognition. In the experiments, Chinese mandarin emotion database is used and a large training set (1134 + 378 utterances) ensures a powerful modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate of emotion sad and can balance the recognition results with eliminating the over fitting. And for the German corpus Berlin emotional speech database, the recognition rate between sad and boring, which are very difficult to distinguish, is up to 95.45%.

Download Full-text

Emotional Speech Database and the Acoustic Analysis of Emotional Speech

EONEOHAG ◽

10.17290/jlsk.2015..72.175 ◽

2015 ◽

Vol null (72) ◽

pp. 175-199

Author(s):

손남호 ◽

Hwang Hyosung ◽

Ho-Young Lee

Keyword(s):

Acoustic Analysis ◽

Emotional Speech ◽

Speech Database ◽

Emotional Speech Database

Download Full-text

Speech Emotion Recognition System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-v4-i3-024 ◽

2021 ◽

pp. 156-159

Author(s):

Sourabh Suke ◽

Ganesh Regulwar ◽

Nikesh Aote ◽

Pratik Chaudhari ◽

Rajat Ghatode ◽

...

Keyword(s):

Emotion Recognition ◽

Automobile Industry ◽

Emotional State ◽

Recognition System ◽

Classification Model ◽

General Idea ◽

Speech Emotion Recognition ◽

Support Vector ◽

Emotional Speech ◽

Acoustic Features

This project describes "VoiEmo- A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his/her speech. For example, one's speech becomes loud and fast, with a higher and wider range in pitch, when in a state of fear, anger, or joy whereas human voice is generally slow and low pitched in sadness and tiredness. We have particularly developed a classification model speech emotion detection based on Convolutional neural networks (CNNs), Support Vector Machine (SVM), Multilayer Perceptron (MLP) Classification which make predictions considering the acoustic features of speech signal such as Mel Frequency Cepstral Coefficient (MFCC). Our models have been trained to recognize seven common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the model, we have used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset and the Toronto Emotional Speech Set (TESS) Dataset. The system is advantageous as it can provide a general idea about the emotional state of the individual based on the acoustic features of the speech irrespective of the language the speaker speaks in, moreover, it also saves time and effort. Speech emotion recognition systems have their applications in various fields like in call centers and BPOs, criminal investigation, psychiatric therapy, the automobile industry, etc.

Download Full-text

The Impact of Pixel Resolution, Integration Scale, Preprocessing, and Feature Normalization on Texture Analysis for Mass Classification in Mammograms

International Journal of Optics ◽

10.1155/2016/1370259 ◽

2016 ◽

Vol 2016 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Mohamed Abdel-Nasser ◽

Jaime Melendez ◽

Antonio Moreno ◽

Domenec Puig

Keyword(s):

Texture Analysis ◽

Spatial Arrangement ◽

Classification Performance ◽

Support Vector ◽

Forward Selection ◽

Mass Detection ◽

Pixel Resolution ◽

Feature Normalization ◽

Mass Classification ◽

The Impact

Texture analysis methods are widely used to characterize breast masses in mammograms. Texture gives information about the spatial arrangement of the intensities in the region of interest. This information has been used in mammogram analysis applications such as mass detection, mass classification, and breast density estimation. In this paper, we study the effect of factors such as pixel resolution, integration scale, preprocessing, and feature normalization on the performance of those texture methods for mass classification. The classification performance was assessed considering linear and nonlinear support vector machine classifiers. To find the best combination among the studied factors, we used three approaches: greedy, sequential forward selection (SFS), and exhaustive search. On the basis of our study, we conclude that the factors studied affect the performance of texture methods, so the best combination of these factors should be determined to achieve the best performance with each texture method. SFS can be an appropriate way to approach the factor combination problem because it is less computationally intensive than the other methods.

Download Full-text

Creation and Analysis of Emotional Speech Database for Multiple Emotions Recognition

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) ◽

10.1109/o-cocosda50338.2020.9295041 ◽

2020 ◽

Author(s):

Ryota Sato ◽

Ryohei Sasaki ◽

Norisato Suga ◽

Toshihiro Furukawa

Keyword(s):

Emotional Speech ◽

Speech Database ◽

Emotional Speech Database

Download Full-text