Auditory Emotion Recognition Impairments in Schizophrenia: Relationship to Acoustic Features and Cognition

2012
Vol 169 (4)
pp. 424-432
Author(s):
Rinat Gold
Pamela Butler
Nadine Revheim
David I. Leitman
John A. Hansen
...

2017
Author(s):
Zeshan Peng

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics that can be read from vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features in audio data that can indicate or predict a customer's attitude, opinion, or emotional state. Acoustic features have been widely used in many machine learning applications on audio signals, such as music classification, language recognition, and emotion recognition. For emotion recognition, previous work shows that pitch and speech-rate features are particularly important. This thesis focuses on determining sentiment from call center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features and different machine learning methods, including classical algorithms and deep neural networks, have been implemented in the pipeline. In the deep neural network method, the feature vectors of the audio segments are stacked in temporal order into a feature matrix, which is fed into deep convolutional neural networks as input. Experimental results on real data show that acoustic features such as Mel-frequency cepstral coefficients, timbre, and chroma features are good indicators of sentiment, and that the temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
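A minimal Python sketch of the three-step pipeline described above, assuming speaker-turn boundaries are already available (e.g., from a diarization step) and using librosa for feature extraction and scikit-learn for a classical baseline; the function names, segment boundaries, and record list are illustrative, not the thesis code:

```python
# Sketch of the pipeline: split by speaker turns, extract per-segment
# acoustic features, train a classifier. Turn times are assumed given.
import numpy as np
import librosa
from sklearn.svm import SVC

def segment_features(y, sr):
    """Step 2: per-segment acoustic features (MFCC + chroma means)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), chroma.mean(axis=1)])

def record_features(path, turns):
    """Steps 1-2: split a record at speaker-turn times (in seconds) and
    stack the per-segment feature vectors in temporal order."""
    y, sr = librosa.load(path, sr=16000)
    bounds = [0] + [int(t * sr) for t in turns] + [len(y)]
    segments = [y[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    return np.stack([segment_features(s, sr) for s in segments])

def train(records):
    """Step 3: a classical baseline on record-level feature averages;
    the thesis's CNN variant would consume the stacked matrix directly.
    records: list of (path, turn_times, label) tuples (placeholder)."""
    X = [record_features(p, t).mean(axis=0) for p, t, _ in records]
    y = [label for _, _, label in records]
    return SVC(kernel="rbf").fit(X, y)
```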


Sensors
2020
Vol 20 (9)
pp. 2614
Author(s):
Eesung Kim
Hyungchan Song
Jong Won Shin

In this paper, we propose a novel emotion recognition method based on the underlying emotional characteristics extracted from a conditional adversarial auto-encoder (CAAE), in which both acoustic and lexical features are used as inputs. The acoustic features are generated by calculating statistical functionals of low-level descriptors and by a deep neural network (DNN). These acoustic features are concatenated with three types of lexical features extracted from the text: a sparse representation, a distributed representation, and affective lexicon-based dimensions. Two-dimensional latent representations similar to vectors in the valence-arousal space are obtained by a CAAE, which can be directly mapped into the emotional classes without the need for a sophisticated classifier. In contrast to a previous attempt that applied a CAAE to acoustic features only, the proposed approach enhances emotion recognition performance because the combined acoustic and lexical features provide sufficient discriminative power. Experimental results on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus showed that our method outperformed the previously reported best results on the same corpus, achieving 76.72% unweighted average recall.
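A minimal PyTorch sketch of the conditional adversarial auto-encoder idea described above, with a 2-D latent code and a discriminator conditioned on one-hot emotion labels; the input dimensions, network sizes, and Gaussian prior are illustrative assumptions, not the authors' configuration:

```python
# CAAE sketch: an auto-encoder whose 2-D latent code is adversarially
# matched to a prior by a discriminator conditioned on the emotion label.
import torch
import torch.nn as nn

ACOUSTIC_DIM, LEXICAL_DIM, N_CLASSES = 88, 300, 4   # assumed sizes
IN_DIM = ACOUSTIC_DIM + LEXICAL_DIM                 # concatenated input

encoder = nn.Sequential(nn.Linear(IN_DIM, 128), nn.ReLU(), nn.Linear(128, 2))
decoder = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, IN_DIM))
# Discriminator sees the 2-D code plus a one-hot label (the conditioning).
disc = nn.Sequential(nn.Linear(2 + N_CLASSES, 64), nn.ReLU(),
                     nn.Linear(64, 1), nn.Sigmoid())

bce, mse = nn.BCELoss(), nn.MSELoss()
opt_ae = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

def train_step(x, y_onehot):
    # 1) Reconstruction: encoder/decoder minimize auto-encoding error.
    opt_ae.zero_grad()
    recon_loss = mse(decoder(encoder(x)), x)
    recon_loss.backward()
    opt_ae.step()
    # 2) Discriminator: real = prior sample, fake = detached encoder code.
    z_fake = encoder(x).detach()
    z_real = torch.randn_like(z_fake)
    opt_d.zero_grad()
    d_loss = (bce(disc(torch.cat([z_real, y_onehot], 1)), torch.ones(len(x), 1))
              + bce(disc(torch.cat([z_fake, y_onehot], 1)), torch.zeros(len(x), 1)))
    d_loss.backward()
    opt_d.step()
    # 3) Generator: the encoder tries to fool the conditional discriminator.
    opt_ae.zero_grad()
    g_loss = bce(disc(torch.cat([encoder(x), y_onehot], 1)), torch.ones(len(x), 1))
    g_loss.backward()
    opt_ae.step()
    return recon_loss.item(), d_loss.item(), g_loss.item()
```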


Author(s):
Sourabh Suke
Ganesh Regulwar
Nikesh Aote
Pratik Chaudhari
Rajat Ghatode
...

This project describes "VoiEmo- A Speech Emotion Recognizer", a system for recognizing the emotional state of an individual from his/her speech. For example, a person's speech becomes loud and fast, with a higher and wider pitch range, in states of fear, anger, or joy, whereas the voice is generally slow and low-pitched in sadness and tiredness. We have developed classification models for speech emotion detection based on Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Multilayer Perceptron (MLP) classification, which make predictions from acoustic features of the speech signal such as Mel-frequency cepstral coefficients (MFCCs). Our models have been trained to recognize eight common emotions (neutral, calm, happy, sad, angry, fearful, disgust, surprise). For training and testing the models, we have used relevant data from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Toronto Emotional Speech Set (TESS). The system is advantageous in that it can provide a general idea of an individual's emotional state from the acoustic features of the speech irrespective of the language the speaker speaks; moreover, it saves time and effort. Speech emotion recognition systems have applications in various fields such as call centers and BPOs, criminal investigation, psychiatric therapy, and the automobile industry.
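A minimal sketch of the MFCC-plus-MLP variant of such a system, assuming RAVDESS-style file names (where the third hyphen-separated field encodes the emotion) and using librosa and scikit-learn; the helper names and parameters are hypothetical:

```python
# MFCC-based emotion classification sketch: one mean-pooled MFCC vector
# per utterance, fed to an MLP classifier.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# RAVDESS emotion codes (third field of the file name).
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprise"}

def mfcc_vector(path, n_mfcc=40):
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def load_dataset(wav_paths):
    X = np.stack([mfcc_vector(p) for p in wav_paths])
    y = [EMOTIONS[p.split("/")[-1].split("-")[2]] for p in wav_paths]
    return X, y

# Usage (wav_paths supplied by the caller):
# X, y = load_dataset(wav_paths)
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
# clf = MLPClassifier(hidden_layer_sizes=(256,), max_iter=500).fit(X_tr, y_tr)
# print("accuracy:", clf.score(X_te, y_te))
```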


2009
Vol 52 (7)
pp. 1838-1848
Author(s):
Yong-Wan Roh
Dong-Ju Kim
Woo-Seok Lee
Kwang-Seok Hong

2013
Vol 385-386
pp. 1385-1388
Author(s):
Yong Qiang Bao
Li Zhao
Cheng Wei Hang

In this paper we introduce the application of fuzzy kernel discriminant analysis (Fuzzy KDA) to speech emotion recognition using elicited data. The emotional data were induced in a psychology experiment. Acted data is not well suited to developing real-world applications, and by using more naturalistic data we may build a more reliable system. An emotional feature set is then constructed for modeling and recognition. A total of 372 low-level acoustic features are used, and kernel discriminant analysis is applied for emotion recognition. The experimental results show a promising recognition rate.
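The paper's Fuzzy KDA implementation is not public; as a rough stand-in, the sketch below approximates plain kernel discriminant analysis in scikit-learn with an explicit kernel feature map (Nystroem) followed by linear discriminant analysis, on synthetic data matching the 372-dimensional feature size from the text:

```python
# Kernel discriminant analysis approximated as: RBF kernel feature map
# (Nystroem) -> linear discriminant analysis. Data here is synthetic.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import Nystroem
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 372))      # 372 low-level acoustic features
y = rng.integers(0, 5, size=300)     # five hypothetical emotion classes

kda = make_pipeline(StandardScaler(),
                    Nystroem(kernel="rbf", n_components=100, random_state=0),
                    LinearDiscriminantAnalysis())
kda.fit(X, y)
print("training recognition rate:", kda.score(X, y))
```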

