Emotion recognition by combining prosody and sentiment analysis for expressing reactive emotion by humanoid robot

Acoustic feature-based sentiment analysis of call center data

10.32469/10355/66751 ◽

2017 ◽

Author(s):

Zeshan Peng

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Call Center ◽

Machine Learning Algorithms ◽

Language Recognition ◽

Acoustic Features ◽

Learning Methods ◽

Machine Learning Methods

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics from vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features from audio data that can indicate or predict a customer's attitude, opinion, or emotional state. For audio signals, acoustic features have been widely used in many machine learning applications, such as music classification, language recognition, and emotion recognition. For emotion recognition, previous work shows that pitch and speech rate are important features. This thesis focuses on determining sentiment from call center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features have been used, and different machine learning methods, including classical machine learning algorithms and deep neural networks, have been implemented in the pipeline. In our deep neural network method, the feature vectors of audio segments are stacked in temporal order into a feature matrix, which is fed into deep convolutional neural networks as input. Experimental results based on real data show that acoustic features, such as Mel-frequency cepstral coefficients, timbre, and chroma features, are good indicators of sentiment. Temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
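
The stacking step described in the abstract (per-segment feature vectors arranged in temporal order into one matrix) can be illustrated with a minimal sketch. The abstract does not say how calls with different numbers of segments are brought to a common size, so the truncate-and-zero-pad rule, the function name `stack_segments`, and the toy 4-dimensional vectors below are all assumptions made for illustration, not the thesis's implementation.

```python
from typing import List


def stack_segments(segment_features: List[List[float]],
                   max_segments: int) -> List[List[float]]:
    """Stack per-segment feature vectors (already in temporal order) into a
    matrix with exactly max_segments rows.

    Assumed rule: calls with more segments are truncated, calls with fewer
    segments are padded with all-zero rows.
    """
    if not segment_features:
        raise ValueError("need at least one segment")
    dim = len(segment_features[0])
    if any(len(vec) != dim for vec in segment_features):
        raise ValueError("all segment vectors must have the same length")
    # Keep the earliest segments, preserving their order.
    rows = [list(vec) for vec in segment_features[:max_segments]]
    # Zero-pad so every call yields a matrix of the same shape.
    rows += [[0.0] * dim for _ in range(max_segments - len(rows))]
    return rows


# Three speaker-turn segments, each with a hypothetical 4-dimensional vector.
call = [[0.1, 0.2, 0.3, 0.4],
        [0.5, 0.6, 0.7, 0.8],
        [0.9, 1.0, 1.1, 1.2]]
matrix = stack_segments(call, max_segments=5)
```

A matrix of fixed shape like this is the kind of input a convolutional network expects; in practice each row would hold features such as MFCC or chroma statistics for one segment.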

EMOSIS Sentiment Analysis on Tweets with Emotion and Intensity Level Recognition Considering Ending Punctuation Marks

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d4518.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10289-10293

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Language Processing ◽

Significant Role ◽

Language Model ◽

Intensity Level ◽

Processing Stage ◽

Overall Performance

Sentiment Analysis is a tool used for determining the Polarity or Emotion of a Sentence. It is a field of Natural Language Processing that focuses on the study of opinions. In this study, the researchers addressed one key challenge in Sentiment Analysis, which is to consider the Ending Punctuation Marks present in a sentence. Ending punctuation marks play a significant role in Emotion Recognition and Intensity Level Recognition. The research made use of tweets expressing opinions about Philippine President Rodrigo Duterte. These downloaded tweets served as the inputs. They were first subjected to a pre-processing stage to prepare the sentences for processing. A Language Model was created to serve as the classifier for determining the scores of the tweets. The scores give the polarity of the sentence. Accuracy is very important in sentiment analysis. To increase the chance of correctly identifying the polarity of the tweets, the input underwent Intensity Level Recognition, which determines the intensifiers and negations within the sentences. The system was evaluated with an overall performance of 80.27%.
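
To make the roles of intensifiers, negations, and ending punctuation concrete, here is a small sketch. It is not the paper's method: the paper uses a Language Model as the classifier, whereas this example swaps in a tiny hand-made lexicon, and the word lists, weights, and the rule that each trailing "!" adds 25% to the score are all hypothetical choices for illustration.

```python
import re

# Hypothetical toy resources; none of these values come from the paper.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "so": 1.25}


def score_sentence(sentence: str) -> float:
    """Return a signed polarity score; the magnitude reflects intensity."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    negate = False   # set by a negation, consumed by the next lexicon word
    boost = 1.0      # accumulated by intensifiers, consumed the same way
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            value = LEXICON[tok] * boost
            score += -value if negate else value
            negate, boost = False, 1.0
    # Ending punctuation: count "!" in the trailing punctuation run only.
    trailing = re.search(r"[!?.]+\s*$", sentence)
    exclamations = trailing.group().count("!") if trailing else 0
    return score * (1.0 + 0.25 * exclamations)
```

Under these assumed rules, "This is good." and "This is good!!" share the same polarity but differ in intensity, which is the distinction the abstract attributes to ending punctuation marks.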

Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6431 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8992-8999

Author(s):

Zhongkai Sun ◽

Prathusha Sarma ◽

William Sethares ◽

Yingyu Liang

Keyword(s):

Emotion Recognition ◽

Sentiment Analysis ◽

Canonical Correlation ◽

Language Models ◽

Outer Product ◽

Language Analysis ◽

Benchmark Datasets ◽

Text Features ◽

Multimodal Language ◽

Multimodal Sentiment Analysis

Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment analysis and emotion recognition can be improved by learning (hidden) correlations between features extracted from the outer product of text and audio (we call this text-based audio) and analogous text-based video. This paper proposes a novel model, the Interaction Canonical Correlation Network (ICCN), to learn such multimodal embeddings. ICCN learns correlations between all three modes via deep canonical correlation analysis (DCCA) and the proposed embeddings are then tested on several benchmark datasets and against other state-of-the-art multimodal embedding algorithms. Empirical results and ablation studies confirm the effectiveness of ICCN in capturing useful information from all three views.
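
The "text-based audio" representation mentioned above starts from the outer product of a text feature vector and an audio feature vector. The sketch below shows only that first step with toy values; the function name and the vectors are illustrative assumptions, and the later stages the abstract mentions (feature extraction from the outer product and deep canonical correlation analysis) are not shown.

```python
from typing import List


def outer_product(text_vec: List[float],
                  audio_vec: List[float]) -> List[List[float]]:
    """Return the matrix M with M[i][j] = text_vec[i] * audio_vec[j]."""
    return [[t * a for a in audio_vec] for t in text_vec]


# Toy utterance-level features (hypothetical values and dimensions).
text_features = [1.0, 2.0]
audio_features = [0.5, -1.0, 3.0]

text_based_audio = outer_product(text_features, audio_features)
```

Each entry of the resulting matrix pairs one text dimension with one audio dimension, so every pairwise interaction between the two modalities is available to whatever model consumes the matrix next.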

Intelligent facial emotion recognition and semantic-based topic detection for a humanoid robot

Expert Systems with Applications ◽

10.1016/j.eswa.2013.03.016 ◽

2013 ◽

Vol 40 (13) ◽

pp. 5160-5168 ◽

Cited By ~ 55

Author(s):

Li Zhang ◽

Ming Jiang ◽

Dewan Farid ◽

M.A. Hossain

Keyword(s):

Emotion Recognition ◽

Humanoid Robot ◽

Facial Emotion Recognition ◽

Facial Emotion ◽

Topic Detection

Emotion Analysis from Human Voice Using Various Prosodic Features and Text Analysis

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9055 ◽

2020 ◽

Vol 17 (9) ◽

pp. 4244-4247

Author(s):

Vybhav Jain ◽

S. B. Rajeshwari ◽

Jagadish S. Kallimani

Keyword(s):

Emotion Recognition ◽

Sentiment Analysis ◽

Text Analysis ◽

Good Accuracy ◽

Age Groups ◽

Speech Emotion Recognition ◽

Prosodic Features ◽

Voice Analysis ◽

Emotion Analysis ◽

Human Voice

Emotion Analysis is a dynamic field of research that aims to provide a method for recognizing a person's emotions from their voice alone. It is better known as the Speech Emotion Recognition (SER) problem. This problem has been studied for more than a decade, with results coming from either Voice Analysis or Text Analysis. Individually, both of these methods have shown good accuracy so far. However, using both methods together has shown a much better result than either one considered individually. When people of different age groups are talking, it is important to understand the emotions behind what they say, as this will in turn help us react better. To this end, the paper implements a model that performs Emotion Analysis based on both Tone and Text Analysis. The prosodic features of the tone are analyzed, and then the speech is converted to text. Once the text has been extracted from the speech, Sentiment Analysis is performed on the extracted text to further improve the accuracy of the Emotion Recognition.
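
One simple way to combine a tone-based prediction with a text-based one is weighted late fusion, sketched below. The abstract does not state how the paper merges the two analyses, so the weighted average, the function name `fuse_scores`, the `prosody_weight` parameter, and the example probabilities are all assumptions for illustration only.

```python
from typing import Dict, Tuple


def fuse_scores(prosody_probs: Dict[str, float],
                text_probs: Dict[str, float],
                prosody_weight: float = 0.5) -> Tuple[str, Dict[str, float]]:
    """Combine per-emotion scores from two models by weighted averaging.

    Returns the highest-scoring label and the fused score table. A label
    missing from one model is treated as having score 0.0 there.
    """
    if not 0.0 <= prosody_weight <= 1.0:
        raise ValueError("prosody_weight must be in [0, 1]")
    labels = set(prosody_probs) | set(text_probs)
    fused = {
        label: prosody_weight * prosody_probs.get(label, 0.0)
        + (1.0 - prosody_weight) * text_probs.get(label, 0.0)
        for label in labels
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical outputs of a prosody model and a text sentiment model.
prosody = {"angry": 0.6, "happy": 0.4}
text = {"angry": 0.2, "happy": 0.8}

label, fused = fuse_scores(prosody, text, prosody_weight=0.5)
```

With equal weights the text model's stronger "happy" score wins here; raising `prosody_weight` toward 1.0 shifts the decision toward the tone-based prediction.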
