MindLink-Eumpy: An Open-Source Python Toolbox for Multimodal Emotion Recognition

2021 · Vol 15 · Author(s): Ruixin Li, Yan Liang, Xiaojian Liu, Bingbing Wang, Wenxin Huang, ...

Emotion recognition plays an important role in intelligent human–computer interaction, but the related research still faces the problems of low accuracy and subject dependence. In this paper, an open-source software toolbox called MindLink-Eumpy is developed to recognize emotions by integrating electroencephalogram (EEG) and facial expression information. MindLink-Eumpy first applies a series of tools to automatically obtain physiological data from subjects, then analyzes the facial expression data and the EEG data separately, and finally fuses the two signals at the decision level. For facial expression detection, MindLink-Eumpy uses a multitask convolutional neural network (CNN) based on transfer learning. For EEG detection, it provides two algorithms: a subject-dependent model based on a support vector machine (SVM) and a subject-independent model based on a long short-term memory (LSTM) network. In the decision-level fusion, a weight enumeration method and the AdaBoost technique are applied to combine the predictions of the SVM and the CNN. We conducted two offline experiments, on the Database for Emotion Analysis Using Physiological Signals (DEAP) and the Multimodal Database for Affect Recognition and Implicit Tagging (MAHNOB-HCI), and an online experiment on 15 healthy subjects. The results show that the multimodal methods outperform the single-modal methods in both offline and online experiments. In the subject-dependent condition, the multimodal method achieved an accuracy of 71.00% in the valence dimension and 72.14% in the arousal dimension. In the subject-independent condition, the LSTM-based method achieved an accuracy of 78.56% in the valence dimension and 77.22% in the arousal dimension. The feasibility and efficiency of MindLink-Eumpy for emotion recognition are thus demonstrated.
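The decision-level weight enumeration described above can be pictured as a small grid search over a mixing weight between the two classifiers' probability outputs. Below is a minimal sketch of that idea in Python; the array names, the grid step, and the binary valence labels are illustrative assumptions, not MindLink-Eumpy's actual API.

```python
import numpy as np

def enumerate_fusion_weights(p_face, p_eeg, labels, step=0.05):
    """Grid-search a scalar weight w that mixes the CNN (face) and SVM (EEG)
    class-probability outputs: p_fused = w * p_face + (1 - w) * p_eeg.
    Returns the weight with the highest validation accuracy."""
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + step, step):
        p_fused = w * p_face + (1.0 - w) * p_eeg
        acc = np.mean(p_fused.argmax(axis=1) == labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy example: 4 validation trials, binary valence (low/high).
p_face = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]])
p_eeg  = np.array([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.3, 0.7]])
labels = np.array([0, 1, 1, 0])
w, acc = enumerate_fusion_weights(p_face, p_eeg, labels)
print(f"best weight for face modality: {w:.2f}, validation accuracy: {acc:.2f}")
```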

1998 · Vol 10 (2) · pp. 353-371 · Author(s): Paul Mineiro, David Zipser

The relative contributions of feedforward and recurrent connectivity to the direction-selective responses of cells in layer IVB of primary visual cortex are currently the subject of debate in the neuroscience community. Recently, biophysically detailed simulations have shown that realistic direction-selective responses can be achieved via recurrent cortical interactions between cells with nondirection-selective feedforward input (Suarez et al., 1995; Maex & Orban, 1996). Unfortunately these models, while desirable for detailed comparison with biology, are complex and thus difficult to analyze mathematically. In this article, a relatively simple cortical dynamical model is used to analyze the emergence of direction-selective responses via recurrent interactions. A comparison between a model based on our analysis and physiological data is presented. The approach also allows analysis of the recurrently propagated signal, revealing the predictive nature of the implementation.
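As a rough illustration of the mechanism being analyzed, the toy NumPy simulation below drives a ring of rate units with a feedforward bump that is not itself direction selective and lets an asymmetric recurrent weight matrix amplify motion in one direction only. Every parameter here is an illustrative assumption; this is not the model of Suarez et al., Maex & Orban, or the authors' own analysis.

```python
import numpy as np

# Toy ring of N cortical rate units. The feedforward input is a moving bump
# whose shape does not depend on motion direction at any single instant,
# while the recurrent weights are shifted asymmetrically so that activity is
# amplified only when the bump moves in the "preferred" direction.
N, T, dt, tau = 40, 200, 1.0, 10.0
pos = np.arange(N)

def ff_input(t, direction):
    center = (direction * 0.2 * t) % N            # bump drifts left or right
    d = np.minimum(np.abs(pos - center), N - np.abs(pos - center))
    return np.exp(-d**2 / 8.0)

# Asymmetric recurrent kernel: each unit excites neighbors a few steps ahead.
offsets = (pos[:, None] - pos[None, :]) % N
W = 0.08 * np.exp(-((offsets - 3) % N)**2 / 4.0)

def simulate(direction):
    r, peak = np.zeros(N), 0.0
    for t in range(T):
        drive = ff_input(t, direction) + W @ r
        r += dt / tau * (-r + np.maximum(drive, 0.0))   # rectified linear rate units
        peak = max(peak, r.max())
    return peak

pref, null = simulate(+1), simulate(-1)
print(f"peak (preferred) = {pref:.3f}, peak (null) = {null:.3f}")
print(f"direction index  = {(pref - null) / (pref + null):.2f}")
```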


2021 · Vol 11 (24) · pp. 11738 · Author(s): Thomas Teixeira, Éric Granger, Alessandro Lameiras Koerich

Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and to describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals, nor cross-cultural and demographic aspects. Indeed, recognizing facial expressions is a difficult task even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) for continuous emotion recognition from long video sequences captured in the wild. To this end, several 2D CNN models designed to capture spatial information are extended to allow spatiotemporal representation learning from videos, considering a complex and multi-dimensional emotion space in which continuous values of valence and arousal must be predicted. We developed and evaluated convolutional recurrent neural networks that combine 2D CNNs with long short-term memory (LSTM) units, as well as inflated 3D CNN models, built by inflating the weights of a pre-trained 2D CNN during fine-tuning on application-specific videos. Experimental results on the challenging SEWA-DB dataset show that these architectures can be effectively fine-tuned to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on this dataset.
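A minimal PyTorch sketch of the CNN-LSTM family described above (a 2D backbone applied frame by frame, an LSTM over the frame embeddings, and a two-unit regression head for valence and arousal) might look as follows. The ResNet-18 backbone, hidden size, and input resolution are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmRegressor(nn.Module):
    """Frame-wise 2D CNN backbone + LSTM over time + valence/arousal head."""
    def __init__(self, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)   # in practice, ImageNet-pretrained weights would be loaded
        backbone.fc = nn.Identity()                # keep the 512-d frame features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # continuous valence and arousal

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)) # (B*T, 512)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out)                      # (B, T, 2): one prediction per frame

# Toy forward pass on two random 16-frame clips.
model = CnnLstmRegressor()
dummy = torch.randn(2, 16, 3, 112, 112)
print(model(dummy).shape)                          # torch.Size([2, 16, 2])
```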


2021 · Vol 14 · Author(s): Jukka Ranta, Manu Airaksinen, Turkka Kirjavainen, Sampsa Vanhatalo, Nathan J. Stevenson

Objective: To develop a non-invasive and clinically practical method for long-term monitoring of infant sleep cycling in the intensive care unit.
Methods: Forty-three infant polysomnography recordings were performed at 1–18 weeks of age, including a piezo element bed mattress sensor to record respiratory and gross-body movements. The hypnogram scored from the polysomnography signals was used as the ground truth in training sleep classifiers based on 20,022 epochs of movement and/or electrocardiography signals. Three classifier designs were evaluated in the detection of deep sleep (N3 state): support vector machine (SVM), long short-term memory (LSTM) neural network, and convolutional neural network (CNN).
Results: Deep sleep was accurately distinguished from other states by all classifier variants. The SVM classifier based on a combination of movement and electrocardiography features had the highest performance (AUC 97.6%). An SVM classifier based on movement features alone had comparable accuracy (AUC 95.0%), as did the feature-independent CNN (AUC 93.3%).
Conclusion: Automated non-invasive tracking of sleep state cycling is technically feasible using measurements from a piezo element situated under a bed mattress.
Significance: An open-source infant deep sleep detector of this kind allows quantitative, continuous bedside assessment of an infant's sleep cycling.
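The feature-based SVM route could be sketched in scikit-learn roughly as below, assuming pre-computed per-epoch movement/electrocardiography features and binary N3 labels. The random feature matrix stands in for the real data, so the printed AUC is only a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Hypothetical per-epoch features (e.g. respiration regularity, movement counts,
# heart-rate variability measures) and binary deep-sleep (N3) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))
y = rng.integers(0, 2, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"deep-sleep detection AUC: {auc:.3f}")   # ~0.5 on this random toy data
```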


Sentiment analysis can be used to study an individual's or a group's emotions and attitudes towards other people and entities such as products, services, or social events. With the advances in deep learning, the enormous amount of information available on the internet (chiefly on social media), and powerful computing machines, it is only a matter of time before artificial intelligence (AI) systems become present in every aspect of human life, making our lives more introspective. In this paper, we propose a multimodal sentiment prediction system that analyzes the emotions predicted from different modal sources such as video, audio, and text and integrates them to recognize the group emotions of students in a classroom. Our experimental setup involves a digital video camera with microphones to capture the live video and audio feeds of the students during a lecture. The students are asked to provide digital feedback on the lecture as tweets from their Twitter accounts, addressed to the lecturer's official Twitter account. The audio and video frames are separated from the live streaming video using tools such as LAME and FFmpeg, and the Twitter API is used to access and extract messages from the Twitter platform. The audio and video features are extracted using Mel-Frequency Cepstral Coefficients (MFCC) and a Haar cascade classifier, respectively. The extracted video features are passed to a Convolutional Neural Network (CNN) model trained on the FER2013 facial image database to generate the feature vector for classifying video-based emotions. A Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), trained on a speech emotion corpus, is used to classify the audio features. For the Twitter texts, a lexicon-based approach with a senti-word dictionary is combined with a learning-based approach in which Support Vector Machines (SVM) are trained on a custom dataset. A decision-level fusion algorithm is applied to these three modal schemes to integrate the classification results and deduce the overall group emotion of the students. Potential use cases of the proposed system include student emotion recognition, employee performance feedback, and monitoring or surveillance-based systems. The implemented framework was tested in a classroom environment during a live lecture, and the predicted emotions demonstrated the classification accuracy of our approach.
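The decision-level fusion step might be sketched as a weighted average of the per-modality class-probability vectors, as below. The emotion label set, the modality weights, and the probability values are illustrative assumptions rather than the paper's exact fusion algorithm.

```python
import numpy as np

EMOTIONS = ["happy", "neutral", "sad", "angry"]          # illustrative label set

def fuse_decisions(p_video, p_audio, p_text, weights=(0.5, 0.3, 0.2)):
    """Weighted average of per-modality class-probability vectors,
    then argmax to obtain the fused (group) emotion."""
    stacked = np.stack([p_video, p_audio, p_text])        # (3, n_classes)
    fused = np.average(stacked, axis=0, weights=weights)
    return EMOTIONS[int(fused.argmax())], fused

# Toy per-modality softmax outputs for one time window of the lecture.
p_video = np.array([0.55, 0.25, 0.15, 0.05])   # CNN on facial frames
p_audio = np.array([0.30, 0.40, 0.20, 0.10])   # LSTM on MFCC features
p_text  = np.array([0.60, 0.20, 0.10, 0.10])   # SVM / lexicon on tweets

label, fused = fuse_decisions(p_video, p_audio, p_text)
print(label, fused.round(3))
```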


2016 · Vol 2016 · pp. 1-7 · Author(s): Wei Wei, Qingxuan Jia

Emotion recognition from facial expressions with weighted features is a challenging research topic that has attracted great attention in the past few years. This paper presents a novel method that uses subregion recognition rates to weight the kernel function. First, we divide the facial expression image into uniform subregions and calculate the recognition rate and corresponding weight of each subregion. Then, we obtain a weighted-feature Gaussian kernel function and construct a classifier based on a Support Vector Machine (SVM). Finally, the experimental results suggest that the approach based on the weighted-feature Gaussian kernel function performs well in terms of correct recognition rate. Experiments on the extended Cohn-Kanade (CK+) dataset show that our method achieves encouraging recognition results compared to state-of-the-art methods.
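One way to realize a weighted-feature Gaussian kernel with scikit-learn is a callable kernel in which each subregion's recognition rate weights its block of features, as in the sketch below. The region rates, feature layout, and data are hypothetical stand-ins, not the paper's actual values.

```python
import numpy as np
from sklearn.svm import SVC

# Suppose each face image is split into 4 subregions whose features are
# concatenated, and each subregion's standalone recognition rate (estimated
# beforehand) becomes its weight inside the kernel.
region_rates = np.array([0.62, 0.71, 0.55, 0.80])          # hypothetical rates
features_per_region = 8
w = np.repeat(region_rates / region_rates.sum(), features_per_region)

def weighted_rbf(A, B, gamma=0.1):
    """Gaussian kernel with per-feature weights on the squared differences:
    k(x, z) = exp(-gamma * sum_i w_i * (x_i - z_i)^2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-gamma * np.einsum("ijk,k->ij", diff**2, w))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 32)), rng.integers(0, 7, size=120)   # 7 emotion classes
clf = SVC(kernel=weighted_rbf).fit(X, y)
print(clf.predict(X[:5]))
```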


2021 · Author(s): Ehsan Othman, Philipp Werner, Frerk Saxen, Ayoub Al-Hamadi, Sascha Gruss, ...

Automatic systems enable continuous monitoring of patients' pain intensity, as shown in prior studies. Facial expression and physiological data such as electrodermal activity (EDA) are very informative for pain recognition; the features extracted from EDA indicate the stress and anxiety caused by different levels of pain. In this paper, we investigate using the EDA modality and fusing two modalities (frontal RGB video and EDA) for continuous pain intensity recognition with the X-ITE Pain Database. Further, we compare the performance of automated models before and after reducing the imbalance problem in the heat and electrical pain datasets, which include phasic (short) and tonic (long) stimuli. We use three distinct real-time methods: Random Forest (RF) baselines [a Random Forest classifier (RFc) and Random Forest regression (RFr)], a Long Short-Term Memory network (LSTM), and an LSTM with a sample-weighting method (LSTM-SW). The experimental results (1) report the first results of continuous pain intensity recognition using EDA data on the X-ITE Pain Database, (2) show that LSTM and LSTM-SW outperform guessing and the baseline methods (RFc and RFr), (3) confirm that EDA is the best-performing modality for most models, and (4) show that fusing the outputs of two LSTM models trained on facial expression and EDA data (Decision Fusion, DF) further improves results on some datasets (e.g., the Heat Phasic Dataset, HTD).
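The sample-weighting idea behind LSTM-SW can be sketched in Keras by passing per-sample weights that grow with pain intensity to model.fit, as below. The architecture, the synthetic data, and the weighting scheme are illustrative assumptions, not the paper's exact model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: sequences of EDA feature frames with a continuous
# pain-intensity target per sequence; "no pain" dominates the label distribution.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50, 8)).astype("float32")   # (samples, timesteps, features)
y = np.clip(rng.exponential(0.5, size=500), 0, 4).astype("float32")

# Sample weights that grow with pain intensity to counter the imbalance.
sample_weight = 1.0 + y

model = keras.Sequential([
    layers.Input(shape=(50, 8)),
    layers.LSTM(64),
    layers.Dense(1),                                   # continuous pain intensity
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, sample_weight=sample_weight, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:3], verbose=0).ravel())
```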


2017 · Vol 2017 · pp. 1-8 · Author(s): Yongrui Huang, Jianhao Yang, Pengkai Liao, Jiahui Pan

This paper proposes two multimodal fusion methods between brain and peripheral signals for emotion recognition. The input signals are electroencephalogram (EEG) and facial expression. The stimuli are based on a subset of movie clips that correspond to four specific areas of valence-arousal emotional space (happiness, neutral, sadness, and fear). For facial expression detection, the four basic emotion states (happiness, neutral, sadness, and fear) are detected by a neural network classifier. For EEG detection, the four basic emotion states and three emotion intensity levels (strong, ordinary, and weak) are detected by two support vector machine (SVM) classifiers, respectively. Emotion recognition is based on two decision-level fusion methods of the EEG and facial expression detections, using either a sum rule or a production rule. Twenty healthy subjects participated in two experiments. The results show that the accuracies of the two multimodal fusion detections are 81.25% and 82.75%, respectively, both higher than that of facial expression detection alone (74.38%) or EEG detection alone (66.88%). Combining facial expression and EEG information for emotion recognition compensates for the weaknesses of each as a single information source.
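The two decision-level rules can be written in a few lines of NumPy, as in the sketch below; the probability vectors are toy numbers, and the renormalization step is an assumption added for readability.

```python
import numpy as np

STATES = ["happiness", "neutral", "sadness", "fear"]

def fuse(p_eeg, p_face, rule="sum"):
    """Combine two modality-level probability vectors with the sum or product rule."""
    if rule == "sum":
        fused = p_eeg + p_face
    elif rule == "product":
        fused = p_eeg * p_face
    else:
        raise ValueError(rule)
    fused = fused / fused.sum()        # renormalize to a probability vector
    return STATES[int(fused.argmax())], fused

p_eeg  = np.array([0.20, 0.30, 0.35, 0.15])
p_face = np.array([0.10, 0.25, 0.50, 0.15])
print(fuse(p_eeg, p_face, "sum"))
print(fuse(p_eeg, p_face, "product"))
```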


Author(s): Keith April Araño, Peter Gloor, Carlotta Orsenigo, Carlo Vercellis

Speech is one of the most natural communication channels for expressing human emotions. Therefore, speech emotion recognition (SER) has been an active area of research with an extensive range of applications that can be found in several domains, such as biomedical diagnostics in healthcare and human–machine interactions. Recent work in SER has focused on end-to-end deep neural networks (DNNs). However, the scarcity of emotion-labeled speech datasets inhibits the full potential of training a deep network from scratch. In this paper, we propose new approaches for classifying emotions from speech by combining conventional mel-frequency cepstral coefficients (MFCCs) with image features extracted from spectrograms by a pretrained convolutional neural network (CNN). Unlike prior studies that employ end-to-end DNNs, our methods eliminate the resource-intensive network training process. Using the best prediction model obtained, we also build an SER application that predicts emotions in real time. Among the proposed methods, the hybrid feature set fed into a support vector machine (SVM) achieves an accuracy of 0.713 in a 6-class prediction problem evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, which is higher than previously published results. Interestingly, MFCCs taken as the sole input to a long short-term memory (LSTM) network achieve a slightly higher accuracy of 0.735. Our results reveal that the proposed approaches lead to an improvement in prediction accuracy. The empirical findings also demonstrate the effectiveness of using a pretrained CNN as an automatic feature extractor for the task of emotion prediction. Moreover, the success of the MFCC-LSTM model is evidence that, despite being conventional features, MFCCs can still outperform more sophisticated deep-learning feature sets.
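A hybrid feature extractor of the kind described (MFCC statistics concatenated with a CNN embedding of the mel spectrogram, fed to an SVM) could be sketched as below. The ResNet-18 backbone, the synthetic tone "clips", and the preprocessing details are assumptions; in practice, ImageNet-pretrained weights and real RAVDESS audio would be used instead.

```python
import numpy as np
import librosa
import torch
from torchvision import models, transforms
from sklearn.svm import SVC

def hybrid_features(y, sr, cnn):
    """MFCC statistics + CNN embedding of the mel spectrogram."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])     # 80-d

    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    mel = (mel - mel.min()) / (mel.max() - mel.min() + 1e-8)               # scale to [0, 1]
    img = torch.tensor(mel, dtype=torch.float32)[None].repeat(3, 1, 1)     # grey -> 3 channels
    img = transforms.Resize((224, 224), antialias=True)(img)[None]
    with torch.no_grad():
        emb = cnn(img).squeeze().numpy()                                    # 512-d embedding
    return np.concatenate([mfcc_stats, emb])

cnn = models.resnet18(weights=None)   # in practice: weights=models.ResNet18_Weights.DEFAULT
cnn.fc = torch.nn.Identity()          # drop the classifier head, keep the features
cnn.eval()

# Synthetic stand-ins for labelled RAVDESS clips (real clips would be loaded
# with librosa.load); two tones pretend to be "happy" and "sad" utterances.
sr = 22050
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
clips = [np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 220 * t)]
labels = ["happy", "sad"]

X = np.stack([hybrid_features(y, sr, cnn) for y in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))
```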

