MindLink-Eumpy: An Open-Source Python Toolbox for Multimodal Emotion Recognition

2021 · Vol 15 · Author(s): Ruixin Li, Yan Liang, Xiaojian Liu, Bingbing Wang, Wenxin Huang, ...

Emotion recognition plays an important role in intelligent human–computer interaction, but the related research still faces the problems of low accuracy and subject dependence. In this paper, an open-source software toolbox called MindLink-Eumpy is developed to recognize emotions by integrating electroencephalogram (EEG) and facial expression information. MindLink-Eumpy first applies a series of tools to automatically obtain physiological data from subjects, then analyzes the facial expression data and the EEG data separately, and finally fuses the two signals at the decision level. For facial expression detection, MindLink-Eumpy uses a multitask convolutional neural network (CNN) based on transfer learning. For EEG detection, it provides two algorithms: a subject-dependent model based on a support vector machine (SVM) and a subject-independent model based on a long short-term memory (LSTM) network. In the decision-level fusion, a weight enumeration method and the AdaBoost technique are applied to combine the predictions of the SVM and the CNN. We conducted two offline experiments, on the Database for Emotion Analysis Using Physiological Signals (DEAP) and the Multimodal Database for Affect Recognition and Implicit Tagging (MAHNOB-HCI), and an online experiment on 15 healthy subjects. The results show that the multimodal methods outperform the single-modal methods in both offline and online experiments. In the subject-dependent condition, the multimodal method achieved an accuracy of 71.00% in the valence dimension and 72.14% in the arousal dimension. In the subject-independent condition, the LSTM-based method achieved an accuracy of 78.56% in the valence dimension and 77.22% in the arousal dimension. The feasibility and efficiency of MindLink-Eumpy for emotion recognition are thus demonstrated.
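The decision-level weight enumeration described above can be pictured as a small grid search over a mixing weight between the two classifiers' probability outputs. Below is a minimal sketch of that idea in Python; the array names, the grid step, and the binary valence labels are illustrative assumptions, not MindLink-Eumpy's actual API.

```python
import numpy as np

def enumerate_fusion_weights(p_face, p_eeg, labels, step=0.05):
    """Grid-search a scalar weight w that mixes the CNN (face) and SVM (EEG)
    class-probability outputs: p_fused = w * p_face + (1 - w) * p_eeg.
    Returns the weight with the highest validation accuracy."""
    best_w, best_acc = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + step, step):
        p_fused = w * p_face + (1.0 - w) * p_eeg
        acc = np.mean(p_fused.argmax(axis=1) == labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Toy example: 4 validation trials, binary valence (low/high).
p_face = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8], [0.6, 0.4]])
p_eeg  = np.array([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.3, 0.7]])
labels = np.array([0, 1, 1, 0])
w, acc = enumerate_fusion_weights(p_face, p_eeg, labels)
print(f"best weight for face modality: {w:.2f}, validation accuracy: {acc:.2f}")
```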

1998 · Vol 10 (2) · pp. 353-371 · Author(s): Paul Mineiro, David Zipser

The relative contributions of feedforward and recurrent connectivity to the direction-selective responses of cells in layer IVB of primary visual cortex are currently the subject of debate in the neuroscience community. Recently, biophysically detailed simulations have shown that realistic direction-selective responses can be achieved via recurrent cortical interactions between cells with nondirection-selective feedforward input (Suarez et al., 1995; Maex & Orban, 1996). Unfortunately these models, while desirable for detailed comparison with biology, are complex and thus difficult to analyze mathematically. In this article, a relatively simple cortical dynamical model is used to analyze the emergence of direction-selective responses via recurrent interactions. A comparison between a model based on our analysis and physiological data is presented. The approach also allows analysis of the recurrently propagated signal, revealing the predictive nature of the implementation.
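As a rough illustration of the mechanism being analyzed, the toy NumPy simulation below drives a ring of rate units with a feedforward bump that is not itself direction selective and lets an asymmetric recurrent weight matrix amplify motion in one direction only. Every parameter here is an illustrative assumption; this is not the model of Suarez et al., Maex & Orban, or the authors' own analysis.

```python
import numpy as np

# Toy ring of N cortical rate units. The feedforward input is a moving bump
# whose shape does not depend on motion direction at any single instant,
# while the recurrent weights are shifted asymmetrically so that activity is
# amplified only when the bump moves in the "preferred" direction.
N, T, dt, tau = 40, 200, 1.0, 10.0
pos = np.arange(N)

def ff_input(t, direction):
    center = (direction * 0.2 * t) % N            # bump drifts left or right
    d = np.minimum(np.abs(pos - center), N - np.abs(pos - center))
    return np.exp(-d**2 / 8.0)

# Asymmetric recurrent kernel: each unit excites neighbors a few steps ahead.
offsets = (pos[:, None] - pos[None, :]) % N
W = 0.08 * np.exp(-((offsets - 3) % N)**2 / 4.0)

def simulate(direction):
    r, peak = np.zeros(N), 0.0
    for t in range(T):
        drive = ff_input(t, direction) + W @ r
        r += dt / tau * (-r + np.maximum(drive, 0.0))   # rectified linear rate units
        peak = max(peak, r.max())
    return peak

pref, null = simulate(+1), simulate(-1)
print(f"peak (preferred) = {pref:.3f}, peak (null) = {null:.3f}")
print(f"direction index  = {(pref - null) / (pref + null):.2f}")
```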


2021 · Vol 11 (24) · pp. 11738 · Author(s): Thomas Teixeira, Éric Granger, Alessandro Lameiras Koerich

Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and to describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals, nor cross-cultural and demographic aspects. Indeed, recognizing facial expressions is a difficult task even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) for continuous emotion recognition from long video sequences captured in the wild. To this end, several 2D CNN models designed to capture spatial information are extended to allow spatiotemporal representation learning from videos, considering a complex and multi-dimensional emotion space in which continuous values of valence and arousal must be predicted. We developed and evaluated convolutional recurrent neural networks that combine 2D CNNs with long short-term memory (LSTM) units, as well as inflated 3D CNN models, built by inflating the weights of a pre-trained 2D CNN during fine-tuning on application-specific videos. Experimental results on the challenging SEWA-DB dataset show that these architectures can be effectively fine-tuned to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on this dataset.
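A minimal PyTorch sketch of the CNN-LSTM family described above (a 2D backbone applied frame by frame, an LSTM over the frame embeddings, and a two-unit regression head for valence and arousal) might look as follows. The ResNet-18 backbone, hidden size, and input resolution are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class CnnLstmRegressor(nn.Module):
    """Frame-wise 2D CNN backbone + LSTM over time + valence/arousal head."""
    def __init__(self, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)   # in practice, ImageNet-pretrained weights would be loaded
        backbone.fc = nn.Identity()                # keep the 512-d frame features
        self.backbone = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)           # continuous valence and arousal

    def forward(self, clips):                      # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)) # (B*T, 512)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out)                      # (B, T, 2): one prediction per frame

# Toy forward pass on two random 16-frame clips.
model = CnnLstmRegressor()
dummy = torch.randn(2, 16, 3, 112, 112)
print(model(dummy).shape)                          # torch.Size([2, 16, 2])
```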


2021 · Vol 14 · Author(s): Jukka Ranta, Manu Airaksinen, Turkka Kirjavainen, Sampsa Vanhatalo, Nathan J. Stevenson

Objective: To develop a non-invasive and clinically practical method for long-term monitoring of infant sleep cycling in the intensive care unit.
Methods: Forty-three infant polysomnography recordings were performed at 1–18 weeks of age, including a piezo element bed mattress sensor to record respiratory and gross-body movements. The hypnogram scored from the polysomnography signals was used as the ground truth in training sleep classifiers based on 20,022 epochs of movement and/or electrocardiography signals. Three classifier designs were evaluated in the detection of deep sleep (N3 state): support vector machine (SVM), long short-term memory (LSTM) neural network, and convolutional neural network (CNN).
Results: Deep sleep was accurately distinguished from other states by all classifier variants. The SVM classifier based on a combination of movement and electrocardiography features had the highest performance (AUC 97.6%). An SVM classifier based on movement features alone had comparable accuracy (AUC 95.0%), as did the feature-independent CNN (AUC 93.3%).
Conclusion: Automated non-invasive tracking of sleep state cycling is technically feasible using measurements from a piezo element situated under a bed mattress.
Significance: An open-source infant deep sleep detector of this kind allows quantitative, continuous bedside assessment of an infant's sleep cycling.
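The feature-based SVM route could be sketched in scikit-learn roughly as below, assuming pre-computed per-epoch movement/electrocardiography features and binary N3 labels. The random feature matrix stands in for the real data, so the printed AUC is only a placeholder.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Hypothetical per-epoch features (e.g. respiration regularity, movement counts,
# heart-rate variability measures) and binary deep-sleep (N3) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))
y = rng.integers(0, 2, size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
clf.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"deep-sleep detection AUC: {auc:.3f}")   # ~0.5 on this random toy data
```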


Sentiment analysis can be used to study an individual's or a group's emotions and attitudes towards other people and entities such as products, services, or social events. With the advances in deep learning, the enormous amount of information available on the internet (chiefly on social media), and powerful computing machines, it is only a matter of time before artificial intelligence (AI) systems become present in every aspect of human life, making our lives more introspective. In this paper, we propose a multimodal sentiment prediction system that analyzes the emotions predicted from different modal sources such as video, audio, and text and integrates them to recognize the group emotions of students in a classroom. Our experimental setup involves a digital video camera with microphones to capture the live video and audio feeds of the students during a lecture. The students are asked to provide digital feedback on the lecture as tweets from their Twitter accounts, addressed to the lecturer's official Twitter account. The audio and video frames are separated from the live streaming video using tools such as LAME and FFmpeg, and the Twitter API is used to access and extract messages from the Twitter platform. The audio and video features are extracted using Mel-Frequency Cepstral Coefficients (MFCC) and a Haar cascade classifier, respectively. The extracted video features are passed to a Convolutional Neural Network (CNN) model trained on the FER2013 facial image database to generate the feature vector for classifying video-based emotions. A Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), trained on a speech emotion corpus, is used to classify the audio features. For the Twitter texts, a lexicon-based approach with a senti-word dictionary is combined with a learning-based approach in which Support Vector Machines (SVM) are trained on a custom dataset. A decision-level fusion algorithm is applied to these three modal schemes to integrate the classification results and deduce the overall group emotion of the students. Potential use cases of the proposed system include student emotion recognition, employee performance feedback, and monitoring or surveillance-based systems. The implemented framework was tested in a classroom environment during a live lecture, and the predicted emotions demonstrated the classification accuracy of our approach.
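The decision-level fusion step might be sketched as a weighted average of the per-modality class-probability vectors, as below. The emotion label set, the modality weights, and the probability values are illustrative assumptions rather than the paper's exact fusion algorithm.

```python
import numpy as np

EMOTIONS = ["happy", "neutral", "sad", "angry"]          # illustrative label set

def fuse_decisions(p_video, p_audio, p_text, weights=(0.5, 0.3, 0.2)):
    """Weighted average of per-modality class-probability vectors,
    then argmax to obtain the fused (group) emotion."""
    stacked = np.stack([p_video, p_audio, p_text])        # (3, n_classes)
    fused = np.average(stacked, axis=0, weights=weights)
    return EMOTIONS[int(fused.argmax())], fused

# Toy per-modality softmax outputs for one time window of the lecture.
p_video = np.array([0.55, 0.25, 0.15, 0.05])   # CNN on facial frames
p_audio = np.array([0.30, 0.40, 0.20, 0.10])   # LSTM on MFCC features
p_text  = np.array([0.60, 0.20, 0.10, 0.10])   # SVM / lexicon on tweets

label, fused = fuse_decisions(p_video, p_audio, p_text)
print(label, fused.round(3))
```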


2016 · Vol 2016 · pp. 1-7 · Author(s): Wei Wei, Qingxuan Jia

Emotion recognition from facial expressions with weighted features is a challenging research topic that has attracted great attention in the past few years. This paper presents a novel method that uses subregion recognition rates to weight the kernel function. First, we divide the facial expression image into uniform subregions and calculate the recognition rate and corresponding weight of each subregion. Then, we obtain a weighted-feature Gaussian kernel function and construct a classifier based on a Support Vector Machine (SVM). Finally, the experimental results suggest that the approach based on the weighted-feature Gaussian kernel function performs well in terms of correct recognition rate. Experiments on the extended Cohn-Kanade (CK+) dataset show that our method achieves encouraging recognition results compared to state-of-the-art methods.
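One way to realize a weighted-feature Gaussian kernel with scikit-learn is a callable kernel in which each subregion's recognition rate weights its block of features, as in the sketch below. The region rates, feature layout, and data are hypothetical stand-ins, not the paper's actual values.

```python
import numpy as np
from sklearn.svm import SVC

# Suppose each face image is split into 4 subregions whose features are
# concatenated, and each subregion's standalone recognition rate (estimated
# beforehand) becomes its weight inside the kernel.
region_rates = np.array([0.62, 0.71, 0.55, 0.80])          # hypothetical rates
features_per_region = 8
w = np.repeat(region_rates / region_rates.sum(), features_per_region)

def weighted_rbf(A, B, gamma=0.1):
    """Gaussian kernel with per-feature weights on the squared differences:
    k(x, z) = exp(-gamma * sum_i w_i * (x_i - z_i)^2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-gamma * np.einsum("ijk,k->ij", diff**2, w))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(120, 32)), rng.integers(0, 7, size=120)   # 7 emotion classes
clf = SVC(kernel=weighted_rbf).fit(X, y)
print(clf.predict(X[:5]))
```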


2021 · Author(s): Ehsan Othman, Philipp Werner, Frerk Saxen, Ayoub Al-Hamadi, Sascha Gruss, ...

Automatic systems enable continuous monitoring of patients' pain intensity, as shown in prior studies. Facial expression and physiological data such as electrodermal activity (EDA) are very informative for pain recognition; the features extracted from EDA indicate the stress and anxiety caused by different levels of pain. In this paper, we investigate using the EDA modality and fusing two modalities (frontal RGB video and EDA) for continuous pain intensity recognition with the X-ITE Pain Database. Further, we compare the performance of automated models before and after reducing the imbalance problem in the heat and electrical pain datasets, which include phasic (short) and tonic (long) stimuli. We use three distinct real-time methods: Random Forest (RF) baselines [a Random Forest classifier (RFc) and Random Forest regression (RFr)], a Long Short-Term Memory network (LSTM), and an LSTM with a sample-weighting method (LSTM-SW). The experimental results (1) report the first results of continuous pain intensity recognition using EDA data on the X-ITE Pain Database, (2) show that LSTM and LSTM-SW outperform guessing and the baseline methods (RFc and RFr), (3) confirm that EDA is the best-performing modality for most models, and (4) show that fusing the outputs of two LSTM models trained on facial expression and EDA data (Decision Fusion, DF) further improves results on some datasets (e.g., the Heat Phasic Dataset, HTD).
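The sample-weighting idea behind LSTM-SW can be sketched in Keras by passing per-sample weights that grow with pain intensity to model.fit, as below. The architecture, the synthetic data, and the weighting scheme are illustrative assumptions, not the paper's exact model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: sequences of EDA feature frames with a continuous
# pain-intensity target per sequence; "no pain" dominates the label distribution.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50, 8)).astype("float32")   # (samples, timesteps, features)
y = np.clip(rng.exponential(0.5, size=500), 0, 4).astype("float32")

# Sample weights that grow with pain intensity to counter the imbalance.
sample_weight = 1.0 + y

model = keras.Sequential([
    layers.Input(shape=(50, 8)),
    layers.LSTM(64),
    layers.Dense(1),                                   # continuous pain intensity
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, sample_weight=sample_weight, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:3], verbose=0).ravel())
```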


2017 · Vol 2017 · pp. 1-8 · Author(s): Yongrui Huang, Jianhao Yang, Pengkai Liao, Jiahui Pan

This paper proposes two multimodal fusion methods between brain and peripheral signals for emotion recognition. The input signals are electroencephalogram (EEG) and facial expression. The stimuli are based on a subset of movie clips that correspond to four specific areas of valence-arousal emotional space (happiness, neutral, sadness, and fear). For facial expression detection, the four basic emotion states (happiness, neutral, sadness, and fear) are detected by a neural network classifier. For EEG detection, the four basic emotion states and three emotion intensity levels (strong, ordinary, and weak) are detected by two support vector machine (SVM) classifiers, respectively. Emotion recognition is based on two decision-level fusion methods of the EEG and facial expression detections, using either a sum rule or a production rule. Twenty healthy subjects participated in two experiments. The results show that the accuracies of the two multimodal fusion detections are 81.25% and 82.75%, respectively, both higher than that of facial expression detection alone (74.38%) or EEG detection alone (66.88%). Combining facial expression and EEG information for emotion recognition compensates for the weaknesses of each as a single information source.
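The two decision-level rules can be written in a few lines of NumPy, as in the sketch below; the probability vectors are toy numbers, and the renormalization step is an assumption added for readability.

```python
import numpy as np

STATES = ["happiness", "neutral", "sadness", "fear"]

def fuse(p_eeg, p_face, rule="sum"):
    """Combine two modality-level probability vectors with the sum or product rule."""
    if rule == "sum":
        fused = p_eeg + p_face
    elif rule == "product":
        fused = p_eeg * p_face
    else:
        raise ValueError(rule)
    fused = fused / fused.sum()        # renormalize to a probability vector
    return STATES[int(fused.argmax())], fused

p_eeg  = np.array([0.20, 0.30, 0.35, 0.15])
p_face = np.array([0.10, 0.25, 0.50, 0.15])
print(fuse(p_eeg, p_face, "sum"))
print(fuse(p_eeg, p_face, "product"))
```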


Author(s): Keith April Araño, Peter Gloor, Carlotta Orsenigo, Carlo Vercellis

Speech is one of the most natural communication channels for expressing human emotions. Therefore, speech emotion recognition (SER) has been an active area of research with an extensive range of applications that can be found in several domains, such as biomedical diagnostics in healthcare and human–machine interactions. Recent work in SER has focused on end-to-end deep neural networks (DNNs). However, the scarcity of emotion-labeled speech datasets inhibits the full potential of training a deep network from scratch. In this paper, we propose new approaches for classifying emotions from speech by combining conventional mel-frequency cepstral coefficients (MFCCs) with image features extracted from spectrograms by a pretrained convolutional neural network (CNN). Unlike prior studies that employ end-to-end DNNs, our methods eliminate the resource-intensive network training process. Using the best prediction model obtained, we also build an SER application that predicts emotions in real time. Among the proposed methods, the hybrid feature set fed into a support vector machine (SVM) achieves an accuracy of 0.713 in a 6-class prediction problem evaluated on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, which is higher than previously published results. Interestingly, MFCCs taken as the sole input to a long short-term memory (LSTM) network achieve a slightly higher accuracy of 0.735. Our results reveal that the proposed approaches lead to an improvement in prediction accuracy. The empirical findings also demonstrate the effectiveness of using a pretrained CNN as an automatic feature extractor for the task of emotion prediction. Moreover, the success of the MFCC-LSTM model is evidence that, despite being conventional features, MFCCs can still outperform more sophisticated deep-learning feature sets.
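A hybrid feature extractor of the kind described (MFCC statistics concatenated with a CNN embedding of the mel spectrogram, fed to an SVM) could be sketched as below. The ResNet-18 backbone, the synthetic tone "clips", and the preprocessing details are assumptions; in practice, ImageNet-pretrained weights and real RAVDESS audio would be used instead.

```python
import numpy as np
import librosa
import torch
from torchvision import models, transforms
from sklearn.svm import SVC

def hybrid_features(y, sr, cnn):
    """MFCC statistics + CNN embedding of the mel spectrogram."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])     # 80-d

    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr))
    mel = (mel - mel.min()) / (mel.max() - mel.min() + 1e-8)               # scale to [0, 1]
    img = torch.tensor(mel, dtype=torch.float32)[None].repeat(3, 1, 1)     # grey -> 3 channels
    img = transforms.Resize((224, 224), antialias=True)(img)[None]
    with torch.no_grad():
        emb = cnn(img).squeeze().numpy()                                    # 512-d embedding
    return np.concatenate([mfcc_stats, emb])

cnn = models.resnet18(weights=None)   # in practice: weights=models.ResNet18_Weights.DEFAULT
cnn.fc = torch.nn.Identity()          # drop the classifier head, keep the features
cnn.eval()

# Synthetic stand-ins for labelled RAVDESS clips (real clips would be loaded
# with librosa.load); two tones pretend to be "happy" and "sad" utterances.
sr = 22050
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
clips = [np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 220 * t)]
labels = ["happy", "sad"]

X = np.stack([hybrid_features(y, sr, cnn) for y in clips])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))
```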

