Emotion Recognition Expressed on the Face by Multimodal Method Using Deep Learning

Emotion recognition plays a vital role in behavioral and emotional interactions between humans. It is a difficult task because it relies on predicting abstract emotional states from multimodal input data. Emotion recognition systems operate in three phases: first, input data are captured from the real world through sensors; then emotional features are extracted; finally, those features are classified to predict the emotion. Deep learning methods support recognition in several ways. In this article, we focus on facial expression. We extract the emotional features expressed on the face in two ways, using two different methods. On the one hand, we use Gabor filters to extract facial texture and appearance at different scales and orientations. On the other hand, we extract the movements of the facial muscles, namely the eyes, eyebrows, nose, and mouth. We then classify each feature set with convolutional neural networks (CNNs) and perform a decision-level fusion. The convolutional network model was trained and validated on datasets.
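As a rough illustration of the Gabor-filter branch described in this abstract, the sketch below builds a small filter bank over several scales and orientations with OpenCV and stacks the responses for a face crop. The kernel settings, crop size, and synthetic input are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a Gabor feature-extraction stage (illustrative settings only).
import cv2
import numpy as np

def gabor_bank(ksize=31, scales=(4.0, 8.0, 16.0), orientations=8):
    """Build a bank of Gabor kernels over several scales and orientations."""
    kernels = []
    for sigma in scales:
        for i in range(orientations):
            theta = i * np.pi / orientations
            kern = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                      lambd=10.0, gamma=0.5, psi=0)
            kernels.append(kern)
    return kernels

def gabor_features(gray_face, kernels):
    """Filter a grayscale face crop with every kernel and stack the responses."""
    responses = [cv2.filter2D(gray_face, cv2.CV_32F, k) for k in kernels]
    return np.stack(responses, axis=-1)  # H x W x (scales * orientations)

if __name__ == "__main__":
    # Synthetic stand-in for a real grayscale face crop.
    face = (np.random.rand(96, 96) * 255).astype(np.float32)
    feats = gabor_features(face, gabor_bank())
    print(feats.shape)  # (96, 96, 24): one response map per scale/orientation
```

In the approach summarized above, responses like these would form one CNN branch, with the facial-muscle movement features feeding a second branch before decision-level fusion.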

Face recognition plays a vital role in security applications. In recent years, researchers have focused on issues such as pose and illumination variation in face recognition. Traditional methods focus on OpenCV's Fisherfaces, which analyze facial expressions and attributes. The deep learning method used in the proposed system is a convolutional neural network (CNN). The proposed work includes the following modules: (1) face detection, (2) gender recognition, and (3) age prediction. The results obtained from this work show that real-time age and gender detection using a CNN provides better accuracy than other existing approaches.
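A minimal sketch of such a pipeline, assuming OpenCV's bundled Haar cascade for face detection and a toy two-head CNN (gender classification, age regression) in PyTorch; the architecture and the synthetic frame are stand-ins, not the proposed system.

```python
# Detect faces, then run each crop through a small two-head CNN (illustrative).
import cv2
import numpy as np
import torch
import torch.nn as nn

class AgeGenderCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.gender_head = nn.Linear(32 * 16 * 16, 2)  # male / female logits
        self.age_head = nn.Linear(32 * 16 * 16, 1)     # age as regression

    def forward(self, x):
        h = self.features(x)
        return self.gender_head(h), self.age_head(h)

model = AgeGenderCNN()
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
frame = (np.random.rand(240, 320) * 255).astype("uint8")  # stand-in for a camera frame
for (x, y, w, h) in detector.detectMultiScale(frame, 1.1, 5):
    crop = cv2.resize(frame[y:y + h, x:x + w], (64, 64)).astype("float32") / 255.0
    gender_logits, age = model(torch.from_numpy(crop).view(1, 1, 64, 64))
```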


Sensors, 2021, Vol. 21 (9), pp. 3046
Author(s): Shervin Minaee, Mehdi Minaei, Amirali Abdolrashidi

Facial expression recognition has been an active area of research over the past few decades, and it remains challenging due to high intra-class variation. Traditional approaches to this problem rely on hand-crafted features such as SIFT, HOG, and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured under controlled conditions but fail to perform as well on more challenging datasets with greater image variation and partial faces. In recent years, several works have proposed end-to-end frameworks for facial expression recognition using deep learning models. Despite the better performance of these works, there is still much room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network that is able to focus on important parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique that finds the important facial regions for detecting different emotions based on the classifier's output. Through experimental results, we show that different emotions are sensitive to different parts of the face.
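The following is a minimal PyTorch sketch of the general idea behind an attentional convolutional network: a learned spatial attention map re-weights feature locations before pooling and classification. Layer sizes and the seven-class output are illustrative assumptions, not the paper's architecture.

```python
# Toy attentional CNN: a 1x1-conv attention map re-weights spatial features.
import torch
import torch.nn as nn

class AttentionalCNN(nn.Module):
    def __init__(self, num_classes=7):  # e.g. seven FER-2013 emotion classes
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # One attention score per spatial location, squashed to [0, 1].
        self.attention = nn.Sequential(nn.Conv2d(64, 1, 1), nn.Sigmoid())
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        feats = self.backbone(x)              # B x 64 x H x W
        attn = self.attention(feats)          # B x 1 x H x W
        weighted = feats * attn               # emphasise informative face regions
        pooled = weighted.mean(dim=(2, 3))    # global average pooling
        return self.classifier(pooled), attn  # attn can be rendered as a heat map

logits, attn_map = AttentionalCNN()(torch.randn(1, 1, 48, 48))
```

The returned attention map is also how a visualization of emotion-sensitive face regions, as described in the abstract, could be produced.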


Author(s): Liping Zhou, Mingwei Gao, Chun He

At present, the recognition rate of face recognition algorithms is limited under unconstrained conditions. To address this problem, this paper proposes a face recognition algorithm based on deep learning for unconstrained conditions. The algorithm takes LBP texture features as the input data of a deep network, trains the network greedily layer by layer to obtain optimized network parameters, and then uses the trained network to predict the test samples. Experimental results on the LFW face database show that the proposed algorithm achieves a higher recognition rate than several traditional algorithms under unconstrained conditions. To further verify its effectiveness and generality, the algorithm was also tested on YALE and YALE-B and achieved a high recognition rate as well, indicating that a deep learning method using LBP texture features as input data is effective and robust for face recognition.
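As a sketch of the input stage, the snippet below computes uniform LBP texture histograms with scikit-image, the kind of descriptor that would be fed to the deep network; the histogram settings are assumptions, and the greedy layer-wise training itself is not reproduced here.

```python
# Uniform LBP texture descriptor as deep-network input (illustrative settings).
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, points=8, radius=1):
    """Uniform LBP codes summarised as a normalised histogram."""
    codes = local_binary_pattern(gray_face, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2))
    return hist.astype(np.float32) / hist.sum()

# Example: a random array standing in for a real LFW face crop.
face = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
print(lbp_histogram(face))  # 10-dimensional texture descriptor per image or patch
```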


2021, pp. 1-17
Author(s): Naveen Masood, Humera Farooq

Most electroencephalography (EEG) based emotion recognition systems rely on a single stimulus to evoke emotions. EEG data are usually recorded with a large number of electrodes, which can lead to data redundancy and longer experimental setup time. Whether a configuration with fewer electrodes is common across different stimulus presentation paradigms remains an open question. Publicly available datasets exist for EEG-based recognition of human emotional states; however, since this work focuses on classifying emotions while subjects experience different stimuli, new experiments were required. Keeping these issues in mind, this work presents a novel experimental study that records EEG data for three human emotional states evoked with four different stimulus presentation paradigms. A methodology based on an iterative genetic algorithm combined with majority voting is used to obtain a configuration with a reduced number of EEG electrodes while minimizing the loss of classification accuracy. The results obtained are comparable with recent studies. Stimulus-independent configurations with fewer electrodes lead to lower computational complexity as well as reduced setup time for future EEG-based smart systems for emotion recognition.
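A simplified sketch of the electrode-selection idea: a genetic algorithm evolves binary channel masks scored by cross-validated accuracy, and majority voting over several runs keeps the channels selected most often. The synthetic data, SVM classifier, and GA settings below are illustrative stand-ins for the authors' setup.

```python
# Toy GA-based EEG channel selection with majority voting across runs.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))      # 120 trials x 32 EEG channel features (synthetic)
y = rng.integers(0, 3, size=120)    # three emotional states

def fitness(mask):
    """Cross-validated accuracy of an SVM using only the selected channels."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X[:, mask.astype(bool)], y, cv=3).mean()

def run_ga(pop_size=20, generations=15, n_channels=32):
    pop = rng.integers(0, 2, size=(pop_size, n_channels))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]   # keep the fittest half
        children = parents.copy()
        flip = rng.random(children.shape) < 0.05              # bit-flip mutation
        children[flip] ^= 1
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(ind) for ind in pop])]

votes = np.sum([run_ga() for _ in range(5)], axis=0)
selected = votes >= 3                # majority voting over five GA runs
print("channels kept:", np.flatnonzero(selected))
```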


Author(s): T. Peters, C. Brenner, M. Song

Abstract. The goal of this paper is to use transfer learning for semi-supervised semantic segmentation of 2D images: given a pretrained deep convolutional network (DCNN), our aim is to adapt it to a new camera-sensor system by enforcing that predictions be consistent for the same object in space. This is enabled by projecting 3D object points into multi-view 2D images. Since every 3D object point is usually mapped to a number of 2D images, each of which undergoes pixelwise classification using the pretrained DCNN, we obtain a number of predictions (labels) for the same object point. This makes it possible to detect and correct outlier predictions. Ultimately, we retrain the DCNN on the corrected dataset in order to adapt the network to the new input data. We demonstrate the effectiveness of our approach on a mobile mapping dataset containing over 10,000 images and more than 1 billion 3D points. Moreover, we manually annotated a subset of the mobile mapping images and show that our approach raises the mean intersection over union (mIoU) by approximately 10% with Deeplabv3+.
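A small numpy sketch of the multi-view consistency step: every 3D point is projected into each image with its 3x4 camera matrix, the per-view labels are read from the DCNN's label maps, and a majority vote yields the corrected label used for retraining. Camera matrices and label maps are assumed to be given.

```python
# Project 3D points into each view and majority-vote their semantic labels.
import numpy as np

def project(P, points_3d):
    """Project Nx3 world points with a 3x4 camera matrix to pixel coordinates."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    uvw = (P @ homog.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def vote_labels(points_3d, cameras, label_maps):
    """Majority-vote the per-view semantic labels of every 3D point."""
    corrected = []
    for p in points_3d:
        labels = []
        for P, lmap in zip(cameras, label_maps):
            u, v = project(P, p[None])[0]
            h, w = lmap.shape
            if 0 <= int(v) < h and 0 <= int(u) < w:
                labels.append(lmap[int(v), int(u)])
        # The voted label replaces outlier predictions before retraining.
        corrected.append(np.bincount(labels).argmax() if labels else -1)
    return np.array(corrected)
```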


2021, Vol. 25 (3), pp. 1717-1730
Author(s): Esma Mansouri-Benssassi, Juan Ye

Abstract. Emotion recognition through facial expression and non-verbal speech is an important area of affective computing. Both modalities have been studied extensively, from classical feature extraction techniques to more recent deep learning approaches. However, most of these approaches face two major challenges: (1) robustness: in the face of degradation such as noise, can a model still make correct predictions? and (2) cross-dataset generalisation: when a model is trained on one dataset, can it be used for inference on another dataset? To directly address these challenges, we first propose applying a spiking neural network (SNN) to predict emotional states from facial expression and speech data, and then investigate and compare its accuracy under data degradation or unseen new input. We evaluate our approach on third-party, publicly available datasets and compare it to state-of-the-art techniques. Our approach demonstrates robustness to noise: it achieves an accuracy of 56.2% for facial expression recognition (FER), compared to 22.64% for a CNN and 14.10% for an SVM, when input images are degraded with a noise intensity of 0.5, and the highest accuracy of 74.3% for speech emotion recognition (SER), compared to 21.95% for a CNN and 14.75% for an SVM, when audio white noise is applied. For generalisation, our approach achieves consistently high accuracy of 89% for FER and 70% for SER in cross-dataset evaluation, suggesting that it learns more effective feature representations that generalise facial features and vocal characteristics well across subjects.
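A brief sketch of the robustness evaluation described here, assuming additive Gaussian noise as the degradation model: images are perturbed at a chosen intensity and the accuracy of any trained classifier (SNN, CNN, or SVM) is measured on the degraded inputs.

```python
# Noise-robustness check for any image classifier (degradation model assumed).
import numpy as np

def add_noise(images, intensity, rng=np.random.default_rng(0)):
    """Add zero-mean Gaussian noise scaled by `intensity` to images in [0, 1]."""
    noisy = images + intensity * rng.normal(size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def accuracy_under_noise(predict_fn, images, labels, intensity):
    """Accuracy of an arbitrary classifier on degraded inputs."""
    preds = predict_fn(add_noise(images, intensity))
    return float(np.mean(preds == labels))

# Usage: accuracy_under_noise(model.predict, test_images, test_labels, 0.5)
```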


2021, Vol. 17, pp. 28-40
Author(s): Isah Salim Ahmad, Shuai Zhang, Sani Saminu, Lingyue Wang, Abd El Kader Isselmou, ...

Emotion recognition based on brain-computer interfaces (BCI) has attracted considerable research attention despite its difficulty. Emotion plays a vital role in human cognition and decision-making. Many researchers study emotion using electroencephalogram (EEG) signals because they are easy and convenient to acquire. Deep learning has been employed for emotion recognition systems that handle single or multiple modalities, with visual or music stimuli shown on a screen. In this article, a convolutional neural network (CNN) model based on ResNet50 with the Adam optimizer is introduced to simultaneously learn features and recognize positive, neutral, and negative emotional states from pure EEG signals in a single-modality setting, using the SJTU emotion EEG dataset (SEED). The dataset is shuffled, divided into training and testing sets, and then fed to the CNN model. Negative emotion reaches the highest accuracy of 94.86%, followed by neutral emotion with 94.29% and positive emotion with 93.25%, for an average accuracy of 94.13%. The results show the excellent classification ability of the model and its potential to improve emotion recognition.
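A hedged PyTorch sketch of the training setup: a ResNet50 backbone with a three-class head (positive, neutral, negative) optimised with Adam after shuffling and splitting the data. How SEED EEG segments are mapped to image-like tensors is an assumption made purely for illustration.

```python
# ResNet50 with a three-class head trained with Adam (toy data, illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, 3)      # positive / neutral / negative
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Placeholder tensors standing in for EEG segments mapped to 3-channel images.
X = torch.randn(32, 3, 224, 224)
y = torch.randint(0, 3, (32,))
perm = torch.randperm(len(X))                      # shuffle
train, test = perm[:24], perm[24:]                 # simple train/test split

model.train()
for _ in range(2):                                 # a couple of toy epochs
    optimizer.zero_grad()
    loss = criterion(model(X[train]), y[train])
    loss.backward()
    optimizer.step()
```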


Author(s): Thanh-Tam NGUYEN, Son-Thai LE, Van-Thuy LE

One of the most widely used biometric techniques for identity authentication is face recognition. It plays an essential role in many areas, such as daily life, public security, finance, the military, and smart schools. The facial recognition task is to identify or verify the identity of a person based on their face. The first step is face detection, which detects and locates human faces in images and videos. The face-matching process then determines the identity of the detected face. In recent years, many face recognition systems have improved performance using deep learning models. Deep learning learns representations of the face through multiple processing layers with multiple levels of feature extraction. This approach has produced substantial improvements in face recognition since 2014, launched by the breakthroughs of DeepFace and DeepID. However, how to choose the best hyperparameters remains an open question. In this paper, we introduce a method for adaptive hyperparameter selection to improve recognition accuracy. The proposed method achieves improvements on three datasets.
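Since the paper's adaptive selection rule is not spelled out in the abstract, the sketch below shows only the generic pattern it builds on: sample candidate hyperparameter settings, score each by validation accuracy, and keep the best. The search space and the train_and_score callback are hypothetical.

```python
# Generic hyperparameter selection by validation accuracy (random search stand-in).
import random

def select_hyperparameters(train_and_score, n_trials=20, seed=0):
    """train_and_score(params) -> validation accuracy for one configuration."""
    rng = random.Random(seed)
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-5, -2),
            "batch_size": rng.choice([32, 64, 128]),
            "embedding_dim": rng.choice([128, 256, 512]),
        }
        acc = train_and_score(params)
        if acc > best_acc:
            best, best_acc = params, acc
    return best, best_acc
```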


2022, Vol. 12 (1), pp. 527
Author(s): Fei Ma, Yang Li, Shiguang Ni, Shaolun Huang, Lin Zhang

Audio–visual emotion recognition identifies human emotional states by combining the audio and visual modalities simultaneously, and it plays an important role in intelligent human–machine interaction. With the help of deep learning, previous works have made great progress in audio–visual emotion recognition. However, these deep learning methods often require a large amount of training data. In reality, data acquisition is difficult and expensive, especially for multimodal data with different modalities. As a result, the training data may be in the low-data regime, which cannot be used effectively for deep learning. In addition, class imbalance may occur in the emotional data, which can further degrade the performance of audio–visual emotion recognition. To address these problems, we propose an efficient data augmentation framework based on a multimodal conditional generative adversarial network (GAN) for audio–visual emotion recognition. Specifically, we design generators and discriminators for the audio and visual modalities. The category information is used as their shared input to make sure our GAN can generate fake data of different categories. In addition, the high dependence between the audio and visual modalities in the generated multimodal data is modeled with the Hirschfeld–Gebelein–Rényi (HGR) maximal correlation. In this way, we relate the different modalities in the generated data so that they approximate the real data. The generated data are then used to augment the data manifold. We further apply our approach to the problem of class imbalance. To the best of our knowledge, this is the first work to propose a data augmentation strategy with a multimodal conditional GAN for audio–visual emotion recognition. We conduct a series of experiments on three public multimodal datasets, including eNTERFACE’05, RAVDESS, and CMEW. The results indicate that our multimodal conditional GAN is highly effective for data augmentation in audio–visual emotion recognition.
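A simplified PyTorch sketch of the multimodal conditional GAN idea: audio and visual generators share the class label as a condition, and a soft correlation term (a stand-in for the HGR maximal correlation used in the paper) encourages dependence between the two generated modalities. Discriminators and the full training loop are omitted, and all dimensions are illustrative.

```python
# Conditional generators for two modalities plus a soft cross-modal correlation term.
import torch
import torch.nn as nn

NUM_CLASSES, NOISE_DIM, FEAT_DIM = 6, 64, 128

class CondGenerator(nn.Module):
    def __init__(self, out_dim):
        super().__init__()
        self.embed = nn.Embedding(NUM_CLASSES, 16)    # shared category condition
        self.net = nn.Sequential(nn.Linear(NOISE_DIM + 16, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, z, labels):
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

def correlation_term(fa, fv):
    """Negative mean product of standardised features: a soft surrogate that
    encourages dependence between the generated audio and visual outputs."""
    fa = (fa - fa.mean(0)) / (fa.std(0) + 1e-6)
    fv = (fv - fv.mean(0)) / (fv.std(0) + 1e-6)
    return -(fa * fv).mean()

gen_audio, gen_visual = CondGenerator(FEAT_DIM), CondGenerator(FEAT_DIM)
labels = torch.randint(0, NUM_CLASSES, (32,))
fake_audio = gen_audio(torch.randn(32, NOISE_DIM), labels)
fake_visual = gen_visual(torch.randn(32, NOISE_DIM), labels)
loss_corr = correlation_term(fake_audio, fake_visual)  # added to the GAN losses
```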


2021, Vol. 2021, pp. 1-12
Author(s): Lin Jiang, Jia Chen, Hiroyoshi Todo, Zheng Tang, Sicheng Liu, ...

With the development of society, deep learning has been widely used in object detection, face recognition, speech recognition, and other fields. Object detection is a popular direction in computer vision and digital image processing, and face detection is a focus within it. Although face detection technology has gone through a long research stage, it is still considered one of the more difficult problems in human-feature detection. In addition, the subtlety of faces and the complexity of real environments mean that existing techniques cannot accurately recognize faces at different scales, under occlusion, or in different poses. Therefore, this paper adopts an advanced deep learning method based on machine vision to detect human faces automatically. In order to accurately detect a variety of human faces, a multiscale Fast R-CNN method based on upper and lower layers (UPL-RCNN) is proposed. The network is composed of spatial affine transformation components and region-of-interest (ROI) feature components. This method plays a vital role in face detection. First, multiscale information can be grouped during detection to handle small facial regions. Then, inspired by the human visual system, the method performs contextual reasoning and spatial transformations, including zooming, cropping, and rotating. Comparative experiments show that this method not only detects human faces accurately but also outperforms Fast R-CNN. Compared with some advanced methods, this method offers high accuracy, lower time consumption, and no need for correlation marks.
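A minimal PyTorch sketch of the spatial affine transformation component mentioned here: a 2x3 affine matrix resamples a feature map through a sampling grid, which is how zooming, cropping, and rotating can be applied. The fixed matrix and feature-map size are illustrative; the UPL-RCNN architecture itself is not reproduced.

```python
# Affine resampling of a feature map via a sampling grid (spatial transform step).
import torch
import torch.nn.functional as F

feature_map = torch.randn(1, 64, 32, 32)       # a face-region feature map

# Example affine parameters: scale by 0.5, i.e. zoom in on the centre region.
theta = torch.tensor([[[0.5, 0.0, 0.0],
                       [0.0, 0.5, 0.0]]])       # 1 x 2 x 3 affine matrix

grid = F.affine_grid(theta, size=feature_map.shape, align_corners=False)
transformed = F.grid_sample(feature_map, grid, align_corners=False)
print(transformed.shape)                        # same size, spatially re-sampled
```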

