Continuous Emotion Recognition with Spatiotemporal Convolutional Neural Networks

2021 ◽  
Vol 11 (24) ◽  
pp. 11738
Author(s):  
Thomas Teixeira ◽  
Éric Granger ◽  
Alessandro Lameiras Koerich

Facial expressions are one of the most powerful ways to depict specific patterns in human behavior and describe the human emotional state. However, despite the impressive advances of affective computing over the last decade, automatic video-based systems for facial expression recognition still cannot correctly handle variations in facial expression among individuals, nor cross-cultural and demographic aspects. Indeed, recognizing facial expressions is a difficult task, even for humans. This paper investigates the suitability of state-of-the-art deep learning architectures based on convolutional neural networks (CNNs) for continuous emotion recognition from long video sequences captured in the wild. To this end, several 2D CNN models designed to model spatial information are extended to learn spatiotemporal representations from videos, in a complex, multi-dimensional emotion space where continuous values of valence and arousal must be predicted. We developed and evaluated two families of models: convolutional recurrent neural networks, which combine 2D CNNs with long short-term memory (LSTM) units, and inflated 3D CNN models, which are built by inflating the weights of a pre-trained 2D CNN model during fine-tuning on application-specific videos. Experimental results on the challenging SEWA-DB dataset show that these architectures can be effectively fine-tuned to encode spatiotemporal information from successive raw pixel images and achieve state-of-the-art results on this dataset.
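The weight inflation mentioned above can be illustrated with a small sketch. It assumes the common I3D-style rule of repeating a pretrained 2D kernel T times along a new temporal axis and dividing by T; the paper's exact initialization may differ, and the helper names and kernel values below are hypothetical. The division by T is what makes the inflated filter give the same response on a static video (one frame repeated T times) as the original 2D filter gives on that frame.

```python
from typing import List

Kernel2D = List[List[float]]
Kernel3D = List[Kernel2D]


def inflate_kernel(kernel_2d: Kernel2D, depth: int) -> Kernel3D:
    """Repeat a k x k kernel `depth` times along time, scaling every weight by 1/depth."""
    return [[[w / depth for w in row] for row in kernel_2d] for _ in range(depth)]


def response_2d(kernel: Kernel2D, patch: Kernel2D) -> float:
    """Response of a 2D kernel at one spatial position (cross-correlation)."""
    return sum(w * p for k_row, p_row in zip(kernel, patch) for w, p in zip(k_row, p_row))


def response_3d(kernel: Kernel3D, clip: Kernel3D) -> float:
    """Response of a 3D kernel at one spatiotemporal position: sum over time slices."""
    return sum(response_2d(k_t, p_t) for k_t, p_t in zip(kernel, clip))


# Hypothetical "pretrained" 3x3 edge kernel and one 3x3 image patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

inflated = inflate_kernel(kernel, depth=4)
static_clip = [patch] * 4  # the same frame repeated over four time steps
print(response_2d(kernel, patch), response_3d(inflated, static_clip))  # -8.0 -8.0
```

Fine-tuning on real videos then lets the temporal slices diverge from one another, which is where the spatiotemporal modeling comes from.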

Human feelings are mental states that arise spontaneously rather than through cognitive effort. Some of the basic feelings are happiness, anger, sadness, and surprise, together with a neutral state. These internal feelings are reflected on a person's face as facial expressions. This paper presents a novel methodology for facial expression analysis to aid the development of a facial expression recognition system that can classify five basic emotions in real time. Recognizing facial expressions is important because of its applications in many domains, such as artificial intelligence, security, and robotics. Many different approaches can be used to address the problems of Facial Expression Recognition (FER), but the best-suited technique for automated FER is the Convolutional Neural Network (CNN). Thus, a novel CNN architecture is proposed, and a combination of multiple datasets, namely FER2013, FER+, JAFFE, and CK+, is used for training and testing, which helps improve accuracy and yields a robust real-time system. The proposed methodology gives good results, and the accuracy obtained may encourage and support researchers in building better models for automated facial expression recognition systems.
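Combining FER2013, FER+, JAFFE, and CK+ requires mapping each dataset's label vocabulary onto the five target classes. A minimal sketch of that step follows; the synonym table, the file paths, and the choice to drop out-of-scope labels such as fear, disgust, and contempt are assumptions for illustration, not the authors' documented preprocessing.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# The five target classes named in the abstract, in a fixed index order.
TARGET_CLASSES = ("happy", "angry", "neutral", "sad", "surprise")

# Hypothetical synonym table; verify it against each dataset's own annotation
# files. Labels not listed (e.g. "fear", "disgust", "contempt") are dropped
# because they fall outside the five target classes.
SYNONYMS: Dict[str, str] = {
    "happy": "happy", "happiness": "happy",
    "angry": "angry", "anger": "angry",
    "neutral": "neutral",
    "sad": "sad", "sadness": "sad",
    "surprise": "surprise", "surprised": "surprise",
}


def harmonize(samples: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Map (image_path, raw_label) pairs to (image_path, class_index), skipping unmapped labels."""
    merged = []
    for path, raw_label in samples:
        canonical: Optional[str] = SYNONYMS.get(raw_label.strip().lower())
        if canonical is not None:
            merged.append((path, TARGET_CLASSES.index(canonical)))
    return merged


# Hypothetical samples from two of the datasets.
fer2013 = [("fer/0001.png", "Happy"), ("fer/0002.png", "Disgust")]
ck_plus = [("ck/S005_001.png", "sadness"), ("ck/S010_004.png", "contempt")]
print(harmonize(fer2013 + ck_plus))  # [('fer/0001.png', 0), ('ck/S005_001.png', 3)]
```

Dropping unmapped labels keeps the merged set consistent with a five-way softmax output; an alternative would be to fold them into an "other" class, at the cost of changing the task.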


2021 ◽  
Vol 25 (3) ◽  
pp. 1671-1687
Author(s):  
Andreas Wunsch ◽  
Tanja Liesch ◽  
Stefan Broda

It is now well established to use shallow artificial neural networks (ANNs) to obtain accurate and reliable groundwater level forecasts, which are an important tool for sustainable groundwater management. However, we observe an increasing shift from conventional shallow ANNs to state-of-the-art deep-learning (DL) techniques, often without a direct performance comparison. Although shallow recurrent networks have clearly proven their suitability, they frequently seem to be excluded from study designs because of the euphoria about new DL techniques and their successes in various disciplines. We therefore aim to provide an overview of the groundwater level prediction ability of shallow conventional recurrent ANNs, namely non-linear autoregressive networks with exogenous input (NARX), and of popular state-of-the-art DL techniques such as long short-term memory (LSTM) and convolutional neural networks (CNNs). We compare their performance for both sequence-to-value (seq2val) and sequence-to-sequence (seq2seq) forecasting over a 4-year period, using only a few widely available, easy-to-measure meteorological input parameters, which makes our approach widely applicable. We also investigate how the data dependency, in terms of time series length, differs between the ANN architectures. For seq2val forecasts, NARX models perform best on average; however, CNNs are much faster and only slightly less accurate. For seq2seq forecasts, NARX models mostly outperform both DL models and almost reach the speed of CNNs. However, NARX models are the least robust against initialization effects, although these can easily be handled using ensemble forecasting.
We show that shallow neural networks such as NARX should not be neglected in favor of DL techniques, especially when only small amounts of training data are available, in which case they can clearly outperform LSTMs and CNNs. LSTMs and CNNs might, however, perform substantially better with larger datasets, where DL can really demonstrate its strengths, although such datasets are rarely available in the groundwater domain.
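The difference between the seq2val and seq2seq setups compared above lies mainly in how the training samples are framed. The sketch below shows one plausible windowing scheme; the lookback and horizon values, the toy series, and the function name are hypothetical, and it omits NARX's feedback of past groundwater levels as extra inputs.

```python
from typing import List, Sequence, Tuple


def make_windows(inputs: Sequence[Sequence[float]], target: Sequence[float],
                 lookback: int, horizon: int, seq2seq: bool) -> List[Tuple[list, list]]:
    """Slice aligned input/target series into supervised (x, y) samples.

    inputs[t] holds the meteorological inputs at step t and target[t] the
    groundwater level. Each x covers the `lookback` steps before the forecast
    start. seq2val: y is only the level `horizon` steps ahead.
    seq2seq: y is the whole sequence of the next `horizon` levels.
    """
    samples = []
    for start in range(lookback, len(target) - horizon + 1):
        x = [list(step) for step in inputs[start - lookback:start]]
        future = list(target[start:start + horizon])
        y = future if seq2seq else [future[-1]]
        samples.append((x, y))
    return samples


# Hypothetical toy series: two meteorological inputs and a groundwater level.
precip = [float(t % 3) for t in range(10)]
temp = [10.0 + t for t in range(10)]
levels = [float(t) for t in range(10)]
inputs = list(zip(precip, temp))

s2v = make_windows(inputs, levels, lookback=4, horizon=3, seq2seq=False)
s2s = make_windows(inputs, levels, lookback=4, horizon=3, seq2seq=True)
print(len(s2v), s2v[0][1], s2s[0][1])  # 4 [6.0] [4.0, 5.0, 6.0]
```

With the same inputs, a seq2seq model must output `horizon` values per sample, whereas a seq2val model outputs one. This is why the two setups can rank architectures differently.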


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2393 ◽  
Author(s):  
Daniel Octavian Melinte ◽  
Luige Vladareanu

This paper presents human interaction with a NAO robot using deep convolutional neural networks (CNNs), based on an innovative end-to-end pipeline that applies two optimized CNNs, one for face recognition (FR) and one for facial expression recognition (FER), to achieve real-time inference speed for the entire process. Two different FR models are considered: one known to be very accurate but slow at inference (faster region-based convolutional neural network), and one that is less accurate but fast at inference (single shot detector convolutional neural network). For emotion recognition, transfer learning and fine-tuning were applied to three CNN models (VGG, Inception V3, and ResNet). The overall results show that the single shot detector convolutional neural network (SSD CNN) and the faster region-based convolutional neural network (Faster R-CNN) achieve almost the same face detection accuracy: 97.8% for Faster R-CNN on the PASCAL visual object classes (PASCAL VOC) evaluation metrics and 97.42% for SSD Inception. For FER, ResNet obtained the highest training accuracy (90.14%), while the visual geometry group (VGG) network reached 87% and Inception V3 reached 81%. Using two serialized CNNs instead of the FER CNN alone improved results by over 10%, and the recent rectified adaptive moment estimation optimizer (RAdam) led to better generalization and an accuracy improvement of 3%–4% for each emotion recognition CNN.
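The abstract credits RAdam with part of the accuracy gain. The sketch below shows the RAdam update rule for a single scalar parameter, following the rectified Adam algorithm of Liu et al. (2019). The threshold of 4 on ρ_t matches that algorithm, although some library implementations use 5. The eps term, the toy objective, and the hyperparameter values are assumptions, and this is not the authors' training code.

```python
import math
from typing import Callable


def radam_minimize(grad: Callable[[float], float], theta: float, lr: float = 0.05,
                   beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
                   steps: int = 2000) -> float:
    """Minimize a scalar function with RAdam, given its gradient."""
    m = v = 0.0
    rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g          # first moment
        v = beta2 * v + (1.0 - beta2) * g * g      # second moment
        m_hat = m / (1.0 - beta1 ** t)             # bias-corrected first moment
        rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
        if rho_t > 4.0:
            # Variance of the adaptive learning rate is tractable: rectify it.
            v_hat = v / (1.0 - beta2 ** t)
            r_t = math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf
                            / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
            theta -= lr * r_t * m_hat / (math.sqrt(v_hat) + eps)
        else:
            # First few steps: plain SGD with bias-corrected momentum.
            theta -= lr * m_hat
    return theta


# Toy objective f(x) = (x - 3)^2, whose gradient is 2(x - 3).
print(radam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0))
```

The rectification term r_t starts near zero and grows toward 1. This damps the adaptive step during the first iterations, when the second-moment estimate is based on very few gradients. This warmup-like behavior is the usual explanation for RAdam's better generalization.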


2020 ◽  
Author(s):  
Mehmet Akif Ozdemir ◽  
Murside Degirmenci ◽  
Elif Izci ◽  
Aydin Akan

The emotional state of people plays a key role in physiological and behavioral human interaction. Emotional state analysis spans many fields, such as neuroscience, cognitive sciences, and biomedical engineering, because the parameters of interest involve the complex neuronal activity of the brain. Electroencephalogram (EEG) signals are processed to connect the brain with external systems and to predict emotional states. This paper proposes a novel emotion recognition method based on deep convolutional neural networks (CNNs) that classify the Valence, Arousal, Dominance, and Liking emotional states from time series of multi-channel EEG signals in the Database for Emotion Analysis using Physiological Signals (DEAP). The approach estimates emotional states through CNN-based classification of multi-spectral topology images obtained from EEG signals. Unlike most EEG-based approaches, which discard the spatial information of EEG signals, converting the signals into a sequence of multi-spectral topology images preserves their temporal, spectral, and spatial information. A deep recurrent convolutional network is trained to learn important representations from a sequence of three-channel topographical images. We achieved test accuracies of 90.62% for negative versus positive Valence, 86.13% for high versus low Arousal, 88.48% for high versus low Dominance, and 86.23% for like versus unlike. Evaluations on the emotion recognition problem revealed significant improvements in classification accuracy compared with other studies using deep neural networks (DNNs) and one-dimensional CNNs.
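One way to turn per-electrode spectral features into a three-channel "topology image" is to interpolate each frequency band's values over a 2D projection of the scalp. The sketch below does this with inverse distance weighting. The band choice (theta, alpha, beta), the electrode coordinates, the grid size, and the power values are all hypothetical, and the abstract does not say which interpolation the authors used. Repeating this for successive time windows would give the image sequence fed to the recurrent convolutional network.

```python
import math
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]


def idw_grid(positions: Sequence[Tuple[float, float]], values: Sequence[float],
             size: int, power: float = 2.0) -> Grid:
    """Interpolate electrode values onto a size x size grid over [-1, 1]^2 by inverse distance weighting."""
    coords = [-1.0 + 2.0 * i / (size - 1) for i in range(size)]
    grid = []
    for y in coords:
        row = []
        for x in coords:
            num = den = 0.0
            exact = None
            for (px, py), val in zip(positions, values):
                d = math.hypot(x - px, y - py)
                if d < 1e-9:
                    exact = val  # grid point coincides with an electrode
                    break
                weight = 1.0 / d ** power
                num += weight * val
                den += weight
            row.append(exact if exact is not None else num / den)
        grid.append(row)
    return grid


def topographic_image(positions: Sequence[Tuple[float, float]],
                      band_powers: Dict[str, Sequence[float]], size: int = 32) -> List[Grid]:
    """Stack one interpolated map per frequency band into a 3-channel image (assumed bands)."""
    return [idw_grid(positions, band_powers[band], size) for band in ("theta", "alpha", "beta")]


# Hypothetical 2D electrode projections and band powers for a single time window.
positions = [(0.0, 0.0), (-0.5, -0.5), (0.5, -0.5), (0.0, 0.5)]
band_powers = {
    "theta": [1.0, 0.8, 0.6, 0.4],
    "alpha": [2.0, 1.5, 1.0, 0.5],
    "beta": [0.3, 0.2, 0.4, 0.1],
}
image = topographic_image(positions, band_powers, size=5)
print(len(image), len(image[0]), len(image[0][0]))  # 3 5 5
```

Because the interpolation keeps the electrodes' relative positions, neighboring pixels correspond to neighboring scalp regions. This is the spatial information that a flat per-channel feature vector would lose.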


Author(s):  
Yang Yi ◽  
Feng Ni ◽  
Yuexin Ma ◽  
Xinge Zhu ◽  
Yuankai Qi ◽  
...  

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), which models multi-scale temporal responses by explicitly applying different temporal kernels. We then present a Global Refinement Block (GRB), an attention module that shapes the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features at a tolerable computational cost. Extensive experiments on public datasets demonstrate that our model achieves state-of-the-art performance with higher efficiency. Moreover, the MKTB and GRB are plug-and-play modules, and experiments on other tasks, such as video understanding and video-based person re-identification, also show their efficiency and generalization capability.
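The abstract does not describe the internals of MKTB or GRB, so the following is only a hypothetical illustration of the two ideas it names. The first idea is to apply several temporal kernel sizes to each channel and combine the results. The second is to reweight channels with a softmax over their pairwise similarities. The fixed box kernels stand in for learned weights, and the averaging and residual combination rules are assumptions.

```python
import math
from typing import List, Sequence

Signal = List[List[float]]  # channels x time


def temporal_conv(seq: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 'same' 1D cross-correlation of one channel (odd kernel length)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(w * padded[t + j] for j, w in enumerate(kernel)) for t in range(len(seq))]


def multi_kernel_temporal_block(x: Signal, kernel_sizes=(3, 5, 7)) -> Signal:
    """MKTB-style sketch: per-channel temporal convolutions at several scales, averaged."""
    out = []
    for channel in x:
        # Box kernels are placeholders for learned temporal filters.
        branches = [temporal_conv(channel, [1.0 / k] * k) for k in kernel_sizes]
        out.append([sum(vals) / len(branches) for vals in zip(*branches)])
    return out


def global_refinement_block(x: Signal) -> Signal:
    """GRB-style sketch: cross-channel similarity, softmax attention, residual add."""
    n_time = len(x[0])
    refined = []
    for xi in x:
        scores = [sum(a * b for a, b in zip(xi, xj)) / n_time for xj in x]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed = [sum(w * xj[t] for w, xj in zip(weights, x)) for t in range(n_time)]
        refined.append([a + b for a, b in zip(xi, mixed)])
    return refined


# Hypothetical two-channel feature sequence over 10 frames.
features = [[1.0] * 10, [float(t) for t in range(10)]]
refined = global_refinement_block(multi_kernel_temporal_block(features))
print(len(refined), len(refined[0]))  # 2 10
```

Both blocks operate only along the time axis and across channels. This is consistent with the abstract's emphasis on 1D convolutions as a cheaper alternative to 3D convolutions.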


Author(s):  
William Dias ◽  
Fernanda Andaló ◽  
Rafael Padilha ◽  
Gabriel Bertocco ◽  
Waldir Almeida ◽  
...  

Mathematics ◽  
2020 ◽  
Vol 8 (6) ◽  
pp. 936 ◽  
Author(s):  
Nebojsa Bacanin ◽  
Timea Bezdan ◽  
Eva Tuba ◽  
Ivana Strumberger ◽  
Milan Tuba

Convolutional neural networks have a broad spectrum of practical applications in computer vision. Much of today's data comes from images, and it is crucial to have efficient techniques for processing such large amounts of data. Convolutional neural networks have proven very successful at image processing tasks. However, designing a network structure for a given problem requires fine-tuning the hyperparameters to achieve better accuracy, a process that is time-consuming and requires effort and domain expertise. Designing a convolutional neural network architecture is a typical NP-hard optimization problem, and some frameworks for generating network structures for specific image classification tasks have been proposed. To address this issue, in this paper we propose a hybridized monarch butterfly optimization algorithm. Based on observed deficiencies of the original monarch butterfly optimization approach, we hybridized it with two other state-of-the-art swarm intelligence algorithms. The proposed hybrid algorithm was first tested on a set of standard unconstrained benchmark instances and then adapted to the convolutional neural network design problem. For both groups of simulations, we performed a comparative analysis against other state-of-the-art methods and algorithms, as well as against the original monarch butterfly optimization implementation. Experimental results showed that our method obtained higher classification accuracy than other approaches published in the recent computer science literature.
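For readers unfamiliar with the baseline being improved, the sketch below shows the original monarch butterfly optimization (MBO) loop on a continuous benchmark. The loop has three parts: a migration operator for subpopulation 1, a butterfly adjusting operator for subpopulation 2, and elitism. The abstract does not name the two swarm algorithms used for hybridization, so none of that is shown. The Lévy step uses Mantegna's algorithm as a stand-in for MBO's own Lévy routine, and the parameter defaults (p = BAR = 5/12, peri = 1.2, S_max = 1) follow commonly cited MBO settings and should be treated as assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def levy_step(rng: random.Random, dim: int, beta: float = 1.5) -> Vector:
    """Lévy-distributed steps via Mantegna's algorithm (substitute for MBO's own routine)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return [rng.gauss(0.0, sigma) / max(abs(rng.gauss(0.0, 1.0)), 1e-12) ** (1 / beta)
            for _ in range(dim)]


def monarch_butterfly(f: Callable[[Vector], float], dim: int, lower: float, upper: float,
                      pop_size: int = 50, generations: int = 100, p: float = 5 / 12,
                      peri: float = 1.2, bar: float = 5 / 12, s_max: float = 1.0,
                      seed: int = 0) -> Tuple[Vector, List[float]]:
    """Baseline MBO minimizing f; returns the best vector and the best fitness per generation."""
    rng = random.Random(seed)

    def clip(x: Vector) -> Vector:
        return [min(max(v, lower), upper) for v in x]

    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    n1 = math.ceil(p * pop_size)  # size of subpopulation 1 (Land 1)
    history: List[float] = []
    for gen in range(1, generations + 1):
        pop.sort(key=f)
        best = pop[0][:]
        history.append(f(best))
        land1, land2 = pop[:n1], pop[n1:]
        children: List[Vector] = []
        # Migration operator: every gene comes from a random Land 1 or Land 2 butterfly.
        for _ in land1:
            child = [(rng.choice(land1) if rng.random() * peri <= p else rng.choice(land2))[k]
                     for k in range(dim)]
            children.append(clip(child))
        # Butterfly adjusting operator: copy the best gene, or a random Land 2 gene
        # plus a shrinking Lévy-flight perturbation.
        alpha = s_max / gen ** 2
        for _ in land2:
            dx = levy_step(rng, dim)
            child = []
            for k in range(dim):
                if rng.random() <= p:
                    child.append(best[k])
                else:
                    gene = rng.choice(land2)[k]
                    if rng.random() > bar:
                        gene += alpha * (dx[k] - 0.5)
                    child.append(gene)
            children.append(clip(child))
        # Elitism: the previous best replaces the worst child, so the best fitness never worsens.
        children.sort(key=f)
        children[-1] = best
        pop = children
    pop.sort(key=f)
    history.append(f(pop[0]))
    return pop[0], history


def sphere(x: Vector) -> float:
    return sum(v * v for v in x)


best, history = monarch_butterfly(sphere, dim=5, lower=-5.0, upper=5.0)
print(history[0], history[-1])
```

In a CNN-design setting like the one described above, each vector would instead encode hyperparameters (e.g. layer counts or kernel sizes), and f would be a validation error obtained by training the decoded network. That is far more expensive per evaluation than this benchmark.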


Author(s):  
Mohammad Amimul Ihsan Aquil ◽  
Wan Hussain Wan Ishak

Plant diseases are a major cause of destruction and death of most plants, especially trees. However, with early detection, this problem can be addressed and treated appropriately; a timely and accurate diagnosis is critical to maintaining crop quality. Recent innovations in deep learning (DL), especially convolutional neural networks (CNNs), have achieved great breakthroughs in various applications, such as the classification of plant diseases. This study evaluates CNNs trained from scratch and pre-trained CNNs for classifying tomato plant diseases by comparing several state-of-the-art architectures: densely connected convolutional network (DenseNet) 120, residual network (ResNet) 101, ResNet 50, ResNet 30, ResNet 18, SqueezeNet, and VGGNet. The models were compared using a multiclass statistical analysis based on F-score, specificity, sensitivity, precision, and accuracy. The experimental dataset, drawn from PlantVillage, contains nine tomato disease classes and one healthy class. The findings show that the pre-trained DenseNet-120 performed excellently, with 99.68% precision, a 99.84% F1 score, and 99.81% accuracy, higher than its counterpart trained from scratch, which shows the effectiveness of combining a CNN model with fine-tuning to classify crop diseases.
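The multiclass statistics listed above can all be derived from a single confusion matrix by treating each class one-vs-rest. The sketch below macro-averages them. The abstract does not state which averaging the authors used, so macro averaging is an assumption, and the 3-class matrix is made up for illustration.

```python
from typing import Dict, List


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def multiclass_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged one-vs-rest metrics; cm[i][j] counts true class i predicted as class j."""
    n = len(cm)
    total = sum(map(sum, cm))
    precision, recall, specificity, f1 = [], [], [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0     # sensitivity
        precision.append(p)
        recall.append(r)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": sum(cm[c][c] for c in range(n)) / total,
        "precision": _mean(precision),
        "sensitivity": _mean(recall),
        "specificity": _mean(specificity),
        "f1": _mean(f1),
    }


# Hypothetical confusion matrix: two disease classes and a healthy class.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
print(multiclass_metrics(cm))
```

Specificity stays high even for a weak classifier when there are many classes, because true negatives dominate each one-vs-rest split. Precision, sensitivity, and F1 are therefore the more discriminating figures in a ten-class setting.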


2018 ◽  
Author(s):  
Rodrigo C. Moraes ◽  
Elloá B. Guedes ◽  
Carlos Maurício S. Figueiredo

Facial expression is a very important factor in human social interaction, and technologies that can automatically interpret and respond to facial expressions already have a wide variety of applications, from antidepressant drug testing to fatigue analysis of drivers and pilots. In this context, this work presents a model for automatic facial expression classification trained on the FER2013 dataset from the Challenges in Representation Learning, which contains examples of spontaneous facial expressions in uncontrolled environments. The method consists of a Convolutional Neural Network ensemble with a non-trivial voting scheme based on eXtreme Gradient Boosting (XGBoost). To validate the model, k-fold cross-validation and the micro-averaged F1 score were used to ensure robust and reliable results, which are competitive with state-of-the-art works.
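An XGBoost-based "voting system" is commonly built as stacking: each CNN's class probabilities become features for a second-level classifier. The sketch below shows that feature construction and the micro-averaged F1 score. The meta-learner itself is omitted to keep the example dependency-free (with the xgboost package it could be `XGBClassifier().fit(meta_X, y)`). The probabilities and labels are hypothetical, and stacking is an assumed reading of the abstract's description.

```python
from typing import List, Sequence


def stack_probabilities(per_model_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Concatenate every CNN's class-probability vector into one meta-feature row per sample.

    per_model_probs[m][i] is model m's softmax output for sample i.
    """
    n_samples = len(per_model_probs[0])
    return [[p for model in per_model_probs for p in model[i]] for i in range(n_samples)]


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute 2TP / (2TP + FP + FN)."""
    labels = set(y_true) | set(y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for c in labels for t, p in pairs if t == c and p == c)
    fp = sum(1 for c in labels for t, p in pairs if p == c and t != c)
    fn = sum(1 for c in labels for t, p in pairs if t == c and p != c)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical softmax outputs of two CNNs for three samples over three classes.
cnn_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
cnn_b = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
meta_X = stack_probabilities([cnn_a, cnn_b])
print(meta_X[0])  # [0.7, 0.2, 0.1, 0.6, 0.3, 0.1]
print(micro_f1([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.75
```

For single-label multiclass predictions, every error counts once as a false positive and once as a false negative, so micro-F1 equals plain accuracy. To avoid leakage, the meta-features used to train the second-level model should come from out-of-fold CNN predictions, which pairs naturally with the k-fold protocol mentioned above.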

