An Enhanced CNN-2D for Audio-Visual Emotion Recognition (AVER) Using ADAM Optimizer

Author(s):  
D.N.V.S.L.S. Indira, et al.

Recent developments in audio-visual emotion recognition (AVER) have highlighted the importance of integrating visual components into the speech recognition process to improve robustness. Visual characteristics have strong potential to boost the accuracy of current speech recognition techniques and have become increasingly important in modelling speech recognizers. CNNs work very well with images, and an audio file can be converted into an image such as a spectrogram, from which hidden frequency-domain structure can be extracted. This paper presents a method for emotional expression recognition using spectrograms and a two-dimensional CNN (CNN-2D): spectrograms computed from speech signals form the CNN-2D input. The proposed model consists of three kinds of layers, namely convolution, pooling, and fully connected layers, which extract discriminative characteristics from the spectrogram representations and produce performance estimates for the seven emotions. The article compares the results with an existing speech emotion recognition (SER) approach that uses audio files and a CNN; accuracy improves by 6.5% when CNN-2D is used.
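As a rough illustration of the pipeline described above, the sketch below converts a speech file into a fixed-size log-mel spectrogram and classifies it with a small three-stage 2-D CNN compiled with the Adam optimizer named in the title. The mel resolution, layer widths, and training settings are illustrative assumptions, not the authors' reported configuration.

```python
# Sketch only: spectrogram-based SER with a small CNN-2D.
# Layer sizes and signal parameters are assumptions, not the paper's exact setup.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

def speech_to_spectrogram(path, sr=16000, n_mels=128, frames=128):
    """Load a speech file and return a fixed-size log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Pad or truncate the time axis so every input has the same shape.
    log_mel = librosa.util.fix_length(log_mel, size=frames, axis=1)
    return log_mel[..., np.newaxis]          # shape: (n_mels, frames, 1)

def build_cnn2d(num_emotions=7):
    """Three convolution/pooling stages followed by fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=(128, 128, 1)),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_emotions, activation="softmax"),
    ])
    # The title names the Adam optimizer; hyperparameters here are Keras defaults.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```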



2021 ◽  
Author(s):  
Tanoy Debnath ◽  
Md. Mahfuz Reza ◽  
Anichur Rahman ◽  
Shahab S. Band ◽  
Hamid Alinejad-Rokny

Emotion recognition, defined as identifying human emotion, is directly related to fields such as human-computer interfaces, human emotional processing, irrational analysis, medical diagnostics, data-driven animation, human-robot communication, and many more. The purpose of this study is to propose a new facial emotion recognition model using a convolutional neural network. Our proposed model, “ConvNet”, detects seven specific emotions from image data: anger, disgust, fear, happiness, neutrality, sadness, and surprise. This research focuses on training the model to high accuracy within a small number of epochs, so that a real-time system can fit the model easily and sense emotions. Furthermore, this work infers a person's mental or emotional state from behavioral cues. To train the CNN model we use the FER2013 database, and we test the system's success by identifying facial expressions in real time. ConvNet consists of four convolutional layers together with two fully connected layers. The experimental results show that ConvNet achieves 96% training accuracy, much better than existing models, and a validation accuracy of 65% to 70% (across the different datasets used in the experiments), a higher classification accuracy than other existing models. We have made all materials publicly accessible to the research community at: https://github.com/Tanoy004/Emotion-recognition-through-CNN.
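For readers who want a concrete picture, a "four convolutional layers plus two fully connected layers" network of the kind described might look like the sketch below for 48x48 grayscale FER2013 images. Filter counts and the dropout rate are assumptions; the authors' actual ConvNet is in the repository linked above.

```python
# Sketch of a ConvNet-style architecture: 4 conv layers + 2 fully connected layers.
# Filter counts and dropout are illustrative, not the authors' exact choices.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_convnet(num_classes=7):
    return models.Sequential([
        layers.Input(shape=(48, 48, 1)),   # FER2013 images are 48x48 grayscale
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),             # first FC layer
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # second FC / classifier
    ])
```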



Author(s):  
Anand Handa ◽  
Rashi Agarwal ◽  
Narendra Kohli

Due to highly variable face geometry and appearance, Facial Expression Recognition (FER) remains a challenging problem. CNNs are well suited to characterizing 2-D signals, so for emotion recognition in video the authors propose a feature selection model within the AlexNet architecture that extracts and filters facial features automatically. Similarly, for emotion recognition in audio, the authors use a deep LSTM-RNN. Finally, they propose a probabilistic model for fusing the audio and visual models using a subject's facial features and speech. The model combines all the extracted features and uses them to train linear SVM (Support Vector Machine) classifiers. The proposed model outperforms existing models and achieves state-of-the-art performance for the audio, visual, and fusion models. It classifies the seven known facial expressions, namely anger, happiness, surprise, fear, disgust, sadness, and neutral, on the eNTERFACE’05 dataset with an overall accuracy of 76.61%.
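A minimal sketch of the fusion step, assuming visual features from the CNN and audio features from the LSTM have already been extracted: the vectors are concatenated and used to train a linear SVM. The feature dimensions and the random stand-in data are placeholders, not values from the paper.

```python
# Late fusion sketch: concatenate audio and visual features, train a linear SVM.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def fuse_and_train(visual_feats, audio_feats, labels):
    """visual_feats: (N, Dv) CNN features; audio_feats: (N, Da) LSTM features."""
    fused = np.concatenate([visual_feats, audio_feats], axis=1)
    clf = make_pipeline(StandardScaler(), LinearSVC())
    clf.fit(fused, labels)
    return clf

# Toy usage with random stand-in features for the seven emotion classes.
rng = np.random.default_rng(0)
clf = fuse_and_train(rng.normal(size=(70, 256)),   # pretend CNN features
                     rng.normal(size=(70, 128)),   # pretend LSTM features
                     rng.integers(0, 7, size=70))
```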



2021 ◽  
pp. 1-9
Author(s):  
Harshadkumar B. Prajapati ◽  
Ankit S. Vyas ◽  
Vipul K. Dabhi

Face expression recognition (FER) has attracted much attention from researchers in computer vision because of its usefulness in security, robotics, and HMI (Human-Machine Interaction) systems. We propose a CNN (Convolutional Neural Network) architecture to address FER and evaluate its performance on the JAFFE dataset. We derive a concise CNN architecture for expression classification; the objective of the various experiments is to achieve convincing performance while reducing computational overhead. The proposed CNN model is very compact compared to other state-of-the-art models. We achieved a highest accuracy of 97.10% and an average accuracy of 90.43% over the top 10 best runs without applying any pre-processing methods, which demonstrates the effectiveness of our model. Furthermore, we include visualizations of the CNN layers to observe what the CNN learns.
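The layer visualization mentioned at the end can be done along the following lines: probe one convolutional layer of a trained Keras model and plot its feature maps for a single input image. The layer index and grid layout here are arbitrary assumptions.

```python
# Sketch: visualize the feature maps of one conv layer for a single image.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def show_feature_maps(model, image, layer_index=0, cols=8):
    """Plot the activations of model.layers[layer_index] for one input image."""
    layer = model.layers[layer_index]
    probe = tf.keras.Model(inputs=model.inputs, outputs=layer.output)
    maps = probe(image[np.newaxis, ...]).numpy()[0]   # (H, W, channels)
    n_maps = maps.shape[-1]
    rows = int(np.ceil(n_maps / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n_maps:
            ax.imshow(maps[..., i], cmap="viridis")
    plt.tight_layout()
    plt.show()
```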



2014 ◽  
Vol 571-572 ◽  
pp. 665-671 ◽  
Author(s):  
Sen Xu ◽  
Xu Zhao ◽  
Cheng Hua Duan ◽  
Xiao Lin Cao ◽  
Hui Yan Li ◽  
...  

Unlike in other languages, tone changes in Chinese are determined mainly by the vowels, so the vowel variation of Chinese tones is important in speech recognition research. Conventional tone recognition methods are based on the fundamental frequency of the signal, which cannot preserve the integrity of the tone signal. We propose mathematical morphological processing of spectrograms for the tones of Chinese vowels. First, the recorded tone signals are pre-processed with the Cool Edit Pro software and converted into spectrograms; second, the spectrograms are smoothed and normalized by mathematical morphological processing; finally, direction-angle statistics over the whole tone signal are obtained by skeletonization. Neural network simulation shows that the speech emotion recognition rate can reach 92.50%.
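A hedged sketch of that morphological pipeline, assuming a pre-computed spectrogram array: binarize it, smooth it with morphological opening and closing, skeletonize it, and collect direction-angle statistics along the skeleton. The Otsu threshold, structuring-element sizes, and gradient-based angle estimate are stand-ins for the authors' exact procedure.

```python
# Sketch: morphological smoothing + skeletonization of a spectrogram, then
# a histogram of local direction angles as a feature vector for a classifier.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_closing, binary_opening, disk, skeletonize

def direction_angle_histogram(spectrogram, bins=18):
    """Orientation-angle histogram of the skeletonized spectrogram trace."""
    binary = spectrogram > threshold_otsu(spectrogram)
    smooth = binary_closing(binary_opening(binary, disk(1)), disk(1))
    skeleton = skeletonize(smooth)
    # Estimate local orientation from intensity gradients at skeleton pixels.
    gy, gx = np.gradient(spectrogram.astype(float))
    angles = np.degrees(np.arctan2(gy[skeleton], gx[skeleton])) % 180.0
    hist, _ = np.histogram(angles, bins=bins, range=(0.0, 180.0))
    return hist / max(hist.sum(), 1)   # normalize so the histogram sums to 1
```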



2019 ◽  
Vol 15 (1) ◽  
pp. 19-36 ◽  
Author(s):  
William Acar ◽  
Rami al-Gharaibeh

Practical applications of knowledge management are hindered by a lack of linkage between the accepted data-information-knowledge hierarchy and pragmatic approaches. Specifically, the authors seek to clarify the use of the tacit-explicit dichotomy through a deductive synthesis of complementary concepts. The authors review relevant segments of the KM/OL literature with an emphasis on the SECI model of Nonaka and Takeuchi. Looking beyond equating the sharing of knowledge with mere socialization, the authors deduce from more recent developments a framework for knowledge creation, nurturing, and control. Based on a cyclic, upward-spiraling data-information-knowledge structure, the proposed model affords top managers and their consultants opportunities for capturing, debating, and storing richer information, as well as monitoring their progress and controlling their learning process.



2022 ◽  
Vol 12 (2) ◽  
pp. 807
Author(s):  
Huafei Xiao ◽  
Wenbo Li ◽  
Guanzhong Zeng ◽  
Yingzhang Wu ◽  
Jiyong Xue ◽  
...  

With the development of intelligent automotive human-machine systems, driver emotion detection and recognition has become an emerging research topic. Facial expression-based emotion recognition approaches have achieved outstanding results on laboratory-controlled data; however, such studies do not represent the environment of real driving situations. To address this, this paper proposes a facial expression-based on-road driver emotion recognition network called FERDERnet. The method divides the on-road driver facial expression recognition task into three modules: a face detection module that detects the driver’s face, an augmentation-based resampling module that performs data augmentation and resampling, and an emotion recognition module that adopts a deep convolutional neural network pre-trained on the FER and CK+ datasets and then fine-tuned as a backbone for driver emotion recognition. The method adopts five different backbone networks as well as an ensemble method. To evaluate the proposed approach, the paper collected an on-road driver facial expression dataset containing various road scenarios and the corresponding driver facial expressions during the driving task, and experiments were performed on this dataset. Considering both efficiency and accuracy, the proposed FERDERnet with an Xception backbone was effective in identifying on-road driver facial expressions and obtained superior performance compared to the baseline networks and some state-of-the-art networks.
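The detection and recognition stages might be sketched as below, with OpenCV's stock Haar cascade standing in for the paper's face detector and an ImageNet-pretrained Xception (the backbone the abstract singles out) given a fresh emotion head. The detector choice, input size, and head design are assumptions, and the fine-tuning and resampling steps are omitted.

```python
# Sketch: face detection + Xception backbone with a new emotion-classification head.
import cv2
import numpy as np
import tensorflow as tf

# OpenCV's bundled Haar cascade stands in for the paper's face detector.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return the largest detected face crop, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # biggest box
    return frame_bgr[y:y + h, x:x + w]

def build_backbone(num_emotions=7):
    """ImageNet-pretrained Xception with a new softmax head for driver emotions."""
    base = tf.keras.applications.Xception(
        include_top=False, weights="imagenet", pooling="avg",
        input_shape=(299, 299, 3))
    head = tf.keras.layers.Dense(num_emotions, activation="softmax")(base.output)
    return tf.keras.Model(base.input, head)
```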



Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 429
Author(s):  
Linhui Li ◽  
Xin Sui ◽  
Jing Lian ◽  
Fengning Yu ◽  
Yafu Zhou

The structured road is a scene with high interaction between vehicles, but because of the high uncertainty of behavior, predicting vehicle interaction behavior remains a challenge. This prediction is significant for controlling the ego-vehicle. We propose an interaction behavior prediction model based on a vehicle cluster (VC) with self-attention (VC-Attention) to improve prediction performance. First, a five-vehicle cluster structure is designed to extract interactive features between the ego-vehicle and the target vehicle, such as the Deceleration Rate to Avoid a Crash (DRAC) and the lane gap. In addition, the proposed model uses a sliding-window algorithm to extract VC behavior information. The temporal characteristics of the three interactive features mentioned above are then captured by a two-layer self-attention encoder with six heads. Finally, the target vehicle’s future behavior is predicted by a sub-network consisting of a fully connected layer and a softmax module. The experimental results show that the method achieves accuracy, precision, recall, and F1 score all above 92%, with a time to event of 2.9 s, on the Next Generation Simulation (NGSIM) dataset. It accurately predicts interactive behaviors under class imbalance and adapts to various driving scenarios.
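Two of the named ingredients, the DRAC feature and the two-layer, six-head self-attention encoder feeding a fully connected classifier, could be sketched as follows. The feature dimensionality, window length, class count, and the exact kinematic form of DRAC used here are assumptions rather than the paper's specification.

```python
# Sketch: DRAC interaction feature + a 2-layer, 6-head self-attention classifier.
import torch
import torch.nn as nn

def drac(v_follow, v_lead, gap):
    """One common kinematic form of Deceleration Rate to Avoid a Crash:
    (closing speed)^2 / (2 * gap), zero when the follower is not closing in."""
    closing = torch.clamp(v_follow - v_lead, min=0.0)
    return closing ** 2 / (2.0 * torch.clamp(gap, min=1e-3))

class VCAttention(nn.Module):
    """Sliding-window feature sequences -> self-attention -> FC classifier."""
    def __init__(self, feat_dim=3, d_model=48, num_classes=3):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=6, batch_first=True)   # six attention heads
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)       # softmax applied in the loss

    def forward(self, x):                  # x: (batch, window, feat_dim)
        h = self.encoder(self.embed(x))    # self-attention over the time window
        return self.head(h.mean(dim=1))    # pool over time, then classify
```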


