Hierarchical Spatiotemporal Electroencephalogram Feature Learning and Emotion Recognition With Attention-Based Antagonism Neural Network

2021 ◽  
Vol 15 ◽  
Author(s):  
Pengwei Zhang ◽  
Chongdan Min ◽  
Kangjia Zhang ◽  
Wen Xue ◽  
Jingxia Chen

Inspired by neuroscience findings that the human brain produces dynamic responses to different emotions, a new electroencephalogram (EEG)-based human emotion classification model, named R2G-ST-BiLSTM, is proposed; it uses a hierarchical neural network to learn more discriminative spatiotemporal EEG features from local to global brain regions. First, a bidirectional long short-term memory (BiLSTM) network is used to capture the internal spatial relationships of EEG signals across channels, both within and between regions of the brain. Considering the different effects of the various cerebral regions on emotion, a regional attention mechanism is introduced into the R2G-ST-BiLSTM model to determine the weight of each brain region, which can strengthen or weaken that region's contribution to emotion recognition. A hierarchical BiLSTM network is then used again to learn spatiotemporal EEG features from the regional to the global level, and these features are fed into an emotion classifier. In particular, we introduce a domain discriminator that works together with the classifier to reduce the domain shift between training and testing data. Finally, we conduct experiments on the EEG data of the DEAP and SEED datasets to test and compare the performance of the models. The results show that our method achieves higher accuracy than state-of-the-art methods and provides a good basis for developing affective brain-computer interface applications.
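A minimal PyTorch sketch of the regional-to-global idea is given below: a per-region BiLSTM summarizes the channels within each region, a softmax attention layer re-weights the regions, and a second BiLSTM aggregates the regions into a global representation. The region grouping, feature dimensions, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a region-to-global BiLSTM with region attention (assumed shapes).
import torch
import torch.nn as nn

class R2GSketch(nn.Module):
    def __init__(self, n_regions=9, chans_per_region=4, feat_dim=5,
                 hidden=32, n_classes=2):
        super().__init__()
        # Intra-region BiLSTM: runs over the channel sequence of one region.
        self.region_lstm = nn.LSTM(feat_dim, hidden, bidirectional=True,
                                   batch_first=True)
        # Region attention: one scalar score per region feature vector.
        self.attn = nn.Linear(2 * hidden, 1)
        # Region-to-global BiLSTM: runs over the sequence of region features.
        self.global_lstm = nn.LSTM(2 * hidden, hidden, bidirectional=True,
                                   batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, n_regions, chans_per_region, feat_dim)
        b, r, c, f = x.shape
        _, (h, _) = self.region_lstm(x.reshape(b * r, c, f))
        region_feat = torch.cat([h[-2], h[-1]], dim=-1).reshape(b, r, -1)
        # Softmax attention strengthens or weakens each region's contribution.
        w = torch.softmax(self.attn(region_feat), dim=1)
        out, _ = self.global_lstm(region_feat * w)
        return self.classifier(out.mean(dim=1))

model = R2GSketch()
logits = model(torch.randn(8, 9, 4, 5))  # e.g. 9 regions x 4 channels x 5 band powers
```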

2021 ◽  
Author(s):  
Farhad Zamani ◽  
Retno Wulansari

Recently, emotion recognition has begun to be applied in industry and in the human resources field. When an employee's emotional state can be perceived, the employer can benefit by making better-informed decisions about that employee; this subject could thus become an embryo for emotion recognition tasks in human resources. In fact, emotion recognition has become an important research topic, especially when based on physiological signals such as EEG. One reason is the availability of EEG datasets that can be widely used by researchers; moreover, the development of many machine learning methods has contributed significantly to this research topic over time. Here, we investigate classification methods for emotion and propose two models to address this task, each a hybrid of two deep learning architectures: a One-Dimensional Convolutional Neural Network (CNN-1D) and a Recurrent Neural Network (RNN). We implement the Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) in the RNN architecture, both of which are specifically designed to address the vanishing gradient problem that commonly arises with time-series data. We use these models to classify four emotional regions of the valence-arousal plane: High Valence High Arousal (HVHA), High Valence Low Arousal (HVLA), Low Valence High Arousal (LVHA), and Low Valence Low Arousal (LVLA). The experiments were conducted on the well-known DEAP dataset. Results show that the proposed methods achieve training accuracies of 96.3% and 97.8% for the 1DCNN-GRU and 1DCNN-LSTM models, respectively. Both models are therefore quite robust for this emotion classification task.
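A hedged sketch of the hybrid is shown below: 1D convolutions extract local temporal patterns from the raw signal, and a GRU models the longer-range dependencies. Layer sizes, the DEAP channel count, and the four-class head are assumptions based on the abstract; swapping nn.GRU for nn.LSTM gives the second variant.

```python
# Illustrative PyTorch sketch of the 1D-CNN + GRU hybrid (assumed sizes).
import torch
import torch.nn as nn

class CNN1DGRU(nn.Module):
    def __init__(self, n_channels=32, hidden=64, n_classes=4):
        super().__init__()
        # 1D convolutions extract local temporal patterns from raw EEG.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The GRU's gating mitigates the vanishing-gradient problem noted above.
        self.gru = nn.GRU(128, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)  # HVHA / HVLA / LVHA / LVLA

    def forward(self, x):            # x: (batch, channels, time)
        z = self.conv(x)             # (batch, 128, time/4)
        _, h = self.gru(z.transpose(1, 2))
        return self.fc(h[-1])

logits = CNN1DGRU()(torch.randn(8, 32, 512))  # 32 EEG channels, 512 samples
```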


2021 ◽  
Vol 12 ◽  
Author(s):  
Hua Zhang ◽  
Ruoyun Gou ◽  
Jili Shang ◽  
Fangyao Shen ◽  
Yifan Wu ◽  
...  

Speech emotion recognition (SER) is a difficult and challenging task because of the affective variance between speakers. The performance of SER depends heavily on the features extracted from the speech signal, and establishing an effective feature extraction and classification model remains a challenge. In this paper, we propose a new method for SER based on a Deep Convolutional Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). We first preprocess the speech samples by data augmentation and dataset balancing. Second, we extract three channels of log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. The DCNN model, pre-trained on the ImageNet dataset, is then applied to generate segment-level features, which we stack into utterance-level features for each sentence. Next, we adopt a BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on the emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases achieve unweighted average recalls (UAR) of 87.86% and 68.50%, respectively, which are better than most popular SER methods and demonstrate the effectiveness of our proposed method.
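The three-channel input can be produced as sketched below with librosa; the sampling rate and Mel settings are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch: three-channel log-Mel input (static, delta, delta-delta).
import numpy as np
import librosa

def three_channel_logmel(path, sr=16000, n_mels=64):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                 # static channel
    delta = librosa.feature.delta(log_mel)             # first derivative
    delta2 = librosa.feature.delta(log_mel, order=2)   # second derivative
    # Stack to (3, n_mels, frames), analogous to a 3-channel image for the DCNN.
    return np.stack([log_mel, delta, delta2])
```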


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1579 ◽  
Author(s):  
Kyoung Ju Noh ◽  
Chi Yoon Jeong ◽  
Jiyoun Lim ◽  
Seungeun Chung ◽  
Gague Kim ◽  
...  

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To deploy SER models in real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to unseen target domains. This study proposes a multi-path and group-loss-based network (MPGLN) for SER that supports multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously from multiple losses according to the association of emotion labels in the discrete and dimensional models. To evaluate the MPGLN SER on multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), comprising KESDy18 and KESDy19, is constructed, and the English-language Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization showed F1-score improvements of 3.7% and 3.5%, respectively, when comparing the MPGLN SER with a baseline SER model that uses only the temporal feature generator. We show that the MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
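The multi-path, multi-loss idea can be sketched as below: one path learns temporal features with a BiLSTM, a second path takes transferred VGGish embeddings, and two heads are trained jointly on discrete labels and valence-arousal values. All dimensions, the fusion, and the unweighted loss sum are assumptions drawn from the abstract, not the authors' MPGLN implementation.

```python
# Hedged sketch of a multi-path model trained with multiple losses.
import torch
import torch.nn as nn

class MultiPathSER(nn.Module):
    def __init__(self, n_lld=40, vggish_dim=128, hidden=64, n_classes=4):
        super().__init__()
        self.temporal = nn.LSTM(n_lld, hidden, bidirectional=True,
                                batch_first=True)          # path 1: BiLSTM features
        self.transfer = nn.Linear(vggish_dim, 2 * hidden)  # path 2: VGGish embedding
        self.cls_head = nn.Linear(4 * hidden, n_classes)   # discrete emotion labels
        self.dim_head = nn.Linear(4 * hidden, 2)           # valence / arousal

    def forward(self, lld_seq, vggish_emb):
        _, (h, _) = self.temporal(lld_seq)
        fused = torch.cat([h[-2], h[-1], self.transfer(vggish_emb)], dim=-1)
        return self.cls_head(fused), self.dim_head(fused)

model = MultiPathSER()
logits, va = model(torch.randn(8, 100, 40), torch.randn(8, 128))
# Joint objective over associated discrete and dimensional labels (assumed
# unweighted sum; the actual group-loss weighting is not given in the abstract):
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 4, (8,))) \
     + nn.MSELoss()(va, torch.rand(8, 2))
```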


Author(s):  
Revathi A. ◽  
Sasikaladevi N.

This chapter on multi-speaker-independent emotion recognition covers the use of perceptual features, with filters spaced on the equivalent rectangular bandwidth (ERB) and Bark scales, together with a vector quantization (VQ) classifier for classifying groups and an artificial neural network with the backpropagation algorithm for emotion classification within a group. Performance can be improved by using a large amount of data for each emotion to adequately train the system. Even with the limited data available, the proposed system provided consistently better accuracy for perceptual features with critical-band analysis on the ERB scale.
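For reference, ERB-spaced filter centre frequencies can be derived from the standard Glasberg and Moore ERB-rate formulas, as in the sketch below; the filter count and frequency range are illustrative choices, not the chapter's settings.

```python
# Sketch: centre frequencies equally spaced on the ERB-rate scale.
import numpy as np

def hz_to_erb_rate(f):
    return 21.4 * np.log10(1 + 0.00437 * f)

def erb_rate_to_hz(e):
    return (10 ** (e / 21.4) - 1) / 0.00437

def erb_centre_freqs(f_low=100.0, f_high=8000.0, n_filters=24):
    # Equal spacing on the ERB-rate scale gives perceptually uniform filters.
    erbs = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_filters)
    return erb_rate_to_hz(erbs)

print(erb_centre_freqs())
```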


2018 ◽  
Vol 10 (11) ◽  
pp. 113 ◽  
Author(s):  
Yue Li ◽  
Xutao Wang ◽  
Pengjian Xu

Text classification is important in natural language processing, as massive amounts of text containing valuable information need to be classified into different categories for further use. To classify text better, our paper builds a deep learning model that achieves better classification results on Chinese text than other researchers' models. After comparing different methods, long short-term memory (LSTM) and convolutional neural network (CNN) approaches were selected as the deep learning methods for classifying Chinese text. LSTM is a special kind of recurrent neural network (RNN) capable of processing serialized information through its recurrent structure; CNN, by contrast, has shown its ability to extract features from visual imagery. Therefore, two layers of LSTM and one layer of CNN were integrated into our new model: the BLSTM-C model (BLSTM stands for bi-directional long short-term memory, while C stands for CNN). The BLSTM is responsible for producing a sequence output based on both past and future contexts, which is then input to the convolutional layer for feature extraction. In our experiments, the proposed BLSTM-C model was evaluated in several ways. The results show that the model exhibits remarkable performance in text classification, especially on Chinese texts.
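A minimal sketch of the BLSTM-C arrangement appears below: a two-layer BiLSTM encodes each token with past and future context, and a convolutional layer with global max pooling then extracts local features from that sequence. Vocabulary size, filter width, and class count are assumptions.

```python
# Sketch of BLSTM-C: BiLSTM context encoding followed by a CNN feature layer.
import torch
import torch.nn as nn

class BLSTMC(nn.Module):
    def __init__(self, vocab=20000, emb=128, hidden=128, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        # Two stacked BiLSTM layers encode past and future context per token.
        self.bilstm = nn.LSTM(emb, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        # The convolution extracts local n-gram-like features from the outputs.
        self.conv = nn.Conv1d(2 * hidden, 100, kernel_size=3, padding=1)
        self.fc = nn.Linear(100, n_classes)

    def forward(self, tokens):                   # tokens: (batch, seq_len)
        seq, _ = self.bilstm(self.embed(tokens))
        feat = torch.relu(self.conv(seq.transpose(1, 2)))
        return self.fc(feat.max(dim=2).values)   # global max pooling

logits = BLSTMC()(torch.randint(0, 20000, (8, 50)))
```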


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 145 ◽  
Author(s):  
Zhenglong Xiang ◽  
Xialei Dong ◽  
Yuanxiang Li ◽  
Fei Yu ◽  
Xing Xu ◽  
...  

Most existing research studies the emotion recognition of Minnan songs from the perspectives of music analysis theory and music appreciation; however, these investigations do not explore the possibility of automatic emotion recognition of Minnan songs. In this paper, we propose a model consisting of four main modules to classify the emotion of Minnan songs using bimodal data: song lyrics and audio. In the proposed model, an attention-based Long Short-Term Memory (LSTM) neural network extracts the lyrical features, and a Convolutional Neural Network (CNN) extracts the audio features from the spectrum. The two kinds of extracted features are then fused by multimodal compact bilinear pooling, and finally the fused features are input to the classifying module to determine the song's emotion. We designed three groups of experiments to investigate the classification performance of combinations of the four main modules, the comparison of the proposed model with current approaches, and the influence of a few key parameters on emotion recognition performance. The results show that the proposed model outperforms all other experimental groups; its accuracy, precision, and recall exceed 0.80 with an appropriate combination of parameters.
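Multimodal compact bilinear pooling approximates the outer product of the two feature vectors without materializing it, using Count Sketch projections combined in the Fourier domain. The sketch below illustrates the standard technique; the output dimension and feature sizes are assumptions, not the paper's settings.

```python
# Sketch of multimodal compact bilinear (MCB) pooling via Count Sketch + FFT.
import torch

def count_sketch(x, h, s, d):
    # Project x (batch, n) into d dims using hash indices h and random signs s.
    sketch = torch.zeros(x.size(0), d, device=x.device)
    sketch.index_add_(1, h, x * s)
    return sketch

def mcb_pool(x, y, d=1024, seed=0):
    g = torch.Generator().manual_seed(seed)
    hx = torch.randint(0, d, (x.size(1),), generator=g)
    sx = torch.randint(0, 2, (x.size(1),), generator=g).float() * 2 - 1
    hy = torch.randint(0, d, (y.size(1),), generator=g)
    sy = torch.randint(0, 2, (y.size(1),), generator=g).float() * 2 - 1
    # Convolving the two sketches equals sketching the outer product, and is
    # computed cheaply as an element-wise product in Fourier space.
    fx = torch.fft.rfft(count_sketch(x, hx, sx, d))
    fy = torch.fft.rfft(count_sketch(y, hy, sy, d))
    return torch.fft.irfft(fx * fy, n=d)

fused = mcb_pool(torch.randn(8, 256), torch.randn(8, 128))  # (8, 1024)
```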


2020 ◽  
Author(s):  
Taweesak Emsawas ◽  
Tsukasa Kimura ◽  
Ken-ichi Fukui ◽  
Masayuki Numao

Abstract Brain-Computer Interface (BCI) is a communication tool between humans and systems that uses electroencephalography (EEG) to predict certain aspects of cognitive state, such as attention or emotion. For brainwave recording, many types of acquisition devices have been created for different purposes. Wet systems record with electrode gel and can obtain high-quality brainwave signals, while dry systems aim for practicality and ease of use. In this paper, we present a comparative study of wet and dry systems using two cognitive tasks: attention and music-emotion. The 3-back task is used as an assessment to measure attention and working memory in the attention study, while the music-emotion experiments predict emotion according to the subjects' questionnaires. Our analysis shows the similarities and differences between dry and wet electrodes in terms of statistical values and frequency bands. We further study their relative characteristics by conducting classification experiments, proposing end-to-end EEG classification models that combine EEG-based feature extractors with classification networks. A deep convolutional neural network (Deep ConvNet) and a shallow convolutional neural network (Shallow ConvNet) were applied as feature extractors, performing temporal and spatial filtering of the raw EEG signals. The extracted features are then passed to a long short-term memory (LSTM) network, which learns the dependencies among the convolved features and classifies attention or emotional states. Additionally, transfer learning was utilized to improve the performance of the dry system by transferring knowledge from the wet system. We applied the model not only to our own dataset but also to an existing dataset to verify its performance against baseline techniques and state-of-the-art models. With the proposed model, the results show accuracy significantly above chance level in attention classification (92.0%, SD 6.8%) and in emotion classification on the SEED dataset (75.3%, SD 9.3%).
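A hedged sketch of the extractor-plus-LSTM pipeline is given below, using a shallow two-convolution block in the spirit of the ConvNet extractors named above; layer sizes and the three-class head are assumptions. For the wet-to-dry transfer step, the same network would be pre-trained on wet-system recordings and fine-tuned on dry-system data.

```python
# Sketch: shallow temporal/spatial ConvNet extractor feeding an LSTM classifier.
import torch
import torch.nn as nn

class ConvLSTM(nn.Module):
    def __init__(self, n_chans=32, n_classes=3, hidden=64):
        super().__init__()
        self.features = nn.Sequential(
            # Temporal filtering along the time axis...
            nn.Conv2d(1, 40, kernel_size=(1, 25)),
            # ...then spatial filtering across all electrodes.
            nn.Conv2d(40, 40, kernel_size=(n_chans, 1)),
            nn.BatchNorm2d(40),
            nn.ELU(),
            nn.AvgPool2d(kernel_size=(1, 15), stride=(1, 5)),
        )
        self.lstm = nn.LSTM(40, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                    # x: (batch, 1, chans, time)
        z = self.features(x).squeeze(2)      # (batch, 40, steps)
        _, (h, _) = self.lstm(z.transpose(1, 2))
        return self.fc(h[-1])

logits = ConvLSTM()(torch.randn(8, 1, 32, 500))
```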


Symmetry ◽  
2019 ◽  
Vol 11 (9) ◽  
pp. 1160
Author(s):  
Sangmin Park ◽  
Byung-Won On ◽  
Ryong Lee ◽  
Min-Woo Park ◽  
Sang-Hwan Lee

Overloaded vehicles such as large cargo trucks tend to cause serious traffic accidents. Such accidents often bring high rates of injury and death and cause fatal damage to road structures such as roads and bridges, creating a vicious circle in which large budgets are spent on accident restoration and road maintenance. It is therefore important to control overloaded vehicles on urban roads. However, it often takes a lot of manpower to crack down on overloaded vehicles at appropriate interception points during a specific time, and drivers tend to avoid interception by bypassing the interception point while exchanging interception information with each other. In this work, the main bridges in a city are chosen as the interception points. Since installing vehicle-weighing devices in the road surface is expensive and such devices fail frequently after installation, inexpensive general-purpose Internet of Things (IoT) sensors, such as acceleration and gyroscope sensors, are installed on the bridges instead. First, assuming that the sensed values of overloaded vehicles differ from those of non-overloaded vehicles, we investigate the difference in sensed values between the two. Then, based on this hypothesis, we propose a new method to identify the prime time zones of overloaded vehicles. Technically, the proposed method comprises two steps. In the first step, we propose a new bridge traffic classification model using Bidirectional Long Short-Term Memory (Bi-LSTM) that automatically classifies time-series data into high or low traffic conditions; the Bi-LSTM model achieves higher accuracy than existing neural network models because its symmetric structure processes input information in both forward and backward directions. In the second step, we propose a method of automatically identifying the top-k time zones with many overloaded vehicles under the high traffic condition; it uses the k-Nearest Neighbor (k-NN) algorithm to find, within the high-traffic cluster, the sensed value most similar to the actual sensed value of an overloaded vehicle. According to the experimental results, statistical verification confirms a large difference in sensed values between overloaded and non-overloaded vehicles. The accuracy of the first-step method is about 75%, and the top-k time zones in which overloaded vehicles are concentrated are identified automatically.
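The first step can be sketched as a binary Bi-LSTM classifier over windows of bridge sensor readings, as below; the window length and the six-axis sensor layout (3-axis acceleration plus 3-axis gyroscope) are assumptions drawn from the abstract.

```python
# Sketch: Bi-LSTM labeling a window of bridge sensor readings as high/low traffic.
import torch
import torch.nn as nn

class TrafficBiLSTM(nn.Module):
    def __init__(self, n_sensors=6, hidden=32):
        super().__init__()
        # Bidirectional reading lets each timestep use both past and future
        # context, the symmetric structure noted in the abstract.
        self.lstm = nn.LSTM(n_sensors, hidden, bidirectional=True,
                            batch_first=True)
        self.fc = nn.Linear(2 * hidden, 2)   # high vs. low traffic condition

    def forward(self, x):                    # x: (batch, time, sensors)
        _, (h, _) = self.lstm(x)
        return self.fc(torch.cat([h[-2], h[-1]], dim=-1))

logits = TrafficBiLSTM()(torch.randn(8, 120, 6))  # 120 readings per window
```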


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6719
Author(s):  
Longbin Jin ◽  
Eun Yi Kim

Electroencephalogram (EEG)-based emotion recognition is receiving significant attention in research on brain-computer interfaces (BCI) and health care. To recognize cross-subject emotion from EEG data accurately, a technique is needed that finds an effective representation robust to the subject-specific variability of the EEG data collection process. In this paper, a new method to predict cross-subject emotion using time-series analysis and spatial correlation is proposed. To represent the spatial connectivity between brain regions, a channel-wise feature is proposed that effectively captures the correlations among all channels. The channel-wise feature is defined by a symmetric matrix whose elements are the Pearson correlation coefficients between pairs of channels, which complementarily handle subject-specific variability. The channel-wise features are then fed to a two-layer stacked long short-term memory (LSTM) network, which extracts temporal features and learns an emotional model. Extensive experiments on two publicly available datasets, the Dataset for Emotion Analysis using Physiological Signals (DEAP) and the SJTU (Shanghai Jiao Tong University) Emotion EEG Dataset (SEED), demonstrate the effectiveness of combining channel-wise features with the LSTM. The experiments achieve state-of-the-art classification rates of 98.93% and 99.10% for the two-class classification of valence and arousal on DEAP, respectively, and an accuracy of 99.63% for three-class classification on SEED.
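A compact sketch of the pipeline is given below: each sliding window yields a symmetric channel-correlation matrix, and the resulting sequence of flattened matrices feeds a two-layer stacked LSTM. Window sizes, the 32-channel DEAP-style layout, and the two-class head are assumptions.

```python
# Sketch: channel-wise Pearson-correlation features fed to a stacked LSTM.
import numpy as np
import torch
import torch.nn as nn

def channelwise_features(eeg, win=128, step=64):
    # eeg: (channels, samples) -> (windows, channels * channels)
    feats = []
    for start in range(0, eeg.shape[1] - win + 1, step):
        corr = np.corrcoef(eeg[:, start:start + win])  # symmetric (chans, chans)
        feats.append(corr.flatten())
    return np.stack(feats)

eeg = np.random.randn(32, 1024)                # e.g. one trial segment
seq = torch.tensor(channelwise_features(eeg), dtype=torch.float32).unsqueeze(0)
lstm = nn.LSTM(32 * 32, 64, num_layers=2, batch_first=True)  # two-layer stack
head = nn.Linear(64, 2)                        # e.g. high/low valence
_, (h, _) = lstm(seq)
logits = head(h[-1])
```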


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 124928-124938 ◽  
Author(s):  
Simin Wang ◽  
Junhuai Li ◽  
Ting Cao ◽  
Huaijun Wang ◽  
Pengjia Tu ◽  
...  
