JDAT: Joint-Dimension-Aware Transformer with Strong Flexibility for EEG Emotion Recognition

10.36227/techrxiv.17056961.v1 ◽

2021 ◽

Author(s):

Zeyu Wang ◽

Ziqun Zhou ◽

Haibin Shen ◽

Qi Xu ◽

Kejie Huang

Keyword(s):

Emotion Recognition ◽

State Of The Art ◽

Compact Model ◽

Inference Process ◽

Robust Model ◽

Space Frequency ◽

Conventional Models ◽

Signal Activation ◽

Eeg Features ◽

Activation Phase

<div>Electroencephalography (EEG) emotion recognition, an important task in Human-Computer Interaction (HCI), has made a great breakthrough with the help of deep learning algorithms. Although the application of attention mechanism on conventional models has improved its performance, most previous research rarely focused on multiplex EEG features jointly, lacking a compact model with unified attention modules. This study proposes Joint-Dimension-Aware Transformer (JDAT), a robust model based on squeezed Multi-head Self-Attention (MSA) mechanism for EEG emotion recognition. The adaptive squeezed MSA applied on multidimensional features enables JDAT to focus on diverse EEG information, including space, frequency, and time. Under the joint attention, JDAT is sensitive to the complicated brain activities, such as signal activation, phase-intensity couplings, and resonance. Moreover, its gradually compressed structure contains no recurrent or parallel modules, greatly reducing the memory and complexity, and accelerating the inference process. The proposed JDAT is evaluated on DEAP, DREAMER, and SEED datasets, and experimental results show that it outperforms state-of-the-art methods along with stronger flexibility.</div>

Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition

Sensors ◽

10.3390/s21124233 ◽

2021 ◽

Vol 21 (12) ◽

pp. 4233

Author(s):

Bogdan Mocanu ◽

Ruxandra Tapu ◽

Titus Zaharia

Keyword(s):

Emotion Recognition ◽

Loss Function ◽

State Of The Art ◽

Disease Diagnosis ◽

Data Representation ◽

Speech Emotion Recognition ◽

Audio Features ◽

Global Accuracy ◽

Space Data ◽

Art Techniques

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications; including mental disease diagnosis; audio surveillance; human behavior understanding; e-learning and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method, based on the Squeeze and Excitation ResNet (SE-ResNet) model and fed with spectrogram inputs. In order to overcome the limitations of the state-of-the-art techniques, which fail in providing a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy rates on the RAVDESS and CREMA-D publicly available datasets, respectively. When compared with the results provided by human observers, the gains in global accuracy scores are superior to 24%. Finally, the objective comparative evaluation with state-of-the-art techniques demonstrates accuracy gains of more than 3%.

Deep learning approaches for speech emotion recognition: state of the art and research challenges

Multimedia Tools and Applications ◽

10.1007/s11042-020-09874-7 ◽

2021 ◽

Author(s):

Rashid Jahangir ◽

Ying Wah Teh ◽

Faiqa Hanif ◽

Ghulam Mujtaba

Keyword(s):

Deep Learning ◽

Emotion Recognition ◽

State Of The Art ◽

Speech Emotion Recognition ◽

Learning Approaches ◽

Research Challenges

A robust framework for shoulder implant X-ray image classification

Data Technologies and Applications ◽

10.1108/dta-08-2021-0210 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Minh Thanh Vo ◽

Anh H. Vo ◽

Tuong Le

Keyword(s):

Image Classification ◽

Performance Metrics ◽

State Of The Art ◽

Area Under The Curve ◽

Predictive Performance ◽

Extraction Process ◽

Content Type ◽

X Ray ◽

Robust Model ◽

Input Dataset

PurposeMedical images are increasingly popular; therefore, the analysis of these images based on deep learning helps diagnose diseases become more and more essential and necessary. Recently, the shoulder implant X-ray image classification (SIXIC) dataset that includes X-ray images of implanted shoulder prostheses produced by four manufacturers was released. The implant's model detection helps to select the correct equipment and procedures in the upcoming surgery.Design/methodology/approachThis study proposes a robust model named X-Net to improve the predictability for shoulder implants X-ray image classification in the SIXIC dataset. The X-Net model utilizes the Squeeze and Excitation (SE) block integrated into Residual Network (ResNet) module. The SE module aims to weigh each feature map extracted from ResNet, which aids in improving the performance. The feature extraction process of X-Net model is performed by both modules: ResNet and SE modules. The final feature is obtained by incorporating the extracted features from the above steps, which brings more important characteristics of X-ray images in the input dataset. Next, X-Net uses this fine-grained feature to classify the input images into four classes (Cofield, Depuy, Zimmer and Tornier) in the SIXIC dataset.FindingsExperiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the various experimental methods in terms of several performance metrics. In addition, the proposed approach provides the new state of the art results in all performance metrics, such as accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset.Originality/valueThe proposed method with high predictive performance can be used to assist in the treatment of injured shoulder joints.

Label Enhancement for Label Distribution Learning via Prior Knowledge

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/446 ◽

2020 ◽

Author(s):

Yongbiao Gao ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Emotion Recognition ◽

Prior Knowledge ◽

Decision Process ◽

Age Estimation ◽

State Of The Art ◽

Learning Agent ◽

Label Distribution Learning ◽

Label Distribution

Label distribution learning (LDL) is a novel machine learning paradigm that gives a description degree of each label to an instance. However, most of training datasets only contain simple logical labels rather than label distributions due to the difficulty of obtaining the label distributions directly. We propose to use the prior knowledge to recover the label distributions. The process of recovering the label distributions from the logical labels is called label enhancement. In this paper, we formulate the label enhancement as a dynamic decision process. Thus, the label distribution is adjusted by a series of actions conducted by a reinforcement learning agent according to sequential state representations. The target state is defined by the prior knowledge. Experimental results show that the proposed approach outperforms the state-of-the-art methods in both age estimation and image emotion recognition.

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015466 ◽

2019 ◽

Vol 33 ◽

pp. 5466-5473 ◽

Cited By ~ 2

Author(s):

Yingce Xia ◽

Tianyu He ◽

Xu Tan ◽

Fei Tian ◽

Di He ◽

...

Keyword(s):

Machine Translation ◽

English Translation ◽

State Of The Art ◽

Compact Model ◽

Word Embeddings ◽

Simple Method ◽

Neural Machine Translation ◽

German Translation ◽

One Step ◽

Target Side

Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such wordlevel sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.

Emotion Assessment Using Feature Fusion and Decision Fusion Classification Based on Physiological Data: Are We There Yet?

Sensors ◽

10.3390/s20174723 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4723

Author(s):

Patrícia Bota ◽

Chen Wang ◽

Ana Fred ◽

Hugo Silva

Keyword(s):

Emotion Recognition ◽

Classification Accuracy ◽

Feature Fusion ◽

State Of The Art ◽

Electrodermal Activity ◽

Decision Fusion ◽

Classification Performance ◽

Physiological Data ◽

Systematic Analysis ◽

Public Datasets

Emotion recognition based on physiological data classification has been a topic of increasingly growing interest for more than a decade. However, there is a lack of systematic analysis in literature regarding the selection of classifiers to use, sensor modalities, features and range of expected accuracy, just to name a few limitations. In this work, we evaluate emotion in terms of low/high arousal and valence classification through Supervised Learning (SL), Decision Fusion (DF) and Feature Fusion (FF) techniques using multimodal physiological data, namely, Electrocardiography (ECG), Electrodermal Activity (EDA), Respiration (RESP), or Blood Volume Pulse (BVP). The main contribution of our work is a systematic study across five public datasets commonly used in the Emotion Recognition (ER) state-of-the-art, namely: (1) Classification performance analysis of ER benchmarking datasets in the arousal/valence space; (2) Summarising the ranges of the classification accuracy reported across the existing literature; (3) Characterising the results for diverse classifiers, sensor modalities and feature set combinations for ER using accuracy and F1-score; (4) Exploration of an extended feature set for each modality; (5) Systematic analysis of multimodal classification in DF and FF approaches. The experimental results showed that FF is the most competitive technique in terms of classification accuracy and computational complexity. We obtain superior or comparable results to those reported in the state-of-the-art for the selected datasets.

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5364 ◽

2020 ◽

Vol 34 (01) ◽

pp. 303-311 ◽

Cited By ~ 3

Author(s):

Sicheng Zhao ◽

Yunsheng Ma ◽

Yang Gu ◽

Jufeng Yang ◽

Tengfei Xing ◽

...

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

State Of The Art ◽

Source Code ◽

Cross Entropy ◽

Attention Network ◽

Audio Features ◽

End To End ◽

3D Cnn ◽

And Training

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ traditional two-stage shallow pipeline, i.e. extracting visual and/or audio features and training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e. polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.

Dynamic Network Pruning with Interpretable Layerwise Channel Selection

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6098 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6299-6306

Author(s):

Yulong Wang ◽

Xiaolu Zhang ◽

Xiaolin Hu ◽

Bo Zhang ◽

Hang Su

Keyword(s):

Prediction Accuracy ◽

State Of The Art ◽

Dynamic Network ◽

Detection Algorithm ◽

Network Pruning ◽

Robust Model ◽

Adversarial Examples ◽

Adversarial Example ◽

Decision Paths ◽

Pruning Methods

Dynamic network pruning achieves runtime acceleration by dynamically determining the inference paths based on different inputs. However, previous methods directly generate continuous decision values for each weight channel, which cannot reflect a clear and interpretable pruning process. In this paper, we propose to explicitly model the discrete weight channel selections, which encourages more diverse weights utilization, and achieves more sparse runtime inference paths. Meanwhile, with the help of interpretable layerwise channel selections in the dynamic network, we can visualize the network decision paths explicitly for model interpretability. We observe that there are clear differences in the layerwise decisions between normal and adversarial examples. Therefore, we propose a novel adversarial example detection algorithm by discriminating the runtime decision features. Experiments show that our dynamic network achieves higher prediction accuracy under the similar computing budgets on CIFAR10 and ImageNet datasets compared to traditional static pruning methods and other dynamic pruning approaches. The proposed adversarial detection algorithm can significantly improve the state-of-the-art detection rate across multiple attacks, which provides an opportunity to build an interpretable and robust model.

EEG-Based Emotion Recognition: A State-of-the-Art Review of Current Trends and Opportunities

Computational Intelligence and Neuroscience ◽

10.1155/2020/8875426 ◽

2020 ◽

Vol 2020 ◽

pp. 1-19

Author(s):

Nazmi Sofian Suhaimi ◽

James Mountstephens ◽

Jason Teo

Keyword(s):

Emotion Recognition ◽

State Of The Art ◽

Research Community ◽

Human Interaction ◽

Human Cognition ◽

Future Research ◽

Emotional States ◽

Human Beings ◽

Eeg Signals ◽

Recent Developments

Emotions are fundamental for human beings and play an important role in human cognition. Emotion is commonly associated with logical decision making, perception, human interaction, and to a certain extent, human intelligence itself. With the growing interest of the research community towards establishing some meaningful “emotional” interactions between humans and computers, the need for reliable and deployable solutions for the identification of human emotional states is required. Recent developments in using electroencephalography (EEG) for emotion recognition have garnered strong interest from the research community as the latest developments in consumer-grade wearable EEG solutions can provide a cheap, portable, and simple solution for identifying emotions. Since the last comprehensive review was conducted back from the years 2009 to 2016, this paper will update on the current progress of emotion recognition using EEG signals from 2016 to 2019. The focus on this state-of-the-art review focuses on the elements of emotion stimuli type and presentation approach, study size, EEG hardware, machine learning classifiers, and classification approach. From this state-of-the-art review, we suggest several future research opportunities including proposing a different approach in presenting the stimuli in the form of virtual reality (VR). To this end, an additional section devoted specifically to reviewing only VR studies within this research domain is presented as the motivation for this proposed new approach using VR as the stimuli presentation device. This review paper is intended to be useful for the research community working on emotion recognition using EEG signals as well as for those who are venturing into this field of research.