MS-MDA: Multisource Marginal Distribution Adaptation for Cross-Subject and Cross-Session EEG Emotion Recognition

2021 · Vol 15
Author(s): Hao Chen, Ming Jin, Zhunan Li, Cunhang Fan, Jinpeng Li, ...

As an essential element in the diagnosis and rehabilitation of psychiatric disorders, electroencephalogram (EEG)-based emotion recognition has achieved significant progress due to its high precision and reliability. However, one obstacle to practicality lies in the variability between subjects and sessions. Although several studies have adopted domain adaptation (DA) approaches to tackle this problem, most of them treat EEG data from different subjects and sessions together as a single source domain for transfer, which either fails to satisfy the assumption of domain adaptation that the source has a certain marginal distribution, or increases the difficulty of adaptation. We therefore propose multi-source marginal distribution adaptation (MS-MDA) for EEG emotion recognition, which takes both domain-invariant and domain-specific features into consideration. First, we assume that different EEG data share the same low-level features; then we construct an independent branch for each EEG source domain to perform one-to-one domain adaptation and extract domain-specific features. Finally, inference is made jointly by the multiple branches. We evaluate our method on SEED and SEED-IV for recognizing three and four emotions, respectively. Experimental results show that MS-MDA outperforms the comparison methods and state-of-the-art models in cross-session and cross-subject transfer scenarios in our settings. Code is available at https://github.com/VoiceBeer/MS-MDA.
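As a rough illustration of the branch-per-source idea, here is a minimal PyTorch sketch. It is not the paper's exact architecture: the layer sizes, the linear-time MMD estimate used for marginal alignment, and all names are assumptions for illustration.

```python
# Hedged sketch of the MS-MDA idea: a shared low-level feature extractor,
# one domain-specific branch per source domain, and an MMD-style
# marginal-distribution alignment between each source and the target.
# All sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

def mmd_linear(x, y):
    # Linear-time MMD estimate between two batches of features.
    delta = x.mean(0) - y.mean(0)
    return delta.dot(delta)

class MSMDASketch(nn.Module):
    def __init__(self, in_dim=310, hid=64, n_sources=14, n_classes=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(hid, hid), nn.ReLU()) for _ in range(n_sources)])
        self.heads = nn.ModuleList(
            [nn.Linear(hid, n_classes) for _ in range(n_sources)])

    def forward(self, xs_list, xt):
        # xs_list: one batch per source domain; xt: a target-domain batch.
        ft = self.shared(xt)
        logits, mmd = [], 0.0
        for i, xs in enumerate(xs_list):
            fs = self.branches[i](self.shared(xs))   # domain-specific features
            ftb = self.branches[i](ft)
            mmd = mmd + mmd_linear(fs, ftb)          # one-to-one alignment
            logits.append(self.heads[i](fs))
        return logits, mmd
```

At inference time, the target prediction would be obtained by averaging the branch outputs, consistent with the "inference by multiple branches" described above.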

Author(s): G. Bellitto, F. Proietto Salanitri, S. Palazzo, F. Rundo, D. Giordano, ...

Abstract: In this work, we propose a 3D fully convolutional architecture for video saliency prediction that employs hierarchical supervision on intermediate maps (referred to as conspicuity maps) generated from features extracted at different abstraction levels. We equip the base hierarchical learning mechanism with two techniques, one for domain adaptation and one for domain-specific learning. For the former, we encourage the model to learn hierarchical general features in an unsupervised manner, using gradient reversal at multiple scales, to enhance generalization on datasets for which no annotations are provided during training. For domain specialization, we employ domain-specific operations (namely, priors, smoothing, and batch normalization) that specialize the learned features on individual datasets in order to maximize performance. Our experiments show that the proposed model yields state-of-the-art accuracy on supervised saliency prediction. When the base hierarchical model is empowered with domain-specific modules, performance improves further, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching second-best results on the other two. When we instead test it in an unsupervised domain adaptation setting, by enabling the hierarchical gradient reversal layers, we obtain performance comparable to the supervised state of the art. Source code, trained models, and example outputs are publicly available at https://github.com/perceivelab/hd2s.
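The gradient reversal mechanism mentioned above has a standard minimal implementation; the sketch below shows a generic PyTorch gradient reversal layer, without the multi-scale hierarchical wiring used in the paper.

```python
# A minimal gradient reversal layer (GRL): identity on the forward pass,
# negated (scaled) gradient on the backward pass, so the feature extractor
# is trained to confuse a domain discriminator.
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip and scale the gradient flowing into the feature extractor.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```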


2020 · Vol 34 (07) · pp. 12613-12620
Author(s): Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, ...

We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to globally match the marginal distributions of feature representations across two domains. However, this strategy fails to adapt the representations of tail classes or small objects, since the alignment objective is dominated by head categories and large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise adversarial perturbations in feature space. Specifically, we first perturb the intermediate feature maps with several attack objectives (i.e., the discriminator and the classifier) at each individual position for both domains, and then train the classifier to be invariant to these perturbations. By perturbing each position individually, our model treats every location evenly regardless of category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other via an attack on the domain discriminator. Our approach achieves state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 → Cityscapes and SYNTHIA → Cityscapes.
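To make the pointwise perturbation idea concrete, here is a hedged single-step sketch. The function name, step size, and the single-step attack are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative pointwise feature-space perturbation: each spatial position
# of the feature map is pushed along the gradient of an attack loss, then
# the classifier is trained to be invariant to the perturbed features.
import torch
import torch.nn.functional as F

def perturb_features(feat, classifier, labels, eps=0.1):
    # feat: (B, C, H, W) intermediate feature map
    # classifier: maps features to per-pixel logits (B, K, H, W)
    # labels: (B, H, W) segmentation labels
    feat = feat.detach().requires_grad_(True)
    attack_loss = F.cross_entropy(classifier(feat), labels)
    grad = torch.autograd.grad(attack_loss, feat)[0]
    # Per-position normalization: every location gets a unit direction,
    # so tail classes and small objects are not drowned out.
    direction = grad / grad.norm(dim=1, keepdim=True).clamp_min(1e-12)
    return (feat + eps * direction).detach()
```

The classifier would then be trained with an ordinary cross-entropy loss on the perturbed features, encouraging invariance at every position.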


2022 · Vol 12
Author(s): Jiangsheng Cao, Xueqin He, Chenhui Yang, Sifang Chen, Zhangyu Li, ...

Due to the non-invasiveness and high precision of electroencephalography (EEG), the combination of EEG and artificial intelligence (AI) is often used for emotion recognition. However, internal differences in EEG data have become an obstacle to classification accuracy. To solve this problem, given labeled data of a similar nature but from different domains, domain adaptation usually provides an attractive option. Most existing studies aggregate the EEG data from different subjects and sessions into a single source domain, which ignores the assumption that the source has a certain marginal distribution. Moreover, existing methods often align only the representation distributions extracted from a single structure, which may contain only partial information. Therefore, we propose multi-source and multi-representation adaptation (MSMRA) for cross-domain EEG emotion recognition, which divides the EEG data from different subjects and sessions into multiple domains and aligns the distributions of multiple representations extracted from a hybrid structure. Two datasets, SEED and SEED IV, are used to validate the proposed method in cross-session and cross-subject transfer scenarios. Experimental results demonstrate the superior performance of our model over state-of-the-art models in most settings.
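A hedged sketch of the alignment step follows, assuming generic parallel feature extractors and a Gaussian-kernel MMD as the distribution distance; both are assumptions, and the paper's hybrid structure is not reproduced here.

```python
# Sketch of multi-representation alignment: features from several parallel
# extractors are each aligned across domains with a kernel MMD loss.
import torch

def gaussian_mmd(x, y, sigma=1.0):
    # Biased MMD^2 estimate with a Gaussian kernel.
    def k(a, b):
        d = torch.cdist(a, b).pow(2)
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def multi_representation_loss(src_feats, tgt_feats):
    # src_feats / tgt_feats: lists with one tensor per representation.
    return sum(gaussian_mmd(s, t) for s, t in zip(src_feats, tgt_feats))
```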


Sensors · 2021 · Vol 21 (15) · pp. 5092
Author(s): Tran-Dac-Thinh Phan, Soo-Hyung Kim, Hyung-Jeong Yang, Guee-Sang Lee

Besides facial- or gesture-based emotion recognition, electroencephalogram (EEG) data have been drawing attention thanks to their capability to counter the effect of deceptive external expressions, such as faces or speech. Emotion recognition based on EEG signals relies heavily on features and their delineation, which requires selecting the feature categories converted from the raw signals and the types of expression that can display the intrinsic properties of an individual signal or a group of signals. Moreover, the correlation or interaction among channels and frequency bands also contains crucial information for emotional-state prediction, and it is commonly disregarded in conventional approaches. Therefore, in our method, the correlations between the 32 channels and the frequency bands are put to use to enhance emotion prediction performance. The extracted features, chosen from the time domain, were arranged into feature-homogeneous matrices, with their positions following the corresponding electrodes placed on the scalp. Given this 3D representation of EEG signals, the model must be able to learn the local and global patterns that describe the short- and long-range relations of EEG channels, along with the embedded features. To this end, we propose a 2D CNN in which convolutional layers with different kernel sizes are assembled into a convolution block, combining features distributed over small and large regions. Ten-fold cross-validation was conducted on the DEAP dataset to prove the effectiveness of our approach. We achieved average accuracies of 98.27% and 98.36% for arousal and valence binary classification, respectively.
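To illustrate the multi-kernel convolution block, here is a minimal PyTorch sketch; the 9x9 electrode grid, channel counts, and kernel sizes are assumptions for illustration, not the paper's exact configuration.

```python
# Parallel 2D convolutions with different kernel sizes over electrode-grid
# feature maps, concatenated to combine local and global spatial patterns.
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    def __init__(self, in_ch=4, out_ch=16):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
            for k in (1, 3, 5, 7)   # small to large receptive fields
        ])

    def forward(self, x):
        # x: (batch, features, height, width) electrode-grid maps
        return torch.cat([p(x) for p in self.paths], dim=1)

block = MultiKernelBlock()
x = torch.randn(8, 4, 9, 9)   # 8 samples on an assumed 9x9 scalp grid
print(block(x).shape)         # torch.Size([8, 64, 9, 9])
```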


Author(s): I Made Agus Wirawan, Retantyo Wardoyo, Danang Lelono

Electroencephalogram (EEG) signals have several advantages for recognizing emotions. The success of such studies, however, is strongly influenced by: i) the distribution of the data used, ii) differences in participant characteristics, and iii) the characteristics of the EEG signals themselves. In response to these issues, this study examines three important points that affect the success of emotion recognition, packaged as research questions: i) What factors need to be considered to generate and distribute EEG data? ii) How can EEG signals be processed with consideration of differences in participant characteristics? iii) How can the characteristics embedded in EEG signal features be exploited for emotion recognition? The results indicate several important challenges to be studied further in EEG-signal-based emotion recognition research. These include: i) determining robust methods for imbalanced EEG data, ii) determining appropriate smoothing methods to eliminate disturbances in the baseline signals, iii) determining the best baseline reduction methods to reduce the differences in participant characteristics in the EEG signals, and iv) determining a robust capsule-network architecture that overcomes the loss of knowledge information, and applying it to more diverse datasets.
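As a concrete example of the baseline reduction challenge raised above, here is a minimal NumPy sketch; the segment shapes are illustrative, and real pipelines choose the baseline window according to the recording protocol.

```python
# Minimal baseline reduction: subtract the per-channel mean of a
# pre-stimulus baseline segment from each trial segment, attenuating
# participant-specific offsets.
import numpy as np

def baseline_reduce(trial, baseline):
    # trial: (channels, samples) stimulus EEG segment
    # baseline: (channels, samples) pre-stimulus segment
    return trial - baseline.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
trial = rng.standard_normal((32, 512))     # 32 channels, 512 samples
baseline = rng.standard_normal((32, 128))  # shorter pre-stimulus window
clean = baseline_reduce(trial, baseline)
```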


Sensors · 2019 · Vol 19 (5) · pp. 987
Author(s): Xiao Jiang, Gui-Bin Bian, Zean Tian

Electroencephalogram (EEG) recordings play an important role in identifying brain activity and behavior. However, the recorded electrical activity is often contaminated with artifacts, which affects the analysis of the EEG signal. Hence, it is essential to develop methods that effectively detect and extract clean EEG data during recordings. Several methods have been proposed to remove artifacts, but artifact removal remains an open research problem. This paper reviews current artifact-removal approaches for various types of contamination. We first discuss the characteristics of EEG data and the different types of artifacts. Then, a general overview of state-of-the-art methods and a detailed analysis of them are presented. Lastly, a comparative analysis is provided for choosing a suitable method for a particular application.
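As one example of the method families reviewed here, below is a minimal ICA-based sketch using scikit-learn. The component selection is a placeholder: practical pipelines flag artifactual components via correlation with EOG/EMG channels or statistical criteria.

```python
# Illustrative ICA-based artifact removal: decompose multichannel EEG into
# independent components, zero out components flagged as artifactual, and
# reconstruct the signal in channel space.
import numpy as np
from sklearn.decomposition import FastICA

def remove_components(eeg, bad_components):
    # eeg: (samples, channels)
    ica = FastICA(n_components=eeg.shape[1], random_state=0)
    sources = ica.fit_transform(eeg)        # (samples, components)
    sources[:, bad_components] = 0.0        # suppress artifact sources
    return ica.inverse_transform(sources)   # back to channel space

rng = np.random.default_rng(0)
eeg = rng.standard_normal((1000, 8))        # synthetic 8-channel recording
clean = remove_components(eeg, bad_components=[0])
```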


Author(s): My Kieu, Andrew D. Bagdanov, Marco Bertini

Pedestrian detection is a canonical problem for safety and security applications, and it remains challenging due to the highly variable lighting conditions in which pedestrians must be detected. This article investigates several domain adaptation approaches for adapting RGB-trained detectors to the thermal domain. Building on our earlier work on domain adaptation for privacy-preserving pedestrian detection, we conduct an extensive experimental evaluation comparing top-down and bottom-up domain adaptation, and we also propose two new bottom-up domain adaptation strategies. For top-down domain adaptation, we leverage a detector pre-trained on RGB imagery and efficiently adapt it to perform pedestrian detection in the thermal domain. Our bottom-up domain adaptation approaches include two steps: first, an adapter segment corresponding to the initial layers of the RGB-trained detector is trained to adapt to the new input distribution; then, the adapter segment is reconnected to the original RGB-trained detector for final adaptation with a top-down loss. To the best of our knowledge, our bottom-up domain adaptation approaches outperform the best-performing single-modality pedestrian detection results on KAIST and outperform the state of the art on FLIR.
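A schematic of the two-step bottom-up strategy follows, using a torchvision ResNet-18 backbone as a stand-in for the RGB-trained detector; the split point and backbone choice are assumptions for illustration.

```python
# Stage 1: clone the detector's early layers as an "adapter segment" and
# train only the adapter on thermal input; stage 2 (sketched in comments)
# reconnects it to the frozen remainder for top-down fine-tuning.
import copy
import torch.nn as nn
from torchvision.models import resnet18

rgb_backbone = resnet18(weights=None)   # stands in for the RGB-trained detector
adapter = copy.deepcopy(nn.Sequential(
    rgb_backbone.conv1, rgb_backbone.bn1, rgb_backbone.relu,
    rgb_backbone.maxpool, rgb_backbone.layer1))
for p in rgb_backbone.parameters():
    p.requires_grad = False             # freeze the original detector

# ... train `adapter` so thermal features match the RGB layer1 activations,
# then reconnect it in place of the original early layers and fine-tune
# end-to-end with the detection (top-down) loss.
```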


Sensors · 2021 · Vol 21 (12) · pp. 4233
Author(s): Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human-machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze-and-Excitation ResNet (SE-ResNet) model, fed with spectrogram inputs. To overcome the limitations of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the latent space data representation. The proposed methodology achieves 83.35% and 64.92% global accuracy on the publicly available RAVDESS and CREMA-D datasets, respectively. Compared with the results of human observers, the gains in global accuracy are more than 24%. Finally, an objective comparative evaluation against state-of-the-art techniques demonstrates accuracy gains of more than 3%.
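Below is a compact sketch of a GhostVLAD-style aggregation layer as described; the feature dimension and cluster counts are illustrative assumptions, not the authors' exact layer.

```python
# GhostVLAD-style aggregation: frame-level features are softly assigned to
# K "real" plus G "ghost" clusters; residuals for the ghost clusters are
# discarded, yielding a fixed-size utterance-level descriptor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostVLAD(nn.Module):
    def __init__(self, dim=128, clusters=8, ghosts=2):
        super().__init__()
        self.k = clusters
        self.assign = nn.Linear(dim, clusters + ghosts)
        self.centroids = nn.Parameter(torch.randn(clusters + ghosts, dim))

    def forward(self, x):
        # x: (batch, frames, dim) frame-level CNN features
        a = F.softmax(self.assign(x), dim=-1)          # soft assignments
        resid = x.unsqueeze(2) - self.centroids        # (B, T, K+G, D)
        vlad = (a.unsqueeze(-1) * resid).sum(dim=1)    # (B, K+G, D)
        vlad = vlad[:, :self.k]                        # drop ghost clusters
        return F.normalize(vlad.flatten(1), dim=-1)    # (B, K*D)
```

An emotionally constrained triplet objective could then be built on these utterance vectors, for instance starting from torch.nn.TripletMarginLoss with emotion-aware triplet mining.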


Sensors · 2021 · Vol 21 (5) · pp. 1579
Author(s): Kyoung Ju Noh, Chi Yoon Jeong, Jiyoun Lim, Seungeun Chung, Gague Kim, ...

Speech emotion recognition (SER) is a natural method of recognizing individual emotions in everyday life. To deploy SER models in real-world applications, some key challenges must be overcome, such as the lack of datasets tagged with emotion labels and the weak generalization of SER models to unseen target domains. This study proposes a multi-path and group-loss-based network (MPGLN) for SER that supports multi-domain adaptation. The proposed model includes a bidirectional long short-term memory-based temporal feature generator and a feature extractor transferred from the pre-trained VGG-like audio classification model (VGGish), and it learns simultaneously from multiple losses according to the association of emotion labels in the discrete and dimensional models. To evaluate MPGLN SER on multi-cultural domain datasets, the Korean Emotional Speech Database (KESD), including KESDy18 and KESDy19, is constructed, and the English-language Interactive Emotional Dyadic Motion Capture database (IEMOCAP) is used. The evaluation of multi-domain adaptation and domain generalization shows improvements of 3.7% and 3.5%, respectively, in F1 score when comparing MPGLN SER with a baseline SER model that uses only the temporal feature generator. We show that MPGLN SER efficiently supports multi-domain adaptation and reinforces model generalization.
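A hedged sketch of the multi-path, multi-loss idea follows; the VGGish embedding is replaced by a placeholder vector, and all sizes and names are assumptions rather than the paper's configuration.

```python
# Two-path SER sketch: a BiLSTM temporal path over acoustic frames is fused
# with a transferred embedding, and two heads are trained jointly on
# discrete and dimensional emotion labels.
import torch
import torch.nn as nn

class MultiPathSER(nn.Module):
    def __init__(self, n_mels=64, hid=64, emb_dim=128, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hid, batch_first=True, bidirectional=True)
        self.discrete_head = nn.Linear(2 * hid + emb_dim, n_classes)
        self.dimensional_head = nn.Linear(2 * hid + emb_dim, 2)  # arousal, valence

    def forward(self, frames, transferred_emb):
        # frames: (B, T, n_mels); transferred_emb: (B, emb_dim), e.g. VGGish
        out, _ = self.lstm(frames)
        fused = torch.cat([out[:, -1], transferred_emb], dim=1)
        return self.discrete_head(fused), self.dimensional_head(fused)
```

Training would then sum a cross-entropy loss on the discrete head with a regression loss on the dimensional head, mirroring the multiple-loss learning described above.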


2021 · Vol 13 (10) · pp. 1985
Author(s): Emre Özdemir, Fabio Remondino, Alessandro Golkar

With recent advances in technology, deep learning is being applied to more and more tasks. In particular, point cloud processing and classification have been studied for some time, and various methods have been developed. Some of the available classification approaches are based on a specific data source, such as LiDAR, while others focus on specific scenarios, such as indoor scenes. A major general issue is computational efficiency (in terms of power consumption, memory requirements, and training/inference time). In this study, we propose an efficient framework (named TONIC) that can work with any kind of aerial data source (LiDAR or photogrammetry) and does not require high computational power while achieving accuracy on par with current state-of-the-art methods. We also test our framework for its generalization ability, showing its capability to learn from one dataset and predict on unseen aerial scenarios.
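Since TONIC's architecture is not reproduced here, the sketch below shows a generic PointNet-style classifier of the lightweight kind the efficiency discussion refers to; the architecture and sizes are assumptions for illustration.

```python
# Minimal point-cloud classifier: a shared per-point MLP followed by a
# global max pool, which makes the prediction invariant to point order.
import torch
import torch.nn as nn

class PointClassifier(nn.Module):
    def __init__(self, in_dim=3, n_classes=6):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU())
        self.head = nn.Linear(128, n_classes)

    def forward(self, pts):
        # pts: (batch, n_points, 3) xyz coordinates
        feat = self.point_mlp(pts).max(dim=1).values  # order-invariant pooling
        return self.head(feat)

model = PointClassifier()
print(model(torch.randn(2, 1024, 3)).shape)  # torch.Size([2, 6])
```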

