Speech Representation Learning for Emotion Recognition Using End-to-End ASR with Factorized Adaptation

Adieu recurrence? End-to-end speech emotion recognition using a context stacking dilated convolutional network

2020 28th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco47968.2020.9287667 ◽

2021 ◽

Author(s):

Duowei Tang ◽

Peter Kuppens ◽

Luc Geurts ◽

Toon van Waterschoot

Keyword(s):

Emotion Recognition ◽

Speech Emotion Recognition ◽

Convolutional Network ◽

End To End

Representation Learning with Spectro-Temporal-Channel Attention for Speech Emotion Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414006 ◽

2021 ◽

Author(s):

Lili Guo ◽

Longbiao Wang ◽

Chenglin Xu ◽

Jianwu Dang ◽

Eng Siong Chng ◽

...

Keyword(s):

Emotion Recognition ◽

Representation Learning ◽

Speech Emotion Recognition

Conditional Independence for Pretext Task Selection in Self-Supervised Speech Representation Learning

10.21437/interspeech.2021-1027 ◽

2021 ◽

Author(s):

Salah Zaiem ◽

Titouan Parcollet ◽

Slim Essid

Keyword(s):

Conditional Independence ◽

Representation Learning ◽

Task Selection ◽

Speech Representation

Multimodal End-to-End Sparse Model for Emotion Recognition

10.18653/v1/2021.naacl-main.417 ◽

2021 ◽

Author(s):

Wenliang Dai ◽

Samuel Cahyawijaya ◽

Zihan Liu ◽

Pascale Fung

Keyword(s):

Emotion Recognition ◽

Sparse Model ◽

End To End

Time-Frequency Representation Learning with Graph Convolutional Network for Dialogue-Level Speech Emotion Recognition

10.21437/interspeech.2021-2067 ◽

2021 ◽

Author(s):

Jiaxing Liu ◽

Yaodong Song ◽

Longbiao Wang ◽

Jianwu Dang ◽

Ruiguo Yu

Keyword(s):

Emotion Recognition ◽

Representation Learning ◽

Speech Emotion Recognition ◽

Convolutional Network ◽

Time Frequency ◽

Frequency Representation

Exploring Effective Speech Representation via ASR for High-Quality End-to-End Multispeaker TTS

10.1007/978-3-030-92310-5_13 ◽

2021 ◽

pp. 110-118

Author(s):

Dawei Liu ◽

Longbiao Wang ◽

Sheng Li ◽

Haoyu Li ◽

Chenchen Ding ◽

...

Keyword(s):

High Quality ◽

Speech Representation ◽

End To End

End-To-End Efficient Representation Learning via Cascading Combinatorial Optimization

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) ◽

10.1109/cvpr.2019.01164 ◽

2019 ◽

Author(s):

Yeonwoo Jeong ◽

Yoonsung Kim ◽

Hyun Oh Song

Keyword(s):

Combinatorial Optimization ◽

Representation Learning ◽

Efficient Representation ◽

End To End

An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5364 ◽

2020 ◽

Vol 34 (01) ◽

pp. 303-311 ◽

Cited By ~ 3

Author(s):

Sicheng Zhao ◽

Yunsheng Ma ◽

Yang Gu ◽

Jufeng Yang ◽

Tengfei Xing ◽

...

Keyword(s):

Neural Networks ◽

Emotion Recognition ◽

State Of The Art ◽

Source Code ◽

Cross Entropy ◽

Attention Network ◽

Audio Features ◽

End To End ◽

3D CNN

Emotion recognition in user-generated videos plays an important role in human-centered computing. Existing methods mainly employ a traditional two-stage shallow pipeline, i.e., extracting visual and/or audio features and then training classifiers. In this paper, we propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs). Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN. Further, we design a special classification loss, i.e., a polarity-consistent cross-entropy loss, based on the polarity-emotion hierarchy constraint to guide the attention generation. Extensive experiments conducted on the challenging VideoEmotion-8 and Ekman-6 datasets demonstrate that the proposed VAANet outperforms the state-of-the-art approaches for video emotion recognition. Our source code is released at: https://github.com/maysonma/VAANet.
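The abstract names a polarity-consistent cross-entropy loss but does not give its formula. The sketch below shows one plausible way such a loss could work: the ordinary cross-entropy is scaled up when the predicted emotion falls in a different polarity group than the true emotion. The function names, the `penalty` weight, and the four-class polarity mapping are all illustrative assumptions, not details taken from the paper or its released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def polarity_consistent_ce(logits, target, polarity, penalty=0.5):
    """Cross-entropy with an extra weight on polarity mismatches.

    `polarity[c]` is the polarity group of emotion class `c`.
    `penalty` is a hypothetical hyperparameter (an assumption here):
    the loss is multiplied by (1 + penalty) when the top-scoring class
    has a different polarity than the target class.
    """
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    ce = -math.log(probs[target])
    if polarity[predicted] != polarity[target]:
        return ce * (1.0 + penalty)
    return ce


# Hypothetical setup: classes 0-1 are negative (0), classes 2-3 positive (1).
polarity = [0, 0, 1, 1]
target = 0

# Wrong class, same polarity: plain cross-entropy.
same_polarity_loss = polarity_consistent_ce([1.0, 3.0, 0.0, 0.0], target, polarity)
# Wrong class, opposite polarity: cross-entropy scaled by 1 + penalty.
cross_polarity_loss = polarity_consistent_ce([1.0, 0.0, 3.0, 0.0], target, polarity)
```

Both example predictions assign the same probability to the true class, so the two losses differ only by the mismatch factor. This isolates the intended effect: errors that cross the polarity boundary cost more than errors that stay within it.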
