Multimodal emotion recognition with hierarchical memory networks

2021 · Vol 25 (4) · pp. 1031-1045
Author(s): Helang Lai, Keke Wu, Lingli Li

Emotion recognition in conversations is crucial, as there is an urgent need to improve the overall experience of human-computer interactions. A promising direction in this field is to develop a model that can effectively extract adequate context for a test utterance. We introduce a novel model, termed hierarchical memory networks (HMN), to address the issue of recognizing utterance-level emotions. HMN divides the context into different aspects and employs different step lengths to represent the weights of these aspects. To model self-dependencies, HMN uses independent local memory networks to model each aspect. Further, to capture interpersonal dependencies, HMN employs global memory networks to integrate the local outputs into global storages. These storages generate contextual summaries and help to find the emotionally dependent utterance that is most relevant to the test utterance. With an attention-based multi-hop scheme, the storages are then merged with the test utterance through an addition operation at each iteration. Experiments on the IEMOCAP dataset show that our model outperforms the compared methods in accuracy.
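
The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the hierarchical idea it describes: independent local memories per context aspect, a global memory that integrates their outputs into storages, and an attention-based multi-hop loop that merges the storage summaries with the test utterance by addition. The feature dimension, GRU encoders, two-aspect split, and six-class output are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a hierarchical memory network for utterance-level emotion
# recognition. All layer choices and sizes are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemoryHop(nn.Module):
    """One attention hop: attend over memories, add the summary to the query."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, memories):
        # query: (batch, dim); memories: (batch, n_mem, dim)
        scores = torch.bmm(memories, query.unsqueeze(-1)).squeeze(-1)  # (batch, n_mem)
        attn = F.softmax(scores, dim=-1)
        summary = torch.bmm(attn.unsqueeze(1), memories).squeeze(1)    # (batch, dim)
        return query + self.proj(summary)  # merge by addition, as in the abstract


class HierarchicalMemory(nn.Module):
    def __init__(self, dim, hops=3):
        super().__init__()
        # Independent local memory encoders, one per context "aspect"
        # (e.g. same-speaker history vs. other-speaker history).
        self.local_a = nn.GRU(dim, dim, batch_first=True)
        self.local_b = nn.GRU(dim, dim, batch_first=True)
        # Global memory integrates the local outputs into shared storages.
        self.global_mem = nn.GRU(dim, dim, batch_first=True)
        self.hops = nn.ModuleList([MemoryHop(dim) for _ in range(hops)])
        self.classifier = nn.Linear(dim, 6)  # six IEMOCAP emotion classes (assumed)

    def forward(self, test_utt, ctx_a, ctx_b):
        # test_utt: (batch, dim); ctx_a, ctx_b: (batch, len, dim) utterance features
        mem_a, _ = self.local_a(ctx_a)
        mem_b, _ = self.local_b(ctx_b)
        storage, _ = self.global_mem(torch.cat([mem_a, mem_b], dim=1))
        q = test_utt
        for hop in self.hops:  # multi-hop attention over the global storages
            q = hop(q, storage)
        return self.classifier(q)


# Toy usage with random utterance features
model = HierarchicalMemory(dim=64)
logits = model(torch.randn(2, 64), torch.randn(2, 5, 64), torch.randn(2, 5, 64))
print(logits.shape)  # torch.Size([2, 6])
```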

Author(s): Yashaswi Alva M, Nachamai M, Joy Paulose

Researchers have shown that human-computer interactions (HCIs) can be more effective only when machines understand the emotions conveyed in speech. Speech emotion recognition has seen growing research interest due to its usefulness in different applications. Building a neutral speech model is an important and challenging task, as it can help in identifying different emotions from stuttered speech. This paper suggests two different approaches for identifying neutral pitch from stuttered speech. The accuracy of the implementations indicates which model is best suited for neutral speech pitch identification.
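
The abstract does not describe the two approaches, so the sketch below is only a generic illustration of one plausible building block: estimating a speaker's neutral-pitch statistics (median F0 and spread) with librosa's pYIN tracker. The function name and the choice of statistics are assumptions.

```python
# Hedged sketch: estimate a neutral-pitch baseline from a speech recording.
# Assumes librosa is installed; not the paper's actual method.
import librosa
import numpy as np


def neutral_pitch_stats(wav_path):
    y, sr = librosa.load(wav_path, sr=None)
    # Probabilistic YIN fundamental-frequency estimation (NaN for unvoiced frames)
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only
    return float(np.median(f0)), float(np.std(f0))


# median_f0, f0_spread = neutral_pitch_stats("utterance.wav")  # hypothetical file
```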


2019 · Vol 7 (3) · pp. 919-925
Author(s): Kelvin Wambani Siovi, Cheruiyot Willison Kipruto, Agnes Mindila

2021 · Vol 5 (3) · pp. 13
Author(s): Heting Wang, Vidya Gaddy, James Ross Beveridge, Francisco R. Ortega

The role of affect has long been studied in human–computer interaction. Unlike previous studies that focused on seven basic emotions, this work introduced an avatar named Diana who expresses a higher level of emotional intelligence. To adapt to users' various affects during interaction, Diana simulates emotions with dynamic facial expressions. When two people collaborated to build blocks, their affects were recognized and labeled using the Affdex SDK, and a descriptive analysis was provided. When participants turned to collaborate with Diana, their subjective responses were collected and the completion time was recorded. Three modes of Diana were involved: a flat-faced Diana, a Diana that used mimicry facial expressions, and a Diana that used emotionally responsive facial expressions. Twenty-one responses were collected through a five-point Likert scale questionnaire and the NASA TLX. Results from the questionnaires were not statistically different across modes. However, the emotionally responsive Diana obtained more positive responses, and people spent the longest time with the mimicry Diana. In post-study comments, most participants perceived the facial expressions on Diana's face as natural, while four mentioned uncomfortable feelings caused by the uncanny valley effect.
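
The abstract does not state which statistical test was used to compare the three Diana modes, so the sketch below is only a hedged illustration of one common way to compare five-point Likert ratings across conditions: a Kruskal-Wallis test. The ratings shown are placeholders invented for demonstration, not the study's data.

```python
# Hedged sketch: compare Likert ratings across the three Diana modes.
# The data below are placeholders; the test choice is an assumption.
from scipy.stats import kruskal

flat       = [3, 2, 4, 3, 3, 2, 4]  # hypothetical ratings, flat-faced Diana
mimicry    = [4, 3, 4, 3, 5, 3, 4]  # hypothetical ratings, mimicry Diana
responsive = [4, 5, 4, 3, 5, 4, 4]  # hypothetical ratings, emotionally responsive Diana

stat, p = kruskal(flat, mimicry, responsive)
print(f"H = {stat:.2f}, p = {p:.3f}")  # p > 0.05 would mirror the reported null result
```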

