Combining Textual Clues with Audio-Visual Information for Multimodal Sentiment Analysis

As a fundamental task of sentiment analysis, aspect-level sentiment analysis aims to identify the sentiment polarity of a specific aspect in the context. Previous work on aspect-level sentiment analysis is text-based. With the prevalence of multimodal user-generated content (e.g. text and image) on the Internet, multimodal sentiment analysis has attracted increasing research attention in recent years. In the context of aspect-level sentiment analysis, multimodal data are often more important than text-only data, and have various correlations including impacts that aspect brings to text and image as well as the interactions associated with text and image. However, there has not been any related work carried out so far at the intersection of aspect-level and multimodal sentiment analysis. To fill this gap, we are among the first to put forward the new task, aspect based multimodal sentiment analysis, and propose a novel Multi-Interactive Memory Network (MIMN) model for this task. Our model includes two interactive memory networks to supervise the textual and visual information with the given aspect, and learns not only the interactive influences between cross-modality data but also the self influences in single-modality data. We provide a new publicly available multimodal aspect-level sentiment dataset to evaluate our model, and the experimental results demonstrate the effectiveness of our proposed model for this new task.

Download Full-text

VistaNet: Visual Aspect Attention Network for Multimodal Sentiment Analysis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301305 ◽

2019 ◽

Vol 33 ◽

pp. 305-312 ◽

Cited By ~ 6

Author(s):

Quoc-Tuan Truong ◽

Hady W. Lauw

Keyword(s):

Sentiment Analysis ◽

Visual Information ◽

User Preferences ◽

Attention Network ◽

Visual Component ◽

Analysis Task ◽

Sentiment Detection ◽

Restaurant Reviews ◽

Multimodal Sentiment Analysis ◽

Textual Content

Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, assessing product quality. Traditionally, the sentiment analysis task primarily relies on textual content. Fueled by the rise of mobile phones that are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos in addition to textual content. A question arises whether the visual component could be useful for sentiment analysis as well. In this work, we propose Visual Aspect Attention Network or VistaNet, leveraging both textual and visual components. We observe that in many cases, with respect to sentiment detection, images play a supporting role to text, highlighting the salient aspects of an entity, rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment for pointing out the important sentences of a document using attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention, vis-à-vis visual features or textual attention.

Download Full-text

Adaptive Modality Distillation for Separable Multimodal Sentiment Analysis

IEEE Intelligent Systems ◽

10.1109/mis.2021.3057757 ◽

2021 ◽

pp. 1-1

Author(s):

Wei Peng ◽

Xiaopeng Hong ◽

Guoying Zhao

Keyword(s):

Sentiment Analysis ◽

Multimodal Sentiment Analysis

Download Full-text

Deep Learning approach for text, image, and GIF multimodal sentiment analysis

2020 10th International Conference on Computer and Knowledge Engineering (ICCKE) ◽

10.1109/iccke50421.2020.9303676 ◽

2020 ◽

Author(s):

Amirhossein Shirzad ◽

Hadi Zare ◽

Mehdi Teimouri

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Learning Approach ◽

Multimodal Sentiment Analysis

Download Full-text

A residual merged neutral network for multimodal sentiment analysis

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA)( ◽

10.1109/icbda.2017.8078794 ◽

2017 ◽

Cited By ~ 3

Author(s):

Nan Xu ◽

Wenji Mao

Keyword(s):

Sentiment Analysis ◽

Neutral Network ◽

Multimodal Sentiment Analysis

Download Full-text

Multimodal Sentiment Analysis with Temporal Modality Attention

10.21437/interspeech.2021-487 ◽

2021 ◽

Author(s):

Fan Qian ◽

Jiqing Han

Keyword(s):

Sentiment Analysis ◽

Multimodal Sentiment Analysis ◽

Temporal Modality

Download Full-text

Multi-Tensor Fusion Network with Hybrid Attention for Multimodal Sentiment Analysis

2020 International Conference on Machine Learning and Cybernetics (ICMLC) ◽

10.1109/icmlc51923.2020.9469572 ◽

2020 ◽

Author(s):

HAIWEI XUE ◽

XUEMING YAN ◽

SHENGYI JIANG ◽

HELANG LAI

Keyword(s):

Sentiment Analysis ◽

Multimodal Sentiment Analysis

Download Full-text

UXmood - A Tool to Investigate the User Experience (UX) Based on Multimodal Sentiment Analysis and Information Visualization (InfoVis)

2019 23rd International Conference Information Visualisation (IV) ◽

10.1109/iv.2019.00038 ◽

2019 ◽

Author(s):

Roberto Yuri Da Silva Franco ◽

Alexandre Abreu De Freitas ◽

Rodrigo Santos Do Amor Divino Lima ◽

Marcelle Pereira Mota ◽

Carlos Gustavo Resque Dos Santos ◽

...

Keyword(s):

Sentiment Analysis ◽

User Experience ◽

Information Visualization ◽

Multimodal Sentiment Analysis

Download Full-text

Deep Learning Techniques for Polarity Classification in Multimodal Sentiment Analysis

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622018500128 ◽

2018 ◽

Vol 17 (03) ◽

pp. 883-910 ◽

Cited By ~ 4

Author(s):

P. D. Mahendhiran ◽

S. Kannimuthu

Keyword(s):

Social Media ◽

Deep Learning ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Relevant Information ◽

Neural Net ◽

Video Information ◽

Automatic Feature Extraction ◽

Multimodal Sentiment Analysis

Contemporary research in Multimodal Sentiment Analysis (MSA) using deep learning is becoming popular in Natural Language Processing. Enormous amount of data are obtainable from social media such as Facebook, WhatsApp, YouTube, Twitter and microblogs every day. In order to deal with these large multimodal data, it is difficult to identify the relevant information from social media websites. Hence, there is a need to improve an intellectual MSA. Here, Deep Learning is used to improve the understanding and performance of MSA better. Deep Learning delivers automatic feature extraction and supports to achieve the best performance to enhance the combined model that integrates Linguistic, Acoustic and Video information extraction method. This paper focuses on the various techniques used for classifying the given portion of natural language text, audio and video according to the thoughts, feelings or opinions expressed in it, i.e., whether the general attitude is Neutral, Positive or Negative. From the results, it is perceived that Deep Learning classification algorithm gives better results compared to other machine learning classifiers such as KNN, Naive Bayes, Random Forest, Random Tree and Neural Net model. The proposed MSA in deep learning is to identify sentiment in web videos which conduct the poof-of-concept experiments that proved, in preliminary experiments using the ICT-YouTube dataset, our proposed multimodal system achieves an accuracy of 96.07%.

Download Full-text