Multi-Interactive Memory Network for Aspect Based Multimodal Sentiment Analysis

Author(s):  
Nan Xu ◽  
Wenji Mao ◽  
Guandan Chen

As a fundamental task of sentiment analysis, aspect-level sentiment analysis aims to identify the sentiment polarity of a specific aspect in its context. Previous work on aspect-level sentiment analysis has been text-based. With the prevalence of multimodal user-generated content (e.g., text and images) on the Internet, multimodal sentiment analysis has attracted increasing research attention in recent years. In the context of aspect-level sentiment analysis, multimodal data are often more informative than text-only data and exhibit various correlations, including the influence that an aspect exerts on both text and image as well as the interactions between text and image. However, no related work has so far addressed the intersection of aspect-level and multimodal sentiment analysis. To fill this gap, we are among the first to put forward this new task, aspect-based multimodal sentiment analysis, and we propose a novel Multi-Interactive Memory Network (MIMN) model for it. Our model includes two interactive memory networks that supervise the textual and visual information with the given aspect, learning not only the interactive influences between cross-modal data but also the self-influences within single-modal data. We provide a new publicly available multimodal aspect-level sentiment dataset to evaluate our model, and the experimental results demonstrate the effectiveness of the proposed model for this new task.
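As a rough illustration of the interactive memory idea described above, the sketch below alternates aspect-conditioned attention hops over a textual memory and a visual memory. All layer names, dimensions, and the way the two queries are mixed are assumptions for illustration, not the authors' released MIMN implementation.

```python
# Illustrative sketch of an aspect-guided interactive memory hop (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class InteractiveMemoryHop(nn.Module):
    """One hop: attend over a memory (text or image regions) conditioned on a query."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)   # scores each memory slot against the query
        self.gru = nn.GRUCell(dim, dim)     # updates the query with the attended summary

    def forward(self, memory: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # memory: (batch, slots, dim), query: (batch, dim)
        expanded = query.unsqueeze(1).expand_as(memory)
        scores = self.attn(torch.cat([memory, expanded], dim=-1)).squeeze(-1)
        weights = F.softmax(scores, dim=-1)                  # attention over memory slots
        summary = torch.bmm(weights.unsqueeze(1), memory).squeeze(1)
        return self.gru(summary, query)                      # updated aspect-aware query

# Hypothetical usage: alternate text and image hops so each modality is supervised
# by the aspect and by the other modality's current representation.
dim, hops = 128, 2
text_mem = torch.randn(4, 20, dim)    # e.g. LSTM states of the sentence
img_mem = torch.randn(4, 49, dim)     # e.g. projected CNN region features
aspect = torch.randn(4, dim)          # averaged aspect embedding

text_hop, img_hop = InteractiveMemoryHop(dim), InteractiveMemoryHop(dim)
q_text, q_img = aspect, aspect
for _ in range(hops):
    q_text = text_hop(text_mem, q_text + q_img)   # textual hop sees the visual query
    q_img = img_hop(img_mem, q_img + q_text)      # visual hop sees the textual query
logits = nn.Linear(2 * dim, 3)(torch.cat([q_text, q_img], dim=-1))  # 3 polarities
```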

Author(s):  
Hai Pham ◽  
Paul Pu Liang ◽  
Thomas Manzini ◽  
Louis-Philippe Morency ◽  
Barnabás Póczos

Multimodal sentiment analysis is a core research area that studies speaker sentiment expressed through the language, visual, and acoustic modalities. The central challenge in multimodal learning is inferring joint representations that can process and relate information from these modalities. However, existing work learns joint representations by requiring all modalities as input, so the learned representations may be sensitive to noisy or missing modalities at test time. With the recent success of sequence-to-sequence (Seq2Seq) models in machine translation, there is an opportunity to explore new ways of learning joint representations that may not require all input modalities at test time. In this paper, we propose a method to learn robust joint representations by translating between modalities. Our method is based on the key insight that translation from a source to a target modality provides a way of learning joint representations using only the source modality as input. We augment modality translation with a cycle-consistency loss to ensure that our joint representations retain maximal information from all modalities. Once our translation model is trained with paired multimodal data, we only need data from the source modality at test time for final sentiment prediction. This ensures that our model remains robust to perturbations or missing information in the other modalities. We train our model with a coupled translation-prediction objective, and it achieves new state-of-the-art results on multimodal sentiment analysis datasets: CMU-MOSI, ICT-MMMO, and YouTube. Additional experiments show that our model learns increasingly discriminative joint representations with more input modalities while maintaining robustness to missing or perturbed modalities.
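The coupled translation plus cycle-consistency objective can be sketched as below. The GRU encoder, the simple per-step decoder, and the regression head are stand-ins chosen for brevity, not the paper's full Seq2Seq machinery.

```python
# Minimal sketch of coupled translation + cycle-consistency losses between two
# modalities (assumed shapes and layer choices; details of the paper's model omitted).
import torch
import torch.nn as nn

class ModalityTranslator(nn.Module):
    """Encode the source modality, decode an estimate of the target modality."""
    def __init__(self, src_dim, tgt_dim, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(src_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, tgt_dim)   # per-step projection, stand-in for a Seq2Seq decoder

    def forward(self, src):
        states, last = self.encoder(src)              # states: (batch, steps, hidden)
        return self.decoder(states), last.squeeze(0)  # translated sequence + joint representation

text = torch.randn(8, 30, 300)    # source modality (e.g. word embeddings)
audio = torch.randn(8, 30, 74)    # target modality (e.g. acoustic features)

fwd = ModalityTranslator(300, 74)   # text -> audio
bwd = ModalityTranslator(74, 300)   # audio -> text (used only for the cycle loss)
clf = nn.Linear(128, 1)             # sentiment regression from the joint representation

audio_hat, joint = fwd(text)
text_hat, _ = bwd(audio_hat)
mse = nn.MSELoss()
loss = mse(audio_hat, audio) + mse(text_hat, text)         # translation + cycle consistency
loss = loss + mse(clf(joint).squeeze(-1), torch.randn(8))  # coupled prediction objective
# At test time only `text` is needed: joint = fwd(text)[1] feeds the sentiment head.
```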


2019 ◽  
Vol 9 (16) ◽  
pp. 3239
Author(s):  
Yunseok Noh ◽  
Seyoung Park ◽  
Seong-Bae Park

Aspect-based sentiment analysis (ABSA) is the task of classifying the sentiment of a specific aspect in a text. Because a single text usually contains multiple aspects that are expressed independently, ABSA is a crucial task for in-depth opinion mining. A key point in solving ABSA is to align sentiment expressions with their proper target aspect in a text. Thus, many recent neural models have applied attention mechanisms to learn this alignment. However, it is problematic to depend solely on attention mechanisms, because most sentiment expressions such as "nice" and "bad" are too general to be aligned with the proper aspect even through an attention mechanism. To solve this problem, this paper proposes a novel convolutional neural network (CNN)-based aspect-level sentiment classification model consisting of two CNNs. Because sentiment expressions relevant to an aspect usually appear near that aspect's expressions, the proposed model first finds the aspect expressions for a given aspect and then focuses on the surrounding sentiment expressions to determine the final sentiment of the aspect. Thus, the first CNN extracts the positional information of aspect expressions for a target aspect and encodes this information as an aspect map. Even without data annotating the direct relation between aspects and their expressions, the aspect map can be obtained effectively by learning it in a weakly supervised manner. Then, the second CNN classifies the sentiment of the target aspect in a text using the aspect map. The proposed model is evaluated on the SemEval 2016 Task 5 dataset and compared with several baseline models. According to the experimental results, the proposed model not only outperforms the baseline models but also achieves state-of-the-art performance on the dataset.
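A minimal sketch of the two-CNN design follows: the first CNN scores how likely each position expresses the target aspect (the aspect map), and the second CNN classifies sentiment from the map-weighted sequence. The aspect vocabulary size, layer sizes, and the exact weighting scheme are assumptions.

```python
# Sketch of the two-CNN idea; sizes and weighting are illustrative assumptions.
import torch
import torch.nn as nn

class AspectMapCNN(nn.Module):
    def __init__(self, emb_dim=300, n_aspects=12, n_polarities=3, channels=100):
        super().__init__()
        # CNN 1: per-position aspect scores, learnable from sentence-level aspect labels
        self.aspect_cnn = nn.Conv1d(emb_dim, n_aspects, kernel_size=3, padding=1)
        # CNN 2: sentiment classifier over the aspect-weighted embeddings
        self.sent_cnn = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)
        self.out = nn.Linear(channels, n_polarities)

    def forward(self, emb, aspect_id):
        # emb: (batch, len, emb_dim); aspect_id: (batch,)
        x = emb.transpose(1, 2)                                  # (batch, emb_dim, len)
        aspect_map = torch.sigmoid(self.aspect_cnn(x))           # (batch, n_aspects, len)
        sel = aspect_map[torch.arange(emb.size(0)), aspect_id]   # map for the target aspect
        weighted = x * sel.unsqueeze(1)                          # emphasize positions near aspect expressions
        feats = torch.relu(self.sent_cnn(weighted)).max(dim=-1).values
        return self.out(feats), aspect_map

model = AspectMapCNN()
logits, amap = model(torch.randn(4, 25, 300), torch.tensor([0, 3, 3, 7]))
```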


Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 74
Author(s):  
Sun Zhang ◽  
Bo Li ◽  
Chunyong Yin

The rising use of online media has changed the social habits of the public. Users have become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotion and attitude provides significant decision support for numerous tasks in sentiment analysis. Conventional methods for sentiment classification consider only the textual modality and are vulnerable in multimodal scenarios, while common multimodal approaches focus only on the interactive relationships among modalities without considering unique intra-modal information. A hybrid fusion network is proposed in this paper to capture both inter-modal and intra-modal features. First, in the representation-fusion stage, a multi-head visual attention is proposed to extract accurate semantic and sentimental information from textual content under the guidance of visual features. Then, in the decision-fusion stage, multiple base classifiers are trained to learn independent and diverse discriminative information from the different modal representations. The final decision is determined by fusing the decision support from the base classifiers via a decision-fusion method. To improve the generalization of the hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empirical results on five multimodal datasets demonstrate that the proposed model achieves higher accuracy and better generalization capacity for multimodal sentiment analysis.
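The two fusion stages might look roughly like the sketch below, where image regions act as queries of a multi-head attention over textual tokens and several base classifiers are averaged at decision time. Names, shapes, and the pooling choices are hypothetical, and the similarity loss is only noted in a comment.

```python
# Hedged sketch of representation fusion (visual-guided attention) and decision fusion.
import torch
import torch.nn as nn

dim, heads = 256, 4
visual_guided_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

text_tokens = torch.randn(8, 40, dim)    # encoded words of the post
image_regions = torch.randn(8, 49, dim)  # projected CNN region features

# Each image region attends to the words that matter for it; pooling the result
# gives a visually guided textual representation (inter-modal features).
attended, _ = visual_guided_attn(query=image_regions, key=text_tokens, value=text_tokens)
inter_modal = attended.mean(dim=1)                       # (8, dim)
intra_text = text_tokens.mean(dim=1)                     # unimodal text summary
intra_image = image_regions.mean(dim=1)                  # unimodal image summary

# Decision fusion: independent base classifiers per representation, averaged;
# a similarity (diversity) loss between their outputs would regularize training.
base_clfs = nn.ModuleList(nn.Linear(dim, 3) for _ in range(3))
decisions = [h(r) for h, r in zip(base_clfs, (inter_modal, intra_text, intra_image))]
final = torch.stack(decisions).mean(dim=0)               # fused sentiment logits
```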


Author(s):  
Xiangying Ran ◽  
Yuanyuan Pan ◽  
Wei Sun ◽  
Chongjun Wang

Aspect-based sentiment analysis (ABSA) is a fine-grained task. Recurrent Neural Network (RNN) models armed with attention mechanisms seem a natural fit for this task, and they have recently achieved state-of-the-art performance. However, previous attention mechanisms proposed for ABSA may attend to irrelevant words and thus degrade performance, especially when dealing with long and complex sentences containing multiple aspects. In this paper, we propose a novel architecture named Hierarchical Gate Memory Network (HGMN) for ABSA: we first employ the proposed hierarchical gate mechanism to learn to select the parts related to the given aspect while preserving the original sequence structure of the sentence, and we then apply a Convolutional Neural Network (CNN) to the final aspect-specific memory. We conduct extensive experiments on the SemEval 2014 and Twitter datasets, and the results demonstrate that our model outperforms attention-based state-of-the-art baselines.
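One way to picture the gate-then-CNN idea is the sketch below: an aspect-conditioned sigmoid gate filters each word state while preserving sequence order, and a CNN then reads the gated memory. The single-level gate and the layer sizes are simplifying assumptions, not the paper's exact hierarchical architecture.

```python
# Illustrative aspect-conditioned gate followed by a CNN (hypothetical layers).
import torch
import torch.nn as nn

class GatedAspectCNN(nn.Module):
    def __init__(self, dim=128, channels=100, classes=3):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)                  # word + aspect -> per-dimension gate
        self.cnn = nn.Conv1d(dim, channels, kernel_size=3, padding=1)
        self.out = nn.Linear(channels, classes)

    def forward(self, words, aspect):
        # words: (batch, len, dim), aspect: (batch, dim)
        a = aspect.unsqueeze(1).expand_as(words)
        g = torch.sigmoid(self.gate(torch.cat([words, a], dim=-1)))
        memory = g * words                                   # gated, order-preserving memory
        feats = torch.relu(self.cnn(memory.transpose(1, 2))).max(dim=-1).values
        return self.out(feats)

logits = GatedAspectCNN()(torch.randn(4, 30, 128), torch.randn(4, 128))
```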


Author(s):  
Quoc-Tuan Truong ◽  
Hady W. Lauw

Detecting the sentiment expressed by a document is a key task for many applications, e.g., modeling user preferences, monitoring consumer behaviors, assessing product quality. Traditionally, the sentiment analysis task primarily relies on textual content. Fueled by the rise of mobile phones that are often the only cameras on hand, documents on the Web (e.g., reviews, blog posts, tweets) are increasingly multimodal in nature, with photos in addition to textual content. A question arises whether the visual component could be useful for sentiment analysis as well. In this work, we propose Visual Aspect Attention Network or VistaNet, leveraging both textual and visual components. We observe that in many cases, with respect to sentiment detection, images play a supporting role to text, highlighting the salient aspects of an entity, rather than expressing sentiments independently of the text. Therefore, instead of using visual information as features, VistaNet relies on visual information as alignment for pointing out the important sentences of a document using attention. Experiments on restaurant reviews showcase the effectiveness of visual aspect attention, vis-à-vis visual features or textual attention.
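The visual-aspect-attention idea, using the image as alignment over sentences rather than as a feature, can be sketched as follows. A bilinear compatibility score and the pooling choices stand in for whatever scoring function the released VistaNet model actually uses.

```python
# Sketch of visual aspect attention: an image vector scores document sentences,
# and the attention-weighted sentence vectors form the document representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSentenceAttention(nn.Module):
    def __init__(self, dim=200):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)   # image vs. sentence compatibility (assumed form)

    def forward(self, sentences, image):
        # sentences: (batch, n_sent, dim), image: (batch, dim)
        img = image.unsqueeze(1).expand_as(sentences)
        weights = F.softmax(self.score(img, sentences).squeeze(-1), dim=-1)
        return torch.bmm(weights.unsqueeze(1), sentences).squeeze(1), weights

doc_vec, sent_weights = VisualSentenceAttention()(torch.randn(2, 12, 200), torch.randn(2, 200))
# sent_weights shows which sentences the image points to; doc_vec feeds a softmax classifier.
```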


2019 ◽  
Vol 11 (7) ◽  
pp. 157
Author(s):  
Qiuyue Zhang ◽  
Ran Lu

Aspect-level sentiment analysis (ASA) aims at determining the sentiment polarity of a specific aspect term within a given sentence. Recent advances suggest that attention models are useful in ASA tasks and can help identify focus words; combining attention mechanisms with neural networks is also a common approach. However, according to recent research, such methods often fail to extract text representations efficiently and to achieve interaction between aspect terms and contexts. To address the complete ASA task, this paper proposes a Multi-Attention Network (MAN) model that adopts several attention networks. The model not only preprocesses data with Bidirectional Encoder Representations from Transformers (BERT) but also takes a number of further measures. First, the MAN model utilizes a transformed partial Transformer to obtain hidden sequence information. Second, because words at different locations have different effects on aspect terms, we introduce location encoding to analyze the impact of distance in ASA tasks, and we then obtain the influence of different words on aspect terms through a bidirectional attention network. Experimental results on three datasets show that the proposed model achieves consistently superior results.
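A hedged sketch of the location-encoding and bidirectional-attention steps appears below. The BERT preprocessing and partial Transformer are assumed to have already produced the hidden states, and the distance-based weighting shown is one plausible instantiation rather than the paper's exact formula.

```python
# Rough sketch: location-weighted context states plus bidirectional context/aspect attention.
import torch
import torch.nn.functional as F

def location_weights(seq_len: int, aspect_start: int, aspect_end: int) -> torch.Tensor:
    """Down-weight words by their distance to the aspect term (assumed scheme)."""
    pos = torch.arange(seq_len, dtype=torch.float)
    dist = torch.minimum((pos - aspect_start).abs(), (pos - aspect_end).abs())
    return 1.0 - dist / seq_len                       # closer words keep higher weight

ctx = torch.randn(1, 20, 256)                         # Transformer hidden states (assumed)
asp = torch.randn(1, 3, 256)                          # aspect-term hidden states
ctx = ctx * location_weights(20, 8, 10).view(1, -1, 1)

scores = torch.bmm(ctx, asp.transpose(1, 2))          # (1, 20, 3) pairwise relevance
ctx2asp = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), ctx)  # aspect attends to context
asp2ctx = torch.bmm(F.softmax(scores, dim=2), asp)                  # context attends to aspect
representation = torch.cat([ctx2asp.mean(1), asp2ctx.mean(1)], dim=-1)  # fed to a classifier
```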


Symmetry ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 2010
Author(s):  
Kang Zhang ◽  
Yushui Geng ◽  
Jing Zhao ◽  
Jianxin Liu ◽  
Wenxiao Li

In recent years, with the popularity of social media, users are increasingly keen to express their feelings and opinions in the form of pictures and text, which makes multimodal data combining text and pictures the fastest-growing content type. Most of the information posted by users on social media has obvious sentimental aspects, and multimodal sentiment analysis has become an important research field. Previous studies on multimodal sentiment analysis have primarily focused on extracting text and image features separately and then combining them for sentiment classification, often ignoring the interaction between text and images. Therefore, this paper proposes a new multimodal sentiment analysis model. The model first eliminates noise interference in the textual data and extracts the more important image features. Then, in the attention-based feature-fusion part, the text and images learn each other's internal features through symmetry, and the fused features are applied to the sentiment classification task. Experimental results on two common multimodal sentiment datasets demonstrate the effectiveness of the proposed model.
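A minimal sketch of the symmetric, attention-based fusion step is given below: two cross-modal attention blocks let text and image attend to each other before their summaries are concatenated. All dimensions are assumed, and the denoising and image-feature-extraction front ends are treated as given.

```python
# Minimal symmetric cross-modal attention fusion (assumed shapes and layers).
import torch
import torch.nn as nn

dim = 256
text2img = nn.MultiheadAttention(dim, 4, batch_first=True)
img2text = nn.MultiheadAttention(dim, 4, batch_first=True)

words = torch.randn(8, 32, dim)     # denoised text token features
regions = torch.randn(8, 49, dim)   # image region features

t_att, _ = text2img(query=words, key=regions, value=regions)   # text enriched by the image
v_att, _ = img2text(query=regions, key=words, value=words)     # image enriched by the text
fused = torch.cat([t_att.mean(1), v_att.mean(1)], dim=-1)      # symmetric fusion
logits = nn.Linear(2 * dim, 3)(fused)                          # sentiment classes
```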


2021 ◽  
Vol 7 (4) ◽  
pp. 123
Author(s):  
Yingxue Sun ◽  
Junbo Gao

In recent years, more and more people express their feelings through both images and text, boosting the growth of multimodal data. Multimodal data contain richer semantics and are more conducive to judging people's real emotions. To fully learn the features of each single modality and integrate the modal information, this paper proposes FCLAG, a fine-grained multimodal sentiment analysis method based on gating and attention mechanisms. First, the textual side is processed at both the character level and the word level: a CNN is used to extract more fine-grained emotional information from characters, and an attention mechanism is used to improve the expressiveness of keywords. On the image side, a gating mechanism is added to control the flow of image information between networks. The image and text vectors jointly represent the original data. A bidirectional LSTM then performs further learning, which enhances the information-interaction capability between the modalities. Finally, the multimodal feature representation is fed into the classifier. The method is verified on a self-built image-text dataset. The experimental results show that, compared with other sentiment classification models, this method achieves greater improvements in accuracy and F1 score and can effectively improve the performance of multimodal sentiment analysis.
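The pipeline described above might be sketched roughly as below: a character-level CNN, a word-level attention, a gate on the image vector, and a BiLSTM over the fused sequence. Every layer name, size, and the way the image is injected into the sequence are hypothetical guesses at the described design, not the authors' code.

```python
# Loose sketch of the FCLAG-style pipeline pieces (all names/sizes hypothetical).
import torch
import torch.nn as nn

class FCLAGSketch(nn.Module):
    def __init__(self, char_dim=50, word_dim=200, img_dim=200, hidden=128, classes=3):
        super().__init__()
        self.char_cnn = nn.Conv1d(char_dim, word_dim, kernel_size=3, padding=1)
        self.word_attn = nn.Linear(2 * word_dim, 1)          # keyword-weighting attention
        self.img_gate = nn.Linear(img_dim, img_dim)          # controls image information flow
        self.bilstm = nn.LSTM(2 * word_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, classes)

    def forward(self, chars, words, image):
        # chars: (batch, len, max_chars, char_dim); words: (batch, len, word_dim); image: (batch, img_dim)
        b, l, c, d = chars.shape
        char_feat = torch.relu(self.char_cnn(chars.view(b * l, c, d).transpose(1, 2)))
        char_feat = char_feat.max(dim=-1).values.view(b, l, -1)      # per-word character features
        tokens = torch.cat([words, char_feat], dim=-1)
        weights = torch.softmax(self.word_attn(tokens), dim=1)       # highlight keywords
        tokens = tokens * weights
        gated_img = torch.sigmoid(self.img_gate(image)) * image      # gated image vector
        seq = tokens + torch.cat([gated_img, gated_img], dim=-1).unsqueeze(1)  # inject image at each step
        states, _ = self.bilstm(seq)                                 # cross-modal interaction over time
        return self.out(states.mean(dim=1))

logits = FCLAGSketch()(torch.randn(2, 20, 8, 50), torch.randn(2, 20, 200), torch.randn(2, 200))
```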

