TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data

Author(s): Binghua Li, Chao Li, Feng Duan, Ning Zheng, Qibin Zhao

2020 · Vol 34 (05) · pp. 8992-8999

Author(s): Zhongkai Sun, Prathusha Sarma, William Sethares, Yingyu Liang

Multimodal language analysis often considers relationships between features based on text and those based on acoustic and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks, in part because the text features are derived from advanced language models or word embeddings trained on massive data sources, while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video describe the same utterance in different ways, we hypothesize that multimodal sentiment analysis and emotion recognition can be improved by learning (hidden) correlations between features extracted from the outer product of text and audio (which we call text-based audio) and the analogous text-based video. This paper proposes a novel model, the Interaction Canonical Correlation Network (ICCN), to learn such multimodal embeddings. ICCN learns correlations between all three modes via deep canonical correlation analysis (DCCA), and the proposed embeddings are then tested on several benchmark datasets and against other state-of-the-art multimodal embedding algorithms. Empirical results and ablation studies confirm the effectiveness of ICCN in capturing useful information from all three views.
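
As a rough illustration of the fusion idea described in this abstract, the sketch below builds a "text-based audio" and a "text-based video" view via outer products and pushes the two views toward high correlation. This is not the authors' released code: the feature dimensions, layer names, and the simplified per-dimension correlation loss are assumptions standing in for the full DCCA objective.

import torch
import torch.nn as nn

class OuterProductFusion(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35, out_dim=128):
        super().__init__()
        # Flattened outer products are projected to a common embedding size.
        self.ta_proj = nn.Linear(text_dim * audio_dim, out_dim)   # text-based audio
        self.tv_proj = nn.Linear(text_dim * video_dim, out_dim)   # text-based video

    def forward(self, text, audio, video):
        # text: (B, Dt), audio: (B, Da), video: (B, Dv)
        ta = torch.einsum('bi,bj->bij', text, audio).flatten(1)   # (B, Dt*Da)
        tv = torch.einsum('bi,bj->bij', text, video).flatten(1)   # (B, Dt*Dv)
        return self.ta_proj(ta), self.tv_proj(tv)

def correlation_loss(h1, h2, eps=1e-8):
    # Simplified stand-in for the DCCA objective: maximize the mean
    # per-dimension Pearson correlation between the two fused views.
    h1 = (h1 - h1.mean(0)) / (h1.std(0) + eps)
    h2 = (h2 - h2.mean(0)) / (h2.std(0) + eps)
    return -(h1 * h2).mean()

# Toy usage with random utterance-level features
fusion = OuterProductFusion()
text, audio, video = torch.randn(16, 300), torch.randn(16, 74), torch.randn(16, 35)
ta_emb, tv_emb = fusion(text, audio, video)
loss = correlation_loss(ta_emb, tv_emb)
loss.backward()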


Author(s): Roberto Yuri Da Silva Franco, Alexandre Abreu De Freitas, Rodrigo Santos Do Amor Divino Lima, Marcelle Pereira Mota, Carlos Gustavo Resque Dos Santos, ...

2018 · Vol 17 (03) · pp. 883-910

Author(s): P. D. Mahendhiran, S. Kannimuthu

Contemporary research in Multimodal Sentiment Analysis (MSA) using deep learning is becoming popular in Natural Language Processing. Enormous amounts of data are available every day from social media platforms such as Facebook, WhatsApp, YouTube, Twitter, and microblogs, and identifying the relevant information in these large multimodal streams is difficult. There is therefore a need for a more intelligent MSA. Here, deep learning is used to improve the understanding and performance of MSA: it provides automatic feature extraction and helps the combined model that integrates linguistic, acoustic, and video information achieve strong performance. This paper focuses on techniques for classifying a given portion of natural language text, audio, and video according to the thoughts, feelings, or opinions expressed in it, i.e., whether the overall attitude is Neutral, Positive, or Negative. The results show that the deep learning classifier outperforms other machine learning classifiers such as KNN, Naive Bayes, Random Forest, Random Tree, and a Neural Net model. The proposed deep learning MSA identifies sentiment in web videos; in preliminary proof-of-concept experiments on the ICT-YouTube dataset, the proposed multimodal system achieves an accuracy of 96.07%.
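
To make the combined linguistic-acoustic-video classification concrete, the sketch below shows one plausible deep multimodal classifier of the kind described above: each modality is encoded separately, the encodings are fused by concatenation, and a shared head predicts the three sentiment classes. The architecture, feature dimensions, and layer sizes are illustrative assumptions, not the paper's exact model.

import torch
import torch.nn as nn

class MultimodalSentimentClassifier(nn.Module):
    def __init__(self, text_dim=300, audio_dim=74, video_dim=35, hidden=128, n_classes=3):
        super().__init__()
        # One small encoder per modality (automatic feature learning).
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Late fusion by concatenation, then a shared classification head.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, text, audio, video):
        fused = torch.cat(
            [self.text_enc(text), self.audio_enc(audio), self.video_enc(video)], dim=1
        )
        return self.classifier(fused)   # logits over Negative / Neutral / Positive

# Toy training step on random features
model = MultimodalSentimentClassifier()
text, audio, video = torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35)
labels = torch.randint(0, 3, (8,))
logits = model(text, audio, video)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()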

