Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis

Multimodal language analysis often considers relationships between features based on text and those based on acoustical and visual properties. Text features typically outperform non-text features in sentiment analysis or emotion recognition tasks in part because the text features are derived from advanced language models or word embeddings trained on massive data sources while audio and video features are human-engineered and comparatively underdeveloped. Given that the text, audio, and video are describing the same utterance in different ways, we hypothesize that the multimodal sentiment analysis and emotion recognition can be improved by learning (hidden) correlations between features extracted from the outer product of text and audio (we call this text-based audio) and analogous text-based video. This paper proposes a novel model, the Interaction Canonical Correlation Network (ICCN), to learn such multimodal embeddings. ICCN learns correlations between all three modes via deep canonical correlation analysis (DCCA) and the proposed embeddings are then tested on several benchmark datasets and against other state-of-the-art multimodal embedding algorithms. Empirical results and ablation studies confirm the effectiveness of ICCN in capturing useful information from all three views.

Download Full-text

TPFN: Applying Outer Product Along Time to Multimodal Sentiment Analysis Fusion on Incomplete Data

Computer Vision – ECCV 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-58586-0_26 ◽

2020 ◽

pp. 431-447

Author(s):

Binghua Li ◽

Chao Li ◽

Feng Duan ◽

Ning Zheng ◽

Qibin Zhao

Keyword(s):

Sentiment Analysis ◽

Incomplete Data ◽

Outer Product ◽

Multimodal Sentiment Analysis

Download Full-text

Multimodal Embeddings from Language Models for Emotion Recognition in the Wild

IEEE Signal Processing Letters ◽

10.1109/lsp.2021.3065598 ◽

2021 ◽

pp. 1-1

Author(s):

Shao-Yen Tseng ◽

Shrikanth Narayanan ◽

Panayiotis Georgiou

Keyword(s):

Emotion Recognition ◽

Language Models ◽

In The Wild

Download Full-text

Towards corpus and model: Hierarchical structured-attention-based features for Indonesian named entity recognition

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202286 ◽

2021 ◽

pp. 1-12

Author(s):

Yingwen Fu ◽

Nankai Lin ◽

Xiaotian Lin ◽

Shengyi Jiang

Keyword(s):

Language Processing ◽

State Of The Art ◽

Named Entity Recognition ◽

Entity Recognition ◽

Language Models ◽

Neural Models ◽

Performance Models ◽

Named Entity ◽

High Resource ◽

Benchmark Datasets

Named entity recognition (NER) is fundamental to natural language processing (NLP). Most state-of-the-art researches on NER are based on pre-trained language models (PLMs) or classic neural models. However, these researches are mainly oriented to high-resource languages such as English. While for Indonesian, related resources (both in dataset and technology) are not yet well-developed. Besides, affix is an important word composition for Indonesian language, indicating the essentiality of character and token features for token-wise Indonesian NLP tasks. However, features extracted by currently top-performance models are insufficient. Aiming at Indonesian NER task, in this paper, we build an Indonesian NER dataset (IDNER) comprising over 50 thousand sentences (over 670 thousand tokens) to alleviate the shortage of labeled resources in Indonesian. Furthermore, we construct a hierarchical structured-attention-based model (HSA) for Indonesian NER to extract sequence features from different perspectives. Specifically, we use an enhanced convolutional structure as well as an enhanced attention structure to extract deeper features from characters and tokens. Experimental results show that HSA establishes competitive performance on IDNER and three benchmark datasets.

Download Full-text

Adaptive Modality Distillation for Separable Multimodal Sentiment Analysis

IEEE Intelligent Systems ◽

10.1109/mis.2021.3057757 ◽

2021 ◽

pp. 1-1

Author(s):

Wei Peng ◽

Xiaopeng Hong ◽

Guoying Zhao

Keyword(s):

Sentiment Analysis ◽

Multimodal Sentiment Analysis

Download Full-text

Extractive Summarization Based on Dynamic Memory Network

Symmetry ◽

10.3390/sym13040600 ◽

2021 ◽

Vol 13 (4) ◽

pp. 600

Author(s):

Ping Li ◽

Jiong Yu

Keyword(s):

Dynamic Memory ◽

Extractive Summarization ◽

Model Based ◽

Network Method ◽

Comparable Performance ◽

Benchmark Datasets ◽

Memory Network ◽

Text Features

We present an extractive summarization model based on the Bert and dynamic memory network. The model based on Bert uses the transformer to extract text features and uses the pre-trained model to construct the sentence embeddings. The model based on Bert labels the sentences automatically without using any hand-crafted features and the datasets are symmetry labeled. We also present a dynamic memory network method for extractive summarization. Experiments are conducted on several summarization benchmark datasets. Our model shows comparable performance compared with other extractive summarization methods.

Download Full-text

Deep Learning approach for text, image, and GIF multimodal sentiment analysis

2020 10th International Conference on Computer and Knowledge Engineering (ICCKE) ◽

10.1109/iccke50421.2020.9303676 ◽

2020 ◽

Author(s):

Amirhossein Shirzad ◽

Hadi Zare ◽

Mehdi Teimouri

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Learning Approach ◽

Multimodal Sentiment Analysis

Download Full-text

Astrid

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436907 ◽

2020 ◽

Vol 14 (4) ◽

pp. 471-484

Author(s):

Suraj Shetiya ◽

Saravanan Thirumuruganathan ◽

Nick Koudas ◽

Gautam Das

Keyword(s):

Deep Learning ◽

Objective Function ◽

Pattern Matching ◽

Language Processing ◽

Language Model ◽

Language Models ◽

Selectivity Estimation ◽

Statistical Correlations ◽

Benchmark Datasets ◽

Traditional Approaches

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing such as embeddings to encode the strings and use it to train a model. While this is an improvement over traditional approaches, there is a large scope for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it could be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.

Download Full-text