Know Yourself and Know Others: Efficient Common Representation Learning for Few-shot Cross-modal Retrieval

2021
Author(s):
Shaoying Wang
Hanjiang Lai
Zhenyu Shi


Author(s):
Xin Huang
Yuxin Peng
Mingkuan Yuan

DNN-based cross-modal retrieval, which retrieves across different modalities such as image and text, is a research hotspot, but existing methods often face the challenge of insufficient cross-modal training data. In the single-modal scenario, a similar problem is usually alleviated by transferring knowledge from large-scale auxiliary datasets (such as ImageNet). Knowledge from such single-modal datasets is also very useful for cross-modal retrieval, since it provides rich general semantic information that can be shared across different modalities. However, it is challenging to transfer useful knowledge from a single-modal (e.g., image) source domain to a cross-modal (e.g., image/text) target domain: knowledge in the source domain cannot be directly transferred to both modalities in the target domain, and the inherent cross-modal correlation contained in the target domain provides key hints for cross-modal retrieval that should be preserved during the transfer process. This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: a modal-sharing transfer subnetwork utilizes the modality shared by the source and target domains as a bridge to transfer knowledge to both modalities simultaneously, while a layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. CHTN converts cross-modal data into a common representation for retrieval, and comprehensive experiments on three datasets show its effectiveness.
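The two-subnetwork idea is easiest to see in code. Below is a minimal, hypothetical PyTorch sketch (the layer sizes, feature dimensions, and loss are illustrative assumptions, not the authors' implementation): modality-specific transfer layers play the modal-sharing role, and a single shared head plays the layer-sharing role.

```python
# Hypothetical sketch of the CHTN structure (not the authors' code):
# per-modality transfer layers feed one shared head, so image and text
# land in the same common representation space.
import torch
import torch.nn as nn

class CHTNSketch(nn.Module):
    def __init__(self, img_dim=4096, txt_dim=300, common_dim=256):
        super().__init__()
        # Modal-sharing transfer: the image pathway can reuse weights
        # from a source-domain (e.g., ImageNet) network.
        self.img_transfer = nn.Sequential(nn.Linear(img_dim, 1024), nn.ReLU())
        self.txt_transfer = nn.Sequential(nn.Linear(txt_dim, 1024), nn.ReLU())
        # Layer-sharing correlation: one head shared by both modalities.
        self.shared_head = nn.Linear(1024, common_dim)

    def forward(self, img_feat, txt_feat):
        img_common = self.shared_head(self.img_transfer(img_feat))
        txt_common = self.shared_head(self.txt_transfer(txt_feat))
        return img_common, txt_common

model = CHTNSketch()
img = torch.randn(8, 4096)   # e.g., CNN features for 8 images
txt = torch.randn(8, 300)    # e.g., text embeddings for 8 captions
img_c, txt_c = model(img, txt)
# Illustrative correlation-preserving loss: pull matched pairs together.
loss = (img_c - txt_c).pow(2).sum(dim=1).mean()
loss.backward()
```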


Author(s):  
Yao Xu ◽  
Xueshuang Xiang ◽  
Meiyu Huang

This paper introduces a novel deep-learning-based method, named bridge neural network (BNN), to uncover the potential relationship between two given data sources task by task. The proposed approach employs two convolutional neural networks that project the two data sources into a feature space to learn the desired common representation required by the specific task. A training objective with artificial negative samples is introduced; it supports mini-batch training and is asymptotically equivalent to maximizing the total correlation of the two data sources, as verified by theoretical analysis. Experiments on tasks including pair matching, canonical correlation analysis, transfer learning, and reconstruction demonstrate the state-of-the-art performance of BNN, which may provide new insights into common representation learning.
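A hedged sketch of this objective, as we read the abstract (the encoders are simplified to MLPs rather than CNNs, the dot-product pair score is our choice, and all sizes are placeholders): real pairs are scored toward 1, and artificial negatives, formed by permuting one side of the mini-batch, are scored toward 0.

```python
# Sketch of a BNN-style objective with artificial negative samples
# (our reading of the abstract, not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

enc_a = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 64))
enc_b = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))

def bnn_loss(x_a, x_b):
    h_a, h_b = enc_a(x_a), enc_b(x_b)
    # Match score: probability that a pair (a, b) is truly related.
    pos = torch.sigmoid((h_a * h_b).sum(dim=1))
    # Artificial negatives: permute one side so pairs no longer match.
    perm = torch.randperm(x_b.size(0))
    neg = torch.sigmoid((h_a * h_b[perm]).sum(dim=1))
    # Binary cross-entropy: positives toward 1, negatives toward 0.
    return F.binary_cross_entropy(pos, torch.ones_like(pos)) + \
           F.binary_cross_entropy(neg, torch.zeros_like(neg))

x_a = torch.randn(32, 784)  # mini-batch from source A
x_b = torch.randn(32, 512)  # aligned mini-batch from source B
loss = bnn_loss(x_a, x_b)
loss.backward()
```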


2016
Vol 28 (2)
pp. 257-285
Author(s):
Sarath Chandar
Mitesh M. Khapra
Hugo Larochelle
Balaraman Ravindran

Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)–based approaches and autoencoder (AE)–based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
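The CorrNet objective combines cross-view reconstruction with an explicit correlation term, as the abstract describes. A compact sketch under assumed sizes (the sigmoid encoders, λ, and the dimensions are placeholders; only the shape of the loss follows the paper's description):

```python
# Sketch of a CorrNet-style loss: reconstruct both views from each
# encoding, minus a correlation term between the two projections.
import torch
import torch.nn as nn

d1, d2, k = 100, 80, 32                      # view sizes, common-space size
W1, W2 = nn.Linear(d1, k), nn.Linear(d2, k)  # per-view encoders
V1, V2 = nn.Linear(k, d1), nn.Linear(k, d2)  # per-view decoders

def corr(h1, h2, eps=1e-8):
    # Sample correlation between the two projections, summed over dims.
    h1 = h1 - h1.mean(dim=0)
    h2 = h2 - h2.mean(dim=0)
    num = (h1 * h2).sum(dim=0)
    den = torch.sqrt((h1 ** 2).sum(dim=0) * (h2 ** 2).sum(dim=0)) + eps
    return (num / den).sum()

def corrnet_loss(x, y, lam=0.1):
    hx, hy = torch.sigmoid(W1(x)), torch.sigmoid(W2(y))
    hxy = torch.sigmoid(W1(x) + W2(y))        # joint encoding of both views
    recon = 0.0
    for h in (hx, hy, hxy):                   # reconstruct both views from each
        recon = recon + ((V1(h) - x) ** 2).mean() + ((V2(h) - y) ** 2).mean()
    return recon - lam * corr(hx, hy)         # maximize correlation

x, y = torch.randn(64, d1), torch.randn(64, d2)
loss = corrnet_loss(x, y)
loss.backward()
```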


2020
Vol 34 (04)
pp. 4320-4327
Author(s):
Songlei Jian
Liang Hu
Longbing Cao
Kai Lu

Cross-domain representation learning plays an important role in tasks including domain adaptation and transfer learning. However, existing cross-domain representation learning methods focus on building one shared space and ignore the unlabeled data in the source domain, so they cannot effectively capture the distribution and structure heterogeneities in cross-domain data. To address this challenge, we propose a new cross-domain representation learning approach, MUltiple Lipschitz-constrained AligNments (MULAN), for partially labeled cross-domain data. MULAN produces two representation spaces: a common representation space that incorporates knowledge from the source domain, and a complementary representation space that complements the common representation with target local topological information via Lipschitz-constrained representation transformation. MULAN utilizes both unlabeled and labeled data in the source and target domains, addressing distribution heterogeneity by Lipschitz-constrained adversarial distribution alignment and structure heterogeneity by cluster-assumption-based class alignment, while keeping the target local topological information in the complementary representation by self-alignment. Moreover, MULAN is equipped with a customized learning process and an iterative parameter-updating process. MULAN shows superior performance on partially labeled semi-supervised domain adaptation and few-shot domain adaptation, outperforming the state-of-the-art visual domain adaptation models by up to 12.1%.
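As one concrete illustration of a Lipschitz-constrained adversarial alignment: the sketch below uses spectral normalization on a domain discriminator, which is a standard way to enforce a Lipschitz constraint; whether MULAN uses exactly this mechanism is our assumption, not a claim about the paper.

```python
# Illustrative Lipschitz-constrained adversarial distribution alignment
# (spectral normalization keeps each discriminator layer 1-Lipschitz).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

disc = nn.Sequential(
    spectral_norm(nn.Linear(64, 32)), nn.ReLU(),
    spectral_norm(nn.Linear(32, 1)),
)

def alignment_loss(h_src, h_tgt):
    # Discriminator learns to tell source from target representations;
    # the encoder (not shown) is trained adversarially to fool it,
    # which aligns the two distributions in the common space.
    logits = torch.cat([disc(h_src), disc(h_tgt)]).squeeze(1)
    labels = torch.cat([torch.ones(h_src.size(0)), torch.zeros(h_tgt.size(0))])
    return F.binary_cross_entropy_with_logits(logits, labels)

h_src = torch.randn(32, 64)   # common-space features, source domain
h_tgt = torch.randn(32, 64)   # common-space features, target domain
loss = alignment_loss(h_src, h_tgt)
loss.backward()
```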

