Task-Driven Common Representation Learning via Bridge Neural Network

Author(s):  
Yao Xu ◽  
Xueshuang Xiang ◽  
Meiyu Huang

This paper introduces a novel deep learning based method, named the bridge neural network (BNN), to uncover the potential relationship between two given data sources, task by task. The proposed approach employs two convolutional neural networks that project the two data sources into a feature space, learning the common representation required by the specific task. A training objective with artificial negative samples is introduced; it supports mini-batch training and is asymptotically equivalent to maximizing the total correlation of the two data sources, as verified by theoretical analysis. Experiments on tasks including pair matching, canonical correlation analysis, transfer learning, and reconstruction demonstrate the state-of-the-art performance of BNN, which may provide new insights into common representation learning.
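
As a rough illustration of the pair-matching objective described above, the following PyTorch sketch trains two encoders so that aligned pairs map close together in the common space while artificial negatives (shuffled pairings) map apart. The encoder architectures and the exact form of the loss are assumptions, not the paper's precise formulation.

```python
import torch
import torch.nn as nn

class BridgeNet(nn.Module):
    """Two encoders project heterogeneous data sources into a common space."""
    def __init__(self, enc_a: nn.Module, enc_b: nn.Module):
        super().__init__()
        self.enc_a, self.enc_b = enc_a, enc_b

    def forward(self, xa, xb):
        za, zb = self.enc_a(xa), self.enc_b(xb)
        # Mean squared distance in the common space: near 0 for matched pairs.
        return 0.5 * (za - zb).pow(2).mean(dim=1)

def bnn_loss(model, xa, xb):
    # Positive pairs arrive aligned; negatives are artificial, made by shuffling.
    d_pos = model(xa, xb)
    d_neg = model(xa, xb[torch.randperm(xb.size(0))])
    return d_pos.pow(2).mean() + (1.0 - d_neg).pow(2).mean()
```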

2016 ◽  
Vol 28 (2) ◽  
pp. 257-285 ◽  
Author(s):  
Sarath Chandar ◽  
Mitesh M. Khapra ◽  
Hugo Larochelle ◽  
Balaraman Ravindran

Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)–based approaches and autoencoder (AE)–based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
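
The correlation term that distinguishes CorrNet from a plain autoencoder can be sketched as follows (PyTorch assumed); in the full model this term is subtracted from the sum of reconstruction losses, and CorrNet additionally reconstructs each view from itself, from the other view, and from both.

```python
import torch

def correlation_loss(hx: torch.Tensor, hy: torch.Tensor, eps: float = 1e-8):
    """Negative per-dimension Pearson correlation between the two projected
    views (shape: batch x dim); minimizing this maximizes their correlation."""
    hx = hx - hx.mean(dim=0, keepdim=True)
    hy = hy - hy.mean(dim=0, keepdim=True)
    num = (hx * hy).sum(dim=0)
    den = torch.sqrt((hx * hx).sum(dim=0) * (hy * hy).sum(dim=0)) + eps
    return -(num / den).sum()
```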


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Meiyu Huang ◽  
Yao Xu ◽  
Lixin Qian ◽  
Weili Shi ◽  
Yaqin Zhang ◽  
...  

Current interpretation technology for remote sensing images focuses mainly on single-modal data and therefore cannot fully exploit the complementary, correlated information of multimodal data with heterogeneous characteristics, especially synthetic aperture radar (SAR) data and optical imagery. To solve this problem, we propose a bridge neural network (BNN)-based optical-SAR image joint intelligent interpretation framework that optimizes the feature correlation between optical and SAR images through optical-SAR matching tasks. It adopts BNN to improve the extraction of features common to optical and SAR images, thus improving the accuracy and broadening the application scenarios of specific intelligent interpretation tasks for optical-SAR/SAR/optical images. Specifically, BNN projects optical and SAR images into a common feature space and mines their correlation through pair matching. Further, to deeply exploit the correlation between optical and SAR images and ensure the representation learning ability of BNN, we build the QXS-SAROPT dataset, containing 20,000 pairs of perfectly aligned, high-resolution optical-SAR image patches with diverse scenes. Experimental results on optical-to-SAR cross-modal object detection demonstrate the effectiveness and superiority of our framework. In particular, based on the QXS-SAROPT dataset, our framework achieves up to 96% accuracy on four benchmark SAR ship detection datasets.
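
Given aligned optical-SAR patch pairs, training reduces to the pair-matching objective sketched under the BNN abstract above. A hypothetical loop follows; the loader and the two encoder classes are placeholders, not the released QXS-SAROPT tooling.

```python
# `loader` is assumed to yield aligned (optical, sar) patch tensors.
model = BridgeNet(enc_a=OpticalCNN(), enc_b=SarCNN())  # hypothetical encoders
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for optical, sar in loader:
    loss = bnn_loss(model, optical, sar)
    opt.zero_grad()
    loss.backward()
    opt.step()
```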


Sensors ◽  
2020 ◽  
Vol 20 (11) ◽  
pp. 3305 ◽  
Author(s):  
Huogen Wang ◽  
Zhanjie Song ◽  
Wanqing Li ◽  
Pichao Wang

The paper presents a novel hybrid network for large-scale action recognition from multiple modalities. The network is built upon the proposed weighted dynamic images. It effectively leverages the strengths of the emerging Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based approaches to specifically address the challenges that occur in large-scale action recognition and are not fully dealt with by state-of-the-art methods. Specifically, the proposed hybrid network consists of a CNN-based component and an RNN-based component. Features extracted by the two components are fused through canonical correlation analysis and then fed to a linear Support Vector Machine (SVM) for classification. The proposed network achieved state-of-the-art results on the ChaLearn LAP IsoGD, NTU RGB+D and Multi-modal & Multi-view & Interactive (M²I) datasets and outperformed existing methods by a large margin (over 10 percentage points in some cases).
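
A minimal sketch of the fusion stage, assuming scikit-learn and pre-extracted per-stream features; the component count and the choice to concatenate both projections are assumptions rather than the paper's exact recipe.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import LinearSVC

# feats_cnn, feats_rnn: (n_samples, d1) and (n_samples, d2) stream features.
cca = CCA(n_components=64)                 # component count is an assumption
xc, yc = cca.fit_transform(feats_cnn, feats_rnn)
fused = np.concatenate([xc, yc], axis=1)   # correlated projections, fused
clf = LinearSVC().fit(fused, labels)       # linear SVM classifier, as described
```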


2020 ◽  
Author(s):  
Chong Wu ◽  
Zhenan Feng ◽  
Jiangbin Zheng ◽  
Houwang Zhang ◽  
Jiawang Cao ◽  
...  

We present a novel graph convolutional method called star topology convolution (STC). This method makes graph convolution more similar to conventional convolutional neural networks (CNNs) in Euclidean feature space. Unlike most existing spectral convolutional methods, this method learns on subgraphs that have a star topology rather than on a fixed graph. It has fewer parameters in its convolutional filter and is inductive, so it is more flexible and can be applied to large and evolving graphs. As with CNNs in Euclidean feature space, the convolutional filter is localized and maintains a good weight-sharing property. By introducing deep layers, the method can learn global features like a CNN. To validate the method, STC was compared to state-of-the-art spectral and spatial convolutional methods in a supervised learning setting on three benchmark datasets: Cora, Citeseer and Pubmed. The experimental results show that STC outperforms the other methods. STC was also applied to protein identification tasks and outperformed traditional and advanced protein identification methods.
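
A simplified spatial reading of a star-topology layer is sketched below (PyTorch assumed): each node and its sampled neighbors form a star subgraph, and a shared, localized filter aggregates them. The paper's actual filter is spectral, so this is an approximation of the idea rather than the method itself.

```python
import torch
import torch.nn as nn

class StarConv(nn.Module):
    """Shared localized filter over each node's star subgraph (center + neighbors)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.w_center = nn.Linear(in_dim, out_dim)
        self.w_neighbor = nn.Linear(in_dim, out_dim)

    def forward(self, x, neighbors):
        # x: (num_nodes, in_dim); neighbors: (num_nodes, k) sampled indices.
        agg = x[neighbors].mean(dim=1)      # aggregate each node's star
        return torch.relu(self.w_center(x) + self.w_neighbor(agg))
```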


2021 ◽  
Author(s):  
Muhammad Ghifary

Machine learning has achieved great successes in the area of computer vision, especially in object recognition and classification. One of the core factors behind these successes is the availability of massive labeled image or video data for training, collected manually by humans. Labeling source training data, however, can be expensive and time consuming. Furthermore, a large amount of labeled source data does not always guarantee that traditional machine learning techniques will generalize well; there may be a bias or mismatch in the data, i.e., the training data do not represent the target environment.

To mitigate this dataset bias/mismatch, one can consider domain adaptation: utilizing labeled training data and unlabeled target data to develop a well-performing classifier on the target environment. In some cases, however, the unlabeled target data are nonexistent but multiple labeled sources of data exist. Such situations can be addressed by domain generalization: using multiple source training sets to produce a classifier that generalizes to an unseen target domain. Although several domain adaptation and generalization approaches have been proposed, the domain mismatch in object recognition remains a challenging, open problem: model performance has not yet reached a satisfactory level in real-world applications.

The overall goal of this thesis is to progress towards solving dataset bias in visual object recognition through representation learning, in the context of domain adaptation and domain generalization. Representation learning is concerned with finding proper data representations or features via learning rather than via engineering by human experts. This thesis proposes several representation learning solutions based on deep learning and kernel methods.

This thesis introduces a robust-to-noise deep neural network for handwritten digit classification trained on “clean” images only, which we name the Deep Hybrid Network (DHN). DHNs are based on a particular combination of sparse autoencoders and restricted Boltzmann machines. The results show that DHN performs better than a standard deep neural network in recognizing digits corrupted with Gaussian and impulse noise and with block and border occlusions.

This thesis proposes the Domain Adaptive Neural Network (DaNN), a neural-network-based domain adaptation algorithm that minimizes both the classification error and the domain discrepancy between the source and target data representations. The experiments show the competitiveness of DaNN against several state-of-the-art methods on a benchmark object dataset.

This thesis develops the Multi-task Autoencoder (MTAE), a domain generalization algorithm based on autoencoders trained via multi-task learning. MTAE learns to transform the original image into its analogs in multiple related domains simultaneously. The results show that MTAE's representations provide better classification performance than alternative autoencoder-based models as well as the current state-of-the-art domain generalization algorithms.

This thesis proposes a fast kernel-based representation learning algorithm for both domain adaptation and domain generalization, Scatter Component Analysis (SCA). SCA finds a data representation that trades off between maximizing the separability of classes, minimizing the mismatch between domains, and maximizing the separability of the whole data. The results show that SCA performs much faster than competitive algorithms while providing state-of-the-art accuracy in both domain adaptation and domain generalization.

Finally, this thesis presents the Deep Reconstruction-Classification Network (DRCN), a deep convolutional network for domain adaptation. DRCN learns to classify labeled source data and also to reconstruct unlabeled target data via a shared encoding representation. The results show that DRCN provides competitive or better performance than prior state-of-the-art models on several cross-domain object datasets.
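
Among the thesis contributions, the DaNN objective lends itself to a compact sketch: classification loss plus a maximum mean discrepancy (MMD) penalty between source and target representations. The linear-kernel MMD and the weighting below are assumptions; the thesis may use a different kernel and schedule.

```python
import torch
import torch.nn.functional as F

def mmd_linear(src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    """Linear-kernel MMD: squared distance between feature means."""
    delta = src.mean(dim=0) - tgt.mean(dim=0)
    return delta.dot(delta)

def dann_objective(logits, labels, feat_src, feat_tgt, gamma=0.25):
    # Classification error plus domain discrepancy, per the DaNN description.
    return F.cross_entropy(logits, labels) + gamma * mmd_linear(feat_src, feat_tgt)
```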


2021 ◽  
Vol 11 (12) ◽  
pp. 5409
Author(s):  
Julián Gil-González ◽  
Andrés Valencia-Duque ◽  
Andrés Álvarez-Meza ◽  
Álvaro Orozco-Gutiérrez ◽  
Andrea García-Moreno

The increasing popularity of crowdsourcing platforms, e.g., Amazon Mechanical Turk, changes how datasets for supervised learning are built. In these cases, instead of having datasets labeled by one source (supposed to be an expert who provides the absolute gold standard), databases holding labels from multiple annotators are provided. However, most state-of-the-art methods devoted to learning from multiple experts assume that the labelers' behavior is homogeneous across the input feature space. Besides, independence constraints are imposed on the annotators' outputs. This paper presents a regularized chained deep neural network to deal with classification tasks from multiple annotators. The introduced method, termed RCDNN, jointly predicts the ground truth label and the annotators' performance from input space samples. In turn, RCDNN codes interdependencies among the experts by analyzing the layers' weights and includes ℓ1, ℓ2, and Monte Carlo dropout-based regularizers to deal with overfitting in deep learning models. The obtained results (using both simulated and real-world annotators) demonstrate that RCDNN can handle multi-labeler scenarios for classification tasks, outperforming state-of-the-art techniques.
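
A minimal sketch of the chained idea, assuming PyTorch: one backbone feeds two heads, one for the latent true label and one for per-annotator reliabilities, with dropout and weight decay standing in for the Monte Carlo dropout and ℓ2 regularizers. The architecture and how reliabilities enter the loss are assumptions.

```python
import torch
import torch.nn as nn

class ChainedAnnotatorNet(nn.Module):
    """Jointly predicts the latent label and each annotator's reliability."""
    def __init__(self, in_dim, n_classes, n_annotators, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p=0.5))
        self.label_head = nn.Linear(hidden, n_classes)
        self.reliability_head = nn.Linear(hidden, n_annotators)

    def forward(self, x):
        h = self.backbone(x)
        # Reliabilities in (0, 1) that vary with x (non-homogeneous labelers).
        return self.label_head(h), torch.sigmoid(self.reliability_head(h))

model = ChainedAnnotatorNet(in_dim=20, n_classes=3, n_annotators=5)
# weight_decay supplies the l2 penalty; an l1 term can be added to the loss.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```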


Author(s):  
Guoxian Dai ◽  
Jin Xie ◽  
Yi Fang

Learning a 3D shape representation from a collection of its rendered 2D images has been extensively studied. However, existing view-based techniques have not yet fully exploited the information shared among all the projected views. In this paper, by employing a recurrent neural network to efficiently capture features across different views, we propose a siamese CNN-BiLSTM network for 3D shape representation learning. The proposed method minimizes a discriminative loss function to learn a deep nonlinear transformation that maps 3D shapes from the original space into a nonlinear feature space. In the transformed space, the distance between 3D shapes with the same label is minimized; otherwise the distance is maximized to at least a large margin. Specifically, the 3D shapes are first projected into a group of 2D images from different views. A convolutional neural network (CNN) is then adopted to extract features from the different view images, followed by a bidirectional long short-term memory network (BiLSTM) that aggregates information across the views. Finally, we arrange the whole CNN-BiLSTM network in a siamese structure with a contrastive loss function. Our proposed method is evaluated on two benchmarks, ModelNet40 and SHREC 2014, demonstrating superiority over state-of-the-art methods.
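
The contrastive loss named above has a standard form, sketched here in PyTorch (the margin value is an assumption): embeddings of same-label shapes are pulled together, while different-label pairs are pushed apart to at least the margin.

```python
import torch

def contrastive_loss(za, zb, same_label, margin: float = 1.0):
    """za, zb: (batch, dim) shape embeddings; same_label: (batch,) in {0, 1}."""
    d = torch.norm(za - zb, dim=1)
    pos = same_label * d.pow(2)                                     # pull together
    neg = (1 - same_label) * torch.clamp(margin - d, min=0).pow(2)  # push apart
    return (pos + neg).mean()
```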


2020 ◽  
Vol 34 (01) ◽  
pp. 164-172
Author(s):  
Sijie Mai ◽  
Haifeng Hu ◽  
Songlong Xing

Learning a joint embedding space for various modalities is of vital importance for multimodal fusion. Mainstream modality fusion approaches fail to achieve this goal, leaving a modality gap that heavily affects cross-modal fusion. In this paper, we propose a novel adversarial encoder-decoder-classifier framework to learn a modality-invariant embedding space. Since the distributions of the various modalities differ in nature, to reduce the modality gap we translate the distributions of the source modalities into that of the target modality via their respective encoders, using adversarial training. Furthermore, we exert additional constraints on the embedding space by introducing a reconstruction loss and a classification loss. We then fuse the encoded representations using a hierarchical graph neural network that explicitly explores unimodal, bimodal and trimodal interactions in multiple stages. Our method achieves state-of-the-art performance on multiple datasets. Visualization of the learned embeddings suggests that the joint embedding space learned by our method is discriminative.
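
The adversarial translation step can be sketched with a standard non-saturating GAN loss (PyTorch assumed): a discriminator learns to tell target-modality encodings from translated source encodings, while each source encoder learns to fool it. The exact GAN formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def generator_step(enc_src, disc, x_src):
    # Push source encodings toward the target modality's distribution.
    z = enc_src(x_src)
    return F.binary_cross_entropy_with_logits(
        disc(z), torch.ones(z.size(0), 1))

def discriminator_step(enc_src, enc_tgt, disc, x_src, x_tgt):
    # Real: target-modality encodings; fake: translated source encodings.
    z_fake = enc_src(x_src).detach()
    z_real = enc_tgt(x_tgt).detach()
    real = F.binary_cross_entropy_with_logits(
        disc(z_real), torch.ones(z_real.size(0), 1))
    fake = F.binary_cross_entropy_with_logits(
        disc(z_fake), torch.zeros(z_fake.size(0), 1))
    return real + fake
```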


Author(s):  
Zhesong Yu ◽  
Xiaoshuo Xu ◽  
Xiaoou Chen ◽  
Deshun Yang

Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment, and further breakthroughs are hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning toward this task. We show that they can be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is utilized to extract information at different scales and to transform songs of different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that, combined with these techniques, our approach is robust against the musical variations found in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
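
Temporal Pyramid Pooling can be sketched as max-pooling a variable-length feature map at several temporal resolutions and concatenating the results into one fixed-size vector (PyTorch assumed; the pyramid levels are an assumption):

```python
import torch
import torch.nn.functional as F

def temporal_pyramid_pool(x: torch.Tensor, levels=(1, 2, 4, 8)) -> torch.Tensor:
    """x: (batch, channels, time) with arbitrary time length; returns a fixed
    (batch, channels * sum(levels)) vector independent of song duration."""
    pooled = [F.adaptive_max_pool1d(x, k).flatten(start_dim=1) for k in levels]
    return torch.cat(pooled, dim=1)
```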

