Correlational Neural Networks

2016 ◽  
Vol 28 (2) ◽  
pp. 257-285 ◽  
Author(s):  
Sarath Chandar ◽  
Mitesh M. Khapra ◽  
Hugo Larochelle ◽  
Balaraman Ravindran

Common representation learning (CRL), wherein different descriptions (or views) of the data are embedded in a common subspace, has been receiving a lot of attention recently. Two popular paradigms here are canonical correlation analysis (CCA)–based approaches and autoencoder (AE)–based approaches. CCA-based approaches learn a joint representation by maximizing correlation of the views when projected to the common subspace. AE-based methods learn a common representation by minimizing the error of reconstructing the two views. Each of these approaches has its own advantages and disadvantages. For example, while CCA-based approaches outperform AE-based approaches for the task of transfer learning, they are not as scalable as the latter. In this work, we propose an AE-based approach, correlational neural network (CorrNet), that explicitly maximizes correlation among the views when projected to the common subspace. Through a series of experiments, we demonstrate that the proposed CorrNet is better than AE and CCA with respect to its ability to learn correlated common representations. We employ CorrNet for several cross-language tasks and show that the representations learned using it perform better than the ones learned using other state-of-the-art approaches.
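
To make the CorrNet objective concrete, here is a minimal sketch in PyTorch (not the authors' code): two view-specific encoders share a common code, both views are reconstructed from each code, and the empirical correlation of the two projections is subtracted from the reconstruction loss. Layer shapes, the sigmoid activation, and the trade-off weight `lam` are illustrative assumptions.

```python
# A minimal CorrNet-style sketch, assuming linear encoders/decoders.
import torch
import torch.nn as nn

class CorrNet(nn.Module):
    def __init__(self, dx, dy, k):
        super().__init__()
        self.enc_x = nn.Linear(dx, k)   # view-specific encoders
        self.enc_y = nn.Linear(dy, k)
        self.dec_x = nn.Linear(k, dx)   # decoders reconstruct both views
        self.dec_y = nn.Linear(k, dy)

    def encode(self, x=None, y=None):
        h = 0.0
        if x is not None:
            h = h + self.enc_x(x)
        if y is not None:
            h = h + self.enc_y(y)
        return torch.sigmoid(h)

def correlation(hx, hy, eps=1e-8):
    # Empirical correlation of the two projected views, summed over dims.
    hx = hx - hx.mean(0)
    hy = hy - hy.mean(0)
    num = (hx * hy).sum(0)
    den = torch.sqrt((hx ** 2).sum(0) * (hy ** 2).sum(0)) + eps
    return (num / den).sum()

def corrnet_loss(model, x, y, lam=1.0):
    mse = nn.functional.mse_loss
    h, hx, hy = model.encode(x, y), model.encode(x=x), model.encode(y=y)
    rec = 0.0
    for code in (h, hx, hy):  # reconstruct both views from each code
        rec = rec + mse(model.dec_x(code), x) + mse(model.dec_y(code), y)
    return rec - lam * correlation(hx, hy)  # maximize correlation
```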

Author(s):  
Yao Xu ◽  
Xueshuang Xiang ◽  
Meiyu Huang

This paper introduces a novel deep-learning-based method, named bridge neural network (BNN), to uncover the potential relationship between two given data sources on a task-by-task basis. The proposed approach employs two convolutional neural networks that project the two data sources into a feature space to learn the common representation required by the specific task. The training objective, which introduces artificial negative samples and supports mini-batch training, is asymptotically equivalent to maximizing the total correlation of the two data sources, as verified by theoretical analysis. Experiments on tasks including pair matching, canonical correlation analysis, transfer learning, and reconstruction demonstrate the state-of-the-art performance of BNN, which may provide new insights into common representation learning.
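
A hedged sketch of the pairing objective this abstract describes: two small convolutional encoders map the sources into one feature space, matched pairs are scored high, and artificial negatives are formed inside each mini-batch by shuffling one side. The encoder architecture and the score function are assumptions, not the paper's exact design.

```python
# Sketch of a BNN-style objective with in-batch artificial negatives.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_ch, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, dim))

    def forward(self, x):
        return self.net(x)

def bnn_loss(fa, fb, eps=1e-8):
    # fa, fb: (batch, dim) embeddings of aligned samples from each source.
    def score(a, b):  # high when the two embeddings are close
        return torch.sigmoid(-((a - b) ** 2).sum(dim=1))
    pos = score(fa, fb)                               # matched pairs
    neg = score(fa, fb[torch.randperm(fb.size(0))])   # shuffled negatives
    return -(torch.log(pos + eps) + torch.log(1 - neg + eps)).mean()
```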


Author(s):  
Yuqiao Yang ◽  
Xiaoqiang Lin ◽  
Geng Lin ◽  
Zengfeng Huang ◽  
Changjian Jiang ◽  
...  

In this paper, we explore learning representations of legislation and legislators for the prediction of roll-call results. The most popular approach to this problem, the ideal point model, relies on historical voting information to learn legislator representations and largely ignores the context of the legislative data. We therefore propose to incorporate context information to learn dense representations for both legislators and legislation. For legislators, we incorporate the relations among them via graph convolutional neural networks (GCNs) for their representation learning. For legislation, we utilize its narrative description via recurrent neural networks (RNNs) for representation learning. To align the two kinds of representations in the same vector space, we introduce a triplet loss for joint training. Experimental results on a self-constructed dataset show the effectiveness of our model for roll-call result prediction compared to several state-of-the-art baselines.
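
The alignment step can be sketched as follows (a rough PyTorch illustration, not the authors' implementation): a stand-in GCN layer propagates legislator features over the relation graph, a GRU encodes a bill's narrative description, and a triplet loss pulls a bill toward legislators who supported it and away from those who opposed it. All dimensions and the margin are assumed.

```python
# Sketch: GCN for legislators, GRU for bills, triplet loss to align them.
import torch
import torch.nn as nn

emb_dim = 64
gru = nn.GRU(input_size=300, hidden_size=emb_dim, batch_first=True)

def encode_bill(token_embs):
    # token_embs: (batch, seq, 300) word vectors of the bill description.
    _, h = gru(token_embs)
    return h[-1]                      # (batch, emb_dim)

def gcn_layer(H, A_hat, W):
    # One propagation step H' = ReLU(A_hat @ H @ W), where A_hat is the
    # normalized adjacency among legislators and H their feature matrix.
    return torch.relu(A_hat @ H @ W)

triplet = nn.TripletMarginLoss(margin=1.0)
# anchor: a bill vector; positive: a legislator who voted "yea";
# negative: a legislator who voted "nay" on that bill:
#   loss = triplet(bill_vec, yea_legislator_vec, nay_legislator_vec)
```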


Author(s):  
Han Zhao ◽  
Xu Yang ◽  
Zhenru Wang ◽  
Erkun Yang ◽  
Cheng Deng

By contrasting positive-negative counterparts, graph contrastive learning has become a prominent technique for unsupervised graph representation learning. However, existing methods fail to consider class information and introduce false-negative samples through random negative sampling, causing poor performance. To this end, we propose a graph debiased contrastive learning framework that can jointly perform representation learning and clustering. Specifically, representations are optimized by aligning with clustered class information, and, simultaneously, the optimized representations promote clustering, leading to more powerful representations and clustering results. More importantly, we randomly select negative samples from clusters different from the positive sample's cluster. In this way, the clustering results, serving as supervisory signals, effectively decrease the number of false-negative samples. Extensive experiments on five datasets demonstrate that our method achieves new state-of-the-art results on graph clustering and classification tasks.
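
The debiased negative sampling can be illustrated with a short PyTorch sketch: negatives for each node are drawn only from clusters other than the node's own, so the clustering results filter out likely false negatives. The clustering method producing the labels and the number of negatives per node are assumptions.

```python
# Sketch of cluster-aware (debiased) negative sampling.
import torch

def sample_negatives(embeddings, labels, num_neg=5):
    # embeddings: (n, dim); labels: (n,) cluster id per node, e.g. from
    # k-means on the current embeddings (a stand-in for the paper's
    # clustering step).
    n = embeddings.size(0)
    negatives = []
    for i in range(n):
        # Candidates are restricted to *other* clusters, which removes
        # likely false negatives from the same class.
        candidates = torch.nonzero(labels != labels[i]).squeeze(1)
        idx = candidates[torch.randint(len(candidates), (num_neg,))]
        negatives.append(embeddings[idx])
    return torch.stack(negatives)     # (n, num_neg, dim)
```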


Behaviour ◽  
1978 ◽  
Vol 64 (3-4) ◽  
pp. 184-203 ◽  
Author(s):  
Nicholas S. Thompson ◽  
David B. Richards

Abstract. According to tradition, the communication system of the American crow, Corvus brachyrhynchos, consists of an assortment of distinct sounds, each of which is used in a particular context and has a unique meaning. Despite this traditional view, we have made field observations suggesting that the sounds employed in different functional contexts overlap considerably. These observations further suggested that each sound does not have a single unique meaning, but that its meaning varies depending upon how it and similar sounds are temporally organized into calling sequences. To investigate this idea, a series of experiments was performed in which the temporal properties of natural sounds recorded from crows in the field were changed. These experiments were concerned primarily with the vocalization known as the assembly call. The assembly call consists of a series of sounds which are low, harsh, and variable in pitch and timing. Broadcast to crows in the field, recorded assembly calls provoke an aggregation of crows at the sound source about twenty-five percent of the time. The recordings broadcast were of two sorts: sequences made by modifying the temporal properties of a natural assembly call, and sequences of sounds derived from calls given in other functional contexts which were then rearranged to approximate the temporal properties of an assembly call. These calls were tested on wild crows in the field. A presentation of a call was counted as successful if at least one crow approached the sound source on a direct line. Different calls were compared with respect to the proportion of successful presentations. The results show that not all types of crow sounds can be manufactured into effective assembly calls. A high-pitched call, even when arranged to approximate the temporal properties of the assembly call, does not assemble crows at rates approaching the rate of assembly to natural assembly calls. On the other hand, the results also show that a sound need not be derived from an assembly call in order to be arranged into an effective assembly call. A call recorded in another functional context, but which has a harsh, grainy quality, will assemble crows as well as or better than an assembly call if it is presented in the proper temporal arrangement. In fact, the highest rates of success were provoked by a sequence of such sounds having a high rate of emission and organized into short cycles of increasing rate. Such a call is two to four times more effective than a natural assembly call. These results are inconsistent with the traditional view that each particular caw in the repertoire of a crow has a discrete, stable meaning. An alternative hypothesis is suggested in which the meaning of a sequence of crow sounds depends not only on the properties of the caws but also upon the temporal properties of the sequence.


2017 ◽  
Vol 56 (05) ◽  
pp. 370-376 ◽  
Author(s):  
Roberto Pérez-Rodríguez ◽  
Luis E. Anido-Rifón ◽  
Marcos A. Mouriño-García

Summary. Objectives: The ability to efficiently review the existing literature is essential for the rapid progress of research. This paper describes a classifier of text documents, represented as vectors in spaces of Wikipedia concepts, and analyses its suitability for the classification of Spanish biomedical documents when only English documents are available for training. We propose the cross-language concept matching (CLCM) technique, which relies on Wikipedia interlanguage links to convert concept vectors from the Spanish to the English space. Methods: The performance of the classifier is compared to several baselines: a classifier based on machine translation, a classifier that represents documents after performing Explicit Semantic Analysis (ESA), and a classifier that uses a domain-specific semantic annotator (MetaMap). The corpus used for the experiments (Cross-Language UVigoMED) was purpose-built for this study and is composed of 12,832 English and 2,184 Spanish MEDLINE abstracts. Results: The performance of our approach is superior to that of every other state-of-the-art classifier in the benchmark, with performance increases of up to 124% over classical machine translation, 332% over MetaMap, and 60 times over the classifier based on ESA. The results are statistically significant, with p-values < 0.0001. Conclusion: Using knowledge mined from Wikipedia to represent documents as vectors in a space of Wikipedia concepts, and translating vectors between language-specific concept spaces, a cross-language classifier can be built that performs better than several state-of-the-art classifiers.
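
A toy sketch of the CLCM mapping step in plain Python: a Spanish document, represented as weights over Spanish Wikipedia concepts, is projected into the English concept space by following interlanguage links. The two-entry link table below is a stand-in for the real Wikipedia data.

```python
# Sketch of cross-language concept matching via interlanguage links.
interlanguage = {            # Spanish concept -> English concept (toy table)
    "Corazón": "Heart",
    "Neumonía": "Pneumonia",
}

def to_english_space(es_vector):
    # es_vector: {spanish_concept: weight}; concepts without an
    # interlanguage link are dropped, as no English counterpart exists.
    en_vector = {}
    for concept, weight in es_vector.items():
        target = interlanguage.get(concept)
        if target is not None:
            en_vector[target] = en_vector.get(target, 0.0) + weight
    return en_vector

print(to_english_space({"Corazón": 0.7, "Neumonía": 0.3}))
# {'Heart': 0.7, 'Pneumonia': 0.3} — now comparable to English documents
```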


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 186
Author(s):  
Hanlin Sun ◽  
Wei Jie ◽  
Jonathan Loo ◽  
Liang Chen ◽  
Zhongmin Wang ◽  
...  

Presently, data collected from real systems and organized as information networks are ubiquitous. Mining hidden information from these data generally helps in understanding and improving the corresponding systems. The challenges of analyzing such data include high computational complexity and low parallelizability, owing to the complicated interconnected structure of the nodes. Network representation learning, also called network embedding, provides a practical and promising way to address these issues. One of the foremost requirements of network embedding is to preserve network topology in the learned low-dimensional representations. Community structure is a prominent characteristic of complex networks and should therefore be well maintained. However, the difficulty lies in the fact that the properties of community structure are multivariate and complicated, so it is insufficient to model community structure with a predefined model, as is common in most state-of-the-art network embedding algorithms that explicitly consider community structure preservation. In this paper, we introduce a multi-process parallel framework for network embedding that is enhanced by partial community information discovered during learning and preserves community properties well. We also implement the framework and propose two node embedding methods that use game theory to detect partial community information. A series of experiments is conducted to evaluate the performance of our methods against six state-of-the-art algorithms. The results demonstrate that our methods effectively preserve the community properties of networks in their low-dimensional representations. Specifically, compared to the baselines, our algorithms perform best, or second best, on networks with high overlapping diversity and density.
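
One way to picture how partial community information can enter an embedding objective is the following sketch (an assumption-laden illustration, not the paper's method): on top of any base embedding loss, node pairs known to share a detected community are pulled together. The pair list and the penalty weight are assumptions; the paper detects such pairs with a game-theoretic method.

```python
# Sketch of a community-preservation penalty added to an embedding loss.
import torch

def community_penalty(Z, same_community_pairs, weight=0.1):
    # Z: (n, dim) node embeddings; same_community_pairs: list of (i, j)
    # index pairs known to co-occur in a detected (possibly partial)
    # community. Pairs in the same community are pulled together.
    i = torch.tensor([p[0] for p in same_community_pairs])
    j = torch.tensor([p[1] for p in same_community_pairs])
    return weight * ((Z[i] - Z[j]) ** 2).sum(dim=1).mean()

# total_loss = base_embedding_loss + community_penalty(Z, pairs)
```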


Author(s):  
Wenyan Zhang ◽  
Ling Xu ◽  
Meng Yan ◽  
Ziliang Wang ◽  
Chunlei Fu

In recent years, the number of online services has grown rapidly, and invoking the required services through a cloud platform has become the primary trend. How to help users choose and recommend high-quality services among huge numbers of unused services has become a hot research issue. Among the existing QoS prediction methods, collaborative filtering (CF) can only learn low-dimensional linear characteristics, and its effectiveness is limited by sparse data. Although existing deep learning methods can better capture high-dimensional nonlinear features, most of them use only identity features, and the vanishing-gradient problem worsens as the network deepens, so their QoS prediction remains unsatisfactory. To address these problems, we propose a probability-distribution- and location-aware ResNet approach for QoS prediction (PLRes). This approach considers the probability distribution of historical invocations and the location characteristics of users and services, and it is the first to use ResNet in QoS prediction to reuse features, which alleviates gradient vanishing and model degradation. A series of experiments is conducted on the real-world web service dataset WS-DREAM. At densities of 5%–30%, the experimental results on both QoS attributes, response time and throughput, indicate that PLRes performs better than five existing state-of-the-art QoS prediction approaches.
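
The residual design PLRes relies on can be sketched in PyTorch as follows; the feature dimension, number of blocks, and prediction head are illustrative assumptions, with the identity, location, and probability-distribution features abstracted into a single concatenated input vector.

```python
# Sketch of a residual (ResNet-style) QoS regressor.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        return torch.relu(x + self.fc2(h))   # skip connection reuses x

class QoSNet(nn.Module):
    def __init__(self, feat_dim=96, blocks=4):
        super().__init__()
        # feat_dim: size of the concatenated user/service feature vector.
        self.body = nn.Sequential(*[ResBlock(feat_dim) for _ in range(blocks)])
        self.head = nn.Linear(feat_dim, 1)    # predicted QoS value

    def forward(self, pair_features):
        return self.head(self.body(pair_features))
```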


2022 ◽  
Vol 12 ◽  
Author(s):  
Shenda Hong ◽  
Wenrui Zhang ◽  
Chenxi Sun ◽  
Yuxi Zhou ◽  
Hongyan Li

Cardiovascular diseases (CVDs) are one of the most fatal disease groups worldwide. The electrocardiogram (ECG) is a widely used tool for automatically detecting cardiac abnormalities, thereby helping to control and manage CVDs. To encourage more multidisciplinary research, the PhysioNet/Computing in Cardiology Challenge 2020 (Challenge 2020) provided a public platform involving multi-center databases and automatic evaluations for ECG classification tasks. As a result, 41 teams successfully submitted their solutions and qualified for the rankings. Although Challenge 2020 was a success, there has been no in-depth methodological meta-analysis of these solutions, making it difficult for researchers to benefit from the solutions and results. In this study, we systematically review the 41 solutions in terms of data processing, feature engineering, model architecture, and training strategy. For each perspective, we visualize and statistically analyze the effectiveness of the common techniques, and we discuss their methodological advantages and disadvantages. Finally, we summarize five practical lessons based on this analysis: (1) data augmentation should be employed and adapted to specific scenarios; (2) combining different features can improve performance; (3) a hybrid design of different types of deep neural networks (DNNs) is better than using a single type; (4) the use of end-to-end architectures should depend on the task being solved; (5) multiple models are better than one. We expect that our meta-analysis will help accelerate research on ECG classification based on machine learning models.
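
As a small illustration of lesson (1), here is a sketch of simple ECG augmentations of the kind many Challenge teams applied; the specific transforms and parameter ranges are illustrative, not taken from any particular submission.

```python
# Sketch of common ECG data augmentations (illustrative parameters).
import numpy as np

def augment_ecg(sig, rng):
    # sig: (leads, samples) array of one ECG recording.
    if rng.random() < 0.5:                          # amplitude scaling
        sig = sig * rng.uniform(0.8, 1.2)
    if rng.random() < 0.5:                          # additive Gaussian noise
        sig = sig + rng.normal(0, 0.01, sig.shape)
    if rng.random() < 0.5:                          # random temporal shift
        start = rng.integers(0, sig.shape[1] // 10)
        sig = np.roll(sig, -start, axis=1)
    return sig

rng = np.random.default_rng(0)
augmented = augment_ecg(np.zeros((12, 5000)), rng)  # 12-lead, 5000 samples
```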


Author(s):  
Y. Feng ◽  
W. Diao ◽  
X. Sun ◽  
J. Li ◽  
K. Chen ◽  
...  

Abstract. The performance of semantic segmentation in high-resolution aerial imagery has improved rapidly through the introduction of deep fully convolutional networks (FCNs). However, due to the complexity of object shapes and sizes, the labeling accuracy for small objects and object boundaries still needs improvement. In this paper, we propose a neighboring pixel affinity loss (NPALoss) to improve segmentation performance on these hard pixels. Specifically, we address two issues: how to determine the classification difficulty of a pixel, and how to choose a suitable weight margin between well-classified pixels and hard pixels. First, we recast the former as deciding whether the pixel categories in a neighborhood are the same or different. Based on this idea, we build a neighboring pixel affinity map by counting the pixel-pair relationships for each pixel in a search region. Second, we investigate different weight transformation strategies for the affinity map to find a suitable weight margin and avoid gradient overflow; logarithm compression works better than normalization, especially with the common logarithm. Finally, combining the affinity map and the logarithm compression strategy, we build NPALoss to adaptively assign a different weight to each pixel. Comparative experiments are conducted on the ISPRS Vaihingen dataset against several commonly used state-of-the-art networks, and we demonstrate that our proposed approach achieves promising results.
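
The two steps can be sketched in NumPy: build the affinity map by counting, for each pixel, how many neighbors within a search radius carry a different label, then compress the counts with the common logarithm to obtain per-pixel weights. The window size and the exact weighting form are assumptions.

```python
# Sketch of a neighboring pixel affinity map with log-compressed weights.
import numpy as np

def affinity_map(labels, radius=1):
    # labels: (h, w) integer class map. For each pixel, count neighbors
    # within the search radius whose class differs (edges wrap with
    # np.roll, which is acceptable for a sketch).
    h, w = labels.shape
    counts = np.zeros((h, w), dtype=np.float32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(labels, dy, axis=0), dx, axis=1)
            counts += (shifted != labels)   # neighbor with another class
    return counts

def npa_weights(labels):
    # Hard pixels (boundaries, small objects) get larger weights; the
    # common logarithm keeps the margin between easy and hard pixels
    # from exploding.
    return 1.0 + np.log10(1.0 + affinity_map(labels))
```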


Author(s):  
Zhihao Fan ◽  
Zhongyu Wei ◽  
Siyuan Wang ◽  
Ruize Wang ◽  
Zejun Li ◽  
...  

Existing research on image captioning usually represents an image using a scene graph of low-level facts (objects and relations) and fails to capture high-level semantics. In this paper, we propose a Theme Concepts extended Image Captioning (TCIC) framework that incorporates theme concepts to represent high-level cross-modality semantics. In practice, we model theme concepts as memory vectors and propose a Transformer with Theme Nodes (TTN) to incorporate those vectors for image captioning. Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN. On the vision side, TTN takes both scene-graph-based features and theme concepts as input for visual representation learning. On the language side, TTN takes both captions and theme concepts as input for text representation reconstruction. Both settings aim to generate target captions with the same transformer-based decoder. During training, we further align the representations of theme concepts learned from images and their corresponding captions to enforce cross-modality learning. Experimental results on MS COCO show the effectiveness of our approach compared to several state-of-the-art models.
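
The theme-node mechanism can be sketched in PyTorch: a small set of learned memory vectors is prepended to the token sequence before a standard transformer encoder, so theme concepts and visual or textual tokens attend to each other. Dimensions, head counts, and layer counts are illustrative assumptions.

```python
# Sketch of a transformer encoder extended with learned "theme nodes".
import torch
import torch.nn as nn

class ThemeNodeEncoder(nn.Module):
    def __init__(self, dim=256, num_themes=8, layers=2):
        super().__init__()
        # Theme concepts as learnable memory vectors.
        self.themes = nn.Parameter(torch.randn(num_themes, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):
        # tokens: (batch, seq, dim) visual or textual token features.
        b = tokens.size(0)
        themes = self.themes.unsqueeze(0).expand(b, -1, -1)
        # Theme nodes join the sequence and participate in attention.
        return self.encoder(torch.cat([themes, tokens], dim=1))
```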

