Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation

Author(s): Shuo Zhang, Lei Xie

Graph Neural Networks (GNNs) are powerful for the representation learning of graph-structured data. Most GNNs use a message-passing scheme, in which the embedding of a node is iteratively updated by aggregating information from its neighbors. To better capture the influence of individual nodes, the attention mechanism has become a popular way to assign trainable weights to nodes during aggregation. Although attention-based GNNs have achieved remarkable results on various tasks, a clear understanding of their discriminative capacity is missing. In this work, we present a theoretical analysis of the representational properties of GNNs that adopt the attention mechanism as an aggregator. Our analysis identifies all cases in which such attention-based GNNs always fail to distinguish certain distinct structures; these failures arise because attention-based aggregation ignores cardinality information. To improve the performance of attention-based GNNs, we propose cardinality preserved attention (CPA) models that can be applied to any kind of attention mechanism. Our experiments on node and graph classification confirm our theoretical analysis and show the competitive performance of our CPA models. The code is available online: https://github.com/zetayue/CPA.
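The failure case is easy to reproduce: a softmax-weighted average cannot tell a neighborhood of two identical neighbors from one of three. Below is a minimal PyTorch sketch, assuming the additive CPA form in which an unweighted neighbor sum is added back to the attended output; the function names and the weight vector `w` are illustrative, not the paper's API.

```python
import torch

def attention_aggregate(h_neighbors, scores):
    # Plain attention aggregation: a convex combination whose weights
    # sum to 1, so neighborhood size (cardinality) is lost.
    alpha = torch.softmax(scores, dim=0)
    return (alpha.unsqueeze(-1) * h_neighbors).sum(dim=0)

def cpa_additive_aggregate(h_neighbors, scores, w):
    # Additive CPA (sketch): adding an unweighted neighbor sum makes the
    # output scale with |N(v)|, restoring cardinality information.
    return attention_aggregate(h_neighbors, scores) + w * h_neighbors.sum(dim=0)

# Two neighborhoods with identical features but different cardinalities:
h2, h3 = torch.ones(2, 4), torch.ones(3, 4)
s2, s3 = torch.zeros(2), torch.zeros(3)
w = torch.full((4,), 0.1)
print(attention_aggregate(h2, s2) - attention_aggregate(h3, s3))              # zeros: indistinguishable
print(cpa_additive_aggregate(h2, s2, w) - cpa_additive_aggregate(h3, s3, w))  # nonzero: distinguished
```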

Author(s): George Dasoulas, Ludovic Dos Santos, Kevin Scaman, Aladin Virmaux

In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs). More specifically, we introduce a graph neural network called Colored Local Iterative Procedure (CLIP) that uses colors to disambiguate identical node attributes, and show that this representation is a universal approximator of continuous functions on graphs with node attributes. Our method relies on separability, a key topological characteristic that allows well-chosen neural networks to be extended into universal representations. Finally, we show experimentally that CLIP is capable of capturing structural characteristics that traditional MPNNs fail to distinguish, while remaining state-of-the-art on benchmark graph classification datasets.
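To illustrate the coloring idea, here is a minimal sketch, assuming colors are extra integer ids appended to node features so that nodes with identical attributes become distinguishable; the grouping and random assignment below are illustrative, and CLIP additionally averages over several such colorings.

```python
import random
from collections import defaultdict

def color_nodes(features, seed=0):
    # Group nodes by identical attribute vectors and give each member of
    # a group a distinct color id, appended as an extra feature.
    rng = random.Random(seed)
    groups = defaultdict(list)
    for i, f in enumerate(features):
        groups[tuple(f)].append(i)
    colors = [0] * len(features)
    for idxs in groups.values():
        ids = list(range(len(idxs)))
        rng.shuffle(ids)  # one random coloring; CLIP averages over several
        for c, i in zip(ids, idxs):
            colors[i] = c
    return [list(f) + [c] for f, c in zip(features, colors)]

# Nodes 0 and 1 share attributes but now carry different colors:
print(color_nodes([[1.0], [1.0], [2.0]]))
```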


Author(s): Pengyong Li, Jun Wang, Ziliang Li, Yixuan Qiao, Xianggen Liu, ...

Self-supervised learning has gradually emerged as a powerful technique for graph representation learning. However, transferable, generalizable, and robust representation learning on graph data remains a challenge for pre-training graph neural networks. In this paper, we propose a simple and effective self-supervised pre-training strategy, named Pairwise Half-graph Discrimination (PHD), that explicitly pre-trains a graph neural network at the graph level. PHD is designed as a simple binary classification task: discriminate whether two half-graphs come from the same source. Experiments demonstrate that PHD is an effective pre-training strategy that offers comparable or superior performance on 13 graph classification tasks compared with state-of-the-art strategies, and achieves notable improvements when combined with node-level strategies. Moreover, visualization of the learned representations reveals that the PHD strategy indeed enables the model to learn graph-level knowledge, such as the molecular scaffold. These results establish PHD as a powerful and effective self-supervised learning strategy for graph-level representation learning.
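A minimal sketch of the pre-training task follows, assuming halves come from a simple random node split; the splitting strategy and helper names are illustrative stand-ins for the paper's construction, and the graph encoder that consumes each pair is omitted.

```python
import random
import networkx as nx

def split_halves(g, seed=0):
    # Split a graph into two induced half-graphs via a random node split.
    nodes = list(g.nodes)
    random.Random(seed).shuffle(nodes)
    half = len(nodes) // 2
    return g.subgraph(nodes[:half]).copy(), g.subgraph(nodes[half:]).copy()

def make_example(g_a, g_b, positive, seed=0):
    # Positive pair: both halves from the same source graph.
    # Negative pair: the second half comes from a different graph.
    h1, h2 = split_halves(g_a, seed)
    if not positive:
        h2 = split_halves(g_b, seed)[1]
    return (h1, h2), int(positive)

pos, y1 = make_example(nx.path_graph(8), nx.cycle_graph(8), positive=True)
neg, y0 = make_example(nx.path_graph(8), nx.cycle_graph(8), positive=False)
print(y1, y0)  # 1 0
```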


Author(s): Jing Huang, Jie Yang

Hypergraphs, expressive structures that can flexibly model higher-order correlations among entities, have recently attracted increasing attention from various research domains. Despite the success of Graph Neural Networks (GNNs) for graph representation learning, adapting powerful GNN variants directly to hypergraphs remains a challenging problem. In this paper, we propose UniGNN, a unified framework for interpreting the message passing process in graph and hypergraph neural networks, which can generalize general GNN models to hypergraphs. Within this framework, meticulously designed architectures aimed at deepening GNNs can also be incorporated into hypergraphs with minimal effort. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of UniGNN, which outperforms state-of-the-art approaches by a large margin. On the DBLP dataset in particular, we increase the accuracy from 77.4% to 88.8% on the semi-supervised hypernode classification task. We further prove that the proposed message-passing based UniGNN models are at most as powerful as the 1-dimensional Generalized Weisfeiler-Leman (1-GWL) algorithm in terms of distinguishing non-isomorphic hypergraphs. Our code is available at https://github.com/OneForward/UniGNN.
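The unified message passing can be sketched as a two-stage aggregation, shown below under the assumption that both stages use a mean aggregator; concrete UniGNN variants substitute the corresponding GNN aggregators at each stage.

```python
import torch

def unignn_layer(x, hyperedges):
    # x: (n, d) node features; hyperedges: list of node-index lists.
    # Stage 1: each hyperedge message aggregates its member nodes.
    edge_msgs = [x[list(e)].mean(dim=0) for e in hyperedges]
    # Stage 2: each node aggregates the messages of incident hyperedges.
    out = torch.zeros_like(x)
    counts = torch.zeros(x.size(0))
    for msg, e in zip(edge_msgs, hyperedges):
        for v in e:
            out[v] += msg
            counts[v] += 1
    return out / counts.clamp(min=1).unsqueeze(-1)

x = torch.randn(5, 3)
print(unignn_layer(x, [[0, 1, 2], [2, 3, 4]]).shape)  # torch.Size([5, 3])
```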


2020, Vol 34 (04), pp. 7007-7014
Author(s): Shichao Zhu, Lewei Zhou, Shirui Pan, Chuan Zhou, Guiying Yan, ...

Graph Neural Networks (GNNs) have achieved state-of-the-art performance in many graph data analysis tasks. However, they still suffer from two limitations for graph representation learning. First, they exploit non-smooth node features, which may result in suboptimal embeddings and degraded performance for graph classification. Second, they only exploit neighbor information and ignore global topological knowledge. To overcome both limitations simultaneously, in this paper we propose a novel, flexible, and end-to-end framework, Graph Smoothing Splines Neural Networks (GSSNN), for graph classification. By exploiting smoothing splines, which are widely used to learn smooth fitting functions in regression, we develop an effective feature smoothing and enhancement module, Scaled Smoothing Splines (S3), to learn graph embeddings. To integrate global topological information, we design a novel scoring module that exploits closeness, degree, and self-attention values to select important node features as knots for the smoothing splines. These knots can potentially be used to interpret classification results. In extensive experiments on biological and social datasets, we demonstrate that our model achieves state-of-the-art results and that GSSNN learns more robust graph representations. Furthermore, we show that the S3 module is easily plugged into existing GNNs to improve their performance.
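The knot-selection idea can be sketched as a simple scoring rule. The sketch below assumes closeness centrality and degree are summed with optional attention values and the top-k nodes become knots; this is a simplification of the paper's learned scoring module.

```python
import networkx as nx

def select_knots(g, k=3, attn=None):
    # Score each node by closeness centrality plus degree (plus an
    # optional attention value) and keep the top-k nodes as spline knots.
    # Terms are left unnormalized for brevity.
    closeness = nx.closeness_centrality(g)
    degree = dict(g.degree())
    attn = attn or {v: 0.0 for v in g.nodes}
    scores = {v: closeness[v] + degree[v] + attn[v] for v in g.nodes}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(select_knots(nx.karate_club_graph(), k=5))
```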


2021, Vol 4
Author(s): David Gordon, Panayiotis Petousis, Henry Zheng, Davina Zamanzadeh, Alex A.T. Bui

We present a novel approach for imputing missing data that incorporates temporal information into bipartite graphs through an extension of graph representation learning. Missing data is abundant in several domains, particularly when observations are made over time. Most imputation methods make strong assumptions about the distribution of the data. While newer methods may relax some of these assumptions, they may not consider temporality. Moreover, when such methods are extended to handle time, they may not generalize without retraining. We propose a joint bipartite graph approach to incorporate temporal sequence information. Specifically, the observation nodes and edges with temporal information are used in message passing to learn node and edge embeddings and to inform the imputation task. Our proposed method, temporal setting imputation using graph neural networks (TSI-GNN), captures sequence information that can then be used within an aggregation function of a graph neural network. To the best of our knowledge, this is the first effort to use a joint bipartite graph approach that captures sequence information to handle missing data. We use several benchmark datasets to test the performance of our method against a variety of conditions, comparing to both classic and contemporary methods. We further provide insight into managing the size of the generated TSI-GNN model. Through our analysis, we show that incorporating temporal information into a bipartite graph improves the representation at the 30% and 60% missing rates, specifically when using a nonlinear model for downstream prediction tasks on regularly sampled datasets, and that it is competitive with existing temporal methods under different scenarios.
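A minimal sketch of the graph construction follows, assuming a bipartite layout in which time steps and features are the two node sets, observed values become edges, and the time index travels as an edge attribute; the exact node and edge featurization in TSI-GNN may differ.

```python
import networkx as nx
import numpy as np

def build_temporal_bipartite(X):
    # X: (timesteps, features) array with np.nan marking missing entries.
    g = nx.Graph()
    t_steps, n_feats = X.shape
    g.add_nodes_from((f"t{t}" for t in range(t_steps)), bipartite=0)
    g.add_nodes_from((f"f{j}" for j in range(n_feats)), bipartite=1)
    for t in range(t_steps):
        for j in range(n_feats):
            if not np.isnan(X[t, j]):
                # Observed value -> edge carrying the value and its time index.
                g.add_edge(f"t{t}", f"f{j}", value=X[t, j], time=t)
    return g

X = np.array([[1.0, np.nan], [np.nan, 2.0], [3.0, 4.0]])
g = build_temporal_bipartite(X)
print(g.number_of_edges())  # 4 observed values
```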


2020, Vol 34 (05), pp. 8544-8551
Author(s): Giannis Nikolentzos, Antoine Tixier, Michalis Vazirgiannis

Graph neural networks have recently emerged as a very effective framework for processing graph-structured data. These models have achieved state-of-the-art performance in many tasks. Most graph neural networks can be described in terms of message passing, vertex update, and readout functions. In this paper, we represent documents as word co-occurrence networks and propose an application of the message passing framework to NLP, the Message Passing Attention network for Document understanding (MPAD). We also propose several hierarchical variants of MPAD. Experiments conducted on 10 standard text classification datasets show that our architectures are competitive with the state-of-the-art. Ablation studies reveal further insights about the impact of the different components on performance. Code is publicly available at: https://github.com/giannisnik/mpad.
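The starting representation can be sketched as a sliding-window co-occurrence graph; the window size and weighting below are illustrative choices, not necessarily those used by MPAD.

```python
import networkx as nx

def cooccurrence_graph(tokens, window=2):
    # Connect each word to the words appearing within `window` positions
    # after it, accumulating co-occurrence counts as edge weights.
    g = nx.Graph()
    for i, u in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if u != v:
                weight = g[u][v]["weight"] + 1 if g.has_edge(u, v) else 1
                g.add_edge(u, v, weight=weight)
    return g

g = cooccurrence_graph("the quick brown fox jumps over the lazy dog".split())
print(sorted(g.edges(data="weight"))[:3])
```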


2022, Vol 40 (4), pp. 1-46
Author(s): Hao Peng, Ruitong Zhang, Yingtong Dou, Renyu Yang, Jingyi Zhang, ...

Graph Neural Networks (GNNs) have been widely used for the representation learning of various structured graph data, typically through message passing among nodes that aggregates their neighborhood information via different operations. While promising, most existing GNNs oversimplify the complexity and diversity of the edges in the graph and thus cannot efficiently cope with ubiquitous heterogeneous graphs, which typically take the form of multi-relational graph representations. In this article, we propose RioGNN, a novel Reinforced, recursive, and flexible neighborhood selection guided multi-relational Graph Neural Network architecture, to navigate the complexity of neural network structures while maintaining relation-dependent representations. We first construct a multi-relational graph, according to the practical task, to reflect the heterogeneity of nodes, edges, attributes, and labels. To avoid embedding over-assimilation among different types of nodes, we employ a label-aware neural similarity measure to ascertain the most similar neighbors based on node attributes. A reinforced relation-aware neighbor selection mechanism is developed to choose the most similar neighbors of a target node within each relation before aggregating all neighborhood information from different relations to obtain the eventual node embedding. In particular, to improve the efficiency of neighbor selection, we propose a new recursive and scalable reinforcement learning framework with estimable depth and width for different scales of multi-relational graphs. RioGNN can learn more discriminative node embeddings with enhanced explainability, owing to the recognition of each relation's individual importance via the filtering threshold mechanism. Comprehensive experiments on real-world graph data and practical tasks demonstrate advantages in effectiveness, efficiency, and model explainability over comparable GNN models.
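The relation-aware filtering step can be sketched as thresholded similarity selection. The sketch below assumes fixed per-relation thresholds and cosine similarity, whereas RioGNN learns the thresholds through its recursive reinforcement-learning framework and uses a label-aware similarity measure.

```python
import torch
import torch.nn.functional as F

def filter_and_aggregate(h_center, h_neighbors, threshold):
    # Keep only neighbors whose cosine similarity to the center node
    # exceeds the relation's threshold, then mean-aggregate them.
    sim = F.cosine_similarity(h_center.unsqueeze(0), h_neighbors, dim=-1)
    kept = h_neighbors[sim > threshold]
    return kept.mean(dim=0) if len(kept) else h_center

def multi_relation_embed(h_center, neighbors_by_relation, thresholds):
    # Aggregate per relation, then concatenate the relation-wise results.
    parts = [filter_and_aggregate(h_center, h_r, t)
             for h_r, t in zip(neighbors_by_relation, thresholds)]
    return torch.cat([h_center] + parts, dim=-1)

h = torch.randn(8)
rels = [torch.randn(5, 8), torch.randn(3, 8)]
print(multi_relation_embed(h, rels, thresholds=[0.2, 0.5]).shape)  # torch.Size([24])
```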


Author(s): Guangtao Wang, Rex Ying, Jing Huang, Jure Leskovec

The self-attention mechanism in graph neural networks (GNNs) has led to state-of-the-art performance on many graph representation learning tasks. Currently, at every layer, attention is computed between connected pairs of nodes and depends solely on the representations of the two nodes. However, such an attention mechanism does not account for nodes that are not directly connected but provide important network context. Here we propose the Multi-hop Attention Graph Neural Network (MAGNA), a principled way to incorporate multi-hop context information into every layer of attention computation. MAGNA diffuses the attention scores across the network, which increases the receptive field of every layer of the GNN. Unlike previous approaches, MAGNA uses a diffusion prior on attention values to efficiently account for all paths between pairs of disconnected nodes. We demonstrate in theory and experiments that MAGNA captures large-scale structural information in every layer and has a low-pass effect that eliminates noisy high-frequency information from graph data. Experimental results on node classification as well as knowledge graph completion benchmarks show that MAGNA achieves state-of-the-art results: MAGNA achieves up to a 5.7% relative error reduction over the previous state-of-the-art on Cora, Citeseer, and Pubmed. MAGNA also obtains the best performance on a large-scale Open Graph Benchmark dataset. On knowledge graph completion, MAGNA advances the state-of-the-art on WN18RR and FB15k-237 across four different performance metrics.
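The diffusion step can be sketched directly from the description: mix powers of the one-hop attention matrix with geometrically decaying weights. The sketch below assumes personalized-PageRank-style coefficients theta_k = alpha * (1 - alpha)^k truncated at K hops; a dense matrix is used only for clarity.

```python
import torch

def attention_diffusion(attn, alpha=0.15, K=6):
    # attn: (n, n) one-hop attention matrix (rows sum to 1).
    # Returns a multi-hop attention matrix mixing powers of attn,
    # so each layer's receptive field covers up to K hops.
    diffused = torch.zeros_like(attn)
    power = torch.eye(attn.size(0))
    for k in range(K + 1):
        diffused += alpha * (1 - alpha) ** k * power
        power = power @ attn
    return diffused

attn = torch.softmax(torch.randn(4, 4), dim=-1)
multi_hop = attention_diffusion(attn)
print(multi_hop.sum(dim=-1))  # each row sums to 1 - (1 - alpha)**(K + 1)
```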


Author(s): Yu Xie, Shengze Lv, Yuhua Qian, Chao Wen, Jiye Liang
