Fairness in Network Representation by Latent Structural Heterogeneity in Observational Data

Xin Du; Yulong Pei; Wouter Duivesteijn; Mykola Pechenizkiy

doi:10.1609/aaai.v34i04.5792

Fairness in Network Representation by Latent Structural Heterogeneity in Observational Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5792 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3809-3816

Author(s):

Xin Du ◽

Yulong Pei ◽

Wouter Duivesteijn ◽

Mykola Pechenizkiy

Keyword(s):

Machine Learning ◽

Observational Data ◽

Representation Learning ◽

Structural Heterogeneity ◽

Heterogeneous Distribution ◽

Network Representation ◽

Representation Model ◽

Real World Datasets ◽

Synthetic Datasets ◽

Low Dimensional

While recent advances in machine learning have focused heavily on the fairness of algorithmic decision making, the fairness of representations, and especially of network representations, remains underexplored. Network representation learning learns a function mapping nodes to low-dimensional vectors. Structural properties, e.g. communities and roles, are preserved in the latent embedding space. In this paper, we argue that latent structural heterogeneity in the observational data could bias the classical network representation model. The unknown heterogeneous distribution across subgroups raises new challenges for fairness in machine learning. Pre-defined groups with sensitive attributes cannot properly tackle the potential unfairness of network representation. We propose a method that automatically discovers subgroups that are unfairly treated by the network representation model. The fairness measure we propose can evaluate complex targets with multi-degree interactions. We conduct randomly controlled experiments on synthetic datasets and verify our methods on real-world datasets. Both quantitative and qualitative results show that our method is effective in recovering the fairness of network representations. Our research offers insight into how structural heterogeneity across subgroups restricted by attributes affects the fairness of network representation learning.


Exponential Family Graph Embeddings

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5737 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3357-3364

Author(s):

Abdulkadir Celikkanat ◽

Fragkiskos D. Malliaros

Keyword(s):

Random Walk ◽

Exponential Family ◽

Representation Learning ◽

Learning Problems ◽

Interaction Patterns ◽

Network Representation ◽

Learning Tasks ◽

Learning Techniques ◽

Real World Datasets ◽

Low Dimensional

Representing networks in a low-dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm combines random walks for sampling context nodes with the traditional Skip-Gram model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
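The random-walk and Skip-Gram paradigm mentioned in this abstract can be illustrated with a minimal sketch. It only shows how center-context pairs might be sampled from uniform random walks; it is not the exponential family model of the paper, and the toy graph, the function names `random_walk` and `center_context_pairs`, and the window size are assumptions made for illustration.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def center_context_pairs(walk, window):
    """Return (center, context) pairs at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy undirected graph as adjacency lists (hypothetical example data).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
rng = random.Random(0)
walk = random_walk(adj, start=0, length=6, rng=rng)
pairs = center_context_pairs(walk, window=2)
```

The resulting pairs are what a Skip-Gram style objective would consume; the choice of conditional distribution over those pairs is where the models in the paper differ.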


DeepMicro: deep representation learning for disease prediction based on microbiome data

10.1101/785626 ◽

2019 ◽

Author(s):

Min Oh ◽

Liqing Zhang

Keyword(s):

Machine Learning ◽

Optimization Procedure ◽

Representation Learning ◽

Disease Prediction ◽

Machine Learning Classification ◽

Evaluation Scheme ◽

Marker Profile ◽

Model Training ◽

Low Dimensional ◽

Microbiome Data

Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a better prediction model. Also, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework allowing for an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms on the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile in five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates the model training and hyperparameter optimization procedure with 8X-30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.


Network Representation Learning-Based Drug Mechanism Discovery and Anti-Inflammatory Response Against COVID-19

10.26434/chemrxiv.12531314.v3 ◽

2021 ◽

Author(s):

Wang Xiaoqi ◽

Bin Xin ◽

Zhijian Xu ◽

Kenli LI ◽

Fei Li ◽

...

Keyword(s):

Inflammatory Response ◽

Inflammatory Responses ◽

Representation Learning ◽

Binding Modes ◽

Network Representation ◽

Drug Mechanism ◽

Docking Program ◽

Therapeutic Development ◽

Anti Inflammatory ◽

Low Dimensional

Recent studies have demonstrated that the excessive inflammatory response is an important factor in the death of COVID-19 patients. In this study, we propose a network representation learning-based methodology, termed AIdrug2cov, to discover drug mechanisms and anti-inflammatory responses for patients with COVID-19. This work explores the multi-hub characteristic of a heterogeneous drug network integrating 8 unique networks. Inspired by the multi-hub characteristic, we design three billion special meta paths to train a deep representation model for learning low-dimensional vectors that integrate long-range structure dependency and complex semantic relations among network nodes. Using the representation vectors, AIdrug2cov identifies 40 potential targets and 22 high-confidence drugs that bind to tumor necrosis factor (TNF)-α or interleukin (IL)-6 to prevent excessive inflammatory responses in COVID-19 patients. Finally, we analyze mechanisms of action based on PubMed publications and ongoing clinical trials, and explore the possible binding modes between the newly predicted drugs and targets via a docking program. In addition, the results in 5 pharmacological applications suggest that AIdrug2cov significantly outperforms 5 other state-of-the-art network representation approaches, further demonstrating the applicability of AIdrug2cov in the drug development field. In summary, AIdrug2cov is practically useful for accelerating COVID-19 therapeutic development. The source code and data can be downloaded from https://github.com/pengsl-lab/AIdrug2cov.git.


Feature Hashing for Network Representation Learning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/390 ◽

2018 ◽

Cited By ~ 2

Author(s):

Qixiang Wang ◽

Shanfeng Wang ◽

Maoguo Gong ◽

Yue Wu

Keyword(s):

Link Prediction ◽

Feature Space ◽

Representation Learning ◽

Learning Approaches ◽

Network Representation ◽

Proximity Matrix ◽

Low Dimensional ◽

Vector Representations ◽

Feature Hashing ◽

Node Embeddings

The goal of network representation learning is to embed nodes so as to encode the proximity structures of a graph into a continuous low-dimensional feature space. In this paper, we propose a novel algorithm called node2hash based on feature hashing for generating node embeddings. This approach follows the encoder-decoder framework. There are two main mapping functions in this framework. The first is an encoder that maps each node into a high-dimensional vector. The second is a decoder that hashes these vectors into a lower-dimensional feature space. More specifically, we first derive a proximity measurement called expected distance as the target, which combines position distribution and co-occurrence statistics of nodes over random walks so as to build a proximity matrix. We then introduce a set of T different hash functions into feature hashing to generate uniformly distributed vector representations of nodes from the proximity matrix. Compared with the existing state-of-the-art network representation learning approaches, node2hash shows competitive performance on multi-class node classification and link prediction tasks on three real-world networks from various domains.
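The decoder step described in this abstract relies on feature hashing. The sketch below shows the generic signed hashing trick applied to one sparse row of a proximity matrix; it is not the node2hash implementation. How the T hash functions are combined (here: averaged into the same output vector), the use of MD5 as the hash, and the names `_hash` and `hash_row` are all assumptions made for illustration.

```python
import hashlib


def _hash(feature_id, seed):
    """Deterministic 64-bit integer hash of the pair (seed, feature_id)."""
    digest = hashlib.md5(f"{seed}:{feature_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hash_row(row, dim, num_hashes):
    """Signed feature hashing of a sparse row {column: value} into `dim` buckets.

    Each of the `num_hashes` seeded hash functions picks a bucket and a sign
    for every column; contributions are averaged over the hash functions.
    """
    out = [0.0] * dim
    for col, value in row.items():
        for seed in range(num_hashes):
            h = _hash(col, seed)
            bucket = h % dim                       # low bits choose the bucket
            sign = 1.0 if (h >> 63) & 1 else -1.0  # top bit chooses the sign
            out[bucket] += sign * value / num_hashes
    return out


# Hypothetical sparse proximity row for one node: column index -> proximity.
row = {3: 0.5, 17: 1.25, 42: 0.75}
embedding = hash_row(row, dim=8, num_hashes=4)
```

The random signs make collisions cancel in expectation, which is the usual motivation for signed hashing.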


TransPath: Representation Learning for Heterogeneous Information Networks via Translation Mechanism

10.20944/preprints201801.0147.v1 ◽

2018 ◽

Author(s):

Yang Fang ◽

Xiang Zhao ◽

Zhen Tan

Keyword(s):

Large Scale ◽

Representation Learning ◽

Information Networks ◽

Heterogeneous Information ◽

Structure Information ◽

Heterogeneous Information Networks ◽

Network Representation ◽

Meta Path ◽

Translation Mechanism ◽

Real World Datasets

In this paper, we propose a novel network representation learning model, TransPath, to encode heterogeneous information networks (HINs). Traditional network representation learning models aim to learn the embeddings of a homogeneous network. TransPath is able to capture the rich semantic and structural information of an HIN via meta-paths. We take advantage of the translation mechanism used in knowledge graphs, regarding a meta-path, instead of an edge, as a translating operation from the first node to the last node. Moreover, we propose a user-guided meta-path sampling strategy that takes users' preferences as guidance; it explores the semantics of a path more precisely and improves model efficiency by avoiding noisy and meaningless meta-paths. We evaluate our model on two large-scale real-world datasets, DBLP and YELP, and two benchmark tasks, similarity search and node classification. We observe that TransPath outperforms other state-of-the-art baselines consistently and significantly.
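The translation idea in this abstract can be made concrete with a small sketch. It uses a generic TransE-style distance, in which a meta-path vector should carry the first node's embedding close to the last node's embedding. This is an illustration only, not the TransPath objective; the function name `translation_distance` and all vectors are hypothetical.

```python
import math


def translation_distance(first, path, last):
    """Euclidean norm of first + path - last; smaller means a better fit."""
    return math.sqrt(
        sum((f + p - l) ** 2 for f, p, l in zip(first, path, last))
    )


# Hypothetical 2-dimensional embeddings.
author = [1.0, 2.0]               # first node of the meta-path
author_paper_venue = [0.5, -1.0]  # vector standing for the whole meta-path
good_venue = [1.5, 1.0]           # exactly author + meta-path vector
bad_venue = [1.5, 4.0]            # off by 3.0 in the second coordinate

good = translation_distance(author, author_paper_venue, good_venue)
bad = translation_distance(author, author_paper_venue, bad_venue)
```

Training would push the distance down for observed node pairs connected by a meta-path and up for corrupted pairs.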


An Information-Explainable Random Walk Based Unsupervised Network Representation Learning Framework on Node Classification Tasks

Mathematics ◽

10.3390/math9151767 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1767

Author(s):

Xin Xu ◽

Yang Lu ◽

Yupeng Zhou ◽

Zhiguo Fu ◽

Yanjie Fu ◽

...

Keyword(s):

Random Walk ◽

Representation Learning ◽

Local Information ◽

Learning Framework ◽

Network Representation ◽

Label Node ◽

Label Information ◽

Classification Tasks ◽

Node Classification ◽

Low Dimensional

Network representation learning aims to learn low-dimensional, compressible, and distributed representational vectors of nodes in networks. Because obtaining label information for nodes in networks is expensive, many unsupervised network representation learning methods have been proposed, among which the random walk strategy is one of the most widely used. However, existing random walk based methods face several challenges, including: 1. insufficient explanation of what network knowledge is captured in the sampled walking paths; 2. adverse effects caused by the mixture of different information in networks; 3. poor generality of methods with hyper-parameters across different networks. This paper proposes an information-explainable random walk based unsupervised network representation learning framework named Probabilistic Accepted Walk (PAW) to obtain network representations from the perspective of the stationary distribution of networks. In the framework, we design two stationary distributions based on nodes’ self-information and local-information of networks to guide our proposed random walk strategy to learn representational vectors of networks through sampling paths of nodes. Extensive experimental results demonstrate that PAW obtains more expressive representations than six other widely used unsupervised network representation learning baselines on four real-world networks in single-label and multi-label node classification tasks.
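As background for the stationary-distribution perspective in this abstract, the sketch below computes the stationary distribution of a plain uniform random walk by power iteration. It does not implement PAW or its two designed distributions; the toy graph and the function name `stationary_distribution` are assumptions. For a connected, non-bipartite undirected graph, the result is proportional to node degree.

```python
def stationary_distribution(adj, steps=200):
    """Power iteration for a uniform random walk on adjacency lists `adj`."""
    nodes = sorted(adj)
    dist = {v: 1.0 / len(nodes) for v in nodes}  # start from uniform
    for _ in range(steps):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            share = dist[v] / len(adj[v])  # mass sent to each neighbor
            for u in adj[v]:
                nxt[u] += share
        dist = nxt
    return dist


# Toy graph (hypothetical): a triangle 0-1-2 with node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
pi = stationary_distribution(adj)

# Degrees are 2, 2, 3, 1 (sum 8), so pi should approach 2/8, 2/8, 3/8, 1/8.
expected = {v: len(adj[v]) / 8 for v in adj}
```

A walk strategy that accepts or rejects steps with designed probabilities changes this stationary distribution, which is the lever the abstract describes.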


Inferring the Disease-Associated miRNAs Based on Network Representation Learning and Convolutional Neural Networks

International Journal of Molecular Sciences ◽

10.3390/ijms20153648 ◽

2019 ◽

Vol 20 (15) ◽

pp. 3648 ◽

Cited By ~ 9

Author(s):

Xuan ◽

Sun ◽

Wang ◽

Zhang ◽

Pan

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Prediction Models ◽

Prediction Method ◽

Feature Space ◽

Representation Learning ◽

Superior Performance ◽

Network Representation ◽

Disease Associations ◽

Low Dimensional

Identification of disease-associated miRNAs (disease miRNAs) is critical for understanding etiology and pathogenesis. Most previous methods focus on integrating similarities and association information contained in heterogeneous miRNA-disease networks. However, these methods establish only shallow prediction models that fail to capture complex relationships among miRNA similarities, disease similarities, and miRNA-disease associations. We propose a prediction method on the basis of network representation learning and convolutional neural networks to predict disease miRNAs, called CNNMDA. CNNMDA deeply integrates the similarity information of miRNAs and diseases, miRNA-disease associations, and representations of miRNAs and diseases in low-dimensional feature space. The new framework based on deep learning was built to learn the original and global representation of a miRNA-disease pair. First, diverse biological premises about miRNAs and diseases were combined to construct the embedding layer in the left part of the framework, from a biological perspective. Second, the various connection edges in the miRNA-disease network, such as similarity and association connections, were dependent on each other. Therefore, it was necessary to learn the low-dimensional representations of the miRNA and disease nodes based on the entire network. The right part of the framework learnt the low-dimensional representation of each miRNA and disease node based on non-negative matrix factorization, and these representations were used to establish the corresponding embedding layer. Finally, the left and right embedding layers went through convolutional modules to deeply learn the complex and non-linear relationships among the similarities and associations between miRNAs and diseases. Experimental results based on cross validation indicated that CNNMDA yields superior performance compared to several state-of-the-art methods. Furthermore, case studies on lung, breast, and pancreatic neoplasms demonstrated the powerful ability of CNNMDA to discover potential disease miRNAs.


An Optimized Network Representation Learning Algorithm Using Multi-Relational Data

Mathematics ◽

10.3390/math7050460 ◽

2019 ◽

Vol 7 (5) ◽

pp. 460

Author(s):

Zhonglin Ye ◽

Haixing Zhao ◽

Ke Zhang ◽

Yu Zhu ◽

Zhaoyang Wang

Keyword(s):

Knowledge Representation ◽

Learning Algorithm ◽

Representation Learning ◽

Knowledge Bases ◽

Relational Data ◽

New Approach ◽

Network Representation ◽

Structural Relationships ◽

Low Dimensional

Representation learning aims to encode the relationships of research objects into low-dimensional, compressible, and distributed representation vectors. The purpose of network representation learning is to learn the structural relationships between network vertices. Knowledge representation learning aims to model the entities and relationships in knowledge bases. In this paper, we first introduce the idea of knowledge representation learning into network representation learning; namely, we propose a new approach to model the vertex triplet relationships based on DeepWalk without TransE. Consequently, we propose an optimized network representation learning algorithm using multi-relational data, MRNR, which introduces the multi-relational data between vertices into the procedures of network representation learning. Importantly, we adopt a higher-order transformation strategy to optimize the learnt network representation vectors. The idea behind MRNR is that multi-relational data (triplets) can effectively guide and constrain the procedures of network representation learning. The experimental results demonstrate that the proposed MRNR learns discriminative network representations, which show better performance on network classification, visualization, and case study tasks than the baseline algorithms considered in this paper.


Network representation learning method embedding linear and nonlinear network structures

Semantic Web ◽

10.3233/sw-212968 ◽

2022 ◽

pp. 1-16

Author(s):

Hu Zhang ◽

Jingjing Zhou ◽

Ru Li ◽

Yue Fan

Keyword(s):

Rapid Development ◽

Linear Structure ◽

Representation Learning ◽

Nonlinear Structure ◽

Structure Information ◽

Nonlinear Network ◽

Network Representation ◽

Proposed Model ◽

Representation Method ◽

Low Dimensional

With the rapid development of neural networks, much attention has been focused on network embedding for complex network data, which aims to learn low-dimensional embeddings of nodes in the network and to apply the learned network representations effectively to various graph-based analytical tasks. Two typical kinds of models exist, namely shallow random walk network representation methods and deep learning models such as graph convolution networks (GCNs). The former can be used to capture the linear structure of the network using depth-first search (DFS) and breadth-first search (BFS), whereas Hierarchical GCN (HGCN) is an unsupervised graph embedding method that can be used to describe the global nonlinear structure of the network by aggregating node information. However, these two kinds of models cannot simultaneously capture the nonlinear and linear structure information of nodes. Thus, the nodal characteristics of nonlinear and linear structures are explored in this paper, and an unsupervised representation method based on HGCN that jointly learns shallow and deep models is proposed. Experiments on node classification and dimension reduction visualization are carried out on citation, language, and traffic networks. The results show that, compared with the existing shallow network representation model and deep network model, the proposed model achieves better performance in terms of micro-F1, macro-F1, and accuracy scores.
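The two search orders named in this abstract, depth-first and breadth-first, explore a node's surroundings differently: one follows a single branch far from the start, the other stays close to it. The sketch below only contrasts the two visiting orders on a toy graph; it is not the proposed HGCN-based model, and the graph and function names are assumptions.

```python
from collections import deque


def bfs_order(adj, start):
    """Visit nodes level by level, closest to `start` first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order


def dfs_order(adj, start):
    """Visit nodes by following one branch as deep as possible first."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push neighbors in reverse so the first neighbor is explored first.
        stack.extend(reversed(adj[v]))
    return order


# Toy graph (hypothetical): two chains 0-1-3 and 0-2-4 sharing node 0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
bfs = bfs_order(adj, 0)  # [0, 1, 2, 3, 4]
dfs = dfs_order(adj, 0)  # [0, 1, 3, 2, 4]
```

Random walk methods typically interpolate between these two behaviors rather than running either search exactly.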


DLDTI: A learning-based framework for identification of drug-target interaction using neural networks and network representation

10.1101/2020.07.31.230763 ◽

2020 ◽

Author(s):

Yihan Zhao ◽

Kai Zheng ◽

Baoyi Guan ◽

Mengmeng Guo ◽

Lei Song ◽

...

Keyword(s):

Neural Networks ◽

Drug Target ◽

Large Scale ◽

Molecular Mechanisms ◽

Representation Learning ◽

Heterogeneous Data ◽

Biological Data ◽

Mapping Space ◽

Network Representation ◽

Low Dimensional

To elucidate novel molecular mechanisms of known drugs, efficient and feasible computational methods for predicting potential drug-target interactions (DTI) would be of great importance. A novel computational model called DLDTI was developed for predicting DTIs based on network representation learning and convolutional neural networks. The proposed approach simultaneously fuses the topology of complex networks and diverse information from heterogeneous data sources and copes with the noisy, incomplete, and high-dimensional nature of large-scale biological data by learning low-dimensional and rich depth features of drugs and proteins. Low-dimensional feature vectors were used to train DLDTI to obtain the optimal mapping space and infer new DTIs by ranking DTI candidates based on their proximity to the optimal mapping space. DLDTI achieves promising performance under 5-fold cross-validation with an AUC value of 0.9172, which was higher than that of methods based on different classifiers or different feature combination techniques. Moreover, biomedical experiments were also completed to validate DLDTI’s performance. Consistent with the predicted result, tetramethylpyrazine, a member of the pyrazines, reduced atherosclerosis progression and inhibited signal transduction in platelets via the PI3K/Akt, cAMP, and calcium signaling pathways. The source code and datasets explored in this work are available at https://github.com/CUMTzackGit/DLDTI
