Fairness in Network Representation by Latent Structural Heterogeneity in Observational Data

2020 ◽  
Vol 34 (04) ◽  
pp. 3809-3816
Author(s):  
Xin Du ◽  
Yulong Pei ◽  
Wouter Duivesteijn ◽  
Mykola Pechenizkiy

While recent advances in machine learning have placed considerable focus on the fairness of algorithmic decision making, the fairness of representations, and especially of network representations, remains underexplored. Network representation learning learns a function that maps nodes to low-dimensional vectors. Structural properties, e.g. communities and roles, are preserved in the latent embedding space. In this paper, we argue that latent structural heterogeneity in observational data could bias classical network representation models. The unknown heterogeneous distribution across subgroups raises new challenges for fairness in machine learning: pre-defined groups based on sensitive attributes cannot properly address the potential unfairness of network representations. We propose a method that automatically discovers subgroups that are treated unfairly by the network representation model. The fairness measure we propose can evaluate complex targets with multi-degree interactions. We conduct randomized controlled experiments on synthetic datasets and validate our methods on real-world datasets. Both quantitative and qualitative results show that our method is effective in recovering the fairness of network representations. Our research offers insight into how structural heterogeneity across subgroups restricted by attributes affects the fairness of network representation learning.
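To make the notion of subgroups restricted by attributes concrete, the sketch below enumerates every conjunction of up to two attribute values and ranks the resulting subgroups by how far their mean error exceeds the population mean. It is a hypothetical illustration, not the authors' discovery method or fairness measure: the per-node `error` field (for example, a downstream link-prediction error computed from the embeddings) and the function name `subgroup_gaps` are assumptions.

```python
from itertools import combinations
from statistics import mean

def subgroup_gaps(records, attributes, max_degree=2, min_size=2):
    """Hypothetical helper: enumerate subgroups defined by conjunctions of up
    to `max_degree` attribute values and report how far each subgroup's mean
    error deviates from the overall mean error."""
    overall = mean(r["error"] for r in records)
    gaps = {}
    for degree in range(1, max_degree + 1):
        for attrs in combinations(attributes, degree):
            # Group records by their values on this attribute combination.
            groups = {}
            for r in records:
                key = tuple((a, r[a]) for a in attrs)
                groups.setdefault(key, []).append(r["error"])
            for key, errs in groups.items():
                if len(errs) >= min_size:  # skip tiny, noisy subgroups
                    gaps[key] = mean(errs) - overall
    # Largest positive gap first: these subgroups are served worst.
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
```

Exhaustive enumeration grows combinatorially with the number of attributes and the degree, which is one reason a dedicated discovery method is needed for larger attribute sets.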

2020 ◽  
Vol 34 (04) ◽  
pp. 3357-3364
Author(s):  
Abdulkadir Celikkanat ◽  
Fragkiskos D. Malliaros

Representing networks in a low-dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm combines random walks, which sample context nodes, with the traditional Skip-Gram model, which captures center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
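The exponential-family view can be illustrated through its log-likelihood: for a center node with embedding u and a context node with embedding v, take the natural parameter η = ⟨u, v⟩ and score an observed statistic x with log p(x | η) = log h(x) + xη − A(η). The sketch assumes two standard family members, Bernoulli (x is a 0/1 co-occurrence indicator) and Poisson (x is a co-occurrence count); it does not claim these match the paper's three instances, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Assumed family members: (log-partition A(eta), log base measure log h(x)).
FAMILIES = {
    "bernoulli": (lambda eta: math.log1p(math.exp(eta)), lambda x: 0.0),
    "poisson": (lambda eta: math.exp(eta), lambda x: -math.lgamma(x + 1)),
}

def log_likelihood(center, context, x, family):
    """log p(x | eta) = log h(x) + x * eta - A(eta), with eta = <center, context>."""
    A, log_h = FAMILIES[family]
    eta = dot(center, context)
    return log_h(x) + x * eta - A(eta)
```

Summing these terms over the center-context pairs observed in the walks, and maximizing with respect to the embeddings, gives one generic training objective of this kind.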


2019 ◽  
Author(s):  
Min Oh ◽  
Liqing Zhang

Human microbiota plays a key role in human health, and growing evidence supports the potential use of the microbiome as a predictor of various diseases. However, the high dimensionality of microbiome data, often on the order of hundreds of thousands of features, combined with low sample sizes, poses a great challenge for machine learning-based prediction algorithms. This imbalance makes the data highly sparse, which hinders learning a better prediction model. Moreover, there has been little work on deep learning applications to microbiome data with a rigorous evaluation scheme. To address these challenges, we propose DeepMicro, a deep representation learning framework that enables an effective representation of microbiome profiles. DeepMicro successfully transforms high-dimensional microbiome data into a robust low-dimensional representation using various autoencoders and applies machine learning classification algorithms to the learned representation. In disease prediction, DeepMicro outperforms the current best approaches based on the strain-level marker profile on five different datasets. In addition, by significantly reducing the dimensionality of the marker profile, DeepMicro accelerates model training and hyperparameter optimization, with an 8X-30X speedup over the basic approach. DeepMicro is freely available at https://github.com/minoh0201/DeepMicro.
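As a rough illustration of the representation step, the sketch below trains the simplest kind of autoencoder (one tanh hidden layer, linear decoder) with plain NumPy gradient descent and returns an `encode` function that maps profiles to k-dimensional codes. DeepMicro uses several autoencoder variants; the architecture, learning rate, and the name `train_autoencoder` here are assumptions, and the resulting codes would then be passed to a separate classifier.

```python
import numpy as np

def train_autoencoder(X, k, lr=0.01, epochs=500, seed=0):
    """Hypothetical sketch: fit a one-hidden-layer autoencoder (tanh encoder,
    linear decoder) by full-batch gradient descent on the squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc, b_enc = rng.normal(0.0, 0.1, (d, k)), np.zeros(k)
    W_dec, b_dec = rng.normal(0.0, 0.1, (k, d)), np.zeros(d)
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc + b_enc)   # low-dimensional codes, shape (n, k)
        err = Z @ W_dec + b_dec - X      # reconstruction error, shape (n, d)
        losses.append(float(np.sum(err ** 2) / n))
        # Backpropagate the per-sample sum of squared errors.
        g_out = 2.0 * err / n
        g_pre = (g_out @ W_dec.T) * (1.0 - Z ** 2)  # gradient through tanh
        W_dec -= lr * (Z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)

    def encode(X_new):
        return np.tanh(X_new @ W_enc + b_enc)

    return encode, losses
```

For example, `encode, losses = train_autoencoder(X, k=4)` on a 30 x 20 non-negative abundance matrix yields 30 x 4 codes.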


2021 ◽  
Author(s):  
Wang Xiaoqi ◽  
Bin Xin ◽  
Zhijian Xu ◽  
Kenli Li ◽  
Fei Li ◽  
...  

Recent studies have demonstrated that an excessive inflammatory response is an important factor in the death of COVID-19 patients. In this study, we propose a network representation learning-based methodology, termed AIdrug2cov, to discover drug mechanisms and anti-inflammatory responses for patients with COVID-19. This work explores the multi-hub characteristic of a heterogeneous drug network integrating 8 unique networks. Inspired by this multi-hub characteristic, we design three billion special meta paths to train a deep representation model that learns low-dimensional vectors integrating long-range structural dependencies and complex semantic relations among network nodes. Using the representation vectors, AIdrug2cov identifies 40 potential targets and 22 high-confidence drugs that bind to tumor necrosis factor (TNF)-α or interleukin (IL)-6 to prevent excessive inflammatory responses in COVID-19 patients. Finally, we analyze mechanisms of action based on PubMed publications and ongoing clinical trials, and explore possible binding modes between the newly predicted drugs and targets via a docking program. In addition, the results in 5 pharmacological applications suggest that AIdrug2cov significantly outperforms 5 other state-of-the-art network representation approaches, further demonstrating the applicability of AIdrug2cov in the drug development field. In summary, AIdrug2cov is practically useful for accelerating COVID-19 therapeutic development. The source code and data can be downloaded from https://github.com/pengsl-lab/AIdrug2cov.git.
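Meta paths constrain a random walk on a heterogeneous network to follow a fixed sequence of node types. A minimal metapath2vec-style sampler is sketched below, assuming a cyclic pattern such as drug-target-drug-target; the node types, toy graph, and function name are hypothetical, and the abstract does not specify AIdrug2cov's actual meta paths or sampler.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Hypothetical sketch: random walk whose i-th node must have type
    metapath[i % len(metapath)]; e.g. ["drug", "target"] yields
    drug-target-drug-target-... walks."""
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match metapath[0]")
    walk = [start]
    for i in range(1, walk_length):
        wanted = metapath[i % len(metapath)]
        # Only neighbours of the required type are eligible next steps.
        candidates = [v for v in adj[walk[-1]] if node_type[v] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
    return walk
```

The resulting type-consistent walks would then feed a Skip-Gram-style or deep representation model in place of untyped random walks.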


Author(s):  
Qixiang Wang ◽  
Shanfeng Wang ◽  
Maoguo Gong ◽  
Yue Wu

The goal of network representation learning is to embed nodes so as to encode the proximity structures of a graph into a continuous low-dimensional feature space. In this paper, we propose a novel algorithm called node2hash, based on feature hashing, for generating node embeddings. This approach follows the encoder-decoder framework, which has two main mapping functions: an encoder that maps each node into a high-dimensional vector, and a decoder that hashes these vectors into a lower-dimensional feature space. More specifically, we first derive a proximity measure called expected distance as the target, which combines the position distribution and co-occurrence statistics of nodes over random walks, to build a proximity matrix; we then introduce a set of T different hash functions into feature hashing to generate uniformly distributed vector representations of nodes from the proximity matrix. Compared with existing state-of-the-art network representation learning approaches, node2hash shows competitive performance on multi-class node classification and link prediction tasks on three real-world networks from various domains.
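The hashing step can be sketched with the signed hashing trick: each non-zero entry j of a node's high-dimensional proximity row is added, with a pseudo-random ±1 sign, to bucket h_t(j) for each of T seeded hash functions. Averaging over the T hash functions, the MD5-based hashes, and the function names are assumptions for illustration; the expected-distance proximity itself is not reproduced.

```python
import hashlib

def _hash(seed, index, modulus):
    # Deterministic hash of (seed, index) into [0, modulus).
    digest = hashlib.md5(f"{seed}:{index}".encode()).digest()
    return int.from_bytes(digest[:8], "little") % modulus

def feature_hash(row, dim, num_hashes):
    """Hypothetical sketch: compress a proximity row into `dim` buckets using
    `num_hashes` seeded hash functions with +/-1 signs, averaged over T."""
    out = [0.0] * dim
    for t in range(num_hashes):
        for j, value in enumerate(row):
            if value == 0:
                continue  # sparse rows: only non-zeros contribute
            bucket = _hash(2 * t, j, dim)
            sign = 1.0 if _hash(2 * t + 1, j, 2) == 0 else -1.0
            out[bucket] += sign * value / num_hashes
    return out
```

Because the map is linear in the input row, the random signs make colliding entries cancel in expectation, which is the usual motivation for signed rather than unsigned hashing.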


Author(s):  
Yang Fang ◽  
Xiang Zhao ◽  
Zhen Tan

In this paper, we propose a novel network representation learning model, TransPath, to encode heterogeneous information networks (HINs). Traditional network representation learning models aim to learn the embeddings of a homogeneous network, whereas TransPath is able to capture the rich semantic and structural information of a HIN via meta-paths. We take advantage of the translation mechanism from knowledge graph embedding, regarding a meta-path, instead of an edge, as a translating operation from the first node to the last node. Moreover, we propose a user-guided meta-path sampling strategy that takes users' preferences as guidance; it can explore the semantics of a path more precisely and, at the same time, improve model efficiency by avoiding other noisy and meaningless meta-paths. We evaluate our model on two large-scale real-world datasets, DBLP and YELP, and on two benchmark tasks, similarity search and node classification. We observe that TransPath consistently and significantly outperforms other state-of-the-art baselines.
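The translation idea can be written as a distance score: a triple (first node, meta-path, last node) is plausible when head + path ≈ tail. The sketch below assumes a single vector per meta-path, an L2 distance, and a TransE-style margin ranking loss against a corrupted tail; TransPath's exact norm, margin, and negative sampling are not stated in the abstract, and the vectors in the usage are hypothetical.

```python
import math

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def translation_score(head, path, tail):
    """Distance ||head + path - tail||: small when the meta-path
    embedding translates the first node onto the last node."""
    return l2([h + p - t for h, p, t in zip(head, path, tail)])

def margin_loss(head, path, tail, corrupt_tail, margin=1.0):
    """Hinge loss pushing a true (head, path, tail) closer than a corrupted one."""
    return max(0.0, margin + translation_score(head, path, tail)
               - translation_score(head, path, corrupt_tail))
```

For instance, with head `[0, 0]`, path `[1, 1]`, and tail `[1, 1]` the score is 0, and a corrupted tail at distance 2 already satisfies the margin of 1, so the loss is 0.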


Mathematics ◽  
2021 ◽  
Vol 9 (15) ◽  
pp. 1767
Author(s):  
Xin Xu ◽  
Yang Lu ◽  
Yupeng Zhou ◽  
Zhiguo Fu ◽  
Yanjie Fu ◽  
...  

Network representation learning aims to learn low-dimensional, compressible, and distributed representational vectors of nodes in networks. Because obtaining label information for nodes in networks is expensive, many unsupervised network representation learning methods have been proposed, among which the random walk strategy is one of the most widely used approaches. However, existing random walk-based methods face several challenges: 1. insufficient explanation of what network knowledge is captured in the sampled walk paths; 2. adverse effects caused by the mixture of different kinds of information in networks; 3. poor generality, across different networks, of methods that rely on hyper-parameters. This paper proposes an information-explainable, random walk-based unsupervised network representation learning framework named Probabilistic Accepted Walk (PAW), which obtains network representations from the perspective of the stationary distribution of networks. In this framework, we design two stationary distributions based on nodes' self-information and the local information of networks to guide the proposed random walk strategy, which learns representational vectors of networks by sampling paths of nodes. Extensive experimental results demonstrate that PAW obtains more expressive representations than six other widely used unsupervised network representation learning baselines on four real-world networks in single-label and multi-label node classification tasks.
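One standard way to make a random walk have a prescribed stationary distribution π is Metropolis-Hastings acceptance over uniformly proposed neighbours. The sketch below uses that rule as a stand-in; whether PAW's acceptance probability takes exactly this form is not stated in the abstract, and the `target` weights (which would come from self-information or local information) and the function name are hypothetical.

```python
import random

def accepted_walk(adj, target, start, walk_length, rng):
    """Metropolis-Hastings walk on a connected undirected graph whose
    stationary distribution is proportional to target[v]. A uniformly chosen
    neighbour j of the current node i is accepted with probability
        min(1, target[j] * deg(i) / (target[i] * deg(j))),
    otherwise the walk stays at i for this step."""
    walk = [start]
    current = start
    for _ in range(walk_length - 1):
        proposal = rng.choice(adj[current])
        ratio = (target[proposal] * len(adj[current])) / (
            target[current] * len(adj[proposal]))
        if rng.random() < min(1.0, ratio):
            current = proposal
        walk.append(current)
    return walk
```

With the toy graph `{0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}` and weights 1, 2, 3, 4, the visit frequencies of a long walk approach 0.1, 0.2, 0.3, and 0.4.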


2019 ◽  
Vol 20 (15) ◽  
pp. 3648
Author(s):  
Xuan ◽  
Sun ◽  
Wang ◽  
Zhang ◽  
Pan

Identification of disease-associated miRNAs (disease miRNAs) is critical for understanding disease etiology and pathogenesis. Most previous methods focus on integrating the similarity and association information contained in heterogeneous miRNA-disease networks. However, these methods establish only shallow prediction models that fail to capture complex relationships among miRNA similarities, disease similarities, and miRNA-disease associations. We propose a prediction method based on network representation learning and convolutional neural networks to predict disease miRNAs, called CNNMDA. CNNMDA deeply integrates the similarity information of miRNAs and diseases, miRNA-disease associations, and representations of miRNAs and diseases in a low-dimensional feature space. The new deep learning-based framework was built to learn the original and global representations of a miRNA-disease pair. First, diverse biological premises about miRNAs and diseases were combined to construct the embedding layer in the left part of the framework, from a biological perspective. Second, the various connection edges in the miRNA-disease network, such as similarity and association connections, were dependent on each other; therefore, it was necessary to learn the low-dimensional representations of the miRNA and disease nodes based on the entire network. The right part of the framework learned the low-dimensional representation of each miRNA and disease node based on non-negative matrix factorization, and these representations were used to establish the corresponding embedding layer. Finally, the left and right embedding layers passed through convolutional modules to deeply learn the complex, non-linear relationships among the similarities and associations between miRNAs and diseases. Experimental results based on cross-validation indicated that CNNMDA yields superior performance compared with several state-of-the-art methods.
Furthermore, case studies on lung, breast, and pancreatic neoplasms demonstrated the powerful ability of CNNMDA to discover potential disease miRNAs.
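The non-negative matrix factorization step can be sketched with the classic Lee-Seung multiplicative updates, which keep both factors non-negative and do not increase the Frobenius reconstruction error. Rows of W would serve as miRNA representations and columns of H as disease representations; the input matrix, rank, iteration count, and function name are assumptions, and CNNMDA's exact objective (for example, any regularization) is not given in the abstract.

```python
import numpy as np

def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (m x n) as W @ H, with W (m x k) and
    H (k x n) non-negative, via Lee-Seung multiplicative updates for the
    Frobenius error ||V - W H||. `eps` guards against division by zero."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iterations):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Multiplicative updates need no step size, which keeps the sketch short; projected gradient or coordinate descent solvers usually converge faster on large matrices.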


Mathematics ◽  
2019 ◽  
Vol 7 (5) ◽  
pp. 460
Author(s):  
Zhonglin Ye ◽  
Haixing Zhao ◽  
Ke Zhang ◽  
Yu Zhu ◽  
Zhaoyang Wang

Representation learning aims to encode the relationships of research objects into low-dimensional, compressible, and distributed representation vectors. The purpose of network representation learning is to learn the structural relationships between network vertices, whereas knowledge representation learning models the entities and relationships in knowledge bases. In this paper, we first introduce the idea of knowledge representation learning into network representation learning; namely, we propose a new approach to model vertex triplet relationships based on DeepWalk without TransE. Consequently, we propose an optimized network representation learning algorithm using multi-relational data, MRNR, which introduces the multi-relational data between vertices into the procedures of network representation learning. Importantly, we adopt a higher-order transformation strategy to optimize the learned network representation vectors. The purpose of MRNR is to let multi-relational data (triplets) effectively guide and constrain the procedures of network representation learning. The experimental results demonstrate that MRNR learns discriminative network representations, which show better performance on network classification, visualization, and case study tasks than the baseline algorithms used in this paper.
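MRNR builds on DeepWalk, whose sampling stage is easy to sketch: truncated uniform random walks from every vertex, followed by extraction of (center, context) pairs within a window for Skip-Gram training. The sketch below covers only this stage; the triplet constraints and higher-order transformation are not reproduced, and the function names are hypothetical.

```python
import random

def uniform_walks(adj, walks_per_node, walk_length, rng):
    """DeepWalk-style truncated random walks: each step moves to a
    uniformly chosen neighbour; every node starts `walks_per_node` walks."""
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adj)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

def skipgram_pairs(walk, window):
    """(center, context) pairs within `window` positions, as consumed by Skip-Gram."""
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs
```

For example, `skipgram_pairs(["a", "b", "c"], 1)` returns `[("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]`.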


Semantic Web ◽  
2022 ◽  
pp. 1-16
Author(s):  
Hu Zhang ◽  
Jingjing Zhou ◽  
Ru Li ◽  
Yue Fan

With the rapid development of neural networks, much attention has been focused on network embedding for complex network data, which aims to learn low-dimensional embeddings of the nodes in a network and to apply the learned network representations effectively to various graph-based analytical tasks. Two typical kinds of models exist, namely shallow random walk-based network representation methods and deep learning models such as graph convolutional networks (GCNs). The former can capture the linear structure of the network using depth-first search (DFS) and breadth-first search (BFS), whereas Hierarchical GCN (HGCN) is an unsupervised graph embedding model that can describe the global nonlinear structure of the network by aggregating node information. However, neither of these two kinds of models can simultaneously capture both the nonlinear and the linear structural information of nodes. Thus, this paper explores the nodal characteristics of nonlinear and linear structures and proposes an unsupervised representation method based on HGCN that jointly learns shallow and deep models. Experiments on node classification and dimension-reduction visualization are carried out on citation, language, and traffic networks. The results show that, compared with existing shallow network representation models and deep network models, the proposed model achieves better performance in terms of micro-F1, macro-F1, and accuracy scores.
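The deep half of such a model aggregates neighbour information; the standard graph-convolution step is H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), where D is the degree matrix of A + I. The sketch implements that step and a simple concatenation of shallow and deep embeddings. It is not HGCN's hierarchical architecture, and concatenation as the joining strategy is an assumption.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    Each node mixes its own features with its neighbours' features using
    symmetric degree normalisation, then applies a linear map W."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def joint_embedding(shallow, deep):
    """Hypothetical fusion: concatenate a shallow (random-walk) embedding
    with a deep (GCN) embedding for each node."""
    return np.concatenate([shallow, deep], axis=1)
```

On the path graph 0-1-2 with identity features and weights, the output is the normalised adjacency itself, e.g. 0.5 on the diagonal for the end nodes.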


2020 ◽  
Author(s):  
Yihan Zhao ◽  
Kai Zheng ◽  
Baoyi Guan ◽  
Mengmeng Guo ◽  
Lei Song ◽  
...  

To elucidate novel molecular mechanisms of known drugs, efficient and feasible computational methods for predicting potential drug-target interactions (DTIs) would be of great importance. A novel computational model called DLDTI was developed for predicting DTIs based on network representation learning and convolutional neural networks. The proposed approach simultaneously fuses the topology of complex networks and diverse information from heterogeneous data sources, and copes with the noisy, incomplete, and high-dimensional nature of large-scale biological data by learning low-dimensional and rich depth features of drugs and proteins. The low-dimensional feature vectors were used to train DLDTI to obtain an optimal mapping space and to infer new DTIs by ranking DTI candidates based on their proximity to the optimal mapping space. DLDTI achieves promising performance under 5-fold cross-validation, with an AUC value of 0.9172, higher than that of methods based on different classifiers or different feature combination techniques. Moreover, biomedical experiments were also conducted to validate DLDTI's performance. Consistent with the predicted result, tetramethylpyrazine, a member of the pyrazines, reduced atherosclerosis progression and inhibited signal transduction in platelets via the PI3K/Akt, cAMP, and calcium signaling pathways. The source code and datasets explored in this work are available at https://github.com/CUMTzackGit/DLDTI
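The reported AUC under 5-fold cross-validation can be read as the probability that a randomly chosen true drug-target pair is ranked above a randomly chosen negative pair. The quadratic-time sketch below computes exactly that (ties count one half); the scores, labels, and function name are hypothetical, and a rank-based formula would be preferable for large candidate sets.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive is scored higher; tied pairs count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, scores `[0.9, 0.2, 0.3, 0.1]` with labels `[1, 1, 0, 0]` give 0.75, since one of the four positive-negative pairs is misordered.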

