Graph representation learning: a survey

Author(s):  
Fenxiao Chen ◽  
Yun-Cheng Wang ◽  
Bin Wang ◽  
C.-C. Jay Kuo

Abstract Research on graph representation learning has received great attention in recent years since most data in real-world applications come in the form of graphs. High-dimensional graph data are often in irregular forms and are more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert the raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods on small and large data sets and compare their performance. Finally, potential applications and future directions are presented.
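To make the embedding task concrete, the sketch below maps each node of a small graph to a low-dimensional vector with a Laplacian eigenmap. It is only an illustrative example of the generic graph embedding problem described above, not a method from the survey; the graph and the target dimension are arbitrary choices.

```python
# Minimal sketch of graph embedding: nodes -> low-dimensional vectors that
# preserve structure, here via eigenvectors of the normalized graph Laplacian.
import numpy as np
import networkx as nx

def spectral_embedding(G: nx.Graph, dim: int = 2) -> dict:
    """Embed nodes using the smallest non-trivial Laplacian eigenvectors."""
    nodes = list(G.nodes())
    L = nx.normalized_laplacian_matrix(G, nodelist=nodes).toarray()
    eigvals, eigvecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    coords = eigvecs[:, 1:dim + 1]                 # skip the trivial first eigenvector
    return {n: coords[i] for i, n in enumerate(nodes)}

if __name__ == "__main__":
    G = nx.karate_club_graph()                     # small benchmark graph
    emb = spectral_embedding(G, dim=2)
    print(emb[0])                                  # 2-D vector for node 0
```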

2015 ◽  
Vol 4 (2) ◽  
pp. 336
Author(s):  
Alaa Najim

Using the idea of dimensionality reduction to visualize graph data sets can preserve the properties of the original space and reveal the underlying information shared among data points. Continuity Trustworthy Graph Embedding (CTGE) is a new method introduced in this paper to improve the faithfulness of graph visualization. We apply CTGE to graphs to obtain a new, understandable representation that is easier to analyze and study. Several experiments on real graph data sets test the effectiveness and efficiency of the proposed method and show that CTGE generates a highly faithful graph representation compared with other methods.
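CTGE itself is not detailed in the abstract. As a rough illustration of the same goal, the sketch below embeds a graph into two dimensions with classical MDS on shortest-path distances and measures the trustworthiness of the layout (continuity is the analogous measure with the roles of the two spaces swapped). All choices here are assumptions, not the CTGE algorithm.

```python
# Generic graph visualization sketch: shortest-path distances -> 2-D layout,
# plus a trustworthiness score as a faithfulness measure.
import networkx as nx
import numpy as np
from sklearn.manifold import MDS, trustworthiness

G = nx.karate_club_graph()
nodes = list(G.nodes())
# All-pairs shortest-path distances serve as the "original space".
D = np.array([[nx.shortest_path_length(G, u, v) for v in nodes] for u in nodes])

# Embed into 2-D with metric MDS on the precomputed distances.
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)

# Faithfulness of the layout: do 2-D neighbours match graph neighbours?
print("trustworthiness:", trustworthiness(D, coords, n_neighbors=5, metric="precomputed"))
```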


2020 ◽  
Vol 34 (04) ◽  
pp. 4132-4139
Author(s):  
Huiting Hong ◽  
Hantao Guo ◽  
Yucheng Lin ◽  
Xiaoqing Yang ◽  
Zang Li ◽  
...  

In this paper, we focus on graph representation learning of heterogeneous information networks (HINs), in which various types of vertices are connected by various types of relations. Most existing methods for HINs revise homogeneous graph embedding models via meta-paths to learn a low-dimensional vector space of the HIN. In this paper, we propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) to directly encode the structural information of an HIN without meta-paths and achieve more informative representations. With this method, domain experts are no longer needed to design meta-path schemes, and the heterogeneous information can be processed automatically by our proposed model. Specifically, we implicitly represent heterogeneous information in two ways: 1) we model the transformation between heterogeneous vertices through a projection into low-dimensional entity spaces; 2) we then apply a graph neural network to aggregate multi-relational information from the projected neighborhood by means of an attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in an HIN, a cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning with full use of information. Experiments conducted on three public datasets demonstrate that our proposed models achieve significant and consistent improvements compared to state-of-the-art solutions.
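The sketch below is not the authors' HetSANN code; it only illustrates, in PyTorch, the two ingredients named in the abstract: a type-specific projection into a shared low-dimensional space, followed by attention-based aggregation over the projected neighborhood. The vertex types, dimensions, and additive attention form are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TypeAwareAttention(nn.Module):
    """One layer: type-specific projection, then attention aggregation."""
    def __init__(self, type_dims: dict, out_dim: int):
        super().__init__()
        # 1) one projection per vertex type (heterogeneous -> shared space)
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim) for t, d in type_dims.items()})
        # 2) additive attention over pairs of projected vectors
        self.att = nn.Linear(2 * out_dim, 1)

    def forward(self, x: list, node_types: list, adj: list):
        # x[i]: feature vector of node i; node_types[i]: its type string;
        # adj[i]: neighbour indices of node i (assumed to include i itself).
        h = [self.proj[node_types[i]](x[i]) for i in range(len(x))]   # shared space
        out = []
        for i, nbrs in enumerate(adj):
            scores = torch.stack([self.att(torch.cat([h[i], h[j]])) for j in nbrs])
            alpha = F.softmax(scores, dim=0)                          # attention weights
            out.append(sum(a * h[j] for a, j in zip(alpha, nbrs)))
        return torch.stack(out)

# Toy usage: two vertex types with different feature sizes.
layer = TypeAwareAttention({"author": 4, "paper": 6}, out_dim=8)
x = [torch.randn(4), torch.randn(6), torch.randn(4)]
out = layer(x, ["author", "paper", "author"], adj=[[0, 1], [0, 1, 2], [1, 2]])
print(out.shape)   # torch.Size([3, 8])
```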


2020 ◽  
Author(s):  
Robert L. Peach ◽  
Alexis Arnaudon ◽  
Julia A. Schmidt ◽  
Henry A. Palasciano ◽  
Nathan R. Bernier ◽  
...  

Abstract Networks are widely used as mathematical models of complex systems across many scientific disciplines, not only in biology and medicine but also in the social sciences, physics, computing and engineering. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In the analysis of real-world graphs, it is crucial to integrate systematically a large number of diverse graph features in order to characterise and classify networks, as well as to aid network-based scientific discovery. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph data sets that computes several thousand graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features. We also illustrate how HCGA can be used for network-based discovery through two examples where data is naturally represented as graphs: the clustering of a data set of images of neuronal morphologies, and a regression problem to predict charge transfer in organic semiconductors based on their structure. HCGA is an open platform that can be expanded to include further graph properties and statistical learning tools to allow researchers to leverage the wide breadth of graph-theoretical research to quantitatively analyse and draw insights from network data.
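HCGA computes several thousand features; the toy sketch below only illustrates the same workflow (many interpretable graph features fed to a supervised classifier, with feature importances for interpretability) using a handful of hand-picked descriptors. It is not the HCGA API, and the feature choices and synthetic data set are assumptions.

```python
# Tiny illustration of feature-based graph classification.
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def graph_features(G: nx.Graph) -> np.ndarray:
    """A handful of classical graph descriptors for one network."""
    degs = [d for _, d in G.degree()]
    return np.array([
        G.number_of_nodes(),
        G.number_of_edges(),
        nx.density(G),
        nx.average_clustering(G),
        np.mean(degs),
        np.max(degs),
        np.nan_to_num(nx.degree_assortativity_coefficient(G)),
    ])

# Toy data set: distinguish Erdos-Renyi graphs from small-world graphs.
graphs = [nx.gnp_random_graph(50, 0.1, seed=i) for i in range(20)] + \
         [nx.watts_strogatz_graph(50, 4, 0.1, seed=i) for i in range(20)]
labels = [0] * 20 + [1] * 20

X = np.vstack([graph_features(G) for G in graphs])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
# Feature importances keep the interpretability emphasised in the abstract.
print(dict(zip(["n", "m", "density", "clustering", "mean_deg", "max_deg", "assortativity"],
               clf.feature_importances_.round(3))))
```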


2021 ◽  
Author(s):  
Chen Qiao ◽  
Yuanhua Huang

RNA velocity is a promising technique to reveal transient cellular dynamics among a heterogeneous cell population and quantify their transitions from single-cell transcriptome experiments. However, the cell transitions estimated from high dimensional RNA velocity are often unstable or inaccurate, partly due to the high technical noise and less informative projection. Here, we present VeloAE, a tailored representation learning method to learn a low-dimensional representation of RNA velocity on which cell transitions can be robustly estimated. From various experimental datasets, we show that VeloAE can both accurately identify stimulation dynamics in time-series designs and effectively capture the expected cellular differentiation in different biological systems. VeloAE therefore enhances the usefulness of RNA velocity for studying a wide range of biological processes.
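VeloAE's architecture is not specified in the abstract; the sketch below only illustrates the general idea of learning a low-dimensional representation of expression with an autoencoder and carrying the high-dimensional velocity vectors into the same latent space. The linear encoder, dimensions, and placeholder data are simplifying assumptions, not the authors' implementation.

```python
# Sketch: autoencoder latent space shared by expression and velocity.
import torch
import torch.nn as nn

n_cells, n_genes, latent = 500, 2000, 32
expression = torch.randn(n_cells, n_genes)     # placeholder spliced counts
velocity = torch.randn(n_cells, n_genes)       # placeholder high-dimensional RNA velocity

class VeloAutoencoder(nn.Module):
    def __init__(self, n_genes, latent):
        super().__init__()
        self.encoder = nn.Linear(n_genes, latent)
        self.decoder = nn.Linear(latent, n_genes)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

model = VeloAutoencoder(n_genes, latent)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                            # reconstruction-only training loop
    z, recon = model(expression)
    loss = nn.functional.mse_loss(recon, expression)
    opt.zero_grad(); loss.backward(); opt.step()

# With a linear encoder the velocity direction can be mapped into the latent
# space (subtracting the bias term), where cell transitions are then estimated.
z = model.encoder(expression)
z_velocity = model.encoder(velocity) - model.encoder(torch.zeros_like(velocity))
print(z.shape, z_velocity.shape)               # torch.Size([500, 32]) each
```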


2020 ◽  
Vol 10 (8) ◽  
pp. 2651
Author(s):  
Su Jeong Choi ◽  
Hyun-Je Song ◽  
Seong-Bae Park

Knowledge bases such as Freebase, YAGO, DBPedia, and Nell contain a number of facts with various entities and relations. Since they store many facts, they are regarded as core resources for many natural language processing tasks. Nevertheless, they are not normally complete and have many missing facts. Such missing facts keep them from being used in diverse applications in spite of their usefulness. Therefore, it is important to complete knowledge bases. Knowledge graph embedding is one of the promising approaches to completing a knowledge base, and thus many variants of knowledge graph embedding have been proposed. It maps all entities and relations in a knowledge base onto a low-dimensional vector space. Then, candidate facts that are plausible in the space are determined as missing facts. However, any single knowledge graph embedding is insufficient to complete a knowledge base. As a solution to this problem, this paper defines knowledge base completion as a ranking task and proposes a committee-based knowledge graph embedding model for improving the performance of knowledge base completion. Since each knowledge graph embedding has its own idiosyncrasy, we make up a committee of various knowledge graph embeddings to reflect various perspectives. After ranking all candidate facts according to their plausibility computed by the committee, the top-k facts are chosen as missing facts. Our experimental results on two data sets show that the proposed model achieves higher performance than any single knowledge graph embedding and shows robust performance regardless of k. These results prove that the proposed model considers various perspectives in measuring the plausibility of candidate facts.
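A minimal sketch of the committee idea follows: several embedding models each score a candidate fact, their ranks are combined, and the top-k candidates are returned as missing facts. The two scoring functions are generic illustrative stand-ins (TransE- and DistMult-style) and the embeddings are random placeholders, not the paper's trained committee.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility: small ||h + r - t|| means plausible."""
    return -np.linalg.norm(h + r - t)

def distmult_score(h, r, t):
    """DistMult-style plausibility: large trilinear product means plausible."""
    return float(np.sum(h * r * t))

def committee_rank(candidates, embeddings, k=3):
    """Rank candidate (head, relation, tail) facts by the committee's mean rank."""
    members = [transe_score, distmult_score]
    per_member_ranks = []
    for score in members:
        scores = [score(*[embeddings[x] for x in c]) for c in candidates]
        order = np.argsort(np.argsort(scores)[::-1])   # rank 0 = most plausible
        per_member_ranks.append(order)
    mean_rank = np.mean(per_member_ranks, axis=0)      # lower is better
    return [candidates[i] for i in np.argsort(mean_rank)[:k]]

# Toy usage with random entity/relation vectors.
rng = np.random.default_rng(0)
embeddings = {name: rng.normal(size=16) for name in ["paris", "france", "berlin", "capital_of"]}
candidates = [("paris", "capital_of", "france"), ("berlin", "capital_of", "france")]
print(committee_rank(candidates, embeddings, k=1))
```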


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A24-A26
Author(s):  
J Hammarlund ◽  
R Anafi

Abstract Introduction We recently used unsupervised machine learning to order genome scale data along a circadian cycle. CYCLOPS (Anafi et al PNAS 2017) encodes high dimensional genomic data onto an ellipse and offers the potential to identify circadian patterns in large data sets. This approach requires many samples from a wide range of circadian phases. Individual data sets often lack sufficient samples. Composite expression repositories vastly increase the available data. However, these agglomerated datasets also introduce technical (e.g. processing site) and biological (e.g. age or disease) confounders that may hamper circadian ordering. Methods Using the Flux machine learning library we expanded the CYCLOPS network. We incorporated additional encoding and decoding layers that model the influence of labeled confounding variables. These layers feed into a fully connected autoencoder with a circular bottleneck, encoding the estimated phase of each sample. The expanded network simultaneously estimates the influence of confounding variables along with circadian phase. We compared the performance of the original and expanded networks using both real and simulated expression data. In a first test, we used time-labeled data from a single center describing human cortical samples obtained at autopsy. To generate a second, idealized processing center, we introduced gene specific biases in expression along with a bias in sample collection time. In a second test, we combined human lung biopsy data from two medical centers. Results The performance of the original CYCLOPS network degraded with the introduction of increasing, non-circadian confounds. The expanded network was able to more accurately assess circadian phase over a wider range of confounding influences. Conclusion The addition of labeled confounding variables into the network architecture improves circadian data ordering. The use of the expanded network should facilitate the application of CYCLOPS to multi-center data and expand the data available for circadian analysis. Support This work was supported by the National Cancer Institute (1R01CA227485-01)
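The authors built their network with the Flux library in Julia; the PyTorch sketch below only illustrates the two architectural ideas described in the abstract: a circular bottleneck that encodes each sample's phase, and extra layers that absorb a labeled confounder (modelled here, as a simplifying assumption, by an additive batch-specific offset).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircularAutoencoder(nn.Module):
    def __init__(self, n_genes, n_batches):
        super().__init__()
        self.batch_offset = nn.Embedding(n_batches, n_genes)  # confounder layer
        self.encode = nn.Linear(n_genes, 2)                   # to a 2-D bottleneck
        self.decode = nn.Linear(2, n_genes)

    def forward(self, x, batch):
        x_adj = x - self.batch_offset(batch)                  # remove confounder estimate
        z = F.normalize(self.encode(x_adj), dim=1)            # circular bottleneck: unit circle
        recon = self.decode(z) + self.batch_offset(batch)     # add confounder back
        phase = torch.atan2(z[:, 1], z[:, 0])                 # estimated circadian phase
        return recon, phase

# Toy usage: 200 samples, 500 genes, 2 processing centres.
model = CircularAutoencoder(n_genes=500, n_batches=2)
x = torch.randn(200, 500)
batch = torch.randint(0, 2, (200,))
recon, phase = model(x, batch)
loss = F.mse_loss(recon, x)   # trained jointly, so phase and confounders are estimated together
print(phase.shape)            # torch.Size([200])
```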


2006 ◽  
Vol 2 (14) ◽  
pp. 592-592
Author(s):  
Paresh Prema ◽  
Nicholas A. Walton ◽  
Richard G. McMahon

Observational astronomy is entering an exciting new era with large surveys delivering deep multi-wavelength data over a wide range of the electromagnetic spectrum. The last ten years have seen a growth in the study of high redshift galaxies discovered with the method pioneered by Steidel et al. (1995) to identify galaxies at z>1. The technique is designed to take advantage of the multi-wavelength data now available to astronomers, which can extend from X-rays to radio wavelengths. The technique is fast becoming a useful way to study large samples of objects at these high redshifts, and we are currently designing and implementing an automated technique to study these samples of objects. However, large surveys produce large data sets that have now reached terabytes in size (e.g. the Sloan Digital Sky Survey, <http://www.sdss.org>) and will reach petabytes over the next 10 years (e.g., LSST, <http://www.lsst.org>). The Virtual Observatory is now providing a means to deal with this issue, and users are now able to access many data sets in a quicker, more useful form.
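At its core, the selection pioneered by Steidel et al. amounts to automated colour cuts on multi-band photometry. The sketch below shows the shape of such a selection only; the band names and threshold values are purely illustrative placeholders, not the actual survey criteria.

```python
# Illustrative dropout-style colour selection on a toy photometric catalogue.
import numpy as np

def dropout_candidates(u_mag, g_mag, r_mag, ug_min=1.5, gr_max=1.2):
    """Flag U-dropout candidates: a very red U-G colour (flux drop blueward of
    the break) combined with a relatively flat G-R colour redward of it."""
    u_g = u_mag - g_mag
    g_r = g_mag - r_mag
    return (u_g > ug_min) & (g_r < gr_max)

# Toy catalogue of three sources (magnitudes are made-up).
u = np.array([24.8, 22.1, 25.5])
g = np.array([22.9, 21.8, 23.4])
r = np.array([22.6, 21.5, 22.9])
print(dropout_candidates(u, g, r))   # boolean mask of candidate high-z galaxies
```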


Author(s):  
Yuhan Wang ◽  
Weidong Xiao ◽  
Zhen Tan ◽  
Xiang Zhao

Abstract Knowledge graphs are typical multi-relational structures consisting of many entities and relations. Nonetheless, existing knowledge graphs are still sparse and far from being complete. To refine knowledge graphs, representation learning is utilized to embed entities and relations into low-dimensional spaces. Many existing knowledge graph embedding models focus on learning latent features under the closed-world assumption but overlook that each knowledge graph is changeable. In this paper, we propose a knowledge graph representation learning model, called Caps-OWKG, which leverages the capsule network to capture both known and unknown triplet features in an open-world knowledge graph. It combines descriptive text and the knowledge graph to obtain a descriptive embedding and a structural embedding simultaneously. Then, both embeddings are used to calculate the probability that a triplet is authentic. We verify the performance of Caps-OWKG on the link prediction task with two common datasets, FB15k-237-OWE and DBPedia50k. The experimental results are better than other baselines and achieve state-of-the-art performance.
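Caps-OWKG's capsule network is not reproduced here; the sketch below only illustrates the general scheme of combining a structural embedding with a descriptive-text embedding and scoring a triplet's plausibility. The simple bag-of-words text encoder, the averaging, and the TransE-style score are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TripletScorer(nn.Module):
    def __init__(self, n_entities, n_relations, vocab_size, dim=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)        # structural embedding
        self.rel = nn.Embedding(n_relations, dim)
        self.text = nn.EmbeddingBag(vocab_size, dim)    # bag-of-words stand-in for the text encoder

    def entity_vec(self, ent_id, desc_tokens):
        # Combine structural and descriptive embeddings (here: simple averaging).
        return 0.5 * (self.ent(ent_id) + self.text(desc_tokens))

    def forward(self, h_id, h_desc, r_id, t_id, t_desc):
        h = self.entity_vec(h_id, h_desc)
        r = self.rel(r_id)
        t = self.entity_vec(t_id, t_desc)
        # TransE-style plausibility mapped to a probability with a sigmoid.
        return torch.sigmoid(-torch.norm(h + r - t, dim=1))

model = TripletScorer(n_entities=100, n_relations=10, vocab_size=5000)
prob = model(torch.tensor([3]), torch.tensor([[11, 42, 7]]),
             torch.tensor([1]), torch.tensor([8]), torch.tensor([[19, 4, 4]]))
print(prob)   # plausibility of one candidate triplet
```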


2019 ◽  
Author(s):  
A. Viehweger ◽  
S. Krautwurst ◽  
D. H. Parks ◽  
B. König ◽  
M. Marz

Abstract An ever-growing number of metagenomes can be used for biomining and the study of microbial functions. The use of learning algorithms in this context has been hindered, because they often need input in the form of low-dimensional, dense vectors of numbers. We propose such a representation for genomes, called nanotext, that scales to very large data sets. The underlying model is learned from a corpus of nearly 150 thousand genomes spanning 750 million protein domains. We treat the protein domains in a genome like words in a document, assuming that protein domains in a similar context have similar "meaning". This meaning can be distributed by a neural net over a vector of numbers. The resulting vectors efficiently encode function, preserve known phylogeny, capture subtle functional relationships and are robust against genome incompleteness. The "functional" distance between two vectors complements nucleotide-based distance, so that genomes can be identified as similar even though their nucleotide identity is low. nanotext can thus encode (meta)genomes for direct use in downstream machine learning tasks. We show this by predicting plausible culture media for metagenome assembled genomes (MAGs) from the Tara Oceans Expedition using their genome content only. nanotext is freely released under a BSD licence (https://github.com/phiweger/nanotext).
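nanotext's released model lives at the URL above; the sketch below only illustrates the underlying idea of treating protein domains as words and genomes as documents, using gensim's Doc2Vec. The domain identifiers and the tiny corpus are made-up placeholders.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each "document" is a genome, each "word" a protein-domain identifier (e.g. Pfam).
genomes = {
    "genome_A": ["PF00001", "PF00005", "PF00072", "PF00005"],
    "genome_B": ["PF00005", "PF00072", "PF00106"],
    "genome_C": ["PF00001", "PF00106", "PF00271"],
}
corpus = [TaggedDocument(words=domains, tags=[name]) for name, domains in genomes.items()]

model = Doc2Vec(corpus, vector_size=32, window=5, min_count=1, epochs=50)

# Dense "functional" vectors per genome; their similarity complements
# nucleotide-based distance between genomes.
print(model.dv["genome_A"][:5])
print(model.dv.most_similar("genome_A"))
```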


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Zhichao Hu ◽  
Likun Liu ◽  
Haining Yu ◽  
Xiangzhan Yu

Cybersecurity has become an important part of our daily lives. Accordingly, there has been much research on intrusion detection based on host system calls in recent years. Compared to sentences, a sequence of system calls has unique characteristics. It contains implicit pattern relationships that are less sensitive to the order of occurrence and that have less impact on the classification results when the frequency of system calls varies slightly. There are also various properties such as resource consumption, execution time, predefined rules, and empirical weights of system calls. Commonly used word embedding methods, such as BoW, TF-IDF, N-Gram, and Word2Vec, do not fully exploit such relationships in sequences, nor do they conveniently support attribute expansion. To solve these problems, we introduce Graph Representation based Intrusion Detection (GRID), an intrusion detection framework based on graph representation learning. It captures the potential relationships between system calls to learn better features, and it is applicable to a wide range of back-end classifiers. GRID utilizes a new sequence embedding method, Graph Random State Embedding (GRSE), which uses graph structures to model a finite number of sequence items and represent the structural association relationships between them. A more efficient representation of sequence embeddings is generated by random walks, word embeddings, and graph pooling. Moreover, it can be easily extended to sequences with attributes. Our experimental results on the ADFA-LD dataset show that GRID achieves an average improvement of 2% using the GRSE embedding method compared to other embedding methods.
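GRID/GRSE's implementation is not reproduced here; the sketch below only illustrates the pipeline named in the abstract: build a transition graph over the system calls of a trace, generate random walks, learn node embeddings with Word2Vec, and mean-pool them into one sequence vector. The parameters and the toy trace are illustrative assumptions.

```python
import random
import networkx as nx
import numpy as np
from gensim.models import Word2Vec

def sequence_to_graph(calls):
    """Directed graph whose nodes are system calls and edges are observed transitions."""
    G = nx.DiGraph()
    for a, b in zip(calls, calls[1:]):
        G.add_edge(a, b)
    return G

def random_walks(G, num_walks=10, walk_len=8, seed=0):
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in G.nodes():
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = list(G.successors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks

def embed_sequence(calls, dim=16):
    G = sequence_to_graph(calls)
    walks = random_walks(G)
    w2v = Word2Vec(sentences=walks, vector_size=dim, window=3, min_count=1, epochs=20)
    return np.mean([w2v.wv[c] for c in G.nodes()], axis=0)   # graph pooling: mean of node vectors

trace = ["open", "read", "read", "mmap", "read", "write", "close"]
print(embed_sequence(trace).shape)   # (16,)
```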

