Principled approach to the selection of the embedding dimension of networks

Network embedding is a general-purpose machine learning technique that encodes network structure in vector spaces with tunable dimension. Choosing an appropriate embedding dimension – small enough to be efficient and large enough to be effective – is challenging but necessary to generate embeddings applicable to a multitude of tasks. Existing strategies for the selection of the embedding dimension rely on performance maximization in downstream tasks. Here, we propose a principled method such that all structural information of a network is parsimoniously encoded. The method is validated on various embedding algorithms and a large corpus of real-world networks. The embedding dimension selected by our method in real-world networks suggests that efficient encoding in low-dimensional spaces is usually possible.
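The trade-off between a small and an effective dimension can be illustrated with a simple heuristic: pick the smallest dimension whose loss is within a tolerance of the best loss observed. This is only a sketch and not the authors' procedure, which the abstract does not spell out; the function name `select_dimension`, the `tol` parameter, and the loss values are hypothetical.

```python
def select_dimension(losses, tol=0.05):
    """Return the smallest dimension whose loss is within `tol` of the best.

    `losses` maps an embedding dimension to a loss value (lower is better).
    Both the loss definition and the tolerance are assumptions made for
    illustration; they are not taken from the paper.
    """
    if not losses:
        raise ValueError("losses must contain at least one dimension")
    best = min(losses.values())
    # Scan dimensions from small to large and stop at the first good one.
    for dim in sorted(losses):
        if losses[dim] <= best + tol:
            return dim


# Toy values: the loss plateaus after dimension 8.
toy_losses = {2: 0.90, 4: 0.50, 8: 0.12, 16: 0.10, 32: 0.10}
chosen = select_dimension(toy_losses, tol=0.05)
```

With these toy values, dimension 8 is chosen because larger dimensions improve the loss by less than the tolerance.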

Network Embedding on Hierarchical Community Structure Network

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3434747 ◽

2021 ◽

Vol 15 (4) ◽

pp. 1-23

Author(s):

Guojie Song ◽

Yun Wang ◽

Lun Du ◽

Yi Li ◽

Junshan Wang

Keyword(s):

Community Structure ◽

Structural Information ◽

Spherical Surface ◽

Network Embedding ◽

The Galaxy ◽

Community Information ◽

The Hierarchical Structure ◽

Network Properties ◽

Multi Class Classification ◽

Low Dimensional

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low-dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification, network visualization, and link prediction. The source code of GNE is available online.

Relation Structure-Aware Heterogeneous Information Network Embedding

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014456 ◽

2019 ◽

Vol 33 ◽

pp. 4456-4463 ◽

Cited By ~ 8

Author(s):

Yuanfu Lu ◽

Chuan Shi ◽

Linmei Hu ◽

Zhiyuan Liu

Keyword(s):

Real World ◽

Dimensional Space ◽

Structural Characteristics ◽

Information Network ◽

Network Embedding ◽

Heterogeneous Information Network ◽

Heterogeneous Information ◽

Real World Datasets ◽

Low Dimensional ◽

Embedding Methods

Heterogeneous information network (HIN) embedding aims to embed multiple types of nodes into a low-dimensional space. Although most existing HIN embedding methods consider heterogeneous relations in HINs, they usually employ a single model for all relations without distinction, which inevitably restricts the capability of network embedding. In this paper, we take the structural characteristics of heterogeneous relations into consideration and propose a novel Relation structure-aware Heterogeneous Information Network Embedding model (RHINE). By exploring real-world networks with thorough mathematical analysis, we present two structure-related measures that consistently separate heterogeneous relations into two categories: Affiliation Relations (ARs) and Interaction Relations (IRs). To respect the distinctive characteristics of these relations, RHINE uses different models specifically tailored to handle ARs and IRs, which can better capture the structures and semantics of the networks. Finally, we combine and optimize these models in a unified and elegant manner. Extensive experiments on three real-world datasets demonstrate that our model significantly outperforms the state-of-the-art methods in various tasks, including node clustering, link prediction, and node classification.

P435 Real-world clinical outcomes after elective discontinuation of first-use biologics in IBD patients

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjz203.564 ◽

2020 ◽

Vol 14 (Supplement_1) ◽

pp. S395-S396

Author(s):

U N Shivaji ◽

A Bazarova ◽

T Critchlow ◽

O M Nardone ◽

S C Smith ◽

...

Keyword(s):

Real World ◽

Adverse Outcomes ◽

Machine Learning Technique ◽

Tertiary Referral Centre ◽

Kaplan Meier ◽

Learning Technique ◽

Logistic Regressions ◽

The Uk ◽

Almost All

Background: In real-world clinical practice, biologics may be discontinued for a variety of reasons, including discontinuation by gastroenterologists, in the UK. The aim of the study was to report on outcomes after discontinuation in IBD patients after a minimum follow-up of 24 months. Methods: All IBD patients who discontinued their first-use biologics between January 2013 and December 2016 were identified from the EMR at a tertiary referral centre to ensure at least 24 months of follow-up. Reasons for discontinuation and pre-defined adverse outcomes (steroid and other rescue therapies, hospitalisations, surgery including perianal) were recorded. The data were analysed using multivariable and univariable logistic regressions within a machine learning technique in order to predict adverse outcomes within the stated timeframe. We performed Kaplan-Meier survival analysis for those patients who had biologics electively discontinued. Results: In total, 147 IBD patients who discontinued biologics (M = 74; median age 39 years; CD = 110) were identified. Follow-up ranged from 24 to 60 months (median 40 months). One hundred and forty-four patients (98%) discontinued anti-TNF (1% biosimilar anti-TNF and 1% vedolizumab) and 69 (47%) were on thiopurines at the start of biologics. Fifty-nine patients (40%) had biologics electively discontinued by their gastroenterologists and they were analysed separately. Of the 59 elective discontinuations, 22 (37%) continued thiopurines after biologic stoppage; 37/59 (63%) patients had at least one adverse outcome (AO) within 6 months of discontinuation and 42/59 (71%) patients restarted biologics. Conclusion: The majority of patients who electively discontinued biologics had IBD-related AO rapidly after the stoppage and needed to restart biologics. Almost all patients had AO by the end of the study follow-up period. This should be discussed with patients when considering discontinuation of biologics.
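For readers unfamiliar with the Kaplan-Meier analysis mentioned in the Methods, a minimal sketch of the product-limit estimator follows. It multiplies, at each time with at least one event, the running survival by (1 - events / number at risk). The function name `kaplan_meier` and the toy data are hypothetical and are not taken from the study.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival curve.

    durations: follow-up time for each subject.
    events: 1 if the event (e.g. an adverse outcome) was observed at that
            time, 0 if the subject was censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events observed at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += int(bool(data[i][1]))
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data only: five subjects, two of them censored.
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects lower the number at risk at later times without producing a step in the curve.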

Optimal Feature Selection of Technical Indicator and Stock Prediction Using Machine Learning Technique

Emerging Technologies in Computer Engineering: Microservices in Big Data Analytics - Communications in Computer and Information Science ◽

10.1007/978-981-13-8300-7_22 ◽

2019 ◽

pp. 261-268 ◽

Cited By ~ 3

Author(s):

Nagaraj Naik ◽

Biju R. Mohan

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Stock Prediction ◽

Machine Learning Technique ◽

Learning Technique ◽

Technical Indicator ◽

Optimal Feature Selection ◽

Optimal Feature ◽

Selection Of

Learning Network Embedding with Community Structural Information

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/407 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yu Li ◽

Ying Wang ◽

Tingting Zhang ◽

Jiawei Zhang ◽

Yi Chang

Keyword(s):

Community Structure ◽

Link Prediction ◽

Structural Information ◽

Representation Learning ◽

Network Embedding ◽

Learning Network ◽

Optimization Framework ◽

Vertex Representation ◽

Low Dimensional ◽

Embedding Methods

Network embedding is an effective approach to learning low-dimensional representations of vertices in networks, aiming to capture and preserve the structure and inherent properties of networks. The vast majority of existing network embedding methods focus exclusively on vertex proximity, while ignoring the internal community structure of the network. However, the homophily principle indicates that vertices within the same community are more similar to each other than those from different communities, so vertices within the same community should have similar vertex representations. Motivated by this, we propose a novel network embedding framework, NECS, to learn the Network Embedding with Community Structural information, which preserves the high-order proximity and incorporates the community structure in vertex representation learning. We formulate the problem as a principled optimization framework and provide an effective alternating algorithm to solve it. Extensive experimental results on several benchmark network datasets demonstrate the effectiveness of the proposed framework in various network analysis tasks, including network reconstruction, link prediction, and vertex classification.

Galaxy Network Embedding: A Hierarchical Community Structure Preserving Approach

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/287 ◽

2018 ◽

Cited By ~ 12

Author(s):

Lun Du ◽

Zhicong Lu ◽

Yun Wang ◽

Guojie Song ◽

Yiming Wang ◽

...

Keyword(s):

Community Structure ◽

Structural Information ◽

Spherical Surface ◽

Network Embedding ◽

Structure Preserving ◽

The Galaxy ◽

Community Information ◽

The Hierarchical Structure ◽

Multi Class Classification ◽

Low Dimensional

Network embedding is a method of learning a low-dimensional vector representation of network vertices under the condition of preserving different types of network properties. Previous studies mainly focus on preserving structural information of vertices at a particular scale, like neighbor information or community information, but cannot preserve the hierarchical community structure, which would enable the network to be easily analyzed at various scales. Inspired by the hierarchical structure of galaxies, we propose the Galaxy Network Embedding (GNE) model, which formulates an optimization problem with spherical constraints to describe the hierarchical community structure preserving network embedding. More specifically, we present an approach of embedding communities into a low-dimensional spherical surface, the center of which represents the parent community they belong to. Our experiments reveal that the representations from GNE preserve the hierarchical community structure and show advantages in several applications such as vertex multi-class classification and network visualization. The source code of GNE is available online.

Multi-View Collaborative Network Embedding

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3441450 ◽

2021 ◽

Vol 15 (3) ◽

pp. 1-18

Author(s):

Sezin Kircali Ata ◽

Yuan Fang ◽

Min Wu ◽

Jiaqi Shi ◽

Chee Keong Kwoh ◽

...

Keyword(s):

Real World ◽

State Of The Art ◽

Second Order ◽

Collaborative Network ◽

Multiple Views ◽

Network Embedding ◽

Single View ◽

Video Sharing ◽

Low Dimensional

Real-world networks often exist with multiple views, where each view describes one type of interaction among a common set of nodes. For example, on a video-sharing network, two user nodes are linked in one view if they have common favorite videos, and they can also be linked in another view if they share common subscribers. Unlike traditional single-view networks, multiple views maintain different semantics to complement each other. In this article, we propose Multi-view collAborative Network Embedding (MANE), a multi-view network embedding approach to learn low-dimensional representations. Similar to existing studies, MANE hinges on diversity and collaboration: diversity enables views to maintain their individual semantics, while collaboration enables views to work together. However, we also discover a novel form of second-order collaboration that has not been explored previously, and further unify it into our framework to attain superior node representations. Furthermore, as each view often has varying importance w.r.t. different nodes, we propose MANE+, an attention-based extension of MANE, to model node-wise view importance. Finally, we conduct comprehensive experiments on three public, real-world multi-view networks, and the results demonstrate that our models consistently outperform state-of-the-art approaches.

Context Attention Heterogeneous Network Embedding

Computational Intelligence and Neuroscience ◽

10.1155/2019/8106073 ◽

2019 ◽

Vol 2019 ◽

pp. 1-15

Author(s):

Wei Zhuo ◽

Qianyi Zhan ◽

Yuan Liu ◽

Zhenping Xie ◽

Jing Lu

Keyword(s):

Real World ◽

Online Social Networks ◽

Heterogeneous Network ◽

Network Embedding ◽

Node Importance ◽

Unweighted Network ◽

Real World Datasets ◽

Low Dimensional ◽

Types Of Information ◽

The Impact

Network embedding (NE), which maps nodes into a low-dimensional latent Euclidean space to represent effective features of each node in the network, has received considerable attention in recent years. Many popular NE methods, such as DeepWalk, Node2vec, and LINE, are capable of handling homogeneous networks. However, nodes in real-world networks are always accompanied by heterogeneous information (e.g., text descriptions, node properties, and hashtags), and jointly projecting the topological structure and these different types of information into a fixed-dimensional embedding space remains a great challenge due to heterogeneity. Besides, accurately quantifying the strength of edges (the tightness of connections between nodes) in an unweighted network is also a difficulty faced by existing methods. To bridge the gap, in this paper, we propose CAHNE (context attention heterogeneous network embedding), a novel network embedding method. Specifically, we propose the concept of node importance to measure the strength of edges, which can better preserve the context relations of a node in unweighted networks. Moreover, text information is a ubiquitous feature in real-world networks, e.g., online social networks and citation networks. On account of the sophisticated interactions between the network structure and the text features of nodes, CAHNE learns context embeddings for nodes by introducing the context node sequence, and an attention mechanism is integrated into our model to better reflect the impact of context nodes on the current node. To corroborate the efficacy of CAHNE, we apply our method and various baseline methods to several real-world datasets. The experimental results show that CAHNE achieves higher quality than a number of state-of-the-art network embedding methods on the tasks of network reconstruction, link prediction, node classification, and visualization.

Learning Signed Network Embedding via Graph Attention

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5911 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4772-4779 ◽

Cited By ~ 1

Author(s):

Yu Li ◽

Yuan Tian ◽

Jiawei Zhang ◽

Yi Chang

Keyword(s):

Network Analysis ◽

Real World ◽

Link Prediction ◽

Critical Role ◽

Network Embedding ◽

Convolutional Networks ◽

Importance Coefficient ◽

Signed Network ◽

Low Dimensional ◽

Negative Links

Learning the low-dimensional representations of graphs (i.e., network embedding) plays a critical role in network analysis and facilitates many downstream tasks. Recently, graph convolutional networks (GCNs) have revolutionized the field of network embedding and led to state-of-the-art performance in network analysis tasks such as link prediction and node classification. Nevertheless, most of the existing GCN-based network embedding methods are proposed for unsigned networks. In the real world, some networks are signed, where the links are annotated with different polarities, e.g., positive vs. negative. Negative links may have different properties from positive ones and can also significantly affect the quality of network embedding. Thus, in this paper, we propose a novel network embedding framework, SNEA, to learn Signed Network Embedding via graph Attention. In particular, we propose a masked self-attentional layer, which leverages the self-attention mechanism to estimate the importance coefficient for pairs of nodes connected by different types of links during the embedding aggregation process. Then SNEA utilizes the masked self-attentional layers to aggregate more important information from neighboring nodes to generate the node embeddings based on balance theory. Experimental results on several real-world signed network datasets demonstrate the effectiveness of the proposed framework on the signed link prediction task.
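The idea of a masked attention coefficient can be sketched generically: raw scores are normalized with a softmax restricted to a node's neighbors, so non-neighbors receive no weight. This is a plain illustration of masked attention, not SNEA's actual layer, which the abstract does not specify; the function name `masked_attention` and the toy scores are hypothetical.

```python
import math


def masked_attention(scores, neighbors):
    """Softmax over raw scores, restricted to the given neighbor indices.

    scores: list of raw (unnormalized) attention scores, one per node.
    neighbors: iterable of node indices that are linked to the target node.
    Returns a dict mapping each neighbor index to its attention coefficient.
    """
    neighbors = list(neighbors)
    if not neighbors:
        return {}
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(scores[j] for j in neighbors)
    exps = {j: math.exp(scores[j] - top) for j in neighbors}
    total = sum(exps.values())
    return {j: value / total for j, value in exps.items()}


# Node 2 has the largest raw score but is masked out (not a neighbor).
coefficients = masked_attention([0.0, 1.0, 5.0, 1.0], neighbors=[1, 3])
```

In a signed network one could apply such a normalization separately per link type, but that choice is an assumption here rather than something stated in the abstract.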

Noise-Resilient Similarity Preserving Network Embedding for Social Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/455 ◽

2019 ◽

Author(s):

Zhenyu Qiu ◽

Wenbin Hu ◽

Jia Wu ◽

ZhongZheng Tang ◽

Xiaohua Jia

Keyword(s):

Real World ◽

Similarity Index ◽

Superior Performance ◽

Network Embedding ◽

Actual Structure ◽

Node Similarity ◽

Similarity Preserving ◽

Low Dimensional ◽

Influence Of Noise ◽

Embedding Methods

Network embedding assigns nodes in a network to low-dimensional representations and effectively preserves the structure and inherent properties of the network. Most existing network embedding methods do not consider network noise. However, it is almost impossible to observe the actual structure of a real-world network without noise, and this noise can dramatically affect the performance of network embedding. In this paper, we aim to exploit node similarity to address the problem of social network embedding with noise and propose a node similarity preserving (NSP) embedding method. NSP exploits a comprehensive similarity index to quantify the authenticity of the observed network structure. Then we propose an algorithm to construct a correction matrix to reduce the influence of noise. Finally, an objective function for accurate network embedding is proposed and an efficient algorithm to solve the optimization problem is provided. Extensive experimental results on a variety of applications of real-world networks with noise show the superior performance of the proposed method over the state-of-the-art methods.
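As an illustration of what a node similarity index can look like, the sketch below computes the Jaccard similarity of two nodes' neighbor sets. The abstract does not say which indices make up NSP's comprehensive similarity index, so Jaccard is an assumption chosen for simplicity; the function name `jaccard_similarity` and the toy graph are hypothetical.

```python
def jaccard_similarity(adjacency, u, v):
    """Jaccard similarity of the neighbor sets of nodes u and v.

    adjacency: dict mapping each node to the set of its neighbors.
    Returns |N(u) & N(v)| / |N(u) | N(v)|, or 0.0 if both sets are empty.
    """
    neighbors_u = adjacency.get(u, set())
    neighbors_v = adjacency.get(v, set())
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Toy undirected graph: a triangle a-b-c plus a pendant node d attached to a.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
similarity_ab = jaccard_similarity(toy_graph, "a", "b")
```

A low similarity between two linked nodes could be read as a hint that the observed edge is noisy, though how NSP turns similarity into a correction matrix is not described in the abstract.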
