Variational Graph Embedding and Clustering with Laplacian Eigenmaps

Author(s):  
Zitai Chen ◽  
Chuan Chen ◽  
Zong Zhang ◽  
Zibin Zheng ◽  
Qingsong Zou

As a fundamental machine learning problem, graph clustering has facilitated various real-world applications, and tremendous efforts have been devoted to it in the past few decades. However, most existing methods, such as spectral clustering, suffer from sparsity, scalability, and robustness issues and struggle to handle high-dimensional raw information in clustering. To address these issues, we propose a deep probabilistic model, called Variational Graph Embedding and Clustering with Laplacian Eigenmaps (VGECLE), which learns node embeddings and assigns node clusters simultaneously. It represents each node as a Gaussian distribution, disentangling the node's true embedding position from the uncertainty arising from the graph. With a Mixture-of-Gaussians (MoG) prior, VGECLE is capable of learning an interpretable clustering through variational inference and the generative process. To better capture pairwise relationships, we propose a Teacher-Student mechanism that encourages each node to learn a better Gaussian from its immediate neighbors under stochastic gradient descent (SGD) training. By optimizing the graph embedding and graph clustering problems as a whole, our model can fully exploit their correlation. To the best of our knowledge, we are the first to tackle graph clustering from a deep probabilistic viewpoint. We perform extensive experiments on both synthetic and real-world networks to corroborate the effectiveness and efficiency of the proposed framework.
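The abstract's key ingredients are per-node Gaussians and a Mixture-of-Gaussians prior. The following numpy sketch is not the authors' implementation; all shapes, parameter values, and names are illustrative assumptions. It only shows how a one-sample Monte Carlo estimate of the KL term between a node's Gaussian and a MoG prior, together with soft cluster assignments, could be computed from such a representation.

```python
import numpy as np

# Hypothetical sizes: n nodes, d-dim embeddings, K clusters.
n, d, K = 5, 2, 3
rng = np.random.default_rng(0)

# Each node i is represented by a Gaussian N(mu[i], diag(sigma2[i])),
# separating its "true" position from its uncertainty.
mu = rng.normal(size=(n, d))
sigma2 = np.exp(rng.normal(size=(n, d)))   # variances > 0

# Mixture-of-Gaussians prior: weights pi, component means m, variances s2.
pi = np.full(K, 1.0 / K)
m = rng.normal(size=(K, d))
s2 = np.ones((K, d))

def log_gaussian(x, mean, var):
    """Log density of a diagonal Gaussian evaluated at x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

# Reparameterised sample z_i ~ q(z_i) = N(mu_i, diag(sigma2_i)).
z = mu + np.sqrt(sigma2) * rng.normal(size=(n, d))
log_q = log_gaussian(z, mu, sigma2)                                    # log q(z_i)
log_p_comp = np.stack(
    [np.log(pi[k]) + log_gaussian(z, m[k], s2[k]) for k in range(K)], axis=1)
log_p = np.logaddexp.reduce(log_p_comp, axis=1)                        # log p(z_i) under the MoG
kl_mc = log_q - log_p                                                  # one-sample KL estimate per node

# Posterior responsibilities give soft (interpretable) cluster assignments.
responsibilities = np.exp(log_p_comp - log_p[:, None])
print(kl_mc, responsibilities.argmax(axis=1))
```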

Author(s):  
Deepali Virmani ◽  
Nikita Jain ◽  
Ketan Parikh ◽  
Shefali Upadhyaya ◽  
Abhishek Srivastav

This article describes how data becomes relevant when it can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means, so the focus remains on the efficiency and accuracy of these algorithms, as well as on the time clustering takes and on reducing overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters with randomly chosen initial centroids, and its reliance on numeric values prevents it from clustering real-world data containing categorical values. Poor selection of initial centroids can also result in poor clustering. This article proposes a variant of k-means that selects the initial centres and normalizes the data, yielding better clustering, reduced overlap, and shorter clustering time.
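The abstract does not spell out the exact centre-selection rule, so the following numpy sketch is only an illustration of the two ingredients it emphasises, min-max normalisation and non-random, spread-out initial centres (here a farthest-point heuristic), plugged into standard k-means; it is not the proposed algorithm itself.

```python
import numpy as np

def normalize(X):
    """Min-max normalise each feature to [0, 1] so no attribute dominates the distance."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def farthest_point_centres(X, k, rng):
    """Spread-out initial centres: start from a random point, then repeatedly
    pick the point farthest from the centres chosen so far."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    return np.array(centres)

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    X = normalize(X)
    centres = farthest_point_centres(X, k, rng)
    for _ in range(iters):
        # Assign each point to its nearest centre, then move centres to cluster means.
        labels = np.argmin(np.linalg.norm(X[:, None] - centres[None], axis=2), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres

labels, centres = kmeans(np.random.default_rng(1).normal(size=(60, 2)), k=3)
print(np.bincount(labels))
```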


Author(s):  
Bayu Distiawan Trisedya ◽  
Jianzhong Qi ◽  
Rui Zhang

The task of entity alignment between knowledge graphs aims to find entities in two knowledge graphs that represent the same real-world entity. Recently, embedding-based models have been proposed for this task. Such models are built on top of a knowledge graph embedding model that learns entity embeddings to capture the semantic similarity between entities in the same knowledge graph. We propose to learn embeddings that can capture the similarity between entities in different knowledge graphs. Our proposed model helps align entities from different knowledge graphs and hence enables the integration of multiple knowledge graphs. Our model exploits the large numbers of attribute triples existing in the knowledge graphs and generates attribute character embeddings. The attribute character embedding shifts the entity embeddings from the two knowledge graphs into the same space by computing the similarity between entities based on their attributes. We use a transitivity rule to further enrich the number of attributes of an entity and thereby enhance the attribute character embedding. Experiments using real-world knowledge bases show that our proposed model achieves consistent improvements over the baseline models by over 50% in terms of hits@1 on the entity alignment task.
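As a rough illustration of the attribute character embedding idea (not the paper's actual encoder, which is more elaborate), the numpy sketch below encodes attribute values at the character level so that literal strings from two knowledge graphs land in a shared space and can be compared. The entity names, attribute keys, and the mean-of-characters encoder are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# Hypothetical character embedding table (ASCII only, for illustration).
char_emb = rng.normal(size=(128, dim))

def encode_attribute(value: str) -> np.ndarray:
    """Compositional character encoding of an attribute value:
    here simply the mean of its character embeddings."""
    idx = [ord(c) % 128 for c in value]
    return char_emb[idx].mean(axis=0)

def entity_signature(attributes: dict) -> np.ndarray:
    """Aggregate an entity's attribute-value encodings into one vector."""
    vecs = [encode_attribute(v) for v in attributes.values()]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two entities from different knowledge graphs describing the same place:
e1 = {"label": "Melbourne", "postcode": "3000"}
e2 = {"name": "City of Melbourne", "zip": "3000"}
print(cosine(entity_signature(e1), entity_signature(e2)))
```

Because the encoding depends only on the literal attribute strings, entities from two separately trained knowledge graphs can be compared in the same space, which is the role the attribute character embedding plays in the paper.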


2020 ◽  
Vol 90 (1) ◽  
pp. 54-74 ◽  
Author(s):  
MALEKA DONALDSON

In this portrait, Maleka Donaldson vividly illustrates how two teachers in real-world, public school settings convey their expectations for kindergarten student performance and set the tone for learning from mistakes and feedback. Research in psychology and education has established the benefits of corrective feedback on learning but has not closely examined how practicing teachers respond to mistakes made by young children during day-to-day instruction. Donaldson draws on extended observations of teacher-student interactions to juxtapose the two contexts and reveal divergent techniques that the participating teachers use to frame mistakes and correct answers during instruction. She compares these variations and considers how each teacher's pedagogical tools could be integrated into a mistake-response toolkit that could fundamentally reshape learning from mistakes for kindergarteners.


2016 ◽  
Author(s):  
Anna Navrotskaya ◽  
Victor Il’ev

2020 ◽  
Vol 8 (2) ◽  
Author(s):  
Leo Torres ◽  
Kevin S Chan ◽  
Tina Eliassi-Rad

Abstract Graph embedding seeks to build a low-dimensional representation of a graph $G$. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps (LE), which constructs a graph embedding based on the spectral properties of the Laplacian matrix of $G$. The intuition behind LE, and behind many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding based on geometric rather than spectral properties, by leveraging the so-called simplex geometry of $G$. We introduce a new approach, Geometric Laplacian Eigenmap Embedding, and demonstrate that it outperforms various other techniques (including LE) in the tasks of graph reconstruction and link prediction.
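For readers unfamiliar with the Laplacian Eigenmaps baseline the paper starts from, here is a minimal numpy sketch of spectral LE on a toy graph; GLEE itself replaces the eigenvector-based (spectral) embedding shown here with a geometric, simplex-based construction, so this is background rather than the paper's method.

```python
import numpy as np

def laplacian_eigenmaps(A: np.ndarray, dim: int) -> np.ndarray:
    """Classical LE baseline: embed nodes using the eigenvectors of the graph
    Laplacian L = D - A associated with the smallest non-zero eigenvalues."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    vals, vecs = np.linalg.eigh(L)     # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]          # skip the trivial constant eigenvector

# Tiny example: a 4-node path graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(laplacian_eigenmaps(A, dim=2))
```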


Author(s):  
Qikun Xiang ◽  
Jie Zhang ◽  
Ido Nevat ◽  
Pengfei Zhang

Data trustworthiness is a crucial issue in real-world participatory sensing applications. Without considering this issue, different types of worker misbehavior, especially the challenging collusion attacks, can result in biased and inaccurate estimation and decision making. We propose a novel trust-based mixture of Gaussian processes (GP) model for spatial regression to jointly detect such misbehavior and accurately estimate the spatial field. We develop a Markov chain Monte Carlo (MCMC)-based algorithm to efficiently perform Bayesian inference of the model. Experiments using two real-world datasets show the superior robustness of our model compared with existing approaches.
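As background only, the following numpy sketch shows the plain Gaussian process regression building block (posterior mean of a spatial field under an RBF kernel given worker reports); the paper's trust-based mixture of GPs and its MCMC inference are considerably richer, and the kernel parameters, noise level, and data below are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior_mean(X_train, y_train, X_test, noise=0.1):
    """Posterior mean of a GP spatial field at test locations,
    given noisy observations at training locations."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=x.shape)
print(gp_posterior_mean(x, y, np.array([1.5, 2.5, 3.5])))
```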


2019 ◽  
Author(s):  
Sven Festag ◽  
Cord Spreckelsen

BACKGROUND: Collaborative privacy-preserving training methods allow for the integration of locally stored private data sets into machine learning approaches while ensuring confidentiality and nondisclosure. OBJECTIVE: In this work we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts, trained in a collaborative privacy-preserving way. METHODS: The training adopts distributed selective stochastic gradient descent (i.e., it works by exchanging local learning results achieved on private data sets). Five networks were trained on separate real-world clinical data sets using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients. RESULTS: These networks reached a mean F1 value of 0.955. The gold-standard centralized training, which is based on the union of all data sets and does not take data security into consideration, reached a final value of 0.962. CONCLUSIONS: Using real-world clinical data, our study shows that detection of protected health information can be secured by collaborative privacy-preserving training. In general, the approach demonstrates the feasibility of deep learning on distributed and confidential clinical data while ensuring data protection.
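A toy numpy sketch of the selective-sharing idea behind distributed selective SGD, on a linear least-squares model rather than the study's neural network: each site computes a gradient on its private data and uploads only its largest components to a coordinating server. The sharing fraction, learning rate, and model are illustrative assumptions, not the study's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, d, share_fraction, lr = 5, 10, 0.1, 0.05

# Global parameters held by a coordinating server; each site keeps its
# private data locally and only uploads a selected slice of its update.
w_global = np.zeros(d)
site_data = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(n_sites)]

for _ in range(20):                                    # communication rounds
    for X, y in site_data:
        w = w_global.copy()                            # download current global parameters
        grad = 2 * X.T @ (X @ w - y) / len(y)          # local least-squares gradient
        k = max(1, int(share_fraction * d))
        top = np.argsort(np.abs(grad))[-k:]            # keep only the largest components
        w_global[top] -= lr * grad[top]                # selective upload to the server

print(w_global)
```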


2020 ◽  
Vol 2020 (12) ◽  
pp. 124010
Author(s):  
Sebastian Goldt ◽  
Madhu S Advani ◽  
Andrew M Saxe ◽  
Florent Krzakala ◽  
Lenka Zdeborová
