Simultaneous Representation Learning and Clustering for Incomplete Multi-view Data

Author(s):  
Wenzhang Zhuge ◽  
Chenping Hou ◽  
Xinwang Liu ◽  
Hong Tao ◽  
Dongyun Yi

Incomplete multi-view clustering has attracted considerable attention across diverse fields. Most existing methods linearly factorize the data to learn a unified representation. Their performance may degrade when the relations between the unified representation and the data of different views are nonlinear. Moreover, they need post-processing on the unified representations to extract the clustering indicators, which separates consensus learning from the subsequent clustering. To address these issues, in this paper we propose a Simultaneous Representation Learning and Clustering (SRLC) method. Concretely, SRLC constructs similarity matrices to measure the relations between pairs of instances, and simultaneously learns low-dimensional representations of the instances present in each view and a common probability label matrix. Thus, nonlinear information can be reflected by these representations, and the clustering results can be obtained directly from the label matrix. An efficient iterative algorithm with guaranteed convergence is presented for optimization. Experiments on several datasets demonstrate the advantages of the proposed approach.
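
A minimal sketch of the similarity-construction step described above, assuming toy per-view data matrices and presence masks (the names views and masks are illustrative); it is not the authors' SRLC optimization, which additionally learns the low-dimensional representations and the probability label matrix jointly.

import numpy as np

# Sketch only: per-view Gaussian-kernel similarities over the instances that are
# actually present in each view, as a stand-in for SRLC's similarity construction.
def view_similarity(X, sigma=1.0):
    """Gaussian-kernel similarity between the instances present in one view."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

# Toy incomplete two-view data: rows are instances, masks[v] marks presence.
rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 20)), rng.normal(size=(100, 30))]
masks = [rng.random(100) > 0.2, rng.random(100) > 0.2]

# One similarity matrix per view, built only from the observed instances.
similarities = [view_similarity(X[m]) for X, m in zip(views, masks)]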

2020 ◽  
Author(s):  
Jing Qian ◽  
Gangmin Li ◽  
Katie Atkinson ◽  
Yong Yue

Knowledge representation learning (KRL) aims at encoding components of a knowledge graph (KG) into a low-dimensional continuous space, which has brought considerable success in applying deep learning to graph embedding. Most well-known KGs contain only positive instances, for space efficiency. Typical KRL techniques, especially translational distance-based models, are trained by discriminating between positive and negative samples. Thus, negative sampling is unquestionably a non-trivial step in KG embedding. The quality of generated negative samples can directly influence the performance of the final knowledge representations in downstream tasks, such as link prediction and triple classification. This review summarizes current negative sampling methods in KRL and categorizes them into three classes: fixed-distribution-based, generative adversarial network (GAN)-based, and cluster sampling. Based on this categorization, we discuss the most prevalent existing approaches and their characteristics.
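
For illustration, a minimal sketch of the simplest of the three categories above, fixed-distribution (uniform) negative sampling over toy triples; GAN-based and cluster sampling variants are not shown, and the entity/relation identifiers are invented.

import random

def uniform_negative(triple, num_entities, known_triples):
    """Corrupt the head or tail of (h, r, t) uniformly, avoiding known positives."""
    h, r, t = triple
    while True:
        e = random.randrange(num_entities)
        candidate = (e, r, t) if random.random() < 0.5 else (h, r, e)
        if candidate not in known_triples:
            return candidate

# Toy positive triples encoded as (head, relation, tail) integer IDs.
positives = {(0, 0, 1), (1, 1, 2), (2, 0, 3)}
negatives = [uniform_negative(p, num_entities=4, known_triples=positives) for p in positives]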


2022 ◽  
Vol 13 (1) ◽  
pp. 1-23
Author(s):  
Christoffer Löffler ◽  
Luca Reeb ◽  
Daniel Dzibela ◽  
Robert Marzilger ◽  
Nicolas Witt ◽  
...  

This work proposes metric learning for fast similarity-based scene retrieval of unstructured ensembles of trajectory data from large databases. We present a novel representation learning approach using Siamese metric learning that approximates a distance-preserving low-dimensional representation and that learns to estimate reasonable solutions to the assignment problem. To this end, we employ a Temporal Convolutional Network architecture that we extend with a gating mechanism to enable learning from sparse data, leading to solutions to the assignment problem with varying degrees of sparsity. Our experimental results on professional soccer tracking data provide insights on learned features and embeddings, as well as on generalization, sensitivity, and network architectural considerations. The low approximation errors of the learned representations and the interactive retrieval performance, with retrieval times several orders of magnitude smaller, show that we outperform the previous state of the art.
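
A hedged sketch of Siamese metric learning with a shared encoder and a triplet margin loss; the shallow 1-D convolutional encoder below is only a stand-in for the paper's gated Temporal Convolutional Network, and all shapes and layer sizes are illustrative.

import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """Shared encoder mapping a (channels, time) trajectory to a low-dimensional embedding."""
    def __init__(self, in_channels=2, embed_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):                    # x: (batch, channels, time)
        return self.fc(self.conv(x).squeeze(-1))

encoder = TrajectoryEncoder()
loss_fn = nn.TripletMarginLoss(margin=1.0)   # pulls similar scenes together in the metric space
anchor, positive, negative = (torch.randn(8, 2, 100) for _ in range(3))
loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
loss.backward()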


Author(s):  
Kishlay Jha ◽  
Guangxu Xun ◽  
Aidong Zhang

Motivation: Many real-world biomedical interactions such as ‘gene-disease’, ‘disease-symptom’ and ‘drug-target’ are modeled as a bipartite network structure. Learning meaningful representations for such networks is a fundamental problem in the research area of Network Representation Learning (NRL). NRL approaches aim to translate the network structure into low-dimensional vector representations that are useful for a variety of biomedical applications. Despite significant advances, the existing approaches still have certain limitations. First, a majority of these approaches do not model the unique topological properties of bipartite networks. Consequently, their straightforward application to bipartite graphs yields unsatisfactory results. Second, the existing approaches typically learn representations from static networks. This is limiting for biomedical bipartite networks that evolve at a rapid pace, and it necessitates approaches that can update the representations in an online fashion. Results: In this research, we propose a novel representation learning approach that accurately preserves the intricate bipartite structure and efficiently updates the node representations. Specifically, we design a customized autoencoder that captures the proximity relationship between nodes participating in bipartite bicliques (2 × 2 sub-graphs), while preserving both the global and local structures. Moreover, the proposed structure-preserving technique is carefully interleaved with the central tenets of continual machine learning to design an incremental learning strategy that updates the node representations in an online manner. Taken together, the proposed approach produces meaningful representations with high fidelity and computational efficiency. Extensive experiments conducted on several biomedical bipartite networks validate the effectiveness and rationality of the proposed approach.
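
A minimal sketch of an autoencoder over the rows of a toy bipartite biadjacency matrix; the paper's biclique-preserving objective and its incremental (online) update strategy are not reproduced, and all sizes are illustrative.

import torch
import torch.nn as nn

n_genes, n_diseases, dim = 200, 150, 16
biadj = (torch.rand(n_genes, n_diseases) < 0.05).float()   # toy gene-disease links

encoder = nn.Sequential(nn.Linear(n_diseases, 64), nn.ReLU(), nn.Linear(64, dim))
decoder = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, n_diseases))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for _ in range(50):                          # reconstruct each gene's link profile
    z = encoder(biadj)
    loss = nn.functional.binary_cross_entropy_with_logits(decoder(z), biadj)
    opt.zero_grad(); loss.backward(); opt.step()

gene_embeddings = encoder(biadj).detach()    # low-dimensional node representations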


Author(s):  
Fenxiao Chen ◽  
Yun-Cheng Wang ◽  
Bin Wang ◽  
C.-C. Jay Kuo

Research on graph representation learning has received great attention in recent years, since most data in real-world applications come in the form of graphs. High-dimensional graph data are often in irregular forms, which makes them more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods on small and large datasets and compare their performance. Finally, potential applications and future directions are presented.


2020 ◽  
Vol 34 (04) ◽  
pp. 3357-3364
Author(s):  
Abdulkadir Celikkanat ◽  
Fragkiskos D. Malliaros

Representing networks in a low-dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional Skip-Gram model for capturing center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.
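
A minimal sketch of the random-walk context sampling that this family of models builds on, using a toy adjacency-list graph; the exponential family conditional distributions studied in the paper are not shown.

import random

graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}   # toy undirected adjacency lists

def random_walk(graph, start, length=10):
    """Uniform random walk used to sample context nodes."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(random.choice(graph[walk[-1]]))
    return walk

def center_context_pairs(walk, window=2):
    """Skip-Gram-style (center, context) pairs extracted from one walk."""
    return [(walk[i], walk[j])
            for i in range(len(walk))
            for j in range(max(0, i - window), min(len(walk), i + window + 1))
            if i != j]

pairs = [p for node in graph for p in center_context_pairs(random_walk(graph, node))]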


2020 ◽  
Vol 34 (03) ◽  
pp. 2950-2958
Author(s):  
Guanglin Niu ◽  
Yongfei Zhang ◽  
Bo Li ◽  
Peng Cui ◽  
Si Liu ◽  
...  

Representation learning on a knowledge graph (KG) embeds entities and relations of the KG into low-dimensional continuous vector spaces. Early KG embedding methods only pay attention to the structured information encoded in triples, which limits their performance due to the structural sparseness of KGs. Some recent attempts consider path information to expand the structure of KGs but lack explainability in the process of obtaining the path representations. In this paper, we propose a novel Rule and Path-based Joint Embedding (RPJE) scheme, which takes full advantage of the explainability and accuracy of logic rules, the generalization capability of KG embedding, and the supplementary semantic structure of paths. Specifically, logic rules of different lengths (the number of relations in the rule body) in the form of Horn clauses are first mined from the KG and elaborately encoded for representation learning. Then, rules of length 2 are applied to compose paths accurately, while rules of length 1 are explicitly employed to create semantic associations among relations and constrain relation embeddings. Moreover, the confidence level of each rule is also considered in optimization to guarantee the reliability of applying the rule to representation learning. Extensive experimental results illustrate that RPJE outperforms other state-of-the-art baselines on the KG completion task, which also demonstrates the superiority of utilizing logic rules as well as paths for improving the accuracy and explainability of representation learning.
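
For illustration only, a toy sketch of applying length-2 Horn rules with confidence thresholds to compose a relation path; the rules, relation names, and confidences below are invented, and the joint embedding optimization is not shown.

rules_len2 = {                       # rule body (r1, r2) -> (head relation, confidence)
    ("born_in", "city_of"): ("nationality", 0.9),
    ("works_for", "located_in"): ("lives_in", 0.6),
}

def compose_path(path, rules, min_conf=0.7):
    """Collapse consecutive relation pairs whenever a sufficiently confident rule applies."""
    out = list(path)
    i = 0
    while i < len(out) - 1:
        rule = rules.get((out[i], out[i + 1]))
        if rule and rule[1] >= min_conf:
            out[i:i + 2] = [rule[0]]     # replace the relation pair by the rule head
        else:
            i += 1
    return out

print(compose_path(["born_in", "city_of"], rules_len2))   # -> ['nationality']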


2020 ◽  
Vol 34 (04) ◽  
pp. 4132-4139
Author(s):  
Huiting Hong ◽  
Hantao Guo ◽  
Yucheng Lin ◽  
Xiaoqing Yang ◽  
Zang Li ◽  
...  

In this paper, we focus on graph representation learning for heterogeneous information networks (HINs), in which various types of vertices are connected by various types of relations. Most existing methods for HINs revise homogeneous graph embedding models via meta-paths to learn a low-dimensional vector space for the HIN. In this paper, we propose a novel Heterogeneous Graph Structural Attention Neural Network (HetSANN) that directly encodes the structural information of an HIN without meta-paths and achieves more informative representations. With this method, domain experts are no longer needed to design meta-path schemes, and the heterogeneous information can be processed automatically by the proposed model. Specifically, we implicitly represent heterogeneous information in two steps: 1) we model the transformation between heterogeneous vertices through a projection in low-dimensional entity spaces; 2) we then apply a graph neural network to aggregate multi-relational information of the projected neighborhood by means of an attention mechanism. We also present three extensions of HetSANN, i.e., voices-sharing product attention for the pairwise relationships in an HIN, a cycle-consistency loss to retain the transformation between heterogeneous entity spaces, and multi-task learning with full use of information. Experiments conducted on three public datasets demonstrate that our proposed models achieve significant and consistent improvements compared to state-of-the-art solutions.
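
A hedged sketch of the two steps listed above, under illustrative vertex types and dimensions: a per-type linear projection into the target vertex's space followed by attention-weighted aggregation of the projected neighbours. This is not the full HetSANN model or its three extensions.

import torch
import torch.nn as nn
import torch.nn.functional as F

dim = 16
proj = nn.ModuleDict({"author": nn.Linear(8, dim), "paper": nn.Linear(12, dim)})
attn = nn.Linear(2 * dim, 1)                            # toy attention scorer

target = torch.randn(dim)                               # target vertex state
neighbours = [("author", torch.randn(8)), ("paper", torch.randn(12)),
              ("paper", torch.randn(12))]

# Step 1: project heterogeneous neighbours into the target's low-dimensional space.
projected = torch.stack([proj[t](x) for t, x in neighbours])              # (N, dim)
# Step 2: aggregate the projected neighbourhood with attention weights.
scores = attn(torch.cat([target.expand_as(projected), projected], dim=-1))
weights = F.softmax(scores, dim=0)                                        # (N, 1)
aggregated = (weights * projected).sum(dim=0)                             # updated state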


2022 ◽  
Vol 3 (1) ◽  
pp. 1-26
Author(s):  
Omid Hajihassani ◽  
Omid Ardakanian ◽  
Hamzeh Khazaei

The abundance of data collected by sensors in Internet of Things devices and the success of deep neural networks in uncovering hidden patterns in time series data have led to mounting privacy concerns. This is because private and sensitive information can be potentially learned from sensor data by applications that have access to this data. In this article, we aim to examine the tradeoff between utility and privacy loss by learning low-dimensional representations that are useful for data obfuscation. We propose deterministic and probabilistic transformations in the latent space of a variational autoencoder to synthesize time series data such that intrusive inferences are prevented while desired inferences can still be made with sufficient accuracy. In the deterministic case, we use a linear transformation to move the representation of input data in the latent space such that the reconstructed data is likely to have the same public attribute but a different private attribute than the original input data. In the probabilistic case, we apply the linear transformation to the latent representation of input data with some probability. We compare our technique with autoencoder-based anonymization techniques and additionally show that it can anonymize data in real time on resource-constrained edge devices.
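
A minimal sketch of the deterministic and probabilistic latent-space shifts described above, using toy numpy vectors in place of a trained variational autoencoder; the per-class latent means mu_private_a and mu_private_b are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(size=16)                      # latent code of one input window
mu_private_a = rng.normal(size=16)           # assumed mean latent code, private class A
mu_private_b = rng.normal(size=16)           # assumed mean latent code, private class B

def deterministic_shift(z, mu_src, mu_dst, alpha=1.0):
    """Move z away from the source private class towards the target private class."""
    return z + alpha * (mu_dst - mu_src)

def probabilistic_shift(z, mu_src, mu_dst, p=0.5, alpha=1.0):
    """Apply the same linear shift only with probability p."""
    return deterministic_shift(z, mu_src, mu_dst, alpha) if rng.random() < p else z

z_obfuscated = probabilistic_shift(z, mu_private_a, mu_private_b)
# The obfuscated code would then be decoded back into a synthetic time series.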


2021 ◽  
Author(s):  
Chen Qiao ◽  
Yuanhua Huang

RNA velocity is a promising technique for revealing transient cellular dynamics in a heterogeneous cell population and quantifying cell transitions from single-cell transcriptome experiments. However, cell transitions estimated from high-dimensional RNA velocity are often unstable or inaccurate, partly due to high technical noise and less informative projections. Here, we present VeloAE, a tailored representation learning method that learns a low-dimensional representation of RNA velocity on which cell transitions can be robustly estimated. On various experimental datasets, we show that VeloAE can both accurately identify stimulation dynamics in time-series designs and effectively capture the expected cellular differentiation in different biological systems. VeloAE therefore enhances the usefulness of RNA velocity for studying a wide range of biological processes.
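
A hedged illustration of pushing RNA velocity into a learned low-dimensional space via a finite difference through an encoder; the random encode function is only a stand-in, and VeloAE's actual architecture and training objective are not reproduced.

import numpy as np

rng = np.random.default_rng(2)
n_cells, n_genes, latent_dim = 500, 2000, 32
X = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)   # toy expression matrix
V = rng.normal(scale=0.1, size=(n_cells, n_genes))            # toy RNA velocity

W = rng.normal(scale=1.0 / np.sqrt(n_genes), size=(n_genes, latent_dim))

def encode(M):
    """Stand-in nonlinear encoder; VeloAE learns this mapping from data."""
    return np.tanh(M @ W)

Z = encode(X)                            # low-dimensional cell representations
Z_velocity = encode(X + V) - encode(X)   # velocity projected into the latent space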

