Adversarial Graph Embedding for Ensemble Clustering

Ensemble clustering generally integrates basic partitions into a consensus partition through a graph partitioning method, which, however, has two limitations: 1) it neglects to reuse the original features; 2) obtaining the consensus partition with learnable graph representations is still under-explored. In this paper, we propose a novel Adversarial Graph Auto-Encoders (AGAE) model to incorporate ensemble clustering into a deep graph embedding process. Specifically, a graph convolutional network is adopted as a probabilistic encoder to jointly integrate the information from the feature content and the consensus graph, and a simple inner product layer is used as the decoder to reconstruct the graph from the encoded latent variables (i.e., embedding representations). Moreover, we develop an adversarial regularizer to guide the network training with an adaptive partition-dependent prior. Experiments on eight real-world datasets show the effectiveness of AGAE over several state-of-the-art deep embedding and ensemble clustering methods.
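
The encoder/decoder pipeline described in this abstract can be sketched in NumPy. This is a minimal, untrained illustration under stated assumptions, not the authors' AGAE: the consensus graph is assumed to be the co-association matrix of the basic partitions (a common choice in ensemble clustering), the encoder is a single GCN layer with random weights that outputs a mean and log-variance per node, and the adversarial regularizer and training loop are omitted. The function names are hypothetical.

```python
import numpy as np

def co_association(partitions):
    """Assumed consensus graph: entry (i, j) is the fraction of basic partitions grouping i and j."""
    labels = np.asarray(partitions)                      # shape (m, n): m basic partitions, n samples
    same = labels[:, :, None] == labels[:, None, :]      # shape (m, n, n), True where labels agree
    return same.mean(axis=0)

def normalize_adjacency(a):
    """Symmetric GCN normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_encode(x, a_norm, w_mu, w_logvar, rng):
    """One-layer probabilistic GCN encoder: Z = mu + sigma * eps (reparameterization trick)."""
    h = a_norm @ x                                       # propagate feature content over the consensus graph
    mu, logvar = h @ w_mu, h @ w_logvar
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def inner_product_decode(z):
    """Reconstructed graph: sigmoid(Z Z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

# Toy usage: 3 basic partitions of 5 samples with 4 random features (untrained weights).
rng = np.random.default_rng(0)
partitions = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
x = rng.standard_normal((5, 4))
a = co_association(partitions)
z = gcn_encode(x, normalize_adjacency(a), rng.standard_normal((4, 2)),
               0.1 * rng.standard_normal((4, 2)), rng)
a_rec = inner_product_decode(z)
```

Training would fit the weights so that `a_rec` reconstructs the consensus graph, with the adversarial regularizer matching `z` to a partition-dependent prior; both steps are left out of this sketch.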

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on the triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circular convolution based on the embeddings of entities and entity types is used to map the head and tail entities to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
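
A minimal sketch of the type-specific projection and scoring, assuming the projection is the circular convolution of an entity embedding with its type embedding and the score is the L1 distance ||h' + r - t'|| used by translation models. The abstract does not give the exact formula, so `transet_score` and its parameters are hypothetical.

```python
import numpy as np

def circular_convolution(a, b):
    """(a * b)[k] = sum_i a[i] * b[(k - i) mod d], computed in O(d log d) with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def circular_convolution_direct(a, b):
    """Same operation from the definition, O(d^2); useful as a check."""
    d = len(a)
    return np.array([sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)])

def transet_score(h, h_type, r, t, t_type, norm=1):
    """Hypothetical score ||h' + r - t'|| on type-specific representations; lower is more plausible."""
    h_proj = circular_convolution(h, h_type)   # head entity mapped by its type
    t_proj = circular_convolution(t, t_type)   # tail entity mapped by its type
    return np.linalg.norm(h_proj + r - t_proj, ord=norm)

rng = np.random.default_rng(0)
d = 8
h, h_type, t, t_type = (rng.standard_normal(d) for _ in range(4))
# A relation vector that fits this triple exactly, and a random one for contrast.
r_match = circular_convolution(t, t_type) - circular_convolution(h, h_type)
r_random = rng.standard_normal(d)
```

The FFT route relies on the convolution theorem, so it computes exactly the circular convolution from the definition while scaling better with embedding size.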

Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks for Fake News Detection

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3451215 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-23

Author(s):

Shengsheng Qian ◽

Jun Hu ◽

Quan Fang ◽

Changsheng Xu

Keyword(s):

Social Media ◽

Visual Information ◽

Representation Learning ◽

Fake News ◽

Unified Framework ◽

Model Learning ◽

Convolutional Network ◽

Textual Information ◽

Convolutional Networks ◽

Real World Datasets

In this article, we focus on the fake news detection task and aim to automatically identify fake news among the vast amount of social media posts. To date, many approaches have been proposed to detect fake news, including traditional learning methods and deep learning-based models. However, three challenges remain: (i) how to represent social media posts effectively, since post content is diverse and highly complicated; (ii) how to design a data-driven method that makes the model flexible enough to handle samples from different contexts and news backgrounds; and (iii) how to fully utilize the auxiliary information of posts (background knowledge and multi-modal information) for better representation learning. To tackle these challenges, we propose a novel Knowledge-aware Multi-modal Adaptive Graph Convolutional Networks (KMAGCN) model that captures semantic representations by jointly modeling textual information, knowledge concepts, and visual information in a unified framework for fake news detection. We model posts as graphs and use a knowledge-aware multi-modal adaptive graph learning principle for effective feature learning. Compared with existing methods, KMAGCN addresses the challenges in three ways: (1) it models posts as graphs to capture non-consecutive and long-range semantic relations; (2) it proposes a novel adaptive graph convolutional network to handle the variability of graph data; and (3) it leverages textual information, knowledge concepts, and visual information jointly for model learning. Extensive experiments on three public real-world datasets show that KMAGCN achieves superior results compared with other state-of-the-art algorithms.

Unsupervised Neural Aspect Extraction with Sememes

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/712 ◽

2019 ◽

Cited By ~ 3

Author(s):

Ling Luo ◽

Xiang Ao ◽

Yan Song ◽

Jinyao Li ◽

Xiaopeng Yang ◽

...

Keyword(s):

Real World ◽

Latent Variables ◽

Lexical Semantics ◽

Word Meanings ◽

Lexical Semantic ◽

Aspect Extraction ◽

Real World Datasets ◽

Semantic Resources

Aspect extraction relies on identifying aspects by discovering coherence among words, which is challenging when word meanings are diverse and when processing short texts. Leveraging lexical semantic resources is a possible way to address this challenge and improve aspect extraction. In this paper, we present an unsupervised neural framework that leverages sememes to enhance lexical semantics. The overall framework is analogous to an autoencoder that reconstructs sentence representations and learns aspects through latent variables. Two models that form sentence representations by exploiting sememes are proposed, using (1) a hierarchical attention and (2) a context-enhanced attention. Experiments on two real-world datasets demonstrate the validity and effectiveness of our models, which significantly outperform existing baselines.
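
The autoencoder analogy can be made concrete with a sketch in the style of attention-based aspect extraction (ABAE), a swapped-in reference design rather than the paper's model: an attention layer forms the sentence vector z_s, a softmax over K latent aspects gives p_t, the sentence is reconstructed as r_s = T^T p_t from an aspect embedding matrix T, and a max-margin loss pulls r_s toward z_s. Sememe information, which the paper injects through hierarchical or context-enhanced attention, is not modeled; all names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def encode_sentence(word_vecs, m_attn):
    """Attention-weighted sentence vector z_s; each word e_i is scored as e_i^T M y, y = mean word vector."""
    y = word_vecs.mean(axis=0)
    alpha = softmax(word_vecs @ m_attn @ y)      # one attention weight per word
    return alpha @ word_vecs

def reconstruct(z_s, w, b, aspect_matrix):
    """Latent aspect weights p_t = softmax(W z_s + b) and reconstruction r_s = T^T p_t."""
    p_t = softmax(w @ z_s + b)
    return p_t, aspect_matrix.T @ p_t

def unit(v):
    return v / np.linalg.norm(v)

def hinge_loss(z_s, r_s, negatives):
    """Max-margin loss: r_s should be closer to z_s than to sampled negative sentence vectors."""
    z, r = unit(z_s), unit(r_s)
    return sum(max(0.0, 1.0 - r @ z + r @ unit(n)) for n in negatives)

rng = np.random.default_rng(0)
d, k = 6, 3                                     # embedding size, number of latent aspects
sentence = rng.standard_normal((5, d))          # 5 word vectors (stand-ins for sememe-enhanced ones)
z_s = encode_sentence(sentence, np.eye(d))
p_t, r_s = reconstruct(z_s, rng.standard_normal((k, d)), np.zeros(k), rng.standard_normal((k, d)))
loss = hinge_loss(z_s, r_s, [rng.standard_normal(d) for _ in range(4)])
```

After training, each row of the aspect matrix T would be interpreted as an aspect by inspecting its nearest words.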

Passenger Mobility Prediction via Representation Learning for Dynamic Directed and Weighted Graphs

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3446344 ◽

2022 ◽

Vol 13 (1) ◽

pp. 1-25

Author(s):

Yuandong Wang ◽

Hongzhi Yin ◽

Tong Chen ◽

Chunyang Liu ◽

Ben Wang ◽

...

Keyword(s):

Fundamental Problem ◽

Route Planning ◽

Representation Learning ◽

Data Representation ◽

Weighted Graphs ◽

Spatial And Temporal Patterns ◽

Graph Representations ◽

Demand Prediction ◽

Passenger Demand ◽

Real World Datasets

In recent years, ride-hailing services have become increasingly prevalent, as they provide great convenience for passengers. As a fundamental problem, timely prediction of passenger demand in different regions is vital for effective traffic flow control and route planning. As both spatial and temporal patterns are indispensable for passenger demand prediction, relevant research has evolved from pure time series to graph-structured data for modeling historical passenger demand, where a snapshot graph is constructed for each time slot by connecting region nodes via different relational edges (origin-destination relationship, geographical distance, etc.). Consequently, the spatiotemporal passenger demand records naturally carry dynamic patterns in the constructed graphs, where the edges also encode important information about the directions and volume (i.e., weights) of passenger demand between two connected regions. Dynamics, directions, and weights are therefore crucial aspects of this graph-structured data, and learning an effective representation for such dynamic, directed, and weighted (DDW) graphs is the key to solving the prediction problem. However, existing graph-based solutions fail to simultaneously consider these three crucial aspects of dynamic, directed, and weighted graphs, leading to limited expressiveness when learning graph representations for passenger demand prediction. Therefore, we propose a novel spatiotemporal graph attention network, namely Gallat (Graph prediction with all attention), as a solution. In Gallat, by comprehensively incorporating these three intrinsic properties of DDW graphs, we build three attention layers to fully capture the spatiotemporal dependencies among different regions across all historical time slots. Moreover, the model employs a subtask to conduct pretraining so that it can obtain accurate results more quickly. We evaluate the proposed model on real-world datasets, and our experimental results demonstrate that Gallat outperforms state-of-the-art approaches.
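
The snapshot construction described above can be sketched in plain Python. The record format (minute of the request, origin region, destination region), the slot length, and the function names are hypothetical; the sketch only shows how each time slot yields a directed graph whose edge weights are demand volumes, not Gallat's attention layers.

```python
from collections import defaultdict

def build_ddw_snapshots(trips, slot_minutes, num_slots):
    """Turn (minute, origin_region, dest_region) requests into one directed, weighted graph per slot.

    snapshots[s][(o, d)] is the number of requests from region o to region d in slot s:
    the edge direction follows the trip, and the weight is the demand volume.
    """
    snapshots = [defaultdict(int) for _ in range(num_slots)]
    for minute, origin, dest in trips:
        slot = minute // slot_minutes
        if 0 <= slot < num_slots:
            snapshots[slot][(origin, dest)] += 1
    return [dict(s) for s in snapshots]

def outgoing_demand(snapshot, region):
    """Sum of weights on edges leaving `region`, i.e., demand originating there in that slot."""
    return sum(w for (o, _), w in snapshot.items() if o == region)

trips = [(5, 0, 1), (12, 0, 1), (20, 1, 0), (35, 2, 0), (50, 0, 2)]
snapshots = build_ddw_snapshots(trips, slot_minutes=30, num_slots=2)
```

Because edges are keyed by ordered (origin, destination) pairs, the trips 0 to 1 and 1 to 0 stay distinct, which is exactly the directional information an undirected snapshot would lose.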

Joint Geosequential Preference and Distance Metric Factorization for Point-of-Interest Recommendation

Mathematical Problems in Engineering ◽

10.1155/2020/6582676 ◽

2020 ◽

Vol 2020 ◽

pp. 1-14

Author(s):

Chunyang Liu ◽

Chao Liu ◽

Haiqiang Xin ◽

Jian Wang ◽

Jiping Liu ◽

...

Keyword(s):

Metric Space ◽

Matrix Factorization ◽

Euclidean Distance ◽

Large Scale ◽

Inner Product ◽

Interaction Matrix ◽

Distance Metric ◽

Point Of Interest ◽

Poi Recommendation ◽

Real World Datasets

Point-of-interest (POI) recommendation is a valuable service to help users discover attractive locations in location-based social networks (LBSNs). It focuses on capturing users’ movement patterns and location preferences by using massive historical check-in data. In the past decade, matrix factorization has become a mature and widely used technology in POI recommendation. However, the inner product of latent vectors adopted in matrix factorization methods does not satisfy the triangle inequality property, which may limit the expressiveness and lead to suboptimal solutions. Besides, the extreme sparsity of check-in data makes it challenging to capture users’ movement preferences accurately. In this paper, we propose a joint geosequential preference and distance metric factorization framework, called GeoSeDMF, for POI recommendation. First, we introduce a distance metric factorization method that is capable of learning users’ personalized preferences from a position and distance perspective in the metric space. Specifically, we convert the user-POI interaction matrix into a distance matrix and factorize it into user and POI dense embeddings. Additionally, we measure users’ personalized preference for the POI by using the Euclidean distance metric instead of the inner product. Then, we model the users’ geospatial preference by applying a geographic weight coefficient and model the users’ sequential preference by using the Euclidean distance of continuous check-in locations. Moreover, a pointwise loss strategy and AdaGrad algorithm are adopted to optimize the positions and relationships of users and POIs in a metric space. Finally, experimental results on three large-scale real-world datasets demonstrate the effectiveness and superiority of the proposed method.
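
A minimal NumPy sketch of distance metric factorization under stated assumptions: check-in counts are converted to target distances by a hypothetical mapping (unvisited POIs get the maximum distance), user and POI positions are fit with a pointwise squared loss on Euclidean distances and AdaGrad updates, and POIs are ranked by distance to the user. The geographic weight coefficient and the sequential term of GeoSeDMF are omitted, and the mapping and hyperparameters are illustrative only.

```python
import numpy as np

def to_target_distance(counts, max_dist=2.0):
    """Hypothetical mapping: unvisited -> max_dist; more check-ins -> smaller target distance."""
    return max_dist / (1.0 + counts)

def pointwise_loss(U, P, pairs):
    return sum((np.linalg.norm(U[u] - P[p]) - t) ** 2 for u, p, t in pairs)

def fit_distance_mf(pairs, num_users, num_pois, dim=8, epochs=200, lr=0.1, seed=0):
    """Learn user/POI positions so that ||U[u] - P[p]|| matches each target distance (AdaGrad)."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(num_users, dim))
    P = rng.normal(scale=0.1, size=(num_pois, dim))
    gU, gP = np.zeros_like(U), np.zeros_like(P)     # AdaGrad accumulators of squared gradients
    eps = 1e-8
    for _ in range(epochs):
        for u, p, target in pairs:
            diff = U[u] - P[p]
            dist = np.linalg.norm(diff) + eps
            grad_u = 2.0 * (dist - target) * diff / dist   # gradient of (dist - target)^2 w.r.t. U[u]
            grad_p = -grad_u
            gU[u] += grad_u ** 2
            gP[p] += grad_p ** 2
            U[u] -= lr * grad_u / (np.sqrt(gU[u]) + eps)
            P[p] -= lr * grad_p / (np.sqrt(gP[p]) + eps)
    return U, P

def recommend(U, P, user, k=2):
    """Rank POIs by Euclidean distance to the user, closest first."""
    return np.argsort(np.linalg.norm(P - U[user], axis=1))[:k]

counts = np.array([[3, 0, 1, 0],
                   [0, 2, 0, 1],
                   [3, 0, 2, 0]])
targets = to_target_distance(counts)
pairs = [(u, p, targets[u, p]) for u in range(3) for p in range(4)]
U0, P0 = fit_distance_mf(pairs, 3, 4, epochs=0)   # initial positions, for comparison
U, P = fit_distance_mf(pairs, 3, 4)
```

Unlike an inner-product score, the Euclidean distance obeys the triangle inequality, so a user placed close to two POIs also forces those POIs to be close to each other.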

Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa136 ◽

2020 ◽

Vol 27 (10) ◽

pp. 1538-1546 ◽

Cited By ~ 1

Author(s):

Yuqing Mao ◽

Kin Wah Fung

Keyword(s):

Word Sense Disambiguation ◽

Graph Embedding ◽

Semantic Relatedness ◽

Word Sense ◽

Medical Subject Headings ◽

Network Graph ◽

Convolutional Network ◽

Language System ◽

Unified Medical Language System ◽

Medical Language

Objective: The study sought to explore the use of deep learning techniques to measure the semantic relatedness between Unified Medical Language System (UMLS) concepts.
Materials and Methods: Concept sentence embeddings were generated for UMLS concepts by applying the word embedding models BioWordVec and various flavors of BERT to concept sentences formed by concatenating UMLS terms. Graph embeddings were generated by graph convolutional networks and 4 knowledge graph embedding models, using graphs built from UMLS hierarchical relations. Semantic relatedness was measured by the cosine between the concepts’ embedding vectors. Performance was compared with 2 traditional path-based measurements (shortest path and Leacock-Chodorow) and with the publicly available concept embeddings, cui2vec, generated from large biomedical corpora. The concept sentence embeddings were also evaluated on a word sense disambiguation (WSD) task. Reference standards included the semantic relatedness and semantic similarity datasets from the University of Minnesota, concept pairs generated from the Standardized MedDRA Queries, and the MeSH (Medical Subject Headings) WSD corpus.
Results: Sentence embeddings generated by BioWordVec outperformed all other methods used individually in semantic relatedness measurements. Graph convolutional network graph embedding uniformly outperformed path-based measurements and was better than some word embeddings for the Standardized MedDRA Queries dataset. When used together, combined word and graph embedding achieved the best performance on all datasets. For WSD, the enhanced versions of BERT outperformed BioWordVec.
Conclusions: Word and graph embedding techniques can be used to harness terms and relations in the UMLS to measure semantic relatedness between concepts. Concept sentence embedding outperforms path-based measurements and cui2vec, and can be further enhanced by combining it with graph embedding.
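
The two kinds of measures compared in the study can be sketched in plain Python: cosine relatedness between embedding vectors and the path-based Leacock-Chodorow measure on an is-a hierarchy. The LCH convention below (nodes on the shortest path divided by twice the taxonomy depth) is one common form; the toy hierarchy is hypothetical, and averaging the word- and graph-embedding cosines in `combined` is an assumption, since the abstract does not say how the embeddings were combined.

```python
import math
from collections import deque

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def undirected(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def path_edges(adj, src, dst):
    """Number of edges on the shortest path (BFS), or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None

def leacock_chodorow(adj, a, b, depth):
    """LCH = -log(path / (2 * depth)); path counts nodes on the shortest path (edges + 1),
    depth is the number of nodes on the longest root-to-leaf path."""
    return -math.log((path_edges(adj, a, b) + 1) / (2.0 * depth))

def combined(word_a, word_b, graph_a, graph_b):
    """Hypothetical combination: average the cosines of the word and graph embeddings."""
    return 0.5 * (cosine(word_a, word_b) + cosine(graph_a, graph_b))

# Toy is-a hierarchy (hypothetical, not real UMLS content); depth = 3 nodes from root to leaf.
hierarchy = undirected([("disease", "infection"), ("disease", "neoplasm"),
                        ("infection", "pneumonia"), ("infection", "sepsis")])
```

Siblings under the same parent (pneumonia and sepsis) score higher under LCH than concepts that only meet at the root, which is the behavior path-based baselines rely on.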

Adaptive Weighted Graph Fusion Incomplete Multi-View Subspace Clustering

Sensors ◽

10.3390/s20205755 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5755

Author(s):

Pei Zhang ◽

Siwei Wang ◽

Jingtao Hu ◽

Zhen Cheng ◽

Xifeng Guo ◽

...

Keyword(s):

Feature Extraction ◽

Complete Graph ◽

Subspace Clustering ◽

Weighted Graph ◽

Original Data ◽

Clustering Methods ◽

Unified Framework ◽

Research Attention ◽

Hardware Failure ◽

Real World Datasets

With the enormous amount of multi-source data produced by various sensors and feature extraction approaches, multi-view clustering (MVC) has attracted growing research attention and is widely used in data analysis. Most existing multi-view clustering methods rely on the assumption that all views are complete. However, in many real scenarios, multi-view data are incomplete for various reasons, e.g., hardware failure or incomplete data collection. In this paper, we propose an adaptive weighted graph fusion incomplete multi-view subspace clustering (AWGF-IMSC) method to solve the incomplete multi-view clustering problem. First, to eliminate the noise in the original space, we transform the complete original data into latent representations, which contribute to better graph construction for each view. Then, we incorporate feature extraction and incomplete graph fusion into a unified framework, in which the two processes can negotiate with each other in service of the graph learning task. A sparse regularization is imposed on the complete graph to make it more robust to view inconsistency. In addition, the importance of different views is learned automatically, further guiding the construction of the complete graph. An effective, convergent iterative algorithm is proposed to solve the resulting optimization problem. Compared with existing state-of-the-art methods, experimental results on several real-world datasets demonstrate the effectiveness and advantages of the proposed method.
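
Adaptive view weighting for incomplete graphs can be sketched as follows. This uses a swapped-in, commonly used auto-weighting heuristic (each view's weight is inversely proportional to its disagreement with the fused graph, w_v = 1 / (2 ||S - S_v||_F)), restricted to the sample pairs each view actually observes. It is not the paper's AWGF-IMSC update: latent representation learning and the sparse regularization are omitted, and the names are hypothetical.

```python
import numpy as np

def fuse_incomplete_graphs(view_graphs, observed, iters=50, eps=1e-10):
    """Fuse per-view similarity graphs that only cover the samples observed in each view.

    view_graphs[v]: (n, n) array, trusted only where both samples are observed in view v.
    observed[v]:    boolean array of length n.
    Returns the fused graph and the normalized view weights.
    """
    masks = [np.outer(o, o).astype(float) for o in observed]   # 1 where both samples exist in view v
    weights = np.ones(len(view_graphs))

    def weighted_average(w):
        num = sum(wv * m * g for wv, m, g in zip(w, masks, view_graphs))
        den = sum(wv * m for wv, m in zip(w, masks))
        return num / np.maximum(den, eps)                       # average over views observing each pair

    for _ in range(iters):
        fused = weighted_average(weights)
        # Views that disagree with the consensus on their observed entries get smaller weights.
        residuals = np.array([np.linalg.norm(m * (fused - g)) for m, g in zip(masks, view_graphs)])
        weights = 1.0 / (2.0 * residuals + eps)
        weights /= weights.sum()
    return weighted_average(weights), weights

rng = np.random.default_rng(0)
g = rng.random((5, 5))
g = (g + g.T) / 2                                               # clean similarity graph
noise = rng.normal(scale=0.5, size=(5, 5))
noise = (noise + noise.T) / 2
views = [g, g.copy(), g + noise]                                # third view is corrupted
observed = [np.ones(5, dtype=bool), np.array([1, 1, 1, 0, 1], dtype=bool), np.ones(5, dtype=bool)]
fused, w = fuse_incomplete_graphs(views, observed)
```

In the toy data the third view is corrupted, so its weight shrinks across iterations and the fused graph approaches the clean graph, while the second view contributes only on the samples it observes.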

Layout of Embedding Circulant Networks into Linear Hexagons and Phenylenes

Journal of Interconnection Networks ◽

10.1142/s0219265913500102 ◽

2013 ◽

Vol 14 (03) ◽

pp. 1350010

Author(s):

INDRA RAJASINGH ◽

MICHEAL AROCKIARAJ

Keyword(s):

Interconnection Networks ◽

Vlsi Design ◽

Graph Embedding ◽

Chemical Compounds ◽

Theoretical Chemistry ◽

Telecommunication Networks ◽

Benzenoid Hydrocarbons ◽

Graph Representations ◽

Host Graph ◽

Circulant Networks

Circulant networks have been used for decades in the design of computer and telecommunication networks due to their optimal fault tolerance and routing capabilities. They have also been used in VLSI design and distributed computation. Hexagonal chains are of great importance in theoretical chemistry because they are the natural graph representations of benzenoid hydrocarbons, and a great deal of research in mathematical chemistry has been devoted to them. Hexagonal chains are constructed exclusively from hexagons of side length one. Phenylenes are a class of chemical compounds in which carbon atoms form 6- and 4-membered cycles. Graph embedding is known as a powerful tool for implementing parallel algorithms and simulating different interconnection networks. An embedding f of a guest graph G into a host graph H is a bijection on the vertices such that each edge of G is mapped to a path in H. The wirelength (layout) of this embedding is defined as the sum of the lengths of the paths corresponding to the edges of G. In this paper, we obtain the minimum wirelength of embedding circulant networks into linear hexagonal chains and linear phenylenes. Further, we discuss the embedding of faulty circulant networks into linear hexagonal chains and linear phenylenes.
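
The wirelength definition can be checked by brute force on tiny instances. The sketch below builds circulant networks C(n; S) and linear hexagonal chains, routes every guest edge along a shortest host path (which minimizes each path length for a fixed vertex bijection), and searches all bijections for the minimum. It illustrates the definition only, not the paper's method for deriving the minimum wirelength, and the vertex numbering of the chain is an assumption.

```python
from collections import deque
from itertools import permutations

def circulant(n, jumps):
    """Edge set of the circulant network C(n; S): vertex i is adjacent to i +/- s (mod n) for s in S."""
    return {tuple(sorted((i, (i + s) % n))) for i in range(n) for s in jumps if s % n}

def linear_hexagonal_chain(h):
    """Linear chain of h hexagons with 4h + 2 vertices: top path t_0..t_{2h} (ids 0..2h),
    bottom path b_0..b_{2h} (ids 2h+1..4h+1), and rungs t_{2i}-b_{2i} shared by adjacent hexagons."""
    m = 2 * h + 1
    adj = {v: set() for v in range(2 * m)}
    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)
    for k in range(m - 1):
        link(k, k + 1)                  # top path
        link(m + k, m + k + 1)          # bottom path
    for i in range(h + 1):
        link(2 * i, m + 2 * i)          # vertical rungs
    return adj

def all_pairs_distances(adj):
    """BFS from every vertex of an unweighted host graph."""
    dist = {}
    for src in adj:
        d = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[src] = d
    return dist

def wirelength(guest_edges, host_dist, f):
    """Sum of host path lengths for the guest edges under the vertex bijection f."""
    return sum(host_dist[f[u]][f[v]] for u, v in guest_edges)

def min_wirelength(guest_edges, host_adj):
    """Exhaustive search over all bijections; only feasible for very small graphs."""
    host_dist = all_pairs_distances(host_adj)
    return min(wirelength(guest_edges, host_dist, perm) for perm in permutations(range(len(host_adj))))

path6 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
cycle6 = circulant(6, [1])
```

Exhaustive search grows as n!, so it is only useful for sanity checks; the value of closed-form results like the paper's is that they give the minimum for all sizes without enumeration.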

Quaternion Collaborative Filtering for Recommendation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/599 ◽

2019 ◽

Cited By ~ 3

Author(s):

Shuai Zhang ◽

Lina Yao ◽

Lucas Vinh Tran ◽

Aston Zhang ◽

Yi Tay

Keyword(s):

Collaborative Filtering ◽

Quaternion Algebra ◽

Wide Spectrum ◽

Representation Learning ◽

Real Space ◽

Inner Product ◽

Hypercomplex Numbers ◽

Learning Capability ◽

Inductive Bias ◽

Real World Datasets

This paper proposes Quaternion Collaborative Filtering (QCF), a novel representation learning method for recommendation. QCF relies on and exploits computation with quaternion algebra, benefiting from the expressiveness and rich representation learning capability of Hamilton products. Quaternion representations, based on hypercomplex numbers, enable rich inter-latent dependencies between imaginary components. This encourages intricate relations to be captured when learning user-item interactions, serving as a strong inductive bias compared with the real-space inner product. We conduct extensive experiments on six real-world datasets, demonstrating the effectiveness of quaternion algebra in recommender systems. The results show that QCF outperforms a wide spectrum of strong neural baselines on all datasets. Ablation experiments confirm the effectiveness of Hamilton-based composition over multi-embedding composition in real space.
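
The Hamilton product at the core of QCF can be written out directly. In the sketch below each latent dimension is a quaternion stored as a 4-tuple (real, i, j, k); how QCF turns the product into a preference score is not stated in the abstract, so summing all components in `qcf_score` is a hypothetical choice. Every output component mixes all four components of both inputs, which is the inter-latent dependency the abstract describes, and the product is not commutative.

```python
def hamilton(p, q):
    """Hamilton product of quaternions p = a1 + b1 i + c1 j + d1 k and q = a2 + b2 i + c2 j + d2 k."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qcf_score(user, item):
    """Hypothetical score: per-dimension Hamilton product of user and item quaternions, all components summed."""
    return sum(sum(hamilton(u, i)) for u, i in zip(user, item))

def real_score(user, item):
    """Real-space baseline: the inner product of the flattened 4-component embeddings."""
    return sum(x * y for u, i in zip(user, item) for x, y in zip(u, i))

# One latent dimension per embedding: the two scores differ because the Hamilton product
# couples the real and imaginary parts, while the inner product only matches them pairwise.
user = [(1, 2, 0, 0)]
item = [(0, 1, 0, 0)]
```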

Deconvolute individual genomes from metagenome sequences through short read clustering

PeerJ ◽

10.7717/peerj.8966 ◽

2020 ◽

Vol 8 ◽

pp. e8966 ◽

Cited By ~ 1

Author(s):

Kexue Li ◽

Yakang Lu ◽

Li Deng ◽

Lili Wang ◽

Lizhen Shi ◽

...

Keyword(s):

Large Scale ◽

False Negative ◽

Next Generation Sequencing Data ◽

Clustering Methods ◽

Sequencing Data ◽

Short Reads ◽

Clustering Problem ◽

Metagenome Assembly ◽

Real World Datasets ◽

Almost All

Metagenome assembly from short next-generation sequencing reads is challenging due to its large scale and computational complexity. Clustering short reads by species before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false negative (under-clustering) or false positive (over-clustering) problems. Here we extended our previous read clustering software, SpaRC, by exploiting statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using synthetic and real-world datasets, we demonstrated that this method has the potential to cluster almost all of the short reads from genomes with sufficient sequencing coverage. The improved read clustering in turn leads to improved downstream genome assembly quality.
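
A simplified picture of read clustering: reads that share a k-mer are linked, highly repeated k-mers are skipped to limit over-clustering, and connected components become clusters. This plain-Python sketch is an illustration under stated assumptions, not SpaRC's implementation, and it does not model the multi-sample statistics that the paper adds to reduce under-clustering; `max_kmer_reads` is a hypothetical cutoff.

```python
from collections import defaultdict

def kmers(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def cluster_reads(reads, k, max_kmer_reads=100):
    """Group reads into connected components of the shared-k-mer graph."""
    index = defaultdict(list)                     # k-mer -> reads containing it
    for r, seq in enumerate(reads):
        for km in kmers(seq, k):
            index[km].append(r)
    uf = UnionFind(len(reads))
    for members in index.values():
        if len(members) > max_kmer_reads:
            continue                              # likely repeat: linking on it would over-cluster
        for other in members[1:]:
            uf.union(members[0], other)
    clusters = defaultdict(list)
    for r in range(len(reads)):
        clusters[uf.find(r)].append(r)
    return sorted(clusters.values())

reads = ["ACGTACGGT", "TACGGTCCA", "GGGCCCTTT", "CCCTTTAAA"]
```

Lowering `max_kmer_reads` trades under-clustering for protection against over-clustering, which is the tension the abstract describes.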
