Attribute-Guided Network Sampling Mechanisms

This article introduces a novel task-independent sampler for attributed networks. The problem is important because, while data mining tasks on network content are common, sampling on internet-scale networks is costly. Link-trace samplers such as Snowball sampling, Forest Fire, Random Walk, and Metropolis–Hastings Random Walk are widely used for sampling from networks. These attribute-agnostic samplers are designed to preserve salient properties of network structure and are not optimized for tasks on node content. This article makes three contributions. First, we propose a task-independent, attribute-aware link-trace sampler grounded in information theory. Our sampler greedily adds to the sample the node with the most informative (i.e., surprising) neighborhood. The sampler tends to rapidly explore the attribute space, maximally reducing the surprise of unseen nodes. Second, we prove that content sampling is an NP-hard problem. A well-known algorithm gives the best approximation of the optimal solution, within a factor of 1 − 1/e, but requires full access to the entire graph. Third, we show through empirical counterfactual analysis that in many real-world datasets, network structure does not hinder the performance of surprise-based link-trace samplers. Experimental results over 18 real-world datasets reveal that surprise-based samplers are sample-efficient and outperform state-of-the-art attribute-agnostic samplers by a wide margin (e.g., a 45% performance improvement in clustering tasks).
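The greedy, surprise-driven selection described in this abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy adjacency-list graph, set-valued node attributes, add-one smoothing, and a vocabulary size computed from all attributes (which a real link-trace sampler would not know in advance). The names `surprisal` and `greedy_surprise_sample` are hypothetical.

```python
import math
from collections import Counter


def surprisal(attr_values, counts, total, vocab_size):
    """Sum of -log2 p(a) over attributes, with add-one smoothing.

    p(a) is estimated from the attribute counts of the current sample.
    """
    return sum(
        -math.log2((counts[a] + 1) / (total + vocab_size))
        for a in attr_values
    )


def greedy_surprise_sample(graph, attrs, seed, budget):
    """Grow a sample by repeatedly adding the frontier node whose
    neighborhood (the node plus its neighbors) is most surprising.

    graph: dict mapping node -> list of neighbor nodes (assumed format)
    attrs: dict mapping node -> set of attribute values (assumed format)
    """
    # Assumption for this toy example: the attribute vocabulary is known.
    vocab_size = len({a for vals in attrs.values() for a in vals})
    sample = [seed]
    in_sample = {seed}
    counts = Counter(attrs[seed])
    total = len(attrs[seed])
    frontier = set(graph[seed]) - in_sample

    while frontier and len(sample) < budget:
        def score(node):
            # Attributes visible in the candidate's neighborhood.
            hood = set(attrs[node])
            for nb in graph[node]:
                hood |= set(attrs[nb])
            return surprisal(hood, counts, total, vocab_size)

        # Sorting makes tie-breaking deterministic.
        best = max(sorted(frontier), key=score)
        sample.append(best)
        in_sample.add(best)
        counts.update(attrs[best])
        total += len(attrs[best])
        frontier |= set(graph[best])
        frontier -= in_sample
    return sample
```

In this sketch a node with unremarkable attributes can still be chosen early if its neighbors carry attribute values the sample has not yet seen, which is one way to read "most informative neighborhood".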


An Attentional Recurrent Neural Network for Personalized Next Location Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i01.5337 ◽ 2020 ◽ Vol 34 (01) ◽ pp. 83-90

Author(s): Qing Guo ◽ Zhu Sun ◽ Jie Zhang ◽ Yin-Leng Theng

Keyword(s): Neural Network ◽ Random Walk ◽ Recurrent Neural Network ◽ Real World ◽ State Of The Art ◽ Knowledge Graph ◽ User Mobility ◽ Location Recommendation ◽ Meta Path ◽ Real World Datasets

Most existing studies on next location recommendation propose to model the sequential regularity of check-in sequences, but suffer from the severe data sparsity issue where most locations have fewer than five following locations. To this end, we propose an Attentional Recurrent Neural Network (ARNN) to jointly model both the sequential regularity and transition regularities of similar locations (neighbors). In particular, we first design a meta-path based random walk over a novel knowledge graph to discover location neighbors based on heterogeneous factors. A recurrent neural network is then adopted to model the sequential regularity by capturing various contexts that govern user mobility. Meanwhile, the transition regularities of the discovered neighbors are integrated via the attention mechanism, which seamlessly cooperates with the sequential regularity as a unified recurrent framework. Experimental results on multiple real-world datasets demonstrate that ARNN outperforms state-of-the-art methods.


Bootstrapping Entity Alignment with Knowledge Graph Embedding

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2018/611 ◽ 2018 ◽ Cited By ~ 35

Author(s): Zequn Sun ◽ Wei Hu ◽ Qingheng Zhang ◽ Yuzhong Qu

Keyword(s): Performance Improvement ◽ Real World ◽ State Of The Art ◽ Graph Embedding ◽ Training Data ◽ Knowledge Graph ◽ Error Accumulation ◽ Knowledge Graphs ◽ Real World Datasets ◽ Low Dimensional

Embedding-based entity alignment represents different knowledge graphs (KGs) as low-dimensional embeddings and finds entity alignment by measuring the similarities between entity embeddings. Existing approaches have achieved promising results; however, they are still challenged by the lack of sufficient prior alignment as labeled training data. In this paper, we propose a bootstrapping approach to embedding-based entity alignment. It iteratively labels likely entity alignment as training data for learning alignment-oriented KG embeddings. Furthermore, it employs an alignment editing method to reduce error accumulation during iterations. Our experiments on real-world datasets showed that the proposed approach significantly outperformed the state-of-the-art embedding-based ones for entity alignment. The proposed alignment-oriented KG embedding, bootstrapping process, and alignment editing method all contributed to the performance improvement.


Multi-view Knowledge Graph Embedding for Entity Alignment

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/754 ◽ 2019 ◽ Cited By ~ 22

Author(s): Qingheng Zhang ◽ Zequn Sun ◽ Wei Hu ◽ Muhao Chen ◽ Lingbing Guo ◽ ...

Keyword(s): Performance Improvement ◽ Real World ◽ State Of The Art ◽ Graph Embedding ◽ Knowledge Graph ◽ Multiple Views ◽ Combination Strategies ◽ Knowledge Graphs ◽ Real World Datasets ◽ Inference Methods

We study the problem of embedding-based entity alignment between knowledge graphs (KGs). Previous works mainly focus on the relational structure of entities. Some further incorporate another type of feature, such as attributes, for refinement. However, a vast number of entity features are still unexplored or not treated equally together, which impairs the accuracy and robustness of embedding-based entity alignment. In this paper, we propose a novel framework that unifies multiple views of entities to learn embeddings for entity alignment. Specifically, we embed entities based on the views of entity names, relations, and attributes, with several combination strategies. Furthermore, we design some cross-KG inference methods to enhance the alignment between two KGs. Our experiments on real-world datasets show that the proposed framework significantly outperforms the state-of-the-art embedding-based entity alignment methods. The selected views, cross-KG inference, and combination strategies all contribute to the performance improvement.


Efficient Heterogeneous Collaborative Filtering without Negative Sampling for Recommendation

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i01.5329 ◽ 2020 ◽ Vol 34 (01) ◽ pp. 19-26 ◽ Cited By ~ 5

Author(s): Chong Chen ◽ Min Zhang ◽ Yongfeng Zhang ◽ Weizhi Ma ◽ Yiqun Liu ◽ ...

Keyword(s): Collaborative Filtering ◽ Real World ◽ Large Scale ◽ State Of The Art ◽ Heterogeneous Data ◽ Model Parameters ◽ Online Systems ◽ Practical Applications ◽ Real World Datasets ◽ Primary Type

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite its effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve the optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback, such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with a rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data based neural methods.


Community detection via closure extension

International Journal of Modern Physics C ◽ 10.1142/s012918311850119x ◽ 2018 ◽ Vol 29 (12) ◽ pp. 1850119

Author(s): Jingming Zhang ◽ Jianjun Cheng ◽ Xiaosu Feng ◽ Xiaoyun Chen

Keyword(s): Community Structure ◽ Computational Complexity ◽ Community Detection ◽ Network Structure ◽ Real World ◽ Prior Information ◽ State Of The Art ◽ Local Information ◽ Second Step ◽ Novel Method

Identifying community structure in networks plays an important role in understanding the network structure and analyzing the network features. Many state-of-the-art algorithms have been proposed to identify the community structure in networks. In this paper, we propose a novel method based on closure extension that proceeds in two steps. The first step uses the similarity closure or correlation closure to find the initial community structure. In the second step, we merge the initial communities using Modularity [Formula: see text]. The proposed method does not need any prior information such as the number or sizes of communities, and it obtains the same resulting communities across multiple runs. Moreover, our method has low computational complexity because it considers only local information of the network. Several real-world and synthetic graphs are used to test the performance of the proposed method. The results demonstrate that our method can detect deterministic and informative community structure in most cases.
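The abstract's formula for Modularity is omitted in this listing. As a hedged illustration only, the sketch below computes the standard Newman–Girvan modularity of a partition of an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. Whether the paper uses exactly this definition is an assumption, and the function name `modularity` and the input format are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (assumed definition).

    edges: list of (u, v) pairs for an undirected, unweighted graph
    community: dict mapping node -> community label
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

A merge step of the kind the abstract mentions could, under this assumption, compare Q before and after joining two communities and keep the merge only when Q increases.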


Scene text removal via cascaded text stroke detection and erasing

Computational Visual Media ◽ 10.1007/s41095-021-0242-8 ◽ 2021 ◽ Vol 8 (2) ◽ pp. 273-287

Author(s): Xuewei Bian ◽ Chaoqun Wang ◽ Weize Quan ◽ Juntao Ye ◽ Xiaopeng Zhang ◽ ...

Keyword(s): Performance Improvement ◽ Real World ◽ Large Scale ◽ State Of The Art ◽ The State ◽ Experimental Results ◽ Processing Unit ◽ Final Model ◽ Scene Text ◽ End To End

Recent learning-based approaches show promising performance improvement for the scene text removal task but usually leave several remnants of text and provide visually unpleasant results. In this work, a novel end-to-end framework is proposed based on accurate text stroke detection. Specifically, the text removal problem is decoupled into text stroke detection and stroke removal; we design separate networks to solve these two subproblems, the latter being a generative network. These two networks are combined as a processing unit, which is cascaded to obtain our final model for text removal. Experimental results demonstrate that the proposed method substantially outperforms the state-of-the-art for locating and erasing scene text. A new large-scale real-world dataset with 12,120 images has been constructed and is being made available to facilitate research, as current publicly available datasets are mainly synthetic and so cannot properly measure the performance of different methods.


Discrete Trust-aware Matrix Factorization for Fast Recommendation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/191 ◽ 2019

Author(s): Guibing Guo ◽ Enneng Yang ◽ Li Shen ◽ Xiaochun Yang ◽ Xiaodong He

Keyword(s): Social Influence ◽ Collaborative Filtering ◽ Recommender Systems ◽ Social Relations ◽ Real World ◽ Matrix Factorization ◽ State Of The Art ◽ Proposed Model ◽ Hamming Space ◽ Real World Datasets

Trust-aware recommender systems have received much attention recently for their ability to capture the influence among connected users. However, they suffer from efficiency issues due to the large amount of data and time-consuming real-valued operations. Although existing discrete collaborative filtering may alleviate this issue to some extent, it is unable to accommodate social influence. In this paper, we propose a discrete trust-aware matrix factorization (DTMF) model that takes advantage of both social relations and discrete techniques for fast recommendation. Specifically, we map the latent representations of users and items into a joint Hamming space by recovering the rating and trust interactions between users and items. We adopt a sophisticated discrete coordinate descent (DCD) approach to optimize our proposed model. Experiments on two real-world datasets demonstrate the superiority of our approach over other state-of-the-art approaches in terms of ranking accuracy and efficiency.
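To see why a Hamming space can speed up recommendation, the following minimal sketch ranks items by comparing binary codes with XOR and a bit count instead of real-valued dot products. It is not the DTMF model: the codes here are hand-written rather than learned, and the function names, the integer bitmask encoding, and the code length `r` are assumptions for illustration. It relies on the identity that for codes in {−1, +1}^r the inner product equals r − 2 × (Hamming distance).

```python
def hamming_distance(a, b):
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")


def similarity(a, b, r):
    """Inner product of the corresponding {-1, +1}^r codes.

    Matching bits contribute +1 and differing bits -1, so the
    inner product is r - 2 * hamming_distance.
    """
    return r - 2 * hamming_distance(a, b)


def rank_items(user_code, item_codes, r):
    """Return item ids sorted from most to least similar to the user.

    item_codes: dict mapping item id -> integer bitmask (assumed format)
    """
    return sorted(
        item_codes,
        key=lambda item: similarity(user_code, item_codes[item], r),
        reverse=True,
    )
```

For example, with a 4-bit user code `0b1011`, an item coded `0b1001` differs in one bit and scores 2, while an item coded `0b0100` differs in all four bits and scores −4.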


EA Reader: Enhance Attentive Reader for Cloze-Style Question Answering via Multi-Space Context Fusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33016375 ◽ 2019 ◽ Vol 33 ◽ pp. 6375-6382

Author(s): Chengzhen Fu ◽ Yan Zhang

Keyword(s): Real World ◽ Question Answering ◽ State Of The Art ◽ Unified Model ◽ Inference Process ◽ Context Vector ◽ Attentive Reader ◽ Semantic Spaces ◽ Real World Datasets

Query-document semantic interactions are essential for the success of many cloze-style question answering models. Recently, researchers have proposed several attention-based methods to predict the answer by focusing on appropriate subparts of the context document. In this paper, we design a novel module to produce the query-aware context vector, named Multi-Space based Context Fusion (MSCF), with the following considerations: (1) interactions are applied across multiple latent semantic spaces; (2) attention is measured at bit level, not at token level. Moreover, we extend MSCF to the multi-hop architecture. This unified model is called Enhanced Attentive Reader (EA Reader). During the iterative inference process, the reader is equipped with a novel memory update rule and maintains the understanding of documents through read, update and write operations. We conduct extensive experiments on four real-world datasets. Our results demonstrate that EA Reader outperforms state-of-the-art models.


Exploring Periodicity and Interactivity in Multi-Interest Framework for Sequential Recommendation

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2021/197 ◽ 2021

Author(s): Gaode Chen ◽ Xinghua Zhang ◽ Yanyan Zhao ◽ Cong Xue ◽ Ji Xiang

Keyword(s): Real World ◽ Information Overload ◽ State Of The Art ◽ Recommendation Systems ◽ Time Interval ◽ Interest Representation ◽ Novel Method ◽ Real World Datasets ◽ Item Representation ◽ Global And Local

Sequential recommendation systems alleviate the problem of information overload and have attracted increasing attention in the literature. Most prior works obtain an overall representation based on the user's behavior sequence, which cannot sufficiently reflect the multiple interests of the user. To this end, we propose a novel method called PIMI to mitigate this issue. PIMI can model the user's multi-interest representation effectively by considering both the periodicity and interactivity in the item sequence. Specifically, we design a periodicity-aware module to utilize the time interval information between a user's behaviors. Meanwhile, an ingenious graph is proposed to enhance the interactivity between items in a user's behavior sequence, which can capture both global and local item features. Finally, a multi-interest extraction module is applied to describe a user's multiple interests based on the obtained item representation. Extensive experiments on two real-world datasets, Amazon and Taobao, show that PIMI consistently outperforms state-of-the-art methods.


Partial Label Learning with Self-Guided Retraining

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v33i01.33013542 ◽ 2019 ◽ Vol 33 ◽ pp. 3542-3549 ◽ Cited By ~ 10

Author(s): Lei Feng ◽ Bo An

Keyword(s): Real World ◽ Optimization Problem ◽ State Of The Art ◽ Ground Truth ◽ Learning Approaches ◽ High Confidence ◽ Infinity Norm ◽ Real World Datasets ◽ Partial Label Learning ◽ Optimization Efficiency

Partial label learning deals with the problem where each training instance is assigned a set of candidate labels, only one of which is correct. This paper provides the first attempt to leverage the idea of self-training for dealing with partially labeled examples. Specifically, we propose a unified formulation with proper constraints to train the desired model and perform pseudo-labeling jointly. For pseudo-labeling, unlike traditional self-training that manually differentiates the ground-truth label with sufficiently high confidence, we introduce the maximum infinity norm regularization on the modeling outputs to achieve this automatically, which results in a convex-concave optimization problem. We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems. By proposing an upper-bound surrogate objective function, we turn to solving only one QP problem to improve the optimization efficiency. Extensive experiments on synthesized and real-world datasets demonstrate that the proposed approach significantly outperforms the state-of-the-art partial label learning approaches.
