Graph Mining Meets Crowdsourcing: Extracting Experts for Answer Aggregation

Author(s):  
Yasushi Kawase
Yuko Kuroki
Atsushi Miyauchi

Aggregating responses from crowd workers is a fundamental task in crowdsourcing. When a few experts are overwhelmed by a large number of non-experts, most answer aggregation algorithms, such as majority voting, fail to identify the correct answers. It is therefore crucial to extract reliable experts from the crowd workers. In this study, we introduce the notion of the "expert core", a set of workers that is very unlikely to contain a non-expert. We design an efficient graph-mining-based algorithm that computes the expert core exactly. For the answer aggregation task, we propose two types of algorithms. The first incorporates the expert core into existing answer aggregation algorithms such as majority voting, whereas the second utilizes the information about worker reliability provided by the expert core extraction algorithm. We then give a theoretical justification for the first type of algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our proposed answer aggregation algorithms outperform state-of-the-art algorithms.
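
As a rough illustration of how an extracted expert set can be plugged into majority voting (a minimal sketch, not the authors' expert-core algorithm; the worker ids and the fallback rule are assumptions):

```python
from collections import Counter

def majority_vote(answers, workers=None):
    """Aggregate the answers for one task by majority vote.

    answers: dict mapping worker_id -> chosen label for this task.
    workers: optional set of worker ids to restrict the vote to
             (e.g. an extracted expert core); None uses everyone.
    """
    pool = answers if workers is None else {
        w: a for w, a in answers.items() if w in workers
    }
    if not pool:  # fall back to the full crowd if no expert answered
        pool = answers
    return Counter(pool.values()).most_common(1)[0][0]

# Toy usage: two experts against three agreeing non-experts.
task = {"w1": "A", "w2": "A", "w3": "B", "w4": "B", "w5": "B"}
print(majority_vote(task))                        # -> "B" (plain majority)
print(majority_vote(task, workers={"w1", "w2"}))  # -> "A" (expert-core vote)
```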

2020
Vol 34 (04)
pp. 6837-6844
Author(s):  
Xiaojin Zhang
Honglei Zhuang
Shengyu Zhang
Yuan Zhou

We study a variant of the thresholding bandit problem (TBP) in the context of outlier detection, where the objective is to identify the outliers whose rewards are above a threshold. In contrast to the traditional TBP, the threshold is defined as a function of the rewards of all the arms, motivated by the criterion for identifying outliers. The learner needs to explore the rewards of the arms as well as the threshold. We refer to this problem as "double exploration for outlier detection". We construct an adaptively updated confidence interval for the threshold, based on the estimated value of the threshold in previous rounds. Furthermore, by automatically trading off exploration of the individual arms against exploration of the outlier threshold, we provide an algorithm that is efficient in terms of sample complexity. Experimental results on both synthetic and real-world datasets demonstrate the efficiency of our algorithm.
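
To make the double-exploration idea concrete, here is a minimal sketch of one decision step. The Hoeffding-style radius, the mean-plus-k-standard-deviations outlier threshold, and the crude proxy for the threshold's own uncertainty are all illustrative assumptions, not the paper's construction:

```python
import numpy as np

def confidence_radius(counts, t, delta=0.05):
    """Hoeffding-style confidence radius for each arm after `counts` pulls at round t."""
    counts = np.maximum(np.asarray(counts, dtype=float), 1)
    return np.sqrt(np.log(2 * len(counts) * t / delta) / (2 * counts))

def outlier_threshold(means, k=2.0):
    """Assumed outlier criterion: mean of the arm means plus k standard deviations."""
    return means.mean() + k * means.std()

def choose_arm(counts, sums, t, k=2.0):
    """Pick the next arm: the most uncertain arm whose interval still overlaps
    the (roughly estimated) interval around the outlier threshold."""
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(sums, dtype=float) / np.maximum(counts, 1)
    rad = confidence_radius(counts, t)
    thr = outlier_threshold(means, k)
    thr_rad = rad.mean()  # crude stand-in for the threshold's own confidence radius
    unresolved = [i for i in range(len(counts))
                  if not (means[i] - rad[i] > thr + thr_rad
                          or means[i] + rad[i] < thr - thr_rad)]
    if not unresolved:
        return None       # every arm is confidently labeled outlier / non-outlier
    return max(unresolved, key=lambda i: rad[i])
```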


2020
Vol 34 (01)
pp. 19-26
Author(s):  
Chong Chen
Min Zhang
Yongfeng Zhang
Weizhi Ma
Yiqun Liu
...  

Recent studies on recommendation have largely focused on exploring state-of-the-art neural networks to improve the expressiveness of models, while typically applying the Negative Sampling (NS) strategy for efficient learning. Despite their effectiveness, two important issues have not been well considered in existing methods: 1) NS suffers from dramatic fluctuation, making it difficult for sampling-based methods to achieve optimal ranking performance in practical applications; 2) although heterogeneous feedback (e.g., view, click, and purchase) is widespread in many online systems, most existing methods leverage only one primary type of user feedback, such as purchase. In this work, we propose a novel non-sampling transfer learning solution, named Efficient Heterogeneous Collaborative Filtering (EHCF), for Top-N recommendation. It can not only model fine-grained user-item relations, but also efficiently learn model parameters from the whole heterogeneous data (including all unlabeled data) with rather low time complexity. Extensive experiments on three real-world datasets show that EHCF significantly outperforms state-of-the-art recommendation methods in both traditional (single-behavior) and heterogeneous scenarios. Moreover, EHCF shows significant improvements in training efficiency, making it more applicable to real-world large-scale systems. Our implementation has been released to facilitate further developments on efficient whole-data-based neural methods.
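
For intuition, here is a minimal sketch of the generic non-sampling (whole-data) objective that such methods build on: a weighted squared loss over all user-item pairs, with a small constant weight on unobserved entries. The paper's efficient reformulation and heterogeneous transfer components are omitted, and the weights are illustrative:

```python
import numpy as np

def whole_data_loss(U, V, R, w_pos=1.0, w_neg=0.01):
    """Weighted squared loss over ALL user-item pairs (no negative sampling).

    U: (n_users, d) user factors; V: (n_items, d) item factors.
    R: (n_users, n_items) binary implicit-feedback matrix (1 = interaction).
    Observed entries get weight w_pos, unobserved ones a small weight w_neg,
    so every unobserved pair contributes to the objective rather than only
    a sampled subset of negatives.
    """
    pred = U @ V.T
    W = np.where(R > 0, w_pos, w_neg)
    return float((W * (R - pred) ** 2).sum())
```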


Author(s):  
Guibing Guo
Enneng Yang
Li Shen
Xiaochun Yang
Xiaodong He

Trust-aware recommender systems have received much attention recently for their ability to capture the influence among connected users. However, they suffer from an efficiency issue due to the large amount of data and time-consuming real-valued operations. Although existing discrete collaborative filtering may alleviate this issue to some extent, it is unable to accommodate social influence. In this paper we propose a discrete trust-aware matrix factorization (DTMF) model that takes advantage of both social relations and discretization techniques for fast recommendation. Specifically, we map the latent representations of users and items into a joint Hamming space by recovering the rating and trust interactions between users and items. We adopt a sophisticated discrete coordinate descent (DCD) approach to optimize the proposed model. Experiments on two real-world datasets demonstrate the superiority of our approach over other state-of-the-art approaches in terms of ranking accuracy and efficiency.
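
As an illustration of the discrete-optimization flavor, the sketch below performs a naive per-bit greedy update of one user's binary code. The actual DCD rule uses a closed-form bit update and also accounts for trust relations, neither of which is reproduced here:

```python
import numpy as np

def greedy_bit_update(b_u, D, r_u, rated):
    """One sweep of per-bit greedy descent over a single user's code.

    b_u:   (k,) binary code in {-1, +1} for the user.
    D:     (n_items, k) binary item codes.
    r_u:   (n_items,) ratings, rescaled to roughly match the code inner product.
    rated: indices of items this user actually rated.
    Each bit is set to whichever sign lowers the squared rating error,
    holding all other bits fixed.
    """
    for j in range(len(b_u)):
        errs = {}
        for sign in (+1, -1):
            trial = b_u.copy()
            trial[j] = sign
            errs[sign] = ((r_u[rated] - D[rated] @ trial) ** 2).sum()
        b_u[j] = +1 if errs[+1] <= errs[-1] else -1
    return b_u
```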


Author(s):  
Chengzhen Fu
Yan Zhang

Query-document semantic interactions are essential for the success of many cloze-style question answering models. Recently, researchers have proposed several attention-based methods to predict the answer by focusing on appropriate subparts of the context document. In this paper, we design a novel module to produce the query-aware context vector, named Multi-Space based Context Fusion (MSCF), with the following considerations: (1) interactions are applied across multiple latent semantic spaces; (2) attention is measured at the bit level, not at the token level. Moreover, we extend MSCF to a multi-hop architecture. The unified model is called the Enhanced Attentive Reader (EA Reader). During the iterative inference process, the reader is equipped with a novel memory update rule and maintains its understanding of documents through read, update and write operations. We conduct extensive experiments on four real-world datasets. Our results demonstrate that EA Reader outperforms state-of-the-art models.
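
A toy sketch of bit-level (per-dimension) fusion across several projected spaces, to contrast with token-level attention. The layer shapes and the sigmoid gating are illustrative assumptions, not the paper's exact MSCF module:

```python
import torch
import torch.nn as nn

class BitLevelFusion(nn.Module):
    """Fuse a context vector with a query vector using a per-dimension gate
    computed in several projected ('latent semantic') spaces."""

    def __init__(self, dim, n_spaces=4):
        super().__init__()
        self.proj_q = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_spaces)])
        self.proj_c = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_spaces)])
        self.out = nn.Linear(n_spaces * dim, dim)

    def forward(self, query, context):
        # query, context: (batch, dim)
        fused = []
        for pq, pc in zip(self.proj_q, self.proj_c):
            q, c = pq(query), pc(context)
            gate = torch.sigmoid(q * c)   # one weight per dimension, not per token
            fused.append(gate * c)
        return self.out(torch.cat(fused, dim=-1))
```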


Author(s):  
Gaode Chen
Xinghua Zhang
Yanyan Zhao
Cong Xue
Ji Xiang

Sequential recommendation systems alleviate the problem of information overload and have attracted increasing attention in the literature. Most prior works obtain an overall representation from the user's behavior sequence, which cannot sufficiently reflect the user's multiple interests. To this end, we propose a novel method called PIMI to mitigate this issue. PIMI models the user's multi-interest representation effectively by considering both the periodicity and the interactivity in the item sequence. Specifically, we design a periodicity-aware module to exploit the time-interval information between the user's behaviors. Meanwhile, a graph structure is designed to enhance the interactivity between items in the user's behavior sequence, capturing both global and local item features. Finally, a multi-interest extraction module is applied to describe the user's multiple interests based on the obtained item representations. Extensive experiments on two real-world datasets, Amazon and Taobao, show that PIMI consistently outperforms state-of-the-art methods.
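
As a small illustration of how time-interval information can be exposed to a sequence model, the following sketch bucketizes the gaps between consecutive behaviors; the log-scale bucketing scheme is an assumption, not PIMI's periodicity-aware module:

```python
import numpy as np

def interval_buckets(timestamps, n_buckets=64, max_gap=60 * 60 * 24 * 30):
    """Turn a user's behavior timestamps into discretized time-interval features.

    timestamps: sorted 1-D sequence of Unix times for one user's item sequence.
    Returns one bucket id per position (0 for the first item), which could be
    embedded and fed to the sequence model alongside the item ids.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    gaps = np.diff(timestamps, prepend=timestamps[:1])  # first gap is 0
    gaps = np.clip(gaps, 0, max_gap)
    # Log-scale bucketing so that short gaps get finer resolution.
    return np.floor(np.log1p(gaps) / np.log1p(max_gap) * (n_buckets - 1)).astype(int)
```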


Author(s):  
Lei Feng
Bo An

Partial label learning deals with the problem where each training instance is assigned a set of candidate labels, only one of which is correct. This paper provides the first attempt to leverage the idea of self-training for dealing with partially labeled examples. Specifically, we propose a unified formulation, with proper constraints, to train the desired model and perform pseudo-labeling jointly. For pseudo-labeling, unlike traditional self-training, which manually selects the label with sufficiently high confidence as the ground truth, we introduce a maximum infinity norm regularization on the model outputs to achieve this automatically, which results in a convex-concave optimization problem. We show that optimizing this convex-concave problem is equivalent to solving a set of quadratic programming (QP) problems. By proposing an upper-bound surrogate objective function, we instead solve only one QP problem, improving the optimization efficiency. Extensive experiments on synthetic and real-world datasets demonstrate that the proposed approach significantly outperforms state-of-the-art partial label learning approaches.
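
For intuition, here is a greedy stand-in for the pseudo-labeling step: each instance receives the candidate label with the highest current model score. The paper instead performs this jointly with training via the maximum infinity norm regularization and the QP formulation described above:

```python
import numpy as np

def pseudo_label(scores, candidate_mask):
    """Assign each instance the candidate label with the highest model score.

    scores:         (n, n_classes) model outputs.
    candidate_mask: (n, n_classes) boolean, True where a label is a candidate.
    Non-candidate labels are masked out, reflecting the constraint that the
    pseudo-label must come from the candidate set.
    """
    masked = np.where(candidate_mask, scores, -np.inf)
    return masked.argmax(axis=1)
```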


Symmetry
2019
Vol 11 (9)
pp. 1149
Author(s):  
Thapana Boonchoo
Xiang Ao
Qing He

Motivated by the proliferation of trajectory data produced by advanced GPS-enabled devices, trajectories are growing in complexity and beginning to carry additional attributes beyond the coordinates alone. This creates the potential to define similarity between two attribute-aware trajectories. However, most existing trajectory similarity approaches focus only on location-based proximity and fail to capture the semantic similarities encompassed by these additional asymmetric attributes (aspects) of trajectories. In this paper, we propose multi-aspect embedding for attribute-aware trajectories (MAEAT), a representation learning approach for trajectories that simultaneously models their similarities according to multiple aspects. MAEAT is built upon a sentence embedding algorithm and directly learns whole-trajectory embeddings by predicting the context aspect tokens of a given trajectory. Two kinds of token generation methods are proposed to extract multiple aspects from the raw trajectories, and a regularization is devised to control the relative importance of the aspects. Extensive experiments on benchmark and real-world datasets show the effectiveness and efficiency of the proposed MAEAT compared to state-of-the-art and baseline methods. The results of MAEAT can well support representative downstream trajectory mining and management tasks, and the algorithm outperforms the other compared methods in execution time by at least two orders of magnitude.
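
A minimal sketch of how aspect tokens might be generated from an attribute-aware trajectory before being fed to a sentence-embedding model; the attribute names and the grid discretization are illustrative assumptions, not the paper's token generation methods:

```python
def aspect_tokens(trajectory, cell_size=0.01):
    """Turn one attribute-aware trajectory into a bag of aspect tokens.

    trajectory: list of dicts, each with 'lat', 'lon' plus extra attributes
                such as 'transport_mode' or 'activity' (hypothetical names).
    Location is discretized into grid cells; every other attribute becomes
    a prefixed token so that different aspects stay distinguishable.
    """
    tokens = []
    for point in trajectory:
        cell = (int(point["lat"] / cell_size), int(point["lon"] / cell_size))
        tokens.append(f"loc_{cell[0]}_{cell[1]}")
        for key, value in point.items():
            if key not in ("lat", "lon"):
                tokens.append(f"{key}_{value}")
    return tokens

# The resulting token lists can then be passed to any sentence/paragraph
# embedding model to obtain one vector per trajectory.
```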


2015
Vol 11 (1)
pp. 45-65
Author(s):  
Heli Sun
Jianbin Huang
Xinwei She
Zhou Yang
Jiao Liu
...  

The problem of trip planning with time constraints aims to find the optimal routes that satisfy a maximum-time requirement and possess the highest attraction score. In this paper, a more efficient algorithm, TripRec, is proposed to solve this problem. Based on the principle of the Apriori algorithm for mining frequent itemsets, our method constructs candidate attraction sets containing k attractions by applying the join rule to valid sets consisting of k-1 attractions. Once all the valid routes for the valid (k-1)-attraction sets have been obtained, all candidate routes for the candidate k-sets can be acquired through a route extension approach. This markedly improves the efficiency of the valid-route generation process. Then, by checking whether at least one valid route exists, some candidate attraction sets are pruned to obtain all the valid sets. The process continues until no more valid attraction sets can be obtained. In addition, several optimization strategies are employed to further enhance the performance of the algorithm. Experimental results on both real-world and synthetic data sets show that our algorithm achieves a better pruning rate and higher efficiency than the state-of-the-art method.
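
The Apriori-style candidate generation step can be sketched as follows; this is the generic join-and-prune over attraction sets, without the route extension and time-constraint checks from the paper:

```python
from itertools import combinations

def join_candidates(valid_prev):
    """Apriori-style join: build candidate k-sets from valid (k-1)-sets.

    valid_prev: iterable of frozensets, each containing k-1 attractions.
    Two (k-1)-sets that differ in exactly one attraction are merged, and the
    union is kept only if every one of its (k-1)-subsets is itself valid,
    which is the standard pruning condition.
    """
    valid_prev = set(valid_prev)
    candidates = set()
    for a, b in combinations(valid_prev, 2):
        union = a | b
        if len(union) == len(a) + 1:  # the two sets differ by one element
            if all(frozenset(s) in valid_prev
                   for s in combinations(union, len(a))):
                candidates.add(frozenset(union))
    return candidates
```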


2020
Vol 34 (10)
pp. 13853-13854
Author(s):  
Jiacheng Li
Chunyuan Yuan
Wei Zhou
Jingli Wang
Songlin Hu

Social media has become a preferential place for sharing information. However, some users may create multiple accounts and manipulate them to deceive legitimate users. Most previous studies rely on verbal or behavioral features to solve this problem, but they are designed only for particular platforms, which limits their generality. In this paper, to support multiple platforms, we construct an interaction tree for each account based on its social interactions, which are a common characteristic of social platforms. We then propose a new method to calculate the social interaction entropy of each account and detect accounts that are controlled by the same user. Experimental results on two real-world datasets show that the method robustly outperforms state-of-the-art methods.
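
As a rough illustration of the entropy computation, here is Shannon entropy over a flat distribution of an account's interaction targets; the paper's definition is built on the interaction tree rather than this simple counting:

```python
import math
from collections import Counter

def interaction_entropy(interactions):
    """Shannon entropy of an account's interaction targets.

    interactions: list of account ids this account has interacted with
                  (replies, reposts, mentions, ...).
    Low entropy means the account concentrates its activity on a few targets;
    such profiles can be compared across accounts to group those likely
    controlled by the same user.
    """
    counts = Counter(interactions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```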


2019
Vol 16 (3)
pp. 59-77
Author(s):  
Yi Zhao
Yu Qiao
Keqing He

Clustering has become an increasingly important task in the analysis of large document collections. Clustering aims to organize these documents and to facilitate better search and knowledge extraction. Most existing clustering methods that use user-generated tags consider only their positive influence on automatic clustering performance. The authors argue that not all user-generated tags provide useful information for clustering. In this article, the authors propose a new solution for clustering, named HRT-LDA (High Representation Tags Latent Dirichlet Allocation), which considers the effects of different tags on clustering performance. To this end, the authors apply a tag filtering strategy and a tag appending strategy based on transfer learning, Word2vec, TF-IDF and semantic computing. Extensive experiments on real-world datasets demonstrate that HRT-LDA outperforms state-of-the-art tagging-augmented LDA methods for clustering.
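
A minimal sketch of a TF-IDF-based tag filtering step, as one ingredient of the kind such a filtering strategy could use; the threshold and scoring are assumptions, and the Word2vec and transfer-learning components are omitted:

```python
import math
from collections import Counter

def filter_tags(doc_tags, min_score=0.1):
    """Keep only tags whose TF-IDF score suggests they are informative.

    doc_tags: list of tag lists, one per document.
    Returns, per document, the tags whose TF-IDF exceeds min_score.
    """
    n_docs = len(doc_tags)
    df = Counter(tag for tags in doc_tags for tag in set(tags))
    filtered = []
    for tags in doc_tags:
        if not tags:
            filtered.append([])
            continue
        tf = Counter(tags)
        keep = [t for t in tf
                if (tf[t] / len(tags)) * math.log(n_docs / df[t]) >= min_score]
        filtered.append(keep)
    return filtered
```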

