Distributed Pareto Optimization for Subset Selection

Author(s):  
Chao Qian ◽  
Guiying Li ◽  
Chao Feng ◽  
Ke Tang

The subset selection problem, which selects a few items from a ground set, arises in many applications such as maximum coverage, influence maximization, and sparse regression. The recently proposed POSS algorithm is a powerful approximation solver for this problem. However, POSS requires centralized access to the full ground set, and is thus impractical for large-scale real-world applications, where the ground set is too large to be stored on a single machine. In this paper, we propose a distributed version of POSS (DPOSS) with a bounded approximation guarantee. DPOSS can be easily implemented in the MapReduce framework. Our extensive experiments using Spark, on various real-world data sets with sizes ranging from thousands to millions, show that DPOSS achieves performance competitive with the centralized POSS and is almost always better than the state-of-the-art distributed greedy algorithm RandGreeDi.
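
A minimal sketch of the two-round MapReduce-style pattern the abstract alludes to: the ground set is partitioned across machines, a subset selector is run on each partition, and the selector is run again on the union of the local solutions. A simple greedy maximum-coverage selector stands in for POSS here; the function names (`coverage`, `greedy_select`, `two_round_distributed`) and the toy objective are illustrative assumptions, not the paper's implementation.

```python
import random

def coverage(subset, sets):
    """Monotone submodular objective: number of elements covered by the chosen sets."""
    covered = set()
    for i in subset:
        covered |= sets[i]
    return len(covered)

def greedy_select(candidates, sets, k):
    """Stand-in local solver (plain greedy); DPOSS would run POSS here instead."""
    chosen = []
    for _ in range(k):
        best, best_gain = None, 0
        base = coverage(chosen, sets)
        for c in candidates:
            if c in chosen:
                continue
            gain = coverage(chosen + [c], sets) - base
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        chosen.append(best)
    return chosen

def two_round_distributed(sets, k, num_machines=4, seed=0):
    """Map: partition items at random and solve locally; Reduce: solve again on the union."""
    rng = random.Random(seed)
    items = list(range(len(sets)))
    rng.shuffle(items)
    partitions = [items[m::num_machines] for m in range(num_machines)]
    local_solutions = [greedy_select(part, sets, k) for part in partitions]  # "map" round
    pooled = [i for sol in local_solutions for i in sol]
    final = greedy_select(pooled, sets, k)                                   # "reduce" round
    # Keep the best of the local and pooled solutions, as two-round schemes typically do.
    return max(local_solutions + [final], key=lambda s: coverage(s, sets))

# Toy maximum-coverage instance.
universe_sets = [set(random.sample(range(100), 12)) for _ in range(60)]
print(two_round_distributed(universe_sets, k=8))
```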

2020 ◽  
Vol 34 (04) ◽  
pp. 4691-4698
Author(s):  
Shu Li ◽  
Wen-Tao Li ◽  
Wei Wang

In many real-world applications, the data have several disjoint sets of features, each of which is called a view. Researchers have developed many multi-view learning methods over the past decade. In this paper, we bring Graph Convolutional Networks (GCNs) into multi-view learning and propose a novel multi-view semi-supervised learning method, Co-GCN, which adaptively exploits the graph information from the multiple views with combined Laplacians. Experimental results on real-world data sets verify that Co-GCN achieves better performance than state-of-the-art multi-view semi-supervised methods.
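
A minimal sketch of the core idea of combining per-view graphs before GCN-style propagation. In Co-GCN the combination weights over views are adapted during training; here they are fixed inputs passed through a softmax purely for illustration, and the helper names are assumptions.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization with self-loops, as in a standard GCN layer."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def combined_propagation(adjacencies, view_weights, X, W):
    """Propagate features over a convex combination of the per-view normalized graphs.
    Co-GCN learns the combination weights jointly with the network; they are fixed here."""
    weights = np.exp(view_weights) / np.exp(view_weights).sum()  # softmax over views
    S = sum(w * normalized_adjacency(A) for w, A in zip(weights, adjacencies))
    return np.maximum(S @ X @ W, 0.0)                            # one ReLU(GCN) layer

# Toy example: 5 nodes, 2 views, 4 input features, 3 hidden units.
rng = np.random.default_rng(0)
A1 = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1); A1 = A1 + A1.T
A2 = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1); A2 = A2 + A2.T
X = rng.standard_normal((5, 4))
W = rng.standard_normal((4, 3))
print(combined_propagation([A1, A2], np.array([0.0, 0.0]), X, W).shape)
```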


Author(s):  
Yuguang Yan ◽  
Mingkui Tan ◽  
Yanwu Xu ◽  
Jiezhang Cao ◽  
Michael Ng ◽  
...  

The issue of data imbalance occurs in many real-world applications, especially in medical diagnosis, where normal cases usually far outnumber abnormal cases. One of the most important approaches for alleviating this issue is oversampling, which synthesizes minority-class samples to balance the numbers of different classes. However, existing methods barely consider the global geometric information in the distribution of minority-class samples, and may thus incur a distribution mismatch between real and synthetic samples. In this paper, relying on optimal transport (Villani 2008), we propose an oversampling method that exploits the global geometric information of the data to make synthetic samples follow a distribution similar to that of the minority-class samples. Moreover, we introduce a novel regularization based on the synthetic samples and shift the distribution of minority-class samples according to loss information. Experiments on toy and real-world data sets demonstrate the efficacy of the proposed method in terms of multiple metrics.
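
A rough sketch of the underlying idea: generate crude synthetic minority points, then use an optimal-transport plan to pull them toward the minority-class distribution. The Sinkhorn solver and barycentric projection below are a generic illustration under assumed settings (squared Euclidean cost, noisy copies as initial synthetics); the paper's actual formulation and its loss-aware regularizer are not reproduced.

```python
import numpy as np

def sinkhorn_plan(X, Y, reg=0.1, iters=200):
    """Entropic optimal-transport plan between empirical distributions on the rows of X and Y."""
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                                   # scale costs to [0, 1] for stability
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_guided_oversample(minority, n_new, reg=0.1, seed=0):
    """Draw noisy copies of minority samples, then move them toward the minority
    distribution via a barycentric projection of the transport plan (sketch only)."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(minority), n_new)
    raw = minority[idx] + 0.5 * rng.standard_normal((n_new, minority.shape[1]))
    P = sinkhorn_plan(raw, minority, reg=reg)
    return (P @ minority) / P.sum(axis=1, keepdims=True)   # barycentric mapping

minority = np.random.default_rng(1).standard_normal((30, 2)) + np.array([3.0, 3.0])
print(ot_guided_oversample(minority, n_new=10).shape)
```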


Author(s):  
Xuan Wu ◽  
Qing-Guo Chen ◽  
Yao Hu ◽  
Dengbao Wang ◽  
Xiaodong Chang ◽  
...  

Multi-view multi-label learning serves as an important framework for learning from objects with diverse representations and rich semantics. Existing multi-view multi-label learning techniques focus on exploiting a shared subspace for fusing multi-view representations, while helpful view-specific information for discriminative modeling is usually ignored. In this paper, a novel multi-view multi-label learning approach named SIMM is proposed, which leverages both shared subspace exploitation and view-specific information extraction. For shared subspace exploitation, SIMM jointly minimizes a confusion adversarial loss and a multi-label loss to utilize the information shared across all views. For view-specific information extraction, SIMM enforces an orthogonality constraint with respect to the shared subspace to utilize view-specific discriminative information. Extensive experiments on real-world data sets clearly show the favorable performance of SIMM against other state-of-the-art multi-view multi-label learning approaches.
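
A small sketch of how an orthogonality constraint between shared and view-specific representations can be expressed as a penalty term. The function names and the per-view matrix layout are assumptions, and the adversarial confusion loss that SIMM also uses is omitted here.

```python
import numpy as np

def orthogonality_penalty(shared, specific):
    """Frobenius-norm penalty ||S^T V||_F^2 pushing view-specific features to be
    orthogonal to the shared-subspace features (both are n_samples x dim matrices)."""
    return np.sum((shared.T @ specific) ** 2)

def simm_style_objective(multi_label_loss, shared_reps, specific_reps, lam=0.1):
    """Illustrative combination: multi-label loss plus per-view orthogonality penalties.
    The confusion adversarial loss from the paper is not included in this sketch."""
    penalty = sum(orthogonality_penalty(shared_reps[v], specific_reps[v])
                  for v in range(len(specific_reps)))
    return multi_label_loss + lam * penalty

rng = np.random.default_rng(0)
shared = [rng.standard_normal((8, 4)) for _ in range(2)]    # shared-subspace features per view
specific = [rng.standard_normal((8, 4)) for _ in range(2)]  # view-specific features per view
print(simm_style_objective(multi_label_loss=1.23, shared_reps=shared, specific_reps=specific))
```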


Author(s):  
Chao Qian ◽  
Yang Yu ◽  
Ke Tang

Subset selection is a fundamental problem in many areas; it aims to select the best subset of size at most $k$ from a universe. Greedy algorithms are widely used for subset selection and have shown good approximation performance in deterministic settings. However, their behavior is stochastic in many realistic situations (e.g., large-scale and noisy ones). For general stochastic greedy algorithms, bounded approximation guarantees were previously known only for subset selection with monotone submodular objective functions, while real-world applications often involve non-monotone or non-submodular objectives and may be subject to constraints more general than a size constraint. This work proves approximation guarantees in these cases, thus largely extending the applicability of stochastic greedy algorithms.
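
To make the setting concrete, here is a minimal sketch of a greedy selector whose marginal gains are only observed through noisy estimates, which is the kind of stochastic behavior the abstract refers to. The toy modular objective and Gaussian noise model are assumptions for illustration only.

```python
import numpy as np

def noisy_greedy(ground_set, noisy_gain, k):
    """Greedy subset selection when only noisy marginal-gain estimates are available.
    `noisy_gain(S, x)` returns a stochastic estimate of f(S ∪ {x}) − f(S)."""
    S = []
    for _ in range(k):
        remaining = [x for x in ground_set if x not in S]
        if not remaining:
            break
        gains = [noisy_gain(S, x) for x in remaining]
        S.append(remaining[int(np.argmax(gains))])
    return S

# Toy example: modular objective f(S) = sum of item values, observed under Gaussian noise.
rng = np.random.default_rng(0)
values = rng.random(50)
noisy_value_gain = lambda S, x: values[x] + 0.05 * rng.standard_normal()
print(noisy_greedy(list(range(50)), noisy_value_gain, k=5))
```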


2019 ◽  
Vol 64 ◽  
pp. 1-20 ◽  
Author(s):  
Alireza Farhadi ◽  
Mohammad Ghodsi ◽  
Mohammad Taghi Hajiaghayi ◽  
Sébastien Lahaie ◽  
David Pennock ◽  
...  

We study fair allocation of indivisible goods to agents with unequal entitlements. Fair allocation has been the subject of many studies in both divisible and indivisible settings. Our emphasis is on the case where the goods are indivisible and agents have unequal entitlements. This problem generalizes the work of Procaccia and Wang (2014), in which the agents are assumed to be symmetric with respect to their entitlements. Although Procaccia and Wang show that an almost fair (constant-approximation) allocation exists in their setting, our main result stands in sharp contrast to their observation. We show that, in some cases with n agents, no allocation can guarantee better than a 1/n approximation of a fair allocation when the entitlements are not necessarily equal. Furthermore, we devise a simple algorithm that ensures a 1/n approximation guarantee. Our second result concerns a restricted version of the problem in which each agent's valuation of every good is bounded by the total value the agent wishes to receive in a fair allocation. Although this assumption might seem to be without loss of generality, we show that it enables us to find a 1/2-approximation fair allocation via a greedy algorithm. Finally, we run experiments on real-world data and show that, in practice, a fair allocation is likely to exist. We also support our experiments with positive results for two stochastic variants of the problem, namely stochastic agents and stochastic items.
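
To illustrate the unequal-entitlement setting, below is a generic entitlement-weighted round-robin heuristic. This is not the paper's 1/n- or 1/2-approximation algorithm; it only shows how entitlements can steer who picks next, and all names and the tie-breaking rule are assumptions.

```python
def entitlement_round_robin(valuations, entitlements):
    """Illustrative heuristic: the agent whose received share (relative to its own total
    value) lags furthest behind its entitlement picks its most-valued remaining good.
    `valuations[i][g]` is agent i's value for good g; `entitlements` sum to 1."""
    n, m = len(valuations), len(valuations[0])
    remaining = set(range(m))
    received = [0.0] * n                       # normalized value received so far
    bundles = [[] for _ in range(n)]
    totals = [sum(v) for v in valuations]
    while remaining:
        i = min(range(n), key=lambda a: received[a] / entitlements[a])
        g = max(remaining, key=lambda good: valuations[i][good])
        bundles[i].append(g)
        received[i] += valuations[i][g] / totals[i]
        remaining.discard(g)
    return bundles

vals = [[5, 1, 3, 2], [2, 4, 1, 6]]
print(entitlement_round_robin(vals, entitlements=[0.75, 0.25]))
```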


Author(s):  
Zhi Lu ◽  
Yang Hu ◽  
Bing Zeng

Factorization models have been extensively used for recovering the missing entries of a matrix or tensor. However, directly computing all of the entries using a learned factorization model is prohibitive when the matrix or tensor is large. On the other hand, in many applications, such as collaborative filtering, we are only interested in a few entries that are the largest among them. In this work, we propose a sampling-based approach for finding the top entries of a tensor that is decomposed with the CANDECOMP/PARAFAC (CP) model. We develop an algorithm that samples entries with probabilities proportional to their values, and we further extend it to make the sampling proportional to the $k$-th power of the values, amplifying the focus on the top ones. We provide a theoretical analysis of the sampling algorithm and evaluate its performance on several real-world data sets. Experimental results indicate that the proposed approach is orders of magnitude faster than exhaustive computation. When applied to the special case of searching in a matrix, it also requires fewer samples than the other state-of-the-art method.
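
A minimal sketch of value-proportional sampling from a CP model, assuming nonnegative factor matrices: pick a rank-1 component proportional to its total mass, then pick one index per mode proportional to that component's factor column. The marginal probability of an entry is then proportional to its value. The power-of-values extension and the general (signed) case handled in the paper are not covered here.

```python
import numpy as np
from collections import Counter

def sample_cp_entries(factors, num_samples, rng):
    """Sample entries of T[i,j,k] = sum_r A[i,r]*B[j,r]*C[k,r] with probability
    proportional to their values, assuming nonnegative CP factors."""
    col_sums = [F.sum(axis=0) for F in factors]            # per-mode column sums
    comp_mass = np.prod(np.stack(col_sums), axis=0)        # mass of each rank-1 component
    comp_prob = comp_mass / comp_mass.sum()
    samples = []
    for _ in range(num_samples):
        r = rng.choice(len(comp_prob), p=comp_prob)        # pick a rank-1 component
        idx = tuple(rng.choice(F.shape[0], p=F[:, r] / F[:, r].sum()) for F in factors)
        samples.append(idx)
    return samples

rng = np.random.default_rng(0)
A, B, C = (rng.random((20, 3)) for _ in range(3))          # nonnegative CP factors
hits = sample_cp_entries([A, B, C], num_samples=1000, rng=rng)
# The most frequently sampled indices are candidates for the tensor's largest entries.
print(Counter(hits).most_common(5))
```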


Author(s):  
Qian-Wei Wang ◽  
Yu-Feng Li ◽  
Zhi-Hua Zhou

Partial label learning deals with training examples that are each associated with a set of candidate labels, among which only one label is valid. Previous studies typically assume that candidate label sets are provided for all training examples. In many real-world applications such as video character classification, however, it is generally difficult to label a large number of instances, and much of the data is left unlabeled. We call this kind of problem semi-supervised partial label learning. In this paper, we propose the SSPL method to address it. Specifically, an iterative label propagation procedure between partial label examples and unlabeled instances is employed to disambiguate the candidate label sets of partial label examples and to assign valid labels to unlabeled instances. The importance of unlabeled instances increases adaptively as the number of iterations increases, since they carry richer labeling information. Finally, unseen instances are classified based on the minimum reconstruction error over both partial label and unlabeled instances. Experiments on real-world data sets clearly validate the effectiveness of the proposed SSPL method.
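
A simplified sketch of propagation-based disambiguation in this setting: soft labels are propagated over a row-normalized similarity graph, and after each step the rows of partial-label examples are masked back to their candidate sets. SSPL's adaptive weighting of unlabeled instances and its reconstruction-based classification of unseen instances are not reproduced; the names and the toy graph are assumptions.

```python
import numpy as np

def propagate_partial_labels(S, candidate_mask, is_partial, n_iters=20, alpha=0.8):
    """Label propagation over a row-normalized similarity graph S.
    `candidate_mask[i, c] = 1` if class c is a candidate for example i
    (all-ones rows for unlabeled instances)."""
    F = candidate_mask / candidate_mask.sum(axis=1, keepdims=True)  # initial soft labels
    Y0 = F.copy()
    for _ in range(n_iters):
        F = alpha * (S @ F) + (1 - alpha) * Y0
        F[is_partial] *= candidate_mask[is_partial]                 # keep candidate labels only
        F /= F.sum(axis=1, keepdims=True)
    return F.argmax(axis=1)

rng = np.random.default_rng(0)
n, c = 12, 3
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
S = W / W.sum(axis=1, keepdims=True)                                # row-normalized graph
mask = np.ones((n, c)); mask[:6] = 0
for i in range(6):                                                  # first 6 are partial-label examples
    mask[i, rng.choice(c, size=2, replace=False)] = 1
print(propagate_partial_labels(S, mask, is_partial=np.arange(n) < 6))
```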


2015 ◽  
Vol 24 (03) ◽  
pp. 1550003 ◽  
Author(s):  
Armin Daneshpazhouh ◽  
Ashkan Sami

The task of semi-supervised outlier detection is to find the instances that are exceptional compared with the rest of the data, using some labeled examples. In many applications such as fraud detection and intrusion detection, this issue is especially important. Most existing techniques are unsupervised; semi-supervised approaches, on the other hand, use both negative and positive instances to detect outliers. However, in many real-world applications only very few positive labeled examples are available. This paper proposes an innovative approach to address this problem. The proposed method works as follows. First, some reliable negative instances are extracted by a kNN-based algorithm. Afterwards, fuzzy clustering using both negative and positive examples is used to detect outliers. Experimental results on real data sets demonstrate that the proposed approach outperforms previous unsupervised state-of-the-art methods at detecting outliers.
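
A small sketch of the first stage only: a kNN-distance rule that marks the unlabeled points farthest from the labeled positive outliers as "reliable negatives". The thresholding by `ratio`, the distance measure, and all names are assumptions, and the fuzzy clustering stage that follows in the paper is omitted.

```python
import numpy as np

def reliable_negatives(X_unlabeled, X_pos, k=5, ratio=0.2):
    """Return indices of unlabeled points whose mean distance to their k nearest
    labeled-positive outliers is largest (candidate reliable negatives)."""
    d = np.linalg.norm(X_unlabeled[:, None, :] - X_pos[None, :, :], axis=-1)
    k_eff = min(k, X_pos.shape[0])
    knn_dist = np.sort(d, axis=1)[:, :k_eff].mean(axis=1)   # mean distance to nearest positives
    n_keep = max(1, int(ratio * len(X_unlabeled)))
    return np.argsort(-knn_dist)[:n_keep]                   # farthest points first

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((100, 2))
X_pos_outliers = rng.standard_normal((5, 2)) + 6.0          # a few labeled outliers
print(reliable_negatives(X_unlabeled, X_pos_outliers, k=3))
```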


Author(s):  
Zhi Li ◽  
Bo Wu ◽  
Qi Liu ◽  
Likang Wu ◽  
Hongke Zhao ◽  
...  

Complementary recommendation, which aims to provide users with product suggestions that are supplementary to and compatible with the items they have already obtained, has become a hot topic in both academia and industry in recent years. Existing work has mainly focused on modeling the co-purchase relations between two items, but the compositional associations of item collections remain largely unexplored. Intuitively, when a user chooses complementary items for purchased products, she considers visual semantic coherence (such as color collocations and texture compatibilities) in addition to global impressions. Towards this end, in this paper we propose a novel Content Attentive Neural Network (CANN) to model comprehensive compositional coherence on both global contents and semantic contents. Specifically, we first propose a Global Coherence Learning (GCL) module based on multi-head attention to model global compositional coherence. Then, we generate semantic-focal representations from different semantic regions and design a Focal Coherence Learning (FCL) module to learn the focal compositional coherence from these representations. Finally, we optimize CANN with a novel compositional optimization strategy. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of CANN compared with several state-of-the-art methods.
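
Since the GCL module is built on multi-head attention, here is a plain numpy sketch of that building block applied to a set of item-content features. It only illustrates the attention mechanism itself; CANN's global/focal coherence modules and its compositional optimization strategy are not reproduced, and the weight matrices are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """Scaled dot-product multi-head self-attention over item-content features X (n_items x d)."""
    n, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    outs = []
    for h in range(n_heads):
        q, k, v = (M[:, h * dh:(h + 1) * dh] for M in (Q, K, V))
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)       # item-to-item compatibility
        outs.append(attn @ v)
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))                              # 4 items, 8-dim content features
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(multi_head_self_attention(X, Wq, Wk, Wv, n_heads=2).shape)
```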


Author(s):  
Yi Fan ◽  
Nan Li ◽  
Chengqian Li ◽  
Zongjie Ma ◽  
Longin Jan Latecki ◽  
...  

The Maximum Vertex Weight Clique (MVWC) problem is NP-hard and important in real-world applications. In this paper we propose to use restart and random-walk strategies to improve local search for MVWC. If a solution is revisited in a particular situation, the search restarts. In addition, when the local search has no options other than dropping vertices, it performs a random walk. Experimental results show that our solver outperforms state-of-the-art solvers on DIMACS instances and finds a new best-known solution. It is also the only solver that is comparable with state-of-the-art methods on both BHOSLIB and large crafted graphs. Furthermore, we evaluated our solver on clustering aggregation. Experimental results on a number of real data sets demonstrate that our solver outperforms the state of the art at solving the derived MVWC problem and helps improve the final clustering results.
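
A toy sketch of how restart and random walk can be wired into a greedy local search for MVWC: greedily add the heaviest feasible vertex, drop a random vertex when no addition is possible (random walk), and restart from scratch if the same stuck configuration is revisited. This illustrates the two strategies only, not the paper's actual solver; the data structures and parameters are assumptions.

```python
import random

def local_search_mvwc(adj, weights, max_steps=10000, seed=0):
    """Toy local search for Maximum Vertex Weight Clique with restart and random walk.
    `adj[v]` is the set of neighbors of v; `weights[v]` is v's weight."""
    rng = random.Random(seed)
    n = len(weights)
    best, best_w = set(), 0.0
    clique, seen_stuck = set(), set()
    for _ in range(max_steps):
        cand = [v for v in range(n) if v not in clique and all(u in adj[v] for u in clique)]
        if cand:
            clique.add(max(cand, key=lambda x: weights[x]))   # greedy add by weight
        else:
            key = frozenset(clique)
            if key in seen_stuck:                             # revisited stuck state: restart
                clique = set()
                continue
            seen_stuck.add(key)
            if clique:
                clique.remove(rng.choice(sorted(clique)))     # random walk: drop a vertex
        w = sum(weights[v] for v in clique)
        if w > best_w:
            best, best_w = set(clique), w
    return best, best_w

# Small example graph given as adjacency sets.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
weights = [2.0, 3.0, 1.0, 4.0]
print(local_search_mvwc(adj, weights, max_steps=200))
```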

