Positive and Unlabeled Learning via Loss Decomposition and Centroid Estimation

Positive and Unlabeled learning (PU learning) aims to train a binary classifier based on only positive and unlabeled examples, where the unlabeled examples could be either positive or negative. The state-of-the-art algorithms usually cast PU learning as a cost-sensitive learning problem and assign distinct weights to different training examples, either manually or automatically. However, such weight adjustment or estimation can be inaccurate and thus often leads to unsatisfactory performance. Therefore, this paper regards all unlabeled examples as negative, which means that some of the original positive data are mistakenly labeled as negative. By doing so, we convert PU learning into a risk minimization problem in the presence of false negative label noise, and propose a novel PU learning algorithm termed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the hinge loss function into two parts, we show that only the second part is influenced by label noise, whose adverse effect can be reduced by estimating the centroid of negative examples. We intensively validate our approach on a synthetic dataset, UCI benchmark datasets and real-world datasets, and the experimental results firmly demonstrate the effectiveness of our approach when compared with other state-of-the-art PU learning methodologies.
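
As a rough illustration of the loss-decomposition idea, the sketch below uses the generic identity loss(y*f) = 0.5*(loss(f) + loss(-f)) + 0.5*y*(loss(f) - loss(-f)), which holds for any loss when y is -1 or +1. This is an assumption made for illustration and is not necessarily the exact decomposition used by LDCE; the function names are hypothetical. It shows that flipping labels changes only the second, label-dependent part.

```python
import numpy as np

def hinge(z):
    """Hinge loss max(0, 1 - z), applied elementwise."""
    return np.maximum(0.0, 1.0 - z)

def decompose_hinge(scores, labels):
    """Split hinge(y * f) into a label-independent and a label-dependent part.

    Uses the identity, valid for y in {-1, +1}:
        loss(y*f) = 0.5*(loss(f) + loss(-f)) + 0.5*y*(loss(f) - loss(-f))
    (an illustrative assumption, not necessarily the paper's own form).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    label_free = 0.5 * (hinge(scores) + hinge(-scores))
    label_dependent = 0.5 * labels * (hinge(scores) - hinge(-scores))
    return label_free, label_dependent

# Classifier scores f(x) for four examples (made-up values).
scores = np.array([0.3, -1.5, 2.0, -0.2])
clean = np.array([1, -1, 1, 1])
noisy = np.array([1, -1, -1, -1])  # two positives mislabeled as negative

free_clean, dep_clean = decompose_hinge(scores, clean)
free_noisy, dep_noisy = decompose_hinge(scores, noisy)
```

Inside the margin (|f| <= 1) the label-dependent part reduces to -y*f, so for a linear scorer f(x) = w·x its average depends on the data only through the mean of y*x. Under this assumed form, that mean is where a centroid estimate would enter.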


Online Positive and Unlabeled Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/311 ◽

2020 ◽

Author(s):

Chuang Zhang ◽

Chen Gong ◽

Tengfei Liu ◽

Xun Lu ◽

Weiqiang Wang ◽

...

Keyword(s):

Online Learning ◽

Learning Algorithm ◽

Unlabeled Data ◽

Sequential Data ◽

Learning Method ◽

PU Learning ◽

Gradient Based ◽

Learning Scenarios ◽

Positive And Unlabeled Learning ◽

Real World Datasets

Positive and Unlabeled learning (PU learning) aims to build a binary classifier where only positive and unlabeled data are available for classifier training. However, existing PU learning methods all work in a batch learning mode, which cannot deal with online learning scenarios with sequential data. Therefore, this paper proposes a novel positive and unlabeled learning algorithm in an online training mode, which trains a classifier solely on the positive and unlabeled data arriving in sequential order. Specifically, we adopt an unbiased estimate for the loss induced by the positive or unlabeled examples arriving at each time step. Then we show that for any new single datum, the model can be updated independently and incrementally by a gradient-based online learning method. Furthermore, we extend our method to tackle the cases when more than one example is received at each time step. Theoretically, we show that the proposed online PU learning method achieves low regret even though it receives sequential positive and unlabeled data. Empirically, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the effectiveness of the proposed method.
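
The per-example update described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes a linear scorer, the logistic loss, a known class prior, and the per-example terms of the widely used unbiased PU risk estimator (prior * [l(f, +1) - l(f, -1)] for a positive example, l(f, -1) for an unlabeled one). The names `online_pu_step`, `prior`, and `lr` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def online_pu_step(w, x, is_positive, prior, lr=0.1):
    """One gradient step on a per-example PU loss for f(x) = w @ x.

    With logistic loss l(z) = log(1 + exp(-z)):
      positive example:  prior * (l(f) - l(-f)) = -prior * f
      unlabeled example: l(-f) = log(1 + exp(f))
    """
    f = w @ x
    if is_positive:
        grad = -prior * x        # gradient of -prior * f
    else:
        grad = sigmoid(f) * x    # gradient of log(1 + exp(f))
    return w - lr * grad

# A tiny stream of (features, is_positive) pairs with made-up values.
w = np.zeros(2)
stream = [(np.array([1.0, 0.0]), True), (np.array([0.0, 1.0]), False)]
for x, is_pos in stream:
    w = online_pu_step(w, x, is_pos, prior=0.4)
```

Each arriving example touches the model once and is then discarded, which is what makes the update incremental. A mini-batch variant would average the same per-example gradients before taking the step.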


Positive and Unlabeled Learning with Label Disambiguation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/590 ◽

2019 ◽

Author(s):

Chuang Zhang ◽

Dexin Ren ◽

Tongliang Liu ◽

Jian Yang ◽

Chen Gong

Keyword(s):

State Of The Art ◽

Ground Truth ◽

Training Data ◽

Learning Approaches ◽

Generalization Error ◽

Binary Classifier ◽

Learning Problem ◽

PU Learning ◽

Positive And Unlabeled Learning ◽

Real World Datasets

Positive and Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled training data. The state-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so the existing models inadvertently introduce label noise, which may lead to a biased classifier and deteriorated performance. To solve this problem, this paper proposes a novel algorithm dubbed "Positive and Unlabeled learning with Label Disambiguation" (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled as positive and negative, and then employ a margin-based label disambiguation strategy, which enlarges the margin of classifier response between the most likely label and the less likely one, to find the unique ground-truth label of each unlabeled example. Theoretically, we derive the generalization error bound of the proposed method by analyzing its Rademacher complexity. Experimentally, we conduct intensive experiments on both benchmark and real-world datasets, and the results clearly demonstrate the superiority of the proposed PULD over the existing PU learning approaches.


Gromov-Wasserstein optimal transport to align single-cell multi-omics data

10.1101/2020.04.28.066787 ◽

2020 ◽

Cited By ~ 2

Author(s):

Pinar Demetci ◽

Rebecca Santorella ◽

Björn Sandstede ◽

William Stafford Noble ◽

Ritambhara Singh

Keyword(s):

Single Cell ◽

Optimal Transport ◽

Learning Algorithm ◽

State Of The Art ◽

Single Cells ◽

Wasserstein Distance ◽

Cell Alignment ◽

Shared Space ◽

Real World Datasets ◽

Unsupervised Algorithms

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov-Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm for SCOT to use Gromov-Wasserstein distance to guide the parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.


Label Distribution for Learning with Noisy Labels

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/356 ◽

2020 ◽

Author(s):

Yun-Peng Liu ◽

Ning Xu ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

State Of The Art ◽

Confidence Estimation ◽

Novel Method ◽

Real World Datasets ◽

Label Distribution ◽

Noisy Labels

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted, and therefore some labels become noisy labels. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. Then, the boundary between clean labels and noisy labels becomes clear according to confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.


Learning Continuous Time Bayesian Networks in Non-stationary Domains

Journal of Artificial Intelligence Research ◽

10.1613/jair.5126 ◽

2016 ◽

Vol 57 ◽

pp. 1-37 ◽

Cited By ~ 2

Author(s):

Simone Villa ◽

Fabio Stella

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Learning Algorithm ◽

State Of The Art ◽

Synthetic Data ◽

Score Function ◽

Dynamic Bayesian Networks ◽

Continuous Time Bayesian Networks ◽

Real World Datasets ◽

Transition Times

Non-stationary continuous time Bayesian networks are introduced. They allow the parent set of each node to change over continuous time. Three settings are developed for learning non-stationary continuous time Bayesian networks from data: known transition times, known number of epochs and unknown number of epochs. A score function for each setting is derived and the corresponding learning algorithm is developed. A set of numerical experiments on synthetic data is used to compare the effectiveness of non-stationary continuous time Bayesian networks to that of non-stationary dynamic Bayesian networks. Furthermore, the performance achieved by non-stationary continuous time Bayesian networks is compared to that achieved by state-of-the-art algorithms on four real-world datasets, namely drosophila, saccharomyces cerevisiae, songbird and macroeconomics.


Reliable and Efficient Anytime Skeleton Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6569 ◽

2020 ◽

Vol 34 (06) ◽

pp. 10101-10109

Author(s):

Rui Ding ◽

Yanzhi Liu ◽

Jingjing Tian ◽

Zhouyu Fu ◽

Shi Han ◽

...

Keyword(s):

Time Complexity ◽

Input Data ◽

Undirected Graph ◽

Learning Algorithm ◽

State Of The Art ◽

High Reliability ◽

Research Community ◽

Causal Learning ◽

Real World Datasets ◽

Anytime Learning

Skeleton Learning (SL) is the task of learning an undirected graph from the input data that captures their dependency relations. SL plays a pivotal role in causal learning and has attracted growing attention in the research community lately. Due to the high time complexity, anytime SL has emerged, which learns a skeleton incrementally and improves it over time. In this paper, we first propose and advocate the reliability requirement for anytime SL to be practically useful. Reliability requires the intermediately learned skeleton to have precision and persistency. We also present REAL, a novel Reliable and Efficient Anytime Learning algorithm for skeletons. Specifically, we point out that the commonly existing Functional Dependency (FD) among variables could make the learned skeleton violate the faithfulness assumption, and we therefore propose a theory to resolve such incompatibility. Based on this, REAL conducts SL on a reduced set of variables with guaranteed correctness and thus drastically improves efficiency. Furthermore, it employs a novel edge-insertion and best-first strategy in anytime fashion for skeleton growing to achieve high reliability and efficiency. We prove that the skeleton learned by REAL converges to the correct skeleton under standard assumptions. Thorough experiments conducted on both benchmark and real-world datasets demonstrate that REAL significantly outperforms the other state-of-the-art algorithms.


Exploring Clustering-Based Reinforcement Learning for Personalized Book Recommendation in Digital Library

Information ◽

10.3390/info12050198 ◽

2021 ◽

Vol 12 (5) ◽

pp. 198

Author(s):

Xinhua Wang ◽

Yuchen Wang ◽

Lei Guo ◽

Liancheng Xu ◽

Baozhong Gao ◽

...

Keyword(s):

Reinforcement Learning ◽

Digital Library ◽

Learning Algorithm ◽

State Of The Art ◽

Decision Making Process ◽

Learning Method ◽

Large Collection ◽

Recommendation Algorithms ◽

Small Set ◽

Real World Datasets

The digital library, as one of the most important ways of helping students acquire professional knowledge and improve their professional level, has gained great attention in recent years. However, its large collection (especially the book resources) hinders students from finding the resources that they are interested in. To overcome this challenge, many researchers have already turned to recommendation algorithms. Compared with traditional recommendation tasks, book recommendation in the digital library faces two challenges. The first is that users may borrow books that they are not interested in (i.e., noisy borrowing behaviours), such as borrowing books for classmates. The second is that the number of books in a digital library is usually very large, which means one student's borrowing history covers only a small set of books (i.e., the data sparsity issue). As the noisy interactions in students’ borrowing sequences may harm the recommendation performance of a book recommender, we focus on refining recommendations by filtering out data noise. Moreover, due to the lack of direct supervision information, we treat noise filtering in sequences as a decision-making process and innovatively introduce a reinforcement learning method as our recommendation framework. Furthermore, to overcome the sparsity issue of students’ borrowing behaviours, a clustering-based reinforcement learning algorithm is further developed. Experimental results on two real-world datasets demonstrate the superiority of our proposed method compared with several state-of-the-art recommendation methods.


Tweedie-Hawkes Processes: Interpreting the Phenomena of Outbreaks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5902 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4699-4706

Author(s):

Tianbo Li ◽

Yiping Ke

Keyword(s):

Information Diffusion ◽

Learning Algorithm ◽

State Of The Art ◽

Hawkes Processes ◽

Event Sequences ◽

Diffusion Analysis ◽

Tweedie Distribution ◽

Future Events ◽

Real World Datasets ◽

Variational EM Algorithm

Self-exciting event sequences, in which the occurrence of an event increases the probability of triggering subsequent ones, are common in many disciplines. In this paper, we propose a Bayesian model called Tweedie-Hawkes Processes (THP), which is able to model the outbreaks of events and find the dominant factors behind them. THP leverages the Tweedie distribution to capture various excitation effects. A variational EM algorithm is developed for model inference. Some theoretical properties of THP, including the sub-criticality, the convergence of the learning algorithm and the kernel selection method, are discussed. Applications to epidemiology and information diffusion analysis demonstrate the versatility of our model in various disciplines. Evaluations on real-world datasets show that THP outperforms the rival state-of-the-art baselines in the task of forecasting future events.
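
To make the notion of self-excitation concrete, the sketch below evaluates the conditional intensity of a plain univariate Hawkes process with an exponential kernel. This is the textbook model, not THP itself, and the parameter names (`mu`, `alpha`, `beta`) and function names are assumptions for illustration.

```python
import numpy as np

def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity at time t given earlier event times.

    lambda(t) = mu + sum over t_i < t of alpha * beta * exp(-beta * (t - t_i))
    mu is the baseline rate; each past event adds a decaying bump.
    """
    history = np.asarray(history, dtype=float)
    past = history[history < t]
    return mu + np.sum(alpha * beta * np.exp(-beta * (t - past)))

def is_subcritical(alpha):
    """The kernel alpha*beta*exp(-beta*s) integrates to alpha over s >= 0,
    so alpha is the expected number of events triggered by one event."""
    return alpha < 1.0

events = [1.0, 1.2, 1.3]  # made-up event times
before = hawkes_intensity(0.5, events, mu=0.2, alpha=0.5, beta=2.0)
after = hawkes_intensity(1.4, events, mu=0.2, alpha=0.5, beta=2.0)
```

Before any event the intensity equals the baseline `mu`, while shortly after a burst of events it is elevated, which is the outbreak behaviour the abstract refers to. In this plain model, sub-criticality means each event triggers fewer than one follow-up event on average.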


Improving Implicit Recommender Systems with View Data

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/464 ◽

2018 ◽

Cited By ~ 21

Author(s):

Jingtao Ding ◽

Guanghui Yu ◽

Xiangnan He ◽

Yuhan Quan ◽

Yong Li ◽

...

Keyword(s):

Recommender Systems ◽

Matrix Factorization ◽

Time Complexity ◽

Learning Algorithm ◽

State Of The Art ◽

Implicit Feedback ◽

Model Parameters ◽

New Learning ◽

Real World Datasets ◽

Feedback Data

Most existing recommender systems leverage the primary feedback data only, such as the purchase records in E-commerce. In this work, we additionally integrate view data into implicit feedback based recommender systems (dubbed Implicit Recommender Systems). We propose to model the pairwise ranking relations among purchased, viewed, and non-viewed interactions, which is more effective and flexible than typical pointwise matrix factorization (MF) methods. However, such a pairwise formulation poses efficiency challenges in learning the model. To address this problem, we design a new learning algorithm based on the element-wise Alternating Least Squares (eALS) learner. Notably, our algorithm can efficiently learn model parameters from the whole user-item matrix (including all missing data), with a rather low time complexity that is dependent on the observed data only. Extensive experiments on two real-world datasets demonstrate that our method outperforms several state-of-the-art MF methods by 10% ∼ 28.4%. Our implementation is available at: https://github.com/dingjingtao/View_enhanced_ALS.


Multi-Positive and Unlabeled Learning

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/444 ◽

2017 ◽

Cited By ~ 4

Author(s):

Yixing Xu ◽

Chang Xu ◽

Chao Xu ◽

Dacheng Tao

Keyword(s):

Unlabeled Data ◽

Learning Problem ◽

Step Method ◽

Practical Applications ◽

PU Learning ◽

Positive And Unlabeled Learning ◽

One Step ◽

Real World Datasets ◽

Multi Class Classification ◽

The Given

The positive and unlabeled (PU) learning problem focuses on learning a classifier from positive and unlabeled data. Some methods have been developed to solve the PU learning problem. However, they are often limited in practical applications, since they involve only binary classes and cannot easily be adapted to multi-class data. Here we propose a one-step method that directly enables a multi-class model to be trained using the given input multi-class data and that predicts the label based on the model decision. Specifically, we construct different convex loss functions for labeled and unlabeled data to learn a discriminant function F. The theoretical analysis of the generalization error bound shows that it is no worse than k√k times that of the fully supervised multi-class classification methods when the size of the data in k classes is of the same order. Finally, our experimental results demonstrate the significance and effectiveness of the proposed algorithm on synthetic and real-world datasets.
