Wasserstein Soft Label Propagation on Hypergraphs: Algorithm and Generalization Error Bounds

Author(s):
Tingran Gao
Shahab Asoodeh
Yi Huang
James Evans

Inspired by recent interest in developing machine learning and data mining algorithms on hypergraphs, we investigate in this paper the semi-supervised learning algorithm of propagating "soft labels" (e.g., probability distributions, class membership scores) over hypergraphs by means of optimal transportation. Borrowing insights from Wasserstein propagation on graphs [Solomon et al. 2014], we reformulate the label propagation procedure as a message-passing algorithm, which lends itself naturally to a generalization applicable to hypergraphs through Wasserstein barycenters. Furthermore, in a PAC learning framework, we provide generalization error bounds for propagating one-dimensional distributions on graphs and hypergraphs using the 2-Wasserstein distance, by establishing the algorithmic stability of the proposed semi-supervised learning algorithm. These theoretical results also shed new light on Wasserstein propagation on graphs.
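
The barycentric message-passing update can be prototyped with the POT library: each unlabeled vertex repeatedly replaces its distribution with the entropic Wasserstein barycenter of its hyperedge neighbors' distributions. This is a minimal sketch under assumed choices (a toy hypergraph, a fixed 1-D bin grid, entropic regularization, a fixed number of sweeps), not the authors' exact algorithm.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

n_bins = 32
grid = np.linspace(0.0, 1.0, n_bins).reshape(-1, 1)
M = ot.dist(grid, grid)        # squared-Euclidean ground cost on the bin grid
M /= M.max()

# Toy hypergraph on vertices 0..3; hyperedges are vertex sets (assumption).
hyperedges = [{0, 1, 2}, {2, 3}]
labeled = {0: ot.datasets.make_1D_gauss(n_bins, m=8, s=3),
           3: ot.datasets.make_1D_gauss(n_bins, m=24, s=3)}
dist = {v: labeled.get(v, np.full(n_bins, 1.0 / n_bins)) for v in range(4)}

for _ in range(20):  # fixed-point sweeps over the unlabeled vertices
    for v in set(range(4)) - set(labeled):
        # Neighbors of v: all other vertices sharing a hyperedge with v.
        nbrs = sorted({u for e in hyperedges if v in e for u in e if u != v})
        A = np.column_stack([dist[u] for u in nbrs])
        # Message-passing update: entropic Wasserstein barycenter of neighbors.
        dist[v] = ot.bregman.barycenter(A, M, reg=1e-2)
```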

Author(s):
Pinar Demetci
Rebecca Santorella
Björn Sandstede
William Stafford Noble
Ritambhara Singh

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov-Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm that uses the Gromov-Wasserstein distance to guide SCOT's parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.
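
The core primitive behind SCOT, aligning two unpaired point clouds by matching their intra-domain geometries, can be sketched with the POT library. The data, distance choices, regularization strength, and barycentric projection below are illustrative assumptions; SCOT's actual pipeline (graph-based intra-domain distances, normalization, and the self-tuning procedure) is in the repository linked above.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # domain 1, e.g. gene-expression features
Y = rng.normal(size=(100, 5))    # domain 2, e.g. chromatin-accessibility features

# Intra-domain distance matrices stand in for each domain's geometry.
C1 = ot.dist(X, X); C1 /= C1.max()
C2 = ot.dist(Y, Y); C2 /= C2.max()
p, q = ot.unif(len(X)), ot.unif(len(Y))

# Coupling matrix: soft correspondences between cells across the two domains.
coupling = ot.gromov.entropic_gromov_wasserstein(
    C1, C2, p, q, loss_fun="square_loss", epsilon=5e-2)

# Barycentric projection: map domain-2 cells into domain 1's coordinates.
Y_on_X = (coupling / coupling.sum(axis=0)).T @ X
```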


Author(s):
Bao Bing-Kun
Yan Shuicheng

Graph-based learning provides a useful approach for modeling data in image annotation problems. In this chapter, the authors introduce how to construct a region-based graph to annotate large-scale multi-label images. It is well recognized that analysis at the semantic region level may greatly improve image annotation performance compared to analysis at the whole-image level. However, the region-level approach increases the data scale by several orders of magnitude and poses new challenges to most existing algorithms. To this end, each image is first encoded as a Bag-of-Regions based on multiple image segmentations. Then, all image regions are assembled into a large k-nearest-neighbor graph with an efficient Locality-Sensitive Hashing (LSH) method. Finally, a sparse and region-aware image-based graph is fed into the multi-label extension of the entropic graph regularized semi-supervised learning algorithm (Subramanya & Bilmes, 2009). In combination, these steps naturally yield the capability to handle large-scale datasets. Extensive experiments on the NUS-WIDE (260k images) and COREL-5k datasets validate the effectiveness and efficiency of the framework for region-aware and scalable multi-label propagation.
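
The graph-construction step can be sketched as follows: hash region descriptors into buckets with random-hyperplane LSH, then run an exact k-nearest-neighbor search only within each bucket. The descriptor dimensionality, hash width, and k below are illustrative assumptions, and the sketch omits the multiple-segmentation encoding and the entropic semi-supervised learner.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
regions = rng.normal(size=(10_000, 64))  # one descriptor per image region
n_bits, k = 12, 5

# Random-hyperplane LSH: each descriptor hashes to a 12-bit binary signature.
planes = rng.normal(size=(64, n_bits))
codes = regions @ planes > 0
buckets = defaultdict(list)
for i, code in enumerate(codes):
    buckets[code.tobytes()].append(i)

# Approximate kNN graph: exact search restricted to each LSH bucket.
edges = set()
for idx in buckets.values():
    pts = regions[idx]
    d = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    for row, i in enumerate(idx):
        for col in np.argsort(d[row])[1:k + 1]:   # skip self at position 0
            edges.add((min(i, idx[col]), max(i, idx[col])))
```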


2012
Vol 24 (3)
pp. 700-723
Author(s):
Aykut Erdem
Marcello Pelillo

Graph transduction is a popular class of semisupervised learning techniques that aims to estimate a classification function defined over a graph of labeled and unlabeled data points. The general idea is to propagate the provided label information to unlabeled nodes in a consistent way. In contrast to the traditional view, in which the process of label propagation is defined as a graph Laplacian regularization, this article proposes a radically different perspective, based on game-theoretic notions. Within the proposed framework, the transduction problem is formulated in terms of a noncooperative multiplayer game whereby equilibria correspond to consistent labelings of the data. An attractive feature of this formulation is that it is inherently a multiclass approach and imposes no constraint whatsoever on the structure of the pairwise similarity matrix, being able to naturally deal with asymmetric and negative similarities alike. Experiments on a number of real-world problems demonstrate that the proposed approach performs well compared with state-of-the-art algorithms, and it can deal effectively with various types of similarity relations.
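
A minimal way to see the game dynamics at work: give each unlabeled point a mixed strategy over classes, define its payoff for a class as the similarity-weighted agreement with its neighbors, and iterate the discrete replicator dynamics until the strategies settle. The payoff definition and the nonnegative similarity matrix below are illustrative assumptions rather than the article's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_classes = 30, 3
W = rng.random((n, n))                        # nonnegative pairwise similarities
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

X = np.full((n, n_classes), 1.0 / n_classes)  # mixed strategies, one per player
labeled = {0: 0, 1: 1, 2: 2}                  # player -> observed class
for i, c in labeled.items():
    X[i] = np.eye(n_classes)[c]               # labeled players play pure strategies

for _ in range(100):
    payoff = W @ X                            # expected payoff per (player, class)
    X = X * payoff                            # discrete replicator update
    X /= X.sum(axis=1, keepdims=True)
    for i, c in labeled.items():
        X[i] = np.eye(n_classes)[c]           # labeled strategies stay fixed

pred = X.argmax(axis=1)                       # consistent labeling at equilibrium
```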


2011
Vol 23 (12)
pp. 3287-3302
Author(s):
Henning Sprekeler

The past decade has seen a rise of interest in Laplacian eigenmaps (LEMs) for nonlinear dimensionality reduction. LEMs have been used in spectral clustering, in semisupervised learning, and for providing efficient state representations for reinforcement learning. Here, we show that LEMs are closely related to slow feature analysis (SFA), a biologically inspired, unsupervised learning algorithm originally designed for learning invariant visual representations. We show that SFA can be interpreted as a function approximation of LEMs, where the topological neighborhoods required for LEMs are implicitly defined by the temporal structure of the data. Based on this relation, we propose a generalization of SFA to arbitrary neighborhood relations and demonstrate its applicability for spectral clustering. Finally, we review previous work with the goal of providing a unifying view on SFA and LEMs.
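
The SFA-LEM connection can be made concrete in the linear case: slow feature analysis minimizes the variance of the temporal derivative under a unit-variance constraint, a generalized eigenproblem whose implicit "neighborhood graph" links temporally adjacent samples. The toy mixing below is an assumption for illustration.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
t = np.linspace(0, 10 * np.pi, 2000)
# Observations: a slow sinusoid mixed with a fast one, plus a little noise.
S = np.column_stack([np.sin(0.1 * t), np.sin(5.0 * t)])
X = S @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(2000, 5))
X -= X.mean(axis=0)

dX = np.diff(X, axis=0)  # temporal derivative = edges between adjacent samples
# Linear SFA: minimize <dy^2> subject to unit variance, i.e. the generalized
# eigenproblem (dX'dX) v = lambda (X'X) v; smallest eigenvalue = slowest feature.
evals, evecs = eigh(dX.T @ dX / len(dX), X.T @ X / len(X))
slow = X @ evecs[:, 0]   # recovers the slow sinusoid (up to sign and scale)
```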


1997
Vol 9 (6)
pp. 1211-1243
Author(s):
David H. Wolpert

This article presents several additive corrections to the conventional quadratic loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias-plus-variance formula). Another additive correction casts conventional fixed-training-set Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss-function-specific aspects of supervised learning. In particular, there is a discussion of the fact that if the loss function is a metric (e.g., zero-one loss), then there is a bound on the change in generalization error accompanying changing the algorithm's guess from h1 to h2, a bound that depends only on h1 and h2 and not on the target. This article ends by presenting versions of the bias-plus-variance formula appropriate for logarithmic and quadratic scoring, and then all the additive corrections appropriate to those formulas. All the correction terms presented are a covariance between the learning algorithm and the posterior distribution over targets. Accordingly, in the (very common) contexts in which those terms apply, there is not a “bias-variance trade-off” or a “bias-variance dilemma,” as one often hears. Rather, there is a bias-variance-covariance trade-off.
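
For quadratic loss, the covariance correction is already visible in the elementary identity that holds when both the target t and the algorithm's guess h are treated as random; the following is that generic identity, not Wolpert's specific correction terms.

```latex
% Quadratic loss with a random target t and a random guess h:
% elementary algebra, not the article's exact correction terms.
\mathbb{E}\bigl[(t - h)^2\bigr]
  = \underbrace{\bigl(\mathbb{E}[t] - \mathbb{E}[h]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}(t)}_{\text{noise}}
  + \underbrace{\operatorname{Var}(h)}_{\text{variance}}
  - \underbrace{2\operatorname{Cov}(t, h)}_{\text{covariance correction}}
```

With a fixed target, Var(t) and Cov(t, h) vanish and the usual bias-plus-variance formula is recovered.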


Author(s):
Yunsheng Shi
Zhengjie Huang
Shikun Feng
Hui Zhong
Wenjing Wang
...

Graph neural networks (GNNs) and the label propagation algorithm (LPA) are both message-passing algorithms, which have achieved superior performance in semi-supervised classification. A GNN performs feature propagation through a neural network to make predictions, while LPA propagates labels across the graph adjacency matrix to obtain results. However, there is still no effective way to combine these two kinds of algorithms directly. To address this issue, we propose a novel Unified Message Passing model (UniMP) that can incorporate feature and label propagation at both training and inference time. First, UniMP adopts a Graph Transformer network, taking feature embeddings and label embeddings as input information for propagation. Second, to train the network without overfitting to its own input label information through self-loops, UniMP introduces a masked label prediction strategy, in which some percentage of the input label information is masked at random and then predicted. UniMP conceptually unifies feature propagation and label propagation and is empirically powerful. It obtains new state-of-the-art semi-supervised classification results on the Open Graph Benchmark (OGB).
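
The masked label prediction strategy can be sketched in a few lines: label embeddings are added to node features as propagation input, but each training step hides a random fraction of the known labels and computes the loss only on those hidden nodes. The single normalized-adjacency hop below stands in for the paper's Graph Transformer, and all sizes and rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_classes = 200, 16, 4
X = rng.normal(size=(n, d))                    # node features
y = rng.integers(0, n_classes, size=n)         # ground-truth labels
train = rng.random(n) < 0.5                    # nodes whose labels are observed
label_emb = rng.normal(size=(n_classes, d))    # learnable label embeddings

A = (rng.random((n, n)) < 0.05).astype(float)  # toy graph
A = np.maximum(A, A.T)
A = np.maximum(A, np.eye(n))                   # add self-loops
A /= A.sum(axis=1, keepdims=True)              # row-normalize for propagation

def training_step(mask_rate=0.3):
    # Hide a random fraction of the observed labels; they become the targets.
    masked = train & (rng.random(n) < mask_rate)
    visible = train & ~masked
    H = X.copy()
    H[visible] += label_emb[y[visible]]        # inject only the unmasked labels
    H = A @ H                                  # one message-passing hop
    logits = H @ label_emb.T                   # class scores for every node
    return logits[masked], y[masked]           # loss is computed on masked nodes
```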

