Online regularized pairwise learning with non-i.i.d. observations

Author(s):  
Yimo Qin ◽  
Bin Zou ◽  
Jingjing Zeng ◽  
Zhifei Sheng ◽  
Lei Yin

In this paper, we consider the online regularized pairwise learning (ORPL) algorithm with the least squares loss function for non-independently and identically distributed (non-i.i.d.) observations. We first establish new Bennett's inequalities for α-mixing sequences, geometrically β-mixing sequences, V-geometrically ergodic Markov chains and uniformly ergodic Markov chains. We then establish convergence rates for the last iterate of the ORPL algorithm with polynomially decaying step sizes and varying regularization parameters for non-i.i.d. observations. These results extend the previously known results on ORPL from i.i.d. observations to the non-i.i.d. case, and the rate established for α-mixing observations nearly matches the optimal rate of ORPL for i.i.d. observations in the L2-norm.
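
To make the update concrete, the following is a minimal sketch of an ORPL-style iteration, not the paper's exact algorithm: it pairs each observation only with its immediate predecessor rather than with all earlier examples, and the Gaussian kernel, the step-size exponent, and the regularization schedule are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(u, v, sigma=1.0):
    # Gaussian kernel on pair-inputs stacked into a single vector.
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def orpl(X, y, theta=0.75, mu=0.25, eta0=0.5, lam0=0.1, sigma=1.0):
    """One pass of an ORPL-style iteration with the least squares pairwise
    loss. The iterate f_t is kept as a kernel expansion over the pairs seen
    so far; eta_t = eta0 * t**-theta and lam_t = lam0 * t**-mu are
    illustrative polynomially decaying schedules."""
    pairs, coefs = [], []
    for t in range(1, len(X)):
        u = np.concatenate([X[t], X[t - 1]])   # current pair as one input
        r = y[t] - y[t - 1]                    # pairwise least squares target
        f_u = sum(c * gauss_kernel(p, u, sigma) for p, c in zip(pairs, coefs))
        eta_t, lam_t = eta0 * t ** (-theta), lam0 * t ** (-mu)
        # f_{t+1} = (1 - eta_t * lam_t) * f_t - eta_t * (f_t(u) - r) * K(u, .)
        coefs = [(1.0 - eta_t * lam_t) * c for c in coefs]
        pairs.append(u)
        coefs.append(-eta_t * (f_u - r))
    return pairs, coefs

# Illustrative run on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + 0.1 * rng.normal(size=200)
pairs, coefs = orpl(X, y)
```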

2019 ◽  
Vol 18 (01) ◽  
pp. 49-78 ◽  
Author(s):  
Cheng Wang ◽  
Ting Hu

In this paper, we study an online algorithm for pairwise learning problems generated from the Tikhonov regularization scheme associated with the least squares loss function and a reproducing kernel Hilbert space (RKHS). This work establishes convergence for the last iterate of the online pairwise algorithm with polynomially decaying step sizes and varying regularization parameters. We show that the obtained error rate in the L2-norm can be nearly optimal in the minimax sense under some mild conditions. Our analysis rests on a sharp estimate for the norms of the learning sequence, the characterization of the RKHS through its associated integral operators, and probability inequalities for random variables with values in a Hilbert space.
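
Written out, a plausible form of the regularized online update analyzed in this line of work is the following; the averaging over all earlier examples, the zero initialization, and the power-law schedules are assumptions made for concreteness, not a quotation of the paper.

```latex
% Regularized online pairwise update in an RKHS H_K (sketch):
% step sizes \eta_t = \eta_1 t^{-\theta}, regularization \lambda_t = \lambda_1 t^{-\mu}.
\[
  f_{t+1} = f_t - \eta_t \Biggl[ \frac{1}{t-1} \sum_{j=1}^{t-1}
      \bigl( f_t(x_t, x_j) - (y_t - y_j) \bigr) K_{(x_t, x_j)}
      + \lambda_t f_t \Biggr],
  \qquad f_1 = 0 .
\]
```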


2016 ◽  
Vol 28 (4) ◽  
pp. 743-777 ◽  
Author(s):  
Yiming Ying ◽  
Ding-Xuan Zhou

Pairwise learning usually refers to a learning task that involves a loss function depending on pairs of examples, among which the most notable ones are bipartite ranking, metric learning, and AUC maximization. In this letter we study an online algorithm for pairwise learning with a least squares loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS) that we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works (Kar, Sriperumbudur, Jain, & Karnick, 2013; Wang, Khardon, Pechyony, & Jones, 2012), which require that the iterates be restricted to a bounded domain or that the loss function be strongly convex, OPERA is associated with a non-strongly convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem that guarantees the almost sure convergence of the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property for a family of widely used kernels in the setting of pairwise learning and illustrate the convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs through their associated integral operators and on probability inequalities for random variables with values in a Hilbert space.
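
For contrast with the regularized scheme above, the abstract's description of OPERA (unconstrained RKHS, non-strongly convex objective) corresponds to dropping the regularization term; this rendering is a sketch consistent with the abstract rather than the paper's exact display.

```latex
% Unregularized OPERA-style update (sketch): no projection onto a bounded
% domain and no \lambda_t f_t term, with step sizes \gamma_t = \gamma_1 t^{-\theta}.
\[
  f_{t+1} = f_t - \gamma_t \, \frac{1}{t-1} \sum_{j=1}^{t-1}
      \bigl( f_t(x_t, x_j) - (y_t - y_j) \bigr) K_{(x_t, x_j)},
  \qquad f_1 = 0 .
\]
```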


1973 ◽  
Vol 10 (4) ◽  
pp. 886-890 ◽  
Author(s):  
W. J. Hendricks

In a single-shelf library of N books we suppose that books are selected one at a time and returned to the kth position on the shelf before another selection is made. Books are moved to the right or left as necessary to vacate position k. The probability of selecting each book is assumed to be known, and the N! arrangements of the books are considered as states of an ergodic Markov chain for which we find the stationary distribution.
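
The chain is easy to simulate; the sketch below estimates the stationary distribution over the N! arrangements by long-run frequencies, with the selection probabilities and the shelf position k chosen purely for illustration.

```python
import random
from collections import Counter
from itertools import permutations

def move_to_k(shelf, book, k):
    """Return the selected book to position k (0-indexed), shifting others."""
    shelf = [b for b in shelf if b != book]
    shelf.insert(k, book)
    return shelf

def simulate(probs, k, steps=200_000, seed=0):
    """Estimate the stationary distribution of the move-to-position-k chain
    over all N! shelf arrangements by long-run state frequencies."""
    rng = random.Random(seed)
    books = list(range(len(probs)))
    shelf = books[:]
    counts = Counter()
    for _ in range(steps):
        book = rng.choices(books, weights=probs)[0]   # select a book
        shelf = move_to_k(shelf, book, k)             # re-shelve it at slot k
        counts[tuple(shelf)] += 1
    return {arr: counts[arr] / steps for arr in permutations(books)}

# Example: 3 books with unequal selection probabilities, returned to slot 1.
dist = simulate([0.5, 0.3, 0.2], k=1)
for arr, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(arr, round(p, 4))
```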


Author(s):  
Manabu Kimura ◽  
Masashi Sugiyama

Recently, statistical dependence measures such as mutual information and kernelized covariance have been successfully applied to clustering. In this paper, we follow this line of research and propose a novel dependence-maximization clustering method based on least-squares mutual information, which is an estimator of a squared-loss variant of mutual information. A notable advantage of the proposed method over existing approaches is that hyperparameters such as kernel parameters and regularization parameters can be objectively optimized based on cross-validation. Thus, subjective manual tuning of hyperparameters is not necessary in the proposed method, which is a highly useful property in unsupervised clustering scenarios. Through experiments, we illustrate the usefulness of the proposed approach.
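
The cross-validation point can be made concrete with a small sketch. This is not the paper's algorithm: it only shows least-squares density-ratio estimation between a joint sample and a label-shuffled sample (an approximation of the product of marginals), with the Gaussian width and regularization parameter chosen by a held-out least-squares criterion; all names and constants here are illustrative.

```python
import numpy as np

def gauss_feats(Z, centers, sigma):
    """Gaussian basis functions centered at selected joint samples."""
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_ratio(joint, prod, centers, sigma, lam):
    """Least-squares fit of the density ratio p(x, y) / (p(x) p(y))."""
    Pn = gauss_feats(joint, centers, sigma)   # numerator sample features
    Pd = gauss_feats(prod, centers, sigma)    # denominator sample features
    H = Pd.T @ Pd / len(Pd)
    h = Pn.mean(axis=0)
    return np.linalg.solve(H + lam * np.eye(len(h)), h)

def cv_score(joint, prod, centers, sigma, lam, n_folds=5):
    """Held-out value of the least-squares criterion J(theta); smaller means
    a better density-ratio fit, so hyperparameters minimizing it can be
    chosen objectively, as the abstract emphasizes."""
    idx = np.arange(len(joint)) % n_folds
    scores = []
    for f in range(n_folds):
        th = fit_ratio(joint[idx != f], prod[idx != f], centers, sigma, lam)
        Pn = gauss_feats(joint[idx == f], centers, sigma)
        Pd = gauss_feats(prod[idx == f], centers, sigma)
        scores.append(0.5 * th @ (Pd.T @ Pd / len(Pd)) @ th - Pn.mean(0) @ th)
    return np.mean(scores)

# Illustrative use: X features, `labels` a candidate clustering.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); labels = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[labels]
joint = np.hstack([X, Y])                        # samples from p(x, y)
prod = np.hstack([X, Y[rng.permutation(200)]])   # approx. p(x) p(y)
centers = joint[:30]
best = min((cv_score(joint, prod, centers, s, l), s, l)
           for s in (0.5, 1.0, 2.0) for l in (1e-3, 1e-2, 1e-1))
print("chosen sigma, lambda:", best[1], best[2])
```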


Author(s):  
OMER ANGEL ◽  
YINON SPINKA

Consider an ergodic Markov chain on a countable state space for which the return times have exponential tails. We show that the stationary version of any such chain is a finitary factor of an independent and identically distributed (i.i.d.) process. A key step is to show that any stationary renewal process whose jump distribution has exponential tails and is not supported on a proper subgroup of ℤ is a finitary factor of an i.i.d. process.


1988 ◽  
Vol 25 (02) ◽  
pp. 391-403 ◽  
Author(s):  
Karl Sigman

A tandem queue with a FIFO multiserver system at each stage, i.i.d. service times and a renewal process of external arrivals is shown to be regenerative by modeling it as a Harris-ergodic Markov chain. In addition, some explicit regeneration points are found. This generalizes the results of Nummelin (1981) in which a single server system is at each stage and the result of Charlot et al. (1978) in which the FIFO GI/GI/c queue is modeled as a Harris chain. In preparing for our result, we study the random assignment queue and use it to give a new proof of Harris ergodicity of the FIFO queue.
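
A simulation sketch of the model helps fix ideas; the exponential interarrival and service distributions, rates, and server counts below are illustrative stand-ins for the paper's general renewal arrivals and i.i.d. services. Epochs at which the whole system empties are natural candidates for the regeneration points the paper constructs.

```python
import heapq
import random

def fifo_multiserver(arrivals, services, c):
    """Departure times from a FIFO queue with c servers: customers begin
    service in arrival order, each taking the earliest-free server."""
    free = [0.0] * c                    # heap of server-free times
    heapq.heapify(free)
    departures = []
    for a, s in zip(arrivals, services):
        start = max(a, heapq.heappop(free))
        heapq.heappush(free, start + s)
        departures.append(start + s)
    return departures

def tandem(n, c1=2, c2=3, seed=1):
    """Two-stage tandem: renewal (here exponential) external arrivals and
    i.i.d. service times at each stage. Stage-2 arrivals are the sorted
    stage-1 departures, since overtaking can occur with multiple servers."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n):                  # renewal external arrival process
        t += rng.expovariate(1.0)
        arrivals.append(t)
    d1 = fifo_multiserver(arrivals, [rng.expovariate(0.7) for _ in range(n)], c1)
    d2 = fifo_multiserver(sorted(d1), [rng.expovariate(0.5) for _ in range(n)], c2)
    return d2

print("last departure:", tandem(10_000)[-1])
```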


1995 ◽  
Vol 7 (2) ◽  
pp. 270-279 ◽  
Author(s):  
Dimitri P. Bertsekas

Sutton's TD(λ) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on λ. For λ = 1 the representation is optimal with respect to a least-squares error criterion, but as λ decreases toward 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to understand better the circumstances under which TD(0) and Q-learning obtain satisfactory neural network-based compact representations of the cost function. A variation of TD(0) is also given, which performs better on the example.
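
The phenomenon is easy to reproduce in miniature. The sketch below is not the article's example: it runs TD(λ) with a one-parameter linear approximation on a toy three-state absorbing chain with deterministic transitions, and shows that the learned weight, and hence the representation, varies with λ, with λ = 1 recovering the least-squares fit of the true costs-to-go.

```python
import numpy as np

def td_lambda(lam, episodes=20_000, alpha=0.01):
    """TD(lambda) with a linear approximation V(s) = w * phi[s] on a toy
    3-state absorbing chain (deterministic walk 0 -> 1 -> 2 -> absorb).
    Illustrates that the learned representation depends on lambda."""
    phi = np.array([1.0, 2.0, 3.0])    # state features (illustrative)
    cost = np.array([1.0, 0.0, 4.0])   # cost incurred on leaving each state
    w = 0.0
    for _ in range(episodes):
        s, z = 0, 0.0                  # start state, eligibility trace
        while s is not None:
            nxt = s + 1 if s < 2 else None
            v = w * phi[s]
            v_next = w * phi[nxt] if nxt is not None else 0.0
            delta = cost[s] + v_next - v   # TD error
            z = lam * z + phi[s]           # accumulating trace
            w += alpha * delta * z
            s = nxt
    return w

for lam in (0.0, 0.5, 1.0):
    w = td_lambda(lam)
    print(f"lambda={lam}: w={w:.3f}  fitted values={np.round(w * np.array([1, 2, 3]), 2)}")
```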

