Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

2021 ◽  
pp. 1-35
Author(s):  
Takuya Shimada ◽  
Han Bao ◽  
Issei Sato ◽  
Masashi Sugiyama

Pairwise similarities and dissimilarities between data points are often obtained more easily than full labels in real-world classification problems. To make use of such pairwise information, an empirical risk minimization approach has been proposed, in which an unbiased estimator of the classification risk is computed from only pairwise similarities and unlabeled data. However, this approach has not yet been able to handle pairwise dissimilarities. Semisupervised clustering methods can incorporate both similarities and dissimilarities into their framework; however, they typically require strong geometrical assumptions on the data distribution, such as the manifold assumption, which may cause severe performance deterioration. In this letter, we derive an unbiased estimator of the classification risk from similarities, dissimilarities, and unlabeled data together. We theoretically establish an estimation error bound and experimentally demonstrate the practical usefulness of our empirical risk minimization method.
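
The abstract above builds on a general ERM recipe, which can be sketched as follows. This is a hypothetical illustration only: the risk estimate here is the ordinary fully supervised one, whereas the letter's contribution is an unbiased estimate of the same risk computed from similarities, dissimilarities, and unlabeled points instead of labels.

```python
import numpy as np

# Hypothetical sketch of the ERM recipe: pick a surrogate loss, form an
# empirical estimate of the classification risk, and minimize it by
# gradient descent on a linear model. The supervised risk estimate below
# is a stand-in for the letter's pairwise unbiased estimator.

def empirical_risk(w, X, y):
    # logistic surrogate loss averaged over the sample
    return np.log1p(np.exp(-y * (X @ w))).mean()

def erm(X, y, lr=0.1, steps=500):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margin = y * (X @ w)
        coeff = -y / (1.0 + np.exp(margin))      # d(loss)/d(margin)
        w -= lr * (X * coeff[:, None]).mean(axis=0)
    return w

# Synthetic two-class data with means at +/-(2, 1)
rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=200)
X = y[:, None] * np.array([2.0, 1.0]) + rng.normal(size=(200, 2))

w = erm(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
```

Swapping in a different risk estimator leaves the rest of the pipeline unchanged, which is what makes the ERM formulation attractive for pairwise supervision.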

2021 ◽  
pp. 1-52
Author(s):  
Taira Tsuchiya ◽  
Nontawat Charoenphakdee ◽  
Issei Sato ◽  
Masashi Sugiyama

Ordinal regression aims to predict an ordinal class label. In this letter, we consider its semisupervised formulation, in which unlabeled data are available alongside ordinal-labeled data for training an ordinal regressor. Several metrics evaluate the performance of ordinal regression, such as the mean absolute error, mean zero-one error, and mean squared error. However, existing studies do not take the evaluation metric into account, restrict the choice of models, and lack theoretical guarantees. To overcome these problems, we propose a novel generic framework for semisupervised ordinal regression based on the empirical risk minimization principle that can optimize any of the metrics mentioned above. In addition, our framework allows flexible choices of models, surrogate losses, and optimization algorithms without common geometric assumptions on the unlabeled data, such as the cluster or manifold assumption. We provide an estimation error bound to show that our risk estimator is consistent. Finally, we conduct experiments to show the usefulness of our framework.
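
The three evaluation metrics named above are easy to state concretely. The sketch below assumes the standard encoding of ordinal labels as integers 1..K; the framework itself is metric-agnostic, so any of these can be taken as the optimization target.

```python
import numpy as np

# The three ordinal-regression metrics from the abstract, for labels
# encoded as integers 1..K (a standard convention, assumed here).

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mean_zero_one_error(y_true, y_pred):
    return np.mean(y_true != y_pred)

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2.0)

y_true = np.array([1, 2, 3, 3, 5])
y_pred = np.array([2, 2, 3, 1, 5])

# MAE penalizes by ordinal distance, MZE only counts misclassifications,
# and MSE penalizes large ordinal mistakes more heavily.
mae = mean_absolute_error(y_true, y_pred)
mze = mean_zero_one_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
```

The three metrics generally rank predictors differently, which is why a framework that can target each of them directly is useful.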


2020 ◽  
Vol 32 (3) ◽  
pp. 659-681 ◽  
Author(s):  
Zhenghang Cui ◽  
Nontawat Charoenphakdee ◽  
Issei Sato ◽  
Masashi Sugiyama

Learning from triplet comparison data has been extensively studied in the contexts of metric learning, where we want to learn a distance metric between instances, and ordinal embedding, where we want to embed the given instances in a Euclidean space so that the comparison order is preserved as well as possible. Unlike fully labeled data, triplet comparison data can be collected in a more accurate and human-friendly way. Although learning from triplet comparison data has been considered in many applications, a fundamental question has remained unanswered: can a classifier be learned from triplet comparison data alone, without any labels? In this letter, we give a positive answer to this important question by proposing an unbiased estimator of the classification risk under the empirical risk minimization framework. Because the proposed method is based on empirical risk minimization, it inherits the advantage that any surrogate loss function and any model, including neural networks, can be easily applied. Furthermore, we theoretically establish an estimation error bound for the proposed empirical risk minimizer. Finally, we provide experimental results showing that our method empirically works well and outperforms various baseline methods.
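
What triplet comparison data looks like can be sketched concretely. In this hypothetical illustration, a triplet (a, b, c) records that instance x_a is more similar to x_b than to x_c; Euclidean distance stands in for the human judgment that would normally produce such comparisons, and no class labels are used.

```python
import numpy as np

# Hypothetical sketch of triplet comparison data collection: each
# triplet (a, b, c) of indices asserts that X[a] is closer to X[b]
# than to X[c]. Euclidean distance plays the role of the annotator.

def make_triplets(X, n_triplets, rng):
    triplets = []
    for _ in range(n_triplets):
        a, b, c = rng.choice(len(X), size=3, replace=False)
        if np.linalg.norm(X[a] - X[b]) <= np.linalg.norm(X[a] - X[c]):
            triplets.append((a, b, c))
        else:
            triplets.append((a, c, b))   # reorder so b is the closer one
    return triplets

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
triplets = make_triplets(X, 100, rng)
```

The letter's result is that data of exactly this form, with no labels anywhere, already suffices to build an unbiased estimator of the classification risk.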


2015 ◽  
Vol 2015 ◽  
pp. 1-8
Author(s):  
Mingchen Yao ◽  
Chao Zhang ◽  
Wei Wu

Many generalization results in learning theory are established under the assumption that samples are independent and identically distributed (i.i.d.). However, numerous learning tasks in practical applications involve time-dependent data. In this paper, we propose a theoretical framework for analyzing the generalization performance of the empirical risk minimization (ERM) principle for sequences of time-dependent samples (TDS). In particular, we first present a generalization bound of the ERM principle for TDS. By introducing some auxiliary quantities, we then give a further analysis of the generalization properties and asymptotic behavior of the ERM principle for TDS.
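
Why the i.i.d. assumption fails for such data can be illustrated with a hypothetical example: an AR(1) sequence is stationary but strongly dependent, which is exactly the situation a TDS-specific analysis is built for (the coefficient 0.8 below is an arbitrary demo value).

```python
import numpy as np

# Hypothetical illustration of time-dependent samples: an AR(1)
# sequence x_t = phi * x_{t-1} + noise is stationary but far from
# i.i.d., so classical i.i.d. generalization bounds do not apply.

rng = np.random.default_rng(0)
phi = 0.8                      # AR(1) coefficient (assumed for the demo)
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + rng.normal()

# The lag-1 autocorrelation stays close to phi, not to 0 as i.i.d.
# sampling would give.
lag1_corr = np.corrcoef(x[:-1], x[1:])[0, 1]
```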


2021 ◽  
Author(s):  
Puyu Wang ◽  
Zhenhuan Yang ◽  
Yunwen Lei ◽  
Yiming Ying ◽  
Hai Zhang

Author(s):  
Zhengling Qi ◽  
Ying Cui ◽  
Yufeng Liu ◽  
Jong-Shi Pang

This paper has two main goals: (a) establish several statistical properties—consistency, asymptotic distributions, and convergence rates—of stationary solutions and values of a class of coupled nonconvex and nonsmooth empirical risk-minimization problems and (b) validate these properties by a noisy amplitude-based phase-retrieval problem, the latter being of much topical interest. Derived from available data via sampling, these empirical risk-minimization problems are the computational workhorse of a population risk model that involves the minimization of an expected value of a random functional. When these minimization problems are nonconvex, the computation of their globally optimal solutions is elusive. Together with the fact that the expectation operator cannot be evaluated for general probability distributions, it becomes necessary to justify whether the stationary solutions of the empirical problems are practical approximations of the stationary solution of the population problem. When these two features, general distribution and nonconvexity, are coupled with nondifferentiability that often renders the problems “non-Clarke regular,” the task of the justification becomes challenging. Our work aims to address such a challenge within an algorithm-free setting. The resulting analysis is, therefore, different from much of the analysis in the recent literature that is based on local search algorithms. Furthermore, supplementing the classical global minimizer-centric analysis, our results offer a promising step to close the gap between computational optimization and asymptotic analysis of coupled, nonconvex, nonsmooth statistical estimation problems, expanding the former with statistical properties of the practically obtained solution and providing the latter with a more practical focus pertaining to computational tractability.
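
The phase-retrieval application can be made concrete with a hedged sketch of one common amplitude-based formulation (the paper's exact model may differ): given measurements b_i = |&lt;a_i, x*&gt;|, minimize the nonsmooth, nonconvex objective R_n(x) = (1/n) Σ_i (|&lt;a_i, x&gt;| − b_i)².

```python
import numpy as np

# Hedged sketch of an amplitude-based phase-retrieval empirical risk
# (one common formulation, assumed for illustration). Noiseless demo.

rng = np.random.default_rng(0)
d, n = 5, 50
x_star = rng.normal(size=d)          # unknown signal
A = rng.normal(size=(n, d))          # sensing vectors
b = np.abs(A @ x_star)               # amplitude-only measurements

def empirical_risk(x):
    return np.mean((np.abs(A @ x) - b) ** 2)

# Both features discussed above are visible in this tiny instance:
# the absolute value makes the objective nonsmooth, and the sign
# ambiguity empirical_risk(x) == empirical_risk(-x) makes it nonconvex.
```

Because the global sign of x* is unidentifiable, the population problem has multiple stationary points, which is the kind of structure the paper's algorithm-free analysis addresses.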


2016 ◽  
Vol 28 (12) ◽  
pp. 2853-2889 ◽  
Author(s):  
Hanyuan Hang ◽  
Yunlong Feng ◽  
Ingo Steinwart ◽  
Johan A. K. Suykens

This letter investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here, by general we mean that many stationary stochastic processes are included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified analysis of learning schemes under various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes, such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for independent and identically distributed (i.i.d.) processes, our learning rates for ERM recover the optimal rates. For non-i.i.d. processes, including geometrically α-mixing Markov processes, geometrically α-mixing processes with restricted decay, ϕ-mixing processes, and (time-reversed) geometrically C-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to an arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called effective number of observations for various mixing processes.
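
One of the learning schemes whose rates are analyzed above, the LS-SVM (equivalently, kernel ridge regression) with a Gaussian kernel, admits a minimal sketch. The bandwidth and regularization values below are illustrative only.

```python
import numpy as np

# Minimal sketch of an LS-SVM / kernel ridge regressor with a Gaussian
# kernel; hyperparameter values are illustrative, not from the letter.

def gaussian_kernel(X1, X2, bandwidth=1.0):
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def fit_ls_svm(X, y, lam=1e-3, bandwidth=1.0):
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    # Regularized least squares in the RKHS: (K + lam*n*I) alpha = y
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda X_query: gaussian_kernel(X_query, X, bandwidth) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

predict = fit_ls_svm(X, y)
train_mse = np.mean((predict(X) - np.sin(X[:, 0])) ** 2)
```

The letter's analysis concerns how fast such estimators converge when the 100 samples above are drawn from a mixing process rather than i.i.d.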

