Coefficient-based regularization network with variance loss for error

Author(s):  
Baoqi Su ◽  
Hong-Wei Sun

The loss function is a key element of a learning algorithm. Based on the regression learning algorithm with an offset, a coefficient-based regularization network with variance loss is proposed. The variance loss differs from the usual least squares, hinge and pinball losses in that it induces a kind of cross-sample empirical risk. Moreover, our coefficient-based regularization relies only on a general kernel, i.e. the kernel is required only to be continuous, bounded and to satisfy a mild differentiability condition. These two characteristics bring essential difficulties to the theoretical analysis of this learning scheme. By the hypothesis space strategy and the error decomposition technique in [L. Shi, Learning theory estimates for coefficient-based regularized regression, Appl. Comput. Harmon. Anal. 34 (2013) 252–265], a capacity-dependent error analysis is completed, and satisfactory error bounds and learning rates are then derived under a very mild regularity condition on the regression function. We also find an effective way to deal with learning problems involving such cross-sample empirical risks.
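For orientation, the generic shape of a coefficient-based regularized scheme with an offset is sketched below. The symbols (kernel K, regularization parameter lambda, offset b, and the abbreviated empirical risk E_z) are standard notation assumed here; the exact cross-sample form of the variance-loss risk is the one defined in the paper and is not reproduced.

```latex
% Sketch of a generic coefficient-based regularized scheme with offset; the
% empirical risk E_z induced by the variance loss (which couples pairs of
% samples) is left abstract here.
\[
  f_{\mathbf z}(x)=\sum_{i=1}^{m}\alpha_i^{\mathbf z}K(x_i,x)+b^{\mathbf z},
  \qquad
  (\boldsymbol\alpha^{\mathbf z},b^{\mathbf z})
  =\operatorname*{arg\,min}_{\boldsymbol\alpha\in\mathbb R^{m},\,b\in\mathbb R}
   \Big\{\mathcal E_{\mathbf z}(f_{\boldsymbol\alpha,b})
         +\lambda\sum_{i=1}^{m}\alpha_i^{2}\Big\},
  \qquad \lambda>0 .
\]
```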

Author(s):  
HONGWEI SUN ◽  
PING LIU

A new multi-kernel regression learning algorithm is studied in this paper. In our setting, the hypothesis space is generated by two Mercer kernels, so it has stronger approximation ability than in the single-kernel case. We provide the mathematical foundation for this regularized learning algorithm and obtain satisfactory capacity-dependent error bounds and learning rates by the covering number method.
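As a concrete illustration of a two-kernel hypothesis space, the sketch below fits a regularized least squares model whose trial functions are spanned by two Gram matrices. The Gaussian and polynomial kernels, the ridge-type penalty on the stacked coefficients, and all parameter values are illustrative assumptions, not the exact scheme analyzed in the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # K1(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def polynomial_kernel(X, Z, degree=2):
    # K2(x, z) = (1 + <x, z>)^degree
    return (1.0 + X @ Z.T) ** degree

def fit_two_kernel(X, y, lam=1e-2):
    # Hypothesis: f(x) = sum_i a_i K1(x_i, x) + sum_i b_i K2(x_i, x)
    m = X.shape[0]
    K = np.hstack([gaussian_kernel(X, X), polynomial_kernel(X, X)])  # m x 2m
    # Ridge-type penalty on the stacked coefficient vector (a, b)
    coef = np.linalg.solve(K.T @ K + lam * m * np.eye(2 * m), K.T @ y)
    return coef[:m], coef[m:]

def predict(X_train, X_new, a, b):
    return gaussian_kernel(X_new, X_train) @ a + polynomial_kernel(X_new, X_train) @ b

# Usage on a toy regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(60, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(60)
a, b = fit_two_kernel(X, y)
y_hat = predict(X, X, a, b)
```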


Author(s):  
Qin Guo ◽  
Peixin Ye

We consider the coefficient-based least squares regularized regression learning algorithm for strongly mixing and uniformly mixing samples. We obtain capacity-independent error bounds for the algorithm by means of integral operator techniques. A standard assumption in the theoretical study of learning algorithms for regression is the uniform boundedness of the output sample values. We abandon this boundedness assumption and carry out the error analysis with output sample values satisfying a generalized moment hypothesis.
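One common instance of such a generalized moment hypothesis, written here purely for illustration (the paper's precise condition may differ), is a Bernstein-type bound on the conditional moments of the output:

```latex
% Illustrative relaxation of uniform boundedness of the outputs: there exist
% constants M > 0 and C > 0 such that, for almost every x in X,
\[
  \int_Y |y|^{p}\, d\rho(y \mid x) \;\le\; C\, p!\, M^{p}
  \qquad \text{for every integer } p \ge 2 .
\]
```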


Author(s):  
Yuanxin Ma ◽  
Hongwei Sun

In this paper, the regression learning algorithm in a vector-valued RKHS is studied. We motivate the need to extend the learning theory of scalar-valued functions and analyze the learning performance. In this setting, the output data take values in a Hilbert space, and the associated RKHS consists of functions whose values lie in that Hilbert space. By establishing mathematical properties of the associated vector-valued integral operator, capacity-independent error bounds and learning rates are derived by means of the integral operator technique.
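For reference, a standard Tikhonov-regularized least squares scheme in a vector-valued RKHS and the associated integral operator can be written as below; the notation (output space Y, operator-valued kernel K, operator L_K, input marginal rho_X) follows the usual convention and is assumed here rather than taken from the paper.

```latex
% Standard vector-valued regularized least squares and its integral operator
% (conventional notation assumed; Y is the output Hilbert space, H_K the
% vector-valued RKHS of the operator-valued kernel K, rho_X the input marginal).
\[
  f_{\mathbf z,\lambda}
  = \operatorname*{arg\,min}_{f \in \mathcal H_K}
    \Big\{ \frac{1}{m}\sum_{i=1}^{m}\|f(x_i)-y_i\|_{Y}^{2}
           + \lambda \|f\|_{K}^{2} \Big\},
  \qquad
  (L_K f)(x) = \int_X K(x,t)\, f(t)\, d\rho_X(t).
\]
```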


2015 ◽  
Vol 13 (04) ◽  
pp. 437-455 ◽  
Author(s):  
Ting Hu ◽  
Jun Fan ◽  
Qiang Wu ◽  
Ding-Xuan Zhou

We introduce a learning algorithm for regression generated by the minimum error entropy (MEE) principle and regularization schemes in reproducing kernel Hilbert spaces. This empirical MEE algorithm depends crucially on a scaling parameter arising from Parzen windowing. The purpose of this paper is to carry out a consistency analysis when the scaling parameter is large, and explicit learning rates are provided. Novel approaches are proposed to overcome the difficulty of bounding the output function uniformly and the difficulty caused by the special MEE feature that the regression function may not be a minimizer of the error entropy.
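To make the role of the scaling parameter concrete, one common form of the Parzen-windowed empirical MEE risk is sketched below; the Gaussian window, the 1/m^2 normalization, and the added regularization term are illustrative assumptions and may differ in constants from the paper's definition.

```latex
% A common Parzen-windowed empirical MEE objective (illustrative form): with
% errors e_i = y_i - f(x_i), window G, and scaling parameter h,
\[
  \widehat{\mathcal E}_h(f)
  = -\frac{1}{m^{2}} \sum_{i,j=1}^{m}
      G\!\Big(\frac{(e_i-e_j)^{2}}{2h^{2}}\Big),
  \qquad G(t)=e^{-t},
\]
% and the regularized estimator minimizes
% \widehat{\mathcal E}_h(f) + \lambda\|f\|_K^2 over the RKHS.
```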


Author(s):  
CHENG WANG ◽  
JIA CAI

In this paper, we investigate a coefficient-based regularized least squares regression problem in a data-dependent hypothesis space. The learning algorithm is implemented with samples drawn by unbounded sampling processes, and the error analysis is performed by a stepping-stone technique. A new error decomposition technique is proposed for the error analysis. The regularization parameters in our setting provide much more flexibility and adaptivity. Sharp learning rates are derived by means of ℓ²-empirical covering numbers under a moment hypothesis condition.
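A minimal sketch of a coefficient-based regularized least squares fit in a data-dependent (sample-spanned) hypothesis space is given below; the Gaussian kernel, the normalization of the penalty, and the parameter values are assumptions for illustration rather than the paper's exact formulation.

```python
import numpy as np

def gaussian_gram(X, Z, sigma=0.5):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def coefficient_regularized_ls(X, y, lam=1e-2, sigma=0.5):
    # Data-dependent hypothesis space: f(x) = sum_i alpha_i K(x_i, x).
    # Objective: (1/m) * ||K alpha - y||^2 + lam * ||alpha||^2,
    # whose minimizer solves (K^T K + lam * m * I) alpha = K^T y.
    m = X.shape[0]
    K = gaussian_gram(X, X, sigma)
    alpha = np.linalg.solve(K.T @ K + lam * m * np.eye(m), K.T @ y)
    return alpha

# Usage: the outputs need not be uniformly bounded (heavier-tailed noise is
# allowed under a moment-type hypothesis on the output distribution).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(80, 1))
y = np.cos(2 * np.pi * X[:, 0]) + rng.standard_t(df=5, size=80)
alpha = coefficient_regularized_ls(X, y)
f_hat = gaussian_gram(X, X) @ alpha        # fitted values at the sample points
```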


2016 ◽  
Vol 28 (12) ◽  
pp. 2853-2889 ◽  
Author(s):  
Hanyuan Hang ◽  
Yunlong Feng ◽  
Ingo Steinwart ◽  
Johan A. K. Suykens

This letter investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here, by general we mean that many stationary stochastic processes can be included. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified treatment of learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for independent and identically distributed (i.i.d.) processes, our learning rates for ERM recover the optimal rates. For non-i.i.d. processes, including geometrically [Formula: see text]-mixing Markov processes, geometrically [Formula: see text]-mixing processes with restricted decay, [Formula: see text]-mixing processes, and (time-reversed) geometrically [Formula: see text]-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called effective number of observations for various mixing processes.
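As one of the learning schemes covered by the oracle inequality, an LS-SVM for regression with a Gaussian kernel can be sketched as follows. The dual linear system with a bias term is the textbook LS-SVM formulation; the kernel width and regularization constant are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Textbook LS-SVM regression dual system:
    # [ 0   1^T         ] [ b     ]   [ 0 ]
    # [ 1   K + I/gamma ] [ alpha ] = [ y ]
    m = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    A = np.zeros((m + 1, m + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(m) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]               # bias b, dual coefficients alpha

def lssvm_predict(X_train, X_new, b, alpha, sigma=1.0):
    return gaussian_kernel(X_new, X_train, sigma) @ alpha + b

# Usage on i.i.d. data; for dependent (mixing) observations the estimator is
# computed the same way, only the statistical analysis changes.
rng = np.random.default_rng(2)
X = rng.uniform(-2.0, 2.0, size=(100, 1))
y = np.tanh(X[:, 0]) + 0.1 * rng.standard_normal(100)
b, alpha = lssvm_fit(X, y)
y_hat = lssvm_predict(X, X, b, alpha)
```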


2020 ◽  
pp. 1-20
Author(s):  
Hong Chen ◽  
Changying Guo ◽  
Huijuan Xiong ◽  
Yingjie Wang

Sparse additive machines (SAMs) have attracted increasing attention in high-dimensional classification due to their representation flexibility and interpretability. However, most existing methods are formulated under a Tikhonov regularization scheme with the hinge loss, which is susceptible to outliers. To circumvent this problem, we propose a sparse additive machine with ramp loss (called ramp-SAM) to tackle classification and variable selection simultaneously. A misclassification error bound is established for ramp-SAM with the help of a detailed error decomposition and a constructive hypothesis error analysis. To solve the nonsmooth and nonconvex ramp-SAM, a proximal block coordinate descent method is presented with convergence guarantees. The empirical effectiveness of our model is confirmed on simulated and benchmark datasets.
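For context, the ramp loss that replaces the hinge loss is commonly written as a truncated (clipped) hinge; the truncation parameter s below follows the standard parameterization and is assumed here for illustration.

```latex
% Ramp loss as a truncated hinge (standard form, assumed for illustration):
% with margin u = y f(x) and a truncation level s < 1,
\[
  \ell_{\mathrm{ramp}}(u) \;=\; \min\bigl\{\,1-s,\; \max\{0,\,1-u\}\,\bigr\}
  \;=\; \max\{0,\,1-u\} - \max\{0,\,s-u\},
\]
% so the loss is bounded by 1 - s, which makes it robust to outliers but
% nonconvex, hence the need for a proximal block coordinate descent solver.
```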

