A NOTE ON STABILITY OF ERROR BOUNDS IN STATISTICAL LEARNING THEORY

2011 ◽  
Vol 09 (04) ◽  
pp. 369-382
Author(s):  
MING LI ◽  
ANDREA CAPONNETTO

We consider a wide class of error bounds developed in the context of statistical learning theory which are expressed in terms of functionals of the regression function, for instance, its norm in a reproducing kernel Hilbert space or other functional space. These bounds are unstable in the sense that a small perturbation of the regression function can induce an arbitrary large increase of the relevant functional and make the error bound useless. Using a known result involving Fano inequality, we show how stability can be recovered.

Author(s):  
Fabio Sigrist

AbstractWe introduce a novel boosting algorithm called ‘KTBoost’ which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. Intuitively, the idea is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of functions that have parts with varying degrees of regularity such as discontinuities and smooth parts. We empirically show that KTBoost significantly outperforms both tree and kernel boosting in terms of predictive accuracy in a comparison on a wide array of data sets.


2005 ◽  
Vol 17 (1) ◽  
pp. 177-204 ◽  
Author(s):  
Charles A. Micchelli ◽  
Massimiliano Pontil

In this letter, we provide a study of learning in a Hilbert space of vector-valued functions. We motivate the need for extending learning theory of scalar-valued functions by practical considerations and establish some basic results for learning vector-valued functions that should prove useful in applications. Specifically, we allow an output space Y to be a Hilbert space, and we consider a reproducing kernel Hilbert space of functions whose values lie in Y. In this setting, we derive the form of the minimal norm interpolant to a finite set of data and apply it to study some regularization functionals that are important in learning theory. We consider specific examples of such functionals corresponding to multiple-output regularization networks and support vector machines, for both regression and classification. Finally, we provide classes of operator-valued kernels of the dot product and translation-invariant type.


2003 ◽  
Vol 01 (01) ◽  
pp. 17-41 ◽  
Author(s):  
STEVE SMALE ◽  
DING-XUAN ZHOU

Let B be a Banach space and (ℋ,‖·‖ℋ) be a dense, imbedded subspace. For a ∈ B, its distance to the ball of ℋ with radius R (denoted as I(a, R)) tends to zero when R tends to infinity. We are interested in the rate of this convergence. This approximation problem arose from the study of learning theory, where B is the L2 space and ℋ is a reproducing kernel Hilbert space. The class of elements having I(a, R) = O(R-r) with r > 0 is an interpolation space of the couple (B, ℋ). The rate of convergence can often be realized by linear operators. In particular, this is the case when ℋ is the range of a compact, symmetric, and strictly positive definite linear operator on a separable Hilbert space B. For the kernel approximation studied in Learning Theory, the rate depends on the regularity of the kernel function. This yields error estimates for the approximation by reproducing kernel Hilbert spaces. When the kernel is smooth, the convergence is slow and a logarithmic convergence rate is presented for analytic kernels in this paper. The purpose of our results is to provide some theoretical estimates, including the constants, for the approximation error required for the learning theory.


Author(s):  
Michael T Jury ◽  
Robert T W Martin

Abstract We extend the Lebesgue decomposition of positive measures with respect to Lebesgue measure on the complex unit circle to the non-commutative (NC) multi-variable setting of (positive) NC measures. These are positive linear functionals on a certain self-adjoint subspace of the Cuntz–Toeplitz $C^{\ast }-$algebra, the $C^{\ast }-$algebra of the left creation operators on the full Fock space. This theory is fundamentally connected to the representation theory of the Cuntz and Cuntz–Toeplitz $C^{\ast }-$algebras; any *−representation of the Cuntz–Toeplitz $C^{\ast }-$algebra is obtained (up to unitary equivalence), by applying a Gelfand–Naimark–Segal construction to a positive NC measure. Our approach combines the theory of Lebesgue decomposition of sesquilinear forms in Hilbert space, Lebesgue decomposition of row isometries, free semigroup algebra theory, NC reproducing kernel Hilbert space theory, and NC Hardy space theory.


Author(s):  
Dominic Knoch ◽  
Christian R. Werner ◽  
Rhonda C. Meyer ◽  
David Riewe ◽  
Amine Abbadi ◽  
...  

Abstract Key message Complementing or replacing genetic markers with transcriptomic data and use of reproducing kernel Hilbert space regression based on Gaussian kernels increases hybrid prediction accuracies for complex agronomic traits in canola. In plant breeding, hybrids gained particular importance due to heterosis, the superior performance of offspring compared to their inbred parents. Since the development of new top performing hybrids requires labour-intensive and costly breeding programmes, including testing of large numbers of experimental hybrids, the prediction of hybrid performance is of utmost interest to plant breeders. In this study, we tested the effectiveness of hybrid prediction models in spring-type oilseed rape (Brassica napus L./canola) employing different omics profiles, individually and in combination. To this end, a population of 950 F1 hybrids was evaluated for seed yield and six other agronomically relevant traits in commercial field trials at several locations throughout Europe. A subset of these hybrids was also evaluated in a climatized glasshouse regarding early biomass production. For each of the 477 parental rapeseed lines, 13,201 single nucleotide polymorphisms (SNPs), 154 primary metabolites, and 19,479 transcripts were determined and used as predictive variables. Both, SNP markers and transcripts, effectively predict hybrid performance using (genomic) best linear unbiased prediction models (gBLUP). Compared to models using pure genetic markers, models incorporating transcriptome data resulted in significantly higher prediction accuracies for five out of seven agronomic traits, indicating that transcripts carry important information beyond genomic data. Notably, reproducing kernel Hilbert space regression based on Gaussian kernels significantly exceeded the predictive abilities of gBLUP models for six of the seven agronomic traits, demonstrating its potential for implementation in future canola breeding programmes.


Sign in / Sign up

Export Citation Format

Share Document