Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration

The generalization properties of learning classifiers with a polynomial kernel function are examined. In kernel methods, input vectors are mapped into a high-dimensional feature space where the mapped vectors are linearly separated. It is well-known that a linear dichotomy has an average generalization error or a learning curve proportional to the dimension of the input space and inversely proportional to the number of given examples in the asymptotic limit. However, it does not hold in the case of kernel methods since the feature vectors lie on a submanifold in the feature space, called the input surface. In this letter, we discuss how the asymptotic average generalization error depends on the relationship between the input surface and the true separating hyperplane in the feature space where the essential dimension of the true separating polynomial, named the class, is important. We show its upper bounds in several cases and confirm these using computer simulations.

Download Full-text

Generalization Error Bounds for Kernel Matrix Completion and Extrapolation

IEEE Signal Processing Letters ◽

10.1109/lsp.2020.2970306 ◽

2020 ◽

Vol 27 ◽

pp. 326-330 ◽

Cited By ~ 1

Author(s):

Pere Gimenez-Febrer ◽

Alba Pages-Zamora ◽

Georgios B. Giannakis

Keyword(s):

Error Bounds ◽

Matrix Completion ◽

Kernel Matrix ◽

Generalization Error

Download Full-text

Generalization Error Analysis for Polynomial Kernel Methods — Algebraic Geometrical Approach

Artificial Neural Networks and Neural Information Processing — ICANN/ICONIP 2003 - Lecture Notes in Computer Science ◽

10.1007/3-540-44989-2_25 ◽

2003 ◽

pp. 201-208

Author(s):

Kazushi Ikeda

Keyword(s):

Error Analysis ◽

Kernel Methods ◽

Polynomial Kernel ◽

Geometrical Approach ◽

Generalization Error

Download Full-text

Automated Spectral Kernel Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5892 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4618-4625

Author(s):

Jian Li ◽

Yong Liu ◽

Weiping Wang

Keyword(s):

Error Bounds ◽

Kernel Methods ◽

Experimental Results ◽

Kernel Learning ◽

Generalization Error ◽

Learning Framework ◽

Model Training ◽

Efficient Learning ◽

Spectral Representations ◽

Spectral Kernel

The generalization performance of kernel methods is largely determined by the kernel, but spectral representations of stationary kernels are both input-independent and output-independent, which limits their applications on complicated tasks. In this paper, we propose an efficient learning framework that incorporates the process of finding suitable kernels and model training. Using non-stationary spectral kernels and backpropagation w.r.t. the objective, we obtain favorable spectral representations that depends on both inputs and outputs. Further, based on Rademacher complexity, we derive data-dependent generalization error bounds, where we investigate the effect of those factors and introduce regularization terms to improve the performance. Extensive experimental results validate the effectiveness of the proposed algorithm and coincide with our theoretical findings.

Download Full-text

An application kernel matrix for studying the productivity of parallel programming languages

"Software Engineering for High Performance Computing System (HPCS) Applications" W3S Workshop - 26th International Conference on Software Engineering ◽

10.1049/ic:20040416 ◽

2004 ◽

Cited By ~ 2

Author(s):

B. Chamberlain

Keyword(s):

Programming Languages ◽

Parallel Programming ◽

Kernel Matrix ◽

Parallel Programming Languages

Download Full-text

Imbalanced Data Detection Kernel Method in Closed Systems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3652 ◽

2013 ◽

Vol 756-759 ◽

pp. 3652-3658

Author(s):

You Li Lu ◽

Jun Luo

Keyword(s):

Kernel Methods ◽

Kernel Method ◽

Imbalanced Data ◽

Data Detection ◽

Data Sets ◽

System Call ◽

Data Set ◽

Imbalanced Data Sets ◽

Lower Complexity ◽

Closed Systems

Under the study of Kernel Methods, this paper put forward two improved algorithm which called R-SVM & I-SVDD in order to cope with the imbalanced data sets in closed systems. R-SVM used K-means algorithm clustering space samples while I-SVDD improved the performance of original SVDD by imbalanced sample training. Experiment of two sets of system call data set shows that these two algorithms are more effectively and R-SVM has a lower complexity.

Download Full-text

Kernel methods in system identification, machine learning and function estimation: A survey

Automatica ◽

10.1016/j.automatica.2014.01.001 ◽

2014 ◽

Vol 50 (3) ◽

pp. 657-682 ◽

Cited By ~ 294

Author(s):

Gianluigi Pillonetto ◽

Francesco Dinuzzo ◽

Tianshi Chen ◽

Giuseppe De Nicolao ◽

Lennart Ljung

Keyword(s):

Machine Learning ◽

System Identification ◽

Kernel Methods ◽

Function Estimation ◽

And Function

Download Full-text

Fully polarimetric synthetic aperture radar data classification using probabilistic and non-probabilistic kernel methods

European Journal of Remote Sensing ◽

10.1080/22797254.2021.1924081 ◽

2021 ◽

pp. 1-8

Author(s):

Iman Khosravi ◽

Yury Razoumny ◽

Javad Hatami Afkoueieh ◽

Seyed Kazem Alavipanah

Keyword(s):

Synthetic Aperture Radar ◽

Kernel Methods ◽

Data Classification ◽

Radar Data ◽

Synthetic Aperture ◽

Polarimetric Synthetic Aperture Radar ◽

Synthetic Aperture Radar Data ◽

Aperture Radar

Download Full-text

Competitive Caching with Machine Learned Advice

Journal of the ACM ◽

10.1145/3447579 ◽

2021 ◽

Vol 68 (4) ◽

pp. 1-25

Author(s):

Thodoris Lykouris ◽

Sergei Vassilvitskii

Keyword(s):

Online Algorithms ◽

Empirical Evaluation ◽

Optimal Solution ◽

Poor Performance ◽

Machine Learning Algorithms ◽

Average Error ◽

Generalization Error ◽

Worst Case ◽

Future Events ◽

Real World Datasets

Traditional online algorithms encapsulate decision making under uncertainty, and give ways to hedge against all possible future events, while guaranteeing a nearly optimal solution, as compared to an offline optimum. On the other hand, machine learning algorithms are in the business of extrapolating patterns found in the data to predict the future, and usually come with strong guarantees on the expected generalization error. In this work, we develop a framework for augmenting online algorithms with a machine learned predictor to achieve competitive ratios that provably improve upon unconditional worst-case lower bounds when the predictor has low error. Our approach treats the predictor as a complete black box and is not dependent on its inner workings or the exact distribution of its errors. We apply this framework to the traditional caching problem—creating an eviction strategy for a cache of size k . We demonstrate that naively following the oracle’s recommendations may lead to very poor performance, even when the average error is quite low. Instead, we show how to modify the Marker algorithm to take into account the predictions and prove that this combined approach achieves a competitive ratio that both (i) decreases as the predictor’s error decreases and (ii) is always capped by O (log k ), which can be achieved without any assistance from the predictor. We complement our results with an empirical evaluation of our algorithm on real-world datasets and show that it performs well empirically even when using simple off-the-shelf predictions.

Download Full-text

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Nature Communications ◽

10.1038/s41467-021-23103-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Abdulkadir Canatar ◽

Blake Bordelon ◽

Cengiz Pehlevan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Kernel Regression ◽

Learning Task ◽

Learning Curves ◽

Generalization Error ◽

Theoretical Understanding ◽

Classical Statistics ◽

Deep Networks ◽

Model Alignment

AbstractA theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.

Download Full-text